Digital Transformation

The Importance of Regression Testing: What IT Leaders Can Learn from CrowdStrike

Phil Richards

On Friday, July 19th, 2024, my 6:00 a.m. Delta flight got canceled — along with at least 20 other flights leaving from the Kansas City Airport. At the time, I wondered if there was another global attack on the airline infrastructure. The reality was much more mundane.

A software update from CrowdStrike caused the boot cycle for Microsoft Windows to fail catastrophically. This impacted many major airlines’ reservation and control systems throughout airports worldwide. With the clarity of hindsight, this seems like a relatively easy issue to discover during testing. The details of the issues CrowdStrike experienced are still coming out, but CrowdStrike CEO George Kurtz indicated that a full regression test was not performed when this update was sent out. The rationale for this decision is that the update was considered to be content rather than software code.

Regression testing is defined as the re-running of functional and non-functional tests to ensure that previously developed and tested software still performs after a change. It is a vital part of the software release process. Integrating regression testing as an automated step in the Continuous Integration/Continuous Deployment (CI/CD) pipeline not only enhances the reliability and stability of the software but also accelerates the development cycle.

The recent CrowdStrike issue is a poignant example of why automated regression testing is indispensable. This article explains what automated regression testing is, why it is so important, and how companies can successfully implement it in their CI/CD pipeline.

Understanding regression testing

Regression testing is performed to detect bugs that may have been introduced inadvertently after updates or enhancements to the software. These updates could be new features, bug fixes, performance improvements, changes in the environment where the software operates, or additions of content that might not be considered code.

A diagram of typical software regression testing steps.

Regression testing is performed in the event of a bug fix, requirement change, code update, or new feature.

The CrowdStrike update was to the content of their endpoint detection and response software. This type of update is traditionally considered a non-code update and is therefore often assumed not to require regression testing. However, the traditional view that content deployments are less risky than code deployments does not take into account that modern content impacts machine learning algorithms and, ultimately, code.

Additionally, adding new files to an application like CrowdStrike impacts the boot cycle of the operating system, even if those files are not application code in the traditional sense. Because of this new reality, software developers need to consider performing regression testing before releasing any changes, not just code changes.
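To make the idea of regression testing content concrete, here is a minimal sketch in Python. It assumes a hypothetical comma-separated content format with three fields per record; the names `EXPECTED_FIELDS`, `check_content_update`, and `release_gate` are invented for illustration and do not describe CrowdStrike's actual file format or tooling. The point is only that a content file can pass through the same kind of automated gate as code.

```python
from dataclasses import dataclass

# Hypothetical: the number of fields the consuming code reads from each record.
EXPECTED_FIELDS = 3


@dataclass
class ContentCheck:
    ok: bool
    errors: list[str]


def check_content_update(lines: list[str], expected_fields: int = EXPECTED_FIELDS) -> ContentCheck:
    """Validate a content file against the shape the consuming code expects."""
    errors = []
    if not lines:
        errors.append("content file is empty")
    for number, line in enumerate(lines, start=1):
        fields = line.split(",")
        if len(fields) != expected_fields:
            errors.append(f"line {number}: expected {expected_fields} fields, got {len(fields)}")
        elif any(not field.strip() for field in fields):
            errors.append(f"line {number}: empty field")
    return ContentCheck(ok=not errors, errors=errors)


def release_gate(lines: list[str]) -> bool:
    """Return True only if the content update passes validation."""
    result = check_content_update(lines)
    for error in result.errors:
        print(f"BLOCKED: {error}")
    return result.ok
```

In this sketch, a record with a missing field blocks the release before the file ever reaches a customer machine, regardless of whether the team classifies the change as "content" or "code."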

Benefits of automated regression testing

Automating regression tests in the CI/CD pipeline ensures that every change, no matter how minor, is validated. This process helps catch defects early in the development cycle, reducing the cost and effort required to fix them later. Furthermore, it enables developers to confidently make changes, knowing that any potential issues will be identified promptly. Automation brings several key benefits to the engineering process:

Early detection of bugs: Automated regression tests run with every code change or at specified intervals. This ensures that any defects introduced are detected early, allowing for quicker resolution and reducing the impact on the project timeline.
Consistency and reliability: Automated tests eliminate the variability associated with manual testing. They run the same way every time, ensuring consistent results and improving the reliability of the testing process.
Efficiency and speed: Automation significantly speeds up the testing process. Tests that would take hours or days manually can be executed in minutes, enabling faster feedback and quicker iterations.
Scalability: As projects grow, the number of test cases increases. Automated regression testing scales effortlessly to accommodate this growth, running a large suite of tests efficiently.
Cost-effectiveness: While the initial setup of automated tests requires investment, the long-term savings are substantial. Automated tests reduce the need for extensive manual testing, freeing up human resources for more complex and exploratory testing tasks.
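As a rough illustration of what "an automated step in the pipeline" can mean, the sketch below runs a regression suite as a subprocess and blocks the release when the suite fails or times out. The function names, the ten-minute timeout, and the `tests` directory in `DEFAULT_COMMAND` are assumptions made for this example; a real pipeline would normally express the same gate in its CI tool's own configuration.

```python
import subprocess
import sys

# Hypothetical default: discover and run unittest cases under a "tests" directory.
DEFAULT_COMMAND = [sys.executable, "-m", "unittest", "discover", "-s", "tests"]


def run_regression_gate(test_command: list[str], timeout_seconds: int = 600) -> bool:
    """Run the regression suite; return True only when it exits with status 0."""
    try:
        completed = subprocess.run(test_command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        print("Regression suite timed out; blocking the release.")
        return False
    if completed.returncode != 0:
        print(f"Regression suite failed (exit {completed.returncode}); blocking the release.")
        return False
    return True


def pipeline_step(test_command: list[str]) -> int:
    """Exit code for the CI job: 0 lets deployment proceed, 1 stops it."""
    return 0 if run_regression_gate(test_command) else 1
```

Treating a timeout as a failure is a deliberate choice in this sketch: a suite that never finishes gives no evidence that the change is safe.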

Lessons from the CrowdStrike incident

The CrowdStrike incident serves as a cautionary tale for the software development industry. Here are the key takeaways:

Comprehensive test coverage: It's crucial to have comprehensive test coverage that includes all critical features. Automated regression tests should cover a wide range of scenarios to ensure that changes do not impact existing functionality.
Integration in CI/CD pipeline: Regression testing should be an integral part of the CI/CD pipeline. Every change, whether code or content and no matter how minor, should trigger automated regression tests to catch issues early.
Increased investment in regression testing: Releasing software without a thorough regression test is simply too risky. Companies sometimes view regression testing as too time-consuming to perform before releasing emergency fixes and patches, but automation tools and hardware performance improvements can accelerate it so that the impact on release schedules is minimal.
Continuous monitoring and feedback: Continuous monitoring of the test results and immediate feedback to the development team prevent issues from escalating. Addressing any regression test failure promptly maintains the stability of the software.
Prioritize critical features: Critical features, especially those related to security and core functionality, should be prioritized in regression testing. It is essential to prioritize “show stopper” defects so that the type of issue that CrowdStrike experienced is always tested.
Learn from mistakes: Make sure you have a procedure for identifying and tracking “escapes”: bugs that are not found until a customer sees them in the production system. Escapes should be fed back into the regression testing cycle so they only “escape” once.
Post-deployment validation: In addition to pre-deployment testing, post-deployment validation can help catch issues that might not have been detected in the testing environment. This is particularly important for features that interact with external systems or have complex dependencies.
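The "escapes" practice from the list above can be sketched as a small registry that turns every production bug into a permanent regression case. This example uses Python's standard `unittest` module; `parse_version` is a hypothetical function under test, and the escape IDs and inputs are invented for illustration.

```python
import unittest


def parse_version(text: str) -> tuple[int, ...]:
    """Hypothetical function under test: parse '1.2.3' into (1, 2, 3)."""
    cleaned = text.strip().lstrip("vV")
    return tuple(int(part) for part in cleaned.split("."))


# Each escape found in production is recorded once and replayed on every run.
# The IDs and inputs below are made up for this example.
ESCAPES = [
    {"id": "ESC-001", "input": " 1.2.3\n", "expected": (1, 2, 3)},
    {"id": "ESC-002", "input": "v2.0", "expected": (2, 0)},
]


class EscapeRegressionTests(unittest.TestCase):
    def test_recorded_escapes_stay_fixed(self):
        for escape in ESCAPES:
            # subTest reports each escape separately instead of stopping at the first failure.
            with self.subTest(escape=escape["id"]):
                self.assertEqual(parse_version(escape["input"]), escape["expected"])


def run_escape_suite() -> bool:
    """Run the escape suite and report whether every case passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(EscapeRegressionTests)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Adding a new escape means adding one entry to `ESCAPES`, which keeps the cost of learning from a production bug low enough that teams are more likely to do it.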

How to implement effective regression testing in CI/CD

To implement regression testing effectively in the CI/CD pipeline, organizations should follow these best practices:

Automate test cases: Identify and automate test cases that are critical and have high reusability. Use testing frameworks and tools that integrate seamlessly with your CI/CD pipeline.
Maintain test suites: Regularly update and maintain test suites to reflect changes in the application. Remove obsolete tests and add new tests for new features. Set aside time in your development iterations to work on the regression test cases as a matter of clearing technical debt.
Parallel execution: Use parallel execution to run multiple tests simultaneously, reducing the overall testing time and speeding up the feedback loop.
Incremental testing: Implement incremental testing strategies where only the affected parts of the application are tested based on the changes made. This can further speed up the testing process. Often, a single well-chosen, well-designed test case can cover many crucial test scenarios.
Continuous improvement: Continuously evaluate and improve the regression testing process. Analyze test results, refine test cases, and adapt to new challenges to maintain an effective testing strategy.
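Incremental testing and parallel execution can be combined, as in the following sketch. It assumes a hand-maintained mapping from changed files to the tests that cover them; `COVERAGE_MAP`, `ALWAYS_RUN`, and all file and test names are hypothetical. Real projects often derive this mapping from coverage data or build-graph tooling instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical mapping from source paths to the regression tests that cover them.
COVERAGE_MAP: dict[str, set[str]] = {
    "billing/invoice.py": {"test_invoice_totals", "test_checkout_flow"},
    "auth/login.py": {"test_login", "test_checkout_flow"},
    "content/rules.csv": {"test_rules_load"},
}

# Tests that run on every change, whatever was touched (the "show stopper" checks).
ALWAYS_RUN = {"test_boot_smoke"}


def select_tests(changed_files: list[str]) -> list[str]:
    """Pick the tests affected by a change; unknown files trigger the full suite."""
    selected = set(ALWAYS_RUN)
    for path in changed_files:
        if path not in COVERAGE_MAP:
            # Fall back to everything rather than silently skipping coverage.
            return sorted(set().union(ALWAYS_RUN, *COVERAGE_MAP.values()))
        selected |= COVERAGE_MAP[path]
    return sorted(selected)


def run_in_parallel(
    tests: dict[str, Callable[[], bool]], names: list[str], workers: int = 4
) -> dict[str, bool]:
    """Run the named tests concurrently and return a pass/fail result per test."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = pool.map(lambda name: tests[name](), names)
        return dict(zip(names, outcomes))
```

The fallback to the full suite for unmapped files is the conservative choice: incremental selection should only ever shorten the run when the team can show that the skipped tests are unaffected.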

Conclusion

The CrowdStrike issue illustrates the need for automated and updated regression testing as a risk-mitigation component of every release. Effective regression testing ensures that changes do not disrupt existing functionality, maintains the reliability and stability of the software, and preserves your brand quality, helping you deliver consistently high-quality applications.

Inadequate regression testing puts software quality, consistency, and brand reputation at risk. By adopting best practices for automated regression testing and learning from incidents like CrowdStrike, organizations can build robust, resilient software that meets user expectations and withstands the rapid pace of modern development.
