How should we deal with long-running test suites?
-
Our team policy is to run the complete test suite (client-side, server-side, integration, and unit tests) locally before checking in new code. However, the test suite has grown rapidly over time, to the point that each of us spends at least an hour a day just waiting for the tests to run. This does not include the continuous build server, where builds are even slower. Several ways of dealing with this come to mind: faster hardware (SSDs?), making better use of multiple cores via multithreading, policy changes, or just running a subset of the current test suite. What would you recommend?
-
Answer:
I like the strategy of progressively less testing at higher abstraction levels, which coincide with typically slower tests. Your unit tests can be numerous and fast; your full integration tests would then be few but slow and thorough. A complementary strategy is to break up your systems into multiple, interdependent services or systems which have their own build process, test suite, etc., possibly with one master integration suite to tie it all together. The end result is running a subset of tests with each commit (and each build, even), but the divide is done at the code level, modularly, instead of as arbitrary "smoke test" suites that would themselves have to be maintained and updated.
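A minimal sketch of this per-commit subset idea in Python. The repository layout (one top-level directory per service, each with its own tests/ folder), the comparison branch, and the use of pytest are assumptions for illustration, not something the answer prescribes:

```python
# Hypothetical sketch: run only the test suites of the modules touched by the
# current change, assuming one top-level directory per service/module.
import subprocess
import sys

def changed_modules(base="origin/main"):
    """Return the set of top-level directories touched since `base`."""
    diff = subprocess.run(
        ["git", "diff", "--name-only", base],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return {path.split("/")[0] for path in diff if "/" in path}

def run_module_tests(module):
    """Run only the test suite that belongs to the changed module."""
    return subprocess.run(["pytest", f"{module}/tests"]).returncode

if __name__ == "__main__":
    failures = [m for m in changed_modules() if run_module_tests(m) != 0]
    sys.exit(1 if failures else 0)
```

The point of the sketch is that the "subset" is defined by the code's own module boundaries rather than by a hand-maintained smoke-test list.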
Allen Cheung at Quora
Other answers
First, I'd like to share a mental model that I have used in multiple groups to approach the problem of tests taking too long. In the chart, an arrow (A->B) means "A needs B" or "B will benefit A", and the chart starts from the upper-left corner. It is about how to shorten the release tail (from the completion of coding a new feature to the time that feature is shipped), and a big part of that is making the test pass shorter (see the "Shorter Full Test Pass Cycle" block). I would recommend using this chart to examine your team's test process and to help identify what can or needs to be done. In this chart (framework/model/roadmap), you can find the advice from the other answers. For example:

The "Ability to only run relevant test cases" block: that's where Steven Grimm's "coverage-based test selection" suggestion falls. The key is test selection, whether coverage based or simply based on human judgement. Using human judgement isn't too bad: it's subjective and time consuming, but it does help trim the test run. There are other selection methods that are non-coverage based (since code coverage isn't available or applicable everywhere -- for example, I don't know whether CSS has any code coverage tooling), including one that my team recently invented.

The "Quick lightweight validation (aka BVT)" block: that's similar to Steven Grimm's suggestion to "accept optimistically, revert aggressively." Such a BVT (Build Verification Test) has relatively lower coverage but takes much less time. It's a kind of 20-80 rule: 20% of the test cases can cover 80% of the scenarios, code paths, and functionality. Many groups have adopted this approach: they only require a passed BVT for each check-in, and run the rest of the tests (the full test pass) multiple times against the code that is already in the main development branch. Also, as Steven Grimm pointed out, many groups set a time limit on the BVT. If tests grow over time and the BVT duration exceeds that limit, people in those groups will relentlessly move test cases out of the BVT to bring the duration back down. The BVT (as the pre-check-in test) is part of the developer's inner loop, and it's very important to keep that inner loop fast, as pointed out by Eric Brechner (https://www.linkedin.com/in/ericbrechner) in his book "I. M. Wright's Hard Code" (http://www.amazon.com/I-M-Wrights-Hard-Code-Microsoft/dp/0735661707).

The "Smaller ship unit" block: that's similar to Allen Cheung's suggestion to "break up your systems into multiple, interdependent services or systems".

The "Run tests in parallel" block: that's where Uria Mor's suggestion of using IncrediBuild to run (unit) tests in parallel falls.

It is worth pointing out that none of the things in that chart can be achieved easily. There is a lot of work behind each block: it can be a couple of developers spending a few months building or enhancing a tool, or a months-long architectural overhaul of the product to improve testability, etc. In particular, in this chart you may find a loop; that means it is really worth investing in the things on that loop, because the benefits are self-reinforcing. There are also some hot-spot sinks, i.e. blocks with multiple arrows pointing at them; investment in those blocks pays off several times over, so they may deserve priority.

Going back to the specific question you asked: "Faster hardware" -- yes, that's a no-brainer. Buy as much as you can afford, as long as the marginal return on the hardware investment is still significant. In one of my past groups (before the cloud computing era), some people could finish the test pass in 4 hours on a faster CPU and a 7,200 rpm hard disk, while others took 6 hours on a slower CPU and a 5,400 rpm disk. "Multithreading" -- yes, of course; see the "Run tests in parallel" block in the chart. However, running tests in parallel is not as easy as just using multiple threads or processes: as the chart shows, there are things to be done to enable it, from product design, to test environment capacity, to test execution coordination. "Just running a subset" -- yes; see the earlier sections of my answer about the "Ability to only run relevant test cases" and "Quick lightweight validation (aka BVT)" blocks. Hope this is helpful.
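As a rough illustration of the time-limited BVT idea, here is a hedged Python sketch. The test names, durations, and priority scores are hypothetical; in practice they would come from your CI system's historical timing and coverage data:

```python
# Minimal sketch: greedily fill a BVT time budget with the tests that deliver
# the most value per second of runtime. All numbers below are invented.
from dataclasses import dataclass

@dataclass
class TestCase:
    name: str
    avg_duration_s: float   # historical average runtime
    priority: float         # e.g. how much code or scenario coverage it buys

def select_bvt(tests, budget_s=600.0):
    """Pick tests by value-per-second until the time budget is exhausted."""
    ranked = sorted(tests, key=lambda t: t.priority / t.avg_duration_s, reverse=True)
    bvt, spent = [], 0.0
    for t in ranked:
        if spent + t.avg_duration_s <= budget_s:
            bvt.append(t)
            spent += t.avg_duration_s
    return bvt, spent

tests = [
    TestCase("login_smoke", 30, 9.0),
    TestCase("checkout_end_to_end", 400, 8.0),
    TestCase("full_report_regression", 900, 5.0),
]
bvt, spent = select_bvt(tests, budget_s=600)
print([t.name for t in bvt], f"{spent:.0f}s of budget used")
```

Anything that no longer fits in the budget would be moved out of the BVT and into the full test pass, which matches the "relentlessly move test cases out of BVT" practice described above.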
Eric Zheng
Two strategies I've seen used successfully to deal with the problem, in addition to things mentioned in other answers such as parallelizing and distributing the tests:

1. Coverage-based test selection. At frequent intervals, run the entire test suite on an integration server with whatever execution tracing or coverage analysis tools are appropriate for the language(s) you're using, gathering a list of exactly which code is executed by which tests. (This can be at any level of granularity, though you'll probably get diminishing returns from anything lower than function/method level.) Then, when a developer is ready to check in changes, look at what the changes are, and only execute those tests that execute code that's been changed. This works best for tests with deterministic behavior -- for example, it's not a great strategy if a lot of your tests talk to a network service that returns different data each time or is offline a lot, since that'll give you inaccurate coverage information. It's possible to mitigate that problem by, e.g., merging the coverage data from test runs with different results, but if a lot of your tests are nondeterministic, that's likely a big problem in and of itself. If the execution profile of your tests is highly dependent on data that's read dynamically at runtime, e.g., configuration files, you'll need to instrument your code to make a note of which data is being accessed so you can run the appropriate tests when it changes. Those caveats aside, the benefit of this is hard to overstate on a large code base with lots of tests. If you have 100 modules, each with a suite of self-contained unit tests, you can sometimes completely skip running 99 of those suites if you're only changing one module. (Of course, in practice there'll be dependencies and you won't get such a dramatic win, but it happens.)

2. Accept optimistically, revert aggressively. This is as much a philosophical approach as a technical one. Give yourself permission to say "oops" if a bad change hits the code base, and trust that most changes aren't bad. In practical terms: before accepting a change into the central version control repository or branch, run some number of tests as a sanity check. One approach is to set a time limit that's acceptable to the developers and then, using historical data about the expected execution time and code coverage of each test, run the set of tests that you expect to cover the largest possible amount of relevant code in the allotted time. If they all pass, allow the change, but kick off the remaining tests in the background on an integration server somewhere. If one of those additional tests fails, automatically undo the change using your version control system and notify the developer of the failure. If other changes have been integrated since the failing one, you can either roll them back too (conservative and easy to implement) or roll back only the ones that intersect with the broken change, though of course you'll want to notify all the developers of apparently non-intersecting changes about the reversion. The obvious downside of this approach is that it allows broken code to be present in the integration repository for the duration of the remaining tests. If your developers are constantly breaking tests, that might be a problem, but in practice you'll probably find that most changes don't break any tests at all, and the ones that do will often end up breaking one of the tests that runs within your time limit.
With those two approaches combined, having a comprehensive set of tests for a code base doesn't act as a constant source of unacceptable delay for developers -- they'll only run the tests they need to run, and when that would take a while, they won't have to wait for all the tests to finish before moving on to the next task.
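A hedged Python sketch of the coverage-based selection step: it assumes a periodic full run has already produced a map from each test to the files it executes. The map contents and file names below are made up purely for illustration:

```python
# Sketch: given coverage data (test -> files it executes) and the files touched
# by a change, run only the tests whose coverage intersects the change.
def select_tests(coverage_map, changed_files):
    """Return the tests whose recorded coverage touches any changed file."""
    changed = set(changed_files)
    return [test for test, files in coverage_map.items() if changed & set(files)]

coverage_map = {
    "test_invoice_totals": ["billing/invoice.py", "billing/tax.py"],
    "test_user_signup": ["accounts/signup.py"],
    "test_search_ranking": ["search/ranker.py", "search/index.py"],
}
print(select_tests(coverage_map, ["billing/tax.py"]))  # -> ['test_invoice_totals']
```

The same lookup can be done at function/method granularity instead of file granularity, at the cost of collecting and storing finer-grained coverage data.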
Steven Grimm
In [1] I describe a CCDD (Code Coverage Driven Development) environment. What I don't describe there is the longer-running tests. Functional tests enforcing 100% code coverage would be run before and after submitting code, along with some fast-running performance tests. However, the longer-running tests (e.g. performance, stress, and fuzz tests) would simply be run all night against the latest build on the continuous build server. This seemed to work well because the quick-running functional tests were so comprehensive. [1]
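One way to express this fast/nightly split is with test markers. The following is a minimal sketch assuming pytest and a custom "nightly" marker; both the marker name and the tests are assumptions for illustration, not something the answer prescribes:

```python
# Sketch: quick functional tests run on every pre-submit pass, while tests
# marked "nightly" are left to the overnight run on the build server.
import time
import pytest

def test_parse_config_fast():
    # Quick functional test: part of the pre-submit and post-submit passes.
    assert {"a": 1} == dict(a=1)

@pytest.mark.nightly
def test_stress_many_iterations():
    # Long-running stress test: only exercised by the nightly build.
    start = time.time()
    total = sum(i * i for i in range(10_000_000))
    assert total > 0 and time.time() - start >= 0
```

Pre-submit runs would then invoke pytest -m "not nightly" while the nightly job runs pytest -m nightly, with the marker registered in the pytest configuration so it is not reported as unknown.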
Simon Hardy-Francis
Gina Trapani of Think Up App had a good thread on Google+ on this: https://plus.google.com/113612142759476883204/posts/6Ed2XgC7Q5a In summary: RAM disk and caching. In addition to this, parallelize your testing. If you have three machines, divide your tests into three sections and run them in parallel that way. You'd theoretically cut a one-hour run down to 20 minutes.
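A small sketch of one way to split a suite across machines deterministically, assuming each machine knows its own shard index (the test names below are placeholders):

```python
# Sketch: assign each test to one of N shards by hashing its name, so every
# machine independently computes the same stable partition of the suite.
import hashlib

def shard_for(test_name, num_shards=3):
    digest = hashlib.sha1(test_name.encode()).hexdigest()
    return int(digest, 16) % num_shards

all_tests = ["test_login", "test_checkout", "test_search", "test_reports"]
my_shard = 1  # e.g. read from an environment variable on machine 2 of 3
my_tests = [t for t in all_tests if shard_for(t) == my_shard]
print(my_tests)
```

Hash-based sharding keeps the split stable as tests are added, though shards can become uneven; a refinement is to balance shards by historical test duration instead.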
Alex Choi
If I had to bet, I'd guess that the client-side integration tests are what's taking the longest. If so, I'd invest some time in the following areas: Prioritise your pre-commit tests to cover only high-priority areas, and leave the CI server to take care of the bulk of your testing. If your client-side/UI tests are browser based, consider using web-based tools like Watir/Selenium for the execution part only, not for the setup fixtures. For creating fixtures, use an HTTP library or restore your database from a backup; the time saved by removing UI fixture setup is enormous. Assuming the tests have been written to work statelessly, it also makes sense to parallelize them. However, this is usually not something most companies planned for, so it can be costly to implement after the fact.
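A hedged sketch of the "seed fixtures over HTTP, keep the browser only for the check itself" idea. The API endpoint, page URL, and element id below are invented for illustration:

```python
# Sketch: create test data through the application's API instead of driving
# the UI, then use the browser only for the behaviour actually under test.
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

BASE = "http://test-env.example.com"

# Fast path: create the fixture user through an HTTP call.
resp = requests.post(f"{BASE}/api/users", json={"name": "fixture-user"}, timeout=10)
resp.raise_for_status()
user_id = resp.json()["id"]

# Slow path is now limited to the assertion that needs a real browser.
driver = webdriver.Chrome()
try:
    driver.get(f"{BASE}/users/{user_id}")
    assert driver.find_element(By.ID, "user-name").text == "fixture-user"
finally:
    driver.quit()
```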
Michael Martinez
Retalix had a similar problem. Retalix was running 15,000 tests in each test cycle, and each build took 12 minutes. After exploring many other options that failed, they tried IncrediBuild and:
· The test time went down from 12 minutes to 1 minute 20 seconds.
· The full test process became several times faster overall.
· More test cycles per day enabled a major improvement in delivery.
I would recommend taking a look at how Retalix used IncrediBuild to reduce testing time: http://www.incredibuild.com/retalix-case-study.html IncrediBuild accelerates build times through efficient parallel computing, and it can accelerate development tools, including any unit testing and mocking framework. FYI, there is a free version available for up to 8 cores. (Disclaimer: I work for IncrediBuild)
Uria Mor
Have you thought about running the tests in the cloud? You can build your own infrastructure using AWS (or similar) and/or use a service like SauceLabs (http://saucelabs.com) to run your Selenium/WebDriver tests. In an ideal world you should not have to wait more than 10 minutes after committing code to receive feedback, but even if running the tests in the cloud takes too long, and assuming that you have already tuned your tests, at least you can keep coding while waiting for the feedback. I also like the idea of having subsets; I don't think you really need to run all the tests (functional, integration...) after each commit.
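A minimal sketch of pointing an existing WebDriver test at a remote cloud grid rather than a local browser. The grid URL, credentials, and application URL below are placeholders, not a verified SauceLabs configuration:

```python
# Sketch: the same test code runs against a remote Selenium grid by swapping
# the local driver for a Remote driver; only the endpoint changes.
from selenium import webdriver

options = webdriver.ChromeOptions()
grid_url = "https://USERNAME:ACCESS_KEY@ondemand.saucelabs.com/wd/hub"  # placeholder

driver = webdriver.Remote(command_executor=grid_url, options=options)
try:
    driver.get("http://app-under-test.example.com/login")
    assert "Login" in driver.title
finally:
    driver.quit()
```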
Leonardo Ribeiro Oliveira
I just want to add that, from a software architecture perspective, redesigning or refactoring your system using the Microservices Architecture (MSA) pattern would also help greatly. MSA designs "break up your systems into multiple, interdependent services or systems which have their own build process, test suite, etc.", as Allen Cheung said in his answer. Speeding up test suite run time is one of MSA's many benefits: since each service has its own test suite, the suites can easily be run in parallel, which in turn speeds up the overall test run. A well-designed MSA has clearly defined API contracts between each microservice, which also simplifies each microservice's own test suite. Obviously MSA does introduce the additional overhead of integration testing, but in the grand scheme of things, the reduced system complexity and much faster unit test suite run times far outweigh that cost. Footnotes: http://martinfowler.com/articles/microservices.html
Ye Wang
In my experience, one of the most effective (if not the most effective) means of cutting down the run time of a test suite is to run the tests in parallel. This can take some work on your end if you didn't write the tests with parallelism in mind from the start, but it's hard to argue against the value when you can cut your build time down from hours to minutes by running more tests at once. For example, with http://www.solanolabs.com (full disclosure: I work on Solano's Customer Success team), one of our customers, HotelTonight, took about an hour to run their whole test suite before switching to us. Through our software's parallelization process we were able to cut that time down dramatically. (See: https://twitter.com/chaslemley/status/576698802352046080) If you haven't looked into parallelizing your testing process, I definitely recommend it. Nobody should run their test suite in serial. Cheers!
Nick Travaglini