As you may know from experience, tests are often unstable. The usual remedy is to retry failed tests several times, which can double or even triple the build time.

In this article, we’ll explain how we solved this problem and share a tool that our engineers developed for successfully retrying failed tests in parallel.


Our autotest project contains over 53,000 tests that we run in anywhere from 80 to 150 threads, depending on the build. However, we found that most of the build time is often taken up by retries of just a few tests, which leave most of the threads idle, and we wanted to find a way to reduce this. (After all, we pay for the dynamic agents in TeamCity and the dynamic environment!)

Here’s an example of a build timeline from Allure. In this build, 50 seconds of work out of 90 is spent retrying one test:

[Image: Allure build timeline in which one test is retried]

Because of this result, we wanted to reduce the retry time by using more threads.

The problem of long retries in JUnit 5

In the autotest project, we use Java SE 17 and JUnit 5, with Maven as the build tool, so the tests are run via the Maven Surefire Plugin.

Previously, we used JUnit 4 and the Surefire Plugin would retry the failed tests of each class without waiting for the first run to finish.

[Image: test timeline with JUnit 4]
With JUnit 4, retries of Tests 1 and 2 initiate regardless of the completion status of Test 3, provided all tests are in different classes.

But now, with JUnit 5, the Maven Surefire Plugin waits for Test 3 to finish first before retrying Tests 1 and 2. This increases the test runtime for a project with a large number of classes.

[Image: test timeline with JUnit 5]
With JUnit 5, Tests 1 and 2 are retried only once Test 3 is complete, even if the tests are in different classes.

As the number of modules in our Maven project grew, this problem became even more acute. Each module would wait for the test run to complete and then retry the tests — only after that did the next module’s tests start.

[Image: test timelines by module layout, including the “All tests in one module” scenario]

We partially solved this problem with our proprietary tool, Maven Modules Merger, which reduces build time by merging several Maven modules into one. You can read more about that in this article.

But even with our Merger tool, retries could still take up most of the build time (see the “All tests in one module” scenario in the image above).

So we had an idea: What if we retried tests in parallel instead of waiting for a test to fail several times in a row? It would certainly take much less time. A test would be considered passed if at least one of its attempts passed.

Here is a timeline for parallel retries:

[Image: timeline of parallel retries]

The idea is that each failed test would be retried several times in parallel, which increases the number of repetitions for each test but reduces the overall build time. This method would also give us more statistics on failed tests, because they would run more times.
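A minimal sketch of this idea in plain Java, assuming a test can be modeled as a `Callable<Boolean>` that returns `true` when it passes. The class and method names are hypothetical, and this is not the Surefire-based implementation described later in the article; it only illustrates the "passed at least once" rule.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of "retry in parallel, pass if any attempt passes".
final class ParallelRetrySketch {

    /** Runs `attempts` copies of the test concurrently on `threads` threads. */
    static boolean passesOnParallelRetry(Callable<Boolean> test, int attempts, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Boolean>> copies = Collections.nCopies(attempts, test);
            boolean passed = false;
            // invokeAll blocks until every attempt has finished.
            for (Future<Boolean> attempt : pool.invokeAll(copies)) {
                try {
                    passed |= Boolean.TRUE.equals(attempt.get());
                } catch (ExecutionException e) {
                    // An attempt that throws counts as a failed attempt.
                }
            }
            return passed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false; // Interrupted before the attempts finished.
        } finally {
            pool.shutdown();
        }
    }
}
```

In this sketch every attempt runs to completion, so all of them can still be reported; an implementation that only cares about the verdict could stop as soon as one attempt passes.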

The only question that remained was whether the tests retried in parallel could provide the same success rate as those retried sequentially, so we decided to give it a go.

However, we didn’t find any ready-made solutions for parallel retries. We tried to modify the JUnit 5 extension from junit-pioneer, but it’s implemented through TestTemplate, meaning we couldn’t use it with another TestTemplate (e.g., with parameterized tests; see issue #405). Modifying RepeatedTest wasn’t an option for the same reason: it’s also a TestTemplate, so it doesn’t work with parameterized tests either. And JUnit 5 itself doesn’t support even sequential retries out of the box.

So we decided to extend the JUnitPlatformProvider class from the Maven Surefire Plugin, which can retry tests sequentially.

Implementing parallel retries

During the implementation, we encountered the following two major problems:

  1. The Allure report might mark a test as failed even if it has passed once.
  2. The standard JUnit 5 synchronization mechanisms only work within a single test run. This means that @ResourceLock, @Execution, and @Isolated annotations will not work correctly in a parallel retry.

Fixing the Allure report

During a parallel retry, a test might be marked as failed in the Allure report even though one of its attempts passed, for example when an earlier attempt succeeds and a later one fails. This is because the results of each test run are sorted by start time and the last one determines the reported status, as shown below:

[Image: Allure results for one test, sorted by start time]

We wanted a test that passed at least once to be marked as a pass. To do this, all failed attempts need to appear to start before a successful one.

The logic for determining the order of retries cannot be changed, because retries are sorted when the Allure report is compiled from the result files. However, in these files, you can replace the start time of each failed attempt with the start time of the retry as a whole. Failed attempts then sort before any successful one, so a successful attempt is always the last one.

[Image: Allure results after replacing the start times of failed attempts]

You can modify Allure test results via TestLifecycleListener. With TestLifecycleListener, we can change the start times of all failed attempts in a parallel retry. The actual test run time can be recorded in a separate Allure label, if necessary.

After these changes, a test that passes at least once will be marked as a pass. All successful and unsuccessful run attempts (except the last one) will be recorded in the Retries tab in Allure.
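The start-time replacement can be sketched without any Allure dependency. The `Attempt` record below is a simplified, hypothetical stand-in for one Allure result file, and `reportedAsPassed` only mimics the "sort by start time, last attempt wins" behavior described above. In a real project this logic would sit in a `TestLifecycleListener`, presumably in a hook such as `beforeTestWrite`; that placement is an assumption, not something taken from our code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the start-time shift; not tied to the Allure API.
final class RetryStartShiftSketch {

    /** Simplified stand-in for one Allure result file. */
    record Attempt(String test, boolean passed, long startMillis) {}

    /** Gives every failed attempt the start time of the parallel retry itself. */
    static List<Attempt> shiftFailedAttempts(List<Attempt> attempts, long retryStartMillis) {
        List<Attempt> shifted = new ArrayList<>();
        for (Attempt a : attempts) {
            shifted.add(a.passed() ? a : new Attempt(a.test(), false, retryStartMillis));
        }
        return shifted;
    }

    /** Mimics the report: the attempt with the latest start time decides the status. */
    static boolean reportedAsPassed(List<Attempt> attempts) {
        // Assumes passing attempts start strictly after retryStartMillis, so no ties occur.
        return attempts.stream()
                .max(Comparator.comparingLong(Attempt::startMillis))
                .map(Attempt::passed)
                .orElse(false);
    }
}
```

For example, a pass that started at 1005 ms followed by a failure that started at 1009 ms is reported as failed; after shifting the failure to a retry start of 1000 ms, the pass becomes the last attempt and the test is reported as passed.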

Tests start almost simultaneously, so a shift of a few milliseconds won’t be noticeable on the timeline. Below is an example of retrying one test in parallel without shifting the start time:

[Image: timeline of one test retried in parallel]
The shift of the start time will not visually affect the timeline.

JUnit 5 synchronization mechanisms support issue

A test can be executed only once within a single test run, so a parallel retry requires several runs in parallel. Also, the parallel runs may not all fit in the allocated number of threads, in which case the tests have to be divided into different runs.

The image below shows how tests can be retried with six dedicated threads:

[Image: tests retried on six threads]

JUnit 5 only synchronizes tests within a single run, so parallel retries ignore the JUnit 5 annotations for synchronization (e.g., @ResourceLock, @Execution, and @Isolated). In the example above, all retries of Test 1 will be executed in parallel, even if the class or test has synchronization annotations.

We don’t use JUnit 5’s synchronization mechanism in our tests because ours are completely independent of each other. If you use JUnit 5’s synchronization mechanism, modify your retry logic to run these tests separately.

Parallel retry testing conditions

In the end, we abandoned the idea of always retrying tests in parallel. Now, we only retry tests in parallel if the following conditions are met:

  1. All retry attempts fit in the allocated number of threads (i.e., the number of retries remaining × the number of failed tests ≤ the number of threads used to run tests).
  2. The tests still have to be retried more than once. Otherwise, a parallel retry would be no different from a sequential one.

If the above conditions are not met, we do one sequential retry and check the conditions again until all failed tests have been retried the specified number of times.
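The two conditions and the sequential fallback can be sketched as a small decision function. This is an illustration only: the names are hypothetical, "retried more than once" is read here as "more than one retry remaining", and `worstCasePlan` assumes that no test passes between rounds, which is the slowest possible case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the retry-mode decision described above.
final class RetryPlanner {

    enum Mode { PARALLEL, SEQUENTIAL }

    /** Chooses how to run the next retry round. */
    static Mode nextRetryMode(int failedTests, int retriesRemaining, int threads) {
        // Condition 1: every remaining attempt of every failed test gets its own thread.
        boolean fitsInThreads = (long) failedTests * retriesRemaining <= threads;
        // Condition 2: with a single retry left, parallel and sequential are the same.
        boolean moreThanOneRetry = retriesRemaining > 1;
        return fitsInThreads && moreThanOneRetry ? Mode.PARALLEL : Mode.SEQUENTIAL;
    }

    /** Lists the retry rounds if the same tests keep failing (worst case). */
    static List<Mode> worstCasePlan(int failedTests, int retries, int threads) {
        List<Mode> plan = new ArrayList<>();
        int remaining = retries;
        while (remaining > 0) {
            Mode mode = nextRetryMode(failedTests, remaining, threads);
            plan.add(mode);
            // A parallel round uses up all remaining retries; a sequential round uses one.
            remaining = (mode == Mode.PARALLEL) ? 0 : remaining - 1;
        }
        return plan;
    }
}
```

With 3 failed tests, 4 retries, and 6 threads, the first two rounds are sequential (12 and 9 attempts do not fit in 6 threads), and the last two retries run as one parallel round (3 × 2 = 6 ≤ 6). In practice the number of failed tests usually shrinks after each sequential round, so the switch to a parallel round can come earlier.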

Below is a block diagram of the algorithm for running and retrying tests:

[Image: block diagram of the algorithm for running and retrying tests]

Results of implementing parallel retries

After the introduction of parallel retries, our builds sped up by about 10% and, at the same time, the number of tests in builds increased by 26%. But such acceleration was not entirely free! We collected statistics on 10 million test runs before and after the implementation of parallel retries and calculated the effectiveness of different retries for our project and infrastructure.

Here are the results:

[Image: table of retry success statistics]

The following conclusions can be drawn from the table above:

  1. Four parallel retries are 5.2% more successful than a single retry while taking the same amount of time.
  2. Four parallel retries are 1.7% less successful than four sequential retries, but they’re much faster.

Let’s look into what the percentage of successful retries means. Say we ran 10 tests, and six of them failed. Then, after the retry, three of the six failed tests passed. This means the percentage of successful retries is 50% (3/6 = 50%). This metric measures how much retries help tests pass.
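The same arithmetic as a tiny helper, in case you want to compute the metric from your own build statistics. The class and method names are hypothetical:

```java
// Hypothetical helper for the "percentage of successful retries" metric.
final class RetryStats {

    /** Share of initially failed tests that passed after retrying, in percent. */
    static double successfulRetryPercent(int failedBeforeRetry, int passedOnRetry) {
        if (failedBeforeRetry <= 0) {
            throw new IllegalArgumentException("at least one failed test is required");
        }
        return 100.0 * passedOnRetry / failedBeforeRetry;
    }
}
```

For the example above, `RetryStats.successfulRetryPercent(6, 3)` gives 50.0. Tests that passed on the first run are not part of the calculation.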

[Image: timeline of successful-retry percentages]
This timeline unpacks the percentages of successful retries.

The main purpose of parallel retries is to speed up builds with tests. We achieved this but, at first, the percentage of successful retries fell sharply. One of the reasons was that parallel retries put more load on the infrastructure than before. But we found the bottleneck, fixed the bug, and began getting more successful parallel retries.

Parallel retries without success

Here are a few reasons why parallel retries haven’t been very successful:

  1. The infrastructure can’t handle so many parallel runs.
  2. The temporary unavailability of services negatively affects the success of parallel retries, since all of them start at the same time.

We continue to set up parallel retries and improve the test infrastructure. However, the question of whether tests should be retried faster if their success rate drops remains up for debate.

Given that the number of autotests in our project is constantly growing, we often encounter new problems. Sometimes the decisions we make are successful, and sometimes they compromise our goals — but they always lead to a deeper understanding of the infrastructure and how to improve the project.

You can try parallel retries by forking the project from Wrike’s GitHub. We would love to hear your comments and suggestions on our continued work! Please note that the code has its limitations and may not be suitable for all projects.

This article was written by a Wriker, in Wrike. See what it’s like to work with us and what career development opportunities we offer here.

Also, hear from our founder, Andrew Filev, about Wrike’s culture and values, the ways we work and appreciate Wrikers, and more here.
