Having a large number of Maven modules can slow down project builds and test run times. To maintain a multi-module project structure and run tests quickly, we developed a new tool — Maven Modules Merger — which helped reduce the time of some builds from 50 minutes down to just 12. In this article, I’ll go into detail about which problems Maven Modules Merger has helped us with and also share some details about its creation.
At Wrike, we developed a SaaS project management platform — and named it Wrike!
With over 53,000 tests in the Wrike autotests project, we ensure that our product is top-notch. 16,000 of them are REST API tests, while the remaining 37,000 are Selenium tests. About 30 Scrum teams add 1,000 new tests monthly, and they’re constantly improving the old ones.
In the autotest project, we use Java 17 and JUnit 5, with Maven as the build tool. We write Selenium tests using the HtmlElements 2 library, and we use the Retrofit library for tests of the internal API.
All tests are distributed across 250 Maven modules and are located within one project that contains over 1.6 million lines of code. A Maven module is a subproject, and you can work with it independently of other modules (for example, to run tests or compile code).
Why did we decide to use a multi-module project structure?
A multi-module project structure has a number of advantages over a single-module one. Let’s get into them.
1. The multi-module structure allows you to logically divide tests for different components of the product
The same effect can be achieved by splitting code into packages, but that kind of approach will not provide the other benefits of a multi-module project.
2. Dividing code into modules helps avoid the misuse of classes
When all classes are in the same module, there’s a chance the developer will make a mistake and misuse a class with a similar name.
Let’s look at an example. In our case, the classes TSDateInputSteps (Typescript version), DateInputSteps (Dart version), and DateTimeInputSteps are in different modules. This ensures that one class is not accidentally used in place of another. If a developer wants to use another class, they will have to explicitly specify the dependency in the pom file, which will be evident during code review.
3. A multi-module structure allows you to run tests in each module separately
To run tests from certain modules, use the Maven -pl option, which accepts a comma-separated list of modules. Maven then builds only the listed modules (plus the modules they depend on, if you also pass -am) instead of compiling the entire project, which saves time. On my laptop, for example, it takes 19 minutes to compile the entire project and one minute to compile a single medium-sized module. My laptop specs are as follows: MacBook Pro (16-inch, 2019), CPU 2.6 GHz 6-Core Intel Core i7, RAM 16 GB 2667 MHz DDR4.
4. You can make changes and add new tests, abstracting from other modules
You can open one module separately and work with it as a small standalone project in the IDE.
5. Splitting a multi-module project into separate projects is much more convenient
In this case, each module can become an independent project.
Why not just create 250 separate projects then? We had this thought initially, and the division into modules was a kind of insurance in case we decided to divide the monolith. It turned out that maintaining a large number of highly related projects is very difficult — changes happen all the time, and there’s often a possibility of conflicts between different versions of projects.
Splitting a monolith was relevant for a backend project. You can read more about this in this article.
With a small number of modules, the multi-module structure worked quite well, but over time the number of modules began to reach into the hundreds, and we faced some problems:
- It became difficult for us to manage the number of threads to run tests in parallel.
- Retrying tests after each module took up to 50% of the time of the entire run.
The problem of managing the number of threads
Having a large number of tests makes it impossible to run them sequentially. If we ran them all one after another, we would have to wait more than two months for the results. That’s why we usually run tests in TeamCity in 80–150 threads, depending on the build type.
As the number of modules increases, it becomes progressively more difficult to manage the number of threads used to run tests. If there are fewer tests in the module than the number of threads, then the number of tests running in parallel will be equal to the number of tests in the module.
For example, if we run tests in 80 threads, then a module with 10 tests will only use 10 threads. Meanwhile, each module will wait for the tests of the previous module to finish.
If there are 81 tests in a module and the tests are run in 80 threads, then the last test ends up running in a single thread while the other 79 threads stay idle.
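The effect can be put into rough numbers. The sketch below assumes, purely for simplicity, that every test takes the same time, so a module runs in consecutive "waves" of at most as many tests as there are threads. The class and method names are illustrative and are not part of our tooling.

```java
// Simplified model: every test takes the same time, so a module with
// `tests` tests runs in ceil(tests / threads) consecutive "waves".
final class ThreadUsage {

    /** Number of waves needed to run all tests of one module. */
    static int waves(int tests, int threads) {
        return (tests + threads - 1) / threads; // integer ceiling
    }

    /** Share of thread slots that actually run a test, from 0.0 to 1.0. */
    static double utilization(int tests, int threads) {
        return (double) tests / ((long) waves(tests, threads) * threads);
    }

    public static void main(String[] args) {
        int threads = 80;
        for (int tests : new int[] {10, 80, 81}) {
            System.out.printf("%d tests: %d wave(s), %.1f%% of slots used%n",
                    tests, waves(tests, threads), 100 * utilization(tests, threads));
        }
    }
}
```

Under this model, a module with 10 tests uses 12.5% of 80 threads, and a module with 81 tests needs two waves and uses only about half of the available slots.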
To solve this problem, you can try running the modules in parallel with -T %number%. Here, -T is a Maven parameter that runs the build of %number% modules in parallel, if possible.
However, this solution has a few drawbacks:
- In modules with a small number of tests, we will still have a small number of tests running in parallel (the first and second modules).
- In modules with a large number of tests, we will have too many threads (the third and fourth modules).
- Maven may not always be able to run modules in parallel (due to the interdependency of modules).
In a single-module project, this isn’t a problem — we can always easily adjust the required number of threads.
The issue of time-consuming test retries
At Wrike, we use a modified Surefire test runner that can retry failed tests. It collects them and runs them again after all the other tests in the module have finished. If one test fails in each module, then each of these failed tests is retried alone, in a single thread. Meanwhile, the tests from the next module wait for that one test to finish. If all the tests are in the same module, the failed tests are retried together, which takes much less time.
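A back-of-the-envelope comparison shows why this matters. The sketch assumes that modules run one after another, that exactly one test fails per module, and that every retry takes the same time. The figures in main are made up for illustration and are not measurements from our builds.

```java
// Simplified model: exactly one test fails per module and every retry takes
// the same number of seconds. All figures below are made up for illustration.
final class RetryCost {

    /** Modules run one after another, so each retry occupies a single thread in turn. */
    static long perModuleSeconds(int modules, int retrySeconds) {
        return (long) modules * retrySeconds;
    }

    /** In one merged module, the failed tests are retried together on all threads. */
    static long mergedSeconds(int modules, int retrySeconds, int threads) {
        int waves = (modules + threads - 1) / threads; // integer ceiling
        return (long) waves * retrySeconds;
    }

    public static void main(String[] args) {
        System.out.println(perModuleSeconds(250, 30));  // prints 7500
        System.out.println(mergedSeconds(250, 30, 80)); // prints 120
    }
}
```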
It turns out that the multi-modularity of the project negatively affects the test run times.
But we didn’t want to give up the advantages of a multi-module project, so we didn’t permanently combine all the modules into one. We had another idea: What if we merged all the modules into one every time we run tests in TeamCity? Then we’d be able to enjoy the benefits of a multi-module project at the test development stage as well as the advantages of a single-module project at the test run stage.
The idea of the new tool turned out to be simple: copy all the files to the new module, generate a pom.xml with all the dependencies, and run the tests in the new module. We named this tool Maven Modules Merger (or simply Merger).
Maven Modules Merger
Let’s take a look at the implementation details of Merger.
Merger’s input is:
- A comma-separated list of modules to merge
- The path to the project
- The path to the file to write the result into
- Operation mode: sources for source code or target for a compiled project
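As a minimal sketch, these four inputs could be modeled as follows. The type names, the parse method, and the argument order are hypothetical; the actual tool may name and parse its inputs differently.

```java
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical names: "sources" and "target" are the two modes described above.
enum MergeMode { SOURCES, TARGET }

// Hypothetical model of Merger's four inputs.
record MergerInput(List<String> modules, Path projectDir, Path resultFile, MergeMode mode) {

    /** Builds the input from raw strings, for example command-line arguments. */
    static MergerInput parse(String modulesCsv, String projectDir, String resultFile, String mode) {
        List<String> modules = Arrays.stream(modulesCsv.split(","))
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .toList();
        return new MergerInput(modules, Path.of(projectDir), Path.of(resultFile),
                MergeMode.valueOf(mode.toUpperCase(Locale.ROOT)));
    }
}
```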
Why pass a list of modules when you can just merge all the modules? We pass the list of modules to Maven using the -pl parameter, allowing us to run tests only from certain modules. (In each build, we want to merge only the necessary modules.)
Merger’s algorithm looks like this:
- Define the list of modules to merge
- Copy the files
- Create a pom file for the merged_modules module
- Add the merged_modules module to the root pom file as a child module
- Write the resulting modules to a file
Now let’s analyze each step of the algorithm in detail.
Step 1: Define the list of modules to merge
At first, we wanted to allow all modules to be merged, but during implementation we encountered some difficulties. Currently, the resources folders of all modules are merged into a single directory, but our modules have different configuration files stored in this directory.
We use the test/resources/allure.properties file to separate API tests from Selenium tests in Allure. We merge only the modules with allure.properties for the Selenium tests and leave the rest of the modules (backend tests and others) as they are. There are only eight modules that run non-Selenium tests out of more than 250 in our project; they will not be merged.
In this step, we collect modules with the same configuration and remove duplicates from the modules list.
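A minimal sketch of this step is shown below. It treats "same configuration" as a pluggable check. The example check only looks for the presence of allure.properties under src/test/resources, which is an assumption about the module layout and a simplification; a real check would probably also need to compare the file's contents.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.Predicate;

// Sketch of step 1. The marker path src/test/resources/allure.properties is an
// assumption about the module layout, not taken from the Merger source.
final class ModuleSelector {

    /** Keeps each mergeable module once, in the order it was first listed. */
    static List<String> selectMergeable(List<String> requested, Predicate<String> isMergeable) {
        LinkedHashSet<String> unique = new LinkedHashSet<>(requested); // drops duplicates
        return unique.stream().filter(isMergeable).toList();
    }

    /** Simplified check: the module ships an allure.properties file at all. */
    static Predicate<String> hasAllureProperties(Path projectDir) {
        return module -> Files.isRegularFile(
                projectDir.resolve(module).resolve("src/test/resources/allure.properties"));
    }
}
```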
Step 2: Copy the files
In this step, we copy the files from the collected modules into a new module named merged_modules in the project directory.
For the sources mode, we copy all files that are in the src folder. For the target mode, we copy all files from the target/classes and target/test-classes folders. The latter mode can be useful if you have an already compiled project and don’t want to compile the code again after running Merger. We keep a compiled version of the master branch code to save time on compiling the same code multiple times in different TeamCity builds.
In our project, there often were intersections in file names between different modules, leading to conflicts during the copying. To solve this problem, we gave packages in different modules unique names that matched the name of the module.
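The copy step for the sources mode could look roughly like this. The sketch assumes that identical files (such as a shared configuration file) may safely be copied once, and that two different files with the same relative path should fail the merge. That policy and all names are our assumptions for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

// Sketch of step 2 in "sources" mode; class and method names are illustrative.
final class SourceCopier {

    /** Maps a file inside a module's src folder to its place in the merged src folder. */
    static Path targetFor(Path mergedSrc, Path moduleSrc, Path source) {
        return mergedSrc.resolve(moduleSrc.relativize(source));
    }

    /** Copies src/ of every module into merged_modules/src under the project directory. */
    static void copySources(Path projectDir, List<String> modules) throws IOException {
        Path mergedSrc = projectDir.resolve("merged_modules").resolve("src");
        for (String module : modules) {
            Path moduleSrc = projectDir.resolve(module).resolve("src");
            if (!Files.isDirectory(moduleSrc)) {
                continue; // nothing to copy for this module
            }
            List<Path> files;
            try (Stream<Path> walk = Files.walk(moduleSrc)) {
                files = walk.filter(Files::isRegularFile).toList();
            }
            for (Path source : files) {
                Path target = targetFor(mergedSrc, moduleSrc, source);
                if (Files.exists(target)) {
                    // Files.mismatch returns -1 when both files have identical content.
                    if (Files.mismatch(source, target) != -1L) {
                        throw new IOException("Conflicting file in two modules: " + target);
                    }
                    continue; // identical copy is already in place
                }
                Files.createDirectories(target.getParent());
                Files.copy(source, target);
            }
        }
    }
}
```

With unique package names per module, Java sources never map to the same target path, so only shared resource files reach the conflict check.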
Step 3: Create a pom file for the merged_modules module
To turn the merged_modules folder into a module, you need to add a pom file to it. In the pom file for the merged_modules module, there’s only a list of all the dependencies of the merged modules.
Sometimes modules may depend on different versions of the same libraries. In our project, all modules use version 1.0-SNAPSHOT, so there are no version conflicts when using them as dependencies. Conflicts can arise when using third-party libraries, so we configure their versions in the root pom file.
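Generating such a pom file can be sketched as plain string rendering. The parent coordinates (com.example:root) and the Dep record are placeholders rather than values from our project. Versions are omitted on the assumption that the root pom file manages them, and the sketch makes no attempt to filter out dependencies on the merged modules themselves.

```java
import java.util.Collection;
import java.util.TreeSet;

// Placeholder dependency model: only the coordinates needed for the sketch.
record Dep(String groupId, String artifactId) implements Comparable<Dep> {
    @Override
    public int compareTo(Dep other) {
        return (groupId + ":" + artifactId).compareTo(other.groupId + ":" + other.artifactId);
    }
}

final class MergedPomWriter {

    /** Renders a pom.xml whose only content is the union of the given dependencies. */
    static String render(Collection<Dep> dependenciesOfAllModules) {
        StringBuilder xml = new StringBuilder("""
                <?xml version="1.0" encoding="UTF-8"?>
                <project xmlns="http://maven.apache.org/POM/4.0.0">
                  <modelVersion>4.0.0</modelVersion>
                  <parent>
                    <groupId>com.example</groupId>
                    <artifactId>root</artifactId>
                    <version>1.0-SNAPSHOT</version>
                  </parent>
                  <artifactId>merged_modules</artifactId>
                  <dependencies>
                """);
        // TreeSet removes duplicates and gives a stable order.
        for (Dep dep : new TreeSet<>(dependenciesOfAllModules)) {
            xml.append("    <dependency>\n")
               .append("      <groupId>").append(dep.groupId()).append("</groupId>\n")
               .append("      <artifactId>").append(dep.artifactId()).append("</artifactId>\n")
               .append("    </dependency>\n");
        }
        return xml.append("  </dependencies>\n</project>\n").toString();
    }
}
```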
Step 4: Add the merged_modules module to the root pom file as a child module
The next step is to add the generated merged_modules module as a child to the root pom file so that the Maven module structure is correct and Maven can run tests in the new module.
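As a rough illustration, registering the module can be done with simple text manipulation, assuming the root pom file has a single, pretty-printed modules section. A more robust implementation would likely use an XML parser or the Maven model API; the class below is hypothetical.

```java
// Sketch of step 4: text-based edit of the root pom, assuming one <modules> block.
final class RootPomEditor {

    /** Returns the pom text with <module>name</module> added to the first <modules> block. */
    static String addModule(String rootPomXml, String moduleName) {
        String entry = "<module>" + moduleName + "</module>";
        if (rootPomXml.contains(entry)) {
            return rootPomXml; // already registered, for example on a repeated run
        }
        int close = rootPomXml.indexOf("</modules>");
        if (close < 0) {
            throw new IllegalArgumentException("Root pom has no <modules> section");
        }
        int lineStart = rootPomXml.lastIndexOf('\n', close) + 1;
        String indent = rootPomXml.substring(lineStart, close);
        if (!indent.isBlank()) {
            // </modules> shares its line with other content: insert right before it.
            return rootPomXml.substring(0, close) + entry + rootPomXml.substring(close);
        }
        // Insert a new line above </modules>, indented one level deeper.
        return rootPomXml.substring(0, lineStart)
                + indent + "  " + entry + "\n"
                + rootPomXml.substring(lineStart);
    }
}
```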
Step 5: Write the resulting modules to a file
In the final step, we write merged_modules and all modules that were not merged to the file separated by commas. This list of modules will later be used to pass to the -pl parameter.
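Building that list can be sketched as a small pure function. The class name is illustrative, and writing the string to the result file (for example with Files.writeString) is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of step 5: the value that later goes into the -pl parameter.
final class ResultList {

    /** merged_modules first, then every requested module that was not merged. */
    static String render(List<String> requested, List<String> merged) {
        List<String> result = new ArrayList<>();
        if (!merged.isEmpty()) {
            result.add("merged_modules");
        }
        for (String module : requested) {
            if (!merged.contains(module) && !result.contains(module)) {
                result.add(module);
            }
        }
        return String.join(",", result);
    }
}
```

For the requested modules ui-a, api-b, and ui-c, with ui-a and ui-c merged, this yields merged_modules,api-b.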
After executing the algorithm, we get a project with a new module that combines most of the modules, plus a file with a new list of modules.
Running this algorithm on a project with 12,000+ files and more than 240 modules takes about one or two seconds on a remote TeamCity agent.
To ensure that these conventions (for example, unique package names in each module) are not violated, we have added new unit tests and PMD rules. You can read about how to set up Checkstyle and PMD in our other article.
How we use Merger
We use Merger in TeamCity builds. To do this, we’ve created a separate build template that first tries to merge the modules. If something goes wrong, the build falls back to running the tests without Merger.
Here’s a flowchart of such a build:
Merger implementation results
As a result, we found that builds launched on a large number of modules were significantly accelerated.
For example, a build of component tests that runs 11,000 tests for frontend components in 138 modules now takes exactly half the time it took before.
Some builds ran a predefined set of tests in all 250+ modules. With the introduction of Merger, such builds were two to four times faster. For example, a build that ran all screenshot tests took 12 minutes, while it had previously taken 50.
At Wrike, new product functionality is deployed daily. Before deployment, we run all the tests in the project, some of which are run in different browsers. The total run time of all tests after the implementation of Merger decreased by more than a third — from 12.5 hours to eight!
The total run time for 61,000+ tests is now 50 minutes. This is possible because we’re running some builds in parallel, and the number of threads in each build is 150.
Merger source code can be found on our GitHub. We hope that our tool will be useful in your projects, as well. We welcome your comments and thoughts. Enjoy!
This article was written by a Wriker, in Wrike. See what it’s like to work with us and what career development opportunities we offer here.