No one cares that your unit tests are fast
You spent a few weeks tuning your unit tests: a 2x speedup, from 60 to 30 seconds! But your colleagues don’t seem to give a ****. What gives?
The best unit/integration test suites feel snappy. Thousands of tests that run in seconds help us move faster as software engineers. Fast and stable test suites give us the confidence to ship lots of incremental changes.
But many scarred DevOps engineers, myself included, will tell you: it doesn’t matter how fast your unit test suite is; your bottleneck will still slow you down. Let’s get to the bottom of why, and what you can do about it.
Finding your CI/CD bottleneck
You’ve already started talking to a colleague about another project. You’ve forgotten that your pipeline was even running. You’ve lost context. Maybe someone is even waiting for your failed pipeline to be re-run before it’s their turn. This waiting time adds up: across your team, it could be costing you days of productivity each week.
The theory of constraints tells us that the overall throughput of any system is limited by its slowest component, the bottleneck. Put another way: there’s no point in optimising your faster pipeline jobs, even though it’s tempting, and even though it’s easier.
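A toy illustration, assuming a pipeline whose jobs all start at the same time and run in parallel. Only the 60 to 30 second unit test speedup comes from the introduction; the other job names and durations are hypothetical.

```python
# Hypothetical job durations in seconds; all jobs run in parallel.
JOBS = {"lint": 45, "unit_tests": 60, "build": 240, "e2e_tests": 600}


def wall_clock(durations: dict[str, int]) -> int:
    """Parallel jobs are done only when the slowest one is done."""
    return max(durations.values())


before = wall_clock(JOBS)
unit_halved = wall_clock({**JOBS, "unit_tests": 30})  # the 60 -> 30 second win
e2e_halved = wall_clock({**JOBS, "e2e_tests": 300})   # halving the bottleneck instead

print(before, unit_halved, e2e_halved)  # 600 600 300
```

Halving the unit tests leaves the pipeline exactly as slow as before; halving the slowest job halves the wait.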
Consider the full pipeline
When diagnosing pipeline slowness, it’s important to look at the full picture. Fancy automated metrics are great, but don’t wait for them if you don’t have them yet! Just looking at a few recent runs and drawing a diagram of the jobs, how long each takes and what waits on what will get you most of the way there.
Ask yourself, “What speedup would have the biggest impact on my life?” That’s probably the best place to start.
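Once you have a rough diagram, a few lines of code can help answer that question. This is a sketch over a hypothetical pipeline (the job names, durations and dependencies are made up): total time is the longest path through the dependency graph, often called the critical path, and the script asks how much the whole pipeline would gain if each job, on its own, became twice as fast.

```python
from functools import cache

# Hypothetical pipeline: job -> (duration in seconds, jobs it must wait for).
PIPELINE = {
    "setup": (60, []),
    "lint": (45, ["setup"]),
    "unit_tests": (60, ["setup"]),
    "build": (240, ["setup"]),
    "deploy_preview": (180, ["build"]),
    "e2e_tests": (600, ["deploy_preview"]),
}


def total_time(pipeline):
    """Wall-clock time: the longest (critical) path through the dependency graph."""

    @cache
    def finish(job):
        duration, deps = pipeline[job]
        # A job starts once its slowest dependency has finished.
        return duration + max((finish(dep) for dep in deps), default=0)

    return max(finish(job) for job in pipeline)


def savings_if_halved(pipeline):
    """Seconds saved on the whole pipeline if each job, alone, ran twice as fast."""
    base = total_time(pipeline)
    return {
        job: base - total_time({**pipeline, job: (duration // 2, deps)})
        for job, (duration, deps) in pipeline.items()
    }


print(total_time(PIPELINE))  # 1080
ranked = sorted(savings_if_halved(PIPELINE).items(), key=lambda kv: kv[1], reverse=True)
for job, saved in ranked:
    print(f"{job}: {saved}s saved")
```

In this made-up pipeline, halving the end to end tests saves 300 seconds, while halving the unit tests saves nothing, because they are not on the critical path.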
“I’m bottlenecked by my build”
Luckily for us, there are some fairly cheap and simple upgrades that can speed up builds. We’ve used both depot.dev and Blacksmith as alternative runners for our GitHub Actions workflows to improve the cost-efficiency of our builds. Remember that raw compute usually isn’t everything here: caching parts of earlier build results often has a larger impact. Docker build caching, compiler caches like sccache, or simply restoring a recent .next/cache directory might do the trick!
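Caches like these usually hinge on a cache key. The sketch below simulates a common pattern, an exact key plus prefix fallbacks (the same idea as the `key` and `restore-keys` inputs of GitHub’s `actions/cache`); it is not that action’s code. The `CacheEntry` type, the key format and the entries are hypothetical. An exact hit means the lockfile is unchanged; a prefix match restores the most recent, partially stale cache, which is often still much better than a cold build.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    key: str
    created_at: int  # hypothetical timestamp: larger means newer
    path: str


def cache_key(os_name: str, lockfile: bytes) -> str:
    """Build a key that changes whenever the dependency lockfile changes."""
    digest = hashlib.sha256(lockfile).hexdigest()[:16]
    return f"{os_name}-nextjs-{digest}"


def restore(
    entries: list[CacheEntry], key: str, restore_prefixes: list[str]
) -> tuple[Optional[CacheEntry], bool]:
    """Return (entry, exact_hit). Try the exact key, then each prefix in order."""
    for entry in entries:
        if entry.key == key:
            return entry, True
    for prefix in restore_prefixes:
        matches = [e for e in entries if e.key.startswith(prefix)]
        if matches:
            # Fall back to the newest entry for the first prefix that matches.
            return max(matches, key=lambda e: e.created_at), False
    return None, False


entries = [
    CacheEntry("linux-nextjs-0000000000000000", 1, "/caches/old"),
    CacheEntry("linux-nextjs-1111111111111111", 2, "/caches/recent"),
]
key = cache_key("linux", b"lockfile after a dependency bump")
entry, exact = restore(entries, key, ["linux-nextjs-"])
print(entry.path, exact)  # /caches/recent False
```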
“I’m bottlenecked by my deployment”
You may be at the mercy of your infrastructure provider. This problem is probably the most costly to fix. A few pointers:
- Don’t be afraid to switch providers, or to add a separate, second provider to help speed up your development pipelines.
- Ask about preview environment capabilities early. Some of the biggest increases in developer productivity I’ve seen in medium-sized software organizations came from making preview environments available in every pull request, rather than having changes wait behind a failed build stuck in a shared staging environment.
“I’m bottlenecked by my end to end tests”
As with slow builds, slow end to end test jobs can be sped up with better-suited infrastructure. Playwright’s documentation on sharding explains how distributing tests across several machines can have a big impact here. At Endform, we’ve taken the sharding idea to the next level by running each of your tests on a separate machine.
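A back-of-the-envelope sketch of why sharding helps and where it stops helping. The test names, durations and the 30 second per-machine startup cost are hypothetical, and the greedy longest-first split is just one way to balance shards by duration; it is not how Playwright’s `--shard` option assigns tests.

```python
import heapq

# Hypothetical end to end test durations, in seconds.
TESTS = {
    "checkout": 95, "admin": 80, "cart": 60, "search": 40,
    "signup": 35, "settings": 30, "profile": 25, "login": 20,
}
STARTUP = 30  # hypothetical per-machine cost: boot, install deps, fetch browsers


def shard(durations: dict[str, int], machines: int) -> list[list[str]]:
    """Greedy longest-first split: each test goes to the least-loaded machine."""
    heap = [(0, i) for i in range(machines)]  # (current load, machine index)
    shards: list[list[str]] = [[] for _ in range(machines)]
    for name in sorted(durations, key=durations.get, reverse=True):
        load, i = heapq.heappop(heap)
        shards[i].append(name)
        heapq.heappush(heap, (load + durations[name], i))
    return shards


def wall_clock(durations: dict[str, int], shards: list[list[str]], startup: int = STARTUP) -> int:
    """Shards run in parallel, so the slowest non-empty shard sets the pace."""
    return max(
        (startup + sum(durations[t] for t in s) for s in shards if s), default=0
    )


print(wall_clock(TESTS, shard(TESTS, 1)))           # 415: one machine
print(wall_clock(TESTS, shard(TESTS, 4)))           # 130: four shards
print(wall_clock(TESTS, shard(TESTS, len(TESTS))))  # 125: one test per machine
```

Once every test has its own machine, the floor is the slowest single test plus startup, so per-machine startup cost becomes the thing to shrink.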
“I’m bottlenecked by inconsistent, flaky pipelines”
If your pipelines are inconsistent, surfacing the most common pipeline failures is key. Do you have the right tools to answer questions like “What was the most frequently occurring pipeline failure in the last month?”
- There is no silver bullet: these problems need to be tackled head-on, otherwise they will slowly undermine your engineers’ confidence.
- Good reporting helps you spend your time wisely. For example, with Endform you can focus on the tests that most frequently cause headaches, and fix them more easily with the right data.
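If you can export run history from your CI provider, answering that question can start as simple counting. This sketch assumes a hypothetical list of run records with `finished`, `status` and `failed_step` fields; real field names and data depend on your provider.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical run records; real field names depend on your CI provider's API.
RUNS = [
    {"finished": "2024-05-02T10:00:00+00:00", "status": "failed", "failed_step": "e2e: checkout.spec.ts"},
    {"finished": "2024-05-03T09:30:00+00:00", "status": "passed", "failed_step": None},
    {"finished": "2024-05-10T14:00:00+00:00", "status": "failed", "failed_step": "build: registry timeout"},
    {"finished": "2024-05-20T16:45:00+00:00", "status": "failed", "failed_step": "e2e: checkout.spec.ts"},
    {"finished": "2024-05-28T11:15:00+00:00", "status": "failed", "failed_step": "e2e: checkout.spec.ts"},
    {"finished": "2024-04-01T08:00:00+00:00", "status": "failed", "failed_step": "build: registry timeout"},
]


def top_failures(runs, now, days=30, n=3):
    """Most common failing steps among runs that finished in the last `days` days."""
    since = now - timedelta(days=days)
    counts = Counter(
        run["failed_step"]
        for run in runs
        if run["status"] == "failed"
        and datetime.fromisoformat(run["finished"]) >= since
    )
    return counts.most_common(n)


now = datetime(2024, 5, 31, tzinfo=timezone.utc)
print(top_failures(RUNS, now))
# [('e2e: checkout.spec.ts', 3), ('build: registry timeout', 1)]
```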
Undervalued option: locally runnable pipelines
Most engineers already sit in front of powerful, pre-warmed machines. Can you write pipelines that engineers can run locally instead? To me, this gets close to the ultimate feedback loop. A few things to consider:
- Everything should be possible to run locally - including spinning up a development environment and running end to end tests against it.
- Do you have a form of “proof of work”? If you go this route, engineers should not have to run the full pipeline a second time on push. Monorepo tools with remote caches, like Turborepo and Nx, often provide this out of the box.
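One possible shape for “proof of work”, sketched under loud assumptions: the local run records a pass against a hash of its inputs, and CI recomputes the hash and skips work when a pass already exists. This mirrors the general idea behind remote caches keyed on input hashes, but it is not how Turborepo or Nx work internally; `REMOTE_RESULTS`, `record_local_pass` and `ci_should_run` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path

# Stand-in for a shared store such as a bucket or a remote cache (hypothetical).
REMOTE_RESULTS: dict[str, str] = {}


def inputs_hash(root: Path, files: list[str]) -> str:
    """Hash relative paths and file contents in a stable order.

    Relative paths mean the same checkout hashes identically on a laptop
    and on a CI runner.
    """
    h = hashlib.sha256()
    for rel in sorted(files):
        data = (root / rel).read_bytes()
        h.update(rel.encode() + b"\0")
        h.update(len(data).to_bytes(8, "big"))  # length prefix avoids ambiguity
        h.update(data)
    return h.hexdigest()


def record_local_pass(root: Path, files: list[str]) -> None:
    """Called after the full pipeline passes on an engineer's machine."""
    REMOTE_RESULTS[inputs_hash(root, files)] = "passed"


def ci_should_run(root: Path, files: list[str]) -> bool:
    """CI only runs the pipeline if these exact inputs have no recorded pass."""
    return REMOTE_RESULTS.get(inputs_hash(root, files)) != "passed"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app.ts").write_text("export const answer = 42;\n")
    record_local_pass(root, ["app.ts"])
    print(ci_should_run(root, ["app.ts"]))  # False: reuse the local result
    (root / "app.ts").write_text("export const answer = 43;\n")
    print(ci_should_run(root, ["app.ts"]))  # True: inputs changed
```

In a real setup, the hash would also need to cover everything else that affects the result, such as lockfiles, pipeline config and relevant environment variables.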
First things first
- Optimise the right thing first: follow the data, even if it’s not perfect.
- Don’t forget to consider the total feedback time, including CI setup, build time and the time it takes to collect results.
- Keep re-diagnosing your bottleneck, and resist the urge to optimise the wrong things too early.
Also, if your end to end tests are slowing you down, Endform can probably help: we’re probably the fastest way to run end to end tests you’ve ever seen.