
Reduce your CI cycle time: marginal gains

Written by Oliver Stenbom

As the time it takes to write code decreases, your continuous integration pipeline is rapidly becoming a bottleneck.

Have you tried reducing your CI cycle time from 10 minutes to < 5 minutes? You might find it deceptively difficult.

Marginal gains

In 2010, Dave Brailsford, then British Cycling’s performance director, took charge of Team Sky and brought with him the concept of marginal gains. The team searched for 1% performance improvements in better pillows, heated shorts and hand-washing techniques. Each individual change was both expensive and marginal, but the compound effects all added up. They won the Tour de France five times in six years.

The effect of compounding small improvements to repeated workflows like continuous integration is often undervalued.

Take a look at the calculator below to see how much time some small improvements might be worth to you. Many of them turn out to be no-brainers, even if they feel like a large expense right now.

Our complete deploy, check, and end-to-end test pipeline runs in under 4 minutes, even in the worst case. Getting there took us many months and hundreds of small improvements.

The rest of this post is a collection of our favourite improvements you can make to your CI pipeline to reduce your cycle time.

Level: Easy

These are the quick wins. Most take less than an hour to implement.

Dependency caching

The easiest win in any pipeline; you’d be crazy not to do this.

- name: Install Node
  uses: actions/setup-node@v4
  with:
    node-version-file: ./package.json
    cache: pnpm # Varies by stack, every language has one.

The same goes for .next/cache, Gradle caches, Cargo target directories - whatever your stack produces that’s expensive to regenerate. To get the most out of caching, make sure you’re using a good set of restore keys, and that your main branch regularly produces fresh fallback cache entries for new branches to restore from.
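
For caches you manage yourself with actions/cache, that looks roughly like this (the cache path and key scheme are illustrative, assuming a Next.js app with a pnpm lockfile):

- uses: actions/cache@v4
  with:
    path: .next/cache
    # Exact key: changes whenever the lockfile or the commit changes,
    # so every run on main saves a fresh cache for branches to fall back on.
    key: nextjs-${{ runner.os }}-${{ hashFiles('pnpm-lock.yaml') }}-${{ github.sha }}
    # Fallbacks, most specific first: an older cache with the same lockfile,
    # or any cache for this OS, still beats starting cold.
    restore-keys: |
      nextjs-${{ runner.os }}-${{ hashFiles('pnpm-lock.yaml') }}-
      nextjs-${{ runner.os }}-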

Basic parallelism

Normally your “base” checks like linting, typechecking, and unit tests don’t depend on each other. Use primitives like jobs in GitHub Actions to run them in parallel.

jobs:
  lint:
    runs-on: ubuntu-latest
    steps: [...]
  
  typecheck:
    runs-on: ubuntu-latest
    steps: [...]
  
  test:
    runs-on: ubuntu-latest
    steps: [...]

This is always one of the easiest things to do, but it’s also worth considering how much wall-clock time each job spends on setup. If installing Node and your npm dependencies is eating a majority of each job’s runtime, and those jobs aren’t the bottleneck in your pipeline, you might just be wasting resources.

Sometimes, running the linter and the typechecker in the same job is fast enough!
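
If you do combine them, a minimal sketch looks like this, assuming your package.json declares a packageManager field and has lint and typecheck scripts:

jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: pnpm/action-setup@v4 # reads the pnpm version from packageManager
      - uses: actions/setup-node@v4
        with:
          node-version-file: ./package.json
          cache: pnpm
      - run: pnpm install --frozen-lockfile
      # One checkout and one install, two checks.
      - run: pnpm lint
      - run: pnpm typecheck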

Better runners

It can often be easiest to simply purchase better performance. GitHub’s default runners are fine, but Blacksmith and Depot offer faster machines with better caching (often at a lower cost). Here at Endform, we have the fastest end-to-end testing infrastructure for Playwright on the market, if that’s your bottleneck.

Cancel in-progress runs

When you push a new commit, kill the old run. There’s no point finishing a pipeline for code that’s already stale:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

Level: Medium

These require more investment but can dramatically change your pipeline’s ceiling.

Globally shared caches

How much of the cache you’re restoring can actually be used?

  • If you can only restore from caches within a PR (poor restore key), almost none.
  • If you can restore from the main branch, you can use a lot more.
  • If you can restore from global caches, shared between developers’ machines and CI, the work left to do is minimal.

Tools like sccache for compiled languages or Turborepo’s remote cache let you share build artifacts across branches and team members.

Our experience with these tools is that they need careful tweaking. Many parameters, environment variables in particular, bust the cache by default. If you don’t take the time to configure these tools properly, you won’t reap the benefits of global caching and might even slow things down.
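
As one example, wiring up Turborepo’s remote cache in CI is mostly a matter of exposing the right environment variables (the secret name and team slug below are assumptions):

- name: Build and test against the shared remote cache
  run: npx turbo run build test
  env:
    TURBO_TOKEN: ${{ secrets.TURBO_TOKEN }} # auth for the remote cache
    TURBO_TEAM: my-team # cache scope shared by CI and developer machines

When hunting down accidental cache busts, turbo run build --dry=json will show you which inputs, including environment variables, feed each task’s hash.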

Dependency detection

The fastest pipeline is no pipeline. The simplest way to implement this is to trigger only when certain files change:

on:
  push:
    branches:
      - main
    paths:
      - "apps/cli/**"

This simple approach usually buckles under more complicated use cases, like packages in a monorepo that depend on each other. When that happens, upgrade to a proper monorepo tool like Turborepo or Nx that can detect what actually changed and skip everything else.

turbo run build test --filter='...[origin/main]'
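
The Nx equivalent works the same way (assuming build and test targets are defined for your projects):

npx nx affected -t build test --base=origin/main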

In our opinion, dependency detection is one of the biggest long-term wins in this whole post. It pays to adopt a technique like this early, since it’s harder to retrofit onto complicated systems.

Sharding tests across machines

If you have 100 tests that take 10 minutes sequentially, run them on 10 machines in 1 minute. Vitest and many other test runners have built-in support:

strategy:
  matrix:
    shard: [1, 2, 3, 4]

steps:
  - run: npx vitest --shard=${{ matrix.shard }}/4

As mentioned earlier, be wary of the installation setup cost when sharding. If it takes a long time to install your dependencies, adding more shards will give diminishing returns. If you’re interested in sharding Playwright end-to-end tests, we wrote about how we do it.

Level: Hard

These are more serious investments, and may take a little longer to get right.

Synchronized job dependencies

If you have jobs that depend on each other, waterfall-style dependencies can mean a large share of your wall-clock time is spent installing and setting up dependencies over and over.

For our most interdependent jobs, we instead trigger all jobs at the same time and communicate dependencies between them with artifacts.

A diagram showing how synchronized dependent jobs can speed up your pipeline

Without this, we would never have got our total wall-clock time under 5 minutes. Even though some of these jobs spend time waiting on their dependencies, we find the trade-off worthwhile.

Doing this is tricky. We had to implement a custom action to distinguish between “rerunning all jobs” and “rerunning only failed jobs” on workflow retries.

Here’s a gist of the action we use to do this.
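
Our action is specific to our setup, but the core idea looks roughly like this: a consumer job declares no needs:, starts immediately, and polls the run’s artifact list until its input appears (the artifact name build-output is an assumption):

- name: Wait for the build artifact from a sibling job
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    # Poll the current run's artifacts until the one we need shows up.
    until gh api "repos/${{ github.repository }}/actions/runs/${{ github.run_id }}/artifacts" \
        --jq '.artifacts[].name' | grep -qx "build-output"; do
      echo "Waiting for build-output..."
      sleep 10
    done
    gh run download ${{ github.run_id }} --name build-output --dir ./build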

P95 speed, not average

It feels incredible when you can get most of your CI runs down to under five minutes. But if 1 in 20 runs takes 15 minutes or fails unexpectedly, the overall experience will still feel crap.

This is where your p95 speed and success rate become important. The problems and failures you encounter at this stage will be highly specific to your setup. The only way to deal with them is to start taking observability in your pipeline seriously.

You need to be able to see your most common failures and slowest runs over a longer period of time. All sorts of providers have offerings for this, from Datadog to Blacksmith to ourselves. Find something that works for you.
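
You can also get a rough first look without any vendor at all. A quick-and-dirty sketch using the GitHub CLI and jq (the workflow file name ci.yml is an assumption):

# Approximate p95 wall-clock duration, in seconds, of the last 200 completed runs
# (createdAt to updatedAt, so queue time is included).
gh run list --workflow ci.yml --limit 200 \
  --json status,createdAt,updatedAt \
  --jq '[.[] | select(.status == "completed")
         | ((.updatedAt | fromdateiso8601) - (.createdAt | fromdateiso8601))]
        | sort | .[(length * 95 / 100 | floor)]'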

Once you’ve got something set up, it’s blood, sweat and tears from there. Don’t give up.

Radical change

A few final left-field ideas that might be worth considering when all else fails:

  • CI providers that offer self-hosted runners have been popular for decades, with good reason. If you own the instances, your repositories can already be checked out, your caches already warmed, and your dependencies already installed. For example, node_modules can be kept around and cleared roughly once a week (see the sketch after this list). The operational burden of this approach should not be taken lightly.
  • Switch out your stack. Within your ecosystem you might be able to swap the tools you use for faster ones. Hell, if we’re taking it this far, why not consider switching to a programming language that offers faster cycle times? At the very least, it’s a metric you should take into consideration when choosing your stack.
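
If you do go the self-hosted route, pointing a job at your own warm machines is a one-line change. The labels below are whatever you registered your runners with; these are made up:

jobs:
  build:
    runs-on: [self-hosted, linux, warm-cache] # labels assigned when registering the runner
    steps: [...]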

Maintaining gains over time

However much you do today to improve your pipeline speed, it will be difficult to make your gains hold over time. We add more code, we add more checks, we add more tests. Gravity is working against us.

But as soon as you stop trying to make these investments, you will begin to slow down. And in the very same way that marginal gains will speed you up, marginal losses are the most sure-fire way to place you behind your competitors.

Calculate your CI cost

Want to see how much slow CI is costing your team? Use the calculator below to find out - and see how much time it’s worth investing to fix it.

Cost of slow CI calculator

Example inputs: CI cycle time reduced from 15 minutes to 5 minutes, 5 engineers, 20 PRs per engineer per week, $135/hour.

Potential savings:

  • Time saved per PR: 10 minutes
  • Total PRs per week: 100
  • Total time saved per week: 16h 40m
  • Annual savings: $117,000
  • Worth investing to break even: 108 days

Calculation: By reducing your CI cycle time from 15 to 5 minutes across 100 weekly PRs (5 engineers × 20 PRs each), you could save $117,000 annually. You could invest up to 108 working days optimizing your CI and still break even within a year.

⚡ Speed up your E2E tests

Endform runs your entire Playwright suite in parallel. What used to take minutes now takes seconds.

Get started for free → Trial includes 2000 free test minutes. No credit card required.

Frequently Asked Questions

What is Endform?
Endform runs browser-based end-to-end tests for web applications quickly and reliably. We target the end-to-end testing framework Playwright.
How do I get started with Endform?
Getting started with Endform is easy! Just switch out one CLI command and you are up and running. We are fully Playwright compatible - no configuration changes needed.
How does Endform work?
Endform distributes your Playwright tests across hundreds of machines in the cloud. We run one test per machine, and coordinate the collection of results. This way your test suite finishes in the fastest possible time, while letting you focus on writing tests instead of managing infrastructure.
How fast is Endform compared to other runners?

Endform runs Playwright tests significantly faster than traditional runners by utilizing full parallelization and a highly optimized runtime.

We have seen speedups of over 20x on some test suites, and we can run most suites in under 2 minutes.

Do you support other test frameworks than Playwright?
No. As of today we only support running Playwright tests. This lets us focus on providing the best possible experience for Playwright users. In the future we may consider adding support for other frameworks.