In this article we'll cover what leading and lagging engineering metrics are and how to use them.
Lagging Metrics (Start here if you are new)
Change Lead Time
Definition: Time from the first commit on a change to that change being fully deployed in production. Haystack calculates it as the 85th percentile time.
Why it’s important: Change Lead Time measures the entire time it takes from proposing a software change to that change actually being in the hands of a user. As this is often the most reproducible part of the software development process, it is the easiest to optimize.
Due to its importance, it is one of the most popular metrics for engineering organizations, highlighted in the Four Key DevOps Metrics and Google’s State of DevOps reports.
How to use it: Having a holistic picture of the development cycle allows you to drill into problem areas. For example, after seeing your typical Change Lead Time, you may notice that work slows down during Code Review. Drilling further, you might see that first code reviews take too long to complete because slow builds block the approval workflow.
By first looking at the global picture and then drilling down into local areas, you are able to improve the entire Software Development Lifecycle, instead of just one area.
Benchmarks: See below.
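As an illustration, here is a minimal sketch of the 85th-percentile lead-time calculation described above (the helper is hypothetical, not Haystack's actual implementation):

```python
import math
from datetime import datetime, timedelta

def change_lead_time_p85(changes):
    """changes: list of (first_commit_at, deployed_at) datetime pairs.

    Returns the 85th percentile lead time (nearest-rank method) as a timedelta.
    """
    lead_times = sorted(deployed - committed for committed, deployed in changes)
    # Nearest-rank: the smallest value such that at least 85% of samples are <= it.
    idx = max(0, math.ceil(0.85 * len(lead_times)) - 1)
    return lead_times[idx]
```

With ten changes whose lead times are 1 through 10 hours, the 85th percentile lands on the 9th value, i.e. 9 hours.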
Throughput
Definition: The amount of work completed in a given amount of time. Haystack calculates it as the # of pull requests merged per member per week.
Why it’s important: # of pull requests merged is a volume-based metric. Its counterpart is Change Lead Time. The best teams want fast Change Lead Time and high Throughput. Together, they give a clear picture of how fast and how much a team is delivering.
Haystack adds an additional per-member dimension to Throughput, scoping it down to work done per unit, which makes it a much more useful metric.
Note: We recommend tracking Throughput alongside Change Lead Time and quality metrics. Tracking Throughput in isolation can lead to undesired outcomes.
How to use it: When making changes, the goal is to improve unit productivity. Any initiative aimed at improving Change Lead Time should be cross-checked against Throughput to ensure it positively affected both the unit cost (time) and the unit delivery (volume).
Teams with great deployment infrastructure can use Deployment Frequency as a throughput metric.
Benchmarks:
Highest: >8 pull requests per member per week
High: >4.5 pull requests per member per week
Medium: ~3.5 pull requests per member per week
Low: <2.5 pull requests per member per week
Healthy region: average ± (average / 2)
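A sketch of how per-member weekly throughput and the average ± average/2 healthy band might be computed (function names are illustrative, not Haystack's API):

```python
def weekly_throughput_per_member(prs_merged, team_size, weeks):
    """Throughput: # of pull requests merged per member per week."""
    return prs_merged / (team_size * weeks)

def healthy_region(average):
    """Healthy band around the team's average: average ± (average / 2)."""
    return (average - average / 2, average + average / 2)
```

For example, a 4-person team merging 40 pull requests over 2 weeks has a throughput of 5 per member per week, giving a healthy band of 2.5 to 7.5.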
Deployment Frequency
Definition: The number of times in a given period that an engineering team deploys a change to production.
Why it’s important: Deployment Frequency is a volume-based metric and can be used as a throughput metric. Its counterpart is Change Lead Time. Best teams want fast Change Lead Time and high Deployment Frequency. Together, they give a clear picture of how fast and how much our team is delivering.
It is one of the highlighted Four Key DevOps Metrics.
How to use it: Teams with great deployment infrastructure use Deployment Frequency as a throughput metric: how often do we get value out of the door?
For teams with poor deployment infrastructure, it's better used as a platform-team metric to track improvements to the deployment infrastructure itself. Until the deployment infrastructure has improved, it's better to use # of pull requests merged as the throughput metric.
Benchmarks: According to DORA, best teams are able to deploy multiple times per day.
Change Failure Rate
Definition: The percentage of changes to production that result in degraded service and subsequently require remediation.
Why it’s important: We want to ship fast and not break things. Deployment Frequency tackles the first part, whereas Change Failure Rate tackles the second. It can be reframed as "How healthy are our deployments?".
It is one of the highlighted Four Key DevOps Metrics.
How to use it: Tracking Change Failure Rate effectively requires a rigorous process for tagging deployments that resulted in a degraded service. Because this requires manual input, most teams are not able to track it effectively.
A much easier metric to track is Full Resolution Time for Bugs.
Benchmarks: According to DORA, the best teams have less than 15% rollbacks/hotfixes per deployment.
Haystack's analysis shows the best teams actually have less than a 5% change failure rate.
Healthy region: 0-15%
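A minimal sketch of the calculation, assuming each deployment has already been manually tagged with whether it degraded service (the hard part, as noted above):

```python
def change_failure_rate(deployments):
    """deployments: iterable of booleans, True if the deploy degraded service.

    Returns the change failure rate as a percentage.
    """
    deployments = list(deployments)
    if not deployments:
        return 0.0
    return 100.0 * sum(deployments) / len(deployments)
```

One degraded deploy out of twenty gives a 5% rate, within both the DORA and Haystack bands.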
Mean Time to Recover (MTTR)
Definition: Time it takes to restore service after a failure or incident in production.
Why it’s important: Mean time to recovery is an essential metric in incident management as it shows how quickly you solve downtime incidents and get your systems back up and running.
How to use it: MTTR is a platform-team metric. Uptime (mean time between failures) and MTTR are opposite sides of the same coin.
Tracking uptime incentivizes the platform team to have fewer failures, which leads to fewer changes.
MTTR incentivizes the platform team to keep failures short, which allows the team to ship more changes.
Benchmarks: According to DORA, best teams have less than 1 hour MTTR.
Full Resolution Time For Bugs
Definition: Time it takes to resolve a bug after it is raised with an engineering team. Haystack calculates it as the 85th percentile time.
Why it’s important: Resolving bugs quickly after they are reported directly improves customer satisfaction.
How to use it: We recommend initially using FRT for highest- and high-priority bugs. Once the team has a sufficient process to ensure critical issues are fixed fast, the next step is moving down the ladder to medium-priority bugs.
Benchmarks: No benchmark available
Predictability
Definition: The average of your sprints' completion rates. Each sprint's completion rate is calculated as # of issues completed / (# of issues committed + # of issues injected).
Why it’s important: Predictability is one of the most underrated and important metrics an engineering organization can track. For developers, high predictability results in a high degree of accuracy in accomplishing achievable goals without creating burnout. Outside the development team, the ability to reliably forecast delivery dates builds credibility and also helps other teams and functional groups accurately plan their associated activities.
How to use it: If your team has lower than 85% predictability, look into the following and have retrospectives with your team.
Injected issues (issues that were prioritized after the sprint started)
Rolled issues (issues that were not completed and rolled over to the next sprint)
There are a lot of tactics you can try, from sprint story pointing and buffer times to spike tasks and many more.
The iteration process typically takes 1.5-2 months for a team to improve sprint predictability from 40% to 85%+.
Note: If your team has 40% predictability, you might want to multiply your estimates by 2.5 in the meantime.
Benchmarks: A healthy team is 100% (± 15%) predictable — they consistently deliver 85% to 115% of what they estimate.
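The completion-rate formula above can be sketched as follows (a hypothetical helper, not Haystack's implementation):

```python
def completion_rate(completed, committed, injected):
    """# of issues completed / (# committed at sprint start + # injected mid-sprint)."""
    return completed / (committed + injected)

def predictability(sprints):
    """Average completion rate across sprints.

    sprints: list of (completed, committed, injected) tuples.
    """
    rates = [completion_rate(*sprint) for sprint in sprints]
    return sum(rates) / len(rates)
```

A team that completes 8 of 10 committed issues one sprint, then 9 issues against 8 committed plus 2 injected the next, averages (0.8 + 0.9) / 2 = 85% predictability, right at the recommended threshold.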
Development Time
Definition: Time from first commit to opening a pull request. Haystack calculates it as the 85th percentile time.
How to use it: Development Time is a part of Change Lead Time. We want to increase the share of time spent coding and decrease the time spent reviewing.
Review Time
Definition: Time from opening a pull request to merging it. Haystack calculates it as the 85th percentile time.
How to use it: Review Time is a part of Change Lead Time. We want to increase the share of time spent coding and decrease the time spent reviewing.
Review Time has sub-categories, each focusing on a different part of the delivery process. Having more focused metrics lets us see where our bottlenecks might be and keeps our initiatives hyper-focused on a specific problem, giving a faster feedback loop.
First Response Time: Time from open pull request to first comment
Rework Time: Time from first comment to the last commit
Idle Completion Time: Time after rework is completed to merging a pull request
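Given the four timestamps involved, the breakdown above can be sketched as (illustrative, not Haystack's API):

```python
from datetime import datetime, timedelta

def review_time_breakdown(opened_at, first_comment_at, last_commit_at, merged_at):
    """Split Review Time into its three sub-metrics; arguments are datetimes."""
    return {
        "first_response_time": first_comment_at - opened_at,   # open -> first comment
        "rework_time": last_commit_at - first_comment_at,      # first comment -> last commit
        "idle_completion_time": merged_at - last_commit_at,    # rework done -> merge
    }
```

The three parts sum to the total Review Time (open to merge), so improving any one of them directly improves the whole.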
Release Time
Definition: Time from merging a pull request to releasing it. Haystack calculates it as the 85th percentile time.
How to use it: Release Time is a part of Change Lead Time. We want to decrease the time spent on release, ensuring the value created reaches customers' hands faster.
Time to Respond
Definition: Time between each comment or commit inside a pull request made by different members. Haystack calculates it as 85th percentile time.
How to use it: Useful when you are iterating on process or tooling to improve how fast engineers unblock teammates. Check the results to see whether the iteration actually improved how quickly team members comment on each other's pull requests.
Time to First Commit
Definition: Time between an engineer joining GitHub and their first commit.
How to use it: The single biggest problem in onboarding engineers effectively is not having a formal onboarding process. Time to First Commit typically highlights two opportunities: (1) lack of a formal onboarding process, and (2) a complex development environment.
Benchmarks: Best engineering teams have Time to First Commit <24 days
Time to 10th Pull Request
Definition: Time between an engineer joining GitHub and the 10th pull request they open.
Time to 10th Pull Request is a metric led by Spotify. We have seen that 10 to 20 pull requests is typically a good measure for the majority of teams.
How to use it: Time to First Commit captures the initial bottlenecks of the onboarding process, whereas Time to 10th Pull Request answers "When do my engineers start being productive?". The best way to improve it is typically a formal onboarding process that covers everything until the engineer is fully onboarded.
Benchmarks: Best engineering teams have Time to 10th Pull Request <1 month
Lines of Code
Definition: Median # of lines of code (added + removed) changed.
How to use it: This is a leading metric for improving Review Time. Smaller pull requests lead to more intentional and faster reviews.
We recommend avoiding pull requests of more than 600 changed lines of code. Research shows the time to complete a pull request grows exponentially with lines added beyond 600.
The fastest pull requests typically have fewer than 200 lines of code changed
Moderate-speed pull requests typically have 200-600 lines of code changed
Slow pull requests typically have more than 600 lines of code changed
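The size bands above can be expressed as a small helper (thresholds taken from the text; the function itself is illustrative):

```python
def pr_speed_bucket(lines_added, lines_removed):
    """Bucket a pull request by total lines changed (added + removed)."""
    changed = lines_added + lines_removed
    if changed < 200:
        return "fast"       # fastest reviews
    if changed <= 600:
        return "moderate"   # still within the recommended limit
    return "slow"           # past the 600-line exponential slowdown
```

For example, a pull request adding 150 lines and removing 30 (180 changed) falls in the fast bucket, while one touching 700 lines lands in the slow bucket.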