North Star metrics give a complete picture of how your engineering team is performing. From tech debt to product delivery, they provide a holistic measure of the health of your Software Development Lifecycle. Starting from these metrics and drilling in, you can identify the constraints holding your engineering team back and help it reach the next level of effectiveness.
Whilst it is important to watch all North Star metrics as you bring in improvements, so that you avoid unintended side effects, Cycle Time is particularly important for most engineering teams. Optimising Cycle Time not only helps the team improve delivery against their business goals, but also improves the day-to-day developer experience of engineers on your team.
Cycle Time ultimately reflects most of the pain points developers experience, from tech debt to the code review process. As you drive down Cycle Time, developers face less WIP (concurrent work in progress), less rework, and fewer inefficient workflows.
If you haven't read about Cycle Time yet, we suggest reading this article first: Drilling into Cycle Time: The Secret Weapon of Efficient Engineering Teams
Resolving Bottlenecks in your Software Engineering Team
Step 1: Get a Baseline
First we start by measuring our North Star metrics. These give us a picture of the overall performance of the engineering process. The measurement window should capture enough data to be statistically meaningful.
Establish a baseline Cycle Time by looking at the team's 6-month average. This tells us the average time from first commit to Pull Request merge and will help us diagnose issues that appear between these phases.
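As a rough illustration of the baseline described above, the following sketch averages the time from first commit to merge over a window of roughly six months. The sample data, the `baseline_cycle_time` function and the 182-day window are assumptions made for this example; they are not how Haystack calculates the metric.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical sample data: (first_commit_at, merged_at) per merged Pull Request.
pull_requests = [
    (datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 4, 9, 0)),      # 48 hours
    (datetime(2024, 2, 10, 12, 0), datetime(2024, 2, 11, 12, 0)),  # 24 hours
    (datetime(2024, 3, 5, 8, 0), datetime(2024, 3, 8, 8, 0)),      # 72 hours
]


def baseline_cycle_time(prs, window_end, window_days=182):
    """Average hours from first commit to merge for PRs merged in the window.

    window_days=182 is an assumed approximation of "6 months".
    """
    window_start = window_end - timedelta(days=window_days)
    hours = [
        (merged - first_commit).total_seconds() / 3600
        for first_commit, merged in prs
        if window_start <= merged <= window_end
    ]
    if not hours:
        raise ValueError("no merged Pull Requests in the window")
    return mean(hours)


baseline_hours = baseline_cycle_time(pull_requests, window_end=datetime(2024, 6, 1))
print(f"Baseline Cycle Time: {baseline_hours:.1f} hours")
```

With real data, a median alongside the mean is often worth reporting too, since a handful of very long-lived Pull Requests can pull the average up.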
Step 2: Drill Down to Identify Improvement Areas
Next we should seek to understand where the constraints are in the software engineering process.
In the first instance, you can dive into granular metrics to understand what is constraining the North Star metrics. Is most of the Cycle Time being spent on development time or review time? How much of the review time is being spent on First Response Time, Rework Time and Idle Completion Time?
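To make the drill-down questions concrete, here is a minimal sketch that works out what share of Cycle Time each phase accounts for. It assumes the four phases add up to the full Cycle Time, and the hours in `phase_hours` are invented for illustration.

```python
# Hypothetical average hours per phase for one team (illustrative numbers only).
phase_hours = {
    "Development Time": 20.0,
    "First Response Time": 30.0,
    "Rework Time": 10.0,
    "Idle Completion Time": 20.0,
}

# The three phases the article groups under review time.
REVIEW_PHASES = ("First Response Time", "Rework Time", "Idle Completion Time")


def phase_shares(hours_by_phase):
    """Return each phase's fraction of the total Cycle Time."""
    total = sum(hours_by_phase.values())
    if total <= 0:
        raise ValueError("total Cycle Time must be positive")
    return {phase: hours / total for phase, hours in hours_by_phase.items()}


shares = phase_shares(phase_hours)
review_share = sum(shares[phase] for phase in REVIEW_PHASES)
largest_phase = max(shares, key=shares.get)

for phase, share in shares.items():
    print(f"{phase}: {share:.0%}")
print(f"Review time overall: {review_share:.0%}")
print(f"Largest phase: {largest_phase}")
```

In this made-up example, review accounts for three quarters of Cycle Time and First Response Time is the largest single phase, so that is where the next level of drilling would start.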
Once we've drilled as far as we can into our problem area, we start to review the outlier data points. From here you can identify why they are outliers. For example, you may find that Pull Requests with a particularly long First Response Time are those which Haystack flags as having a Big Diff risk factor. This could tell you that your team are being deterred from reviewing particularly large Pull Requests.
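The outlier review above can be sketched in a few lines. This example flags Pull Requests whose First Response Time is more than three times the median, then checks how many of them are large. The sample data, the three-times-median rule and the 500-line `BIG_DIFF_LINES` threshold are all assumptions chosen for illustration; they are not Haystack's definitions of an outlier or of the Big Diff risk factor.

```python
from statistics import median

# Hypothetical data: (title, first_response_hours, lines_changed).
prs = [
    ("Fix typo", 1.0, 4),
    ("Add login endpoint", 3.0, 120),
    ("Update deps", 2.0, 30),
    ("Refactor billing", 40.0, 1800),
    ("Tweak CSS", 1.5, 15),
    ("Rename helper", 2.5, 40),
    ("Migrate ORM", 55.0, 2500),
    ("Add tests", 4.0, 200),
]

BIG_DIFF_LINES = 500  # assumed threshold for a "big" diff
OUTLIER_FACTOR = 3    # assumed rule: slower than 3x the median response time


def slow_first_responses(rows, factor=OUTLIER_FACTOR):
    """Return rows whose First Response Time exceeds factor * median."""
    cutoff = factor * median([hours for _, hours, _ in rows])
    return [row for row in rows if row[1] > cutoff]


outliers = slow_first_responses(prs)
big_diff_outliers = [row for row in outliers if row[2] >= BIG_DIFF_LINES]
big_diff_share = len(big_diff_outliers) / len(outliers) if outliers else 0.0

for title, hours, lines in outliers:
    print(f"{title}: {hours:.0f}h to first response, {lines} lines changed")
print(f"Share of slow-response PRs with a big diff: {big_diff_share:.0%}")
```

If most of the slow-response outliers turn out to be big diffs, as in this invented data set, that supports the hypothesis that Pull Request size is deterring reviewers.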
On Haystack charts, you can select individual scatter points on a trend graph, or individual bars on a distribution histogram:
Doing so will open up a table below these charts showing all the associated Pull Requests. From here you can view the key attributes, such as those indicating risk factors like "Big Diff" and "Merge Without Review". You can even see the title and author, and drill in to look at the Pull Request on GitHub itself.
Haystack also offers a range of Leading Indicators (such as Pull Request Size, Work in Progress and Weekend Activity) that you can use to focus on a problem area. By using the filters in the Haystack Dashboard to zero in on a problem area, you can identify unhealthy trends in these Leading Indicators which could be affecting your North Star metrics.
Be sure to use the powerful set of filters that Haystack makes available to ensure you are focusing your search in the right place:
Step 3: Understand the Root Cause and Propose Fix
After identifying the constraint, you can set about identifying the root cause and finding out how you fix it. You may find there are various technical, process and human factors at play that you need to remedy.
Often the Root Cause of an issue can be 3-5 levels deep and require exploration to understand. The 5 Whys technique is a particularly powerful way of doing this that has its origins in the Toyota Production System. By repeatedly asking "why?", you can identify the root cause of an issue.
The First Response Time is high. Why?
The Pull Requests are too big. Why?
The work isn't adequately broken down. Why?
The engineers have to deploy all their work together. Why?
Root cause: There is no way to do feature flipping in production.
Five Whys analysis can be a complex process, so we've included some external resources to help you perform it:
Kanbanize - 5 Whys: The Ultimate Root Cause Analysis Tool
UK National Health Service Improvement - Root cause analysis using five whys
Wikipedia - Five whys
This part of the process involves collecting feedback from your engineering team and using professional engineering judgement to identify technical problems. It involves not just quantitative data, but qualitative data too.
Critically, you must then apply management and technical skills to be able to prioritise and ultimately fix the problem.
Step 4: Repeat Frequently
Once you've got a fix in place, it's tempting to just keep measuring the Leading Indicator associated with it. For example, if your constraint was that your Pull Requests were too big, you could keep monitoring the Pull Request Size metric as it goes further and further down. However, there comes a point at which it is no longer the constraint.
At that point the constraint is broken and the area for optimisation moves elsewhere. Instead of focussing on local metrics, look at the global North Star metrics. When a constraint is broken, begin the process afresh to identify and break the new one.
Be careful to not let inertia lead you to continuously optimise something that is no longer the constraint.
This is a simplified and applied version of an optimisation process known as Theory of Constraints. For a more generic version of this process, see The Goal by Eliyahu M. Goldratt.
A summary video from Goldratt can be found here:
Tips to Reduce Cycle Time
Below we've included some rule-of-thumb tips that we've found have helped teams cut their Cycle Time. These are what we've found to be the best (and easiest) ways to improve Cycle Time:
Limit Work In Progress (WIP)
Smaller Pull Request Size
Faster Development Time
Remember that every team is different and all teams face different constraints. Whilst these tips may give you ideas on what to be mindful of, be aware that your challenges may lie elsewhere and a root cause analysis may find different constraints.
Limit Work In Progress (WIP)
One of the largest levers we have to optimise Cycle Time is WIP (Work In Progress).
Little's Law does a great job of explaining this universal truth about software development, but to save you some reading, we've summarised it as: reducing WIP leads to faster cycle times.
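For readers who want the arithmetic, Little's Law says that in a stable system the average time an item spends in the system equals the average number of items in progress divided by the average completion rate. The sketch below applies that to Pull Requests. The `average_cycle_time` function and the numbers are illustrative assumptions, and the comparison assumes throughput stays the same when WIP is reduced.

```python
def average_cycle_time(wip, throughput_per_week):
    """Little's Law: average time in system = average WIP / average throughput.

    wip: average number of items in progress at once.
    throughput_per_week: average number of items completed per week.
    Returns the average cycle time in weeks.
    """
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput_per_week


# Hypothetical team finishing 6 Pull Requests per week.
before_weeks = average_cycle_time(wip=12, throughput_per_week=6)
after_weeks = average_cycle_time(wip=6, throughput_per_week=6)

print(f"12 PRs in progress: {before_weeks:.1f} weeks average cycle time")
print(f"6 PRs in progress: {after_weeks:.1f} weeks average cycle time")
```

Halving WIP at constant throughput halves the average cycle time in this model. Real teams are rarely perfectly stable, so treat the result as a direction of travel rather than a forecast.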
Gerald Weinberg's research shows that when a single extra project is added to a developer’s workload, 20% of their time is eaten up by context switching. When a third task is added, this increases to around half of their time being wasted as they struggle to move between tasks.
Programming, by nature, requires developers to keep thousands of things in their heads at any given time. If you disrupt their workflow by forcing them to multitask and switch between multiple projects at once, you greatly hinder their ability to think deeply and effectively about the work they’re doing.
But that's not all.
Since developers are no longer splitting time between multiple tasks at once, they can focus on a select few projects, which leads to higher quality code as well. This means fewer bugs, less rework, and less inefficiency.
Reducing WIP makes developers faster, more productive and less stressed, and helps them produce higher quality code.
That's what we call a "Win Win Win... Win... Situation"
Work in Small Batches (Smaller PR Size)
Working in small batches has dramatic effects on Cycle Time. By focusing on small, manageable pieces of work, developers move quickly through the delivery lifecycle from development to review with less friction.
Smaller pull request size (aka working in small batches) also has these benefits:
It reduces the time it takes to get feedback on changes, making it easier to resolve problems.
It increases efficiency and motivation.
It prevents your organisation from succumbing to the sunk-cost fallacy.
Code Spends Less Time on your Laptop (Faster Dev Time)
Faster development times mean that code spends less time on someone's laptop. Working towards this goal essentially means that changes are likely to be smaller and easier to review. Smaller, easier-to-review changes are more likely to get reviewed sooner as they aren't a large disruption to other team members' workflows.
By making our changes smaller and faster to review, we're able to get feedback sooner - which is especially helpful if we're headed in the wrong direction.
This also signals to other team members what you're working on and in which portion of the code - allowing the team to avoid nasty merge conflicts.
Finally, faster development times help us reduce our engineering tendency to overthink, overbuild, and/or prematurely optimise code. All of these factors roll into the larger theme of decreasing Cycle Time.