Haystack

Haystack pulls data from version control systems, where metrics tend to follow long-tail distributions. Some of these long tails point to real bottlenecks in your process. Others come from minor tasks that matter little to your business yet skew your data drastically. We call the latter outliers. This document covers the best practices for handling them.

These are the 2 types of outliers we see most often:

1. Pull requests with change lead time &gt; 1 year
2. Pull requests with LoC &gt; 20000

In both situations, we need to decide whether the pull request delivers business value proportionate to its effect on the data. This is a subjective call, so it requires manual action.
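Because the final decision is manual, a script can only shortlist candidates for review. The following is a minimal sketch of that shortlisting, using the two thresholds listed above. The `PullRequest` record and the `outlier_reasons` helper are hypothetical names made up for illustration; they are not part of Haystack.

```python
from dataclasses import dataclass
from datetime import timedelta

# Thresholds taken from the two outlier types listed above.
MAX_LEAD_TIME = timedelta(days=365)
MAX_LINES_CHANGED = 20_000


@dataclass
class PullRequest:
    # Hypothetical record shape for illustration; not Haystack's data model.
    title: str
    lead_time: timedelta
    lines_changed: int


def outlier_reasons(pr: PullRequest) -> list[str]:
    """Return the reasons a pull request deserves a manual look, if any."""
    reasons = []
    if pr.lead_time > MAX_LEAD_TIME:
        reasons.append("change lead time > 1 year")
    if pr.lines_changed > MAX_LINES_CHANGED:
        reasons.append("LoC > 20000")
    return reasons
```

An empty list means the pull request crosses neither threshold. A non-empty list only marks it as a candidate; whether to exclude it is still a judgment about business value.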

The first issue usually happens when there is an idle pull request, such as a configuration change: 4 lines changed, nobody reviewed it for years, and then it was merged. Suddenly there is a huge spike in change lead time, suggesting we are doing 10x worse than before. In reality, we're doing exactly the same, and a single long-tail data point is skewing the data.

<i>The spike is caused by an idle pull request merged after 2 years</i>
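To make the arithmetic concrete, here is a small sketch of how one idle pull request moves an average. The lead times are invented numbers chosen for illustration, not real Haystack data.

```python
from statistics import mean, median

# Hypothetical change lead times, in days, for ten typical pull requests.
typical = [1, 2, 2, 3, 3, 3, 4, 4, 5, 3]

# The same pull requests plus one idle pull request merged after about 2 years.
with_outlier = typical + [730]

mean_before = mean(typical)
mean_after = mean(with_outlier)
median_after = median(with_outlier)
```

With these numbers the mean goes from 3 days to roughly 69 days, while the median stays at 3 days. The team's typical pull request has not changed; one data point moved the average.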

The second issue usually happens when a developer pushes database seed data, huge JSON files, or auto-generated files, or lints the whole codebase. These tasks provide business value, but they skew the average disproportionately.

For both of these cases, we recommend going to the <a href="https://dash.usehaystack.io/app/settings/filters" rel="nofollow noopener noreferrer" target="_blank">filters page</a> and excluding pull requests by labels or terms. 

The best practice Haystack recommends is adding a GitHub label named <code>haystack-ignore</code>. Whenever you find an outlier, add this label in GitHub. This gives multiple teams a consistent way to manage outliers.
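The effect of a label-based exclusion can be sketched as follows. This only illustrates the idea; the actual exclusion is configured on the filters page, and the record shape and the `without_ignored` helper here are hypothetical, not Haystack's implementation.

```python
IGNORE_LABEL = "haystack-ignore"

# Hypothetical pull request records; lead time is in days.
pull_requests = [
    {"title": "Add checkout flow", "lead_time_days": 3, "labels": ["feature"]},
    {"title": "Bump config value", "lead_time_days": 730, "labels": [IGNORE_LABEL]},
    {"title": "Fix login bug", "lead_time_days": 2, "labels": []},
]


def without_ignored(prs):
    """Drop every pull request that carries the ignore label."""
    return [pr for pr in prs if IGNORE_LABEL not in pr["labels"]]


kept = without_ignored(pull_requests)
average_lead_time = sum(pr["lead_time_days"] for pr in kept) / len(kept)
```

With the labeled pull request dropped, the average is computed over the remaining two pull requests only.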

If you have questions, contact <a href="mailto:support@usehaystack.io" rel="nofollow noopener noreferrer" target="_blank">support@usehaystack.io</a> and we can help you with your outliers.

How to make your data represent your workflow, without skewing your data

Outliers are skewing my data
