So you cut your cycle time by 10%.

...Or did you?

We are constantly experimenting. We make improvements, measure our performance, and update standard work. Comparing the cycle time before and after a kaizen, you calculate a 10% improvement. Is it time to declare victory and move on (as much as that's possible) to the next challenge?

How do you know if the measured improvement is “real”? That is, how do you know if the observed 10% improvement is significant or if it could have happened by chance? A simple t-test can help answer that question. It compares, in this situation, the average cycle time before the kaizen to the average cycle time after the kaizen and determines how likely it is that the change in cycle time occurred by chance.

If the 10% improvement has an 85% likelihood of occurring by chance, then we probably want to hold off on the celebration party. But if the improvement only has a 2% chance of occurring by chance, then we can be fairly confident that the change is real.

The interval plot below shows the 95% confidence interval for the mean cycle time before and after the kaizen. Based on our sample, we are 95% confident that the actual average cycle time before the kaizen is somewhere between 28.2 and 31.7 (by the way, this is purely an example, the unit of measure could be seconds, minutes, days, weeks, etc.). Likewise, we are 95% confident that the average cycle time after the kaizen is somewhere between 25.2 and 28.8. So, it is possible that the average cycle time after the kaizen is actually not less than the average cycle time before the kaizen, but clearly that scenario is unlikely.
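As a sketch, intervals like these can be reproduced directly from summary statistics. The numbers below (mean 30, standard deviation 2.5, n = 10 before; mean 27, standard deviation 2.2, n = 8 after) come from the example data discussed in the comments, so the endpoints may differ slightly from the plot:

```python
from math import sqrt
from scipy import stats

def mean_ci(mean, s, n, conf=0.95):
    """t-based confidence interval for a mean, from summary statistics."""
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    half_width = t_crit * s / sqrt(n)
    return (mean - half_width, mean + half_width)

# Summary statistics from the article's example data (units unspecified).
before = mean_ci(30.0, 2.5, 10)   # roughly (28.2, 31.8)
after = mean_ci(27.0, 2.2, 8)     # roughly (25.2, 28.8)
print(before, after)
```

The slight overlap between the two intervals is exactly why an eyeball comparison is not enough and a t-test is worth running.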

How unlikely? That is exactly what the t-test calculates.

There are a number of software packages that perform t-tests. Two decisions need to be made before running the calculation: 1) should the calculation assume that the two samples have equal variances, and 2) are you interested in determining whether the averages are not equal, or whether one is greater or less than the other (i.e., what is the alternative hypothesis for the t-test)? For this dataset, the variances are close enough to be treated as equal, and we are interested in determining whether the average cycle time after the kaizen is less than the average cycle time before the kaizen. Given these two assumptions, the likelihood that the observed difference in average cycle time before and after the kaizen occurred by chance is 0.009, or 0.9%. (Statisticians refer to this as the p-value.)
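As a rough check, a pooled (equal-variance) two-sample t-test run on summary statistics consistent with this example reproduces a one-sided p-value near 0.009. SciPy is one option; the summary values below are taken from the comment discussion:

```python
from scipy import stats

# Pooled two-sample t-test from summary statistics.
# Summary values match the article's example data.
result = stats.ttest_ind_from_stats(
    mean1=27.0, std1=2.2, nobs1=8,    # after the kaizen
    mean2=30.0, std2=2.5, nobs2=10,   # before the kaizen
    equal_var=True,                   # decision 1: treat variances as equal
    alternative="less",               # decision 2: H1 is "after < before"
)
print(round(result.pvalue, 3))        # approximately 0.009
```

Changing `alternative` to `"two-sided"` would test only whether the averages differ, roughly doubling the p-value, which is why the choice of alternative hypothesis matters.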

So, given that it is unlikely that the observed 10% reduction in cycle time occurred by chance, the team can reasonably celebrate their improvement and move on to the next challenge.

## There are 3 Comments

In the example above, was the before-kaizen cycle time based on 10 operations and the after-kaizen cycle time based on 8 operations? If so, then the overall process time is reduced.

If not operations but observations, then why 10 observations before and only 8 after?

The beauty of the t-test is that it provides an answer for each of the scenarios you describe. I constructed the data set with a different number of samples before and after the kaizen to illustrate that a t-test does not require the compared data sets to have the same number of values. So the data can be interpreted as a comparison of a process with ten operations before the kaizen and eight after, or as the cycle times of ten items that went through the process before the kaizen compared with eight items that went through after.

Great question!

There is a simple way to calculate confidence intervals and assess whether there is a real difference between two or more sets of data. The formula for a 95% confidence interval is Average ± 2s/√n, where s is the standard deviation of the data set and n is the number of data points in the data set.

Calculate the averages and the high and low confidence limits for the two sets of data and compare them. If the upper confidence limit of the lower-averaged data set is the same as or higher than the average of the higher-averaged data set, then there is no significant difference between the two sets of data. If the upper confidence limit of the lower-averaged data set is the same as or lower than the lower confidence limit of the higher-averaged data set, then there is a difference between the two sets of data. If your data falls between these two points, you can check at what confidence level you can declare the data sets different by recalculating the intervals at other confidence levels: use 1.5 in the formula above for 80% confidence, 1.75 for 90%, and 2.5 for 99%.

For the data presented in this article, the lower 95% confidence limit for the before-kaizen data is 30 − 2×2.5/√10 = 30 − 1.58 = 28.42. For the after-kaizen data, the upper 95% confidence limit is 27 + 2×2.2/√8 = 27 + 1.56 = 28.56. Comparing 28.42 to 28.56, there is an overlap, but a very small one. If this calculation is repeated using 90% confidence intervals, the values are 28.62 and 28.36, and now there is no overlap, meaning that the two data sets are different at 90% confidence.
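The quick overlap check described in this comment can be sketched in code. The 1.5/1.75/2/2.5 multipliers below are the comment's rule-of-thumb values (rounded approximations of the exact critical values, not what statistical software would use), and the summary statistics are the ones worked out above:

```python
from math import sqrt

# Rule-of-thumb multipliers from the comment (approximate critical values).
Z = {0.80: 1.5, 0.90: 1.75, 0.95: 2.0, 0.99: 2.5}

def interval(mean, s, n, conf=0.95):
    """Approximate confidence interval: Average +/- (multiplier) * s / sqrt(n)."""
    half = Z[conf] * s / sqrt(n)
    return (mean - half, mean + half)

def overlaps(a, b):
    """True if intervals a and b share any points."""
    return a[0] <= b[1] and b[0] <= a[1]

# 95% intervals: a small overlap remains (28.42 vs 28.56).
before = interval(30.0, 2.5, 10)
after = interval(27.0, 2.2, 8)
print(overlaps(before, after))

# 90% intervals: no overlap (28.62 vs 28.36).
before90 = interval(30.0, 2.5, 10, 0.90)
after90 = interval(27.0, 2.2, 8, 0.90)
print(overlaps(before90, after90))
```

Note this overlap heuristic is conservative compared with the t-test in the article: two means can differ significantly even when their individual intervals overlap slightly, which is consistent with the 0.009 p-value reported above.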