Holistic Methodologies: Odd bed partners (Six Sigma and PMLC/SDLC), but Harmonious Relatives
Design of Experiments (DOE)
It has been said that one of the first signs of madness is doing the same thing over and over again and expecting a different result. If you have worked on a scrum team or used some other iterative methodology, this concept will be familiar. Outwardly, iterative teams may appear to be doing the same thing over and over again, but in fact successful teams are making changes -- sometimes very small changes -- that can make dramatic differences in their results.
What are the differences in performance between a Scrum team just forming and a team that has done 10 or 20 sprints together? Can you put your finger on which changes the team made that had a material impact on the team's performance? The chances are that it was either a lot of little changes or one or two larger ones that proved to be significant. How can you definitively determine which changes were significant versus subjective "gut feel"? This article explores some tools and methods to assist with determining which changes proved to be significant.
In the last two articles we explored the impact of change brought about by the delivery of the project. These articles have been focused on Organizational Change Management. In this article we will explore the topic of change within the development lifecycle, primarily focusing on Agile/Scrum or Kanban. Once you understand the basic principles of DOE you may also be able to adapt it to other methodologies where there is a repetitive component to the overall process.
Design of Experiments Overview
An experiment is the execution of a procedure within a controlled environment, in which you attempt to discover the impact of changing a single variable. You change a variable and observe the result of that change, with the goal of creating the largest possible positive outcome.
Repetition should make you better at doing something. We call it experience or practice. But have you ever wondered why and how you get better? Typically it is because you changed something, consciously or subconsciously, that improved the outcome; you learned either from your mistakes or from what worked well in practice. This process of trial and error is the primary concept around DOE, but with some additional discipline and controls around the way change is applied. The key word is repetitive, as the repetitive nature of the process denotes the controlled environment under which change is executed. DOE is as instinctive as breathing! This article will explore how to quantify those differences between that very first iteration and what the team is doing right now.
Think about your drive to work in the morning, trying different routes or different times to avoid traffic. Each of these differences is an input variable, or x, that you instinctively use to improve the result. The result in this case is getting to work sooner, at lower cost, and safely. These are output variables, or Y's. Other input variables may include using a different vehicle, such as a motorcycle, to cut down on the cost and perhaps weave through traffic to improve the commute time. You may choose carpooling if the only result or output variable you are looking for is reduced cost, but that may also act as a constraint due to the limited flexibility of those in your carpool.
Sometimes the result from changing a variable may not be what you expect or may have a detrimental impact on the result; for instance, getting a ticket caused by taking undue risk. Chalk this up to experience and eliminate it from your list of variables.
Changing more than one variable at a time can lead to some confusion regarding which of the variables had the most impact on the result, or whether an interrelationship between the variables occurred. For this reason, changing multiple variables should be avoided unless the relationship of the variables to the outcome is clearly understood or you are sampling a broad range of variables to home in on a few that would have the most impact.
You may be okay changing multiple variables if you are using the test as a scalar -- a way to test magnitude -- to determine if a "Big Bang" approach will work. For example, consider a new environment made up of multiple upgrades such as new versions of an Operating System, tools, and interfaces, all being tested along with the application. It might be beneficial to try an all-up test with all of these components in the controlled environment to determine the level of overall compatibility. However, this could result in throwaway work if the results of the multivariate test prove catastrophic. The experiment may result in being unable to determine which component caused the incompatibility (too many x's at the same time).
Below is a simple mathematical formula to illustrate this concept:

Y = f(x1, x2, ... xn)

The Y is the result or output variable of your experiment, where x1, x2, etc. are input variables. The concept of the Big Y is a combination of the overall result from the various Y's:

Big Y = f(Y1, Y2, ... Yn)
In the example above, if cost is the Big Y, changing the timing of your commute decreases the cost of fuel because you are avoiding traffic. By carpooling, you are sharing the cost of the fuel with others in the vehicle. In both experiments you are reducing the cost of your commute, but which Y has the greatest impact on the Big Y? The x's in this case can be the effect on fuel consumption of the time of the commute, the additional weight of the occupants if you're carpooling, or additional driving to pick up those occupants.
Uncontrollable variables (z1, z2, ... zn) are outside your control but still impact your Y. For example, if one of the occupants of your carpool is late or sick, that impacts the result of the commute. Other examples include vehicle breakdowns or weather-related delays. For the scope of this article we will remove z's from the equation; typically they are used to eliminate candidates for improvement, since the team has no control over their impact on the output of the experiment. It is still important to know they exist, because determining their impact on the x's and Y's is part of the selection process.
There is a direct interdependency between the x's and Y, so changing one variable at a time will determine the impact of that interdependency as long as no other variables have changed. If more than one variable changes it may no longer be clear which variable impacted the result.
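The one-factor-at-a-time idea can be sketched in code. The Python snippet below models the commute example with a hypothetical cost function; the function, its parameters, and all of the numbers are invented for illustration, not drawn from real data.

```python
# Hypothetical model of the commute example: Y (daily cost in dollars) as a
# function of two input variables x. All figures are illustrative only.
def commute_cost(depart_hour, carpool_size):
    """Return a made-up daily commute cost."""
    fuel = 8.0 if 7 <= depart_hour <= 9 else 6.0  # rush hour burns more fuel
    fuel += 0.5 * (carpool_size - 1)              # detours to pick up riders
    return fuel / carpool_size                    # riders share the cost

# One factor at a time: vary only the departure time, hold carpool_size fixed.
baseline = commute_cost(depart_hour=8, carpool_size=1)
changed = commute_cost(depart_hour=6, carpool_size=1)
print(f"baseline Y = {baseline:.2f}, after changing x1 = {changed:.2f}")
```

Because only `depart_hour` changed between the two runs, any difference in Y is attributable to that one x; had `carpool_size` changed at the same time, the two effects would be confounded.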
Agile Adaptation: Why?
Inherently, Agile methodologies such as Scrum and Kanban focus on the repetitive nature of work delivery within a controlled environment where execution progress can be monitored. So from Sprint 1 through Sprint n, how does the team improve its performance? The magic happens when the team understands, or can quantify, how it approaches the work, along with the subtle changes each team member makes based upon their own experience.
One point to consider is that the nature of the work being performed within each sprint by each of the team members can be very different. The use of subjective estimating measures such as story points can further add to the challenges of applying this concept to unique work. This article will focus on the mechanics of how the team sets up the controlled environment and executes the repetitive subprocesses within that environment, rather than take on the complexities of the work delivered, which is unique rather than repetitive in nature.
In Agile, the two primary documents in which to look for opportunities for improvement are the Working Agreement and the Definition of Done (DOD). In a Working Agreement, the team sets up the controls and Standard Operating Procedures (SOP) around the controlled environment; these are akin to input variables -- such as the frequency of, or participants at, a daily scrum. Other examples could be the duration of the sprint, when planning will occur and with whom, or carrying out a retrospective at the end of a sprint. In a Definition of Done, the variables could be the level of testing that occurs between deliveries, such as Build Verification Testing (BVT), the efficacy of a test harness derived from requirements for IT QA, or business UAT-level testing to identify bugs. Another example is the exit criteria for sign-off and what evidence is required for acceptance. The input variables from a DOD or Working Agreement have an impact on the Big Y, which is the quality of the product you deliver.
The Importance of Retrospectives
How do you know the team is improving?
- Is it delivering more story points, measured by velocity?
- Is it delivering fewer post implementation defects?
- Is each team member delivering more because the overall team is more efficient at resolving issues or working together as a team?
- What is the customer's confidence level in the product?
How can you measure actual improvement? The first two points are quantifiable over time, whereas the third and fourth are subjective but may be just as important; they may be measured through the results of the first two. If the Big Y is delivering value to the customer or gauging the customer's confidence, the small Y's are the number of story points delivered and the number of post-implementation defects.
The retrospective meeting plays a significant role in identifying opportunities for improvement. This is typically the only time when the team gets together to solely focus on what went well and what could be improved. At this time the team can then prioritize what improvements they want to make or what subprocesses they want to stop doing. (These are the input variables.) Some teams stop short of making retrospectives work by merely documenting these changes and not taking the suggested corrective action.
In my experience, applying some discipline by creating a user story around the corrective action or improvement, and identifying how to measure the impact of applying it over time, has been very beneficial. This is where DOE shines, as the effect of applying the corrective action or improvement is demonstrable by the outcome at a measurable, objective level. Did it work? If it did, update the Working Agreement or DOD if applicable to institutionalize it as SOP for future sprints. By applying DOE principles, it is easy to demonstrate the value of the change.
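As a minimal sketch of what "measuring the impact over time" might look like, suppose a team tracks post-implementation defects per sprint before and after adopting a new Definition of Done item. The numbers below are invented for illustration.

```python
# Invented data: post-implementation defects per sprint, before and after
# the team adopted a corrective action from a retrospective.
before = [9, 7, 8, 10, 8]   # sprints before the change
after = [5, 4, 6, 5]        # sprints after the change

mean_before = sum(before) / len(before)
mean_after = sum(after) / len(after)
print(f"mean defects before: {mean_before:.1f}, after: {mean_after:.1f}")

# If the drop persists over several sprints, institutionalize the change
# in the Working Agreement or DOD as standard operating procedure.
```

A simple per-sprint average like this is enough to turn "it feels better" into an objective before/after comparison tied to the user story for the corrective action.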
The Pareto principle, or 80/20 rule, provides a method of quantifying the impact of the input or output variables on the overall Big Y. As discussed earlier in the driving-to-work example, think of which variables had the most impact. Was it the route you took, or was it the time of day you made the journey? Instinctively you know what they are, but they can be more subtle in more complex environments.
Once a Y has been determined, the team can prioritize improvement opportunities -- input variables (x's). As mentioned earlier, if the input variables are mutually exclusive of each other or the interrelationships of the x's are clearly understood, it may be acceptable to change more than one input variable at a time for expedience. But in most situations it is highly recommended that you stay with one variable change at a time so the outcome is clearly measurable.
A Pareto chart has these key elements:
- The left Y axis shows the level of impact of an individual input variable and is measured using a stacked histogram, which is sorted high to low.
- The left Y axis can be a percentage or count of a given product (e.g. number of defects, story points, or hours resulting from remediation work). The right Y axis is the cumulative value of all the input variables, which adds up to 100%.
- The Cut Line is an arbitrary line drawn to determine how much of an impact the team is looking for to make the improvement worth the effort.
- The Cumulative Curve becomes important based upon where the cut line is drawn; anything to the left of that curve is a candidate for improvement.
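The arithmetic behind those chart elements is straightforward to sketch. The Python below ranks defect causes, computes the cumulative percentage, and applies an 80% cut line to pick improvement candidates; the categories and counts are invented for illustration.

```python
# Invented data: defect counts by cause, as a team might tally them
# from sprint retrospectives.
defects = {"unclear requirements": 40, "missed unit tests": 25,
           "environment config": 15, "merge errors": 10,
           "documentation": 6, "other": 4}

total = sum(defects.values())
# Sort high to low, as the stacked histogram on the chart is ordered.
ranked = sorted(defects.items(), key=lambda kv: kv[1], reverse=True)

cumulative = 0
candidates = []
for cause, count in ranked:
    cumulative += count
    pct = 100 * cumulative / total      # the cumulative curve
    print(f"{cause:22s} {count:3d}  cumulative {pct:5.1f}%")
    if pct <= 80:                       # left of an 80% cut line
        candidates.append(cause)

print("improvement candidates:", candidates)
```

With these numbers, the first three causes account for 80% of the defects, so they fall left of the cut line and become the candidates worth the improvement effort.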
Although instructions for creating your own Pareto chart are beyond the scope of this article, there are many templates available on the internet, for example this one by ASQ (Excel). There are also many tutorials on how to make your own template such as this one or this one on ProjectConnections.
Another related area where the team may experiment with a new technology or technique is called a Spike. A Spike is a story or task that is used to answer a specific question or gather information to solve a technical or design problem. You can think of a Spike as its own experiment, which may have one or many input or output variables.
Usage across the SDLC
DOE is best used in a controlled environment with a repetitive process, therefore DOE is primarily relevant in the Execute/Deliver phase of the SDLC. There are, however, other touch points across the methodology where DOE has some relevance.
Methodology note: This article is not a prescriptive approach to using Agile/Scrum or Kanban. In Phase Gate Agile, Initiate and Plan can be handled as a waterfall project. The Agile phase starts in Delivery where sprints or the Kanban board burns down on the backlog resulting from the Plan phase.
Alternatively, much of the Initiate and Plan phases can be rolled up into a Sprint Zero, which is unique and therefore not a good candidate for DOE -- unless Sprint Zero is repeated between releases, in which case you can concentrate on a big bang approach to your DOE efforts.
A project could be initiated because the result of a DOE at a core business-process level shows that the process is deficient for some reason. This concept is a core tenet of Six Sigma. Market data may drive the requirements for a new capability (considering market share to be the Big Y). The delivery of the product and its efficacy would then become the output variable into the business Big Y of increased sales or market share. Initiate is also the place where the output variables (Y's) are determined.
In the plan phase, Sprint 0 may occur; the team working agreement and DOD are created if they do not already exist. It is also an opportunity to make revolutionary changes to the controlled environment through the execution of Sprint 0, as well as setting a baseline for team capability by estimating when the project will be completed. KPIs (Y's) are typically also determined during this phase.
The Execute/Deliver phase is where the team tweaks its input variables, mostly in the first sprints after Sprint 0. Once the team has gone through the "Storming, Norming and Performing" lifecycle and honed its Working Agreement and DOD, the number of changes should diminish, unless it is evident that improvements are still necessary. Each sprint retrospective is an opportunity for improvement and a measurement point for team performance at both a subjective and an objective, metric-driven level.
Closing provides the opportunity to hold a retrospective at a macro level, across the duration of the project. This longer view makes it possible to more accurately quantify the impact of one or a set of input variables on individual output variables or the overall Big Y. A similar view can be achieved by breaking the overall project into releases, each of which can be used as an opportunity to refocus the small Y's and the Big Y.
Team retrospectives can be run in two phases:
- In the first phase, the team identifies opportunities for improvement and selects which of them it wants to pursue.
- In the second phase, the team can filter the most important improvements and then do a root cause analysis (RCA) using a Fishbone or Ishikawa diagram. RCA determines if the team has the ability to effect change so it can be used as a further improvement selection scoping and analysis tool. Remember to rule out the Uncontrollable input variables (z's).
- A payoff matrix can also be used to determine which solution to choose if there is more than one option. It can be used alone or in conjunction with a Fishbone diagram. We will explore both of these in a later article.
DOE is a systematic method of identifying and managing the impact of change within a controlled environment by observing the impact of input variables on the result, or output variables, of the experiment or improvement opportunity. The impact of the changes can be shown using a Pareto chart -- a stacked histogram that will help the team determine improvement candidates for fixing a defect in the process.