Holistic Methodologies: Odd bed partners (Six Sigma and PMLC/SDLC), but Harmonious Relatives
Measurement System Analysis (MSA) –
The art and science of data-driven decision making
This column is the fifteenth in a series.
Part 1 Why Do We Manage Projects?
Part 2 Failure Modes and Effect Analysis (FMEA)
Part 3 SIPOC/COPIS
Part 4 Stakeholder Risk Analysis Using Forcefields
Part 5 The Softer Side of Change – GE CAP Model
Part 6 Design of Experiments (DOE)
Part 7 Quality Function Deployment
Part 8 Why Oh Y Does X Mark the Spot?
Part 9 Poka-Yoke without Getting Egg on Your Face
Part 10 Special K (Kaizen, Kano Model Delivered through Kanban)
Part 11 TPM – Total Productive Maintenance
Part 12 Project or Process? That Is the Question!
Part 13 The Value of Cycle Time Analysis
Part 14 Value Stream Mapping (VSM)
Part 15 Measurement System Analysis (MSA)
Part 16 Voices from Unexpected Places, Part 1
Part 17 Voices from Unexpected Places – Part 2, Sources
Overview
Statistics and metrics can be a tool or a weapon, just as dangerous as any gun or knife depending on how they are used. Statistics have been used both to justify malfeasance and to alert the public to risks such as global warming. Al Gore used statistics extensively in his Oscar-winning documentary An Inconvenient Truth to convey powerful messages about population growth and global warming. These messages were specifically targeted to influence the audience and provide a value proposition for change. Project managers are, in effect, in sales. We use metrics or statistics to demonstrate some value proposition such as return on investment (ROI), cost of risk, reduced total cost of ownership (TCO), or the delta between current and future state as part of a gap analysis. The common denominator between the documentary and our role is the use of metrics and data to influence the intended audience.
The maturity of an organization can be measured by its ability to use metrics and statistics from its core processes. These statistics can then be used to make tactical and strategic decisions, also known as data-driven decision making. We will explore why the art and science of gathering metrics and statistics is critical, and how variance is the true enemy of the credibility of the information supporting your value proposition.
Metrics or Statistics
What is the difference between metrics and statistics, since both involve measuring? The word metric derives from the Greek word metron, meaning measure, hence thermometer, a device for measuring temperature. For our purposes, a metric is a standardized method or the science of measuring a value, whereas statistics is the science which deals with the collection, classification, analysis and interpretation of metrics using mathematical theories of probability to derive some meaning. In a nutshell, metrics are the measurements and statistics are how you turn the measurements into information used in data-driven decision making. An example would be a measurement of high and low temperatures over the past 20 years. A statistic from this set of measurements would be an average of those metrics showing warming, cooling, or no trend at all. I would use that statistic to determine if I should buy an AC unit for my house if the summers are getting warmer. I would need a baseline to identify the tipping point where the air temperature and humidity were uncomfortable on a significant enough number of days to make the cost of the AC unit justifiable or not.
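To make the distinction concrete, here is a minimal Python sketch. The yearly values are invented for illustration (six years rather than twenty, to keep it short), and the names summer_highs and warming_trend are hypothetical. The dictionary holds the metrics; the function turns them into one simple statistic.

```python
from statistics import mean

# Hypothetical metrics: average summer high (deg F) per year. Invented values.
summer_highs = {
    2001: 84.0, 2002: 85.0, 2003: 83.5,
    2004: 86.0, 2005: 87.0, 2006: 88.5,
}

def warming_trend(highs_by_year):
    """Statistic: mean of the later half of the years minus mean of the earlier half."""
    years = sorted(highs_by_year)
    half = len(years) // 2
    early = mean(highs_by_year[y] for y in years[:half])
    late = mean(highs_by_year[y] for y in years[half:])
    return late - early

trend = warming_trend(summer_highs)
print(f"Change between halves: {trend:+.1f} deg F")
```

A positive result suggests warming and a negative one cooling, but a difference of two means is only one of many statistics that could be drawn from the same metrics.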
Article Goal
Metrics and statistics are in themselves a very broad subject with numerous tangential subtopics, many beyond the scope of this article. The goal of this article is to create awareness of not only the choice of the metrics you gather, but how you use those metrics to support a value proposition through statistics. Before we go too much further and to manage your expectations, this isn't going to be an article on complex statistical models or other mathematical formulae. We will focus on variance within your dataset and the importance of normalized data. The key takeaway is that a good dataset should be normalized, meaning it is repeatable and reproducible by the methods used to gather the data points.
The Enemy in Our Midst – Variance
In the AC example mentioned above, we have both empirical and subjective metrics. The empirical metrics are the temperature measures over the past 20 years. The subjective measures are my baseline for what is uncomfortable and the number of days I can tolerate being uncomfortable. With subjective measures, variance is everywhere and is imposed by my own rules, which I can change at any time. One would think that empirical metrics are immune from variance, but even metrics such as temperatures are subject to variance or bias:
 Was the temp taken at a specific time of day or was it the high or low for the day regardless of time?
 Was the high temp taken in the shade or direct sunlight?
 How accurate was the thermometer?
 How much variance can we tolerate to derive a different meaning from the metrics?

Do we use all the metrics or just the values that suit our needs?
If I were trying to convince my wife to buy an AC unit, I might use just the hottest summers out of the past 20 years and ignore the cooler ones to create my average number of days at a certain temperature. I might also suggest that any temperature above 75 °F is too hot. (I have left relative humidity out of this example for simplicity.) As you can see, even a simple example can become complex. Both metrics and statistics can be used to drive different conclusions even with the same dataset of metrics.
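The effect of choosing only the values that suit the argument can be sketched in a few lines of Python. The counts below are invented, and the function names are hypothetical; the point is only that the same dataset yields two different averages depending on what is kept.

```python
# Hypothetical metrics: days above 75 deg F in each of eight summers. Invented values.
hot_days_per_summer = [30, 42, 25, 55, 38, 60, 28, 47]

def average(values):
    """Plain arithmetic mean of all values."""
    return sum(values) / len(values)

def average_of_hottest(values, keep):
    """Cherry-picked view: average only the `keep` largest values."""
    hottest = sorted(values, reverse=True)[:keep]
    return average(hottest)

all_summers = average(hot_days_per_summer)
hottest_three = average_of_hottest(hot_days_per_summer, keep=3)

print(f"All summers:     {all_summers:.1f} hot days on average")
print(f"Hottest 3 only:  {hottest_three:.1f} hot days on average")
```

Nothing about the measurements changed between the two lines of output; only the selection rule did, which is why the rule belongs in the operational definition of the dataset.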
Measurement System Analysis (MSA)
Measurement System Analysis (MSA) to the rescue! MSA is an experimental and mathematical method to determine the variance level in a measurement process. MSA confirms the quality of the measurement system by its stability, linearity, accuracy, and precision. In other words, it sets rules by which the metrics will be gathered to drive variance out of the data gathering. The goal is to arrive at a normalized data set. There are some key words in the definition we should explore further.
Bias
Bias is the offset between the average measured value and the true (reference) value, so it is how accuracy is expressed in the model. We will discuss accuracy later in the article.
Stability
How much variance is there in the way the metrics are gathered? What are the variance factors that can influence the variance in the metrics and resulting dataset? How repeatable or reproducible are the methods for gathering the metrics?
Repeatability
Governs the variance created by the same appraiser repeating the same measurement using the same measurement device.
Reproducibility
Governs the variance created by different appraisers who gather the same data using the same measurement devices.
Precision vs. Accuracy
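Primarily the difference between consistency (precision) and being close to the target value (accuracy), best described in the image below. Picture shots at a target: a tight cluster far from the bullseye is precise but not accurate, while shots scattered evenly around the bullseye are accurate on average but not precise.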
This distinction is very important since our goal is to have accuracy and precision in our analysis and having one without the other can cause its own set of variance in the analysis.
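The difference can also be put in numbers. In the sketch below, accuracy is expressed as the offset of the average reading from a known reference value, and precision as the sample standard deviation of the readings. The readings and the reference value are invented, and the function names are hypothetical.

```python
from statistics import mean, stdev

def offset_from_reference(readings, reference):
    """Accuracy view: how far the average reading sits from the known value."""
    return mean(readings) - reference

def spread(readings):
    """Precision view: sample standard deviation of repeated readings."""
    return stdev(readings)

reference = 100.0  # hypothetical known value of the item being measured

precise_not_accurate = [104.9, 105.0, 105.1, 105.0]  # tight cluster, off target
accurate_not_precise = [95.0, 105.0, 98.0, 102.0]    # centered, widely scattered

for name, readings in [("precise, not accurate", precise_not_accurate),
                       ("accurate, not precise", accurate_not_precise)]:
    print(f"{name}: offset={offset_from_reference(readings, reference):+.2f}, "
          f"spread={spread(readings):.2f}")
```

A measurement system that is good on only one of the two numbers still injects variance into the analysis, either as a consistent offset or as scatter.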
In my previous article I introduced the concept of MSA, which I will repeat here as an overview and as a stepping off point to go into further detail.
Within any process, the method of gathering metrics is just as important as the actual metrics being gathered. The metrics being gathered can be skewed by one or more of the following characteristics:
 Measurement approach – frequency/size of sample, units of measure, step in process, etc.
 Measuring device – objective (count, dimension, etc.) or subjective (visual, cosmetic, expert judgment, etc.), ruler, scale, thermometer
 Measurement process/procedure – measurement of the procedure can be impacted by the training, skill, and care of the operator (does Operator A capture the measurement the same way Operator B does?)
 Measurement interaction – impacts by external forces, temperature, humidity, light levels, etc.
 Accuracy/Precision of MSA – a calculated level of uncertainty in the overall MSA model (all measurement systems have a certain degree of error)
Measurement Approach
With any metrics, there must be a judgment call on how much data is sufficient to support a decision or hypothesis. In IT, the metrics are not always readily available, or the raw data set is too large or too small. A conscious decision has to be made to set some rules governing the sample size, frequency, and units of measure. It is important to set operational definitions around those decision points to remove bias between data gatherers (appraisers). An example might be total cost of ownership (TCO) for a printer: Will it include the cost of toner, paper, and repairs? It is important to set a goal for how you will use the metrics before making the effort to gather them. There is nothing worse than gathering metrics that do not support your value proposition or metrics that are not repeatable or reproducible.
Measurement Device
For the printing example, how do you measure the TCO? The number of pages printed, taken from the printer, supports a simple measure of cost per page (the cost of consumables such as paper and toner divided by the pages printed). Other measures could be how many times the printer has jammed or had to be repaired (and the cost of those repairs). What is the mean time between failures for a particular model, from which you could determine the cost of failure and the frequency of replacement? How many pages were unusable and had to be reprinted because of fading from low toner or otherwise poor-quality output? All of those and other measures could be used to derive TCO, but only you can decide which metrics are reasonable to include.
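One way to pin down such choices is to write the operational definition as code, so that every appraiser computes the figure the same way. The sketch below is only an illustration: the class, field names, flags, and dollar amounts are all hypothetical, and which costs belong in TCO remains your decision.

```python
from dataclasses import dataclass

@dataclass
class PrinterCosts:
    """Hypothetical cost record for one printer over one measurement period."""
    purchase_price: float
    toner_cost: float
    paper_cost: float
    repair_cost: float
    pages_printed: int
    pages_reprinted: int  # unusable pages that had to be printed again

def cost_per_usable_page(costs, include_purchase=True, include_repairs=True):
    """Cost per usable page under an explicit choice of which costs count."""
    total = costs.toner_cost + costs.paper_cost
    if include_purchase:
        total += costs.purchase_price
    if include_repairs:
        total += costs.repair_cost
    usable_pages = costs.pages_printed - costs.pages_reprinted
    if usable_pages <= 0:
        raise ValueError("no usable pages in this period")
    return total / usable_pages

# Invented figures for illustration only.
example = PrinterCosts(purchase_price=500.0, toner_cost=300.0, paper_cost=100.0,
                       repair_cost=100.0, pages_printed=10_500, pages_reprinted=500)

print(f"Full definition:   ${cost_per_usable_page(example):.3f} per page")
print(f"Consumables only:  "
      f"${cost_per_usable_page(example, include_purchase=False, include_repairs=False):.3f} per page")
```

The two results differ by a wide margin on identical data, which is exactly why the definition has to be fixed before the metrics are gathered.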
Measurement Process/Procedure
Some printers have a counter built in to show how many pages have been printed. Some printers show hours of operation and some may even show the incidence of error codes from jams or other faults. A combination of these and other metrics may be necessary to show a valid TCO. Counters built into printers will have little or no variation as there is a repeatable and reproducible measurement process. Counting the number of reams of paper or toner cartridges consumed per month can create variance and may be less precise. When defining an MSA, it is important to identify the method used to gather the dataset and be consistent in its usage. Even measuring just the consumables consistently may provide valid metrics of a level of accuracy and precision to support your value proposition.
Measurement Interaction
If TCO were measured by consumables alone, you would need to determine whether duplex printing was being used, whether reports carried cover sheets, and what type of images were being printed. For example, graphics and photos may take more toner than printed text, and paper might be used for purposes other than printing (such as taking a ream of paper to raise the height of your monitor). Measurement interaction variance can be minimized in some cases by lengthening the period over which measurements are taken or by increasing the sample size across multiple printers.
Precision and Accuracy of the MSA
So how good does your MSA have to be to support your value proposition? Every MSA will have a degree of uncertainty. Getting to Six Sigma (3.4 defects per million defect opportunities) may not be necessary to decide whether to buy a new printer, or which features are needed to reduce your TCO to a target level. To make a design decision on a critical aircraft part or medical device, the need for precision and accuracy becomes more critical because of the impact of their tolerances. You just have to remember that in every MSA there will be a degree of uncertainty and acceptable variance.
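For reference, the Six Sigma figure quoted above is a rate of defects per million opportunities (DPMO). A minimal sketch of the arithmetic follows; the function name and the sample counts are hypothetical.

```python
def dpmo(defects, units, opportunities_per_unit):
    """Defects per million opportunities."""
    opportunities = units * opportunities_per_unit
    if opportunities <= 0:
        raise ValueError("need at least one defect opportunity")
    return defects / opportunities * 1_000_000

# Invented example: 500 reprinted pages out of 10,000 printed,
# treating each page as a single defect opportunity.
print(f"Printer example: {dpmo(500, 10_000, 1):,.0f} DPMO")

# Invented example that lands on the Six Sigma level:
# 17 defects across 5,000 units with 1,000 opportunities each.
print(f"Six Sigma level: {dpmo(17, 5_000, 1_000):.1f} DPMO")
```

Comparing the two figures gives a sense of how far an everyday office decision can sit from the Six Sigma level and still be well supported.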
Other Assessments of Your MSA
Bias Assessment
Does the method or person taking the measurement introduce a bias that could skew it? For example, if a printer is used only for graphics printing, the bias would be the ratio of graphics pages to text pages. The process variation could be the number of like printers in the environment.
Bias percent = (Bias / process variation) × 100
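As a worked example of the ratio, the sketch below assumes bias is taken as the average reading minus a known reference value, that its absolute value is used, and that the result is scaled to a percentage. Those choices, the function name, and the numbers are assumptions for illustration; the process variation is simply supplied by the caller in the same units as the readings.

```python
from statistics import mean

def bias_percent(readings, reference, process_variation):
    """Bias as a percentage of process variation (assumed: absolute bias, times 100)."""
    if process_variation <= 0:
        raise ValueError("process variation must be positive")
    bias = mean(readings) - reference
    return abs(bias) / process_variation * 100

# Invented values: three readings of an item whose reference value is 10.0,
# with a process variation of 2.0 in the same units.
result = bias_percent([10.2, 10.1, 10.3], reference=10.0, process_variation=2.0)
print(f"Bias percent: {result:.1f}%")
```

Whether a given percentage is acceptable depends on the level of precision and accuracy your value proposition needs.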
Repeatability and Reproducibility Assessment (Gauge R&R) or ANOVA
We touched on repeatability and reproducibility above. Below we will delve a little deeper into what it means to a normalized dataset.
ANOVA (Analysis of Variance) is a statistical method commonly used in a Gauge R&R study to separate the variance in an MSA by its source.
This assessment provides transparency into whether your MSA is capable of discriminating between different metrics, how much variance in the system is caused by factors such as measurement interaction and process, and how much variance in the MSA is due to process variance.
Below is an approach that can be used to conduct a Gauge R&R assessment:
 Determine the number of appraisers and devices being used to gather the metrics (people or devices, time period, etc.).
 Confirm that "some" of the appraisers are trained to take repeatable and reproducible measurements (which is your control).
 Ensure that all the appraisers gather metrics across the population of measurement points during the time period.
 Capture your metrics across all the appraisers chosen.
 Analyze the metrics: compute the high, low, range, mean, and standard deviation for each appraiser and across all appraisers.
 Determine if the variance will meet the accuracy and precision needed to support the use of the MSA.
The outcome of a Gauge R&R analysis is a measure of variance in the MSA that you can use to determine if that variance is acceptable or not.
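The analysis step in the approach above can be sketched as follows. This is a deliberately simplified illustration, not a full ANOVA-based Gauge R&R: repeatability is taken as the pooled spread of each appraiser's repeated readings of the same item, and reproducibility as the spread between appraiser averages. The data, appraiser labels, and function names are all hypothetical, and every appraiser is assumed to take the same number of readings on the same items.

```python
from math import sqrt
from statistics import mean, variance

# measurements[appraiser][item] = repeated readings by that appraiser. Invented values.
measurements = {
    "A": {"item1": [10.0, 10.2], "item2": [12.0, 12.2]},
    "B": {"item1": [10.4, 10.6], "item2": [12.4, 12.6]},
}

def repeatability_sd(data):
    """Pooled standard deviation of repeated readings (same appraiser, same item)."""
    cell_variances = [variance(readings)
                      for items in data.values()
                      for readings in items.values()]
    return sqrt(mean(cell_variances))

def reproducibility_sd(data):
    """Standard deviation of the appraisers' overall averages (simplified)."""
    appraiser_means = [mean(r for readings in items.values() for r in readings)
                       for items in data.values()]
    return sqrt(variance(appraiser_means))

def gauge_rr_sd(data):
    """Combined measurement-system spread from the two simplified components."""
    return sqrt(repeatability_sd(data) ** 2 + reproducibility_sd(data) ** 2)

print(f"Repeatability SD:   {repeatability_sd(measurements):.3f}")
print(f"Reproducibility SD: {reproducibility_sd(measurements):.3f}")
print(f"Combined SD:        {gauge_rr_sd(measurements):.3f}")
```

In this invented data the appraisers disagree with each other more than they disagree with themselves, which would point toward training or procedure rather than the device.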
Linearity
Linearity describes variance that is introduced when the value being measured gets very large or very small relative to the range the measurement device was designed for. An example would be placing a one-pound weight on a typical bathroom scale calibrated to weigh people. Would the scale give you a measurement of one pound or not? Any difference between the error at the one-pound level and the known error at the weight of a person would be linearity variance.
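One common way to look at linearity is to measure several known reference weights across the range and see whether the error changes with the size of the weight. The sketch below fits a straight line to error versus reference value using ordinary least squares; the scale readings are invented and the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two or more paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("reference values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Invented data: known reference weights (lb) and what a bathroom scale displayed.
references = [1.0, 51.0, 101.0, 151.0, 201.0]
readings = [3.0, 52.0, 101.0, 150.0, 199.0]

errors = [reading - ref for reading, ref in zip(readings, references)]
slope, intercept = fit_line(references, errors)

print("Error at each reference weight:", errors)
print(f"Error changes by {slope:+.3f} lb per lb of reference weight")
```

A slope near zero would mean the scale is off by about the same amount everywhere; a clearly non-zero slope, as in this invented data, means its error depends on where in the range you measure.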
Stability
Stability is a measure of how repeatable and reproducible the model is over time or how accuracy and precision vary over time due to variance in the measurement device. This variance can be caused by wear in the measurement device or temperature or humidity for a physical device.
Template
Included with this article is a simplified template to get you started in a Gauge R&R study. You can adapt it to the number of appraisers, trials, and observations in your study. I also provided an example to illustrate the results and some tips on how to customize it.
Please note that the "Target agreement" and "Overall confidence levels" in the template are purely arbitrary and should be based on the level of precision you are seeking in the study and resulting dataset.
Inputs
 The scope of the MSA is where you provide a description of what, how, when, and where in the process you are measuring.
 Appraisers and trials performed are used to ensure that variance is removed from the appraiser performing the trials and number of trials taken by each appraiser.
 Count of observations must equal the number of observations taken in a given trial or null values will impact the dataset.
 Target Agreement level addresses the precision of what is being measured.
 Overall confidence level is the accuracy level you are trying to achieve.
 Observations are the actual data points gathered during a given trial.
 Trials can be specific printers, or other components in the environment. The components must be the same each time. The measurements can be taken on different days if the measurement method involves a physical device, but the same conditions should be used for each trial. It is important to have at least two trials to demonstrate stability in the MSA.
Results
Repeatability
 Count OBV – Number of observations in a given trial (must be consistent across trials)
 Count in Agreement – Out of the OBV count, how many are in agreement across the sum of the OBVs?
 Meets Target – Is the percent in agreement within the repeatability target set?
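The three repeatability values can be sketched for a single appraiser as follows. This is an interpretation rather than the template's actual formulas: an observation is counted as "in agreement" when the appraiser gave it the same rating in every trial, and the target is treated as a minimum percentage. The function name and the pass/fail ratings are hypothetical.

```python
def repeatability_agreement(trials, target_percent):
    """Agreement of one appraiser with themselves across repeated trials.

    trials: one list of ratings per trial, observations in the same order.
    """
    if len(trials) < 2:
        raise ValueError("need at least two trials")
    count_obv = len(trials[0])
    if count_obv == 0 or any(len(t) != count_obv for t in trials):
        raise ValueError("every trial needs the same, non-zero observation count")
    # An observation agrees when all trials gave it the same rating.
    count_in_agreement = sum(1 for ratings in zip(*trials) if len(set(ratings)) == 1)
    percent = 100.0 * count_in_agreement / count_obv
    return {
        "count_obv": count_obv,
        "count_in_agreement": count_in_agreement,
        "percent": percent,
        "meets_target": percent >= target_percent,
    }

# Invented ratings for five observations, rated twice by the same appraiser.
trial_1 = ["pass", "fail", "pass", "pass", "fail"]
trial_2 = ["pass", "fail", "fail", "pass", "fail"]

print(repeatability_agreement([trial_1, trial_2], target_percent=90.0))
```

The length check mirrors the requirement that the observation count be consistent across trials, since a missing value would otherwise silently distort the percentage.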
Reproducibility
The table values in this section are similar to repeatability in that they address the repeatability target set, but the difference is a comparison between each of the possible combinations of appraisers.
 Appraiser 1 vs. 2, 3, 4, and 5
 Appraiser 2 vs. 3, 4, and 5
 Appraiser 3 vs. 4 and 5
 Appraiser 4 vs. 5
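The appraiser combinations listed above are simply every unordered pair of appraisers, which is easy to generate rather than list by hand. The sketch below assumes agreement between two appraisers is the percentage of observations they rated identically; that interpretation, the function name, and the ratings are hypothetical.

```python
from itertools import combinations

def pairwise_agreement(ratings_by_appraiser):
    """Percent of observations rated identically, for every pair of appraisers."""
    lengths = {len(ratings) for ratings in ratings_by_appraiser.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("every appraiser needs the same, non-zero observation count")
    results = {}
    for first, second in combinations(sorted(ratings_by_appraiser), 2):
        pairs = zip(ratings_by_appraiser[first], ratings_by_appraiser[second])
        matches = sum(1 for a, b in pairs if a == b)
        results[(first, second)] = 100.0 * matches / len(ratings_by_appraiser[first])
    return results

# Invented ratings: five appraisers, four observations each (P = pass, F = fail).
ratings = {
    "Appraiser 1": ["P", "P", "F", "P"],
    "Appraiser 2": ["P", "P", "F", "F"],
    "Appraiser 3": ["P", "F", "F", "P"],
    "Appraiser 4": ["P", "P", "F", "P"],
    "Appraiser 5": ["F", "P", "F", "P"],
}

for (first, second), percent in pairwise_agreement(ratings).items():
    print(f"{first} vs. {second}: {percent:.0f}% in agreement")
```

With five appraisers this produces ten comparisons, matching the four rows of pairings above, and it scales without change if you add or remove appraisers.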
Summary
This article just scratches the surface of this topic. You have learned how the art and science of building a dataset come together. By now you have some knowledge of the types of variance that can exist within, or arise while creating, a simple dataset. You now know how critical the removal of variance from the dataset is. The confidence level you can demonstrate in your dataset can bolster your argument and help you influence others to accept your value proposition. If this article is successful, you will know how lies and statistics can be differentiated. You may never look at a column of numbers the same way again, and you can now question whether the data was gathered using repeatable and reproducible methods. Statistics and data can be an asset or an inconvenient truth!
Further reading
https://www.moresteam.com/toolbox/measurementsystemanalysis.cfm