Holistic Methodologies: Odd bed partners (Six Sigma and PMLC/SDLC), but Harmonious Relatives
TPM – Total Productive Maintenance
Back in the days of the Great Depression, the Hoover Dam was built; President Roosevelt, whose "New Deal" funded public works across the country, dedicated it in 1935. President Eisenhower funded the interstate highway system through the Federal-Aid Highway Act of 1956. The results of both efforts are amazing pieces of engineering and critical infrastructure in the US. But even state-of-the-art engineering requires ongoing maintenance. Today, over 65,000 bridges across the US are in disrepair, and the failure of one of these critical pieces of infrastructure often has catastrophic results even when it's not precipitated by flood or earthquake.
In the case of the Hoover Dam, an entire cement works was built on site to provide materials in a timely fashion, so a production line of sorts was implemented to drive efficiencies into the construction. For the interstate highway system, mile after mile of land had to be procured and graded, and bridges and tunnels constructed. Both projects would have benefited from a Total Productive Maintenance (TPM) solution, which would have provided a framework to build and maintain the construction infrastructure.
Total Productive Maintenance (TPM) is a broad-reaching methodology focused on manufacturing and maintenance engineering processes. Holistically, the primary objective of TPM is to increase the productivity of plant and equipment while making appropriate (small) investments to maintain productivity and health.
Similarly, our IT infrastructure needs constant maintenance to keep it running, current and patched to prevent nefarious entry. Regretfully, this is often a place where that appropriate investment is lacking, leaving teams to play catch-up to support systems that have long outlived their most productive years. Updating these systems is considered too expensive, too complex, or not important enough, resulting in an "if it ain't broke, don't fix it" mentality. In some cases there are zombie systems that just never got turned off, and the tribal knowledge of why they existed in the first place has left the building. Software development lifecycles have similar needs: tools and skills must be kept current in order to deliver quality and value to customers.
Investment in TPM supports the philosophy of making appropriate investments to stay current with infrastructure and applications. Almost weekly we hear of some government agency or Fortune 500 company being hacked or an outage impacting a broad range of customers. Getting current and staying current is also an investment in managing operational risk around the complex systems we manage.
TPM is a very broad subject, and this article can only scratch the surface. I strongly encourage you to review the "Further Reading" section, where there are links to more exhaustive articles on this topic that are specific to manufacturing.
The many years I have spent working in IT using project management and Six Sigma methods have shown me that there are many direct and strong analogs between manufacturing applications and IT, especially when dealing with operations or maintenance of legacy systems for core business customers. Many of these concepts also apply to developing products for a consumer market where technology is the product, and to the tools and development platforms used to create and ship those products.
TPM and TQM (Total Quality Management) are core operational components of an overall quality management system. TPM is built on eight pillars that we will explore in this article.
Pillars of TPM
Fundamentally, these eight pillars are the foundation of proactive planning and preventive maintenance to provide a baseline of stability, capability and performance for manufacturing processes.
Five S Foundation
At the foundation of the Pillars of TPM there are the Five S's:
- Sort -- Sort out and determine what is needed in the manufacturing area.
- Straighten -- Place items in a logical arrangement so they are easy to find and ready to use. Clearly note where they are to be stored when not in use so they can be returned.
- Shine -- Make sure that the workplace is clean and the equipment is in good working order to perform the task.
- Standardize -- Make sure that the first three S's are practiced frequently to remove any special cause variance in the manufacturing process.
- Sustain -- Maintain the rules and standards with a focus on continuous improvement.
Below are some analogs to IT. (Nothing here should be surprising for any mature IT organization.)
- Sort -- Determine what is important to the customer and what tools are needed to develop and maintain functional value.
- Straighten -- Make sure that the development or maintenance environment is sustainable and there are controls around how product capability and support is delivered/provided to the customer.
- Shine -- Manage defects and possible health issues for the application (reliability, performance, data quality and capability).
- Standardize -- Use sound software engineering practices, including naming conventions, development tools, test harnesses, and error handling within the applications.
- Sustain -- Manage the error log of defects during and after deployment as input into an enhancement list for the next release.
Pillars of TPM
Focused Improvement (Kobetsu Kaizen)
The core tenet behind Focused Improvement is to maximize the overall effectiveness of equipment, systems and processes by elimination of losses and continuous improvements in performance. An analog in IT can be how we manage technology in a data center through load balancing or identification of critical failure points in our business applications such as interfaces and feeds. Another example could be making applications more fault-tolerant through error handling or algorithms which deal with potential show-stopping exceptions.
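As a minimal sketch of the fault-tolerance idea above, the following Python example processes a feed record by record and quarantines malformed records instead of letting one bad record abort the whole batch. The record layout (`account_id,amount`) and the names `parse_amount`, `process_feed` and `BatchResult` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BatchResult:
    """Outcome of one batch run: parsed values plus rejected records."""
    loaded: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (line_number, raw_text, reason)


def parse_amount(raw: str) -> float:
    """Hypothetical record parser: expects 'account_id,amount'."""
    account_id, amount = raw.split(",")
    if not account_id.strip():
        raise ValueError("missing account id")
    return float(amount)


def process_feed(lines) -> BatchResult:
    """Process every record; quarantine bad ones instead of aborting."""
    result = BatchResult()
    for line_number, raw in enumerate(lines, start=1):
        try:
            result.loaded.append(parse_amount(raw))
        except ValueError as exc:
            # A malformed record is kept for follow-up rather than
            # causing an abnormal end to the whole run.
            result.rejected.append((line_number, raw, str(exc)))
    return result


feed = ["A100,25.50", "A101,not-a-number", ",10.00", "A102,4.25"]
outcome = process_feed(feed)
print(f"loaded={len(outcome.loaded)} rejected={len(outcome.rejected)}")
for line_number, raw, reason in outcome.rejected:
    print(f"line {line_number}: {raw!r} -> {reason}")
```

The rejected list doubles as an inspection record: it shows which records failed and why, which is the raw material for root cause analysis.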
Within this pillar there are six zero-breakdown measures (each shown here with an IT analog):
- Establish basic equipment conditions -- What are the Standard Operating conditions for usage of hardware or software?
- Comply with conditions of use -- Set up service level agreements (SLAs).
- Restore Deterioration -- Analyze breakdowns or failure points and restore to working order within SLAs and operational norms.
- Abolish environments causing accelerated deterioration -- Identify special cause issues that are causing chronic breakdowns or outages. How often has an outage happened? Was it fixed with a workaround, or fixed to prevent it from happening again? Was a true root cause established and validated?
- Correct design weaknesses -- Identify and correct known exceptions. How much of the design can be pushed back into the business process vs. fixing it through complex algorithms that may be difficult to maintain in the future? Does the design post-implementation work?
- Improve operating skills -- Ensure that the actors in the process or application users know how to work within the conditions of use. Do they receive appropriate training on new features, not only for existing actors, but also new actors to the process?
Autonomous Maintenance
This pillar primarily focuses on routine maintenance of the environment, such as lubrication and cleaning of equipment, performed by the operators rather than more in-depth maintenance performed by dedicated staff. In IT, an example could be users reporting error messages, or taking corrective action when data may cause an abnormal end to a batch program. Another example could be the Product Owner overseeing the implementation of a new feature and tracking its first few uses to provide feedback to the development team. This can also be looked upon as a part of preventive or predictive maintenance; both concepts support the first pillar of Focused Improvement.
Planned Maintenance
This pillar is obvious by its name. Most equipment and many business applications require some downtime for maintenance to be performed. It could take the shape of an enhancement or version upgrade, replacement of a power supply that is creating error messages on a server, a firmware upgrade on a network appliance to patch a security hole, a mainframe IPL during a maintenance window to make needed updates to an environment, or even a time change.
This maintenance is typically carried out by skilled or trained professionals who perform it during a planned outage and restore the environment to prior or improved performance levels. The aim is also to increase mean time between failures (MTBF), although maintenance that is performed incorrectly can itself cause failures or introduce unforeseen defects.
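To make the mean-time-between-failures goal measurable, a team could compute it from its outage records. The sketch below assumes the common definition (uptime in an observation window divided by the number of unplanned failures in that window); the function name and the sample outage timestamps are illustrative only.

```python
from datetime import datetime, timedelta


def mean_time_between_failures(window_start, window_end, outages):
    """Return MTBF as a timedelta, or None if there were no failures.

    outages: list of (start, end) datetimes for unplanned failures
    that fall inside the observation window.
    """
    if not outages:
        return None
    downtime = sum((end - start for start, end in outages), timedelta())
    uptime = (window_end - window_start) - downtime
    return uptime / len(outages)


# Illustrative data: a 30-day window with three unplanned outages.
window_start = datetime(2024, 1, 1)
window_end = datetime(2024, 1, 31)
outages = [
    (datetime(2024, 1, 5, 2), datetime(2024, 1, 5, 4)),    # 2 hours
    (datetime(2024, 1, 14, 1), datetime(2024, 1, 14, 5)),  # 4 hours
    (datetime(2024, 1, 27, 0), datetime(2024, 1, 27, 6)),  # 6 hours
]

mtbf = mean_time_between_failures(window_start, window_end, outages)
print(f"MTBF: {mtbf.total_seconds() / 3600:.1f} hours")
```

Tracking this figure before and after a maintenance window gives a simple check on whether the planned work improved stability or introduced new failures.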
Quality Maintenance
Quality maintenance deals with a concept I covered in a prior article, Poka-Yoke, by targeting quality issues with products and systems in the pursuit of reducing future defects. It focuses on fixing the problem now, before it becomes more expensive to fix later. The team looks for failure points using Failure Mode and Effects Analysis (FMEA) -- "what can go wrong and what would happen if it did?" -- to determine what preventive maintenance should be performed before an event happens. A significant tool in Quality Maintenance is inspection to seek out potential failure candidates. An example could be reviewing error logs for warnings or errors that have occurred, followed by root cause analysis to determine the failure point and implement a remediation path.
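One common way to rank FMEA findings, which the article does not spell out, is the Risk Priority Number: severity times occurrence times detection, each scored from 1 to 10. The sketch below assumes that convention; the failure modes and their scores are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    description: str
    severity: int    # 1 (negligible) .. 10 (catastrophic)
    occurrence: int  # 1 (rare) .. 10 (almost certain)
    detection: int   # 1 (always detected) .. 10 (never detected)

    @property
    def rpn(self) -> int:
        """Risk Priority Number under the assumed S x O x D convention."""
        return self.severity * self.occurrence * self.detection


def rank_failure_modes(modes):
    """Highest RPN first: the strongest preventive-maintenance candidates."""
    return sorted(modes, key=lambda mode: mode.rpn, reverse=True)


# Illustrative failure modes for a business application.
modes = [
    FailureMode("Nightly feed arrives with malformed records", 6, 7, 3),
    FailureMode("Log volume fills the disk", 8, 4, 2),
    FailureMode("Interface certificate expires unnoticed", 9, 3, 6),
]

ranked = rank_failure_modes(modes)
for mode in ranked:
    print(f"RPN {mode.rpn:>3}  {mode.description}")
```

A high detection score means the failure is hard to spot, so adding monitoring or inspection lowers the RPN even when the underlying cause cannot be removed.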
Cost Deployment
Cost deployment is a component of World Class Manufacturing (WCM). One example of its use is the Fiat Group Automobile Production System (FAPS). Fiat uses a financial model to reduce waste and optimize efficiencies in the manufacturing process. It looks holistically across the Eight Pillars from a financial perspective. Some core components of Cost Deployment include:
- Baseline total cost of Processing (Ownership)
- Identification of losses or waste -- using a matrix to map each loss to the sub-process in which it occurs, so that elimination methods can be identified within that sub-process
- Identification of the relationship of the type of loss or waste and qualitative T-shirt sizing of the loss or waste cost
- Transformation of the waste qualitative measures into quantitative costs (Capitalized and Operational components)
- Identification of the TPM pillar each loss belongs to, that is, the pillar able to control the elimination of that waste or cost
- Creation of a portfolio of projects to address the highest waste/cost candidates. Pareto and Payoff matrix tools can be used effectively with prioritization of the remediation projects.
- Implementation of continuous improvement to prevent the costs and waste from creeping back into the process.
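The prioritization step in the list above can be sketched as a simple Pareto analysis: rank the quantified losses, accumulate their share of total cost, and pick the few categories that account for most of it. The loss categories, dollar figures and the 80% threshold below are assumptions for illustration, not values from any real cost deployment.

```python
def pareto(losses, threshold=0.80):
    """Rank losses by cost and find the 'vital few'.

    losses: mapping of loss category -> annual cost.
    Returns (rows, vital) where rows is a list of
    (category, cost, cumulative_share) and vital is the shortest
    leading list of categories whose cumulative share reaches threshold.
    """
    total = sum(losses.values())
    ranked = sorted(losses.items(), key=lambda item: item[1], reverse=True)

    rows = []
    cumulative = 0.0
    for category, cost in ranked:
        cumulative += cost
        rows.append((category, cost, cumulative / total))

    vital = []
    for category, _cost, share in rows:
        vital.append(category)
        if share >= threshold:
            break
    return rows, vital


# Illustrative quantified losses (annual cost in dollars).
losses = {
    "Manual data fixes": 35_000,
    "License overlap": 5_000,
    "Rework after failed deployments": 50_000,
    "Unplanned outages": 10_000,
}

rows, vital = pareto(losses)
for category, cost, share in rows:
    print(f"{category:<34} {cost:>8,}  {share:6.1%}")
print("Project candidates:", vital)
```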
Fundamentally, Cost Deployment is a tool to identify the return on the investment in TPM improvement projects. That return is quantified by a retrospective ROI, with Control Charts used to monitor defects after the improvement. Cost Deployment can consist of soft components, when only a T-shirt size can be estimated, and hard cost components, when they can be tied to actual savings or benefits. An IT example could be in a call center or in big data processing, where even small efficiency gains can have significant payoffs when scaled up.
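Both measures mentioned above are straightforward to compute. The sketch below assumes the usual ROI formula, (benefit minus cost) divided by cost, and a c-chart for defect counts with limits at the mean plus or minus three times the square root of the mean. The weekly counts and dollar figures are made up for illustration.

```python
import math


def retrospective_roi(realized_benefit, project_cost):
    """ROI as a fraction: 0.5 means a 50% return on the investment."""
    return (realized_benefit - project_cost) / project_cost


def c_chart_limits(reference_counts):
    """Return (lower, center, upper) control limits for defect counts."""
    center = sum(reference_counts) / len(reference_counts)
    spread = 3 * math.sqrt(center)
    return max(0.0, center - spread), center, center + spread


def out_of_control(counts, limits):
    """Return (period, count) pairs that fall outside the control limits."""
    lower, _center, upper = limits
    return [
        (period, count)
        for period, count in enumerate(counts, start=1)
        if count < lower or count > upper
    ]


# Illustrative weekly defect counts from a stable reference period.
reference = [4, 6, 5, 3, 7, 5, 4, 6]
limits = c_chart_limits(reference)

# Illustrative counts observed after the improvement was deployed.
recent = [5, 4, 13, 6]
signals = out_of_control(recent, limits)

print(f"ROI: {retrospective_roi(180_000, 120_000):.0%}")
print(f"limits: lower={limits[0]:.2f} center={limits[1]:.2f} upper={limits[2]:.2f}")
print("weeks needing investigation:", signals)
```

A point outside the limits suggests a special cause worth investigating; points inside the limits are treated as ordinary variation and do not, by themselves, call for action.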
Early Equipment Management
Building on the earlier pillars, the focus of Early Equipment Management (EEM) is reducing lead time for new product development by taking best practices from equipment, tool and engineering designs already in use. Early Management also incorporates Early Product Management, where the emphasis is on the product design and delivery approach rather than the equipment or processes used to deliver the product. Overall, the focus of EEM is to address potential failure modes that could be exposed by a new vertical process or product. Think of it as proactive risk identification and response from a product and equipment standpoint.
An IT analog could be a development team taking a proactive approach by applying the lessons learned from a prior release retrospective. The next step would be to conduct a look-forward, analyzing the potential for risks and issues to recur in the next development lifecycle. The team would then implement a plan to address those risks or failure modes based upon their likelihood of occurrence and impact, with a continuous improvement goal in mind. As with all continuous improvement methods, each risk should be given an owner, and its response strategy should follow the guidelines in the team working agreement. These activities would be on top of a more traditional project risk management lifecycle.
A more specific example could be a development team that had difficulties with its bug glide towards the end of its last release, which caused it to miss its ship date. The team may take steps to analyze the types of defects that caused the slippage and tighten up the engineering disciplines to address those specific root causes, such as staffing, skillsets, code walkthroughs and practices, test case coverage, enhanced specificity around functional and technical requirements, etc.
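The look-forward described above can be captured in a small risk register. The sketch below assumes a 1-to-5 scale for both likelihood and impact and scores exposure as their product; the scale, the field names and the sample risks are assumptions rather than part of any standard the article cites.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Risk:
    description: str
    likelihood: int               # 1 (unlikely) .. 5 (almost certain)
    impact: int                   # 1 (minor) .. 5 (release-blocking)
    owner: Optional[str] = None   # every risk should end up with an owner

    @property
    def exposure(self) -> int:
        return self.likelihood * self.impact


def prioritize(risks):
    """Highest exposure first."""
    return sorted(risks, key=lambda risk: risk.exposure, reverse=True)


def unowned(risks):
    """Risks that still need an owner assigned."""
    return [risk for risk in risks if risk.owner is None]


# Illustrative risks carried forward from a release retrospective.
register = [
    Risk("Late defect surge before ship date", 4, 5, owner="QA lead"),
    Risk("Ambiguous functional requirements", 3, 4),
    Risk("Key developer unavailable", 2, 3, owner="Dev manager"),
]

for risk in prioritize(register):
    print(f"{risk.exposure:>2}  {risk.description}  (owner: {risk.owner or 'UNASSIGNED'})")
print("needs owner:", [risk.description for risk in unowned(register)])
```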
Training and Education
The focus of this pillar is to reinforce the knowledge of the actors in the manufacturing processes, such as machine operators and maintenance personnel, to use the best practices of the TPM pillars across their roles in support of a TPM holistic environment. This approach also encourages management to provide coaching and mentoring of team resources as well as facilitating the drive towards ongoing maturity of the manufacturing processes.
From an IT perspective this pillar could consist of a training plan or access to education in new industry tools and trends, for example building skillsets around cloud computing or hardening strategic business applications against cyber threats. Another example of this pillar is being a "student of the business," where the IT development team learns the business processes and how better to instantiate the functional requirements into the business application.
Safety Health Environment
This pillar may not have a clear analog when working with software development, but the analog is clearer when dealing with hardware in a datacenter or data closet where electrical and other hazards exist. Another example could be in highly regulated medical equipment or avionics software development, where software defects could create life-or-death situations. In a company that delivers global chemistry solutions, health, safety and environment are of paramount importance and should always be a focus for everyone in the company. Group meetings in these organizations often start with a "safety share." Many of the safety shares are related to everyday life situations.
Whether you are building a bridge or a dam or maintaining legacy code on a production system, there are elements of TPM that can provide benefit to the integrated development environment your product is built and maintained within.
Getting current and staying current with tools, training/knowledge and infrastructure, and understanding overall costs to maintain a safe and productive environment are just as necessary in IT as they are in the manufacturing arena. TPM, through its eight pillars and the 5S foundation, covers a broad range of topics, many of which may seem to be just common sense on the surface. The strength of TPM is creating a framework where those concepts are formalized into an executable structure. ITIL performs a similar role in the IT industry. TPM is yet another example of a holistic methodology, which has numerous analogs in the IT industry.