The Science of Maintenance
Figure 4 Mission Reliability
Question: If we dispatch 1000 heavy bombers for an 8-hour mission, what percentage will complete the mission without mechanical failure?
Beginning in World War II, the War Department sponsored a new science called Reliability. Reliability is the science of maintenance. It uses statistics and failure theory to measure, understand and improve the performance of equipment and maintenance. Reliability theory can guide engineers as they design and test new equipment. After equipment has been in service, reliability data tells the maintenance engineer how to improve its performance.
As the Gulf Wars demonstrated, this science has produced outstanding results in defense. This knowledge has found its way into industry very slowly. Most maintenance operations still operate on the principle of "if it ain't broke, don't fix it".
Reliability uses many metrics for evaluating equipment and systems. The original metric, Mission Reliability, answers the question posed in Figure 4. For industrial maintenance, the metric of Failure Rate is usually more relevant. Failure rate is the number of failures per 1000 hours of operation. It can apply to a complex system such as a machine tool or it can apply to a large number of simple components such as light bulbs. This discussion focuses on units of complex equipment.
Failures occur in one of several modes. Understanding the modes, and which mode is the likely cause of a specific failure, is important because different strategies are more or less effective against each mode. Table 1 summarizes the various failure modes and illustrates their characteristic failure rates over time.
Early life failures are caused by sub-standard components and/or improper installation. They occur frequently when the equipment is first placed in service and then rapidly decline.
When the previous failure modes are combined, the result is the "Bathtub Curve", familiar to many.
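The way separate failure modes add up to a bathtub shape can be sketched numerically. The example below is only an illustration under stated assumptions: it models each mode with a Weibull hazard function, and the three modes (early life, random, wear-out), their shape and scale parameters, and the function names are all hypothetical choices, not values taken from Table 1.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate (failures per hour) of a Weibull model at t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_rate(t):
    """Sum of three assumed failure modes; all parameters are illustrative."""
    early_life = weibull_hazard(t, shape=0.5, scale=2000.0)  # rate falls with age
    random_mode = weibull_hazard(t, shape=1.0, scale=1000.0)  # constant rate
    wear_out = weibull_hazard(t, shape=4.0, scale=8000.0)     # rate rises with age
    return early_life + random_mode + wear_out


# Print failures per 1000 hours at several equipment ages.
for hours in (10, 100, 1000, 4000, 8000, 10000):
    print(f"{hours:>6} h: {1000 * bathtub_rate(hours):.2f} failures per 1000 h")
```

With these assumed parameters the combined rate is high at first, falls to a flat bottom, and rises again late in life, which is the bathtub shape.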
With all modes combined, the failure curve is the familiar bathtub but with spikes of increased failures at irregular times. The mode behind a specific failure cannot be determined from the failure curve alone; it requires additional investigation.
Metrics help to focus efforts on the most critical equipment rather than reacting to the crisis du jour. They measure progress and help to adjust efforts accordingly. They are critical for identifying and resolving specific problems. Equipment metrics can be surprisingly simple. Only three data elements, collected for each machine and analyzed properly, are really necessary for most situations.
This discussion is about the metrics for machine performance. It does not include metrics of maintenance department productivity, budgeting or cost allocation. Such additional metrics are required to operate a maintenance department effectively.
All these metrics are most effective in graphical form. They are not very meaningful as individual numbers. Seen in the context of past and future values, however, trends, anomalies and patterns reveal themselves. The first four metrics, which are the most useful, derive from three numbers. Assuming a calculation period of one week, the following questions must be answered:
Calculations and tracking can be further simplified by assuming that each machine is scheduled for about the same production (say 40 or 80 hours) and simply using one week as the time bucket.
Failure rate is one of the simplest and most useful metrics for machine performance. It can be approximated by using a week, month or other convenient period in place of actual operating hours. If data is accumulated on a (say) weekly basis, the only input is the number of breakdowns during that week.
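As a minimal sketch of this calculation, the example below converts weekly breakdown counts into failures per 1000 hours. The function name, the sample breakdown counts and the 40-hour week are assumptions made for illustration.

```python
def failure_rate_per_1000h(breakdowns, operating_hours):
    """Failures per 1000 hours of operation."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    return 1000.0 * breakdowns / operating_hours


# Hypothetical log: breakdowns for one machine in four successive weeks.
weekly_breakdowns = [2, 0, 1, 3]
SCHEDULED_HOURS_PER_WEEK = 40  # assumed constant schedule

for week, count in enumerate(weekly_breakdowns, start=1):
    rate = failure_rate_per_1000h(count, SCHEDULED_HOURS_PER_WEEK)
    print(f"week {week}: {count} breakdowns, {rate:.1f} failures per 1000 h")
```

If every machine runs roughly the same schedule, the raw weekly breakdown count carries the same trend, so the conversion to hours can be skipped for plotting purposes.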
Mean-Time-Between-Failures (MTBF) is also a metric for machine performance. It is the inverse of Failure Rate and is thus calculated from the same parameters. It is a meaningful metric for long periods of time but not suitable for daily or weekly monitoring. If there are no breakdowns in a given period, the MTBF for that period is mathematically "undefined."
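The short sketch below shows why MTBF suits long periods better than weekly ones. The function name and the sample data are hypothetical, and returning None for a breakdown-free period is one possible way to represent the undefined case.

```python
def mtbf_hours(breakdowns, operating_hours):
    """Mean time between failures in hours, or None when it is undefined."""
    if breakdowns == 0:
        return None  # no failures in the period: division by zero
    return operating_hours / breakdowns


# Hypothetical log: four weeks at an assumed 40 scheduled hours each.
weekly_breakdowns = [2, 0, 1, 1]
HOURS_PER_WEEK = 40

for week, count in enumerate(weekly_breakdowns, start=1):
    print(f"week {week}: MTBF = {mtbf_hours(count, HOURS_PER_WEEK)}")

# Over the whole period the figure is well defined.
total_hours = HOURS_PER_WEEK * len(weekly_breakdowns)
print("four weeks: MTBF =", mtbf_hours(sum(weekly_breakdowns), total_hours))
```

The weekly values jump around and one of them is undefined, while the four-week value is a single stable number.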
Mean-Time-To-Repair (MTTR) is another simple yet valuable metric for industrial maintenance. It reflects both the severity of breakdowns and the efficacy of repair activities.
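The article does not spell out the formula, so the sketch below assumes the conventional definition: total downtime in a period divided by the number of breakdowns in that period. The function name and the sample figures are hypothetical.

```python
def mttr_hours(downtime_hours, breakdowns):
    """Average downtime per breakdown in hours, or None if nothing broke."""
    if breakdowns == 0:
        return None  # no repairs to average
    return downtime_hours / breakdowns


# Hypothetical week: three breakdowns costing six hours of downtime in total.
print(mttr_hours(6.0, 3))
```

A rising value suggests either more severe breakdowns or slower repairs; the number alone does not say which.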
Availability is the portion or percentage of time that equipment is available for operation. It is commonly referred to as “Uptime”. Availability is another useful metric for industrial maintenance and you will want to track it along with Failure Rate and MTTR. Availability derives from the same data collected for MTTR and Failure Rate. It is easy to calculate.
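One common way to compute it, assumed here rather than taken from the article, is scheduled time minus downtime, divided by scheduled time. The function name and the sample week are hypothetical.

```python
def availability(scheduled_hours, downtime_hours):
    """Fraction of scheduled time the equipment was available to run."""
    if scheduled_hours <= 0:
        raise ValueError("scheduled_hours must be positive")
    if not 0 <= downtime_hours <= scheduled_hours:
        raise ValueError("downtime_hours must lie between 0 and scheduled_hours")
    return (scheduled_hours - downtime_hours) / scheduled_hours


# Hypothetical week: 40 scheduled hours with 2 hours lost to breakdowns.
print(f"{availability(40, 2):.1%}")
```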
Reliability is the probability that equipment will complete a mission of length “t” without failure. It is an exponential function. Reliability has limited use for most industrial maintenance although it is important for military and other applications.
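The bomber question from Figure 4 can be worked as a small example. The sketch assumes a constant failure rate, in which case reliability is R(t) = exp(-λt) with λ in failures per hour. The failure rate used below is invented purely for illustration; the article gives no figure for the bombers.

```python
import math


def mission_reliability(failures_per_hour, mission_hours):
    """Probability of completing the mission without failure, constant-rate model."""
    if failures_per_hour < 0 or mission_hours < 0:
        raise ValueError("inputs must be non-negative")
    return math.exp(-failures_per_hour * mission_hours)


# Hypothetical aircraft: one failure per 100 flying hours on average.
ASSUMED_FAILURE_RATE = 0.01  # failures per hour (illustrative only)
r = mission_reliability(ASSUMED_FAILURE_RATE, mission_hours=8)
print(f"reliability for 8 hours: {r:.1%}")
print(f"expected completions out of 1000: {1000 * r:.0f}")
```

Under this assumed rate roughly 92% of the aircraft would be expected to finish the mission without mechanical failure. A different failure rate changes the answer, but the form of the calculation stays the same.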
Overall-Equipment-Effectiveness is a new metric that has received considerable publicity in recent years. It attempts to capture all the parameters in a single measure. However, the practical application is somewhat limited.
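For reference, OEE is conventionally computed as the product of three ratios: availability, performance and quality. The article does not give this formula, so the sketch below rests on that standard definition, and the function name and input values are hypothetical.

```python
def oee(availability, performance, quality):
    """Overall equipment effectiveness as the product of three ratios in [0, 1]."""
    for name, value in (("availability", availability),
                        ("performance", performance),
                        ("quality", quality)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return availability * performance * quality


# Hypothetical machine: 90% available, running at 95% of rated speed,
# with 99% of output passing inspection.
print(f"OEE = {oee(0.90, 0.95, 0.99):.1%}")
```

Because three factors are folded into one number, a change in OEE does not show which factor moved, which is one reason its practical use can be limited.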
SEP 2007