Threat to safe operation,
Threat to the environment,
Threat to the commercial viability of the company,
Loss of customer satisfaction,
Loss of production or failure to complete the mission,
Breach of security, and
High repair cost.
The purpose of incident management is to identify and resolve plant or human failures that cause, or increase exposure to, any of the above. Ideally, organisations would remove such risks before they materialise; in practice, however, predicting every risk and reducing each to an acceptable level is very difficult.
Many organisations have found that they can create a focussed maintenance strategy for all equipment (critical and non-critical) within 12 months by taking a review and rationalisation approach. The problem with most maintenance strategy development activities is that information is never perfect and assumptions have to be made. The maintenance strategy is therefore a living program that needs incident management to improve it as better information comes to hand.
In addition to incorrect assumptions, there are other factors that could cause unexpected equipment failure. Some of these factors are as follows:
Temporary repairs installed and not removed,
Maintenance error caused by poor training or lack of adherence to procedures,
Maintenance not being done on time,
Incorrect operation of equipment, and
Installation of damaged or faulty parts.
The process undertaken to review reliability incidents is relatively simple and quite commonplace. It follows a typical investigation cycle found in many problem-solving techniques. At a high level, the generic process that we prefer to use has the seven steps listed below:
Originate
Allocate Analysis Responsibility
Analysis and Recommendations
Approve
Implement
Review, and
Close
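As an illustration only, the seven steps can be modelled as an ordered sequence in which an incident moves forward one step at a time. This is a minimal sketch; the names `Step` and `next_step` are hypothetical and are not taken from any particular incident management system.

```python
from enum import IntEnum
from typing import Optional


class Step(IntEnum):
    """The seven generic steps, in the order listed above."""
    ORIGINATE = 1
    ALLOCATE_ANALYSIS_RESPONSIBILITY = 2
    ANALYSIS_AND_RECOMMENDATIONS = 3
    APPROVE = 4
    IMPLEMENT = 5
    REVIEW = 6
    CLOSE = 7


def next_step(step: Step) -> Optional[Step]:
    """Return the step that follows `step`, or None once the incident is closed."""
    if step is Step.CLOSE:
        return None
    return Step(step + 1)
```

Calling `next_step(Step.APPROVE)` returns `Step.IMPLEMENT`. A real system would normally also record who is responsible for each step and when it was completed.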
As this approach is common, it is not considered necessary to discuss each step. However, it is worthwhile expanding on one aspect that is unique to Reliability Assurance. When reviewing equipment failure, we recommend following a specific process flowchart, shown below in Figure 4.
The starting point [F] is any unexpected failure that has occurred in the plant. The first step is to define the failure mode or mechanism of failure. Next [Failure Analysed?], it must be determined whether this failure mode has previously been analysed using RCM / PMO logic. If it has not [N], it should be put through an RCM / PMO2000™ analysis [Apply RCM / PMO2000™]. If it has [Y], the validity of the previous review must be assessed in light of the fact that the failure has now occurred unexpectedly [Failure Prevented?]. The previous analysis may have recommended a “No Scheduled Maintenance” policy, in which case the outcome was expected and no further action need be taken unless the failure has become more of a problem than originally thought [Increasing problem?]. In that case, modifications and a revision of the RCM / PMO2000™ analysis should be undertaken based on the decreased reliability.

Figure 4: Defect Elimination flow chart
If, however, the recommendation was for preventive maintenance and the preventive maintenance has failed [System Downfall], then the source of the problem needs to be identified and rectification action taken.
Clearly, to undertake this work, the organisation needs an efficient means of retrieving the maintenance strategy for any given failure mode. This again shows the need to conduct either RCM or PMO2000™ before deploying an incident management system.
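The review logic described for Figure 4, together with the strategy lookup it depends on, can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the register keyed by equipment and failure mode, the names `Policy`, `Action` and `review_failure`, and the example equipment entries are all hypothetical.

```python
from enum import Enum
from typing import Dict, Tuple


class Policy(Enum):
    """Outcome of an earlier RCM / PMO2000 analysis for one failure mode."""
    NO_SCHEDULED_MAINTENANCE = "No Scheduled Maintenance"
    PREVENTIVE_MAINTENANCE = "Preventive maintenance"


class Action(Enum):
    """Possible outcomes of the failure review."""
    APPLY_ANALYSIS = "Apply RCM / PMO2000 analysis"
    NO_FURTHER_ACTION = "No further action"
    MODIFY_AND_REVISE = "Modify and revise the analysis"
    RECTIFY_SYSTEM_DOWNFALL = "Identify source of problem and rectify"


# Hypothetical strategy register: (equipment, failure mode) -> policy.
StrategyRegister = Dict[Tuple[str, str], Policy]


def review_failure(register: StrategyRegister, equipment: str,
                   failure_mode: str, increasing_problem: bool = False) -> Action:
    """Decide the follow-up action for an unexpected failure."""
    policy = register.get((equipment, failure_mode))
    if policy is None:
        # [Failure Analysed?] = N: the failure mode has never been analysed.
        return Action.APPLY_ANALYSIS
    if policy is Policy.NO_SCHEDULED_MAINTENANCE:
        # The failure was an accepted outcome unless it is getting worse.
        if increasing_problem:
            return Action.MODIFY_AND_REVISE
        return Action.NO_FURTHER_ACTION
    # Preventive maintenance was specified but did not prevent the failure.
    return Action.RECTIFY_SYSTEM_DOWNFALL


# Invented example data, for illustration only.
register: StrategyRegister = {
    ("Pump P-101", "bearing seizure"): Policy.PREVENTIVE_MAINTENANCE,
    ("Lamp L-7", "filament burnout"): Policy.NO_SCHEDULED_MAINTENANCE,
}
```

With this register, `review_failure(register, "Pump P-101", "bearing seizure")` returns `Action.RECTIFY_SYSTEM_DOWNFALL`, while a failure mode that is missing from the register is routed to a new analysis. The dictionary lookup stands in for whatever retrieval mechanism the organisation actually uses.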