Reliability incidents can be defined as failures of plant and equipment that lead to any kind of loss, or to increased risk to the business. In capital-intensive industries, the categories of loss vary in exposure and likelihood. Typically, however, they include the following:

Threat to safe operation,

Threat to the environment,

Threat to the commercial viability of the company,

Loss of customer satisfaction,

Loss of production or failure to complete the mission,

Breach of security, and

High repair cost.

The purpose of incident management is to identify and resolve plant or human failures that increase exposure to, or result in, any of the above. Ideally, organisations would take steps to remove such risks before they occur. In practice, however, predicting every risk and reducing each of them to an acceptable level is very difficult.

Many organisations have found that they can create a focussed maintenance strategy for all equipment (critical and non-critical) within 12 months by taking a review and rationalisation approach. The problem with most maintenance strategy development activities is that information is never perfect, so assumptions have to be made. This means that the maintenance strategy is a living program, which needs incident management to improve it as better information comes to hand.

In addition to incorrect assumptions, there are other factors that could cause unexpected equipment failure. Some of these factors are as follows:

Temporary repairs installed and not removed,

Maintenance error caused by poor training or lack of adherence to procedures,

Maintenance not being done on time,

Incorrect operation of equipment, and

Installation of damaged or faulty parts.

The process undertaken to review reliability incidents is relatively simple and quite commonplace. It follows a typical investigation cycle found in many problem-solving techniques. At a high level, the generic process that we prefer to use has seven steps, listed below:

Originate

Allocate Analysis Responsibility

Analysis and Recommendations

Approve

Implement

Review, and

Close
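The seven steps above can be sketched as a simple ordered workflow. The following is a minimal illustration only: the names `Step` and `next_step` are hypothetical, and the sketch assumes that an incident moves through the steps strictly in the listed order, which the generic process does not itself mandate.

```python
from enum import Enum
from typing import Optional


class Step(Enum):
    """The seven generic incident review steps, in the order listed."""
    ORIGINATE = 1
    ALLOCATE_ANALYSIS_RESPONSIBILITY = 2
    ANALYSIS_AND_RECOMMENDATIONS = 3
    APPROVE = 4
    IMPLEMENT = 5
    REVIEW = 6
    CLOSE = 7


def next_step(current: Step) -> Optional[Step]:
    """Return the step that follows `current`, or None once the incident is closed.

    Assumption: steps are strictly sequential (no rework loops or rejections).
    """
    if current is Step.CLOSE:
        return None
    return Step(current.value + 1)


# Walk a single incident from origination to closure.
step: Optional[Step] = Step.ORIGINATE
while step is not None:
    print(step.name)
    step = next_step(step)
```

In practice an organisation may want an incident to return to an earlier step, for example when a recommendation is not approved. That would need explicit transitions rather than the simple increment used here.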

As this approach is common, it is not considered necessary to discuss each step. However, it is worthwhile expanding on one unique aspect that pertains to Reliability Assurance. When reviewing equipment failure, there is a specific process flowchart that we recommend should be followed. The process is shown below in Figure 4.

The starting point [F] is any unexpected failure that has occurred in the plant. The first step is to define the failure mode or mechanism of failure. Following this, it needs to be determined whether this failure mode has been analysed previously using RCM / PMO logic [Failure Analysed?]. If it has not [N], then it should be put through an RCM / PMO2000™ analysis [Apply RCM / PMO2000™]. If it has been reviewed [Y], then the validity of the previous review needs to be assessed against the fact that the failure has now occurred unexpectedly [Failure Prevented?]. The previous analysis may have recommended a “No Scheduled Maintenance” policy, in which case the outcome was expected and no further action need be taken unless the failure has now become more of a problem than originally thought [Increasing problem?]. In that case, modifications and a revision of the RCM / PMO2000™ analysis should be undertaken based on the decreased reliability.

Figure 4: Defect Elimination flow chart

If, however, the recommendation was for preventive maintenance and the preventive maintenance has failed [System Downfall], then the source of the problem needs to be identified and rectification action taken.
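The decision logic described for the flowchart can also be expressed in a few lines of code. This is a sketch under stated assumptions, not a transcription of Figure 4: the names `Policy`, `Action` and `review_failure` are hypothetical, a previous policy of `None` is used to mean that the failure mode has never been analysed, and any failure that occurs under a preventive maintenance policy is treated as a system downfall.

```python
from enum import Enum
from typing import Optional


class Policy(Enum):
    """Outcome of a previous RCM / PMO analysis for a failure mode."""
    NO_SCHEDULED_MAINTENANCE = "no scheduled maintenance"
    PREVENTIVE_MAINTENANCE = "preventive maintenance"


class Action(Enum):
    """What to do after an unexpected failure has been reviewed."""
    APPLY_ANALYSIS = "put the failure mode through RCM / PMO analysis"
    NO_FURTHER_ACTION = "failure was expected; no further action"
    MODIFY_AND_REVISE = "modify and revise the previous analysis"
    RECTIFY_SYSTEM_DOWNFALL = "find why preventive maintenance failed and rectify"


def review_failure(previous_policy: Optional[Policy],
                   increasing_problem: bool = False) -> Action:
    """Choose the follow-up action for one unexpected failure.

    previous_policy is None when the failure mode has not been analysed
    before (an assumption of this sketch, not a convention of the method).
    """
    # [Failure Analysed?] -> N
    if previous_policy is None:
        return Action.APPLY_ANALYSIS

    # Failure was allowed to happen by design ("No Scheduled Maintenance").
    if previous_policy is Policy.NO_SCHEDULED_MAINTENANCE:
        # [Increasing problem?]
        if increasing_problem:
            return Action.MODIFY_AND_REVISE
        return Action.NO_FURTHER_ACTION

    # Preventive maintenance was specified but did not prevent the failure.
    return Action.RECTIFY_SYSTEM_DOWNFALL


print(review_failure(None).value)
print(review_failure(Policy.NO_SCHEDULED_MAINTENANCE, increasing_problem=True).value)
```

The function only selects a branch. Identifying the actual source of a system downfall (for example a missed task, a poor procedure or a faulty part) remains an investigation carried out by people.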

Clearly, to undertake this work, the organisation needs an efficient means of retrieving the maintenance strategy for any given failure mode. Once again, this shows the need to conduct either RCM or PMO2000™ before deploying an incident management system.