Publications, events and an end of year message...

2007 has been very busy for the Modern Asset Management Blog.

At the time of writing we are regularly publishing articles and papers for a range of trade publications throughout the world, including:

Plant Services.com - where we publish regular monthly columns on Effective Asset Maintenance.

Reliability Web.com - where we publish articles, papers and information pieces on an irregular basis. We are working with ReliabilityWeb to see if there can be a regular column produced for this site also.

Utility Week - The reigning utility industry information source for managers, directors and senior asset engineering staff in particular. We contributed the lead piece for UW's October Asset Management issue, and we contribute regularly to their smaller magazine, Utility Engineering.

Maintenance and Asset Management Journal - A UK-based publication for the maintenance and asset management communities, where we publish articles, interest pieces and other information from time to time.

Other news:

In other news, I have recently been appointed by the steering committee of UK Water Industry Research Limited as an international peer reviewer for their ongoing project "Capital Maintenance Assessment Process RG/05/A-214". This project is focused on producing an assessment framework for evaluating asset management plans in the submission process of 2009, and has implications for the future of the UK water industry.

This is a great honor for me personally and I hope to be able to add significant value to the process and ultimate outcome of this important project.

A New Book:

I have recently reached agreement with my publisher, Industrial Press, to produce a second book for them. My first book for this publisher was The Maintenance Scorecard, which has enjoyed moderate success as a guide to implementing balanced scorecard style programs in asset-intensive industries.

The second book is tentatively titled Asset Resource Planning: lean strategies for efficient asset maintenance. I am very excited about this book and we are planning for it to be published in April of next year. It is another great honor for me to be working with this publisher again, and I look forward to continuing to work with them for many more years to come.

Speaking Engagements:

After many years of trying I will finally be speaking at one of the many asset management conferences that occur during the year. RCM-2007, run by Reliability Web.com, is the premier event globally for reliability practitioners, particularly those engaged in the development and implementation of asset maintenance regimes. I will be delivering a keynote on Thursday or Friday morning (I don't recall at present) called "Beyond RCM", a look at shaking up some of the paradigms and thinking that we have traditionally associated with implementing and "selling" RCM services.

I will also be presenting a half-day seminar at the concurrent conference EAM-2007 on implementing CMMS systems. I am looking forward to this event as I began my career working with EAM and CMMS systems and I particularly enjoy this theme.

I have known Terry for many years now but have never had the chance to meet him personally. I am grateful for this opportunity and appreciate his continuing commitment to the growth of the managerial discipline of asset management.

I have been chosen to present at the annual British Institute of Facilities Management conference, to be held at Keble College, Oxford, UK in March of 2007. This is a very prestigious event, and it is quite an honor to be accepted as a speaker (similar to the RCM-2007 event). I will be presenting a piece on 4th generation asset management concepts, and how these are able to assist facilities managers to add value to their physical asset base.

Some other conferences with which I am regularly in contact, but which are yet to be confirmed for 2007, include:

I have a lot more planned for 2007, and I hope that you will check back on the Modern Asset Management Blog frequently to catch up on new postings. Also, please feel free to use one of the links on the side to add the blog to your newsreader or My Yahoo.

I wish you all the very best for Christmas of 2006. I hope the festive season is as joyous for all of you as the one I am planning for my own family, and I wish you all a prosperous and happy 2007.

Best regards,

Daryl Mather

When is predictive maintenance a burden rather than a benefit?

A monthly column by Daryl Mather, author of “The Maintenance Scorecard”. First published by www.PlantServices.com

Even the most disciplined maintenance regimes can actually be increasing the lifecycle costs of machinery. At first glance this seems counter-intuitive, doesn’t it?

Yet as a managerial discipline we seem to be aware of the detrimental effects of over-maintenance. Messing with things that are working fine, without any reason to do so, is a good way to introduce human error, reduce uptime and increase the costs of maintenance. This effect of higher activity costs for reduced performance is symptomatic of the mainly time-based maintenance thinking that most of us moved away from over the last two decades.

But where have we moved to? Today there is a range of techniques, technologies and methods that are shaping asset maintenance. However, more than anything else, most of us moved to predictive maintenance technologies.

The basic thinking behind predictive maintenance technologies is this: if we can predict failures with undesirable consequences in time, then we can plan the corrective action and prevent costly, or dangerous, incidents from occurring. So far so good! And if we can do this without needing to stop the machinery, pull it to pieces and reassemble it again, then we are killing two birds with one stone: avoiding consequences and increasing uptime. Even better!

In a managerial discipline where most commentators are intent on evangelizing about the latest technology, it is sometimes hard to tell people that if predictive technologies are misapplied they can cost us money, sometimes more than if we had never maintained the item at all.

Predictive maintenance works by detecting early signs of physical degradation of assets. Any system left to its own devices tends to move from order to disorder, its energy tending to be transformed into lower levels of availability, until it reaches the point of complete randomness or unavailability to do work. This is the second law of thermodynamics, and is the scientific basis for maintenance.

In practical terms degradation can be better understood by looking at an item such as a bearing. Bearing failures are due mainly to metal fatigue. A good example of metal fatigue is the effect of bending a paper clip over and over until it breaks. The metal within the bearing races, balls and cage will eventually become fatigued, until finally it begins to crack.

This is when the first signs of physical degradation begin to appear, most notably in the form of vibration. Depending on the severity of the crack, these signs may be immediately detectable by most devices on the market, but more than likely it will take some time before they are detectable.

Once they can be detected, we can start to make judgments on how long the bearing is likely to last before we experience a functional failure: the point where it no longer does what we require of it, regardless of whether it is still working or not.

So in this case we can use vibration analysis to warn us that a functional failure is going to occur, knowledge that we can use to plan the corrective action, replacement of the bearing, in a way that avoids or reduces the consequences of failure.

But doesn’t it just raise the question: how did the metal become fatigued in the first place? What are the failure causes that led to this situation, and could these have been avoided? After drilling down into deeper levels of causality it suddenly becomes clear that we could be treating the symptom and not the cause of failure.

Bearings are quite complex items with a myriad of potential failure causes. Some of the more common causes of early failure include:
  • Misalignment between a pump and a motor, or imbalance of the rotating element itself. All of these lead to vibration, uneven stresses, and additional load on certain parts of the bearing. This in turn speeds the process of fatigue.
  • Axial thrust on the bearing. Pushing of the shaft sideways, rather than spinning around as per normal. Common where there are foreign objects passing through a pump for example.
  • Over greasing of the bearing is a commonly quoted failure mode. It causes the grease to overheat, which reduces lubricant viscosity, weakens the races, balls and cage, and increases wear.
  • The load being too far away from the bearing itself, such as with the impeller of a mixing tank, or in some cases where there are extremely long shafts between a motor and a pump. These were once commonplace in water and wastewater pumping stations.
  • And of course, a bad bearing or poor installation of a bearing. Poor installation is a training and quality control issue, one that is easily rectified. But poor quality of bearings is something that I have noticed becoming more commonplace.

There are obviously many other potential causes of failure, but I have chosen these because they are all avoidable through changes to operating practices, small design changes, or more effective maintenance regimes, thus eliminating many of the causes of early life failure.

In these situations, even if we are able to predict the end of life component failure, the bearing will still fail before time; we will have an unnecessary downtime period, and will have to spend the money for a new bearing earlier than we should have. If the reason for this is chronic, something that will repeat itself, then we will just be installing a new bearing back into this short life cycle. Over greasing and poor alignment practices are good examples of this.

This goes to the heart of one of the most common problems when we are developing a failure management program: managing the asset rather than managing the failure modes. Predictive technology is being used in this case to paper over deeper issues, and without further analysis it could even look like a success, yet the result is actually reducing our cost effectiveness!

It has been my experience that to create a truly effective predictive maintenance program, one that delivers minimum lifecycle costs for a given level of performance and risk, one of the first steps is to identify all of the likely causes of failure at the correct level of detail.

The challenge for your reliability analysts and technicians is to know when they have analyzed the failure in enough detail, and to realize when they are starting to veer into paralysis by analysis. Once the reasonably likely failure causes have all been identified, we can put in place the failure management strategies. Among these will be changes to operating procedures, quality control procedures, asset designs and configurations, and maintenance strategies, including the correct application of predictive technologies.

Quick wins from operational changes

A monthly column by Daryl Mather, author of “The Maintenance Scorecard”. First published by www.PlantServices.com

As most of us are aware, maintenance is about the cost effective management of assets for a given level of performance and risk. Part of this equation requires failure management policies for equipment that provide the best value for each operational maintenance dollar, a reduction of the risk of critical failures to a tolerable level, and sustainable levels of production or operation.

If only it were that simple! The sad fact is that many of the failures that maintainers are required to manage are not under their control. Fundamentally, and discounting failures caused by human error, there are three ways that equipment fails:

  1. Failures caused by wear out of components or degradation due to either use, time or the environment they work in. Corrosion, erosion, wear, chemical breakdown and structural degradation are among the causes of these types of equipment stoppages.
  2. Failures caused by operating equipment out of limits, operating in a way that increases the probability of failure, or with incorrect feed materials. Equipment overload, lack of knowledge of operational parameters, and changes in equipment demands are the chief reasons for failure in this area.
  3. Failures caused by inadequate or unsuitable components. These are caused by assets not being fit for purpose. Poor purchasing practices, inadequate asset selection practices, and poor initial design are the chief reasons for these types of equipment stoppages.

Of all these failures it is often those in the first point that can be directly influenced by the maintenance department of most companies, yet the department is also required to manage and react to the failures caused by operations, purchasing and/or asset design. This highlights one of the regular comments of a friend of mine, Mr. Steve Turner: “Maintenance is a process, not a department”. Maintenance managers need to be the plant’s diplomats, stitching together agreements on operational and purchasing strategies to ensure cost effective performance.

One area where there are some quick wins available for reducing operationally induced failures is equipment rotation. Over the years duty and stand-by have become almost redundant terms in many plants, and have been replaced by a 50/50 split of operational running.

The thinking behind this appears sound on the surface. It is most common with pumping systems, but can affect any duty/stand-by asset configuration. By regularly changing the pump (or whatever) that is operational at any one time the belief is that the company is getting two advantages:

  • It is ensuring that the stand-by pump (say) is going to be ready for operation if called upon, and
  • Ensuring a longer mean time between failures by interchanging the two pumps, thus gaining the combined life out of the installation rather than running one pump only.

These are intoxicating arguments, and as stated above, they do appear sound on the surface.

However, they are both wrong and misleading!

If we take the example of pumps, the most rigorous and stress inducing time that they go through is that of start up. This places significant strain on the pump in various areas. Most notably seals and bearings are prone to failures due to frequent starts. So by stopping and starting pumps regularly we are often increasing their probability of failure rather than reducing it.

If we are going to use a true duty/stand-by arrangement then the pumps can be running continuously for long periods of time without having to go through the start up process. In runs of shorter time frames, only one pump at a time is exposed to the regular stresses of start-up, while the other remains dormant and tested infrequently (but regularly) to ensure reliability.

So, higher reliability through frequent starts is not a valid argument in many cases. What about the second argument, that of increased time between failures of the process due to interchanging the pumps?

Pumps, to continue with the example, contain components that wear out at a rate dependent on what is being pumped, the rate required, and the design of the pump impellers and housing in particular. So, if they are pumping the same media, at the same head pressure and rate, then they will wear out at roughly the same rate.

If we discount the effects of frequent start up then this will allow us to swap pumps and extend the overall time between failures. But then what? The fundamental reason for having a stand-by unit is that we do not want to run the risk of losing the function of the process, whatever it is. Stand-by units are effectively an insurance policy against the failure of the prime or duty unit.

Instead of operating the assets in a way that will decrease the risk of failure, this strategy will in fact increase the likelihood that when one pump fails due to age-related wear, the other one will fail at around the same time, almost guaranteeing a loss of function instead of protecting it! When combined, these two effects actually call into question the whole strategy of having a standby pump in the first place! It would be cheaper to have just one and to lose the function when it fails.
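To put rough numbers on the wear-out argument, here is a minimal sketch. It assumes two identical pumps, only one running at any time, wear that is purely proportional to running hours, and no start-up stress. The function name and the 20,000 hour wear life are hypothetical values chosen for illustration only.

```python
def hours_at_first_wearout(life_hours: float, duty_share: float) -> tuple[float, float]:
    """Return (calendar hours until the busier pump wears out,
    running hours accumulated by the other pump at that moment).

    Assumes wear is proportional to running hours only (an illustrative
    simplification, not a model of any real pump).
    """
    if not 0.5 <= duty_share < 1.0:
        raise ValueError("duty_share must be in [0.5, 1.0)")
    calendar_hours = life_hours / duty_share
    other_pump_hours = calendar_hours * (1.0 - duty_share)
    return calendar_hours, other_pump_hours


LIFE = 20_000.0  # hypothetical wear-out life in running hours

for share in (0.5, 0.9):
    calendar, other = hours_at_first_wearout(LIFE, share)
    remaining = 1.0 - other / LIFE
    print(f"{share:.0%} duty: first wear-out after {calendar:,.0f} h, "
          f"other pump has {remaining:.0%} of its wear life left")
```

Under these assumptions the 50/50 split brings both pumps to the end of their wear life at the same moment, while with a 90/10 split the standby unit still has roughly 89% of its wear life in hand when the duty pump wears out.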

In the vast majority of cases it is hard to justify anything less than a 90/10 split of operational time. In fact, even this could be reduced without any adverse effects at all.

All we want to do is to prove that the standby unit will work when needed. Where safety is not an issue, the frequency of testing is determined by weighing the cost of the failure and the likelihood of it occurring against the cost of doing the task itself, in this case the functional test. Because the consequences of failure are purely economic, there will come a point where the risk of failure is less than the cost of doing the task at the required frequency.

The failure we are trying to avoid is the one that will lead to total system failure. This would be a failure of the stand-by asset, and in this case could be caused by flat spots in bearings, hardening of whatever is being pumped, or some other reason.
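The economic balance described above can be sketched with a simple cost model. This is one possible formulation rather than a prescribed method. It assumes the standby unit fails at a constant hidden rate, that its average unavailability between tests is approximately the test interval divided by twice its mean time between failures (reasonable only when the interval is much shorter than that mean time), and that a loss of function occurs only when the duty unit fails while the standby is unavailable. All function names and figures are hypothetical.

```python
import math


def annual_cost(interval_years: float, test_cost: float, failure_cost: float,
                standby_mtbf_years: float, duty_mtbf_years: float) -> float:
    """Expected yearly cost of testing plus expected yearly cost of lost function."""
    testing = test_cost / interval_years
    # Approximate average unavailability of the standby unit between tests.
    unavailability = interval_years / (2.0 * standby_mtbf_years)
    demands_per_year = 1.0 / duty_mtbf_years
    return testing + unavailability * demands_per_year * failure_cost


def cheapest_interval(test_cost: float, failure_cost: float,
                      standby_mtbf_years: float, duty_mtbf_years: float) -> float:
    """Interval that minimises annual_cost (set its derivative to zero)."""
    return math.sqrt(2.0 * standby_mtbf_years * duty_mtbf_years
                     * test_cost / failure_cost)


# Hypothetical figures: a 50 currency unit functional test, a 100,000 loss of
# function, standby failing once in 10 years, duty unit once in 4 years.
args = (50.0, 100_000.0, 10.0, 4.0)
best = cheapest_interval(*args)
print(f"cheapest test interval: {best:.2f} years "
      f"(annual cost {annual_cost(best, *args):,.0f})")
for interval in (0.1, 0.4):
    print(f"interval {interval} years: annual cost {annual_cost(interval, *args):,.0f}")
```

Testing more often than the cheapest interval spends more on the task than it saves in risk; testing less often does the reverse. Where the product itself sets a physical limit, the shorter of the two intervals would govern.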

A recent case that I reviewed included the lime dosing pumps in a water treatment plant. The configuration was three duty units and one standby pumping unit. One of the factors we needed to take into account was the fact that lime hardens over time, so if the standby pump was left for too long it would never start as it would have set solid! As the cost of failure was too high, the frequency of the functional test was based on not allowing the lime to harden.

Other similar cases include a factory where we pumped milk, which required a system clean out on changeover, and a gold processing plant. In the gold processing plant what we had to avoid was the settling out of the gold in the slurry, which represented a loss of profit, even though small.

Risky business: Problem areas uncovered by the transition to risk based asset management

An article by Daryl Mather, author of “The Maintenance Scorecard”, first published in ASSETS, the magazine of the Institute of Asset Management in the UK.

Within the past few years we have seen an institutional shift in the way that asset-intensive industries manage their physical asset base in the United Kingdom. On the 2nd of February 2006, Ofwat, the water industry regulator, issued letter MD212 to Managing Directors in that industry. This letter spoke of the progress of the Common Framework, an initiative of the United Kingdom Water Industry Research (UKWIR), and referred to its evolving role as a framework to guide capital maintenance planning.

On the 14th of July 2005 Ofgem, the regulator of the energy utilities, released a letter titled “Refocusing Ofgem's Asset Risk Management (ARM) Activity”, which referred to a voluntary comparison process against the principles contained within PAS 55 as a tool that “promotes requirements, which allows operators to demonstrate effective asset management”.

Recent history has also included a report commissioned by the Office of the PPP Arbiter (OPPPA) to review good practice in Asset Management evaluation and to draw on this to develop an Asset Management Evaluation Framework, using PAS 55 as one of the key evaluating tools. PAS 55 has also been used in recent signalling management evaluations prepared for the Office of Rail Regulation.

Regardless of the nuances between the various benchmark tools, and the differing approaches in each of them, it is now obvious that there has been a fundamental shift within asset-intensive industry towards a risk based approach to managing physical assets. There are several key differences between today’s modern risk management approach, and previous approaches. For example, today consequence is as big an element of asset-condition management as probability is, an approach focused on risk rather than likelihood of failure only.

This is obviously a welcome transition for those of us working in the field of modern asset management, and provides the managerial discipline with a strong basis for moving towards even greater economic and risk management efficiencies. However, it also presents us with some unique problems.

Modelling future asset performance requires us to have a good grasp of the two fundamental elements of risk, those of the consequence and probability of failure. Regardless of the method used, consequence can be determined in a relatively straightforward way. There are ongoing debates regarding how this is done, and how to make it relative, but these are details only. The underlying concept is widely understood and able to be applied, albeit with some pain along the way.

Where things become significantly more difficult is in the drive to model the probability of failure. The underlying theories of maintenance and that of reliability are based on the theory of probability and on the properties of distribution functions that have been found to occur frequently, and to play a role in the prediction of survival characteristics. This requires the input of a range of variables including condition, usage and most importantly, failure data itself.

Resnikov, in his early work in the field of reliability, made the statement that historical analyses of data are rarely successful. While this has changed a little since this statement was first made, it still captures the challenge of modern asset management.

Defining critical is often contentious, so for the sake of this paper critical failures will be those that cause the asset to perform at less than acceptable levels of performance.

Non-critical failures are those of low or negligible cost consequences only. These are acceptable and can be allowed to occur. Therefore a policy that focuses on data capture and later analysis as its base can be used effectively. Over time the level of information will accumulate to allow asset owners, and policy designers, to determine the correct maintenance policy with a high degree of confidence.

Critical failures are, by their very nature, serious. When they occur they are often designed out, a replacement asset is installed, or some other initiative is put in place to ensure that they don’t recur. As a result the volume of data available for analysis is often small; therefore the ability of statistical analysis to deliver results with a high level of confidence is questionable at best.

It has been the experience of the author that on commencing reliability initiatives most companies do so with a conservative estimate of 30% empirical data and 70% end-user knowledge. While this still leads to improvement, it is far from the high confidence risk based decisions required in today’s asset management environment, particularly given the scale of the economic impact of getting it wrong, or where getting it wrong could significantly impact upon safety.

This is the central point of contention relating to risk modelling. Companies, by themselves, rarely have the level of failure data required to perform accurate probabilistic analyses. Even if their failure capture technologies and processes are able to deliver the right quality of failure data (and many organisations have overcome this hurdle), they need to have a large number of asset failures before they can produce high confidence probability models.

It can be said that one of our goals as asset managers, either through operational or capital asset maintenance, is to reduce the number of critical failures. Therefore part of our goal is to reduce the level of failure information that is available for analysis, not increase it!
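To give a feel for how many failures "a large number" can mean, here is a minimal sketch under one simplifying assumption: that times to failure are independent and exponentially distributed (a constant failure rate). In that case the standard error of an estimated mean time between failures, relative to its true value, is one over the square root of the number of observed failures. The function names are hypothetical.

```python
import math


def relative_standard_error(failure_count: int) -> float:
    """Relative standard error of a mean-life estimate from n exponential lifetimes."""
    if failure_count < 1:
        raise ValueError("at least one observed failure is required")
    return 1.0 / math.sqrt(failure_count)


def failures_needed(target_relative_error: float) -> int:
    """Smallest number of failures giving at most the target relative standard error."""
    if target_relative_error <= 0.0:
        raise ValueError("target_relative_error must be positive")
    return math.ceil(1.0 / target_relative_error ** 2)


for n in (4, 25, 100):
    print(f"{n:>3} failures -> relative standard error {relative_standard_error(n):.0%}")
```

Under this assumption, halving the uncertainty takes four times as many failures, which is exactly the kind of evidence a company is working to avoid accumulating for its critical assets.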

For simple assets where there is a dominant cause of failure such as erosion, corrosion, evaporation or oxidization, techniques such as age exploration, inspection and usage monitoring can be put into practice. Modern technology has made this relatively applicable and economical. However, where assets are affected by random failures, subject to human error or unable to be gauged through standard asset monitoring techniques, then asset failure data is a critical element of high confidence decision making.

It is one thing to predict the failure of, say, a transformer based upon measurable indicators of the onset of failure. It is another thing entirely to be able to accurately forecast the most likely failure rates of a failure mode known to be random.

This is slightly alarmist. There are modern methods of taking decisions with small samples of dubious quality, as opposed to “crashing a few more assets”. Random number generation methods, sampling, and other mathematical procedures go some of the way to bridging the gap between what we have and what we need. Human error forecasting methods such as HEART and THERP also contribute to a more accurate model. Yet, truly high confidence decisions require us to base our judgment on real historical data.
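As one illustration of the resampling methods mentioned above, here is a minimal sketch of a percentile bootstrap on a small sample. It is offered as an example of the general family of techniques, not as the specific procedure used in any of the cited work. The five times to failure are invented for illustration, and the function name and default parameters are hypothetical.

```python
import random
import statistics


def bootstrap_mean_interval(samples, resamples=5000, confidence=0.90, seed=0):
    """Percentile bootstrap interval for the mean of a small sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(samples)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = means[int(tail * resamples)]
    upper = means[int((1.0 - tail) * resamples) - 1]
    return lower, upper


# Invented times to failure, in months, for a single failure mode.
months_to_failure = [14, 31, 9, 52, 27]

low, high = bootstrap_mean_interval(months_to_failure)
print(f"sample mean: {statistics.fmean(months_to_failure):.1f} months")
print(f"90% bootstrap interval for the mean: {low:.1f} to {high:.1f} months")
```

With only five data points the interval is wide, and the method can never see beyond the values actually observed, which is why real historical data remains the basis for truly high confidence decisions.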

So the scope and size of the challenge before us is clear; one company alone generally will not have the quantity of failure data required to be able to take high-confidence decisions regarding asset management without having experienced significant unacceptable events. The future in risk based asset management in the medium term will focus on the hunt for relevant quality data, produced by assets operating in similar conditions and of comparable designs.

Collaborative efforts to do this are just beginning in some industries, mature in others, and not even contemplated in yet other industries. If companies want to quicken the journey to competitive advantage, then finding a way to capture, mine, and apply failure and performance data from as yet unexploited collaborative databases will need to be a key strategy in their drive towards high confidence risk based decisions.

Good luck!

Daryl Mather has assisted companies to increase the profitability of their physical asset base in over 23 countries across the USA, Europe, Asia and Latin America, and is the author of several books on the subject, including “The Maintenance Scorecard”.

Bibliography

  • Mathematical Aspects of Reliability-centered Maintenance, H. L. Resnikov, National Technical Information Service, US Department of Commerce, Springfield
  • Captured by Data, Daryl Mather, © 2006
  • MD212, 2nd of February 2006, Ofwat, www.ofwat.gov.uk
  • Refocusing Ofgem's Asset Risk Management (ARM) Activity, 14th of July 2005, Ofgem, www.ofgem.gov.uk
  • Asset Management Evaluation Report, © Lloyds Register, prepared for the Office of the PPP Arbiter, 2005
  • Independent Assessment of SICA using PAS 55 as a guide, © Lloyd's Register Rail, prepared for the Office of Rail Regulation, July 2005