Surely few people would dispute that being proactive is better than being reactive. Certainly, intuition says it should be true. Similarly, we all know the classic ‘5 x Ps’ phrase ‘Proper planning prevents poor performance’ (perhaps with minor variations!). So why, when it comes to managing IT, are we so much better at managing reactively than proactively?
At Infrassistance, we have conducted numerous process maturity accelerator analyses over the years and, almost without exception, one of the most mature service management processes in the ITIL set is the only one that is purely reactive: incident management. The image below represents the maturity ranges for the processes we are most often asked to analyse.
Not surprisingly, the least mature processes are those that can contribute most to a proactive approach:
- Availability and capacity management are key service design processes intended to ensure the successful operation of a service when it transitions from the development to the live environment.
- Service asset and configuration management critically underpins pretty much every other process and is essential to effective and proactive change management. It is our experience that as many as 70% of all incidents are caused by poorly controlled change, primarily because of the inability to conduct an effective impact assessment informed by the relationships mapped into the configuration management system.
- Problem management in most organisations is still undertaken primarily in reaction to a major incident, yet proactive problem management can be highly effective at eliminating the root cause of future incidents. One of our clients, within a year of appointing a problem manager and introducing formal ITIL processes together with a new service management tool, reduced the occurrence of major incidents from an average of one a day to one a month: a remarkable twenty-fold reduction.
- Release and deployment (R&D) management is often not even recognised as a formal process and yet it is the gatekeeper of the live environment and therefore instrumental in protecting it. As well as testing, R&D is responsible for maintaining the release policy covering, amongst other aspects:
– agreeing and maintaining release schedules
– defining release naming and numbering conventions
– defining roles and responsibilities
– ensuring users are informed about releases through the release notes and suitably trained
– handing over the known error database to the support teams
– managing the licensing arrangements with suppliers
So, it should be pretty obvious why proactive is better than reactive! The question is: why are we not better at being proactive?
In our experience, there are three main factors that inhibit our ability to be more proactive, each of which can absolutely be managed out.
- Unrealistic deadlines
There is one universal truth about development: at some point, and often on more than one occasion, the specification for the new or changed design will itself change. The sad truth is that, unlike a managed service provider, an internal IT department will often fail to renegotiate the time and budget needed for the changed specification and end up being blamed for being late and over budget! And when the originally scheduled release date approaches and testing is incomplete, the organisation is too committed to the schedule to permit a delay. The obvious consequence is that, in order to meet the deadline, the service often goes live without key functionality or performance built into the solution.
- The ‘Hero culture.’ The Western world loves a hero. In the second part of the last century, a certain Paul Neal ‘Red’ Adair gained global recognition for being able to cap oil well ‘blow-outs.’ In modern times, we might think of the firefighter emerging from the flaming building with a child in their arms. But if we keep rewarding heroes who by definition deal with the fallout of failures, we are unlikely to encourage those who prevent the failure from even occurring. For instance, who knows the name of the person who found a way to stop oil wells exploding? In our modern example, the hero would have been the person who refused to approve the building of a tower block with flammable wall cladding, a central staircase with flammable carpet and no in-built sprinkler system.
- We don’t see the cost of ‘getting it wrong’ and therefore the value of investing in getting it right first time. In the IT world, the cost of an incident can be broken down into four components:
– the cost of the internal re-work
– the value of the lost user time
– the business impact
– the reputational damage to the organisation.
The first is easy to calculate but almost no-one does. Simply ask anyone who can be assigned an incident from the service desk to estimate the percentage of their time they spend on managing incidents. Despite this not being their primary role and often not even appearing in their job description, the value we most often find is 75-85%. To calculate the cost of this, use this simple equation:
% of time handling incidents x number of people x fully-loaded employment cost per person
By way of an example, for an IT team with 40 incident assignees that spend on average 75% of their time fixing incidents at a fully-loaded employment cost of £50k per annum, this cost is 75% x 40 x £50k = £1.5 million per annum spent on managing incidents!
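The same calculation can be sketched in a few lines of Python. This is only an illustration of the equation above; the function and parameter names are our own assumptions, and the inputs are the example figures from this article rather than data from any particular organisation.

```python
def annual_rework_cost(share_of_time: float, assignees: int,
                       cost_per_person: float) -> float:
    """Annual cost of staff time spent handling incidents.

    share_of_time   -- fraction of working time spent on incidents (0.0 to 1.0)
    assignees       -- number of people who can be assigned incidents
    cost_per_person -- fully-loaded annual employment cost per person
    """
    if not 0.0 <= share_of_time <= 1.0:
        raise ValueError("share_of_time must be between 0 and 1")
    return share_of_time * assignees * cost_per_person


# Example figures from the text: 75% of time, 40 assignees, £50k each.
cost = annual_rework_cost(0.75, 40, 50_000)
print(f"£{cost:,.0f} per annum")
```

Re-running the function with the lower and upper ends of your own estimated range gives a quick sense of how sensitive the total is to the percentage people report.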
The second cost, lost user time, is again hardly ever calculated but is relatively simple to work out. However, to do so, you need to record on the incident record the number of users affected by the incident. Most organisations simply say an incident affecting more than 5 or 10 users is high priority. To calculate this cost:
Number of users affected x service outage duration (in hours) x fully-loaded employment cost per user per hour
For example, a service outage of 2 hours affecting 100 users at a fully-loaded cost of £40k per user per annum (about £20 per hour, assuming roughly 2,000 working hours a year) equals approximately £4k. Calculate this for all incidents affecting multiple users (usually P1s and P2s) over a three-month period and gross it up for a year. For one client of ours, this equated to £10 million per annum!
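As a rough sketch, the lost user time calculation and the grossing-up step might look like this in Python. The `Incident` record, the function names and the figure of 2,000 working hours per year are all our own assumptions for illustration; the annual cost has to be converted to an hourly rate before it can be multiplied by an outage duration, and the result scales directly with whichever hours figure you choose.

```python
from dataclasses import dataclass

# Assumption: roughly 2,000 working hours per user per year.
WORKING_HOURS_PER_YEAR = 2_000


@dataclass
class Incident:
    users_affected: int   # recorded on the incident record
    outage_hours: float   # duration of the service outage


def lost_user_time_cost(incident: Incident, annual_cost_per_user: float,
                        hours_per_year: float = WORKING_HOURS_PER_YEAR) -> float:
    """Cost of user time lost to a single incident."""
    hourly_cost = annual_cost_per_user / hours_per_year
    return incident.users_affected * incident.outage_hours * hourly_cost


def annualised_cost(incidents: list[Incident], annual_cost_per_user: float,
                    months_sampled: int = 3) -> float:
    """Sum the sampled incidents and gross the total up to twelve months."""
    sample_total = sum(lost_user_time_cost(i, annual_cost_per_user)
                       for i in incidents)
    return sample_total * 12 / months_sampled


# A 2-hour outage affecting 100 users at £40k per user per annum.
single = lost_user_time_cost(Incident(users_affected=100, outage_hours=2.0), 40_000)
print(f"£{single:,.0f} for the single outage")
```

In practice, the list of incidents would come from an export of P1 and P2 records from the service management tool, which is why recording the number of users affected on each incident matters.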
The business cost of the lost user time may be harder to measure, but in a commercial environment, it has to be higher than the user employment cost or the users would not be paid what they are! As an example, one of our logistics clients was fined £1 million for missing a delivery window to a car production company by one hour.
The highest cost, of course, is reputational damage, and there is no better example than a certain dominant mobile phone manufacturer that, ten years ago, mismanaged a major incident and, when the iPhone was launched shortly afterwards, essentially folded. More recently, think TSB!
The key point here is that unless you measure the cost of getting it wrong, it is hard to justify investment in error prevention, which is a highly visible, upfront cost. For instance, a Problem Manager can cost an organisation £70k per annum. But put that in the context of our client who achieved a 95% reduction in major incidents and it’s not hard to see the justification in hindsight.
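One way to frame the justification up front is as a break-even question: what fraction of the measured incident cost would prevention need to remove to pay for itself? The sketch below is illustrative only. The function name is our own, and it pairs the £70k Problem Manager cost with the £1.5 million internal re-work figure from the earlier worked example purely to show the shape of the calculation.

```python
def break_even_reduction(annual_incident_cost: float,
                         annual_prevention_cost: float) -> float:
    """Fraction of incident cost that must be eliminated for the
    prevention investment to pay for itself."""
    if annual_incident_cost <= 0:
        raise ValueError("annual_incident_cost must be positive")
    return annual_prevention_cost / annual_incident_cost


# Illustrative inputs taken from the examples in this article.
needed = break_even_reduction(annual_incident_cost=1_500_000,
                              annual_prevention_cost=70_000)
print(f"Break-even at a {needed:.1%} reduction in incident cost")
```

On these illustrative inputs, a reduction of less than 5% in internal re-work alone would cover the salary, before lost user time, business impact or reputational damage are counted.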
Infrassistance is a consultancy and training company specialising in IT service management. We work with organisations of all sizes around the world and in all industry sectors, helping them optimise the management of IT services and the corresponding business benefits.