OBSERVE >>> ID RISKS >>> FIX
The key to keeping your deployed system meeting its mission for decades is to notice when new, emergent failure modes are starting to occur. In fact, depending on the system and your ability to respond with an integrated, deployed fix, you might need a “heads-up” of years.
Sure, you have the original Failure Modes and Effects Report and similar engineering documentation required during development. But it is now outdated. And now it is too late. Your complex system is dying undetected in ways no one could have anticipated. You see problems only when they have become blindingly obvious.
Finding these issues too late is the death knell for the system. But pouring lots of resources into constantly searching for these new failure modes with no overall plan is too expensive. So, what do you do?
The sustainment organization needs an on-going, integrated program that meets this need and does it affordably.
In a nutshell, here is the strategy that was used for ICBMs:
We again look at the "readiness factors" mentioned in our previous three articles.
Recall that your system readiness factors are two to six mutually independent characteristics that, if violated, will affect the system's ability to perform its mission. For instance, here are two from last month: "The vast majority of systems must be both reliable when used and available when needed." If your system delivers bombs on target, perhaps another one is "accuracy". For recon, maybe you have "loiter time". In any case, these are precisely defined terms that directly support your mission. Your list needs to be as "orthogonal" as possible. That is, each factor is independent of the others. And together they need to "cover the waterfront". Practically speaking, if you discover you are not covering everything, you need to add that readiness factor to your repertoire.
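The idea above — a small set of precisely defined, orthogonal factors plus a practical coverage check — can be sketched as a data structure. Everything here (factor names, definitions, metrics, the `covers` helper) is illustrative, not an actual program of record:

```python
# Hypothetical sketch: readiness factors as a small set of precisely
# defined, independent characteristics, with a practical coverage test.
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadinessFactor:
    name: str        # e.g. "available", "reliable", "accurate", "hard"
    definition: str  # the precise, agreed-upon meaning
    metric: str      # the measurable quantity that tracks it

# Illustrative ICBM-style factor list (definitions and metrics invented here)
FACTORS = [
    ReadinessFactor("available", "fraction of fleet ready on order", "alert rate"),
    ReadinessFactor("reliable", "probability of success once used", "test success rate"),
    ReadinessFactor("accurate", "delivery error within spec", "miss distance"),
    ReadinessFactor("hard", "survivability against attack", "hardness margin"),
]


def covers(issue_tags: set) -> bool:
    """Practical 'cover the waterfront' test: every observed issue must map
    to some factor. If this returns False, the article's advice applies:
    add the missing readiness factor to your repertoire."""
    names = {f.name for f in FACTORS}
    return issue_tags <= names
```

The frozen dataclass mirrors the article's point that these are fixed, precisely defined terms rather than labels anyone can quietly redefine.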
To meet the mission of deterrence, ICBM readiness factors were: available, reliable, accurate, and hard against nuclear attack. (Some include safe and sure, but I would list those under available, since no nuclear delivery system that is not both safe and sure will be made available!)
Observing your system across all readiness factors means that you will be collecting data from operational monitoring, non-operational monitoring, and deliberate testing. Start with the first two. In other words, if your complex system is already generating data, you had better be sure you are capturing all of it for future use in assessing its health. It is less expensive than testing (and imagine how stupid you will look if you don’t).
For example, active components, such as ICBM guidance systems, can generate terabytes of operational data each year. Depot repair organizations have a wealth of stockage and diagnostic testing data. Knowing how to get that data, preserve it, and monitor it is central to your job as sustainers. Start now. Improve your ability each year.
On the other hand, for inactive components, such as solid rocket fuel, you must deliberately pull samples, inspect them, and test them at time intervals that give you lead time to react to issues.
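That "lead time" requirement can be worked as simple arithmetic: the gap between destructive samples must leave enough warning for testing and for fielding an integrated fix. The sketch below is illustrative only — the function name and every number in the example are invented, not real ICBM figures:

```python
# Hypothetical sketch of age-surveillance planning for inert items such as
# solid rocket fuel. All figures are illustrative, not real program data.

def max_sampling_interval_years(warning_window: float,
                                test_turnaround: float,
                                fix_deployment: float) -> float:
    """Longest gap between pulled samples that still leaves enough warning:
    by the time a bad sample is tested and a fix is fielded across the
    fleet, degradation must not yet have reached the failure point."""
    interval = warning_window - test_turnaround - fix_deployment
    if interval <= 0:
        raise ValueError("No sampling schedule gives the lead time you need; "
                         "shorten test turnaround or the fix pipeline.")
    return interval

# e.g. (invented numbers) degradation gives ~8 years of warning, testing a
# pulled sample takes 1 year, fielding an integrated fix takes 4 years:
# you must sample at least every 3 years.
```

The point of the sketch is the inequality, not the numbers: if the sum of your reaction times exceeds the warning the material gives you, no surveillance schedule can save you.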
Existing tests, such as the three to four ICBM flight tests per year, are important sources of data, especially if telemetry reveals non-failure anomalies that might point to systemic issues. For instance, a successful missile test might reveal a degrading set of electronics in an area that did not result in failure, but might point to a general need to inspect and test all electronics everywhere in the missile and ground systems.
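Screening a successful test for these non-failure anomalies amounts to asking whether a telemetry channel has drifted outside its historical band even though the mission succeeded. A minimal sketch, assuming simple scalar channels and an invented three-sigma rule (the function name and threshold are illustrative):

```python
# Hypothetical sketch: flagging non-failure anomalies in flight-test
# telemetry. Channel semantics and the 3-sigma threshold are illustrative.
from statistics import mean, stdev


def flag_drift(history: list, latest: float, sigma: float = 3.0) -> bool:
    """Flag a channel whose latest reading sits outside the band defined by
    prior successful tests -- the test passed, but something is degrading."""
    if len(history) < 2:
        return False  # not enough history to define a band
    mu, sd = mean(history), stdev(history)
    if sd == 0:
        return latest != mu  # perfectly stable history: any change is news
    return abs(latest - mu) > sigma * sd
```

In practice the interesting output is not the flag itself but the follow-up the article describes: one drifting channel triggering broader inspection of similar electronics across the missile and ground systems.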
Comparing various tests is an important source of information. For instance, testing of smaller components might seem to reveal a serious problem immediately affecting all of your weapon systems. But if it were that bad, you would already have seen it in your flight tests. If you have not, you can conclude that you still have precious time to address the issue logically and systematically.
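That cross-check is a small piece of triage logic: a component-level alarm is corroborated or downgraded by what the system-level flight tests show. The function and its labels below are a made-up illustration of the reasoning, not a real assessment procedure:

```python
# Hypothetical triage sketch of the cross-test reasoning above.

def triage(component_alarm: bool, seen_in_flight_tests: bool) -> str:
    """Decide urgency by comparing a component-test alarm against
    system-level flight-test evidence."""
    if not component_alarm:
        return "monitor"      # nothing to chase; keep watching the data
    if seen_in_flight_tests:
        return "immediate"    # corroborated at system level: act now
    return "systematic"       # real, but you have time to work it logically
```
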
This approach establishes requirements for a large data system and associated assessment tools. If you have none, start small and build on your successes. Massive data-system projects inevitably fail, wasting valuable time.
This approach also establishes requirements for trained and skilled assessors. You must have a reliable plan to keep these skills over years of sustainment.
By way of summary, the affordability aspects of this integrated program are these:
Make sure you are looking at all your "free" data, such as operations, stockage, etc.
Plan ahead to create an age surveillance program for “inert” items
Create special tests to fill gaps.
It might not be obvious, but this search for data includes contractually instituting a FRACAS-like program at your repair depots to ensure that data is captured and reported.
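What "FRACAS-like" data capture means in practice is a closed-loop record: every depot failure report must carry its raw diagnostics and cannot be considered finished until root cause and corrective action are recorded. The record below is a hypothetical minimal sketch — field names are invented, and the contract language, not this code, defines what your depots must actually report:

```python
# Hypothetical minimal FRACAS-style record (Failure Reporting, Analysis, and
# Corrective Action System) for depot repair data. Field names are
# illustrative only.
from dataclasses import dataclass, field
from datetime import date


@dataclass
class FailureReport:
    part_number: str
    serial_number: str
    failure_mode: str            # what the depot observed
    reported_on: date
    diagnostic_data: dict = field(default_factory=dict)  # bench readings, kept verbatim
    root_cause: str = ""         # filled in by analysis
    corrective_action: str = ""  # filled in when the loop is closed

    def loop_closed(self) -> bool:
        """A FRACAS entry only pays off once analysis and corrective
        action are both on record -- reporting alone is not enough."""
        return bool(self.root_cause and self.corrective_action)
```

The `loop_closed` check is the contractual point: it gives you something auditable to hold the depot to, rather than a pile of unanalyzed failure tickets.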
Now comes the really hard part.
Even if you capture sufficient data and provide talented people with the tools to analyze it, there is still an important component that you must foster.
The general population is not good at comparing numbers, creatively looking for the unexpected, or avoiding confirmation bias. In my career, these skills were sought, encouraged, and rewarded. At the same time, stupid or lazy thinking was punished, often by a good chewing out. This raised the game and otherwise lazy thinkers improved or got out.
This "General LeMay leadership" approach is not an option today, yet well-informed but incapable people will still find their way onto your complex system assessment team. They will cost you, reduce morale, and generally throw a monkey wrench into your systematic approaches.
What will you do about it? Will your personnel system allow you to create and maintain a team of expert analysts? Will you be able to reward them and keep them? Can you create career paths for them?
Because the real key to effectively and affordably observing your complex system for emerging issues is your talent pool of creative analysts.