This section assumes your sustainment risk process is working well enough that the team has a good grasp of system, mission, and readiness factors. If this is true, only 3 questions are needed to define the efforts required to execute an adequate system observation (AKA assessment) program:
- Are all parts of the system covered in some manner, i.e. are we complete?
- Are all readiness factors covered, i.e. are we thorough?
- Are we taking advantage of all data, i.e. are we economical?
In a successful sustainment organization, the first two questions are asked and answered continually throughout the life of the system, greatly aided by your team’s understanding of system, mission, and readiness. Formal biennial reviews led by top management are the best way to keep these issues in the minds of your team members.
The last question can drive creation and execution of an economical program by using the following five steps:
- Use your “free” data
- Look to your repair depots
- Set up an age surveillance program
- Establish processes for special testing
- Analyze your data to create information
These questions must also be revisited periodically to ensure the best answers are being executed. This section also looks at the enablers of people, process, and technology, and reviews how this part of the model stands up to the criteria for a useful model.
Complete and Thorough
When you evaluate “complete and thorough” you will no doubt find arguments such as “that subsystem is not our responsibility” or “I know we’re supposed to evaluate hardness to nuclear attack, but how do you even do that?” In other words, the arguments for not being complete and thorough will be compelling to the manager already feeling overwhelmed by crisis decisions. But these kinds of answers should be triggers for you to create a small team of action-oriented experts to break through the resistance and ensure orphaned subsystems or neglected readiness factors are given the attention needed. It may even be necessary, in order to prioritize resources, to write up specific assessment issues as risks to sustainment.
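The completeness and thoroughness checks can be made concrete with a simple coverage matrix of subsystems against readiness factors. The following is only a minimal sketch: the subsystem names, readiness factors, and observation activities are hypothetical placeholders, and the function `coverage_gaps` is an illustration, not part of the model itself.

```python
# Hypothetical subsystems and readiness factors; substitute your own lists.
SUBSYSTEMS = ["guidance", "propulsion", "ground power"]
READINESS_FACTORS = ["reliability", "availability", "nuclear hardness"]


def coverage_gaps(subsystems, factors, observations):
    """Return (subsystem, factor) pairs that no observation activity covers.

    `observations` maps an activity name to the set of (subsystem, factor)
    pairs that the activity produces data for.
    """
    covered = set()
    for pairs in observations.values():
        covered |= set(pairs)
    return [(s, f) for s in subsystems for f in factors if (s, f) not in covered]


# Made-up observation activities, for illustration only.
observations = {
    "depot repair data": {("guidance", "reliability"), ("guidance", "availability")},
    "flight test": {("propulsion", "reliability"), ("guidance", "reliability")},
    "field inspection": {("ground power", "availability")},
}

gaps = coverage_gaps(SUBSYSTEMS, READINESS_FACTORS, observations)
for subsystem, factor in gaps:
    print(f"GAP: {subsystem} / {factor} -> candidate sustainment risk")
```

Each uncovered pair is a candidate for the small team described above, or for a write-up as a risk to sustainment.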
Another aspect of complete and thorough is the time-value of your observations. You will find that observing a new, emerging failure mode during a full-up system test leaves you with little or no time to react with a useful mitigation. However, an observation program that digs into subsystems, components, and parts can alert you to developing trends early.
Your assessment program will be as economical as possible if the five steps noted above are followed.
Use Free Data
Before you start to set up your comprehensive assessment program, be sure to look around and find out how much data your system generates “for free”. From usage data to repair depots to stock counts, countless data sources are trying to tell you what is going on with your system. Don’t assume that you would have heard of the information already if it were important. And new sources of data can appear throughout the life of the system.
Look to Your Repair Depots
It is usually the case that many organizations exist outside of yours that heavily influence the sustainability of your system. The classic example is your repair depot. And you can be sure the repair depot is run to ensure efficiency via high throughput. This is a recipe for losing information on why your system components failed. Emerging failure modes will be undetected until a system crisis forces you to find the problem. A better way is to contract with your repair depot to get the data you need. Three areas should be covered at a minimum:
- Can the depot tell you that the item that was repaired actually created the failure in the system that drove it to the depot?
- Can the depot tell you about all the failures of parts and pieces that occur during depot diagnostic and sell-off testing?
- Can the depot show how they use this information to improve their diagnostics and sell-off processes?
If you find yourself flooded with bad actor components recycling undetected through your repair process, you are already behind the curve.
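As an illustration of what contracted depot data makes possible, the sketch below flags possible bad actors (serial numbers that keep returning) and computes how often the depot could not confirm a fault. The record layout, the serial numbers, the "no fault found" label, and the three-visit threshold are all assumptions chosen for the example, not a depot standard.

```python
from collections import Counter

# Made-up depot records: (serial_number, field_symptom, depot_finding).
depot_records = [
    ("SN-001", "no output", "failed power supply"),
    ("SN-002", "drift", "no fault found"),
    ("SN-002", "drift", "no fault found"),
    ("SN-002", "drift", "failed connector"),
    ("SN-003", "no output", "no fault found"),
]


def bad_actors(records, min_visits=3):
    """Serial numbers that came back to the depot at least `min_visits` times.

    The threshold of 3 is an arbitrary assumption for this sketch.
    """
    visits = Counter(serial for serial, _, _ in records)
    return sorted(sn for sn, n in visits.items() if n >= min_visits)


def no_fault_found_rate(records):
    """Fraction of depot visits where the depot could not confirm any fault."""
    if not records:
        return 0.0
    nff = sum(1 for _, _, finding in records if finding == "no fault found")
    return nff / len(records)


print("Possible bad actors:", bad_actors(depot_records))
print(f"No-fault-found rate: {no_fault_found_rate(depot_records):.0%}")
```

A high no-fault-found rate suggests the repaired item may not have caused the field failure, which is the first of the three areas listed above.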
Set up an Age Surveillance Program
Now that the more active data has been corralled, it is time to take a look at the less active parts of your system. If there are subsystems that could quietly rust or wear out without detection, a special detection program must be put in place. This could vary from field inspections to segregation of samples in special storage areas. Inspections may involve simply looking or they may involve complex testing. Economy can be found via regular schedules and efficient use of resources such as resource-leveling, reuse of test plans and expertise, and bulk buys during production. A special subset to this activity attempts to speed up time and wear with vibration, heat, or other stressful environments. This is an action that needs to be used with caution because the inherent assumptions of stress vs. time might not hold true.
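The caution about accelerated aging can be seen in a small calculation. The sketch below uses the Arrhenius relation, a commonly used model for thermally driven aging. It is chosen here only as an example and is not prescribed by this model. The temperatures and activation energies are made-up values.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(ea_ev, use_temp_c, stress_temp_c):
    """Acceleration factor under the Arrhenius model (an assumed model).

    ea_ev is the assumed activation energy in electron-volts; temperatures
    are given in degrees Celsius and converted to kelvin.
    """
    t_use = use_temp_c + 273.15
    t_stress = stress_temp_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use - 1.0 / t_stress))


# Illustrative values only: 25 C storage, 85 C chamber,
# and two different guesses at the activation energy.
for ea in (0.5, 0.7):
    af = arrhenius_acceleration(ea, 25.0, 85.0)
    print(f"Assumed Ea = {ea} eV -> one chamber hour ~ {af:.0f} field hours")
```

A modest change in the assumed activation energy changes the equivalent field time several-fold, and the model itself may not describe the real wear mechanism at all, which is why accelerated results should be checked against naturally aged samples where possible.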
Establish Processes for Special Testing
Having taken all of these steps, there will still be portions of your system that elude good observation. For instance, there will be components that have uncontested decades-long reliability, making continuous age surveillance a foolish investment. However, an informed look every 10 years may be just the right amount of observation.
Analyze Your Data to Create Information
Since this management model is most effectively used on very complex systems, it can be expected that observing the system to identify risks will occur in a domain of big data. That is, there will be lots of data (maybe terabytes) and the data will be in many, many different formats. For instance, an ICBM guidance system could have gigabytes of data streaming from hundreds of missiles on alert that can be used, among other things, to define the health of the stabilizing gyroscopes. At the same time, other data may be entered by hand by maintenance personnel working on site. Glitches in the first set of data could then be explained by the fact that the system was being disturbed by maintenance crews, as shown in the second set of data. Keeping track of all this data is essential if your analysts are to tease out, early, the important differences that point to emerging failure modes.
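The idea of explaining glitches in one data set with records from another can be sketched in a few lines. The timestamps and the function `explained_by_maintenance` below are hypothetical. Real telemetry and maintenance logs would first need to be brought into a common time base and format.

```python
from datetime import datetime


def explained_by_maintenance(glitch_times, maintenance_windows):
    """Split glitch timestamps into (explained, unexplained) lists.

    A glitch counts as 'explained' if it falls inside any (start, end)
    maintenance window, with both ends inclusive.
    """
    explained, unexplained = [], []
    for t in glitch_times:
        if any(start <= t <= end for start, end in maintenance_windows):
            explained.append(t)
        else:
            unexplained.append(t)
    return explained, unexplained


# Made-up data: automated telemetry glitches and hand-entered maintenance logs.
glitches = [
    datetime(2020, 3, 1, 9, 15),
    datetime(2020, 3, 1, 14, 40),
    datetime(2020, 3, 2, 2, 5),
]
maintenance = [
    (datetime(2020, 3, 1, 9, 0), datetime(2020, 3, 1, 11, 0)),
]

explained, unexplained = explained_by_maintenance(glitches, maintenance)
print(len(explained), "explained by maintenance;",
      len(unexplained), "left for the analysts")
```

Filtering out the explained glitches leaves the analysts with a smaller set in which emerging failure modes are easier to see.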
A powerful tool for making your analyses even more economical is cross-checking pending conclusions among data sources. For instance, if your component-level testing is telling you that your system reliability is poor, but you still score high on system level tests, something must be off – probably your statistical confidence factors. This directs you to more and better testing to better estimate the time you have remaining to fix the issue.
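A small calculation shows how statistical confidence can reconcile the two sources. The sketch below assumes a series system (every component must work) and uses the exact zero-failure binomial bound for the system tests. The component estimates, the test count, and the 90% confidence level are made-up values for illustration.

```python
import math


def series_reliability(component_reliabilities):
    """System reliability if every component must work (series assumption)."""
    return math.prod(component_reliabilities)


def zero_failure_lower_bound(n_tests, confidence=0.90):
    """One-sided lower confidence bound on reliability after n_tests
    successes and zero failures (exact binomial result)."""
    return (1.0 - confidence) ** (1.0 / n_tests)


# Made-up numbers for illustration.
component_estimates = [0.97, 0.95, 0.92]
predicted = series_reliability(component_estimates)
demonstrated = zero_failure_lower_bound(n_tests=10)

print(f"Predicted from component tests:          {predicted:.3f}")
print(f"Lower bound from 10 clean system tests:  {demonstrated:.3f}")
if predicted >= demonstrated:
    print("No real conflict: the system tests cannot rule out the component estimate.")
else:
    print("Conflict: the two data sources disagree; investigate.")
```

With these numbers, a perfect score on ten system tests only demonstrates a reliability of roughly 0.79 at 90% confidence, so it does not contradict a component-based estimate of about 0.85. More or better testing is what narrows the gap.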
Key people in assessment are engineers and statisticians. At a minimum, they form a team to ensure the analyses are based on sound engineering (physics, chemistry, etc.) and defensible math. Engineering without math cannot estimate how soon the system will be impacted. Math without engineering can create many interesting graphs that cannot be explained or defended using fundamentals. For instance, trends might be more attributable to changes in testing than to changes in the test article. Others, such as test technicians, can play critical roles. Be sure to look for the best team irrespective of which company or government organization they work for.
Great IT support is required for your team to surf vast amounts of data. Data gathering, sorting, and analysis tools must be maintained on a schedule compatible with the team’s workload and process improvements. It cannot be based on arbitrary software maintenance schedules and funding.
Everything discussed to this point depends on repeatable processes. Process discipline ensures that monitoring and test programs are documented and defensible. Imagine trying to sell a recently identified risk only to find your test documentation is missing or incomplete.
Does “system observation” have the characteristics of a useful model?
The sustainment management model must be…
- directly applicable to sustainment of today’s complex systems
- internally consistent
- practical, easy to remember, easy to apply
- constant, unaffected by changing public laws, regulations, management fads
Examples of how the “system observation” portion of the model does this…
- Digs down into the subsystem, components, and parts to ensure timeliness
- Uses readiness factors to link back to the mission
- Asks whether the repair matched the failure
- Treats gaps in assessment as risks to sustainment
- Uses a toolbox of techniques