SMART BUILT-IN TEST

Dale W. Richards
Rome Air Development Center
Griffiss AFB NY

ABSTRACT

The work which built-in test (BIT) is asked to perform in today's electronic systems increases with every insertion of new technology or introduction of tighter performance criteria. Yet the basic purpose remains unchanged: to determine with high confidence the operational capability of the equipment. Achieving this level of BIT performance requires tight management and assimilation of a large amount of data, both realtime and historical. Smart BIT has taken advantage of advanced techniques from the field of artificial intelligence (AI) in order to meet these demands. The Smart BIT approach enhances traditional functional BIT by using AI techniques to incorporate environmental stress data, temporal BIT information and maintenance data, and realtime BIT reports into an integrated test methodology for increased BIT effectiveness and confidence levels. Future research in this area will incorporate onboard fault-logging of BIT output, stress data and Smart BIT decision criteria in support of a single, integrated and complete test and maintenance capability. The state of this research is described along with a discussion of directions for future development.

Introduction

The approach to maintenance of electronics has for many years been narrowly scoped and highly segregated. Systems and subsystems located on an operational platform are identified as having failed or contributed to a failure. These units are removed and sent to a maintenance facility, where the process is repeated: circuit cards are separated from the boxes and sent to another facility for removal and replacement of certain components. This decomposition follows naturally from the hierarchical manner in which electronics have been designed and built. Each maintenance step is performed independently of the others; items designated as "failed" at one step are removed from the parent unit and sent to the next step. Though it has performed well to date, this approach to maintenance has fallen short in recent times as the number of maintenance actions has risen to levels that severely tax the logistics and support resources provided. Compounding this situation are the increased complexity of these systems and the cost of their maintenance. In addition, the nature of many missions places equipment in areas without a preexisting maintenance facility, requiring that extensive maintenance resources be deployed along with the equipment itself. Current technology is able to meet the immediate needs of the logistics and support community, but it provides no inherent response to the concerns of maintenance and diagnostics and thus does not serve the long-term needs of that community. The high level of false removals associated with many currently fielded systems, as indicated by cannot duplicate and retest okay rates sometimes in excess of 50%, prevents the achievement of acceptable levels of availability. Any modification to on-board testing for these systems must reflect the increased need for even more efficient operation.
Every attempt must be made to minimize the number of good units removed from the operational platform and to maximize the extent to which knowledge about the equipment failure, and about the subunits designated as contributing to it, is conveyed to and used by subsequent maintenance.

Achieving this level of performance requires an increase in the ability of built-in test (BIT) to distinguish properly among hard failures, intermittent behavior and one-time false alarms. Critical to this process is the correlation between the BIT output and knowledge of the physical environment at the time of the perceived failure. This correlation alone can reduce the number of units entering the maintenance pipeline; to reduce the incidence of cannot duplicate/retest okay events, however, relevant data concerning the declaration of a "faulty" unit must be automatically included with that unit as it is removed from the platform and sent on for further maintenance. This requires some form of non-volatile electronic storage, along with corresponding capabilities in the associated automatic test equipment (ATE) to recover this data and use it effectively in the diagnostic and failure confirmation process.

RADC Response

Rome Air Development Center (RADC) has sponsored a number of research efforts directed at developing technologies in support of the above capabilities. This work has been concentrated at the system/board/module level and has two components. The first and principal component is concerned with decreasing the number of false removals, which contribute significantly to the infamous false alarm/cannot duplicate/retest okay problem, and with increasing the ability to identify intermittent failures. The building blocks for this work are techniques developed through research in artificial intelligence (AI), and the approach is collectively known as Smart Built-in Test (Smart BIT). The second, supporting component is concerned with the need for more robust reasoning about, and correlation of, environmental stresses such as vibration, shock and temperature, along with the quality of the prime power supply. The goal of this work is the development and integration of a single microelectronic package, known as a micro Time Stress Measurement Device (TSMD), that could be located within a removable/repairable unit.

Smart Built-in Test

A major problem facing today's maintenance community is that of high false alarm rates. Units removed at one level of maintenance often test good at the next level. This leads to inefficient use of personnel, test equipment and spares, and it can contribute to reduced equipment availability and increased demands on the logistics system, particularly when whole systems are deployed to areas without preexisting maintenance facilities. A number of approaches have been suggested for improving the efficiency of this process, targeting every corner of the maintenance concept. However, the greatest impact is achieved when the front end of the process is improved: reduce the number of false removals! This is the approach taken by Smart BIT. Smart BIT is best thought of as an adjunct to the functional test performed by traditional BIT. In its current form it is a software filter on the output of the functional test, but it could easily be integrated as a single BIT function. Current BIT technology often places 100% confidence in the results of a test, even though these results could be biased by the behavior of the components themselves or influenced by transient environmental conditions. Incorporating an N-out-of-M filter can improve the situation to some degree, yet even such a filter can easily be misled. Smart BIT goes beyond these simplistic approaches to include a more robust reasoning process that learns from the pattern of faults and incorporates knowledge of time and other information outside the functional realm of BIT.
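The N-out-of-M filter mentioned above can be stated in a few lines. The following is a minimal Python sketch; the class name `NOutOfMFilter` and its parameters are illustrative, not taken from the RADC work. A fault is reported only when at least n of the last m GO/NO-GO results are NO-GO. The second usage case shows the limitation: a transient that persists for n tests trips the filter exactly as a hard failure would, because the filter knows nothing about time or environment.

```python
from collections import deque


class NOutOfMFilter:
    """Declare a fault only when at least n of the last m BIT results were NO-GO."""

    def __init__(self, n: int, m: int) -> None:
        if not 1 <= n <= m:
            raise ValueError("require 1 <= n <= m")
        self.n = n
        # Sliding window of the most recent m results (True = NO-GO);
        # deque(maxlen=m) discards the oldest entry automatically.
        self.window = deque(maxlen=m)

    def update(self, no_go: bool) -> bool:
        """Add one BIT result and return True if a fault should be reported."""
        self.window.append(no_go)
        return sum(self.window) >= self.n


# Isolated NO-GOs are suppressed until the third one arrives within five tests.
f = NOutOfMFilter(n=3, m=5)
print([f.update(r) for r in [True, False, True, False, True]])
# [False, False, False, False, True]

# A transient lasting three consecutive tests trips the filter like a hard fault.
g = NOutOfMFilter(n=3, m=5)
print([g.update(r) for r in [True, True, True]])
# [False, False, True]
```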

Research to date has centered on the application of Smart BIT techniques to two separate systems containing BIT. The first of these was a signal-conditioning and reporting module using primarily discrete components and low levels of integration. The second was an air-data computer using highly integrated ICs and a bus-oriented architecture. Different techniques were developed for each system and then applied to a laboratory simulation of those systems. A common goal during the development of the techniques was to maintain a sufficient level of generality to ensure that they could be applied to other systems without starting over from the beginning. Thus an implementation of Smart BIT will involve the selection, ranking and integration of multiple Smart BIT techniques based on the BIT and mission needs of the system being modified. The principal techniques for which software has been developed and demonstrated are Information Enhanced BIT, Improved Decision Rule BIT, Temporal Monitoring BIT and Adaptive BIT.

Information Enhanced BIT was a precursor to the concept of integrating TSMD with Smart BIT. In this technique, BIT decisions are based on information internal to the unit under test (UUT) as well as on external sources. These could be test monitors, or information concerning the operational mode of the platform or the health of other systems. In general, the output of the BIT is compared against known failure modes, and the external data is used to corroborate any fault identifications.
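The matching-plus-corroboration step can be sketched as follows. This is a hypothetical illustration, not the RADC software: each known failure mode is assumed to carry a signature (the set of BIT tests that fail when it is present) and a set of external cues consistent with it. A mode is identified when its whole signature has failed, and the identification is flagged as corroborated only if at least one external cue is also observed. The test names, units and cues below are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    signature: frozenset   # BIT tests that fail when this mode is present
    suspect_unit: str
    cues: frozenset        # external observations consistent with this mode


def identify(failing_tests, external_cues, modes):
    """Match BIT output against known failure modes and check external corroboration.

    Returns a list of (mode name, suspect unit, corroborated?) tuples.
    """
    matches = []
    for mode in modes:
        if mode.signature <= failing_tests:          # every test in the signature failed
            corroborated = bool(mode.cues & external_cues)
            matches.append((mode.name, mode.suspect_unit, corroborated))
    return matches


modes = [
    FailureMode("supply droop", frozenset({"T1", "T3"}), "PSU-2", frozenset({"low bus voltage"})),
    FailureMode("open sensor", frozenset({"T2"}), "SEN-4", frozenset({"redundant sensor disagrees"})),
]
print(identify({"T1", "T3"}, {"low bus voltage"}, modes))
# [('supply droop', 'PSU-2', True)]
print(identify({"T2"}, set(), modes))
# [('open sensor', 'SEN-4', False)]
```

How an uncorroborated identification should be handled (lower confidence, further monitoring) is left open here.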

Improved Decision Rule BIT uses a structure suggestive of an expert system format to increase the robustness of the BIT decision process. A simple BIT check such as "IF test-1 fails THEN report unit-12 faulty." could be augmented to "IF test-1 fails AND unit-12 has exhibited intermittency AND stresses have been near threshold THEN declare unit-12 marginally healthy AND adjust sensing frequency." The important characteristics of this technique are that decisions are based not on a single fact but often on compound antecedents, and that any assumptions about the system that could influence the firing of a rule must themselves be substantiated.
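The two rules quoted above can be expressed in a small forward-chaining sketch. It is an assumption-laden illustration, not the rule structure used in the RADC work: facts are plain strings, a hypothetical `Rule` fires only when all of its antecedents are known, and a `blocked_by` set lets the more specific conclusion suppress the simple "faulty" rule. Rules with more antecedents are tried first, a basic specificity strategy for resolving conflicts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedents: frozenset                  # all must be known facts (compound antecedent)
    consequents: frozenset                  # facts asserted when the rule fires
    blocked_by: frozenset = frozenset()     # facts that suppress this rule


def forward_chain(facts, rules):
    """Fire rules until no new facts appear; more specific rules are tried first."""
    known = set(facts)
    ordered = sorted(rules, key=lambda r: len(r.antecedents), reverse=True)
    changed = True
    while changed:
        changed = False
        for rule in ordered:
            if rule.antecedents <= known and not (rule.blocked_by & known):
                new = rule.consequents - known
                if new:
                    known |= new
                    changed = True
    return known


rules = [
    # The simple check: IF test-1 fails THEN report unit-12 faulty.
    Rule(frozenset({"test-1 fails"}), frozenset({"unit-12 faulty"}),
         blocked_by=frozenset({"unit-12 marginally healthy"})),
    # The augmented rule with a compound antecedent.
    Rule(frozenset({"test-1 fails", "unit-12 intermittent", "stress near threshold"}),
         frozenset({"unit-12 marginally healthy", "adjust sensing frequency"})),
]

print(sorted(forward_chain({"test-1 fails"}, rules) - {"test-1 fails"}))
# ['unit-12 faulty']
```

Because `blocked_by` is a form of negation, a fact derived late in the chain cannot retract a rule that has already fired; a production system would need explicit conflict resolution or truth maintenance.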

Temporal Monitoring BIT uses Markov modeling techniques combined with a finite state machine representation of unit health to monitor performance over time. The transition from OK to HARD is forced to pass through a state of intermittency. The probabilities of moving between the recovering and faulty states are dynamic and are adjusted according to the pattern of GO/NO-GO results coming from the BIT. If the chance of recovery becomes very high, subsequent NO-GOs can be treated as intermittents and the unit declared functional but degraded.
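A minimal Python sketch of this idea follows, under several assumptions that are not from the source. It uses three states (OK, INTERMITTENT, HARD), has no direct OK to HARD edge, and tracks a single transition probability, P(next result GO | current result NO-GO). That probability is estimated by counting with a Laplace prior of 1, not with a full Markov model. The `recover_threshold` and `hard_after` values are arbitrary illustrative choices. A unit is declared HARD only when NO-GOs persist and the estimated recovery probability is low.

```python
class TemporalMonitor:
    """Unit-health state machine OK -> INTERMITTENT -> HARD (no direct OK -> HARD edge).

    The recovery probability P(next result GO | current result NO-GO) is estimated
    from the observed GO/NO-GO sequence, so it changes as the BIT history grows.
    """

    REPORT = {"OK": "functional", "INTERMITTENT": "functional but degraded", "HARD": "faulty"}

    def __init__(self, recover_threshold: float = 0.5, hard_after: int = 3) -> None:
        self.state = "OK"
        self.recover_threshold = recover_threshold
        self.hard_after = hard_after          # consecutive NO-GOs needed before HARD
        self.recoveries = 1                   # NO-GO followed by GO (Laplace prior of 1)
        self.persistences = 1                 # NO-GO followed by NO-GO (Laplace prior of 1)
        self.consecutive_no_go = 0
        self.prev_go = None                   # previous BIT result, None before the first

    def p_recover(self) -> float:
        return self.recoveries / (self.recoveries + self.persistences)

    def update(self, go: bool) -> str:
        # Update the transition estimate out of the NO-GO condition.
        if self.prev_go is False:
            if go:
                self.recoveries += 1
            else:
                self.persistences += 1
        self.prev_go = go

        if go:
            self.consecutive_no_go = 0        # a GO does not clear INTERMITTENT or HARD
        else:
            self.consecutive_no_go += 1
            if self.state == "OK":
                self.state = "INTERMITTENT"   # OK may only move to INTERMITTENT
            elif (self.state == "INTERMITTENT"
                  and self.consecutive_no_go >= self.hard_after
                  and self.p_recover() < self.recover_threshold):
                self.state = "HARD"
        return self.state


persistent = TemporalMonitor()
print([persistent.update(g) for g in [False, False, False, False]])
# ['INTERMITTENT', 'INTERMITTENT', 'HARD', 'HARD']

# After a history of recoveries, three NO-GOs in a row are still treated as intermittent.
flaky = TemporalMonitor()
states = [flaky.update(g) for g in [False, True, False, True, False, True, False, False, False]]
print(states[-1], TemporalMonitor.REPORT[states[-1]])
# INTERMITTENT functional but degraded
```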

Adaptive BIT makes use of two general learning paradigms: k-nearest neighbor and neural network back-propagation. In both cases the BIT report in question is plotted into an n-space defined by the various parameters of interest, such as vibration, GO/NO-GO, airspeed and duration of failure. In the k-nearest neighbor approach, the k previously plotted points closest in absolute distance to the new point are examined, and the new point is entered as a GO or NO-GO according to the GO/NO-GO values of those k points. The neural net approach is consistent with accepted neural net theory and in essence divides the n-space defined above into a number of regions, each classified as either GO or NO-GO.
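The k-nearest neighbor step can be sketched as a majority vote. In this sketch, "absolute distance" is read as the sum of absolute coordinate differences (Manhattan distance); Euclidean distance would serve as well. The two features and their values are hypothetical and are assumed to be scaled to comparable ranges, since otherwise the parameter with the largest units would dominate the distance.

```python
from collections import Counter


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_classify(history, point, k=3):
    """Label a new BIT report GO or NO-GO by majority vote of its k nearest past reports.

    history: list of (feature_vector, label) pairs with label "GO" or "NO-GO".
    An odd k avoids ties between the two labels.
    """
    if not 1 <= k <= len(history):
        raise ValueError("k must be between 1 and len(history)")
    nearest = sorted(history, key=lambda item: manhattan(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical features, scaled to [0, 1]: (vibration level, duration of failure)
history = [
    ((0.10, 0.20), "GO"), ((0.20, 0.10), "GO"), ((0.15, 0.25), "GO"),
    ((0.90, 0.80), "NO-GO"), ((0.80, 0.90), "NO-GO"), ((0.85, 0.95), "NO-GO"),
]
print(knn_classify(history, (0.20, 0.20)))   # GO
print(knn_classify(history, (0.80, 0.85)))   # NO-GO
```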

**Time-Stress Measurement Devices**

The relationship between accumulated stresses and equipment failure modes has long been recognized. Currently, however, the two are not correlated in practice: when a failure occurs, only the effect of the stresses on the equipment (the failure itself) is captured, not the actual stress conditions to which the equipment was exposed and which may have precipitated the failure. The reason is that the ability to measure these stresses has until now been limited to the placement of a discrete transducer at the point of interest. Advances in sensor fabrication and integration, coupled with general increases in computational density, now allow a complete stress measurement and recording system to be fabricated in under two square inches. Developing such a system is the purpose of research undertaken by RADC as part of its Time Stress Measurement Device work.

Stresses affecting electronic equipment can include thermal, vibration, shock and electromagnetic signals. These stresses can manifest themselves in forms ranging from simple discrete events to the cumulative effect of many events over a period of time. It is therefore important to be selective about which stress characteristics to measure and which data to store for future use. This need for a measurement capability led to the construction of a TSMD module, about the size of a paperback novel, suitable for a flight data collection program. Stresses measured include temperature, vibration/shock and prime power quality. Data collected from both A-7 and A-10 aircraft is now being analyzed and correlated with relevant maintenance actions. A critical part of this effort was determining appropriate methods of data compression, since it proved impractical to store a complete electronic "strip-chart" of each sensor output. Accordingly, many parameters are characterized either by cumulative time above a threshold or by the number of excursions above a threshold.
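The two compression measures named above can be computed in a single pass over a sampled trace. The sketch below assumes uniformly spaced samples and defines an excursion as a run of consecutive samples above the threshold. The function name and the example temperature trace are hypothetical.

```python
def compress(samples, threshold, dt):
    """Reduce a uniformly sampled stress trace to two numbers.

    Returns (cumulative time above threshold, number of excursions above threshold),
    where an excursion is a run of consecutive samples above the threshold.
    """
    time_above = 0.0
    excursions = 0
    above = False
    for x in samples:
        if x > threshold:
            time_above += dt
            if not above:          # rising crossing starts a new excursion
                excursions += 1
            above = True
        else:
            above = False
    return time_above, excursions


# Hypothetical temperature trace (deg C) sampled every 0.5 s, threshold 30 deg C
trace = [20, 25, 31, 33, 28, 35, 36, 37, 29]
print(compress(trace, threshold=30, dt=0.5))   # (2.5, 2)
```

A noisy signal hovering near the threshold would register many spurious excursions. A common remedy, not specified in the source, is hysteresis: use separate thresholds for entering and leaving an excursion.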

Following the successful development and testing of the TSMD module, a micro-TSMD package is now being developed to incorporate the capabilities of the module in the aforementioned two-square-inch hybrid chip. Currently at the advanced-development-model stage, this implementation will be suitable for mounting on cards in line-replaceable units or modules. Full-scale development of a qualified micro-TSMD will begin in late FY89, with availability projected for FY91. The first insertion of the micro-TSMD into an operational system will occur as part of a Warner Robins Air Logistics Center Microcircuit Technology in Logistics Application (MITTA) project related to automated identification technology for printed circuit boards.

**Technology Insertion**

It still remains for Smart BIT and TSMD technologies to be integrated and then inserted into a fielded system. Steps toward that end include additional research into the integration of the various technical components and a change to the overall maintenance concept with regard to the management of maintenance-related data. Central to this entire process is the recording of environmental, diagnostic and logistics data local to the level at which units are removed from the operational platform, whether boxes, Line Replaceable Units, Line Replaceable Modules or other similar elements. TSMD technology currently has the capability to store some logistics data along with a compressed record of stress data. An audit trail of the diagnostic process performed by Smart BIT should also be stored at that level, memory constraints permitting. Maintenance actions involving the removed unit would then be able to access this data. ATE could be programmed to incorporate the TSMD and Smart BIT data into its own diagnostic process, and the logistics data could then be updated accordingly.
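One way to picture the unit-resident record is as a small serializable structure that must fit a fixed memory budget. The sketch below is an assumption throughout: the `UnitRecord` fields, JSON encoding and byte budget are illustrative, not a defined TSMD or ATE format. Logistics and compressed stress data are treated as mandatory. The Smart BIT audit trail is trimmed oldest-first when memory is short, which reflects the "depending on memory constraints" qualification in the text.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class UnitRecord:
    """Data carried in a removable unit's non-volatile memory (illustrative fields)."""
    serial: str
    logistics: dict                    # e.g. part number, removal history
    stress_summary: dict               # compressed TSMD data
    audit_trail: list = field(default_factory=list)   # Smart BIT decision steps, oldest first

    def to_bytes(self, budget: int) -> bytes:
        """Serialize to JSON, dropping the oldest audit-trail entries until it fits."""
        trail = list(self.audit_trail)
        while True:
            payload = {**asdict(self), "audit_trail": trail}
            data = json.dumps(payload, sort_keys=True).encode("utf-8")
            if len(data) <= budget:
                return data
            if not trail:
                raise ValueError("logistics and stress data alone exceed the budget")
            trail.pop(0)


unit = UnitRecord("SN-001", {"part": "A1"}, {"t_above": 2.5},
                  [f"entry {i}" for i in range(20)])
stored = json.loads(unit.to_bytes(budget=200))
print(len(stored["audit_trail"]), stored["audit_trail"][-1])   # 8 entry 19
```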

Research is now underway to define the degree to which Smart BIT and TSMD need to share information and to identify the pertinent characteristics of the data to be retained in memory. It is apparent that stress data should be available at three levels of temporal resolution: uncompressed in the temporal vicinity of a possible failure, compressed for the duration of a mission, and statistically characterized for all similar equipment and missions. The most appropriate means of converting realtime sensor data into these time-compressed formats remains to be determined. Additionally, a proper path will be identified for transitioning the integrated technology from the laboratory into the field. The distribution and sensitivity of the stress sensing elements need to be defined, and methods of increasing BIT processing capabilities to perform additional reasoning functions determined. Not only is it impractical to place a TSMD module in every circuit and a LISP machine in every BIT equipment, it isn't necessary. The relationship between a component or board in question and a remotely mounted sensor can be calculated analytically, and the reasoning software can be written in a variety of languages. The development of powerful symbolic processors and higher-density memories will provide the required computational capabilities. The more promising approaches being considered involve upgrading a specific card/module in a particular system. This would have minimal impact on the rest of the system, and the upgrade itself could provide the electronic real estate needed for the new functions.

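The third level of resolution, statistical characterization across similar equipment and missions, can be maintained incrementally without storing every mission's data. The sketch below uses Welford's online algorithm for the running mean and sample standard deviation of one per-mission stress metric. It then expresses a new mission's value as a z-score against the fleet. The choice of metric and of the z-score as the comparison is an assumption, since the text leaves the format open.

```python
import math


class FleetStatistics:
    """Running mean and sample standard deviation of a per-mission stress metric,
    pooled across similar equipment and missions (Welford's online algorithm)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0          # sum of squared deviations from the running mean

    def add(self, value: float) -> None:
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (value - self.mean)

    def std(self) -> float:
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0

    def z_score(self, value: float) -> float:
        """How unusual one mission's value is relative to the fleet."""
        s = self.std()
        return (value - self.mean) / s if s > 0 else 0.0


# Hypothetical metric: minutes above a vibration threshold per mission
fleet = FleetStatistics()
for minutes in [10, 12, 11, 13, 9]:
    fleet.add(minutes)
print(fleet.mean, round(fleet.std(), 3), round(fleet.z_score(20), 2))   # 11.0 1.581 5.69
```
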
Further Development

The benefits of Smart BIT can best be understood from the perspective of the overall maintenance and diagnostics process. A typical maintenance scenario involving full integration of Smart BIT and TSMD capabilities will now be described.

During a mission the TSMD portion continually records stress profiles in a wraparound fashion, replacing old data with more recent measurements. The older values are data-compressed and stored in long-term memory. The TSMD will also detect specific stress profiles that could damage equipment and note their occurrence in long-term memory. When stress data is needed by either Smart BIT or maintenance equipment, it is retrieved from long-term memory. When a failure condition is detected by Smart BIT, the TSMD is asked to return relevant stress data. Depending on the criticality of the system to the mission and to flight safety, Smart BIT will continue to analyze both the functional test data stream and the TSMD output. If necessary, this process may be performed offline while a spare unit is switched in place of the one in question. If a decision is made to declare a unit faulty, information relevant to that decision will be stored in that unit's local non-volatile memory for later access by other maintenance processes. At the flight line, units identified as possibly faulty will make themselves known either to maintenance personnel or directly to ATE. Maintenance at this stage will remain primarily a remove-and-replace process, except for simple procedures such as tightening, aligning or connector replacement. The critical distinction is that the removed units carry relevant diagnostic and logistics data with them. During subsequent maintenance actions on those units, ATE software will request from the non-volatile memory the information placed there by the TSMD, Smart BIT and previous maintenance cycles. The ATE has no other means of obtaining this data, which will be used to discriminate among fault isolation choices, guide the direction of diagnosis, or suggest candidates when no failure is indicated by the ATE functional tests.
Periodically, information collected from many maintenance actions is analyzed, and revisions and data updates are forwarded to the field for loading into the Smart BIT and TSMD software, or modifications are made to the logistics data stored within the unit under test itself.
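The wraparound recording described in this scenario can be sketched as a fixed-size buffer. In the sketch, the oldest sample is folded into compressed long-term storage just before it is overwritten. The compression format (min, max and mean per block of evicted samples), the `damage_level` test for potentially damaging stress and all parameter values are hypothetical choices for illustration, not the TSMD design.

```python
from collections import deque


class StressRecorder:
    """Wraparound buffer of recent raw samples; evicted samples are compressed
    into (min, max, mean) block summaries kept in long-term memory."""

    def __init__(self, window: int, block: int, damage_level: float) -> None:
        self.recent = deque(maxlen=window)   # raw data near a possible failure
        self.block = block
        self.damage_level = damage_level
        self.long_term = []                  # compressed (min, max, mean) per block
        self.damage_events = []              # sample indices of potentially damaging stress
        self._pending = []                   # evicted samples not yet summarized
        self._count = 0

    def record(self, x: float) -> None:
        if len(self.recent) == self.recent.maxlen:
            self._compress(self.recent[0])   # oldest sample is about to be overwritten
        self.recent.append(x)
        if x >= self.damage_level:
            self.damage_events.append(self._count)
        self._count += 1

    def _compress(self, x: float) -> None:
        self._pending.append(x)
        if len(self._pending) == self.block:
            p = self._pending
            self.long_term.append((min(p), max(p), sum(p) / len(p)))
            self._pending = []

    def snapshot(self) -> list:
        """Uncompressed recent history, e.g. on request when Smart BIT flags a failure."""
        return list(self.recent)


recorder = StressRecorder(window=3, block=2, damage_level=40)
for sample in [31, 32, 20, 45, 33, 25, 10]:
    recorder.record(sample)
print(recorder.snapshot())       # [33, 25, 10]
print(recorder.long_term)        # [(31, 32, 31.5), (20, 45, 32.5)]
print(recorder.damage_events)    # [3]
```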

Conclusions

The future direction for maintenance points toward a greater role for on-board testing and greater data interchange between stages of maintenance. The first can be provided by the inclusion of Smart BIT capabilities, which in turn will generate new and more confident data regarding the failure or operational status of the equipment being tested by the BIT. This alone will reduce the number of healthy units unnecessarily removed from a platform. In conjunction with an embedded logistics data tracking system, it can also reduce the burden on already scarce maintenance resources. Only with a streamlined maintenance process, in which every action is necessary and resources are used with optimal efficiency, can the R&M goals of greater availability and reduced logistics support costs be met.
