Prognostics of Power MOSFET

José R. Celaya, Abhinav Saxena
SGT Inc. at NASA Ames Research Center
Prognostics Center of Excellence
Moffett Field, CA, 94035
Email: {jose.r.celaya, abhinav.saxena}@nasa.gov

Sankalita Saha
MCT at NASA Ames Research Center
Prognostics Center of Excellence
Moffett Field, CA, 94035

Vladislav Vashchenko
SGT Inc.
Moffett Field, CA, 94035

Kai Goebel
NASA Ames Research Center
Prognostics Center of Excellence
Moffett Field, CA, 94035

Abstract—This paper demonstrates how to apply prognostics to power MOSFETs (metal-oxide-semiconductor field-effect transistors). The methodology uses thermal cycling to age devices and Gaussian process regression to perform prognostics. The approach is validated with experiments on 100V power MOSFETs. The failure mechanism under the applied stress conditions is determined to be die-attachment degradation. The change in ON-state resistance is used as a precursor of failure because of its dependence on junction temperature. The experimental data is augmented with a finite element analysis simulation based on a two-transistor model. The simulation assists in the interpretation of the degradation phenomena and of the change in SOA (safe operating area).

I. INTRODUCTION

Prognostics is an engineering discipline focused on predicting the time at which an in-service component will fail or no longer perform its intended function. Predictions are made in-situ on individual in-service components, in contrast to statistical reliability methods, which mostly produce a priori life estimates. The science of prognostics is based on the analysis of failure modes and on the detection of early signs of wear, aging, and fault conditions. These signs are then correlated with a damage propagation model and suitable prediction algorithms to arrive at a remaining useful life (RUL) estimate. The discipline that links studies of failure mechanisms to system lifecycle management is often referred to as prognostics and health management (PHM). PHM techniques have recently received considerable attention, for example in the aerospace domain, where the assessment of the in-situ health of components and subsystems enables safe operations. Although the emphasis in PHM has so far been on mechanical components, the ability to assess the health of electronic components becomes essential as more safety-critical functionality is assumed by electronics. To that end, an in-depth understanding of aging mechanisms and their manifestations is vital. The work reported here contributes to this undertaking.

In this paper, a prognostics technique for power MOSFETs based on an accelerated aging methodology is presented. The methodology utilizes thermal and power cycling and was validated with tests on 100V power MOSFET devices. The major failure mechanism under these stress conditions is die-attachment degradation, which is typical for discrete devices with lead-free solder die attachment. Because the ON-state resistance depends on junction temperature, its change was identified as a precursor of failure for the die-attach failure mechanism under these stress conditions. This particular degradation process is shown to exhibit characteristics to which data-driven prognostics algorithms can be applied. The experimental data is supported by a finite element analysis simulation that assumes a two-transistor model, and the results are used to interpret the phenomena of device degradation and SOA change. A Gaussian process regression framework is used to predict the time to failure. The features used in the algorithm are based on the normalized ON-resistance computed from in-situ measurements of the electro-thermal response. Results are presented from experiments on the IRF520Npbf power MOSFET in a TO-220 package. This component was chosen mainly because of its common use in switched-mode power supplies in aerospace systems such as radars and navigation equipment.

A. Related work

In [1], a model-based prognostics approach for discrete IGBTs was presented. RUL prediction was accomplished with a particle filter algorithm, using the collector-emitter leakage current as the primary precursor of failure. A prognostics approach for power MOSFETs was presented in [2]. There, the threshold voltage was used as a precursor of failure, and a particle filter was used in conjunction with an empirical degradation model based on accelerated life test data.

This work was sponsored by NASA Aviation Safety Program, IVHM Project.
Identification of parameters that act as precursors of failure for discrete power MOSFETs and IGBTs has received considerable attention in recent years. Several studies have focused on precursor-of-failure parameters for discrete IGBTs under thermal degradation due to power cycling overstress. In [3], the collector-emitter voltage was identified as a health indicator; in [4], the maximum peak of the collector-emitter ringing at the turn-off transient was identified as the degradation variable; in [5], the switching turn-off time was recognized as a failure precursor; and in [6], switching ringing was used to characterize degradation. For discrete power MOSFETs, the ON-resistance was identified as a precursor of failure for the die-solder degradation failure mechanism [7][8]. A shift in threshold voltage was identified as a failure precursor for the gate-structure degradation fault mode [2][9].

There have been some efforts to develop degradation models, as a function of usage/aging time, based on accelerated life tests. For example, empirical degradation models for model-based prognostics are presented in [1] and [2] for discrete IGBTs and power MOSFETs, respectively. Modeling of gate-structure degradation in discrete power MOSFETs due to ionic impurities was presented in [10].

II. ACCELERATED AGING EXPERIMENTS

Accelerated aging approaches provide a number of opportunities for the development of physics-based prognostics models for electronic components and systems. In particular, they allow reliability to be assessed in considerably less time than long-term reliability tests require. The development of prognostics algorithms faces some of the same constraints as reliability engineering, in that both need information about failure events of critical electronic systems, and such data are rarely available. In addition, prognostics requires information about the degradation process leading to an irreversible failure; therefore, it is necessary to record in-situ measurements of key output variables and observable parameters during the accelerated aging process in order to develop and learn failure progression models.

Thermal cycling overstress leads to thermo-mechanical stresses in electronics due to the mismatch in the coefficients of thermal expansion between the different elements of the component's packaged structure. The accelerated aging applied to the devices presented in this work consists of thermal overstress. Latch-up, thermal runaway, or failure to turn ON due to loss of gate control are considered failure conditions. Thermal cycles were induced by power cycling the devices without an external heat sink. The device case temperature was measured and used directly as the control variable for the thermal cycling. For power cycling, the applied gate voltage was a square wave with an amplitude of approximately 15 V, a frequency of 1 kHz and a duty cycle of 40%. The drain-source was biased at 4 V dc, and a resistive load of 0.2 Ω was used on the drain side of the device. The aging system used for these experiments is described in detail in [4], and the accelerated aging methodology in [8].
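The case-temperature control loop described above can be sketched as a hysteresis (bang-bang) controller: the 40% duty-cycle gate drive is enabled until the case reaches an upper setpoint and then disabled until the case cools to a lower one. This is a minimal simulation under hypothetical assumptions: the first-order case-to-ambient thermal model, its resistance and time constant, the dissipated power `p_on_w` and the setpoints are illustrative values, not parameters of the aging system in [4]. Because the 1 kHz switching is much faster than the thermal time constant, the gate signal is represented only by its duty-cycle-averaged power.

```python
from dataclasses import dataclass

GATE_DUTY = 0.40  # 40% duty cycle of the ~15 V, 1 kHz gate square wave


@dataclass
class CaseThermalModel:
    """Hypothetical first-order lumped model of the case temperature (no heat sink)."""
    t_ambient: float = 25.0      # degC
    r_th_case_amb: float = 60.0  # degC/W, assumed case-to-ambient thermal resistance
    tau_s: float = 60.0          # s, assumed thermal time constant
    temp: float = 25.0           # degC, current case temperature

    def step(self, power_w: float, dt: float) -> float:
        # Forward-Euler step of dT/dt = (T_steady - T) / tau.
        t_steady = self.t_ambient + self.r_th_case_amb * power_w
        self.temp += (t_steady - self.temp) * dt / self.tau_s
        return self.temp


def thermal_cycle(model, p_on_w, t_low, t_high, duration_s, dt=0.1):
    """Hysteresis control of the case temperature between t_low and t_high.

    p_on_w is the (hypothetical) power dissipated while the MOSFET conducts; the
    1 kHz switching is averaged out, so the heating power is p_on_w * GATE_DUTY.
    Returns the case-temperature trace and the number of heating cycles completed.
    """
    gate_enabled = True
    cycles = 0
    trace = []
    for _ in range(int(duration_s / dt)):
        power = p_on_w * GATE_DUTY if gate_enabled else 0.0
        temp = model.step(power, dt)
        if gate_enabled and temp >= t_high:
            gate_enabled = False  # stop power cycling and let the case cool
            cycles += 1
        elif not gate_enabled and temp <= t_low:
            gate_enabled = True   # resume power cycling
        trace.append(temp)
    return trace, cycles


model = CaseThermalModel()
trace, cycles = thermal_cycle(model, p_on_w=10.0, t_low=100.0, t_high=130.0,
                              duration_s=1800.0)
print(f"{cycles} cycles, peak case temperature {max(trace):.1f} degC")
```

Using the measured case temperature, rather than an estimated junction temperature, as the switching criterion keeps the controller independent of the die-attach condition that is being degraded.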

Figure 1 shows an X-ray image of the device after degradation. The die solder has migrated and voids have formed. This confirms that the junction-to-case thermal resistance increased during the stress time, resulting in an increase of the junction temperature and of the ON-resistance. Figure 2 presents the measured $R_{DS(ON)}$ as a function of case temperature for several consecutive aging tests on the same device. In each test run, the device temperature is increased from room temperature to a high temperature setting, which makes it possible to characterize $R_{DS(ON)}$ as a function of case temperature at different degradation stages. The curve shifts as aging time increases, which indicates an increased junction temperature due to poor heat dissipation and hence a degraded die-attach.
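The mechanism behind the curve shift in Figure 2 can be illustrated with a steady-state electro-thermal balance. Assuming (hypothetically) that $R_{DS(ON)}$ grows linearly with junction temperature, $R_{DS(ON)}(T_j) = R_{25}[1 + \alpha (T_j - 25)]$, and that $T_j = T_c + R_{\theta JC} I_D^2 R_{DS(ON)}(T_j)$, the junction temperature has the closed form $T_j = \frac{T_c + g(1 - 25\alpha)}{1 - g\alpha}$ with $g = R_{\theta JC} I_D^2 R_{25}$. A larger junction-to-case thermal resistance $R_{\theta JC}$, as produced by die-attach voids, therefore raises $R_{DS(ON)}$ at every case temperature. The linear model and the values of $R_{25}$, $\alpha$, $I_D$ and $R_{\theta JC}$ below are illustrative assumptions, not measured values.

```python
def junction_temperature(t_case, i_d, r_th_jc, r25=0.2, alpha=0.007):
    """Steady-state junction temperature (degC).

    Solves T_j = t_case + r_th_jc * i_d**2 * R(T_j) in closed form, assuming the
    hypothetical linear model R(T) = r25 * (1 + alpha * (T - 25)).
    """
    g = r_th_jc * i_d ** 2 * r25  # temperature rise when R(T_j) equals r25
    denom = 1.0 - g * alpha
    if denom <= 0.0:
        # Loop gain of the electro-thermal feedback is >= 1: no finite operating point.
        raise ValueError("electro-thermal runaway: no stable operating point")
    return (t_case + g * (1.0 - 25.0 * alpha)) / denom


def rds_on(t_case, i_d, r_th_jc, r25=0.2, alpha=0.007):
    """ON-resistance (ohm) observed at a given case temperature."""
    t_j = junction_temperature(t_case, i_d, r_th_jc, r25, alpha)
    return r25 * (1.0 + alpha * (t_j - 25.0))


# Same case-temperature sweep for a healthy and a degraded die attach
# (both junction-to-case thermal resistances, in degC/W, are hypothetical).
for t_c in (25.0, 75.0, 125.0):
    print(t_c, round(rds_on(t_c, 2.0, 3.1), 4), round(rds_on(t_c, 2.0, 10.0), 4))
```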

A. Mixed-mode simulation

Numerical analysis was performed using a finite element model (FEM) representation of the device under consideration (Figure 3). This analysis provided I-V characteristics at different values of gate bias $V_{gs}$ for a device whose generic simulation parameters roughly match those of the tested devices.

The electrical response was obtained with a mixed-mode circuit-device simulation using the DECIMM™ software from Angstrom Designs Automation [11][12]. The mixed-mode circuit presented in Figure 4 is simulated in conjunction with the FEM of the MOSFETs. It was implemented both with a single transistor and with two transistors, as shown in Figure 4; results for both models are discussed below. A voltage-controlled voltage source circuit was used to bias the gate voltage automatically (auto bias), which prevents the device from operating outside the SOA.

The electro-thermal SOA obtained from the single-transistor mixed-mode simulation is presented in Figure 5 for two conditions: a) a slow transient pulse with constant gate bias; and b) a slow transient pulse with the auto-bias circuit. The observed instability points represent the critical voltages and currents that limit the SOA in the electrical regime.

B. Two-transistor degradation model

The two-transistor model physically represents a device with partial-area die-attachment degradation. The first transistor has the original default parameters, including the thermal resistance $R_{T1}$, and an area factor of 90%, while the second transistor represents the degradation due to electro-thermal stress: it occupies the remaining 10% of the area and has its thermal resistance scaled by the coefficient $K$, i.e. $R_{T2}=K \times R_{T1}$ (Figure 4). As the simulation results in Figure 6 show, even a small deviation in the thermal resistance of the second transistor results in a significant reduction of the critical voltage under auto-bias conditions.
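Why a small, poorly attached region can limit the whole device can be illustrated with a toy lumped model. This is not the FEM mixed-mode simulation behind Figure 6, and it uses constant gate bias rather than auto bias. Each section is assumed to be thermally independent, with a current density that rises with temperature as $j(T) = j_0 e^{\beta (T - T_a)}$ (operation below the zero-temperature-coefficient point) and a temperature rise $T - T_a = r V j(T)$, where $r$ is an area-specific thermal resistance. A steady state exists only if $r V j_0 \beta \le 1/e$, so the critical voltage of a section is $V_{crit} = 1/(e \beta j_0 r)$, and a section with $r_2 = K r_1$ loses stability at $V_{crit}/K$. The parameters $j_0$, $\beta$ and $T_a$ are hypothetical normalized values.

```python
import math


def critical_voltage(r_th, j0=1.0, beta=0.02):
    """Largest V for which T - T_a = r_th * V * j0 * exp(beta * (T - T_a)) has a solution.

    With x = T - T_a and c = r_th * V * j0, x = c * exp(beta * x) is solvable
    iff c * beta <= 1/e, i.e. V <= 1 / (e * beta * j0 * r_th).
    """
    return 1.0 / (math.e * beta * j0 * r_th)


def has_stable_point(v, r_th, j0=1.0, beta=0.02, t_amb=25.0, iters=2000):
    """Check the same condition numerically by fixed-point iteration from T_a."""
    t = t_amb
    for _ in range(iters):
        t_new = t_amb + r_th * v * j0 * math.exp(beta * (t - t_amb))
        if t_new > 1e4:  # temperature diverging: thermal runaway
            return False
        if abs(t_new - t) < 1e-9:
            return True
        t = t_new
    return False


def device_critical_voltage(r_t1, k, j0=1.0, beta=0.02):
    """Two thermally independent sections; the one that runs away first sets the limit."""
    return min(critical_voltage(r_t1, j0, beta), critical_voltage(k * r_t1, j0, beta))


v_healthy = device_critical_voltage(1.0, 1.0)
for k in (1.2, 1.5, 2.0):
    print(k, round(device_critical_voltage(1.0, k) / v_healthy, 3))  # equals 1/k here
```

In this toy the critical voltage falls as $1/K$ regardless of how small the degraded area is, which mirrors the qualitative observation that the degraded section, not the healthy majority, sets the stability limit.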

IV. PREDICTION OF REMAINING USEFUL LIFE

Gaussian Process Regression (GPR) is a data-driven technique that can be used to estimate future fault degradation from training data collected from measurements. First, a prior distribution, which may be derived from domain knowledge, is assumed for the underlying process function [13]. This prior is then tuned to fit the available measurements, and the resulting probabilistic model is used for regression over the training points [14]. The output is a mean function describing the behavior and a covariance function describing the uncertainty. These functions can then be used to predict a mean value and the corresponding variance at a given future point of interest. The behavior of a dynamic process is captured in the covariance function chosen for the Gaussian process, and the covariance structure also incorporates prior beliefs about the underlying system noise. A covariance function consists of various hyper-parameters that define its properties, and proper tuning of these hyper-parameters is key to good performance. While a user typically needs to specify the type of covariance function, the corresponding hyper-parameters can be learned from training data by maximizing the marginal likelihood of the observed data with respect to the hyper-parameters, using a gradient-based (or other) optimization method [14].
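The regression step can be sketched in a few lines of NumPy. The paper does not state which covariance function or optimizer was used, so this is a minimal sketch under stated assumptions: a zero prior mean, a squared-exponential covariance plus a linear term (so that a degradation trend can be extrapolated), and an exhaustive grid search that maximizes the log marginal likelihood in place of a gradient-based optimizer. All function names and grid values are hypothetical.

```python
import numpy as np


def kernel(x1, x2, length, signal_var, lin_var):
    """Squared-exponential covariance plus a linear term (assumed, not from the paper)."""
    d = x1[:, None] - x2[None, :]
    return signal_var * np.exp(-0.5 * (d / length) ** 2) + lin_var * np.outer(x1, x2)


def _cholesky_solve(x, y, length, signal_var, lin_var, noise_var):
    k = kernel(x, x, length, signal_var, lin_var) + noise_var * np.eye(len(x))
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))  # K^{-1} y
    return chol, alpha


def log_marginal_likelihood(x, y, length, signal_var, lin_var, noise_var):
    chol, alpha = _cholesky_solve(x, y, length, signal_var, lin_var, noise_var)
    # log p(y) = -y^T K^{-1} y / 2 - log|K| / 2 - n log(2 pi) / 2, log|K| = 2 sum log diag(L)
    return (-0.5 * y @ alpha - np.sum(np.log(np.diag(chol)))
            - 0.5 * len(x) * np.log(2.0 * np.pi))


def fit_hyperparameters(x, y, lengths, signal_vars, lin_vars, noise_vars):
    """Grid search maximizing the marginal likelihood (stand-in for gradient ascent)."""
    grid = [(l, s, b, n) for l in lengths for s in signal_vars
            for b in lin_vars for n in noise_vars]
    return max(grid, key=lambda p: log_marginal_likelihood(x, y, *p))


def gp_predict(x_train, y_train, x_test, length, signal_var, lin_var, noise_var):
    """Posterior mean and variance of the latent degradation curve at x_test."""
    chol, alpha = _cholesky_solve(x_train, y_train, length, signal_var, lin_var, noise_var)
    k_star = kernel(x_train, x_test, length, signal_var, lin_var)
    mean = k_star.T @ alpha
    v = np.linalg.solve(chol, k_star)
    prior_var = signal_var + lin_var * x_test ** 2
    var = np.maximum(prior_var - np.sum(v ** 2, axis=0), 0.0)  # clip round-off
    return mean, var


# Synthetic, normalized example (not data from the experiments).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)                 # normalized aging time
y = 0.5 * x + 0.01 * rng.standard_normal(20)  # synthetic degradation feature
params = fit_hyperparameters(x, y, [0.1, 0.3, 1.0], [1e-4, 1e-2],
                             [0.1, 1.0, 10.0], [1e-4, 1e-3])
mean, var = gp_predict(x, y, np.array([0.5, 1.2]), *params)
```

The Cholesky factorization is the $O(n^3)$ step; rescaling time to a range near $[0, 1]$ before fitting keeps the grid values on a sensible scale.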

In the application here, the ON-resistance was computed as the ratio of the drain-source voltage to the drain-source current of the device. By estimating the relationship between operating temperature and ON-resistance, the computed ON-resistance was normalized to eliminate temperature effects. The signal was then filtered by computing its mean over each one-minute window. Since the computational complexity of GPR is $O(n^3)$ in the number of training points $n$, the computational effort grows quickly with the amount of data, and it is important to keep the number of training points low. Therefore, the curve was sampled uniformly to select the desired number of training points for training the GPR and making predictions. This process was repeated 35 times and the results were aggregated to produce the final prediction. As shown in Figure 7, predictions were made at four (somewhat arbitrarily chosen) time instances: 160, 180, 200, and 220 minutes into aging. Subtracting the time at which a prediction was made from the time at which the corresponding dashed line crosses the failure threshold gives the estimated remaining useful life. As more data become available, the predictions become more accurate (the predicted dashed lines cross the failure threshold closer to where the measured ON-resistance does) and more precise (the uncertainty cones are narrower for later predictions).
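The feature-extraction and RUL steps described above can be sketched as follows. The sketch assumes a linear $R_{DS(ON)}$-temperature relationship fitted on a reference (for example, early-life) portion of the data and divides it out, computes non-overlapping one-minute means, subsamples uniformly, and reads the RUL off the first threshold crossing of a predicted mean trajectory. The linear temperature model, the choice of reference samples, and all function names are assumptions for illustration; how the 35 repetitions were aggregated is not specified and is not shown.

```python
import numpy as np


def on_resistance(v_ds, i_ds):
    """ON-resistance as the ratio of drain-source voltage to drain-source current."""
    return np.asarray(v_ds, dtype=float) / np.asarray(i_ds, dtype=float)


def temperature_normalize(r_on, temp, ref_mask):
    """Divide out a linear R_on(T) relation fitted on reference samples (assumed model)."""
    slope, intercept = np.polyfit(temp[ref_mask], r_on[ref_mask], 1)
    return r_on / (intercept + slope * temp)


def window_mean(t_s, x, window_s=60.0):
    """Mean of x over consecutive non-overlapping windows (one minute by default).

    Returns the start time of each non-empty window and the window means.
    """
    idx = np.floor((t_s - t_s[0]) / window_s).astype(int)
    counts = np.bincount(idx)
    sums = np.bincount(idx, weights=x)
    keep = counts > 0
    starts = t_s[0] + np.arange(len(counts)) * window_s
    return starts[keep], sums[keep] / counts[keep]


def uniform_subsample(t, x, n):
    """Pick about n evenly spaced points to keep the GPR training set small."""
    idx = np.unique(np.linspace(0, len(t) - 1, n).round().astype(int))
    return t[idx], x[idx]


def remaining_useful_life(t_future, predicted, threshold, t_now):
    """First time the predicted curve reaches the failure threshold, minus t_now.

    Returns None if the threshold is not reached within t_future.
    """
    hits = np.nonzero(predicted >= threshold)[0]
    if hits.size == 0:
        return None
    return t_future[hits[0]] - t_now
```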

![RDS(on) Prediction](image)

Figure 7. Prediction of RUL for an aged device using the Gaussian process regression technique.

V. DISCUSSION

This paper reports preliminary work that serves as a case study on the prediction of the remaining life of power MOSFETs. Several strong assumptions need to be challenged in order to make the proposed process practical. For instance, the future operational conditions and loading of the device are assumed to remain constant at the same levels as the loads and conditions used during accelerated aging. In addition, the algorithm was developed using accelerated life test data, whereas in a real-world implementation the degradation process of the device would occur over a considerably longer time scale. These issues are topics of future work.

The proposed two-transistor model is shown to be a good candidate for a degradation model for model-based prognostics. The model parameters $K$ and $W_1$ could be varied as the device degrades, as a function of usage time, loading and environmental conditions. Parameter $W_1$ defines the area of the healthy transistor: the lower this area, the larger the degradation in the two-transistor model. Parameter $K$ serves as a scaling factor for the thermal resistance of the degraded transistor: the larger this factor, the larger the degradation in the model.
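One way to collapse $W_1$ and $K$ into a single health quantity, assuming the healthy and degraded sections act as parallel heat paths with area-specific thermal resistances $R_{T1}$ and $K R_{T1}$, is the effective specific thermal resistance $R_{eff} = R_{T1} / \left(W_1 + (1 - W_1)/K\right)$. For $K > 1$ it increases as $W_1$ decreases or as $K$ increases, matching the trends described above. This parallel-path reduction is an assumption of this sketch, not part of the original model.

```python
def effective_thermal_resistance(r_t1, w1, k):
    """Area-specific thermal resistance of the two-transistor die attach.

    Assumes the healthy section (area fraction w1, resistance r_t1) and the degraded
    section (area fraction 1 - w1, resistance k * r_t1) conduct heat in parallel.
    """
    if not 0.0 < w1 <= 1.0 or k <= 0.0:
        raise ValueError("require 0 < w1 <= 1 and k > 0")
    return r_t1 / (w1 + (1.0 - w1) / k)


# Degradation index relative to a pristine die attach (w1 = 1): >1 means worse heat removal.
for w1, k in [(1.0, 1.0), (0.9, 2.0), (0.9, 5.0), (0.7, 5.0)]:
    print(w1, k, round(effective_thermal_resistance(1.0, w1, k), 3))
```

Tracking such an index over usage time would give a single monotonic state variable for a model-based prognostics filter, at the cost of no longer distinguishing a small, badly degraded region from a larger, mildly degraded one.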

REFERENCES


