Based Re-programmable

An SRAM (static random access memory)-based re-programmable FPGA (field programmable gate array) is investigated for space applications. A new commercial prototype, named the RS family, was used as an example for the investigation. The device is fabricated in a 0.25 µm CMOS technology. Its architecture is reviewed to provide a better understanding of the impact of single event upset (SEU) on the device during operation. The SEU effect of the different memories available on the device is evaluated. Heavy ion test data and SPICE simulations are used together to extract the threshold LET (linear energy transfer). Together with the saturation cross section measured from the layout, a rate prediction is made for each memory type. SEU in the configuration SRAM is identified as the dominant failure mode and is discussed in detail. The single event transient error in combinational logic is also investigated and simulated by SPICE. SEU mitigation by hardening the memories and by employing EDAC (error detection and correction) at the device level is presented. For the configuration SRAM cell, the trade-off between the resistor-decoupling and redundancy hardening techniques is investigated, with interesting results.

With regard to ionizing radiation effects, the measured increase in static leakage current (static Icc) indicates a device tolerance of approximately 50 krad(Si).

I. INTRODUCTION
The antifuse FPGA has gained significant visibility in recent years as the programmable logic device choice for space applications.
The antifuse switch is nonvolatile and insensitive to both single event and total dose effects. Compared to mask-wired ASICs (application specific integrated circuits), it has advantages in turnaround time, flexibility, and (hardware) cost per design, while maintaining the same radiation performance.
However, there has been tremendous interest in re-programmable FPGAs for the potential realization of a "re-programmable satellite" in the future. There are currently two re-programmable FPGA technologies on the market: one uses a FLASH/EEPROM (electrically erasable programmable read only memory) configuration switch and the other an SRAM switch. This paper focuses on the SRAM-based technology only. The development and radiation effects of FLASH-based FPGAs will be published elsewhere (see reference [1]).
Intrinsically, the SRAM-based FPGA is very sensitive to single events. The primary concern is the SEU of the SRAM configuration bits. Unlike the aforementioned nonvolatile switches, the volatile SRAM switch is prone to SEU. Figure 1 shows a simplified schematic of the SRAM-controlled switch. Each design is implemented in an FPGA by configuring many switches like this one. CSRAM is used in this paper as the acronym for this configuration-SRAM cell. There have been serious efforts to develop a viable, radiation-hard SRAM-based FPGA. The most significant recent venture is a NASA/Honeywell/Atmel joint effort using SOI (silicon on insulator) technology to implement an existing low-gate-count (6k) product.
Although its radiation performance is still under development, the proposed approach will not satisfy the need of commercial satellites for cheap, large-gate-count devices.
Many more efforts are still needed to make a re-programmable FPGA meet the performance, cost and radiation requirements for the space market.
This paper presents results of the investigation of an SRAM-based FPGA for its radiation performance and potential enhancement.
A commercial prototype, named the RS product family, is used as the starting point. Both SEE (single event effects) and total ionizing dose (TID) effects are investigated. The focus is on the SEU of memory bits in the device.
The scope of this paper is relatively wide by nature. Because of publishing limitations, only details of the new data and findings are presented. Well-known material is kept short, and the interested reader should consult the references for further details.

II. DEVICE TECHNOLOGY
The Actel product presented in this paper is called RS. It is fabricated in an advanced 0.25 µm CMOS technology at commercial foundries.
Key wafer-fabrication elements [3] relevant to radiation effects are listed in Table 1.

III. DEVICE ARCHITECTURE AND SEU
This section gives a brief introduction of the device architecture.
The main purpose is to identify different types of memory in the device for their SEU ramifications.

A. Architecture Overview
The architecture has three levels of hierarchy [4]. Within each level of hierarchy, the structure is a two-dimensional array. In the device investigated, the top level of the routing hierarchy is called B4 (Figure 2). It is made up of a 4x4 rectangular array of tiles called B16x16, together with I/O (input/output) blocks on the periphery. It has a capacity of approximately 50k gates, counted as gate-array equivalents.
The B16x16 tile itself is the middle level of the hierarchy.
It is composed of a 16x16 array of blocks called B1, served by a two-dimensional mesh routing structure. Each B16x16 tile also includes 9K bits of user SRAM (USRAM) and additional routing resources for accessing it.
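The tile and block counts of the B4 device follow directly from the array sizes given above. The sketch below is illustrative only; it assumes that "9K bits" means 9 x 1024 bits per tile, which the text does not state, and the constant and function names are hypothetical.

```python
# Hypothetical tally of the RS B4 hierarchy described in the text.
TILES_PER_SIDE = 4               # B4 is a 4x4 array of B16x16 tiles
B1_PER_SIDE = 16                 # each B16x16 tile is a 16x16 array of B1 blocks
USRAM_BITS_PER_TILE = 9 * 1024   # assumption: "9K" read as 9 x 1024 bits


def hierarchy_counts() -> dict:
    """Return the number of tiles, B1 blocks, and USRAM bits in one B4 device."""
    tiles = TILES_PER_SIDE ** 2
    b1_blocks = tiles * B1_PER_SIDE ** 2
    usram_bits = tiles * USRAM_BITS_PER_TILE
    return {"tiles": tiles, "b1_blocks": b1_blocks, "usram_bits": usram_bits}


if __name__ == "__main__":
    print(hierarchy_counts())
```

Tallies of this kind are what the per-memory-type device populations in the rate prediction are built from.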
The lowest level of the hierarchy is the B1 block. As shown in Figure 3, B1 has four quads of functional modules and associated interconnect routing resources. Each quad has a trio, composed of two three-input lookup tables (LUT3) and a flip-flop (FF), and a two-input look-up table.

The simplest error correction approach is to reload the configuration states from uncorrupted storage.
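Reloading from uncorrupted storage amounts to comparing (or simply overwriting) the configuration against a golden copy. A minimal sketch, assuming the configuration can be read back as a flat list of bits; `scrub` and its bit-level interface are hypothetical and do not describe the RS programming interface.

```python
from typing import List


def scrub(config: List[int], golden: List[int]) -> List[int]:
    """Rewrite any configuration bit that differs from the golden copy.

    Returns the indices of the bits that were corrected.
    """
    if len(config) != len(golden):
        raise ValueError("configuration and golden copy differ in length")
    corrected = []
    for i, (bit, ref) in enumerate(zip(config, golden)):
        if bit != ref:
            config[i] = ref  # reload this bit from uncorrupted storage
            corrected.append(i)
    return corrected


if __name__ == "__main__":
    golden = [1, 0, 1, 1, 0]
    config = [1, 1, 1, 0, 0]  # two simulated CSRAM upsets
    print(scrub(config, golden), config)
```

Between reloads an upset configuration bit still produces a functional error; periodic reloading bounds how long the error persists rather than preventing it.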
There are potential high-current modes in the device due to CSRAM SEU. For example, when two inverter outputs with different states are erroneously connected because of a CSRAM upset (Figure 4), a static current flows through transistors and interconnects from Vcc to GND. This situation is close to a micro-latch-up; however, the current flows through well-designed transistors, whereas latch-up occurs in parasitic structures.
Permanent damage, for example in the metallization, is therefore unlikely.

D. USRAM SEU
The user SRAM has the same design as the CSRAM, except that extra transistor logic is added for fast read and write. Its SEU sensitivity is also the same. However, the SEU ramification is entirely different: an upset of a USRAM bit can only cause a soft data error.
The functionality of the device is still intact.
However, there is a potential latent functional failure mode. When the USRAM is used to store a configuration state, an SEU will induce a functional failure at a later time, when the erroneous configuration datum is loaded into the CSRAM.

E. User FF SEU
The user FF in the logic module (see Figure 3) is an edge-triggered master-slave design with preset. It also has logic for controlling programming and testing. An SEU in the user FF will only cause a soft data error.

F. Control Logic SEU
In-flight programming will expose all shift registers in the control logic to SEU, but the programming period is short compared to the mission duration.
Once programming is complete, the control logic used for programming is disabled by a hardwired reset and thus the SEU sensitivity is eliminated.
The IEEE 1149 standard, also known as JTAG, is implemented in RS products for board-level testing. An SEU in the TAP (test access port) controller will cause functional failures very similar to those caused by CSRAM SEU. To eliminate this problem, the optional TRST (test reset) input will be implemented to 'hard-reset' the TAP controller. Reference [2] presents a well-summarized discussion of JTAG SEU in FPGA devices.
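The TAP controller is the 16-state machine defined in IEEE 1149.1, clocked by TCK and steered by TMS. The sketch below models it in software to show why an upset TAP state is recoverable: asserting TRST forces Test-Logic-Reset asynchronously, and, per the standard, holding TMS high for five TCK cycles reaches the same state from any starting state. The state names follow the standard; the functions are illustrative only and are not part of the RS design.

```python
# IEEE 1149.1 TAP next-state table: state -> (next if TMS=0, next if TMS=1)
TAP = {
    "TEST_LOGIC_RESET": ("RUN_TEST_IDLE", "TEST_LOGIC_RESET"),
    "RUN_TEST_IDLE": ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
    "SELECT_DR_SCAN": ("CAPTURE_DR", "SELECT_IR_SCAN"),
    "CAPTURE_DR": ("SHIFT_DR", "EXIT1_DR"),
    "SHIFT_DR": ("SHIFT_DR", "EXIT1_DR"),
    "EXIT1_DR": ("PAUSE_DR", "UPDATE_DR"),
    "PAUSE_DR": ("PAUSE_DR", "EXIT2_DR"),
    "EXIT2_DR": ("SHIFT_DR", "UPDATE_DR"),
    "UPDATE_DR": ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
    "SELECT_IR_SCAN": ("CAPTURE_IR", "TEST_LOGIC_RESET"),
    "CAPTURE_IR": ("SHIFT_IR", "EXIT1_IR"),
    "SHIFT_IR": ("SHIFT_IR", "EXIT1_IR"),
    "EXIT1_IR": ("PAUSE_IR", "UPDATE_IR"),
    "PAUSE_IR": ("PAUSE_IR", "EXIT2_IR"),
    "EXIT2_IR": ("SHIFT_IR", "UPDATE_IR"),
    "UPDATE_IR": ("RUN_TEST_IDLE", "SELECT_DR_SCAN"),
}


def clock(state: str, tms: int) -> str:
    """Advance the TAP controller by one rising TCK edge."""
    return TAP[state][1 if tms else 0]


def trst(_state: str) -> str:
    """TRST asserted: asynchronous reset to Test-Logic-Reset from any state."""
    return "TEST_LOGIC_RESET"


if __name__ == "__main__":
    state = "SHIFT_IR"  # suppose an SEU left the controller in this state
    for _ in range(5):
        state = clock(state, 1)
    print(state, trst("SHIFT_IR"))
```

The TMS route needs TCK to be clocked; with TCK idle in flight, an upset state persists until TRST is asserted, which is the motivation for the hard reset described above.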

A. Heavy Ion Testing Results and SPICE Simulation
In the SPICE simulator, the heavy ion effect is modeled by injecting a hit current pulse at the (reverse-biased) active junctions. This current pulse has a triangular shape with rise and fall times of 100 ps.
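For a triangular pulse the injected charge is the pulse area, Q = ½ · I_peak · (t_rise + t_fall), so the peak current follows from the charge. A short sketch using the 100 ps rise and fall times above and, purely as an example, the 0.31 pC hit used later in the single event transient scenarios; the function names are hypothetical.

```python
def peak_current(q_coulomb: float, t_rise: float, t_fall: float) -> float:
    """Peak of a triangular current pulse whose area equals q_coulomb (A)."""
    return 2.0 * q_coulomb / (t_rise + t_fall)


def triangular_pulse(t: float, i_peak: float, t_rise: float, t_fall: float) -> float:
    """Current (A) at time t (s) for a triangular pulse starting at t = 0."""
    if 0.0 <= t <= t_rise:
        return i_peak * t / t_rise
    if t_rise < t <= t_rise + t_fall:
        return i_peak * (t_rise + t_fall - t) / t_fall
    return 0.0


if __name__ == "__main__":
    T_R = T_F = 100e-12  # 100 ps rise and fall, as in the text
    i_pk = peak_current(0.31e-12, T_R, T_F)
    print(f"peak current = {i_pk * 1e3:.2f} mA")
```

For 0.31 pC this gives a peak of about 3.1 mA.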
The injection node is clamped to either GND or Vcc by a variable capacitor during charge collection. This method assumes the hit pulse is much faster than the circuit response and ignores the detailed timing information carried in the pulse shape. The charge collection depth, d_coll, is extracted by:

$$d_{coll} = \frac{Q_{crit}}{LCT \cdot LET_{th}} \qquad (1)$$

where Qcrit is the critical charge for upset. It is generated from SPICE simulation by the method mentioned above. LCT (linear charge transfer) is the charge equivalent of LET. Its value is:

$$LCT = \frac{q\,\rho_{Si}}{3.6\ \mathrm{eV}} \approx 0.0104\ \frac{\mathrm{pC}/\mu\mathrm{m}}{\mathrm{MeV\cdot cm^2/mg}} \qquad (2)$$

where q is the electron charge, ρ_Si is the density of silicon, and 3.6 eV is the energy needed to generate an electron-hole pair in silicon. LETth was measured by heavy ion testing at Brookhaven. Once the charge collection depth is known, LETth can be derived for any memory circuit whose Qcrit is obtained from a SPICE analysis.
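The relations described above, d_coll = Qcrit / (LCT · LETth) with LCT = q · ρ_Si / 3.6 eV, are easy to check numerically. The sketch below evaluates LCT from the silicon density (2.329 g/cm³) and the 3.6 eV pair-creation energy, then uses the relation in both directions. The inputs in the usage lines (0.02 pC and 2 MeV-cm²/mg, borrowed from the Mode 1 transient discussion) only exercise the functions; they are not the Brookhaven data used in the paper.

```python
Q_E = 1.602176634e-19   # elementary charge, C
RHO_SI = 2329.0         # silicon density, mg/cm^3 (2.329 g/cm^3)
E_PAIR_EV = 3.6         # energy per electron-hole pair in Si, eV


def charge_per_um(let: float) -> float:
    """Charge (pC/um) deposited by an ion of the given LET (MeV-cm^2/mg), i.e. LCT * LET."""
    ev_per_um = let * 1e6 * RHO_SI * 1e-4   # MeV-cm^2/mg * mg/cm^3 -> eV/um
    pairs_per_um = ev_per_um / E_PAIR_EV
    return pairs_per_um * Q_E * 1e12         # C -> pC


LCT = charge_per_um(1.0)  # pC/um per MeV-cm^2/mg


def collection_depth_um(q_crit_pc: float, let_th: float) -> float:
    """d_coll = Qcrit / (LCT * LETth), in um."""
    return q_crit_pc / (LCT * let_th)


def threshold_let(q_crit_pc: float, d_coll_um: float) -> float:
    """LETth = Qcrit / (LCT * d_coll), in MeV-cm^2/mg."""
    return q_crit_pc / (LCT * d_coll_um)


if __name__ == "__main__":
    print(f"LCT = {LCT:.4f} pC/um per MeV-cm^2/mg")
    d = collection_depth_um(0.02, 2.0)
    print(f"d_coll = {d:.2f} um, round-trip LETth = {threshold_let(0.02, d):.2f}")
```

Because the relation is linear, it also makes the hardening argument explicit: doubling Qcrit or halving d_coll doubles LETth.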
The critical charges for the different memory types in RS are simulated and listed in Table 2. The device populations of the different memory types are calculated using the 50k-gate B4 device (Figure 2) in the RS family.

B. SEU Rate Prediction
To evaluate the radiation tolerance at the device level, the upset rate of each memory type is calculated using Space Radiation 4.0. The space environment is geosynchronous orbit under solar minimum conditions. Spacecraft shielding is assumed to be 100 mil of aluminum.
For the device parameters, the threshold LET is converted from Qcrit by equation 1, and the saturation cross section is measured from the layout design.
A Weibull shape of 2 and a width of 10 MeV-cm²/mg are assumed. These numbers are believed to be conservative, based on the fact that several active junctions are involved in every single event, and on previous heavy ion test data on Actel FPGA devices, for which much more gradual Weibull curves were usually measured.
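These assumptions correspond to the usual four-parameter Weibull fit, σ(L) = σ_sat · {1 - exp[-((L - L0)/W)^s]} for L > L0, with onset L0 = LETth, width W = 10 MeV-cm²/mg, and shape s = 2. A sketch; the saturation cross section per bit in the usage lines is a hypothetical placeholder, and the 2.7 MeV-cm²/mg onset is the unhardened-cell threshold quoted later in the hardening discussion.

```python
import math


def weibull_cross_section(let: float, let_th: float, sigma_sat: float,
                          width: float = 10.0, shape: float = 2.0) -> float:
    """Four-parameter Weibull SEU cross section (same units as sigma_sat)."""
    if let <= let_th:
        return 0.0
    x = (let - let_th) / width
    return sigma_sat * (1.0 - math.exp(-(x ** shape)))


if __name__ == "__main__":
    SIGMA_SAT = 1e-8  # cm^2/bit, hypothetical placeholder
    for let in (2.7, 5.0, 10.0, 20.0, 40.0):
        print(let, weibull_cross_section(let, 2.7, SIGMA_SAT))
```

A narrow width brings the cross section close to saturation at low LET, where the particle flux is highest, so it tends to overestimate the rate relative to the more gradual curves measured earlier.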
The result of the upset rate for each memory type is listed in

A. Mode 1 Single Event Transient
As shown in Figure 5, two inverters are in series. A heavy ion hits the first inverter and induces a transient pulse.
If this pulse passes the second inverter and still has a peak above half Vcc (1.25 V), which is approximately the switching threshold of the inverter, it will propagate through the logic chain and may cause an error in a storage element.
For the most sensitive node, the Qcrit needed to cause a propagating pulse is 0.02 pC and the threshold LET is 2 MeV-cm²/mg.

B. Mode 2 Single Event Transient
When the CSRAM is hit by an ion that deposits charge less than Qcrit, the bit will be in a metastable state for a period of time and then recover to its original state. During this transient period, a signal passing through the pass transistor controlled by the CSRAM will be modified, so data errors may occur.
SPICE simulation results have shown that for a non-hardened CSRAM switch this may not be an issue, because the metastable period is very short (<1 ns).
However, the hardened CSRAM switch will have a metastable period long enough to generate a significant erroneous pulse. In the following scenarios, a hit pulse of 0.31 pC and a resistor-hardened CSRAM with 500 kΩ resistors are used to demonstrate this issue. Figure 6a shows one scenario: a CSRAM switch gates the data path, and a square wave signal is input from the left. Assuming the CSRAM switch is normally on, a heavy ion hitting the CSRAM will turn the pass gate off for a short period of time, depending on the charge generated and collected.
If the CSRAM transient is not near the signal transition edges, there is no effect.
As shown in Figure 6b, the pulse will be narrowed if the CSRAM transient falls on the leading edge of the signal, and widened if the CSRAM transient falls on the trailing edge. Narrowing may cause the signal to disappear, and widening may cause race issues. Figure 7a shows the circuit schematic of a second scenario, in which the CSRAM selects between two inputs.
Depending on the CSRAM state, either A or B will be selected.
Assume A is low and B is high, and A is normally selected.
An ion hit induced CSRAM transient will temporarily select B and generate a pulse.
Using the aforementioned hit pulse and resistor-hardened CSRAM, a pulse more than 2 V high and 2 ns wide resulted (Figure 7b).
It is strong enough to propagate through the circuitry and cause a soft data error.
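Both scenarios reduce to simple logic-level behavior, which the sketch below models in discrete time steps. It is an idealized assumption, not the SPICE model used in the paper: the pass-gate output node is treated as holding its last value while the gate is off (no leakage or charge sharing), the multiplexer is assumed to select A while the CSRAM node is above Vcc/2, and the CSRAM node waveform is a hypothetical dip rather than a simulated one.

```python
from typing import List

VCC = 2.5
V_SWITCH = VCC / 2  # ~1.25 V switching point, as in the Mode 1 discussion


def pass_gate(signal: List[int], gate_on: List[bool], initial: int = 0) -> List[int]:
    """Output tracks the input while the gate is on and holds its last value while off."""
    out, held = [], initial
    for s, on in zip(signal, gate_on):
        if on:
            held = s
        out.append(held)
    return out


def mux(a: int, b: int, csram_v: List[float]) -> List[int]:
    """Select A while the CSRAM node reads high, B while it dips below V_SWITCH."""
    return [a if v >= V_SWITCH else b for v in csram_v]


def square(n: int, start: int, stop: int) -> List[int]:
    """Input pulse that is high for start <= t < stop."""
    return [1 if start <= t < stop else 0 for t in range(n)]


def gate_off_window(n: int, start: int, stop: int) -> List[bool]:
    """Pass gate forced off by the CSRAM transient for start <= t < stop."""
    return [not (start <= t < stop) for t in range(n)]


if __name__ == "__main__":
    sig = square(30, 10, 20)                                 # 10-step input pulse
    print(sum(pass_gate(sig, gate_off_window(30, 8, 13))))   # transient on leading edge
    print(sum(pass_gate(sig, gate_off_window(30, 18, 23))))  # transient on trailing edge
    dip = [2.5, 2.5, 0.8, 0.6, 0.9, 1.1, 1.6, 2.2, 2.5]      # hypothetical node, 0.5 ns steps
    print(mux(0, 1, dip))
```

In the usage lines the 10-step input pulse shrinks to 7 steps when the off window overlaps the leading edge and grows to 13 steps when it overlaps the trailing edge, while the hypothetical select dip lets B through for four samples.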

A. Memory Hardening
There are basically two strategies to harden a latch type of storage element. The objective of each strategy is to increase the threshold LET.
Referring to equation 1, one way is to increase the Qcrit by circuit design techniques.
The other is to reduce charge collection (d_coll) by technology changes such as SOI (silicon on insulator), or by wafer fabrication process changes such as thin epitaxial silicon or double wells. Two circuit techniques are investigated.

Figure 8 shows the schematic of the resistor-decoupling technique. Its effectiveness depends on the resistor value. SPICE simulations were done, and the correlation between LETth and the resistor value is listed in Table 3.

The DICE redundancy design in reference [9] is chosen as an example to illustrate the SSDU issue. Figure 9 shows the schematic of the DICE design. Each state in the memory is held by two nodes. For example, X1 and X3 hold high, while X2 and X4 hold low.
If only one of the two nodes holding the same state (e.g., X1 and X3) collects single event charge, the LETth of this circuit is very high.
The SPICE simulation results show that the 0.25 µm version of the DICE SRAM is practically SEU-immune if only one node is hit by the simulated heavy ion pulse mentioned in sub-section IV A. Space Radiation 4.0 has a module based on Edmonds' theory [10] to predict SSDU rates. This module applies to two independent memory bits, and its two key input parameters are LETth and the physical separation of the bits. However, the two nodes in DICE are not independent, so SPICE has to be used to simulate the LETth for SSDU in DICE by injecting the same hit pulse at nodes X1 and X3 simultaneously.
The resulting LETth for SSDU in DICE is 2.7 MeV-cm²/mg, the same as the original memory cell without any hardening technique. The rate calculation was done for several different physical separations; results are listed in Table 4. If a normally sized DICE cell is designed, the separation is close to 2 µm. In this case the simultaneously active nodes are one P+/N-well junction and one N+/P-substrate junction.
The injection pulses for the P+ and N+ junctions are of opposite polarity. Also, since the P+ is inside the N-well, only about half as much injected charge will be collected as for the N+.
Thus the injection pulse for the P+ is always half that of the N+. The LETth of the SSDU process is listed in Table 5. The rate calculation (results also listed in Table 5) uses a 2 µm separation, which is a realistic layout dimension. Comparing the error rates in Table 5 and Table 3 indicates that the SSDU process dominates the error rates when a hardening resistor value of 1 MΩ is used.
The present analysis concludes that the SSDU process will limit both resistor and redundancy hardening. If the layout design does not pay special attention to this issue, redundancy hardening is less effective than resistor hardening by about two orders of magnitude in upset rate. However, the major disadvantage of resistor hardening is that the poly resistor has a large negative temperature coefficient, which is larger for larger resistance (lower poly doping).
The memory cell employing resistor hardening will have a very large upset-rate variation through the specified operation temperature range.
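To see how a negative temperature coefficient becomes a temperature-dependent decoupling delay, the sketch below applies a first-order linear model, R(T) = R0 · [1 + α(T - T0)], together with the RC time constant τ = R · C that the decoupling resistor forms with the node capacitance. The coefficient α = -0.002 /°C, the 2 fF node capacitance, and the -55 °C to 125 °C range are hypothetical placeholders, not RS process data; only the 1 MΩ nominal value comes from the text.

```python
R_NOMINAL = 1e6        # 1 MOhm hardening resistor at T0 (value from the text)
TC_PER_C = -0.002      # hypothetical linear temperature coefficient, 1/degC
C_NODE = 2e-15         # hypothetical gate + wiring capacitance, F
T0 = 25.0              # reference temperature, degC


def poly_resistance(temp_c: float) -> float:
    """First-order linear model of poly resistance versus temperature (Ohm)."""
    return R_NOMINAL * (1.0 + TC_PER_C * (temp_c - T0))


def feedback_delay(temp_c: float) -> float:
    """RC decoupling time constant (s) at temp_c."""
    return poly_resistance(temp_c) * C_NODE


if __name__ == "__main__":
    for t in (-55.0, 25.0, 125.0):
        r = poly_resistance(t)
        print(f"{t:7.1f} C: R = {r / 1e6:.2f} MOhm, tau = {feedback_delay(t) * 1e9:.2f} ns")
```

Under these assumptions the shortest delay, and hence the weakest hardening, occurs at the hot end of the range, which is where the upset rate of a resistor-hardened cell would be expected to peak.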
As the feature size shrinks to 0.18 µm, the effectiveness of both resistor and redundancy hardening will be further compromised, and more ingenious designs will be needed to mitigate SEU at the circuit level.

B. Device Level Mitigation by EDAC
Using EDAC to correct corrupted CSRAM bits during operation will convert the device functional failure into a semi-soft error mode. This is highly desirable for a moderately tolerant device.
However, the method involves proprietary logic and circuit design information, which cannot be presented here.

VII.
It was also confirmed at the layout level that many minimum-sized parasitic SCR structures were active in a powered, non-programmed device. Figure 10 shows the standby (

As the technology advances, the feature size shrinks. Smaller features require thinner field oxide (FOX), higher P-well doping, and lower Vcc. Thinner field oxide reduces hole generation and trapping. Higher P-well doping (to alleviate punch-through) increases the threshold voltage of the parasitic NMOSFET.
Lower Vcc for smaller geometry also reduces hole trapping in the field oxide.

Figure 11. IDS-VGS of a 0.25 µm NMOSFET pre- and post-100 krad(Si) irradiation.

Figure 11 shows the IDS-VGS curves of a 0.25 µm N-channel transistor before and after 100 krad(Si) irradiation.
The radiation source is the ARACOR 4100. The off-state current (IDS at VGS = 0) increased only to 0.1 nA after irradiation.

programmed B4 device. Only static Icc was measured during irradiation, using a gamma cell. By an argument similar to that in section VII, the non-programmed part had all of its active junctions in well-defined states, so many relevant parasitic NMOSFETs were active, and the radiation-induced static Icc change should be very close to that of a programmed device. Figure 12 shows the static Icc versus the accumulated total dose. Using a static Icc spec of 20 mA, the tolerance is approximately 50 krad(Si).

For single event effects, the 0.25 µm SRAM-based re-programmable FPGA presented in this paper needs to be hardened for space applications. The critical issue is SEU hardening of the configuration SRAM (CSRAM) to avoid functional failure (or interrupt). CSRAM hardening is particularly difficult because of the constraints of its large population and small size. The traditional methods, including resistor and redundancy hardening, have their limitations, and the penalties paid for hardening are higher for smaller feature sizes. At high LET ranges, the hardened CSRAMs will induce single event transient glitches and may increase the soft error rate of the device. SEL does not appear to be a significant issue for this particular device.
The data show that it has at least relatively good SEL tolerance (LETth > 74 MeV-cm²/mg).
The TID issue is alleviated as feature size shrinks into the sub-micron regime. Several process parameters changed for scaling, such as thinner field oxide, higher well/substrate doping, and lower Vcc, help reduce the TID-induced leakage current and consequently improve TID tolerance.
The present device has a TID tolerance of approximately 50 to 100 krad(Si).