Acronyms

- Application specific integrated circuit (ASIC)
- Collected charge ($Q_{\text{coll}}$)
- Combinatorial logic (CL)
- Commercial off the shelf (COTS)
- Complementary metal-oxide semiconductor (CMOS)
- Critical charge ($Q_{\text{crit}}$)
- Device under test (DUT)
- Edge-triggered flip-flops (DFFs)
- Error rate ($\lambda$)
- Error rate per bit ($\lambda_{\text{bit}}$)
- Error rate per system ($\lambda_{\text{system}}$)
- Field programmable gate array (FPGA)
- Flip flop (DFF)
- Fluence ($\Phi$)
- Input – output (I/O)
- Intellectual Property (IP)
- Linear energy transfer (LET)
- Low cost digital tester (LCDT)
- Material density ($\rho$)
- Mean fluence to failure (MFTF)
- NASA Electronic Parts and Packaging (NEPP)
- Operational frequency ($f_s$)
- Personal Computer (PC)

- Probability of configuration upsets ($P_{\text{configuration}}$)
- Probability of Functional Logic upsets ($P_{\text{functionalLogic}}$)
- Probability of single event functional interrupt ($P_{\text{SEFI}}$)
- Probability of system failure ($P_{\text{system}}$)
- Processor (PC)
- Radiation Effects and Analysis Group (REAG)
- Reliability over fluence ($R(\Phi)$)
- Single event effect (SEE)
- Single event functional interrupt (SEFI)
- Single event latch-up (SEL)
- Single event transient (SET)
- Single event upset (SEU)
- Single event upset cross-section ($\sigma_{\text{SEU}}$)
- Shift register (SR)
- Voltage ($V_{\text{dd}}$)
- Windowed shift register (WSR)
- Xilinx Virtex 5 field programmable gate array (V5)
- Xilinx Virtex 5 field programmable gate array radiation hardened (V5QV)
Device Penetration of Heavy Ions and Linear Energy Transfer (LET)

- LET characterizes the energy that a charged particle deposits in a material.
- Based on average energy (E) loss per unit path length (x) (stopping power).
- Material density ($\rho$) is used to normalize LET to the target material.

\[
\text{LET} = \frac{1}{\rho} \frac{dE}{dx} \quad \left[ \frac{\text{MeV} \cdot \text{cm}^2}{\text{mg}} \right]
\]

where $\rho$ is the density of the target material.

An off transistor is susceptible. An upset can occur when the collected charge \( Q_{\text{coll}} \) exceeds the critical charge \( Q_{\text{crit}} \):

\[ Q_{\text{coll}} > Q_{\text{crit}} \]
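
As a rough illustration of the LET definition and the charge comparison, the sketch below converts a stopping power to LET and estimates the charge generated along a short path in silicon. The silicon density (2.33 g/cm$^3$), the 3.6 eV per electron-hole pair figure, and all function names are assumptions introduced here, not values from the slides. The generated charge is only an upper bound on $Q_{\text{coll}}$, since not all of it is collected at the node.

```python
# Textbook constants for silicon; assumed for this sketch, not taken from the slides.
SI_DENSITY_MG_PER_CM3 = 2330.0   # 2.33 g/cm^3 expressed in mg/cm^3
EV_PER_EH_PAIR = 3.6             # mean energy to create one electron-hole pair
ELECTRON_CHARGE_C = 1.602e-19    # elementary charge in coulombs


def let_from_stopping_power(de_dx_mev_per_cm, density_mg_per_cm3=SI_DENSITY_MG_PER_CM3):
    """LET = (1/rho) * dE/dx, returned in MeV*cm^2/mg."""
    return de_dx_mev_per_cm / density_mg_per_cm3


def generated_charge_pc(let_mev_cm2_per_mg, path_um, density_mg_per_cm3=SI_DENSITY_MG_PER_CM3):
    """Charge in pC generated along a path of path_um micrometres."""
    # Energy lost along the path: LET * rho * length (um converted to cm).
    energy_mev = let_mev_cm2_per_mg * density_mg_per_cm3 * path_um * 1e-4
    pairs = energy_mev * 1e6 / EV_PER_EH_PAIR
    return pairs * ELECTRON_CHARGE_C * 1e12  # coulombs to picocoulombs


def can_upset(q_coll_pc, q_crit_pc):
    """Upset condition from the slide: Q_coll > Q_crit."""
    return q_coll_pc > q_crit_pc


# Hypothetical example: LET of 10 MeV*cm^2/mg over a 1 um path, Q_crit of 0.05 pC.
q_upper_bound = generated_charge_pc(10.0, 1.0)
print(q_upper_bound, can_upset(q_upper_bound, 0.05))
```

With these assumed constants, an LET of about 96 MeV·cm$^2$/mg corresponds to roughly 1 pC per micrometre of track in silicon.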
Characterizing Single Event Upsets (SEUs): Radiation Testing and SEU Cross Sections

SEU Cross Sections ($\sigma_{seu}$) characterize potential upsets that occur when a device is exposed to ionizing particles.

$$\sigma_{seu} = \frac{\# \text{errors}}{\text{fluence}}$$

Does simple error counting apply to a complex system?

Terminology:

- Flux: Particles/(sec-cm$^2$)
- Fluence: Particles/cm$^2$

$\sigma_{seu}$ is calculated at several LET values (particle spectrum)
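
A minimal sketch of the cross-section calculation, using the flux and fluence definitions above. The run data (LET values, error counts, flux, exposure times) are hypothetical numbers chosen only to show the arithmetic, and the function names are introduced here for illustration.

```python
def fluence(flux_per_cm2_s, exposure_s):
    """Fluence (particles/cm^2) = flux (particles/(s*cm^2)) * exposure time (s)."""
    return flux_per_cm2_s * exposure_s


def seu_cross_section(num_errors, fluence_per_cm2):
    """sigma_seu = #errors / fluence, in cm^2."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return num_errors / fluence_per_cm2


# Hypothetical test runs: LET (MeV*cm^2/mg) -> (errors, flux, exposure time in s).
runs = {
    2.8: (12, 1.0e4, 1000.0),
    8.6: (95, 1.0e4, 1000.0),
    20.3: (410, 5.0e3, 1000.0),
}

# One cross-section per LET test point.
sigma_by_let = {
    let: seu_cross_section(errors, fluence(flux, seconds))
    for let, (errors, flux, seconds) in runs.items()
}

for let in sorted(sigma_by_let):
    print(f"LET {let:5.1f}: sigma_seu = {sigma_by_let[let]:.2e} cm^2")
```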
FPGA Structure Categorization as Defined by NASA Goddard REAG

$\sigma_{SEU}$ Differentiation:

\[
P(f_s)_{\text{error}} \propto P_{\text{configuration}} + P(f_s)_{\text{functionalLogic}} + P_{\text{SEFI}}
\]

- $P(f_s)_{\text{error}}$: design $\sigma_{SEU}$
- $P_{\text{configuration}}$: configuration $\sigma_{SEU}$
- $P(f_s)_{\text{functionalLogic}}$: functional logic $\sigma_{SEU}$; sequential and combinatorial logic (CL) in the data path
- $P_{\text{SEFI}}$: SEFI $\sigma_{SEU}$; global routes and hidden logic

Test structures and various techniques target specific FPGA categories for $\sigma_{SEU}$ analysis
OVERVIEW OF UPDATES

• Academic versus mission specific single event effect (SEE) device evaluation
• SEE visibility enhancement during radiation testing
• Mean fluence to failure analysis (MFTF); i.e., testing flushable architectures versus non-flushable architectures
• Mission specific system-level single event upset (SEU) response prediction
• Heavy-ion energy and linear energy transfer (LET) selection
  • Proton versus heavy-ion testing
  • Fault injection
  • Intellectual property core (IP Core) test and evaluation
  • Unreliable design and its effects on SEE data
• Mitigation evaluation (embedded and user-implemented)
• Single event latch-up (SEL) test and analysis
Academic versus Mission Specific Ground SEE Testing

• A distinction should be made regarding the purpose of data collection:
  – academic study for component-level SEE sensitivity; or
  – extrapolation for mission survivability predictions.

• A component level study will not be indicative of system behavior.
  – System topology considerations
  – Variation in transistor types
  – Co-dependencies between components
  – Electrical masking
  – Complexity of extrapolation from component to system

• Mission specific testing will be complex and will not cover full state space traversal.

• To draw on the strengths of each approach and offset its weaknesses, we propose a mixture of academic and mission specific testing for FPGA test and evaluation.
Conventional Academic Testing: Long Chains of Inverters

- Testing long chains of inverters was a conventional method for evaluating combinatorial logic susceptibilities to single event transients (SETs).
- ASIC (lab-made) test structures showed elongation of SETs as they propagated through the inverter chain. This is misleading:
  - Test structures have unbalanced rise and fall times. This causes SET elongation.
  - Commercial ASIC circuits are created by experienced designers and are balanced, so they will not have the same response. **MISLEADING test results.**
- Commercial FPGA circuits are also balanced. No SET elongation.
- However, configuring long chains of inverters will cause too much noise in an FPGA design and will cause catastrophic SEE test results.

**Long chains of inverters are noisy and are consequently not good design practice. They should not be used as test structures.**

Conventional Academic Testing: Long Chains of Flip-Flops (DFFs)

- The test structure is a long chain of DFFs connected serially; otherwise referred to as a shift-register (SR).
- **Pro:** Commonly used for measuring sequential logic SEUs in FPGAs.
- The number of DFFs is generally in the hundreds to thousands.
- Original SEU testing evaluated SRs that were purely sequential logic, i.e., only DFFs.
  - Currently, tests are also performed with combinatorial logic (CL) placed between the DFF stages.
  - Adding CL helps to analyze SET capture by DFFs.
- Due to I/O signal integrity issues, the SRs were also tested at very low frequencies.
  - Windowed shift registers can be reliably used to test at high frequencies.
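
The behaviour of a purely sequential SR under a single upset can be sketched in a few lines. This is a simplified software model with hypothetical names and parameters (8 stages, an alternating input pattern, one bit flip), not the tester logic described in the slides. It shows that a flipped DFF value appears at the output once and is then shifted out of the chain.

```python
def run_shift_register(n_stages, inputs, upset_cycle=None, upset_stage=None):
    """Clock `inputs` through a chain of n_stages DFFs.

    If upset_cycle and upset_stage are given, the stored bit of that stage is
    flipped once, right after the clock edge of that cycle (a modeled SEU).
    Returns the bit seen at the last stage on every cycle.
    """
    stages = [0] * n_stages
    outputs = []
    for cycle, bit in enumerate(inputs):
        # Clock edge: every DFF captures the value of the previous stage.
        stages = [bit] + stages[:-1]
        if cycle == upset_cycle:
            stages[upset_stage] ^= 1  # single event upset in one DFF
        outputs.append(stages[-1])
    return outputs


pattern = [i % 2 for i in range(32)]          # alternating 0/1 input
golden = run_shift_register(8, pattern)       # fault-free reference
hit = run_shift_register(8, pattern, upset_cycle=10, upset_stage=3)

# Cycles where the irradiated chain disagrees with the reference.
error_cycles = [c for c, (g, h) in enumerate(zip(golden, hit)) if g != h]
print(error_cycles)
```

In this model the upset in stage 3 (zero-indexed) needs four more clock edges to reach the last stage, so the only mismatch is at cycle 14. After that the chain again matches the reference, which is why errors in such structures can simply be counted while testing continues.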
Proposed Academic Testing Enhancements: Windowed Shift Registers

- Windowed output provides the option for high frequency testing without causing board-level signal integrity issues.
- All DFF nodes are observable by the tester.
- The inclusion of combinatorial logic facilitates evaluation of combinatorial logic effects, i.e., SET capture.
- Meets synchronous design requirements if all DFFs are connected to the same balanced clock tree.

Topology is still too simple to be the sole source of data extrapolation for a mission specific design.
Mission Specific Testing Considerations

• In order to predict mission reliability, it is best to analyze systems that closely resemble those that will be employed in the mission.
  – This requires the system-under-test have comparable complexity and maintain proper design topology.
• Challenge: mission-specific applications are complex systems that make SEU data collection challenging:
  – This is mostly because visibility into system circuitry and state space traversal are limited in any single SEE test.
  – Data obtained during radiation testing can be unrepresentative.
  – Consequently, the data might not correctly characterize the SEU response for mission specific operational modes, and could lead to poor (and perhaps catastrophic) design implementations.
Proposed Enhancements to Mission Specific Testing

• Study system trends by parameter variation:
  – Investigate different test structures that vary in complexity;
  – Vary operational frequency and input patterns;
  – Force a variety of state-space traversal schemes per test;
  – Perform as many tests as possible.
• Increase visibility of internal circuits and their contributions to susceptibility.
• These actions help to identify dominant sources of error and to better extrapolate data to mission-specific systems.

Mission specific testing can provide data that better characterizes your target. However, visibility into DUT failure mechanisms is essential.
Mission Specific Testing: Increasing Visibility with Embedded Microprocessor Testing (1)

DUT: device under test

[Figure: block diagram. The DUT (embedded microprocessor) drives status and trace signals to watchdogs in the tester; the tester sends watchdog errors to the host PC.]

DUT signals monitored by the tester watchdogs:

- Halted
- Error
- Trace Instruction
- Trace Valid Instruction
- Trace Exception Taken
- Trace Exception Kind
- Trace Register Write
- Trace Register Address
- Trace Data Cache Request
- Trace Data Cache Hit
- Trace Data Cache Ready
- Trace Data Cache Read
- Trace Instruction Cache Request
- Trace Instruction Cache Hit

Mission Specific Testing: Increasing Visibility with Embedded Microprocessor Testing (2)

LCDT: Low cost digital tester

- Visibility was increased by isolating memory accesses as follows:
  - Moving the instruction and data storage to the LCDT for traffic observation.
  - Performing tests with and without cache to determine the influence cache has on upsets.

- Differentiating global upsets from the normal data set:
  - Helps to understand which upsets are prominent.
  - Gives insight into how the use of cache will affect $\sigma_{SEU}$s.

- Monitoring internal MicroBlaze™ signals
  - $\sigma_{SEU}$s no longer rely on detecting erroneous memory reads and writes. Data based solely on memory reads and writes are too limited and uninformative.
  - Can now determine when a processor crashes and how.
Mission Specific Ground Testing and Mean Fluence to Failure (MFTF)

- Academic test circuits are flush-through:
  - Faults occur and are flushed through the circuit.
  - Testing can continue after a fault occurs.
  - A counting metric of faults per particle exposure can be used.
- Mission specific designs are complex and tend to crash upon fault.
  - They are not flush-through circuits.
  - Test until a fault occurs.
  - The proposed metric is MFTF.

\[
MFTF = \frac{1}{\sigma_{seu}}
\]
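
A short sketch of how MFTF might be obtained in practice, under two assumptions that are not spelled out in the slides: the fluence at which each run failed is recorded, and MFTF is estimated as the sample mean of those fluences. The function names and run data are hypothetical.

```python
def mftf_from_cross_section(sigma_seu_cm2):
    """MFTF = 1 / sigma_seu, in particles/cm^2."""
    if sigma_seu_cm2 <= 0:
        raise ValueError("cross section must be positive")
    return 1.0 / sigma_seu_cm2


def mftf_from_runs(fluences_to_failure):
    """Estimate MFTF as the mean fluence accumulated before each failure.

    Assumes every run was irradiated until the design failed and the
    fluence at that point was logged.
    """
    if not fluences_to_failure:
        raise ValueError("need at least one run")
    return sum(fluences_to_failure) / len(fluences_to_failure)


# Hypothetical runs at one LET: fluence (particles/cm^2) at which the design crashed.
runs = [2.0e7, 3.0e7, 4.0e7]
mftf = mftf_from_runs(runs)
print(mftf, 1.0 / mftf)  # MFTF and the equivalent cross section
```

More runs per LET tighten the estimate, since each run contributes a single failure.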

Goal: Predict System-Level SEU Reliability

• NEPP is investigating the application of classical reliability performance metrics combined with single event upset (SEU) empirical data to improve space application reliability prediction.

• Proposed methodology is being investigated in three phases:
  – Simplified proof-of-concept.
  – Omnidirectional effects of ions on system susceptibility.
  – Geometric limitations.
NEPP Proposed Prediction Methodology

- Calculate MFTF per LET (obtain SEU data via ground testing).
- Create a histogram of particle flux versus LET for the mission’s required time-window in the expected target environment.
- Note: Each bin’s maximum LET ($LET_{\text{max}}$) is a ground test point; and has an associated MFTF.
- Graph reliability across fluence ($R(\Phi)$) for each of the LET test points and their associated MFTFs: $R(\Phi) = e^{-\Phi/MFTF}$.
- Each $LET_{\text{max}}$ is associated with a bin of particle fluence for a given time window. Use this fluence to determine the reliability for each bin.
- Analyze the reliabilities across all bins.
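
The steps above can be sketched as follows. The bin contents (LET values, fluences, MFTFs) are hypothetical, and combining the bins as a product of per-bin reliabilities is an assumption made here (independent contributions from each bin); the slides do not specify how the bins are analyzed together.

```python
import math


def reliability(fluence, mftf):
    """R(Phi) = exp(-Phi / MFTF)."""
    return math.exp(-fluence / mftf)


# Hypothetical histogram: (LET_max of the bin, mission fluence in the bin,
# MFTF measured on the ground at that LET_max). Fluence and MFTF share units.
bins = [
    (1.0, 3.0e3, 3.0e7),
    (5.0, 4.0e2, 8.0e6),
    (20.0, 1.0e1, 2.0e6),
]

# Reliability for each bin over the mission time window.
per_bin = [(let_max, reliability(phi, mftf)) for let_max, phi, mftf in bins]

# Assumption: bins act independently, so the overall reliability is the product.
combined = math.prod(r for _, r in per_bin)

for let_max, r in per_bin:
    print(f"LET_max {let_max:5.1f}: R = {r:.6f}")
print(f"combined: R = {combined:.6f}")
```

Comparing the per-bin values shows which LET range dominates the predicted unreliability, and therefore where additional ground test points are most useful.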
Determining Expected Reliability using MFTF and Space Data Particle Fluence

\[ R(\Phi) = e^{-\Phi/3.0 \times 10^7} \]

Reliability calculation for the first bin of particle flux. The expected number of particles is approximately 3000, which gives \( R(\Phi) > 0.9999 \) (four 9's).
Selection of LET for Ground Heavy-Ion SEE Testing

• The proposed methodology requires careful LET selection during ground testing.
• This is especially true for commercial (or sensitive) devices.
• Because of the high particle counts at low LETs, it is best to reduce the size of the histogram bins. Hence, tests should be performed at as many low LET values as possible.
• When possible, test at different energies to obtain similar LETs. Note that the SEU response should be statistically equivalent.
• Test at different angles to achieve similar LETs. The SEU response should be statistically equivalent.
• When effective LETs do not provide statistically equivalent SEU responses, geometrical device specifics need to be investigated.
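
For the angle-based tests, the usual convention is that tilting the device lengthens the ion path through a thin sensitive layer, giving an effective LET of LET / cos(θ). That relation is a standard approximation assumed here, not a formula stated in the slides, and the function names are hypothetical. It is a sketch for planning tilt angles, and it breaks down when device geometry matters, as the last bullet notes.

```python
import math


def effective_let(let_normal, tilt_deg):
    """Effective LET at a tilt angle, assuming LET_eff = LET / cos(theta)."""
    if not 0.0 <= tilt_deg < 90.0:
        raise ValueError("tilt angle must be in [0, 90) degrees")
    return let_normal / math.cos(math.radians(tilt_deg))


def tilt_for_target_let(let_normal, target_let):
    """Tilt angle in degrees needed to reach target_let from a normal-incidence LET."""
    if target_let < let_normal:
        raise ValueError("tilting can only increase the effective LET")
    return math.degrees(math.acos(let_normal / target_let))


# Hypothetical plan: reach an effective LET of 20 from two different ions.
for let_normal in (10.0, 14.0):
    angle = tilt_for_target_let(let_normal, 20.0)
    print(f"normal LET {let_normal:4.1f}: tilt {angle:4.1f} deg, "
          f"LET_eff {effective_let(let_normal, angle):.2f}")
```

Two such configurations with the same effective LET should produce statistically equivalent cross sections if the approximation holds for the device.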
Summary

• In 2012, NASA Electronic Parts and Packaging (NEPP) developed a robust test and analysis (hardness assurance) methodology for FPGA component evaluation and SEE data application.

• Since 2012, FPGA circuit complexity has increased exponentially.

• The documentation is currently being updated to address complexity management and to incorporate years of lessons learned.

• This presentation highlights a select portion of the guideline updates.
Acknowledgements

- This work has been sponsored by the NASA Electronic Parts and Packaging (NEPP) Program.
- Thanks are given to the NASA Goddard Radiation Effects and Analysis Group (REAG) for their technical assistance and support.

Contact Information:
Melanie Berg, NASA Goddard REAG FPGA Principal Investigator: Melanie.D.Berg@NASA.GOV