Revisions to Conventional Clock Domain Crossing Methodologies in Triple Modular Redundant Circuits

Melanie Berg, AS&D in support of NASA/GSFC
Melanie.D.Berg@NASA.gov
Kenneth LaBel, NASA/GSFC

To be presented by Melanie Berg at the Hardened Electronics and Radiation Technology Conference, April 16-20, 2018, Tucson, AZ.
Acronyms

- Application specific integrated circuit (ASIC)
- Block random access memory (BRAM)
- Block Triple Modular Redundancy (BTMR)
- Clock (CLK or CLKB)
- Clock to output time \( t_{co} \)
- Collected charge \( Q_{coll} \)
- Combinatorial logic (CL)
- Computer aided design (CAD)
- Configurable Logic Block (CLB)
- Configuration cross section \( P_{configuration} \)
- Critical charge \( Q_{crit} \)
- Digital Signal Processing Block (DSP)
- Distributed triple modular redundancy (DTMR)
- Dual interlocked cell (DICE)
- Dual redundancy (DR)
- Edge-triggered flip-flops (DFFs)
- Energy (E)
- Equivalence Checking (EC)
- Error detection and correction (EDAC)
- Field programmable gate array (FPGA)
- Finite state machine (FSM)
- Flip flop (DFF)
- Frequency of capture domain B \( f_{clkB} \)
- Frequency of incoming data \( f_{DataA} \)
- Functional logic cross section \( P_{functionalLogic} \)
- Gate Level Netlist (EDF, EDIF, GLN)
- Hardware Description Language (HDL)
- Hold time \( t_h \)
- Input – output (I/O)
- Linear energy transfer (LET)
- Local triple modular redundancy (LTMR)
- Mean Time between failure (MTBF)
- NASA Electronic Parts and Packaging (NEPP)
- Negative doped with electrons (N\textsuperscript{+})
- Operational frequency \( f_s \)
- Power on reset (POR)
- Place and Route (PR)
- Positive doped with holes (P\textsuperscript{+})
- Radiation Effects and Analysis Group (REAG)
- Set up time \( t_{su} \)
- Single event functional interrupt (SEFI)
- Single event functional interrupt cross section \( P_{SEFI} \)
- Single event effects (SEEs)
- Single event latch-up (SEL)
- Single event transient (SET)
- Single event upset (SEU)
- Single event upset cross-section \( \sigma_{SEU} \)
- System cross section \( P(f_s)_{error} \)
- Time delay \( \tau_{dly} \)
- Voltage connected to positive rail \( V_{DD} \)
- Voltage connected to ground rail \( V_{SS} \)
Agenda

- Metastability
- Single event upsets (SEUs)
- Triple modular redundancy (TMR)
- Metastability filters and TMR
Metastability

• **Cause:** Introducing an asynchronous signal into a synchronous (edge-triggered) system, or creating a combinatorial logic path that does not meet timing constraints.

• **Effect:**
  - The flip-flop (DFF) clock edge captures the signal during the window of vulnerability.
  - The DFF output hovers at a voltage level between high and low, delaying the output transition beyond the specified clock-to-output ($t_{co}$) delay.

• **Probability:** The probability that the DFF enters a metastable state, and the time required to return to a stable state, vary with the process technology and with ambient conditions.

• **Generally**, the DFF quickly returns to a stable state. However, the resulting stable state is not deterministic.
Metastability Timing Diagram (Destination DFF)

Figure: timing diagram of a source DFF (Clock A) driving a destination DFF (Clock B), showing the destination DFF input and output with setup time $t_{su}$, hold time $t_h$, and clock-to-output time $t_{co}$.

**Cause:** The input violates $t_{su}$.

**Effects:** The metastable output settles to either the new value or the old value after $t_{co}$.

Solution: Metastability Filter

- Incoming signal is clocked in Domain A.
- Destination signals are clocked in Domain B.
- Filter: Use a capture DFF and at least one protection DFF.
  - Both DFFs are clocked in the capture domain.
  - The first DFF is expected to go metastable.
  - The second DFF is used to protect the rest of the system from potential metastable output.
  - However, there is no guarantee that the second DFF will not also become metastable, so a metastability filter is characterized by a mean time between failure (MTBF).
  - The MTBF depends on the slack time \( t_{\text{slack}} \) between the metastability DFFs; the process parameters \((c1\text{ and } c2)\); the frequency of incoming data \(f_{\text{DataA}}\); and the frequency of the capture domain \(f_{\text{clkB}}\).

\[
MTBF = \frac{e^{t_{\text{slack}}/c2}}{c1 \times f_{\text{DataA}} \times f_{\text{clkB}}}
\]
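
As a rough illustration of how the filter MTBF responds to slack time, the sketch below evaluates the form \( MTBF = e^{t_{\text{slack}}/c2} / (c1 \times f_{\text{DataA}} \times f_{\text{clkB}}) \). The function name and every numeric value are assumptions chosen for illustration only; real values of c1 and c2 come from the device vendor's characterization data.

```python
import math

def synchronizer_mtbf(t_slack, c1, c2, f_data_a, f_clk_b):
    """Return MTBF in seconds: exp(t_slack / c2) / (c1 * f_DataA * f_clkB).

    t_slack, c1, and c2 are in seconds; the frequencies are in hertz.
    """
    return math.exp(t_slack / c2) / (c1 * f_data_a * f_clk_b)

# Hypothetical parameters (assumptions, not taken from any datasheet).
C1 = 1e-9         # process constant c1, seconds
C2 = 50e-12       # process constant c2, seconds
F_DATA_A = 10e6   # maximum data switching frequency in Domain A, Hz
F_CLK_B = 100e6   # capture clock frequency in Domain B, Hz

for t_slack in (0.5e-9, 1.0e-9, 1.5e-9):
    mtbf = synchronizer_mtbf(t_slack, C1, C2, F_DATA_A, F_CLK_B)
    print(f"t_slack = {t_slack * 1e9:.1f} ns -> MTBF = {mtbf:.3e} s")
```

Because the slack time sits in the exponent, each additional interval of c2 in slack multiplies the MTBF by e, while the two frequencies only scale it linearly.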
Slack Time \((t_{\text{slack}})\) between Metastability DFFs

- Nets and combinatorial logic add delay.
- Delay reduces slack time.
- Metastability filter rule: place no combinatorial logic between the metastability filter DFFs, and minimize the length of the connecting net.

\[
MTBF = \frac{e^{t_{\text{slack}}/c2}}{c1 \times f_{\text{DataA}} \times f_{\text{clkB}}}
\]
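
A short worked step shows why even a small added delay matters. Here \( \Delta t \) is a hypothetical symbol (not used elsewhere in this presentation) for the extra net or logic delay inserted between the two filter DFFs, which reduces the slack by the same amount:

\[
\frac{MTBF(t_{\text{slack}} - \Delta t)}{MTBF(t_{\text{slack}})}
= \frac{e^{(t_{\text{slack}} - \Delta t)/c2}}{e^{t_{\text{slack}}/c2}}
= e^{-\Delta t / c2}
\]

The frequency and c1 terms cancel, so the penalty depends only on the added delay relative to c2. For example, under this model an added delay of \( \Delta t = 3 \times c2 \) lowers the MTBF by a factor of \( e^{3} \approx 20 \).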
Device Penetration of Heavy Ions and Linear Energy Transfer (LET)

- LET characterizes the energy deposited by a charged particle as it passes through a material.
- It is based on the average energy (E) lost per unit path length (x), i.e., the stopping power.
- The density of the target material \( \rho \) is used to normalize LET to the target material.

\[ LET = \frac{1}{\rho} \frac{dE}{dx} \quad \left[ \text{MeV cm}^2 \, \text{mg}^{-1} \right] \]

where \( \rho \) is the density of the target material and the bracketed term gives the units of LET.
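
The unit handling in the LET definition can be sketched as follows. The function name and the example stopping power are assumptions made for illustration; the silicon density of roughly 2.33 g/cm³ is a standard handbook value.

```python
# Density of silicon, about 2.33 g/cm^3, expressed in mg/cm^3.
SILICON_DENSITY_MG_PER_CM3 = 2330.0

def let_from_stopping_power(de_dx_mev_per_um, density_mg_per_cm3):
    """Return LET in MeV*cm^2/mg from a stopping power given in MeV/um."""
    de_dx_mev_per_cm = de_dx_mev_per_um * 1.0e4  # 1 cm = 10^4 um
    return de_dx_mev_per_cm / density_mg_per_cm3

# Hypothetical stopping power, chosen only to illustrate the conversion.
example_de_dx = 2.33  # MeV/um
let_si = let_from_stopping_power(example_de_dx, SILICON_DENSITY_MG_PER_CM3)
print(f"LET = {let_si:.2f} MeV*cm^2/mg")
```

Dividing by density is what lets LET values be compared across target materials of different density.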

Single event transient (SET)
Single event upset (SEU)
How SEUs Affect FPGAs

- SEU and SET error signatures vary between FPGA devices:
  - Temporary glitch (transient)
  - Change of state (incorrect state machine transitions)
  - Global upsets: Loss of clock or unexpected reset
  - Route breakage (no signal can get through)
  - Configuration corruption
  - Current jumps or increases (contention)

\[
P(f_s)_{\text{error}} \propto P_{\text{Configuration}} + P(f_s)_{\text{functionalLogic}} + P_{\text{SEFI}}
\]

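where:
- \( P(f_s)_{\text{error}} \): system malfunction
- \( P_{\text{Configuration}} \): configuration SEU that causes a malfunction
- \( P(f_s)_{\text{functionalLogic}} \): sequential and combinatorial logic (CL) events in the data path
- \( P_{\text{SEFI}} \): glitches in global routes and hidden logic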

*Triple modular redundancy (TMR):* A common approach to SEU mitigation.
How To Insert TMR into A Design:

Output of synthesis is a gate-level netlist that represents the given HDL function.

Automated: TMR can be inserted during synthesis or post synthesis.

If inserted post synthesis, the gate level netlist is replicated, ripped apart, and voters + feedback are inserted.

TMR can also be written manually into the HDL, but this is generally not done because it is too difficult.

HDL: Hardware description language

Various TMR Schemes: Different Topologies

Block diagram of block TMR (BTMR): a complex function containing combinatorial logic (CL) and flip-flops (DFFs) is triplicated as three black boxes; majority voters are placed at the outputs of the triplet.

Block diagram of local TMR (LTMR): only flip-flops (DFFs) are triplicated and data-paths stay singular; voters are brought into the design and placed in front of the DFFs.

Block Diagram of distributed TMR (DTMR): the entire design is triplicated except for the global routes (e.g., clocks); voters are brought into the design and placed after the flip-flops (DFFs). DTMR masks and corrects most single event upsets (SEUs).
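
All three topologies rely on a majority voter. The following is a minimal behavioral sketch of a bitwise 2-of-3 voter, written in Python for illustration rather than as synthesizable HDL; the function name and the example bit patterns are assumptions.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant copies."""
    return (a & b) | (a & c) | (b & c)

golden = 0b1011_0010
copy_a = golden
copy_b = golden ^ 0b0000_1000  # simulated SEU: one flipped bit in copy B
copy_c = golden

voted = majority_vote(copy_a, copy_b, copy_c)
print(f"voted == golden: {voted == golden}")
```

The voter masks an upset in any single copy. If two copies are upset in the same bit position, the voted result for that bit is wrong.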
• Synchronize all signals prior to usage in BTMR copies.
• This will require pulling out internal metastability filters contained in each copy.

All three copies share clk\textsubscript{B}
LTMR And Metastability

Figure: signal crossing from a DFF in Domain A to DFFs in Domain B.

Domain B cannot use signal until it is synchronized.

LTMR: the voter is placed between the metastability filter DFFs. This violates the metastability filter rule.
LTMR And Metastability

\[ MTBF = \frac{e^{t_{\text{slack}}/c2}}{c1 \times f_{\text{DataA}} \times f_{\text{clkB}}} \]

Mean time between failure (MTBF)

c1 and c2 are process-dependent constants.

\( f_{\text{clkB}} \) is the capture clock domain frequency.

\( f_{\text{DataA}} \) is the maximum data switching frequency.

Violation: a voter is placed between the metastability filter DFFs.

One solution is to remove the voters between metastability DFFs

Another solution is to include additional DFFs in the metastability filter (increase $t_{\text{slack}}$).

Wrong implementation: voters are in the metastability filter.

Fix: delete the voters and place the DFFs close together. This increases $t_{\text{slack}}$, which increases the MTBF.
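
The options above can be compared numerically with a rough sketch. It assumes a simple slack budget per DFF-to-DFF hop (clock period minus clock-to-output time, setup time, and path delay) and assumes that an extra filter DFF adds the slack of one more hop; neither model is stated in this presentation, and all timing values are hypothetical. The comparison is done in log10 so that large exponents do not overflow.

```python
import math

def log10_mtbf(t_slack, c1, c2, f_data_a, f_clk_b):
    """log10 of MTBF in seconds, MTBF = exp(t_slack/c2) / (c1*f_DataA*f_clkB)."""
    return t_slack / (c2 * math.log(10)) - math.log10(c1 * f_data_a * f_clk_b)

def hop_slack(t_clk, t_co, t_su, t_path):
    """Assumed slack budget for one DFF-to-DFF hop, in seconds."""
    return t_clk - t_co - t_su - t_path

# Hypothetical timing and process values (assumptions for illustration only).
T_CLK_B = 10e-9    # capture clock period (100 MHz)
T_CO = 0.5e-9      # clock-to-output time
T_SU = 0.2e-9      # setup time
T_ROUTE = 0.3e-9   # short route between tightly placed DFFs
T_VOTER = 1.0e-9   # voter logic plus its extra routing
C1, C2 = 1e-9, 100e-12
F_DATA_A, F_CLK_B = 10e6, 1.0 / T_CLK_B

slack_with_voter = hop_slack(T_CLK_B, T_CO, T_SU, T_ROUTE + T_VOTER)
slack_no_voter = hop_slack(T_CLK_B, T_CO, T_SU, T_ROUTE)
# Assumption: an added filter DFF contributes one more (voter-free) hop of slack.
slack_extra_dff = slack_with_voter + slack_no_voter

options = {
    "voter between filter DFFs": slack_with_voter,
    "voter removed": slack_no_voter,
    "voter kept, extra DFF added": slack_extra_dff,
}
for name, slack in options.items():
    value = log10_mtbf(slack, C1, C2, F_DATA_A, F_CLK_B)
    print(f"{name}: t_slack = {slack * 1e9:.1f} ns, log10(MTBF/s) = {value:.1f}")
```

Under these assumptions, the voter delay costs a fixed number of decades of MTBF that depends only on the ratio of the voter delay to c2.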
Summary

• Complex systems require multiple clock domains.
• In a synchronous design, metastability filters are required to reliably capture signals that source from separate clock domains.
• To increase the MTBF of metastability filters, $t_{\text{slack}}$ must be maximized: no combinatorial logic and short routes between metastability DFFs.
• Automated TMR tools have not been handling metastability filters correctly.
• We show the updates to automated TMR tools for the following TMR methodologies:
  – BTMR,
  – LTMR,
  – DTMR.