

# Notional 1FT Voting Architecture with Time-Triggered Ethernet

AES A&S Tag-Up Meeting 7 November 2016

Andrew Loveless (NASA JSC) andrew.loveless@nasa.gov

# **General Overview**



- 1-Byzantine resilient C&DH system (fail-operational).
  - Uses triplex onboard computers (OBCs) executing identical flight software.
  - >1FT relies on sparing and crew intervention (e.g. independent backup).
- Assumes classical reliability requirement of 10<sup>-9</sup> failures/hour.
- Realizable with currently available COTS technology.\*
  - E.g. Can be implemented using a variety of SBCs and real-time OSs.
- Scalable fault tolerance (both in classification and quantity).
  - E.g. Through additional network planes, high-integrity devices, etc.
- Assumes full cross strapping between OBCs, network switches, and end devices/subsystems (e.g. RIUs, IMUs, MBSUs).
  - Minimizes number of 2-fault combinations which can cause system failure.
  - Prioritizes high data availability and architectural flexibility over low SWaP.
- Redundant Time-Triggered Ethernet network used for data exchange and synchronization between computing platforms.
  - Eliminates need for independent Cross-Channel Data Link (CCDL).

\* This presentation proposes the use of TTTech's rad-hard space ASIC (available Q3 2017).



#### **Different Fault Classifications (there is overlap)**

| Fault Type | Description                                                                                                                  | System                                                       |
|------------|------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------|
| Fail-Stop  | <ul><li>The node does not produce any output.</li><li>E.g. Process halts before "send to all".</li></ul>                     | Failover/Standby                                             |
| Crash      | <ul><li>The node does not produce any output.</li><li>Can remain undetected by good nodes.</li></ul>                         | N-Modular Redundancy (synchronized majority voting system)   |
| Omission   | Follows algorithm, but messages are lost.                                                                                    |                                                              |
| Value      | Node produces incorrect computation result.                                                                                  |                                                              |
| Timing     | <ul><li>Outputs are delivered too early or too late.</li><li>I.e. Node does not meet temporal specifications.</li></ul>      |                                                              |
| Symmetric  | <ul><li>Peers see the fault manifest in the same way.</li><li>E.g. Node sends arbitrary data to all or nobody.</li></ul>     |                                                              |
| Byzantine  | <ul><li>Peers see the fault manifest in different ways.</li><li>E.g. Node sends different data to different peers.</li></ul> | Byzantine Agreement                                          |

Fault types are listed in order of increasing severity (most severe last).

# **Ensuring Input Data Consistency**



#### Where does byzantine tolerance matter? Agreeing on input data

- **Problem:** single source (internal or external) distribution to multiple receivers.
- In our case, the input seen by each redundant processor <u>must be bitwise identical</u>, i.e. have *interactive consistency*.
- Why? If all processors get the same input, then all non-faulty processors are guaranteed to produce identical output.
  - Can be used to ID faulty processors and resolve commands sent by the OBCs.

#### Consensus versus Correctness

- A faulty input device may provide arbitrary input data to the OBCs.
- The purpose of the interactive consistency (IC) exchange is to guarantee that all OBCs have the same view of the system, and can therefore decide on the same input value.
  - I.e. the IC exchange guarantees consensus, but not that the input is "correct".
- If an accurate input value is important, you need redundant input devices.

#### Avoiding hardware shortcuts

- It is tempting to try circumventing the problem through increased connectivity.
  - E.g. Trying to ensure all OBCs read some input data from the same shared wire.
- However, a faulty device may transmit a marginal signal that may be interpreted as different values by different OBCs.

### **Rules for Interactive Consistency**



#### What is an interstage?

- An interstage is an FCR that participates in the interactive consistency exchange, but does not require consensus.
- The purpose of an interstage is to provide the necessary functionality to perform byzantine agreement algorithms without requiring all FCRs to be full processors.

### Rules for interactive consistency in 1FT voting systems:

- Requires  $\geq 3(1) + 1 = 4$  Fault Containment Regions (FCRs).
- Each interstage must receive data through  $\geq 1$  disjoint paths.
- Devices requiring consensus get data from  $\geq 2(1) + 1 = 3$  disjoint paths.
- Above must be satisfied in (1) + 1 = 2 rounds of data exchange.
- After data exchange, devices requiring consensus perform an absolute majority vote of received messages.
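The "(1)" in the rules above reads as the number of tolerated faults. As a minimal sketch, assuming that reading and the classical Byzantine agreement bounds, the same minimums can be computed for any fault count `f`. The names `ICRequirements` and `ic_requirements` are hypothetical, and the interstage rule (≥ 1 path) is left out because the slide states it only for the 1FT case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ICRequirements:
    fcrs: int             # minimum number of fault containment regions
    consensus_paths: int  # minimum disjoint paths into each consensus device
    rounds: int           # rounds of data exchange


def ic_requirements(f: int) -> ICRequirements:
    """Minimums for tolerating f arbitrary faults (f = 1 gives the slide's numbers)."""
    if f < 1:
        raise ValueError("f must be at least 1")
    return ICRequirements(fcrs=3 * f + 1, consensus_paths=2 * f + 1, rounds=f + 1)


one_ft = ic_requirements(1)
print(one_ft)  # ICRequirements(fcrs=4, consensus_paths=3, rounds=2)
```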

### Channelized Bus with Cross-Channel Data Link (CCDL)

#### **General Overview**

- A 1FT design can be realized with either:
  - 1. 4 full processors/OBCs
  - 2. 3 OBCs + 1 interstage
- End devices are networked directly to one of the OBCs via a bus.
- Fully channelized design: each OBC has access only to devices on its own local bus.
- Requires independent CCDL for data exchange and synchronization (or an external reference).
- Meeting Requirements
  - $\geq 3(1) + 1$ FCRs? **Yes**: each OBC/interstage + its CCDL links (4 FCRs total).
  - $\geq 2(1) + 1$ disjoint paths b/w FCRs? **Yes**
  - $(1) + 1$ rounds of data exchange? **Yes**: performed in succession over the CCDL.
  - $(4 - 1) + 4(4 - 1) = 15$ msgs per exchange.
- Examples
  - NASA X-38, LM X-33, NASA Ares I, ULA Delta IV.






### Channelized Bus – Reading Data (1)


#### Andrew Loveless (NASA JSC/EV2)

### Channelized Bus – Reading Data (2)


#### Step 1: Read data

- OBCs 1-3 read data from their local input devices.
  - No guarantee data agrees.

#### Step 2: Exchange

- OBCs 1-3 send their initial values to OBCs 1-3 + interstage.
  - An OBC may "lie" arbitrarily to its peers (results in an asymmetric view).

#### Step 3: Exchange (Round 2)

- OBCs 1-3 + interstage send round 1 data to all OBCs 1-3 (round 2).
  - Still, any FCR 1-4 could fail asymmetrically.

#### Step 4: Create symmetry

- OBCs 1-3 perform majority voting of round 2 data to "correct" round 1 data (non-faulty OBCs now share the same IC vector).

#### Step 5: Make a decision

- OBCs 1-3 execute a choice() function to select a final value (e.g. median, mean).
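The five steps can be sketched as a small simulation. This is an illustrative model under stated assumptions, not flight software: the interstage is given the hypothetical name `INT`, one FCR at most is faulty, and the faulty FCR's behavior is supplied by a caller-defined `lie(receiver, value)` function. Each IC vector entry is voted over the copies relayed by the three FCRs other than the source, and an entry with no strict majority is set to `None`.

```python
from collections import Counter
from statistics import median

OBCS = ["OBC1", "OBC2", "OBC3"]
FCRS = OBCS + ["INT"]  # "INT" is a hypothetical name for the interstage


def strict_majority(values, default=None):
    """Return the value held by more than half of `values`, else `default`."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else default


def ic_exchange(initial, faulty=None, lie=None):
    """Return each OBC's IC vector after the two-round exchange.

    `initial` maps each OBC to the value it read in step 1. If `faulty` names
    an FCR, `lie(receiver, value)` decides what it sends to each receiver.
    """
    def send(sender, receiver, value):
        return lie(receiver, value) if sender == faulty else value

    # Step 2 (round 1): every OBC sends its initial value to every FCR.
    round1 = {rx: {src: send(src, rx, initial[src]) for src in OBCS}
              for rx in FCRS}

    # Step 3 (round 2): every FCR relays its round 1 data to every OBC.
    round2 = {rx: {relay: {src: send(relay, rx, round1[relay][src])
                           for src in OBCS}
                   for relay in FCRS}
              for rx in OBCS}

    # Step 4: for each source, vote over the copies relayed by the other FCRs.
    return {rx: {src: strict_majority([round2[rx][relay][src]
                                       for relay in FCRS if relay != src])
                 for src in OBCS}
            for rx in OBCS}


def choice(ic_vector):
    """Step 5: median of the entries that reached a majority."""
    return median(v for v in ic_vector.values() if v is not None)


readings = {"OBC1": 56, "OBC2": 59, "OBC3": 60}
# OBC2 is faulty and tells every receiver something different.
lies = {"OBC1": 59, "OBC2": 59, "OBC3": 0, "INT": 99}
vectors = ic_exchange(readings, faulty="OBC2", lie=lambda rx, value: lies[rx])

print(vectors["OBC1"])          # {'OBC1': 56, 'OBC2': None, 'OBC3': 60}
print(vectors["OBC3"])          # {'OBC1': 56, 'OBC2': None, 'OBC3': 60}
print(choice(vectors["OBC1"]))  # 58.0
```

In this run the two non-faulty OBCs end up with the same IC vector even though OBC2 lied in both rounds, so they also select the same final value.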



### Channelized Bus – Reading Data (3)


### Channelized Bus – Reading Data (4)



## Channelized Bus – Reading Data (5)



## Channelized Bus – Commanding (1)


#### Step 1: Prepare Command

- After computation, OBCs 1-3 each generate a command.
  - All non-faulty OBCs agree.

#### Step 2: Exchange

- OBCs 1-3 send their output values to OBCs 1-3.
- Again, an OBC may "lie" arbitrarily to its peers (results in an asymmetric view).
  - This behavior is tolerated, since the non-faulty OBCs do not need to have consensus on the entire view of the system.

#### Step 3: Majority Vote

- Each of OBCs 1-3 performs a majority vote to correct its initial output value.
- Process can be used to detect faulty OBCs and initiate fault recovery or system reconfiguration.

#### Step 4: Transmit Command

- OBCs 1-3 send the command to the output device connected to their local bus.
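Step 3 can be sketched as a single-round vote that also flags disagreeing senders. This is a minimal illustration, assuming each OBC holds one command per sender after the exchange; the function name `vote_commands` and the returned "suspects" list are hypothetical. Because a faulty OBC may lie asymmetrically, the suspects list is only this OBC's local view.

```python
from collections import Counter


def vote_commands(received):
    """Vote over the commands seen by one OBC.

    `received` maps each sender to the command this OBC received from it.
    Returns (voted_command, suspects); the command is None without a majority.
    """
    command, count = Counter(received.values()).most_common(1)[0]
    if count <= len(received) / 2:
        return None, []
    suspects = sorted(s for s, c in received.items() if c != command)
    return command, suspects


# OBC2 sent a command that disagrees with the other two.
print(vote_commands({"OBC1": 37, "OBC2": 99, "OBC3": 37}))  # (37, ['OBC2'])
```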




## Channelized Bus – Commanding (2)


## Channelized Bus – Commanding (3)



## Channelized Bus – Commanding (4)


### Channelized Bus – Detailed Exchange






#### Create Symmetry - Majority Voting

On OBCs 1-3, each element in the interactive consistency (IC) vector is set to the strict majority of its children.

 $\rightarrow$  I.e. OBCs 1,3 must agree on data from OBC 2.

#### Making a Decision

On OBCs 1-3, a choice() function is used to determine a final value from those contained in the IC vector.

 $\rightarrow$  E.g. a mid-value selection.
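A mid-value selection can be sketched as below. The function name `mid_value_select` is hypothetical, and the handling of empty entries (skipped) and of an even number of entries (lower middle value) is an assumption of this sketch. With three entries and at most one arbitrary value, the result always lies within the range of the two good values.

```python
def mid_value_select(ic_vector):
    """Return the middle value of the IC vector, ignoring empty (None) entries."""
    values = sorted(v for v in ic_vector if v is not None)
    if not values:
        raise ValueError("IC vector holds no usable values")
    return values[(len(values) - 1) // 2]


print(mid_value_select([56, 59, 60]))      # 59
print(mid_value_select([56, 10**9, 60]))   # 60
```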



### IC Exchange – Alternate Viewpoint (1)

- "Flattening" the classical two-round exchange
  - Can be analyzed as messaging over redundant paths (from different FCRs).
  - Makes it easier to see why 4 FCRs and 3 disjoint paths are necessary.



### IC Exchange – Alternate Viewpoint (2)





Approved for Public Release – No Export Controlled Data

### Generalizing use of interstages (1)

#### Example 1:

- Four total FCRs
- Two interstages
- Two devices require consensus

#### Rules for IC in 1FT voting systems:

- Requires  $\geq 3(1) + 1 = 4$  FCRs.
- Interstages need data from ≥1 paths.
- Devices requiring consensus need data from ≥ 2(1) + 1 = 3 disjoint paths.
- Two rounds of data exchange.
- Devices requiring consensus perform majority vote over received messages.

Figure legend (symbols not reproduced): device requiring consensus; interstage (does not require consensus); originating device; faulty device.

**Assumption:** Any device <u>may fail arbitrarily</u> (omission, symmetric, asymmetric, byzantine).



### Generalizing use of interstages (2)

#### Example 2:

- Five total FCRs
- Three interstages
- Two devices require consensus




### Switched Triplex (Fully Cross-strapped)




# **High-Integrity Devices in TTEthernet**

### High-Integrity Design

- Command/Monitor (COM/MON) design aims for error containment within the device.
  - Contains two fault containment regions.
- Input is forwarded to both COM and MON.
- Congruency exchange ensures both COM and MON have identical input data (i.e. IC).
- Both COM and MON process data in parallel.
- Output from COM is forwarded to MON.
- If disagreement, MON terminates the transmission.
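The COM/MON behavior can be sketched as follows. This is a simplified model, not TTTech's implementation: `com` and `mon` are hypothetical callables standing in for the two processing lanes, both are assumed to have received identical input already (the congruency exchange is not modeled), and a terminated transmission is represented by `None` rather than a truncated frame.

```python
from typing import Callable, Optional


def com_mon_transmit(com: Callable[[bytes], bytes],
                     mon: Callable[[bytes], bytes],
                     frame_in: bytes) -> Optional[bytes]:
    """Return the COM output if MON agrees with it, else None (fail-silent)."""
    com_out = com(frame_in)  # COM produces the candidate output
    mon_out = mon(frame_in)  # MON computes the same function in parallel
    if com_out != mon_out:
        return None          # disagreement: MON terminates the transmission
    return com_out


healthy = lambda frame: frame[::-1]
corrupted = lambda frame: b"\x00" + frame[::-1]

print(com_mon_transmit(healthy, healthy, b"abc"))    # b'cba'
print(com_mon_transmit(corrupted, healthy, b"abc"))  # None
```

The design point is that a single internal fault shows up as an omission at the receivers instead of a new valid message.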

### Device Failure Assumptions

- Standard devices may be subject to *byzantine* failures.
  - Device may send arbitrary messages (of any contents).
  - Device may transmit messages at arbitrary points in time.
  - Device may send different messages through different network planes (channels).
- High-Integrity devices may be subject to inconsistent omission failures.
  - Faulty device will not create (nor modify existing to produce) a new valid message.
  - Device may drop or fail to receive an arbitrary number of messages.
  - Device may fail to relay messages asymmetrically, i.e. <u>some receivers may not get data</u>.



Note that a transmitted message may be truncated, in which case the receiver rejects the message.

### **Rules for Interactive Consistency**



#### What is an interstage?

- An interstage is an FCR that participates in the interactive consistency exchange, but does not require consensus.
- The purpose of an interstage is to provide the necessary functionality to perform byzantine agreement algorithms without requiring all FCRs to be full processors.

### Rules for interactive consistency in 1FT voting systems:

- Requires  $\geq 3(1) + 1 = 4$  Fault Containment Regions (FCRs).
- Each interstage must receive data through  $\geq 1$  disjoint paths.
- Devices which require consensus must get data from:
  - I. $\geq 2(1) + 1 = 3$ standard-integrity devices, or
  - II. $\geq (1) + 1 = 2$ high-integrity devices, or
  - III. A combination of the above
- Above must be satisfied in (1) + 1 = 2 rounds of data exchange.
- After data exchange, devices requiring consensus perform an absolute majority vote of received messages.

## Generalizing use of (HI) interstages

#### Example 3:

- Six total FCRs
- Two HI interstages
- Two devices require consensus

#### Rules for IC in 1FT voting systems:

- Requires  $\geq 3(1) + 1 = 4$  FCRs.
- Interstages need data from ≥1 paths.
- Devices requiring consensus need data:
  - I. from $\geq 2(1) + 1 = 3$ LI devices
  - II. from $\geq (1) + 1 = 2$ HI devices
  - III. from a combination of the above
- Two rounds of data exchange.
- Majority vote used to reach consensus.

Figure legend (symbols not reproduced): device requiring consensus; LI interstage (does not require consensus); HI interstage (does not require consensus); originating device; faulty device.

**Assumption:** LI devices may fail arbitrarily; HI devices may fail via inconsistent omission.





## Generalizing use of (HI) interstages

#### Example 4:

- Six total FCRs
- One HI + two LI interstages
- Two devices require consensus






# Switched Triplex (Fully Cross-strapped)

#### General Overview

- Scalable 1FT design can be realized with:
  - 3 full processors/OBCs
  - 2-3 redundant network planes (interstages).
  - Majority voting of redundant messages.
- Fully cross-strapped design: each OBC has access to any networked device.
- Time-Triggered Ethernet network provides data distribution and synchronization between platforms.
  - Does not require separate CCDL or timing/synchronization hardware.
- Triplex OBCs do not directly interface to any end devices (insulated by network).
- Device Characteristics
  - COM/MON switches, standard-integrity end systems (ESs).
  - Error Containment Unit b/w switch ingress/egress.
  - Switches provide 1FT or 2FT availability depending on number of channels.
  - COM/MON switches required as trusted Compression Masters (CM) for sync.
  - HI switches cannot protect against valid frames containing erroneous data.




## Switched Triplex (Dual-Channel)



- A 1FT configuration requiring only two network planes is possible only if switches are fully self-checking (fail-silent).
- A restricted failure mode model requires the realization of two independent FCRs.
  - Inconsistent omission is a reduced model.
- Must eliminate common mode elements:
  - E.g. Shared timer, dielectric isolation, physical space, temperature.
- If the switch may fail arbitrarily, then three redundant channels are always required.
- In all cases, <u>3x channels minimizes the number</u> of two-fault combinations resulting in system failure, compared with 2x channels.

#### Current Implementation

- TTTech COM/MON devices share power (with separate power monitor).
- A shared oscillator is used for COM/MON, with a dedicated clock monitor to prevent common mode clock failures.

Fault-propagation from switches <u>theoretically</u> requires dual-correlated simultaneous faults.

 $\rightarrow$  $10^{-6} \times 10^{-6} \approx 10^{-12}$ failures/hour



### Switched Triplex – Reading Data (1)

#### Step 1: Exchange (Round 1)

- Each redundant input device (any #) transmits its data to switches 1-3.
  - No guarantee non-faulty devices agree.
  - A failed device may transmit arbitrarily.

#### Step 2: Exchange (Round 2)

- Switches 1-3 send each redundant input message to all OBCs 1-3.

#### Step 3: Create symmetry

- OBCs 1-3 perform a majority vote of the message copies received from each redundant network channel.
  - Messages that violate the protocol are dropped.
  - Majority <u>must be determined according to number of messages received</u> (i.e. not static 2/3).
  - Non-faulty OBCs now share the same IC vector.

#### Step 4: Make a decision

- OBCs 1-3 execute a choice() function to select a final value from the redundant input devices (e.g. median, mean).
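The "not static 2/3" rule in step 3 can be sketched as a vote whose threshold depends on how many copies actually arrived. This is a minimal illustration, assuming one copy per network channel with `None` marking a copy that was dropped or never received; the function name `vote_copies` is hypothetical. Accepting a single surviving copy is only reasonable under the slides' assumption that high-integrity switches fail by omission and do not create new valid messages.

```python
from collections import Counter


def vote_copies(copies):
    """Majority over the copies that arrived; None if nothing or no majority."""
    received = [c for c in copies if c is not None]
    if not received:
        return None
    value, count = Counter(received).most_common(1)[0]
    return value if count > len(received) / 2 else None


print(vote_copies([b"\x38", b"\x38", b"\x38"]))  # b'8'  (3 of 3)
print(vote_copies([b"\x38", None, b"\x38"]))     # b'8'  (2 of 2)
print(vote_copies([b"\x38", None, None]))        # b'8'  (1 of 1)
print(vote_copies([b"\x38", b"\x3b", None]))     # None  (1 of 2 is no majority)
```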



### Switched Triplex – Reading Data (2)




### Switched Triplex – Reading Data (3)


# Switched Triplex – Reading Data (4)




# Switched Triplex – Commanding (1)

### Step 1: Prepare Command

- After performing computation, OBCs 1-3 each generate a command.
  - All non-faulty OBCs agree on the output.

### Step 2: Exchange (Round 1)

- Each OBC 1-3 transmits its output value to all switches 1-3.

### Step 3: Exchange (Round 2)

- Switches 1-3 send each input message to all redundant output devices (any #).

### Step 4: Create symmetry

- Each output device performs a majority vote of messages received from each channel.
  - This IC exchange is required to ensure consensus of multiple output devices in case of one faulty OBC.

### Step 5: Make a decision

- Each output device performs a second majority vote over the commands from each OBC.
  - I.e. the choice() function for output devices is always a bitwise majority.
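Steps 4 and 5 at an output device can be sketched as two votes in sequence: first over the channel copies of each OBC's command, then a bitwise majority over the three resulting commands. This is an illustrative model with commands represented as integers; the function names are hypothetical, and the fallback used when only two commands survive step 4 (accept them only if equal) is an assumption of this sketch that the slides do not specify.

```python
from collections import Counter


def vote_channels(copies):
    """Step 4: majority over the copies of one OBC's command (None = missing)."""
    received = [c for c in copies if c is not None]
    if not received:
        return None
    value, count = Counter(received).most_common(1)[0]
    return value if count > len(received) / 2 else None


def bitwise_majority(a, b, c):
    """Each output bit takes the value held by at least two of the three inputs."""
    return (a & b) | (a & c) | (b & c)


def decide(frames):
    """Step 5: combine the per-OBC results into a single command."""
    commands = [vote_channels(copies) for copies in frames.values()]
    present = [c for c in commands if c is not None]
    if len(present) == 3:
        return bitwise_majority(*present)
    if len(present) == 2 and present[0] == present[1]:
        return present[0]  # assumed fallback when one OBC's command is missing
    return None


frames = {
    "OBC1": [37, 37, 37],
    "OBC2": [37, None, 37],     # one channel copy lost
    "OBC3": [255, 255, 255],    # faulty OBC sends a wrong command
}
print(decide(frames))                               # 37
print(bin(bitwise_majority(0b1100, 0b1010, 0b0110)))  # 0b1110
```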



# Switched Triplex – Commanding (2)




# Switched Triplex – Commanding (3)




# Switched Triplex – Commanding (4)


# Switched Triplex – Commanding (5)




# Switched Triplex – Monitoring (1)

### Step 1: Prepare Command

- After performing computation, OBCs 1-3 each generate a command.
  - All non-faulty OBCs agree on the output.

### Step 2: Exchange (Round 1)

↓ Happening Simultaneously

### Step 3: Exchange (Round 2)

- Switches 1-3 send each input message "reflected" back to each OBC 1-3.
- Why? Allows CFS app to monitor OBCs for the purpose of fault detection and reconfiguration.

### Step 4: Create symmetry

- Each OBC 1-3 performs a majority vote of messages received from each channel.

### Step 5: Identify faulty OBC

- OBCs 1-3 perform a majority vote over the commands from each OBC.
  - Identical to action performed by OUT 1-3.
  - Can be used to identify OBCs that do not agree with the majority (for FDIR).




# Switched Triplex – Monitoring (2)


# Switched Triplex – Monitoring (3)




# Side Note – Sharing between OBCs



- When sharing a value between OBCs (e.g. output monitoring, shared state), the original sender <u>cannot use its value directly</u>.
- Instead, it performs a majority vote of the values reflected back from the switches (i.e. IC).
- This ensures consensus in case of an arbitrary transmission error.
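
To illustrate why the sender must also vote, the sketch below simulates one two-round exchange over three switches in which two of the sender's three transmissions are corrupted on the way out. The names and the fault model (non-faulty switches, a faulty sender transmission) are assumptions for illustration.

```python
from collections import Counter

def vote(values):
    """Strict-majority vote; returns None when no value has a majority."""
    value, count = Counter(values).most_common(1)[0]
    return value if count > len(values) / 2 else None

def exchange(sent_per_switch, obcs=("OBC1", "OBC2", "OBC3")):
    """Round 1: the sender transmits one copy to each switch
    (copies may differ after a transmission fault).
    Round 2: each switch reflects its copy to every OBC, sender included.
    Each OBC then votes over the copies it received."""
    reflected = {obc: list(sent_per_switch) for obc in obcs}
    return {obc: vote(copies) for obc, copies in reflected.items()}

# OBC1 computed 42 locally, but two of its three transmissions went out as 17.
local_value = 42
result = exchange([17, 17, 42])
```

Here every OBC, including the sender OBC1, votes 17. Had OBC1 kept its local value of 42, it would disagree with OBC2 and OBC3 even though both of them are fault-free.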





Approved for Public Release – No Export Controlled Data

# Network-Level IC Advantages



### Network-Level IC = no host blocking

- Consensus between multiple receivers can be achieved transparently to the flight software (no impact on CFS).
- Any value read is already the voted result of a two-round exchange and is consistent across all receivers (1FT).
- Eliminates classical "acceptance window" for exchanges.
- No need for "read, send, wait ... read, send, etc."
- Minimizes use of host resources (especially if in NIC).



### The Role of the Remote Interface Unit (RIU)

- The RIU acts as a gateway between the TTE network, analog devices, and legacy buses (e.g. MIL-STD-1553, ARINC 429).
- Moves signal conditioning closer to sensors/effectors, reducing noise and wiring mass.
- Functions it may implement include A/D conversion, network formatting, range checking, scaling, linearization, and threshold/filter services specific to each subsystem.
- Uses configuration files to map local buffer data to TTE dataports.
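
As a rough illustration of the last two bullets, the sketch below scales and range-checks raw readings and maps them to dataports through a configuration table. The channel name, dataport name, and calibration constants are all hypothetical; a real RIU would load them from its configuration files.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChannelConfig:
    dataport: str   # hypothetical TTE dataport name
    scale: float    # engineering units per raw count
    offset: float   # engineering-unit offset
    lo: float       # lower valid limit
    hi: float       # upper valid limit

# Hypothetical configuration: local buffer entry -> dataport and calibration.
CONFIG = {
    "tank_pressure_raw": ChannelConfig("DP_TANK_PRESS", scale=0.5, offset=-10.0, lo=0.0, hi=150.0),
}

def condition(buffer):
    """Convert raw counts in the local buffer to (value, in_range) pairs keyed by dataport."""
    out = {}
    for name, cfg in CONFIG.items():
        value = buffer[name] * cfg.scale + cfg.offset
        out[cfg.dataport] = (value, cfg.lo <= value <= cfg.hi)
    return out

nominal = condition({"tank_pressure_raw": 200})
over_range = condition({"tank_pressure_raw": 400})
```

Keeping the mapping in data rather than code is one way to let the same RIU software serve different subsystems.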



### Approach 1:

- One RIU
- One sensor

#### **Problems?**

- Sensor data sent to the RIU may be wrong.

#### The Fix:

- Add redundant sensors and have the RIU remediate between them.
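
One possible way for an RIU to remediate between redundant sensors is to discard readings that stray too far from the median and average the rest. This is a minimal sketch under that assumption; the function name and the tolerance parameter are illustrative and not taken from the presentation.

```python
from statistics import median

def remediate(readings, tolerance):
    """Average the readings that lie within `tolerance` of the median.

    Readings farther from the median are treated as outliers and discarded.
    Falls back to the median itself if no reading qualifies.
    """
    mid = median(readings)
    agreeing = [r for r in readings if abs(r - mid) <= tolerance]
    return sum(agreeing) / len(agreeing) if agreeing else mid

# Three redundant sensors, one of which has failed high.
selected = remediate([10.0, 10.2, 57.3], tolerance=0.5)
```

With three sensors and a single bad reading, the result stays between the two good readings.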








### Approach 2:

- One RIU
- Remediation b/w multiple sensors

#### **Problems?**

- RIU could fail internally, resulting in:
  1. No transmission
  2. Symmetric faulty transmission

#### The Fix:

- Increase resilience of the RIU:
  1. TMR of processor elements (e.g. Maxwell SCS750 used on ESA Gaia satellite).
  2. True dual-core lock-step processor (i.e. fully isolated self-checking).
     - COTS products like ARM Cortex-R4/R5 not available in rad-tolerant variants.








### Approach 3:

- One RIU with HI processor
- Remediation b/w multiple sensors

#### **Problems?**

- TTE ES could fail arbitrarily, resulting in:
  1. No transmission
  2. Symmetric faulty transmission
  3. Byzantine transmission

Figure legend: onboard flight computer; sensor or actuator; remote interface unit (RIU); TTE network switch (COM/MON); marker designating a faulty device.

#### The Fix:

- Increase resilience of the end system:
  1. TMR in the TTE Chip-IP MAC layer.
  2. Use a COM/MON HI end system.
     - Not available in TTTech Space ASIC.



Figure annotation: subsystem cannot function.



### Approach 4:

- Multiple RIUs
- Each reads redundant sensors

#### **Problems?**

- None. Any arbitrary failure of an RIU is tolerated by the Triplex computers:
  - Choice() function is application specific.
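
Since Choice() is application specific, the following is only one possible form: a mid-value select over the values reported by the RIUs, skipping any RIU that sent nothing in the current frame. The signature and the use of `None` for a missing message are assumptions for this sketch.

```python
def choice(riu_values):
    """Pick one value from those reported by the RIUs.

    riu_values holds one entry per RIU, with None where no message arrived.
    Returns the middle value of those present (the lower-middle one for an
    even count), or None if no RIU reported.
    """
    present = sorted(v for v in riu_values if v is not None)
    if not present:
        return None
    return present[(len(present) - 1) // 2]

# One RIU reports an arbitrary (wrong) value; the other two are correct.
with_bad_riu = choice([20.1, 20.3, 999.0])
# One RIU is silent; the remaining two are correct under the single-fault assumption.
with_silent_riu = choice([20.1, None, 20.3])
```

With three RIUs and at most one faulty, the selected value always lies within the range spanned by the correct RIUs.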

#### Caveats:

- Each RIU performs only minimal local processing (e.g. message packing).
- No consensus is required between RIUs before transmitting data.
  - Since the OBCs make the decisions, it is the OBCs that require consistency.






### Approach 5:

- Multiple RIUs
- Each reads redundant sensors
- RIUs require consensus

#### Description:

- If consensus between RIUs is necessary without interacting with the OBCs, then IC can be performed between the RIUs.
  - Uses redundant network channels to provide the necessary FCRs.
  - Process is similar to the classical channelized bus voting approach.

#### Caveats:

- Can make architecture <u>much more complex</u>.
- 1FT bus commanding may require 3 RIUs.








## **Notional Onboard Traffic Flow**








# **Questions?**