N94-71100

# A High Speed CCSDS Encoder for Space Applications

S. Whitaker and K. Liu, NASA Space Engineering Research Center for VLSI System Design, University of Idaho, Moscow, Idaho 83843

Abstract – This paper reports a VLSI implementation of the CCSDS standard Reed Solomon encoder circuit for the Space Station. The  $1.0\mu m$  double-metal CMOS chip is 5.9mm by 3.6mm, contains 48,000 transistors, operates at a sustained data rate of 320 Mbits/s and executes 2,560 Mops. The chip features a pin-selectable interleave depth from 1 to 8. Full-length blocks of 255 bytes as well as shortened codes are supported. The control circuitry uses register cells that are immune to Single Event Upset. In addition, the CMOS process used is reported to tolerate over 1 Mrad of total dose radiation.

# 1 General Description

This chip implements an encoder for the CCSDS standard (255,223) Reed Solomon (RS) code [1]. An RS code is a cyclic, symbol-oriented error-correcting code used to correct errors introduced into data during transmission over a communication channel. The CCSDS standard code corrects up to 16 symbol errors per code block. The code block consists of 223 information symbols and 32 parity symbols, and each symbol is an 8-bit word. Because of the flexibility of the algorithm implemented, the circuit supports the encoding of shortened as well as full-length RS codes. Specifically, it supports codes of the form (255 - i, 223 - i), where *i* can be any integer from 0 to 222.
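In the usual systematic formulation (a standard construction, not spelled out in the text), a message polynomial m(x) is encoded with the generator polynomial g(x) of equation (2) as shown below. A shortened codeword is then simply a full-length codeword whose i leading information symbols are zero and are not transmitted; addition and subtraction coincide in $GF(2^8)$.

$$c(x) = m(x)\,x^{32} + \left[\, m(x)\,x^{32} \bmod g(x) \,\right], \qquad \deg m(x) < 223 - i \;\Longrightarrow\; \deg c(x) < 255 - i$$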

The code is defined over the finite field  $GF(2^8)$ . The field-defining primitive polynomial is:

$$p(x) = x^8 + x^7 + x^2 + x + 1 \tag{1}$$

The generator polynomial is given by:

$$g(x) = \prod_{i=112}^{143} (x - \beta^i) \tag{2}$$

where  $\alpha$  is a root of  $p(x)$  (a primitive element of the field) and  $\beta = \alpha^{11}$ .
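A minimal Python sketch of equations (1) and (2), for readers who want to check the field and generator polynomial numerically. It assumes the integer encoding 0x187 for p(x) and that α is represented by the polynomial x (the integer 0x02); the names `build_tables`, `gf_mul`, `generator_poly`, `EXP`, `LOG` and `G` are illustrative only. The generator polynomial is built by multiplying out its 32 linear factors.

```python
# Sketch: GF(2^8) defined by p(x) = x^8 + x^7 + x^2 + x + 1, encoded as 0x187.
# Assumption: alpha is represented by x (the integer 0x02).
PRIM = 0x187


def build_tables(prim=PRIM):
    """Return (exp, log) tables; exp is doubled so exp[a + b] needs no reduction."""
    exp = [0] * 510
    log = [0] * 256
    x = 1
    for i in range(255):
        exp[i] = x
        log[x] = i
        x <<= 1              # multiply by alpha
        if x & 0x100:        # reduce modulo p(x)
            x ^= prim
    for i in range(255, 510):
        exp[i] = exp[i - 255]
    return exp, log


EXP, LOG = build_tables()


def gf_mul(a, b):
    """Multiply two field elements using log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def generator_poly(first=112, last=143, beta_power=11):
    """Coefficients of g(x) = prod (x - beta^i), highest degree first, beta = alpha^11."""
    g = [1]
    for i in range(first, last + 1):
        root = EXP[(beta_power * i) % 255]
        # Multiply g(x) by (x + root); subtraction equals addition in GF(2^8).
        g = g + [0]
        for j in range(len(g) - 1, 0, -1):
            g[j] ^= gf_mul(g[j - 1], root)
    return g


G = generator_poly()
```

`G` has 33 coefficients with a leading 1. Because the exponents 112 to 143 pair up as i and 255 - i, the set of roots is closed under inversion, so the coefficient list of g(x) reads the same in both directions, a cheap sanity check on the construction.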

The encoder represents data in the dual basis such that

$$[z_0, z_1, \ldots, z_7] = [u_7, u_6, \ldots, u_0]\, T \tag{3}$$


where  $[z_0, z_1, \ldots, z_7]$  is the symbol represented in the dual basis,  $[u_7, u_6, \ldots, u_0]$  is the symbol represented in the conventional (polynomial) basis and T is the following transform matrix:

$$T = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 & 1 & 0 & 1 & 1 \end{bmatrix} \tag{4}$$

Data in the conventional basis can be recovered from its dual-basis representation using the following inverse transform.

$$[u_7, u_6, \ldots, u_0] = [z_0, z_1, \ldots, z_7]\, T^{-1} \tag{5}$$

where

$$T^{-1} = \begin{bmatrix} 1 & 1 & 0 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \end{bmatrix} \tag{6}$$
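A small sketch of equations (3) and (5), treating each byte as a row vector over GF(2). It assumes the bit ordering [u7, u6, ..., u0] runs from bit 7 down to bit 0 of an integer; `to_dual`, `from_dual` and `vec_times_matrix` are hypothetical helper names. The rows of `T` listed here satisfy T·T^{-1} = I with `T_INV` from equation (6), which can be checked by passing each row of `T` through `vec_times_matrix` with `T_INV`, a quick guard against transcription errors.

```python
# Rows of the dual-basis transform T and its inverse; each row is 8 bits.
T = [
    [1, 0, 0, 0, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 0, 1, 1],
]
T_INV = [
    [1, 1, 0, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 0],
]


def vec_times_matrix(v, m):
    """Row vector v (8 bits) times 8x8 matrix m over GF(2): XOR of selected rows."""
    out = [0] * 8
    for k, bit in enumerate(v):
        if bit:
            out = [o ^ r for o, r in zip(out, m[k])]
    return out


def to_dual(u):
    """Byte u (bit 7 = u7) -> [z0, ..., z7], equation (3)."""
    u_vec = [(u >> (7 - k)) & 1 for k in range(8)]  # [u7, u6, ..., u0]
    return vec_times_matrix(u_vec, T)


def from_dual(z):
    """[z0, ..., z7] -> byte with bit 7 = u7, equation (5)."""
    u = 0
    for bit in vec_times_matrix(z, T_INV):  # yields [u7, ..., u0]
        u = (u << 1) | bit
    return u
```

For example, `to_dual(0x01)` selects the last row of T, giving `[0, 1, 1, 1, 1, 0, 1, 1]`, and `from_dual(to_dual(u)) == u` holds for every byte.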

The dual basis is simply a different representation of the same field. Multiplication by each coefficient of g(x) is a linear operator on the 8-bit symbol. An operator O in the original representation of the field can be used in the dual representation by applying the following transform.

$$O_{dual} = TO_{original}T^{-1} \tag{7}$$

Additional details of the mathematics can be found in [2].

The encoder has data input and output ports. Data is input byte-serially at a constant rate and is output byte-serially with a fixed latency of one clock cycle. After the information bytes have been output, the 32 RS parity bytes of each codeword are appended to the data stream. The data rate of the chip is 40 Mbytes/s when clocked at 40 MHz.
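The throughput figures quoted in the abstract follow from the 40 MHz byte clock. The second line below is one reading of the 2,560 Mops figure that is consistent with Section 3.1, counting each of the 32 multipliers and 32 adders as one operation per clock; the text does not state the breakdown explicitly.

$$\begin{aligned} 40\ \text{Mbytes/s} \times 8\ \text{bits/byte} &= 320\ \text{Mbits/s} \\ (32 + 32)\ \text{operations/cycle} \times 40\ \text{MHz} &= 2560\ \text{Mops} \end{aligned}$$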

The encoder can be programmed to interleave the data at any depth from one to eight. Interleaving two or more encoded messages increases the burst error correction capability, since a burst of consecutive channel errors is spread over several codewords. The interleave depth, I, is selected by the external pins  $S_0$ ,  $S_1$  and  $S_2$ .
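A short sketch of why interleaving helps, assuming the usual symbol interleaving in which consecutive bytes of the stream belong to successive codewords (round-robin). The functions `interleave` and `errors_per_codeword` are hypothetical helpers, not part of the chip interface.

```python
def interleave(codewords):
    """Symbol-interleave equal-length codewords: c0[0], c1[0], ..., c0[1], c1[1], ..."""
    depth = len(codewords)
    n = len(codewords[0])
    return [codewords[t][j] for j in range(n) for t in range(depth)]


def errors_per_codeword(n, depth, burst_start, burst_len):
    """Symbol errors seen by each codeword when a burst hits the interleaved stream."""
    counts = [0] * depth
    for pos in range(burst_start, min(burst_start + burst_len, n * depth)):
        counts[pos % depth] += 1  # stream position pos belongs to codeword pos % depth
    return counts
```

With depth I, a burst of 16I consecutive symbols places exactly 16 errors in each codeword, within the 16-symbol correction limit, whereas a burst one symbol longer pushes some codeword to 17.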


# 2 Chip Operation

### 2.1 Initialization

Before proper circuit operation can begin, the encoder sections must be initialized and the interleave depth chosen. This is accomplished by holding the reset input (RST) high for at least two clock pulses and setting the interleave depth control lines,  $S_0$ ,  $S_1$  and  $S_2$ , to the state that selects the desired depth:

| $S_2$ | $S_1$ | $S_0$ | Interleave depth I |
|-------|-------|-------|--------------------|
| 0     | 0     | 0     | 1                  |
| 0     | 0     | 1     | 2                  |
| 0     | 1     | 0     | 3                  |
| 0     | 1     | 1     | 4                  |
| 1     | 0     | 0     | 5                  |
| 1     | 0     | 1     | 6                  |
| 1     | 1     | 0     | 7                  |
| 1     | 1     | 1     | 8                  |
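The table is a plain binary code offset by one, I = 4·S2 + 2·S1 + S0 + 1. A hypothetical helper for test benches or driver software might decode it like this:

```python
def interleave_depth(s2, s1, s0):
    """Map pin levels S2, S1, S0 (each 0 or 1) to interleave depth I per the table."""
    for level in (s2, s1, s0):
        if level not in (0, 1):
            raise ValueError("pin levels must be 0 or 1")
    return 4 * s2 + 2 * s1 + s0 + 1
```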

At this time, the input control (INC) must also be held low (inactive) to ensure that no spurious messages are processed. Two clock pulses after RST is brought low, circuit operation may commence. The circuit may be re-initialized at any time, but any message being processed by the encoder at that time will be lost. Zeros are clocked into the parity generator whenever INC is low.

### 2.2 Encoder Operation

Once the initialization sequence has been performed, encoding proceeds as follows. INC is brought high coincident with the first message symbol to be encoded. It remains high while successive message symbols are clocked into the encoder on the data input bus (DI). Symbols are clocked into and out of the circuit on the rising edge of the symbol clock (CK).

INC is brought low again when the last message symbol has been clocked into the circuit. It must remain low for at least 32I clock cycles, during which the parity symbols are clocked out of the circuit. This operation also fills the parity generator with zeros. If INC is held low for longer than 32I clock cycles, zeros appear on the data output bus (DO). Bringing INC high after it has been low for 32I or more clock cycles starts the processing of the next message.
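The INC timing rules translate into simple cycle counts. This sketch assumes that, at depth I, a message of k information symbols per codeword is presented as k·I consecutive symbols and that INC is held low for exactly the minimum 32I cycles; `frame_timing` is a hypothetical name and the 40 MHz default comes from Section 1.

```python
def frame_timing(k, depth, clock_hz=40e6):
    """Minimum cycles for one frame of `depth` interleaved codewords.

    k is the number of information symbols per codeword (223 for a full-length
    code, 223 - i for a shortened one). Returns (INC-high cycles, INC-low
    cycles, total cycles, frame duration in seconds)."""
    if not 1 <= k <= 223:
        raise ValueError("k must be between 1 and 223")
    if not 1 <= depth <= 8:
        raise ValueError("interleave depth must be between 1 and 8")
    inc_high = k * depth   # message symbols clocked in on DI
    inc_low = 32 * depth   # parity symbols clocked out on DO
    cycles = inc_high + inc_low
    return inc_high, inc_low, cycles, cycles / clock_hz
```

For a full-length code at depth 1 this gives 223 + 32 = 255 cycles, about 6.375 µs at 40 MHz; at depth 8 the frame grows to 2040 cycles.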

### 2.3 Bypass Operation

After a reset operation, or after a block has been encoded and its parity read from the chip, a data bypass operation can occur. Data can flow through the encoder without being encoded by bringing the bypass input control (BIC) high coincident with the first byte of data to be passed unprocessed by the chip. After the one-clock-cycle latency, the data entering on DI appears on DO and continues to pass through the chip as long as BIC remains high. While BIC is high, INC should be held low to keep the registers in the parity generator held in reset.

### 2.4 Space Enhancement Features

The CMOS fabrication process used is reported to be tolerant of total dose radiation levels exceeding 1 Mrad [3]. In addition, the chip is designed to protect against Single Event Upset (SEU) in two ways. First, the control memory cells are designed at the circuit level to tolerate SEUs. Second, the control structure and data path are configured to reset completely after each message, ensuring that an SEU in the data registers affects at most one encoded message.

A 16-bit shift register has been included on the chip, with its input driven by the test input pin (TI) and its output driving the test output pin (TO). This test structure allows the SEU-immune memory cell to be tested easily under irradiation to verify its immunity.

# 3 VLSI Implementation

Full-custom VLSI design was used to achieve both high circuit density and high speed. The basic VLSI architecture is similar to that of a previous full-custom design [4]. The features added here are interleaving, high-speed operation (320 Mbits/s), radiation-hardened processing and SEU protection.

### 3.1 General Organization

Figure 1 shows a top-level logic diagram. The chip consists of an encoder section and a test shift register. The encoder contains 32 multipliers and 32 adders that operate in parallel, so that the arithmetic required for parity generation keeps pace with the data input clock rate. The encoder also contains the 2048 one-bit registers (32 slices × 8 stages × 8 bits) required to interleave the data to a maximum depth of 8. The 16-bit shift register is a test structure that will be used to verify the SEU immunity of the registers used in the control circuitry.
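The parallel multiply/add organization with interleaving registers can be modeled behaviorally as a linear feedback shift register in which each of the 32 stages is a FIFO I entries deep, so one set of multipliers and adders serves I interleaved codewords in round-robin order. This is a sketch, not the chip's circuit: it works in the conventional basis rather than the dual basis, uses log/antilog tables instead of hard-wired constant multipliers, assumes p(x) = 0x187 with α = 0x02, assumes the message length is a multiple of the depth, and ignores the one-cycle latency. `encode_stream`, `generator_poly`, `G`, `EXP` and `LOG` are hypothetical names.

```python
from collections import deque

# Behavioral sketch in the conventional basis (assumptions listed in the text).
PRIM = 0x187  # p(x) = x^8 + x^7 + x^2 + x + 1
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = EXP[_i + 255] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM


def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def generator_poly():
    """g(x) = prod_{i=112}^{143} (x - alpha^(11 i)), coefficients lowest degree first."""
    g = [1]
    for i in range(112, 144):
        root = EXP[(11 * i) % 255]
        nxt = [0] * (len(g) + 1)
        for j, c in enumerate(g):
            nxt[j] ^= gf_mul(c, root)  # c * root contributes to x^j
            nxt[j + 1] ^= c            # c * x contributes to x^(j+1)
        g = nxt
    return g


G = generator_poly()  # G[32] == 1


def encode_stream(message, depth):
    """Clock-by-clock model of DO: INC high for `message`, then low for 32*depth cycles."""
    stages = [deque([0] * depth) for _ in range(32)]  # stage j: register r_j per codeword
    schedule = [(d, True) for d in message] + [(0, False)] * (32 * depth)
    out = []
    for d, inc in schedule:
        old = [s[0] for s in stages]  # registers of the current codeword slot
        if inc:
            fb = d ^ old[31]          # feedback: input plus top register
            new = [gf_mul(fb, G[0])] + [old[j - 1] ^ gf_mul(fb, G[j]) for j in range(1, 32)]
            out.append(d)             # message byte passes straight to DO
        else:
            new = [0] + old[:31]      # zeros shift in, parity shifts out
            out.append(old[31])
        for s, v in zip(stages, new):
            s.popleft()
            s.append(v)
    return out
```

At depth 1 the output is the message followed by 32 parity bytes; at depth I, `out[t::I]` recovers codeword t. Shortened messages need no special handling, because leading zero symbols leave every register at zero.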

Data is input on the DI0-7 pins and output on the DO0-7 pins. Input data is framed by the INC control signal. Output data is framed by OUTC, a delayed copy of INC. When data is input to the chip, it is presented to the parity generator and also passed to the output port DO0-7. At the end of the data block, INC transitions low and the output of the parity generator passes to DO0-7. With INC low, zeros are input to the parity generator, clearing out its registers. With BIC high and INC low, data flows from DI0-7 to DO0-7 without entering the parity generator, providing a bypass mode.
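The data-path selection described in this paragraph can be summarized as a per-clock routing function. This is a sketch: `route` is a hypothetical name, the one-cycle latency and OUTC are not modeled, and because the text does not describe INC and BIC both high, the sketch rejects that case.

```python
def route(di, parity_out, inc, bic):
    """Return (DO byte, parity generator input) for one clock; inc and bic are 0 or 1."""
    if inc and bic:
        raise ValueError("INC and BIC both high: behavior not specified in the text")
    if inc:
        return di, di          # message byte goes to DO and into the parity generator
    if bic:
        return di, 0           # bypass: DI passes to DO, parity generator sees zeros
    return parity_out, 0       # parity bytes (then zeros) appear on DO
```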

### 3.2 Parity Generator








Figure 2: Parity generator block diagram.



Figure 3: Layout of constant multiplier.

The parity generator was organized as a set of 32 slices, each consisting of a multiply/add structure and a register stack for interleaving. Figure 2 shows a block diagram of the parity generator logic. Each register is a shift register whose depth, from 1 to 8 stages, is set by the interleave depth chosen during circuit initialization. The multiplier is built from precharged exclusive-OR (XOR) chains; eight of these chains form a constant multiplier. The input data word is multiplied by a constant,  $g_x$ , programmed into the multiplier as a pattern of XOR cells and interconnect (ZERO) cells. The XOR cell consists of 4 NMOS transistors. The ZERO cell is a modified XOR cell that acts as an interconnect block. The layout of a constant multiplier is shown in Figure 3. The multiplication constant can be programmed with a single mask layer defining the pattern of XOR and ZERO cells in the XOR chains. For maximum speed, each XOR chain is precharged from both ends. The addition function is folded into the evaluate structure of the multiplier. The XOR cell was laid out to consume minimum area, and the registers were designed to match the pitch of two XOR cells. Half the registers were placed above the multiplier and half below.
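Multiplication by a fixed constant is linear over GF(2), so it is an 8×8 binary matrix: each output bit (column) is the XOR of the input bits whose matrix entry is 1. In the layout terms above, a 1 corresponds to an XOR cell in that column's chain and a 0 to a ZERO cell. The sketch below builds that matrix in the conventional basis for p(x) = 0x187 (the chip itself works in the dual basis, so its actual cell pattern differs); `multiplier_matrix`, `apply_matrix` and `xor_cell_count` are hypothetical names.

```python
PRIM = 0x187  # p(x) = x^8 + x^7 + x^2 + x + 1


def xtime(a):
    """Multiply a field element by alpha (x) and reduce modulo p(x)."""
    a <<= 1
    return a ^ PRIM if a & 0x100 else a


def gf_mul(a, b):
    """Shift-and-add field multiplication, used as a reference."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r


def multiplier_matrix(const):
    """rows[k][j] = bit j of alpha^k * const; column j is the XOR chain for output bit j."""
    rows = []
    v = const
    for _ in range(8):
        rows.append([(v >> j) & 1 for j in range(8)])
        v = xtime(v)
    return rows


def apply_matrix(rows, x):
    """Evaluate the XOR chains: output bit j is the XOR of input bits k with rows[k][j] == 1."""
    y = 0
    for k in range(8):
        if (x >> k) & 1:
            for j in range(8):
                y ^= rows[k][j] << j
    return y


def xor_cell_count(rows):
    """Number of XOR cells (1 entries); the remaining 64 - count positions are ZERO cells."""
    return sum(map(sum, rows))
```

Changing the constant only changes which positions hold ones, which mirrors the single-mask programming of the XOR/ZERO pattern.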

Figure 4: Layout of two adjacent slice details.

Since the registers are twice as wide as an XOR chain, the outputs of the multiplier columns alternate between top and bottom: the higher-order nibble is output at one end of the constant multiplier and the lower-order nibble at the opposite end. This requires the columns of the multiplier matrix to be rearranged into the order  $[C_0C_4C_1C_5C_2C_6C_3C_7]$ . Also, to avoid long interconnect runs between the top and bottom registers that drive the adder inputs of the multiply/add structure, a second slice detail was drawn with the top and bottom sections reversed. The two slice details alternate in the parity core, so adjacent slices connect by abutment. Figure 4 shows the layout of two adjacent slices. No interconnect is required between any of the leaf cells; the entire structure is connected by abutment. This maximizes the speed of operation because interconnect capacitance is minimized.
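The column rearrangement is a fixed permutation. This sketch assumes that logical column C_j produces output bit j (so C0 to C3 form the lower nibble) and that even physical positions exit at one end of the multiplier and odd positions at the other; `ORDER`, `reorder_columns`, `restore_columns` and `column_ends` are hypothetical names.

```python
ORDER = [0, 4, 1, 5, 2, 6, 3, 7]  # physical position p carries logical column C_ORDER[p]


def reorder_columns(rows, order=ORDER):
    """Rearrange matrix columns into the physical order [C0 C4 C1 C5 C2 C6 C3 C7]."""
    return [[row[c] for c in order] for row in rows]


def restore_columns(rows, order=ORDER):
    """Undo reorder_columns, returning columns to logical order C0..C7."""
    restored = [[0] * len(order) for _ in rows]
    for i, row in enumerate(rows):
        for p, c in enumerate(order):
            restored[i][c] = row[p]
    return restored


def column_ends(order=ORDER):
    """Logical columns exiting each end, assuming even positions at one end, odd at the other."""
    return ([order[p] for p in range(0, len(order), 2)],
            [order[p] for p in range(1, len(order), 2)])
```

Under these assumptions `column_ends()` returns `([0, 1, 2, 3], [4, 5, 6, 7])`: one nibble leaves at each end, matching the description.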

A natural layout would place all 32 slices in a row. This would maximize the speed of operation and minimize the area of the parity generator, but would result in a die of approximately 10mm by 2mm. This aspect ratio would be hard to accommodate in packaging, and the reliability of the bond wires would be in question, especially under the stresses expected during launch. The array was therefore folded in half, at some cost in speed. The control was duplicated and an interconnect bank was run from the output of Slice 15 to the input of Slice 16. The final chip layout is shown in Figure 5.



Figure 5: Chip layout.

# 4 Summary

A radiation-hardened, SEU-tolerant implementation of the CCSDS standard RS16 encoder has been designed for Goddard Space Flight Center. The chip was drawn in a  $1.0\mu m$  CMOS process and is being fabricated at Hewlett Packard's Circuit Technology Group. The encoder operates at a 320 Mbit/s data rate.

Acknowledgement: This research was supported in part by NASA under grant NAGW-1406. The authors wish to acknowledge the support of Warner Miller at Goddard Space Flight Center. This chip will be commercially available as AHA 4611 from Advanced Hardware Architectures.

# References

- [1] H. F. Reefs and A. R. Best, "Concatenated Coding on a Spacecraft-to-ground Telemetry Channel Performance," Proc. ICC-81, 1981.
- [2] G. C. Clark and J. B. Cain, Error-Correction Coding for Digital Communications. New York, NY: Plenum Press, 1981.
- [3] CMOS34 Radiation Hardness,

### 2nd NASA SERC Symposium on VLSI Design 1990

- [4] G. Maki, P. Owsley, K. Cameron, and J. Shovic, "A VLSI Reed Solomon Encoder: An Engineering Approach," IEEE Custom Integrated Circuits Conference, pp. 177-181, May 1986.
