7.3 A HIGH-SPEED DIGITAL SIGNAL PROCESSOR FOR ATMOSPHERIC RADAR

J. W. Brosnahan and D. M. Woodard

Tycho Technology, Inc.
P.O. Box 1716
Boulder, CO 80306

GENERAL OVERVIEW

The Tycho Technology Model SP-320 is a high-speed pipelined digital signal processing system designed around the capabilities of the Texas Instruments TMS32010 16/32-Bit Signal Processor. This device is a monolithic realization of a complex general purpose signal processor, incorporating such features as a 32-bit ALU, a 16-bit x 16-bit combinatorial multiplier, and a 16-bit barrel shifter (TEXAS INSTRUMENTS, 1983a). The SP-320 is designed to operate as a slave processor to a host general purpose computer in applications such as coherent integration of a radar return signal in multiple ranges, or dedicated FFT processing. To the main PC board may be added piggyback modules for A/D conversion and I/O interfacing (see Figure 1). Presently available is an I/O module conforming to the Intel Multichannel interface standard (INTEL CORPORATION, 1983); other I/O modules will be designed to meet specific user requirements.

The main processor board (exclusive of A/D and I/O modules) includes input and output FIFO (First In First Out) memories, both with depths of 4096 words, to permit asynchronous operation between the source of data and the host computer. This design permits burst data rates in excess of 5 MHz.

(a) Areas of Application

The SP-320 was initially designed as a coherent integrator for atmospheric radar systems. In the course of development, it became apparent that with the addition of a few hardware features, the board could be made useful for a much broader class of mathematical and signal processing problems.

Coherent Integration for Radar. Design criteria for this application included a 1-MHz sample rate, 12 bits of raw data precision, and 256 range gates. The SP-320 has a digital input data path width of 13 bits, and will support burst data rates to 5 MHz. In practice, the sampling rate is limited by the A/D conversion time required for the desired precision. For example, 12 bits in 0.5 μs is about the limit for a cost-no-object system using a single board-level converter per analog channel; a reasonable compromise is 10 bits in 1.0 μs using a hybrid A/D converter. Assuming range accumulators of 32 bits, 64 ranges can be accommodated using only the internal data memory of the TMS32010. Using external data memory, a maximum of 2048 ranges can be used. However, external range accumulators require approximately 3.6 μs for the read-add-write sequence compared to 1.4 μs for internal accumulators. Thus, for a large number of ranges, the system may become computation limited, requiring a reduction in pulse repetition frequency. Also, for a large number of ranges used in conjunction with a small number of points per integration, the system may become output limited at the processor-host interface. For a discussion of these and other performance trade-offs, see TYCHO TECHNOLOGY, INC. (1984). For the realistic case of a 1-MHz sample rate, 256 ranges, 6 parallel data channels, and a host interface capable of a DMA transfer rate of 1 Mbyte/s, the SP-320 can support a pulse repetition period of 1 ms with any number of integration points greater than 8.
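
The computation-limited case can be estimated with a short calculation. The Python sketch below is illustrative only. It uses the 1.4 μs and 3.6 μs read-add-write times quoted above, and it assumes (the text does not state this) that the first 64 ranges use internal accumulators while any remaining ranges use external data memory. All function and constant names are hypothetical.

```python
# Timing figures taken from the text; the mixed internal/external
# allocation is an assumption made for this sketch.
INTERNAL_US = 1.4       # read-add-write time, internal accumulator
EXTERNAL_US = 3.6       # read-add-write time, external accumulator
INTERNAL_RANGES = 64    # 32-bit accumulators held in internal data memory
MAX_RANGES = 2048       # limit when external data memory is used


def accumulate_time_us(n_ranges):
    """Estimate the time (in microseconds) to update every range once."""
    if not 0 < n_ranges <= MAX_RANGES:
        raise ValueError("n_ranges must be between 1 and 2048")
    internal = min(n_ranges, INTERNAL_RANGES)
    external = n_ranges - internal
    return internal * INTERNAL_US + external * EXTERNAL_US


if __name__ == "__main__":
    for n in (64, 256, 2048):
        print(n, "ranges:", round(accumulate_time_us(n), 1), "us per pulse")
```

Under these assumptions, 256 ranges need roughly 781 μs per pulse, which fits within a 1 ms pulse repetition period, while 2048 ranges need roughly 7.2 ms and would force a lower pulse repetition frequency.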

Figure 1. SP-320 overall block diagram.

A feature of the TMS32010 that is of particular interest in the context of coherent integration is the MPYK (multiply by constant) instruction. This allows the raw data word to be multiplied by a 16-bit constant previously stored in the T register with no execution time penalty compared to simply entering the raw data into the processor. The user thus may impose a window function on the integration by periodically reloading the T register from a table in program memory.
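
The windowed integration described above can be modeled in Python as follows. This is a behavioral sketch, not TMS32010 code. The function name, the list-of-lists data layout, and the integer window values are assumptions made for illustration.

```python
def coherent_integrate(pulses, window=None):
    """Sum samples over pulses, separately for each range gate.

    pulses[k][r] is the raw sample for pulse k at range gate r.
    window[k] is the weight applied to every sample of pulse k,
    standing in for the value held in the T register during that pulse.
    """
    if window is None:
        window = [1] * len(pulses)          # rectangular window
    if len(window) != len(pulses):
        raise ValueError("one window weight is needed per pulse")
    sums = [0] * len(pulses[0])
    for weight, samples in zip(window, pulses):
        for gate, sample in enumerate(samples):
            sums[gate] += weight * sample   # multiply, then accumulate
    return sums


if __name__ == "__main__":
    pulses = [[1, 2], [3, 4], [5, 6]]       # 3 pulses, 2 range gates
    print(coherent_integrate(pulses))
    print(coherent_integrate(pulses, window=[1, 2, 1]))
```

Changing the weight once per pulse, rather than once per sample, mirrors the periodic reloading of the T register.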

For those users who wish to implement a pre-integration pulse-pair processing algorithm, the SP-320 provides an Inter-Channel Link. This allows the processors of two associated quadrature channels to pass data words back and forth as required for complex arithmetic.
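
The need for data exchange between quadrature channels can be seen from the arithmetic of a pulse-pair estimate. The text does not specify a particular algorithm, so the Python sketch below assumes the common lag-one autocorrelation form, in which each sample is multiplied by the complex conjugate of the preceding one. The function name and argument layout are hypothetical.

```python
def pulse_pair_lag1(i_samples, q_samples):
    """Lag-one autocorrelation sum for one range gate.

    For z[k] = I[k] + jQ[k], accumulates z[k+1] * conj(z[k]).
    Both the real and the imaginary part combine in-phase and
    quadrature data, which is why the two channels must exchange words.
    """
    real_sum = 0
    imag_sum = 0
    for k in range(len(i_samples) - 1):
        i0, q0 = i_samples[k], q_samples[k]
        i1, q1 = i_samples[k + 1], q_samples[k + 1]
        real_sum += i1 * i0 + q1 * q0
        imag_sum += q1 * i0 - i1 * q0
    return real_sum, imag_sum


if __name__ == "__main__":
    # A phasor advancing by 90 degrees per pulse: 1, j, -1
    print(pulse_pair_lag1([1, 0, -1], [0, 1, 0]))
```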

Digital Filters. The TMS32010 was designed with a strong emphasis on the efficient execution of digital filter algorithms. In particular, hardware macro instructions such as LTD allow very fast manipulation of running lists of data points. See TEXAS INSTRUMENTS (1983b), section 8, and RABINER and GOLD (1975) for more details on digital filter design. The SP-320 can be used as a fast real-time digital filter by employing the TMS32010, using only internal data memory, and the input and output FIFOs.
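
The "running list of data points" can be illustrated with a direct-form FIR filter. The Python sketch below models the behavior only: each new sample is shifted into a delay line and the multiply-accumulate is taken over all taps. It is not a translation of any TMS32010 routine, and the function name is hypothetical.

```python
def fir_filter(samples, coefficients):
    """Direct-form FIR filter using an explicit delay line."""
    delay_line = [0] * len(coefficients)
    output = []
    for sample in samples:
        # Shift the running list by one place and insert the newest point.
        delay_line = [sample] + delay_line[:-1]
        # Multiply-accumulate across all taps.
        output.append(sum(c * d for c, d in zip(coefficients, delay_line)))
    return output


if __name__ == "__main__":
    # A unit impulse reproduces the coefficients at the output.
    print(fir_filter([1, 0, 0, 0], [1, 2, 3]))
```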

FFT Processing. The SP-320 supports both real-time and batch FFT processing. Maximum conversion efficiency is achieved by using straight-line code and a maximum of 64 complex points of 16-bit integer precision. See Figure 5 in MAGAR et al. (1982) for a summary of conversion times for various sizes of transforms and for both the straight-line and looped code cases. For example, a 64-point complex transform with straight-line code can be completed in 738 μs. A 1024-point transform with looped code requires 76 ms. Within the 64-point limit, the TMS32010 does not need to access external data memory, and with straight-line code can achieve conversion times that compare favorably with some dedicated FFT processors. Even with the much longer conversion times required by larger transforms and looped code (required because of the 4k W program memory size limitation), the SP-320 may in some circumstances be a useful compromise between a hardware FFT processor and a software FFT running on a general purpose microprocessor.
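
The 64-point limit is consistent with the 144 words of internal data memory noted in the main board description. The Python sketch below checks this, assuming two 16-bit words per complex point (real and imaginary parts) and ignoring any scratch storage. The names are hypothetical.

```python
INTERNAL_DATA_WORDS = 144   # internal data memory of the TMS32010


def fits_internal(n_points, words_per_point=2):
    """True if n_points complex values fit in internal data memory."""
    return n_points * words_per_point <= INTERNAL_DATA_WORDS


def largest_pow2_transform():
    """Largest power-of-two transform size that fits internally."""
    n = 1
    while fits_internal(2 * n):
        n *= 2
    return n


if __name__ == "__main__":
    print(largest_pow2_transform())
```

A 64-point transform occupies 128 words, whereas a 128-point transform would need 256 words and therefore external data memory.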

HARDWARE DESCRIPTION

The SP-320 consists of a main PC board with external connections grouped on two headers. These headers mate with connectors on the two piggyback modules, the 320-AD (A/D Conversion Module), and the 320-MC (Multichannel Bus Interface). For those users who wish to design custom interfacing for the SP-320, the two headers allow direct access to the input and output FIFOs and to the program and data memories.

(a) Main Board

The TMS32010 is a "Harvard Architecture" device, i.e. the program and data memories are separate. The SP-320 implements the full 4k W addressable program memory space. The TMS32010 has 144 W of internal data memory and only 3 b of external data address. To make the SP-320 a general purpose processor, the 8 externally addressable data objects are regarded as I/O ports, and 4k W (or 2k DPW) of data memory are furnished, with auto-incrementing (or decrementing) address generation under control of the ports (see section entitled "Port Assignments").

Raw Data Input. Normal data input for real-time applications is by way of the input FIFO through header J1 using a Data Valid/Read control sequence. This input path is 13 b wide; each data word enters the TMS32010 in one instruction cycle (200 ns) by use of the MPYK instruction. A hardware switching scheme injects the current output word of the input FIFO into the constant field of the instruction. This feature may be enabled/disabled under software control to allow normal use of the MPYK instruction. Jumper settings allow the raw data word to be interpreted as either natural binary or two's complement binary. In the latter case, the data word is automatically sign-extended in the TMS32010 to 16 bits.

Data Memory. The SP-320 data memory is organized as 4096 16-bit words. However, odd and even addresses are accessed by separate I/O ports. This arrangement was determined to be optimum for implementing double precision accumulators for coherent integration. Address generation is external to the TMS32010, and address incrementing, under control of 4 bits in an increment control register, can be made to occur automatically after any or all of the following: read low word, read high word, write low word, write high word. For example, in coherent integration for radar using double precision accumulators, one would set the increment control bit for "increment after write high word". Then, after the sequence read low word, add, write low word, read high word, add, write high word, the data memory address would automatically be incremented for the next range gate. Address, data, and control lines for block transfers to and from data memory are available at header J2.
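
The auto-increment behavior can be illustrated with a small behavioral model in Python. This is a sketch under stated assumptions, not a description of the actual hardware: the class and method names are hypothetical, the counter is assumed to wrap at the end of memory, and the carry from the low word to the high word is handled explicitly in software.

```python
class DataMemory:
    """Model of the external data memory and its address counter."""

    def __init__(self, n_pairs=2048):
        # 4096 16-bit words, modeled as 2048 (LO, HI) pairs.
        self.lo = [0] * n_pairs
        self.hi = [0] * n_pairs
        self.addr = 0
        self.mask = frozenset()
        self.step = 1

    def load_counter(self, addr, mask=(), decrement=False):
        """mask holds any of 'RL', 'RH', 'WL', 'WH' (see Figure 2)."""
        self.addr = addr % len(self.lo)
        self.mask = frozenset(mask)
        self.step = -1 if decrement else 1

    def _advance(self, access):
        if access in self.mask:
            self.addr = (self.addr + self.step) % len(self.lo)

    def read_lo(self):
        value = self.lo[self.addr]
        self._advance("RL")
        return value

    def read_hi(self):
        value = self.hi[self.addr]
        self._advance("RH")
        return value

    def write_lo(self, value):
        self.lo[self.addr] = value & 0xFFFF
        self._advance("WL")

    def write_hi(self, value):
        self.hi[self.addr] = value & 0xFFFF
        self._advance("WH")


def accumulate(mem, sample):
    """Add a signed sample to the 32-bit accumulator at the current address."""
    s = sample & 0xFFFFFFFF                  # two's complement, 32 bits
    lo_sum = mem.read_lo() + (s & 0xFFFF)
    mem.write_lo(lo_sum)
    hi_sum = mem.read_hi() + (s >> 16) + (lo_sum >> 16)   # include carry
    mem.write_hi(hi_sum)                     # counter advances here if 'WH' set


if __name__ == "__main__":
    mem = DataMemory()
    for pulse in ([5, -7, 3], [0xFFFF, 1, -3]):
        mem.load_counter(0, mask={"WH"})     # restart at the first range gate
        for sample in pulse:
            accumulate(mem, sample)
    print([hex((mem.hi[r] << 16) | mem.lo[r]) for r in range(3)])
```

With only the WH bit set, each call to `accumulate` leaves the counter pointing at the next range gate, so the inner loop needs no explicit address arithmetic.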

Output FIFO. For pipelined processing, the SP-320 provides a 16 b wide by 4096 W deep output FIFO. The width (16 b as opposed to e.g. 32 b) was determined by the standard interface definitions (Multichannel, IEEE-488, etc.) in general use for scientific applications. For an application such as integration with double precision accumulators, one would program the system to write the results of the last sequence of additions to the output FIFO in low, high order instead of returning the results to data memory. The host would then be expected to transfer out the range sums at an average rate sufficient to keep up with the processor. For batch operations, the output FIFO may still be used as the output path, as an alternative to a DMA transfer out of data memory. FIFO output is available at header J2.
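
On the host side, the 16-bit words read from the output FIFO must be paired back into range sums. The Python sketch below assumes the low, high word order described above and assumes that the sums are 32-bit two's complement values. The function name is hypothetical.

```python
def words_to_sums(words):
    """Rebuild signed 32-bit sums from 16-bit words in low, high order."""
    if len(words) % 2:
        raise ValueError("expected an even number of 16-bit words")
    sums = []
    for low, high in zip(words[0::2], words[1::2]):
        value = ((high & 0xFFFF) << 16) | (low & 0xFFFF)
        if value & 0x80000000:          # sign bit set: negative sum
            value -= 1 << 32
        sums.append(value)
    return sums


if __name__ == "__main__":
    print(words_to_sums([0xFFFF, 0xFFFF, 0x0004, 0x0001]))
```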

Program Memory. The TMS32010 executes instructions from a 16 b by 4096 W RAM which must be loaded by the host computer with TMS32010 object code. Assertion of the Reset line places the TMS32010 in an inactive state and makes the program memory address and data lines available at header J2 for block loading by the host. Upon deassertion of Reset, the TMS32010 begins execution at address 0000.

D = 0: Increment address after current access.
D = 1: Decrement address after current access.

RL (Read LO), RH (Read HI), WL (Write LO), WH (Write HI): Mask bits that determine which I/O instructions increment or decrement the counter after the current access. May be used in any combination.

Figure 2. Format for loading data address counter.

(b) A/D Converter Module

The 320-AD piggyback module (see Figure 3) mates with J1 on the SP-320. Its main components are a Teledyne Philbrick 4860 track-hold amplifier and a Burr-Brown ADC803 analog-to-digital converter. Input voltage ranges of ±10 V, ±5 V, and 0 to -10 V are jumper selectable. The converter's clock rate is adjustable, and the end-of-conversion point is jumper selectable; these features allow the module to be set up for maximum conversion speed at any precision from 8 bits to 12 bits. The output format can be jumper selected to offset binary or two's complement binary. An output latch holds the result of the last conversion while the current conversion is in process, allowing continuous pipelined operation at the maximum speed of the converter. The output latch is 16 bits wide with full sign extension, for maximum versatility in interfacing the module to devices other than the SP-320. The maximum continuous sampling rate varies from 2 MHz at a precision of 8 bits to 0.66 MHz at 12 bits (BURR-BROWN CORPORATION, 1983).
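
The relationship between the two output formats and the 16-bit sign-extended latch can be shown with a short Python sketch. It deals only with the codes themselves and makes no assumption about which input voltage produces which code. The function names are hypothetical.

```python
def offset_binary_to_twos(code, bits=12):
    """Convert an offset binary code to two's complement (same width).

    The two formats differ only in the most significant bit.
    """
    return (code ^ (1 << (bits - 1))) & ((1 << bits) - 1)


def sign_extend(code, bits=12, width=16):
    """Sign-extend a two's complement code of `bits` bits to `width` bits."""
    sign = 1 << (bits - 1)
    code &= (1 << bits) - 1
    signed_value = (code ^ sign) - sign     # Python integer, may be negative
    return signed_value & ((1 << width) - 1)


if __name__ == "__main__":
    for raw in (0x000, 0x800, 0xFFF):
        twos = offset_binary_to_twos(raw)
        print(hex(raw), hex(twos), hex(sign_extend(twos)))
```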

(c) Multichannel™ Bus Interface Module

The 320-MC piggyback module mates with J2 on the SP-320. This module serves as a high-speed parallel interface between the SP-320 and a Multichannel™ bus at the Basic Talker/Listener level of compliance (see INTEL CORPORATION, 1983, section IV). The module includes counters for address generation and word count to support high-speed DMA transfers. Up to 15 modules may be connected to the same bus under the control of a single DMA device having Multichannel™ supervisor capabilities. All of the functions available at J2 are supported by the 320-MC as register and memory assignments. The Multichannel™ Device Number for the module is jumper selectable.

Figure 3. Block diagram of 320-AD module.

PROGRAMMING CONSIDERATIONS

The following hardware details have a bearing on the programming of the
SP-320; they should be considered in conjunction with the TMS32010 instruction
set (TEXAS INSTRUMENTS, 1983a).

(a) Port Assignments

The 8 I/O ports of the TMS32010 are decoded separately for read and write,
and perform the following functions:

<table>
<thead>
<tr>
<th>Port</th>
<th>Read Function</th>
<th>Write Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>not used</td>
<td>load data address counter*</td>
</tr>
<tr>
<td>1</td>
<td>read LO data word</td>
<td>write LO data word</td>
</tr>
<tr>
<td>2</td>
<td>read HI data word</td>
<td>write HI data word</td>
</tr>
<tr>
<td>3</td>
<td>read inter-channel link</td>
<td>write inter-channel link</td>
</tr>
<tr>
<td>4</td>
<td>read system status byte</td>
<td>write flag byte</td>
</tr>
<tr>
<td>5</td>
<td>not used</td>
<td>write to output FIFO</td>
</tr>
<tr>
<td>6</td>
<td>not used</td>
<td>not used</td>
</tr>
<tr>
<td>7</td>
<td>not used</td>
<td>not used</td>
</tr>
</tbody>
</table>

*See Figure 2 for data address counter loading format.

(b) Timing and Interrupts

The SP-320 requires that an external 40-MHz TTL level clock signal be supplied to J3 (an SMA female connector on the main PC board). Use of an external clock allows the synchronization of multiple data channels executing the same program.

Two provisions have been made to enable the system to avoid over-reading the input FIFO. The empty line of the FIFO controller is routed to the INT pin, and the Data Ready line to the BIO pin of the TMS32010 (see TEXAS INSTRUMENTS, 1983). Thus the user has the option of either checking BIO status before each data input, or using an interrupt handler to respond to the empty condition.
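
The polling option can be illustrated with a simple behavioral model in Python. This is only a sketch: the class and function names are hypothetical, and the `data_ready` method stands in for testing the BIO pin before each input.

```python
from collections import deque


class InputFifo:
    """Minimal model of the input FIFO with a data-ready status."""

    def __init__(self, depth=4096):
        self.depth = depth
        self._words = deque()

    def push(self, word):
        if len(self._words) >= self.depth:
            raise OverflowError("input FIFO is full")
        self._words.append(word)

    def data_ready(self):
        return bool(self._words)

    def read(self):
        if not self._words:
            raise RuntimeError("over-read of empty input FIFO")
        return self._words.popleft()


def read_available(fifo, max_words):
    """Read up to max_words, checking data-ready before each read."""
    words = []
    while len(words) < max_words and fifo.data_ready():
        words.append(fifo.read())
    return words


if __name__ == "__main__":
    fifo = InputFifo()
    for word in (10, 20, 30):
        fifo.push(word)
    print(read_available(fifo, 8))
```

Checking the status before every read costs time on each sample; the interrupt option avoids that cost but handles the empty condition only after it has occurred.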

The flag byte (I/O port 4) has 4 user-definable bits that control on-board
LEDs for diagnostic purposes. These bits are also readable at J2. The user
could, for example, define one of these bits as a "task completed" indicator
for batch processing and use it to initiate a host interrupt.

REFERENCES

Burr-Brown Corporation (1983), ADC803 Data Sheet, Tucson, AZ.
Intel Corporation (1983), Multibus Handbook, Santa Clara, CA.
Magar, S. S., R. Hester and R. Simpson (1982), Signal-processing μC builds FFT-based spectrum analyzer, Electronic Design, 19.
Rabiner, L. R. and B. Gold (1975), Theory and Application of Digital Signal Processing, Prentice-Hall, Englewood Cliffs, NJ.
Texas Instruments (1983b), Signal Processing Products and Technology, Dallas, TX.