FAST CARRY ACCUMULATOR DESIGN

By William C. Mastin
Astrionics Laboratory

February 9, 1970

NASA
George C. Marshall Space Flight Center
Marshall Space Flight Center, Alabama
Fast Carry Accumulator Design

William C. Mastin

George G. Marshall Space Flight Center
Marshall Space Flight Center, Alabama 35812

Prepared by Astrionics Laboratory, Science and Engineering Directorate

Methods for increasing the speed of binary addition by decreasing carry propagation time are reviewed. Particular attention is given to those methods that would be applicable to an accumulator using ones complements with end-around carry for subtraction. An iterative accumulator using pulse logic for input and carry signals is described. Realizations of gated carry, carry-completion detection, and carry-skip circuits that would be compatible with this accumulator are presented. NAND gates are used in the design of the required combinational networks.

Digital Accumulation
Gated Carry
Carry-Completion Detection
Carry Skip

Unclassified
# TABLE OF CONTENTS

<table>
<thead>
<tr>
<th>Section</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>SUMMARY</td>
<td>1</td>
</tr>
<tr>
<td>INTRODUCTION</td>
<td>1</td>
</tr>
<tr>
<td>REVIEW OF METHODS FOR DECREASING CARRY PROPAGATION TIME</td>
<td>3</td>
</tr>
<tr>
<td>Gated Carry</td>
<td>5</td>
</tr>
<tr>
<td>Carry-Completion Detection</td>
<td>8</td>
</tr>
<tr>
<td>Carry-Skip Techniques</td>
<td>12</td>
</tr>
<tr>
<td>DESCRIPTION OF A SIMPLE ITERATIVE ACCUMULATOR</td>
<td>19</td>
</tr>
<tr>
<td>RAPID ACCUMULATOR REALIZATIONS AND ANALYSIS</td>
<td>26</td>
</tr>
<tr>
<td>Gated Carry</td>
<td>26</td>
</tr>
<tr>
<td>Carry-Completion Detection</td>
<td>30</td>
</tr>
<tr>
<td>Carry-Skip Techniques</td>
<td>39</td>
</tr>
<tr>
<td>DISCUSSIONS AND CONCLUSIONS</td>
<td>51</td>
</tr>
<tr>
<td>APPENDIX: DETERMINATION OF OPTIMAL AND ECONOMICAL SKIP DISTRIBUTIONS</td>
<td>53</td>
</tr>
<tr>
<td>REFERENCES</td>
<td>62</td>
</tr>
<tr>
<td>BIBLIOGRAPHY</td>
<td>64</td>
</tr>
</tbody>
</table>
# LIST OF TABLES

<table>
<thead>
<tr>
<th>Table</th>
<th>Title</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.</td>
<td>Truth Table for Carry Determination</td>
<td>5</td>
</tr>
<tr>
<td>2.</td>
<td>Logic Element Delay Times</td>
<td>25</td>
</tr>
<tr>
<td>3.</td>
<td>Relative Cost of Logic Elements</td>
<td>25</td>
</tr>
<tr>
<td>4.</td>
<td>Accumulator Cost and Time Requirements</td>
<td>51</td>
</tr>
<tr>
<td>5.</td>
<td>Comparison of Accumulator Designs</td>
<td>52</td>
</tr>
</tbody>
</table>
# LIST OF ILLUSTRATIONS

<table>
<thead>
<tr>
<th>Figure</th>
<th>Title</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.</td>
<td>Closed-loop system with sampling.</td>
<td>1</td>
</tr>
<tr>
<td>2.</td>
<td>Digital compensator</td>
<td>2</td>
</tr>
<tr>
<td>3.</td>
<td>Simple series accumulator</td>
<td>3</td>
</tr>
<tr>
<td>4.</td>
<td>Typical full binary adder stage</td>
<td>4</td>
</tr>
<tr>
<td>5.</td>
<td>Simple iterated (pseudoparallel) adder</td>
<td>5</td>
</tr>
<tr>
<td>6.</td>
<td>Gated carry</td>
<td>6</td>
</tr>
<tr>
<td>7.</td>
<td>Stored carry</td>
<td>7</td>
</tr>
<tr>
<td>8.</td>
<td>Principle of carry-completion detection</td>
<td>9</td>
</tr>
<tr>
<td>9.</td>
<td>Carry-completion detection, one stage</td>
<td>10</td>
</tr>
<tr>
<td>10.</td>
<td>NOR gate adder stage</td>
<td>11</td>
</tr>
<tr>
<td>11.</td>
<td>NOR gate realization of adder carry and carry-completion detection circuits</td>
<td>13</td>
</tr>
<tr>
<td>12.</td>
<td>Full binary adder stage</td>
<td>14</td>
</tr>
<tr>
<td>13.</td>
<td>Three-bit adder group with full carry prediction</td>
<td>16</td>
</tr>
<tr>
<td>14.</td>
<td>Carry-look-ahead for three groups</td>
<td>17</td>
</tr>
<tr>
<td>15.</td>
<td>Carry-skip circuit</td>
<td>20</td>
</tr>
<tr>
<td>16.</td>
<td>A simple iterative accumulator</td>
<td>21</td>
</tr>
<tr>
<td>17.</td>
<td>Timing diagram for carry generation and assimilation</td>
<td>24</td>
</tr>
<tr>
<td>18.</td>
<td>Accumulator with gated carry</td>
<td>28</td>
</tr>
<tr>
<td>Figure</td>
<td>Title</td>
<td>Page</td>
</tr>
<tr>
<td>--------</td>
<td>----------------------------------------------------------------------</td>
<td>------</td>
</tr>
<tr>
<td>19.</td>
<td>Timing diagram for carry generation and assimilation (gated carry)</td>
<td>31</td>
</tr>
<tr>
<td>20.</td>
<td>Carry-completion detection circuit</td>
<td>33</td>
</tr>
<tr>
<td>21.</td>
<td>Timing diagram for detection enable signal generation</td>
<td>34</td>
</tr>
<tr>
<td>22.</td>
<td>Continuity of carry signal at sensing gates</td>
<td>35</td>
</tr>
<tr>
<td>23.</td>
<td>Timing diagram for carry generation and assimilation in carry-completion detection realization</td>
<td>36</td>
</tr>
<tr>
<td>24.</td>
<td>Timing diagram for carry-completion signal generation</td>
<td>38</td>
</tr>
<tr>
<td>25.</td>
<td>Carry propagation through group with ( t_1 \cdot t_2 \cdot t_3 \cdot t_4 \cdot t_5 ) true</td>
<td>43</td>
</tr>
<tr>
<td>26.</td>
<td>Carry skip group ( k )</td>
<td>45</td>
</tr>
<tr>
<td>27.</td>
<td>Gating level timing diagram</td>
<td>49</td>
</tr>
<tr>
<td>28.</td>
<td>Six-group representation</td>
<td>50</td>
</tr>
</tbody>
</table>
# DEFINITION OF SYMBOLS

<table>
<thead>
<tr>
<th>Symbol</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>+</td>
<td>Logical OR function</td>
</tr>
<tr>
<td>-</td>
<td>Logical AND function</td>
</tr>
<tr>
<td>-</td>
<td>Inversion</td>
</tr>
<tr>
<td>⊕</td>
<td>Logical EXCLUSIVE OR function</td>
</tr>
<tr>
<td>⊗</td>
<td>Logical COINCIDENCE function</td>
</tr>
<tr>
<td>AC</td>
<td>Accumulator register element; T flip-flop</td>
</tr>
<tr>
<td>DM</td>
<td>Delay multivibrator</td>
</tr>
<tr>
<td>⊗</td>
<td>NAND gate</td>
</tr>
<tr>
<td>⊗</td>
<td>NOR gate</td>
</tr>
<tr>
<td>⊗</td>
<td>Delay</td>
</tr>
</tbody>
</table>
TECHNICAL MEMORANDUM X-53983

FAST CARRY ACCUMULATOR DESIGN

SUMMARY

Methods for increasing the speed of binary addition by decreasing carry propagation time are reviewed. Particular attention is given to those methods that would be applicable to an accumulator using ones complements with end-around carry for subtraction. An iterative accumulator using pulse logic for input and carry signals is described. Realizations of gated carry, carry-completion detection, and carry-skip circuits that would be compatible with this accumulator are presented. NAND gates are used in the design of the required combinational networks.

INTRODUCTION

Digital techniques are finding application in closed-loop control systems. Sampling permits the control of large amounts of power by sensitive control elements and decreases the loading on sensing devices. The use of sampled data and digital components in control systems facilitates time sharing of portions of the system, which results in both economy and reliability in design. A digital compensator in a control loop is shown in Figure 1, where \( D(z) \) represents the transfer function of the compensator, \( G_a(s) \) is the transfer function of the system sensing element, and \( G_b(s) \) is the transfer function of the system driving element.

![Figure 1. Closed-loop system with sampling](image)

A model of a digital compensator is shown in Figure 2 [1]. A second-order compensator transfer function may be described by [2]
\[ e_2(KT) = e_1(KT) + a_1e_1(KT-T) + a_2e_1(KT-2T) - b_1e_2(KT-T) - b_2e_2(KT-2T) \]

where \( T \) is the sampling period.

The accuracy with which this equation actually describes the transfer function may be seen to depend upon the speed with which the computer performs the indicated arithmetic operations. If \( \tau \) is the time required to complete the computation, the output of the compensator is a function of \( KT + \tau \). This \( \tau \) corresponds to a lag in a continuous system and, if large enough, could contribute to stability problems.

Adder circuitry is essential not only for addition, but also for multiplication, subtraction, and division. The speed of the computational process is then heavily dependent upon that of the adder. In general, the basic design problem for arithmetic units is that of achieving high speed and accuracy at low cost. Size, weight, and sensitivity to environment are not generally of great importance in a general purpose computer. These factors, along with a greater emphasis on reliability, do become of great concern in the special purpose computers used in digital compensators.

The most economical adder is probably the serial machine represented in Figure 3. Only one full binary adder (FBA) unit is required to sum the addend and augend words. A typical FBA stage is shown in Figure 4. However, only one element of each word is added in one computation time so the time for the addition of the two words (of equal length) is then \( nT \) where \( n \) is the number of elements in a word and \( T \) is the time required for each addition.

The time required for the addition of the two words can be decreased by the use of a pseudoparallel or simple iterated adder as represented in Figure 5. Here the elements of the \( n \)-bit addend and augend are applied to \( n \) FBA's instantaneously, and all partial sums \((x_i \oplus y_i)\) are formed during one
addition time $T$. The addition is not completed, however, until all carries generated have been added to the partial sums in the succeeding stages. This carry propagation is essentially a serial process, and in the worst case a carry generated in the first stage would have to propagate through the entire length of the adder.

It is thus recognized that the problem of increasing the speed of the addition process reduces, to a great extent, to that of decreasing carry propagation time. A considerable amount of work has been done in this area and several solutions have been proposed.

**REVIEW OF METHODS FOR DECREASING CARRY PROPAGATION TIME**

Several methods for increasing the speed of binary addition by decreasing carry propagation time have been presented by Sklansky [3], MacSorely [4], and Lehman and Burla [5]. Those methods and others that might be applicable to accumulators using ones complements with end-around carry for subtraction will be reviewed here.
Figure 4. Typical full binary adder stage.
The truth table for carry determination for two-summand binary addition is given in Table 1.

**TABLE 1. TRUTH TABLE FOR CARRY DETERMINATION**

<table>
<thead>
<tr>
<th>(c_1)</th>
<th>(x_1)</th>
<th>(y_1)</th>
<th>(c_{i+1})</th>
<th>(\text{Carry out determined solely by carry in.})</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td></td>
</tr>
</tbody>
</table>

| 0  | 0  | 0  | 0  | \(\text{Carry out determined solely by the summands, } x_1, y_1.\) |
| 1  | 0  | 0  | 0  | |
| 0  | 1  | 1  | 1  | |
| 1  | 1  | 1  | 1  | |

**Gated Carry**

Perhaps the simplest method for decreasing carry propagation time is that of gated carry. Table 1 [6] shows that the dependency of the carry out of a stage will fall into two categories; one in which the carry out is determined entirely by the carry in and one in which the carry out is determined only by the summands. Unlike summand bits in a stage may be detected and used to provide a path for carries from its preceding stage directly to its succeeding stage. In a stage with like summands, the above path is interrupted, and a one carry or zero carry is provided for its succeeding stage, depending upon whether the summands are both ones or both zeros. One such scheme using a 2-bit time addition is shown in Figure 6 [7]. A variation of this method using storage elements for the carry produced in each stage is called the stored-carry technique and is shown in Figure 7 [7].
$x_i$ - Element of Accumulator Register

$y_i$ - Element of Input Register

Sum, $X+Y$, stored in Accumulator Register

Figure 6. Gated carry.
End-Around Carry

Figure 7. Stored carry.

$x_i$ - Element of Accumulator Register

$y_i$ - Element of Input Register

$c_i$ - Element of Carry Register

Sum, $X+Y$, stored in Accumulator Register
In either of these techniques, a carry entering the first stage would pass through two gates per stage through the \( n-1 \) stage and through two gates in the \( n \) stage to be assimilated. The maximum carry propagation time is then \( 2(nd) \), where \( n \) is the number of stages and \( d \) is the delay time per gate.

**Carry-Completion Detection**

In methods using fixed-time addition, time for the worst case propagation through the entire accumulator plus a safety factor must be allowed. It can be shown that for the addition of numbers with a random distribution of ones and zeros, the average maximum carry length is less than \( \log_2 n \), where \( n \) is the word length [8]. For the propagation of both carry and no carry, the average length is \( \log_2 \frac{5n}{4} \) [9,10]. The addition could then be speeded up if only the time needed for the actual carry propagation could be allowed. Gilchrist, Pomerene, and Wong [6] proposed a scheme to detect the presence of a carry and use its completion to initiate the next addition step. The principle of carry-completion detection is illustrated in Figure 8. At the initiation of a carry operation, a one is inserted into the zero-carry line preceding the first adder stage. A one carry or zero carry is then gated or generated in the succeeding stages as determined by the inputs at those stages. The condition for a one carry is

\[
c_i^1 = x_i y_i + c_{i-1} \cdot (x_i \oplus y_i)
\]  

and the condition for a zero carry is

\[
c_i^0 = x_i y_i + c_{i-1} \cdot (x_i \oplus y_i)
\]

Note that both the one- and zero-carry lines must be monitored so that a false completion signal will not be generated. One stage of an accumulator using carry-completion detection is shown in Figure 9 [7]. A method for evaluating the reduction in time achieved by the use of asynchronous addition techniques has been given by Hendrickson [10], who arrived at a value of 90 percent savings for a 100-bit adder.
A NOR gate binary adder with carry-completion detection has been presented by Majerski and Wiweger [11]. One stage of their realization of an adder is shown in Figure 10. The Boolean formulas describing the circuit of Figure 10 are

\[ D_i = x_i y_i + x_i y_i \]  
\[ D_{i\bar{c}} = x_i y_i + x_i y_i + x_{i-1} y_{i-1} + D_{i-1} c_{i-1} \]  
\[ D_{i\bar{c}} = x_i y_i + x_i y_i + x_{i-1} y_{i-1} + D_{i-1} c_{i-1} \]
Figure 9. Carry-completion detection, one stage.

X - Accumulator Register Element

Y - Input Register Element

Sum, X+Y, stored in Accumulator Register
\[ s_i = D_i c_i + D_i \overline{c}_i \]

\[ = D_i c_i + D_i \overline{c}_i + x_{i-1} y_{i-1} + D_{i-1} c_{i-1} \]  \hspace{1cm} (6)

where

\[ x_i y_i \quad (i = 1, 2, \ldots, n) \] = bits of summands

\[ c_i \quad (i = 1, 2, \ldots, n+1) \] = carry from \( i-1 \) to \( i \) position

\[ s_i \quad (i = 1, 2, \ldots, n) \] = \( i \)th bit of sum
Figure 11 shows the adder carry and carry-completion detection circuits of Majerski and Wiweger [11]. An advantage of the adder carry circuit is that a carry must propagate through only one gate per stage, whereas two gates are required in other known designs. The Boolean formulas for the carry-completion detection circuit are

\[
H_i = x_i y_i + x_i y_i + D_i c_i h + D_i c_i h
\]

(7)

\[
i = 2, 4, 6, \ldots, 2E \ (n/2)
\]

\[
H = H_2 + H_4 + H_6 + \ldots + H_{2E} \ (n/2)
\]

(8)

where \( h \) is the carry control signal.

Operation begins with \( h = 1 \) and the add control signal. After at least \( 2d \), where \( d \) is the maximum propagation time of a NOR gate, \( h \) is switched to 0. The carry propagation process then begins and lasts until all \( D_i c_i \) and \( D_i c_i \) gates stabilize. The \( h = 1 \) signal is applied to prevent a false carry-completion signal from being generated. The carry-completion signal may precede the real completion by up to \( d \) since only every second position is monitored.

**Carry-Skip Techniques**

Assuming the complements of both summands are available, an FBA stage may be represented as shown in Figure 12. The functions representing the sum and carry are then

\[
s_i = x_i \oplus y_i \oplus c_i
\]

(9)

\[
c_{i+1} = x_i y_i + x_i c_i + y_i c_i
\]

(10)
Figure 11. NOR gate realization of adder carry and carry completion detection circuits [11].
Let
\[ c_{i+1} = r_i, \quad (c_i = r_{i-1}) \]  

Equation (10) can be written as
\[ r_i = (x_i \oplus y_i) c_i + x_i y_i \]  

Let
\[ t_i = x_i \oplus y_i = \text{condition for transmission of a carry} \]  

\[ g_i = x_i y_i = \text{condition for generation of a carry.} \]  

Then
\[ r_i = g_i + t_i c_i = \text{carry out of a stage.} \]  

The carry into any stage may then be expanded as follows:
The expansion of \( r \) may be carried as far back as desired. The limit for an \( n \)-stage adder is an expression for \( c_n \) containing \( r_n \).

Application of this principle to a section of an adder is illustrated in Figure 13 [4]. Allowing the \( i \) to take on successive values in equation (4), and omitting all terms with negative subscripts, it is seen that each stage of the adder will require one \( i \)-input OR gate and \( i \) AND gates having one through \( i \) inputs. Thus the number of circuit elements would become prohibitive for adders of more than a few stages. However, the maximum carry path between any two stages is two levels and it is four levels for a complete addition.

The adder can be broken up into groups of stages connected in the manner described. The carry into each group would be designated \( c_{g1} \), the carry out of a group designated \( r_{g1} \), and the group containing the lower-order stages designated group 1. Assume that six stages would be a reasonable number of stages to be connected with full carry-look-ahead. If the six-stage groups are now connected in series with \( c_{g1} = r_{g1} \), a carry will require four levels to be generated, two levels to be transmitted through each intermediate group, and four levels to reach and produce a sum in the final group. For 6-bit groups then, the maximum carry path length would be \( 4 + (2n/6) \), where \( n \) is the number of stages. For a 30-bit adder, this would be 14. This technique may be extended by providing carry-look-ahead circuitry between groups and even further by dividing the groups into sections and providing look-ahead between sections. A carry-look-ahead, or carry-skip, network for three groups is illustrated in Figure 14.
Figure 13. Three-bit adder group with carry prediction [4].
Figure 14. Carry-look-ahead for three groups.
Majerski [12] has proposed and implemented a NOR gate realization of a carry-skip circuit. For use of this circuit the skips may comprise only odd numbers of adder stages and their distribution relative to one another must always be an even number of stages. The adder with which this circuit is used is an asynchronous binary NOR gate implementation, of which the structure is given by the following equations:

For odd adder stages,

\[
s_i = D_{i-1}c_i + x_{i-1}y_{i-1} + D_{i-1}c_{i-1} + x_1y_1 + x_1y_1 + D_1c_1,
\]

(17)

\[
D_{i-1}c_i = x_{i}y_{i-1} + x_{i}y_{i-1} + x_{i-1}y_{i-1} + D_{i-1}c_{i-1} + h,
\]

(18)

where \( h \) is the carry control signal;

for even adder stages,

\[
s_i = D_{i}c_i + x_{i-1}y_{i-1} + D_{i-1}c_{i-1} + x_1y_1 + x_1y_1 + D_1c_1,
\]

(19)

\[
D_{i-1}c_i = x_{i}y_{i-1} + x_{i}y_{i-1} + x_{i-1}y_{i-1} + D_{i-1}c_{i-1},
\]

(20)

where

\[
D_i = x_1y_1 + x_1y_1
\]

(21)

Carry propagation begins with a change in \( h \) from 1 to 0 and ends with the stabilization of the states of all gates.

The carry-skip circuit, comprising \( k \) stages from \( j + 1 \) to \( j + k \) of an adder, is described by the function:
where \( j \) is an even number and \( k \) is an odd number. The carry and skip circuits for \( k = 3 \) are shown in Figure 15 [12].

Majerski's equations for determining optimal and economical skip distributions in adders using his design are presented in the Appendix.

**DESCRIPTION OF A SIMPLE ITERATIVE ACCUMULATOR**

The parallel binary accumulator shown in reduced block diagram form in Figure 16 has been used by the Auburn Research Foundation in the design of a digital compensator [2]. It was selected because of its automatic carry feature and relatively simple control circuitry. Its properties will be briefly described here and used as a basis on which to judge improvements in carry propagation time resulting from the application of fast carry techniques.

The blocks labeled AC are \( T \) flip-flop storage elements of the accumulator register, and those labeled DM are delay multivibrators used for carry generation. The NAND gate pairs in the lower part of the figure form the augend-complementing structure. The input from the complementing-gate structure is a negative-going pulse when the input is positive. The negation output of the DM is used. The input to the \( i^{th} \) -stage flip-flop (AC) is then

\[
y_i \cdot c_i = y_i + c_i
\]

The output equation of the \( i^{th} \) flip-flop is

\[
x_i = t_i x_i + t_i x_i
\]

where \( t_i \) is the input.
Figure 15. Carry-skip circuit [12].
Figure 16. A simple iterative accumulator [2].
The total addition for a stage takes place in two steps. In the first step,

\[ t_1 = y_i \]

Then

\[ x_{i+1} = y_i \bar{x}_i + \bar{y}_i x_i \]

\[ = y_i \oplus x_i \tag{25} \]

The complement output of the flip-flop is the function

\[ y_1 \oplus x_1 = y_i \oplus x_i \tag{26} \]

In the second step of the addition, the input of the \( i^{\text{th}} \) flip-flop is \( c_i \). The final output is then

\[ x_{i+2} = c_i (y_i \oplus x_i) + \bar{c}_i (y_i \oplus x_i) \]

\[ = c_i \oplus y_i \oplus x_i \tag{27} \]

The accumulate operation is initiated by the application of the accumulate pulse. The contents of each stage of the augend, or input, register are gated to the input of the corresponding stage of the addend, or accumulator, register. This operation is simultaneous for all stages within the limits of the variation in propagation time between NAND gates. If the augend bit is a logic one, with the sign bit a logic zero (positive), the accumulator flip-flop changes state. Thus if a logic one from the augend stage is added to a logic one in the accumulator stage, the partial sum, a logic zero, is stored in the accumulator. The transition of the accumulator stage from a one to a zero state triggers the delay
multivibrator and generates a carry, which is routed to the succeeding stage. If a logic one from the augend stage is added to a logic zero in the corresponding accumulator stage, the partial sum, a logic one, is stored in the accumulator. The transition of the accumulator stage from a zero state to a one state does not trigger the delay multivibrator, and no carry is generated. At this point the accumulator contains the results of the operation \( x_1 \oplus y_1 \). The carry generated in the preceding stage is then added to the contents of each stage of the accumulator, forming another partial sum and possibly generating other carries. The delay time of the multivibrator is such that all partial sums are formed before the carries are generated. A timing diagram for the operation of one stage of the accumulator is shown in Figure 17. The AC registers trigger when there is a change from the one to the zero level. Therefore, the trailing edge of the accumulate pulse is used as the zero-time reference. At the end of the accumulate operation, the augend in the input register has been added to the addend in the accumulator register, and the sum has been stored in the accumulator register.

Typical delay times of the elements used in the accumulator are given in Table 2 [13]. The DM pulse width is set at 100 ns to assure operation of all circuits to which it is applied. It can be seen from the values in this table and by referring to Figure 16 that a maximum delay time of 120 ns is required from the end of the accumulate pulse to the formation of partial sums in the accumulator register. The time required to form a carry and transmit it to the succeeding register is a 30-ns delay for the multivibrator, a 100-ns pulse width of the multivibrator, and a 30-ns delay for the input gate, yielding 160 ns. The total time for the generation and transmission of the first carry is then 280 ns. The time for the generation and transmission of each secondary carry (generated by the assimilation of a carry) would be 220 ns, since the delay time of the complementation gate and one input gate would then be omitted. The worst case carry propagation time for a 30-stage accumulator of this design would then be 280 ns for the generation of a primary carry in the first stage, 6160 ns (28×220) for transmission through the intermediate stages, and 60 ns for assimilation in the 30th stage, yielding 6.50 \( \mu s \).

The relative cost of the circuit elements used in the accumulator is given in Table 3 [13], using a 2-input NAND gate as a basis. Inverters will be considered as 2-input NAND gates with the inputs common. A considerable variation in cost is found, but these figures should be realistic enough for comparative evaluation of designs.

The cost figure for the portion of this accumulator dealing with carry propagation is then 630.
Figure 17. Timing diagram for carry generation and assimilation.
### TABLE 2. LOGIC ELEMENT DELAY TIMES

<table>
<thead>
<tr>
<th>Logic Element</th>
<th>Typical Delay Time</th>
<th>Maximum Delay Time (ns)</th>
</tr>
</thead>
<tbody>
<tr>
<td>NAND gate</td>
<td>25 ns</td>
<td>30</td>
</tr>
<tr>
<td>AC register:</td>
<td></td>
<td></td>
</tr>
<tr>
<td>T input</td>
<td>45 ns</td>
<td>60</td>
</tr>
<tr>
<td>DC input</td>
<td>65 ns</td>
<td>80</td>
</tr>
<tr>
<td>Delay multivibrator:</td>
<td></td>
<td>60</td>
</tr>
<tr>
<td>Assertion delay</td>
<td></td>
<td>30</td>
</tr>
<tr>
<td>Negation delay</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Minimum</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Pulse width</td>
<td>50 – 300 ns,</td>
<td></td>
</tr>
<tr>
<td></td>
<td>300 ns – 3μs</td>
<td></td>
</tr>
</tbody>
</table>

### TABLE 3. RELATIVE COST OF LOGIC ELEMENTS

<table>
<thead>
<tr>
<th>Logic Element</th>
<th>Relative Cost</th>
</tr>
</thead>
<tbody>
<tr>
<td>2-input NAND gate</td>
<td>1</td>
</tr>
<tr>
<td>3-input NAND gate</td>
<td>2</td>
</tr>
<tr>
<td>4-input NAND gate</td>
<td>4</td>
</tr>
<tr>
<td>6-input NAND gate</td>
<td>6</td>
</tr>
<tr>
<td>12-input NAND gate</td>
<td>12</td>
</tr>
<tr>
<td>Flip-flop</td>
<td>8</td>
</tr>
<tr>
<td>Delay multivibrator</td>
<td>12</td>
</tr>
</tbody>
</table>
RAPID ACCUMULATOR REALIZATIONS AND ANALYSIS

The methods previously reviewed were investigated with regard to their application to the accumulator. This accumulator utilizes pulse logic for the input and carry signals. NAND gates were used in the design of combinational circuits because of their availability in integrated circuit form and their use in the basic accumulator.

The techniques presented by Majerski and Wiweger [11, 12] offer faster propagation times than those given in the following sections when used with adders of their design, but were not directly applicable to this accumulator because of its use of pulse logic.

Both one- and two-layer carry-skip methods were investigated. It was found that two-layer skips offered no appreciable speed advantage over one-layer skips for this accumulator.

The results of the investigation are presented in the following sections.

Gated Carry

It is seen from Table 1 that the carry out of a stage depends solely upon the carry in when the summands are unlike. The function representing unlike summands is

\[ x_1 \oplus y_1 = \overline{x_1 y_1} + \overline{x_1 \overline{y_1}} \]

The carry out of a stage depends only on the summands when they are both alike. The function representing like summands is

\[ x_1 \odot y_1 = x_1 y_1 + \overline{x_1 \overline{y_1}} \]

In the accumulator previously described, the above two functions are available from the \( i^{th} \)-stage flip-flop of the accumulator at the true and complement outputs, respectively. The carry from the \( i^{th} \) stage is generated when the
complement output goes to the logic-one level. The functions necessary for applying the gated-carry technique presented are available. The assimilation of a carry in a stage for which the function $x_i \oplus y_i$ is true will produce another carry, which poses no problem with level logic. In the accumulator, extraneous pulses would be inserted into the carry propagation line. The suppression of these extraneous pulses must be considered in applying the gated-carry technique.

Let $r_i$ be the carry inserted into the carry propagation line by the $i$th stage, and let

$$c_i = r_{i-1}$$  \hspace{1cm} (28)

Let $g_{i-1}$ be the carry pulse generated by the condition $x_{i-1} y_{i-1}$, which results when both summands of the $i-1$ stage are at the logic-one level. The function for $c_i$ must then be

$$c_i = r_{i-2} \cdot (y_{i-1} \oplus x_{i-1}) + e g_{i-1}$$  \hspace{1cm} (29)

where $e$ is a carry-generation enable pulse. This pulse must be present for the generation of a carry in step one of the addition process, and absent to prevent the generation of an extraneous carry when a carry is assimilated.

A circuit to realize this $c_i$ function is shown in Figure 18. The equation for $c_i$, written from the figure, is

$$c_i = r_{i-2} \cdot \bigg(y_{i-1} x_{i-1}\bigg) \cdot e g_{i-1}$$
Figure 18. Accumulator with gated carry.
Applying DeMorgan's theorem,

\[ c_i = r_{i-2} \cdot (y_{i-1} \oplus x_{i-1}) + e_{i-1} \]

which is the desired result.

The accumulation step is initiated with the accumulate pulse. If \( x_i \) and \( y_i \) are logic ones, the AC complement output changes state from a zero to a one, triggering the DM and generating a carry pulse. The carry pulse enters the carry propagation line through the gate labeled 4 in Figure 18. If \( x_i \) and \( y_i \) are both at the logic-zero level, the complement output of the AC is at the logic-one level and does not change state and no carry is generated. This is the situation in which the carry out is determined solely by the summands. The true output of the AC is then at the logic-zero level and attains this level 120 ns before a carry is generated in the preceding stages. This level is then applied to the gate labeled 3 in Figure 18, blocking the transmission of any carry generated in the preceding stages. A carry entering the \( i^{th} \) stage is routed through gates 5 and 6 to the AC input, causing a change in state. The assimilation of the carry then removes the transmission block, but does so 120 ns after the carry pulse has ceased to be present at gate 3.

If \( x_i \) and \( y_i \) are different, \( x_i \oplus y_i \) is true, and the true output of the AC takes on the logic-one level. If \( x_i \) is one and \( y_i \) is zero, there is no AC state change. If \( y_i \) is one and \( x_i \) is zero, the AC would change state from a complement logic-one level to a true logic-one level. In either case, no carry would be generated. The resulting situation is that the carry out is determined solely by the carry in. The true level is applied to gate 3, enabling the transmission of an incoming carry. A carry into the stage is then assimilated through gates 5 and 6, causing a state change of the AC. This removes the gating level to gate 3, 120 ns after the carry is transmitted. No new carry is generated by the stage because of the removal of the carry-generation enabling level, \( e \), from gate 2. The enabling level, \( e \), is generated by a DM, which is actuated by the accumulate pulse. The DM pulse width is set at 370 ns to gate all primary carries generated and to block all secondary carries generated.
A timing diagram for one stage of the accumulator is shown in Figure 19. Assuming the necessary conditions, the time required to insert a carry into the propagation line after the end of the accumulate pulse is a 30-ns delay each for the complementing gate and gate 6, a 60-ns delay for the state change of the AC, a 60-ns delay for the DM, a 100-ns delay for the DM pulse width, a 30-ns delay for gate 2, and 30 ns for the insertion-gate delay, or a total of 340 ns. There is a delay of 30 ns per gate per stage, or 60 ns per stage through the carry propagation line. It requires 120 ns to assimilate a carry in a stage. The equation for determining the total accumulation time for two binary numbers is

\[ A = (460 + 60 n') \]  

(30)

where \( n' \) is the number of stages through which a carry must be propagated before assimilation. The total time required to generate a carry in the first stage of a 30-stage accumulator, propagate it through the entire length, and assimilate it in the last stage is

\[ A = [460 + 60(28)] \text{ ns} = 2.14 \mu s \]

The equipment that must be added to the accumulator previously described to facilitate carry gating consists of four 2-input NAND gates per stage and one DM unit to generate the carry-generation enable pulse, \( e \). The cost figure for the portion of this accumulator concerned with carry propagation is then 756.

The gated-carry design gives a speed increase of \( 6.50 \mu s / 2.14 \mu s \) or 3.37 to 1 over the simple iterative accumulator. The cost increase is 21 percent.

**Carry-Completion Detection**

The gated-carry method produces an appreciable decrease in the maximum carry-propagation time. To operate the accumulator on a fixed-time basis, maximum delays with an added safety factor must be considered, and the accumulation rate must always allow time for the maximum length of carry propagation.
Figure 19. Timing diagram for carry generation and assimilation (gated carry).
As pointed out previously, the average carry propagation length is \( \log_2 n \). This value is 4.91 for an \( n \) of 30, so more rapid accumulation could be accomplished using a variable rate, with the rate controlled by the completion of carry propagation. Let \( r_i \), \( i = 1, 2, \ldots, 30 \), be the carry pulse inserted into the carry-propagation line by the \( i \)th stage. Carry propagation is complete when a pulse is not present at any of the insertion points. The required condition for carry completion, \( D \), is then

\[
D = \overline{r_1} \cdot \overline{r_2} \cdot \overline{r_3} \cdot \ldots \cdot \overline{r_{30}}
\]  

(31)

A carry-completion-detection circuit that could be used with the gated-carry accumulator is shown in Figure 20. From the figure, the carry-completion signal is a positive level described by the equation

\[
D = \overline{d} \cdot \overline{r_1} \cdot \overline{r_2} \cdot \ldots \cdot \overline{r_{10}} \cdot \overline{d} \cdot \overline{r_{11}} \cdot \overline{r_{12}} \cdot \ldots \cdot \overline{r_{20}}
\]

\[
\cdot \overline{d} \cdot \overline{r_{21}} \cdot \overline{r_{22}} \cdot \ldots \cdot \overline{r_{30}}
\]

Applying DeMorgan's theorem gives

\[
D = d \cdot \overline{r_1} \cdot \overline{r_2} \cdot \ldots \cdot \overline{r_{30}}
\]

where \( d \) is the detection-enable level. This level is generated from the accumulate pulse by means of a DM and a T flip-flop. The delay of the DM is set to ensure that any carry being propagated will be present at the output of the gates labeled 5 before the detection system becomes operative. This prevents the generation of a false completion signal. The completion detection signal then resets the flip-flop in preparation for the next accumulate pulse, which will also be controlled by \( D \). The timing diagram for this operation is shown in Figure 21. The overlap of carry signal at adjoining sensing points along the carry propagation line is shown in Figure 22 and is seen to be 40 ns.

The timing diagram for the generation and assimilation of a carry is shown in Figure 23. The maximum delay times of circuit elements were used.
Figure 20. Carry-completion detection circuit.
Figure 21. Timing diagram for detection enable signal generation.
Carry generated

Carry out of gate $2_i$

Carry out of gate $4_i$

Carry out of gate $5_i$

"d" generated

Carry out of gate $3_{i+1}$

Carry out of gate $4_{i+1}$

Carry out of gate $5_{i+1}$

Figure 22. Continuity of carry signal at sensing gates.

in defining this diagram. It is seen that 340 ns are required to insert a carry into the carry propagation line. The time required to assimilate a carry after it has arrived at a stage is 120 ns. The equation for accumulation time is

$$A = (460 + 60n') \text{ ns}$$  \hspace{1cm} (32)

where $n'$ is the number of stages through which a carry must be propagated before it is assimilated. The time required to generate a carry and propagate it through the entire length of a 30-stage accumulator is

$$A = [460 + 60(28)] \text{ ns}$$

$$= 2.14 \mu\text{s}$$
Figure 23. Timing diagram for carry generation and assimilation in carry-completion detection realization.
The carry propagation complete signal is generated 210 ns after the last carry has ceased to be present at a sensing point, as shown in Figure 24. The final accumulator-register transition is completed 90 ns from that time. The difference, 120 ns, must be added to the 2.14 μs to yield the total maximum accumulation time of 2.26 μs. For a one-stage propagation, $x_{i-1}$ would change state 460 ns after the end of the accumulate pulse. The total time for the accumulation step in this case is 460 ns + 120 ns = 580 ns. If no carries are generated in an accumulation, the limiting factors on speed are the generation of the detection enable signal, $d$, and its propagation through the completion detection circuit. The minimum accumulation time is 300 ns + 210 ns = 510 ns.

For the theoretical average propagation length of 4.91 (or 5) stages, the accumulation time is

340 ns for carry generation and insertion into carry propagation line

+ 5 × 60 ns for propagation through 5 stages

+ 30 ns for the delay of gate 5

+ 210 ns for completion circuit delay

= 880 ns.

Compared with the accumulation time of 6.50 μs which must always be allowed in the accumulator previously described, this is an increase in speed of 7.39 to 1 for the addition of a large number of summands with random bit distribution.

The circuit elements that must be added to the gated-carry accumulator to implement the carry-completion detection feature are three 11-input NAND gates, one 3-input NAND gate, four 2-input NAND gates, two flip-flops, and one delay multivibrator. From Table 3 it can be seen that the cost figure will increase by 62, yielding a total of $818. This is a 29.8-percent cost increase over that of the simple iterative accumulator.
Figure 24. Timing diagram for carry-completion signal generation.
Carry-Skip Techniques

The accumulator under investigation uses ones complements representation of negative numbers with end-around carry, so a skip distribution using equal size groups is considered [6]. The accumulator of n bits is divided into k groups of m bits each so that

\[ mk = n \]  \hspace{1cm} (33)

The greatest carry propagation time results when the conditions for the summands are

\[ x_1 \cdot y_1 \hspace{1cm} (34) \]

\[ x_1 \oplus y_1 \hspace{1cm} i = 2, 3, \ldots, n-1, \hspace{1cm} (35) \]

and

\[ x_{30} \cdot y_{30} \hspace{1cm} (36) \]

The time required for propagation is then

\[ \tau' = [1 + (m-1) + (k-2) + (m-1)] \text{ t.u.} \]
\[ = [2m + k-3] \text{ t.u.} \]
\[ = (2m + n/m - 3) \text{ t.u.} \hspace{1cm} (37) \]

where t.u. is the propagation time of two NAND gates [6]. Differentiating equation (37) with respect to m and equating to zero yields
\[2 - n/m^2 = 0,\]
\[m^2 = n/2,\]
\[m^2 = \sqrt{n}/2,\]  \hspace{1cm} (38)

or

\[m^2 = km/2,\]  \hspace{1cm} (39)
\[k = 2m\]

for the minimum propagation time. For \( n = 30 \), the approximations \( m = 5 \) and \( k = 6 \) are used. The group containing the lower-order bits is called group 1.

The maximum propagation time for a 30-bit accumulator is

\[\tau' = [1 + (5-1) + (6-1) + (5-1)] \times 60 \text{ ns}\]
\[= 780 \text{ ns}\]

The time required to generate a carry was 340 ns, and the time to assimilate a carry after it arrived at a stage was 120 ns. The maximum accumulation time for a 30-stage accumulator using one-layer carry-skips is then

\[A = (460 + \tau'_{30}) \text{ ns}\]
\[= 1.24 \mu s\]

Equations (11) through (16) will be rewritten here using notation for a five-stage group. Let

\[r_{k,i} = (x_{k,i} \oplus y_{k,i}) c_{k,i} + x_{k,i} \cdot y_{k,i}\]  \hspace{1cm} (40)
be the carry signal produced by the \(i\)\(^{th}\) stage of the \(k\)\(^{th}\) group, where \(y_{k,i}\) and \(x_{k,i}\) are the summands of the \(i\)\(^{th}\) stage of the \(k\)\(^{th}\) group and \(c_{k,i}\) is the carry into that stage and

\[
c_{k,i} = r_{k,i-1}
\]  

Let

\[
t_{k,i} = x_{k,i} \oplus y_{k,i}
\]

be the condition for the transmission of a carry through a stage. This signal is present at the true output of the \(k,i\)\(^{th}\) stage accumulator register 110 ns after the accumulate pulse has ended (Fig. 23).

Let

\[
g_{k,i} = x_{k,i} \cdot y_{k,i}
\]

be the condition for the generation of a carry in the \(k,i\)\(^{th}\) stage. This signal is generated by the \(k,i\)\(^{th}\) DM when the complement output of the \(k,i\)\(^{th}\) accumulator register becomes a one. Figure 23 shows that \(g_{k,i}\) is generated 280 ns after the end of the accumulate pulse. The carry produced by the \(k\)\(^{th}\) group is then

\[
r_k = g_{k,5} + t_{k,5} \cdot g_{k,4} + t_{k,5} \cdot g_{k,3} + t_{k,5} \cdot t_{k,4} \cdot g_{k,2} + t_{k,5} \cdot t_{k,4} \cdot t_{k,3} \cdot g_{k,1} + t_{k,5} \cdot t_{k,4} \cdot t_{k,3} \cdot t_{k,2} \cdot c_k
\]  

(44)
Equation (44) is satisfactory for carry levels, but with carry pulses 100 ns in width, Figure 25 shows that for \( t_1 \cdot t_2 \cdot t_3 \cdot t_4 \cdot t_5 \) true, each \( c_k \)
pulse into a group will result in two \( r_k \) pulses out of that group. For pulse
carries, equation (44) must then be changed to

\[
r_k = (t_k, 1 \cdot t_k, 2 \cdot t_k, 3 \cdot t_k, 4 \cdot t_k, 5) \cdot (g_k, 5

+ t_k, 5 \cdot g_k, 4 \cdot t_k, 3 \cdot t_k, 4 \cdot g_k, 3

+ t_k, 5 \cdot g_k, 2

+ t_k, 3 \cdot t_k, 2 \cdot g_k, 1)

+ (t_k, 1 \cdot t_k, 2 \cdot t_k, 3 \cdot t_k, 4 \cdot t_k, 5) c_k

= (t_k, 1 \cdot t_k, 2 \cdot t_k, 3 \cdot t_k, 4 \cdot t_k, 5) g_k, 5

+ (t_k, 1 \cdot t_k, 2 \cdot t_k, 3 \cdot t_k, 4 \cdot t_k, 5) \cdot (t_k, 5 \cdot g_k, 4

+ t_k, 5 \cdot g_k, 3 + t_k, 5 \cdot t_k, 4 \cdot t_k, 3 \cdot f_k, 2

+ t_k, 5 \cdot g_k, 2)

+ t_k, 1 \cdot t_k, 2 \cdot t_k, 3 \cdot t_k, 4 \cdot t_k, 5) c_k \) (45)

The carries that must be produced by the stages within the group are described
by the following equations:

\[
r_k, 1 = c_k \cdot t_k, 1 + g_k, 1 \) (46)
Figure 25. Carry propagation through group with $t_1 \cdot t_2 \cdot t_3 \cdot t_4 \cdot t_5$ true.
\[ r_{k,2} = r_{k,1} \cdot t_{k,2} + g_{k,2} \]
\[ = c_k \cdot t_{k,1} \cdot t_{k,2} + g_k,1 \cdot t_{k,2} + g_{k,2} \]  \hspace{1cm} (47)

\[ r_{k,3} = r_{k,2} \cdot t_{k,3} + g_{k,3} \]
\[ = c_k \cdot t_{k,1} \cdot t_{k,2} \cdot t_{k,3} + g_k,1 \cdot t_{k,2} \cdot t_{k,3} \]
\[ + g_k,2 \cdot t_{k,3} + g_k,3 \]  \hspace{1cm} (48)

\[ r_{k,4} = r_{k,3} \cdot t_{k,4} + g_{k,4} \]
\[ = c_k \cdot t_{k,1} \cdot t_{k,2} \cdot t_{k,3} \cdot t_{k,4} + g_k,1 \cdot t_{k,2} \cdot t_{k,3} \cdot t_{k,4} \]
\[ + g_k,2 \cdot t_{k,3} \cdot t_{k,4} + g_k,3 \cdot t_{k,4} + g_k,4 \]  \hspace{1cm} (49)

\[ r_{k,5} = t_{k} \]  \hspace{1cm} (50)

A reduced block diagram of the five-stage group using NAND gates to produce the combinational functions in equations (46) through (50) is given in Figure 26. To show that this circuit does satisfy the required conditions, DeMorgan's theorem is applied to equations written from the diagram. The carry-generation enable level, \( e \), is a necessary condition for \( g_{k,1} \) as explained in the section on gated carry, but will be omitted from the equations as it does not directly affect the argument.

\[ r_{k,1} = c_k \cdot t_{k,1} \cdot g_{k,1} \]
\[ = c_k \cdot t_{k,1} + g_{k,1} \]
Figure 26. Carry skip group k.
\[ r_{k,2} = r_{k,1} \cdot t_{k,2} \cdot g_{k,2} \]
\[ = r_{k,1} \cdot t_{k,2} + g_{k,2} \]

\[ r_{k,3} = r_{k,2} \cdot t_{k,3} \cdot g_{k,3} \]
\[ = r_{k,2} \cdot t_{k,3} + g_{k,3} \]

\[ r_{k,4} = r_{k,3} \cdot t_{k,4} \cdot g_{k,4} \]
\[ = r_{k,3} \cdot t_{k,4} + g_{k,4} \]
\[ = c_k \cdot t_{k,1} \cdot t_{k,2} \cdot t_{k,3} \cdot t_{k,4} + g_{k,1} \cdot t_{k,2} \]
\[ \cdot t_{k,3} \cdot t_{k,4} + g_{k,2} \cdot t_{k,3}, 4 \cdot g_{k,3} \]
\[ \cdot t_{k,4} + g_{k,4} \]

It is seen that these equations are in agreement with equations (46) through (49).

The carry produced by the fifth stage of the group is

\[ r_{k,5} = r_{k} = g_{k,5} \cdot t_{k,1} \cdot t_{k,2} \cdot t_{k,3} \cdot t_{k,4} \cdot t_{k,5} \]
\[ \cdot r_{k,4} \cdot t_{k,5} \cdot (t_{k,1} \cdot t_{k,2} \cdot t_{k,3} \cdot t_{k,4} \cdot t_{k,5}) \]
\[ \cdot c_k \cdot (t_{k,1} \cdot t_{k,2} \cdot t_{k,3} \cdot t_{k,4} \cdot t_{k,5}) \]
\[
\begin{align*}
&= g_{k,5} \cdot (t_{k,1} \cdot t_{k,2} \cdot t_{k,3} \cdot t_{k,4} \cdot t_{k,5}) \\
&\quad + r_{k,4} \cdot t_{k,5} (t_{k,1} \cdot t_{k,2} \cdot t_{k,3} \cdot t_{k,4} \cdot t_{k,5}) \\
&\quad + c_k (t_{k,1} \cdot t_{k,2} \cdot t_{k,3} \cdot t_{k,4} \cdot t_{k,5})
\end{align*}
\]

It must be shown that

\[
\begin{align*}
&= (t_{k,1} \cdot t_{k,2} \cdot t_{k,3} \cdot t_{k,4} \cdot t_{k,5}) \cdot (g_{k,4} \cdot t_{k,5} \\
&\quad + g_{k,3} \cdot t_{k,4} \cdot t_{k,5} + g_{k,2} \cdot t_{k,3} \cdot t_{k,4} \cdot t_{k,5} \\
&\quad + g_{k,1} \cdot t_{k,2} \cdot t_{k,3} \cdot t_{k,4} \cdot t_{k,5})
\end{align*}
\]

We see that

\[
\begin{align*}
&= t_{k,5} \cdot r_{k,4} = c \cdot (t_{k,1} \cdot t_{k,2} \cdot t_{k,3} \cdot r_{k,4} \cdot t_{k,5}) \\
&\quad + g_{k,4} \cdot t_{k,5} + g_{k,3} \cdot t_{k,4} \cdot t_{k,5} \\
&\quad + g_{k,2} \cdot t_{k,3} \cdot t_{k,4} \cdot t_{k,5} \\
&\quad + g_{k,1} \cdot t_{k,2} \cdot t_{k,3} \cdot t_{k,4} \cdot t_{k,5}
\end{align*}
\]
Then

\[
(t_k, 1 \cdot t_k, 2 \cdot t_k, 3 \cdot t_k, 4 \cdot t_k, 5) \cdot t_k, 5 \cdot r_k, 4
\]

\[= c \cdot (t_k, 1 \cdot t_k, 2 \cdot t_k, 3 \cdot t_k, 4 \cdot t_k, 5) \rightarrow 0
\]

\[
\cdot (t_k, 1 \cdot t_k, 2 \cdot t_k, 3 \cdot t_k, 4 \cdot t_k, 5)
\]

\[+ (t_k, 1 \cdot t_k, 2 \cdot t_k, 3 \cdot t_k, 4 \cdot t_k, 5) \cdot (g_k, 4 \cdot t_k, 5
\]

\[+ g_k, 3 \cdot t_k, 4 \cdot t_k, 5 + g_k, 2 \cdot t_k, 3 \cdot t_k, 4 \cdot t_k, 5
\]

\[+ g_k, 1 \cdot t_k, 2 \cdot t_k, 3 \cdot t_k, 4 \cdot t_k, 5)
\]

which is the required result.

It is necessary that the proper gating levels be at gates 2, 3, and 13 of Figure 26 before a carry pulse reaches them. Figure 27 shows that the blocking level will be present 50 ns before \(g_k, 5\) reaches gate 2 and 90 ns before \(r_k, 4\) reaches gate 3. The gating level will be present 80 ns before \(c_k\) reaches gate 13.

The 30-stage accumulator is made up of six groups of five stages with \(c_k = r_{k-1}\), except that \(c_1 = r_6\), the end-around carry. This is shown in Figure 28.

The accumulator using carry skips yields a speed advantage of 6.50/1.24 \(\mu s\) or 5.21/1 over the accumulator previously described.

The components that must be added to the gated-carry circuit to facilitate carry skips are one 6-input NAND gate, one 2-input NAND gate, and one inverter for each group. In every fifth stage, two 2-input NAND gates must be replaced with 3-input NAND gates. The cost figure for the 30-stage accumulator with carry skips is then 816. This is an increase of 29.5 percent over the cost of the simple iterative accumulator.
Accumulate Pulse

t_{k,1}'s generated

\((t_{k,1} \cdot t_{k,2} \cdot t_{k,3} \cdot t_{k,4} \cdot t_{k,5})\) to gates 2 and 3 (blocking level)

\(g_{k,5}\) to gate 2

\(r_{k,4}\) to gate 3

\((t_{k,1} \cdot t_{k,2} \cdot t_{k,3} \cdot t_{k,4} \cdot t_{k,5})\) to gate 13

\(c_k\) to gate 13

Time (ns)

Figure 27. Gating level timing diagram.
Figure 28. Six-group representation.
DISCUSSION AND CONCLUSIONS

The relative cost figures and accumulation times for the simple iterative accumulator and accumulators using fast carry techniques are presented in Table 4. It is seen that all three realizations investigated offer a considerable decrease in accumulation time without too great a difference in cost figure.

**TABLE 4. ACCUMULATOR COST AND TIME REQUIREMENTS**

<table>
<thead>
<tr>
<th>Accumulator Design</th>
<th>Cost Figure</th>
<th>Max Accum Time (µs)</th>
<th>Avg Accum Time (µs)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Simple iterative accumulator</td>
<td>630</td>
<td>6.50</td>
<td>6.50</td>
</tr>
<tr>
<td>Accumulator with gated carry</td>
<td>756</td>
<td>2.14</td>
<td>2.14</td>
</tr>
<tr>
<td>Accumulator with carry-completion detection</td>
<td>818</td>
<td>2.26</td>
<td>0.88</td>
</tr>
<tr>
<td>Accumulator with carry skips</td>
<td>816</td>
<td>1.24</td>
<td>1.24</td>
</tr>
</tbody>
</table>

The comparison may be formalized by the use of a criterion similar to that used by Buria [5] for parallel adders. The criterion is

\[ Q = \frac{t}{t_0} \cdot \frac{F}{F_0} \cdot 100 \]  

(51)

where

\[ t = \text{accumulation time of the circuit under consideration} \]

\[ t_0 = \text{accumulation time of the iterative accumulator} \]

\[ F = \text{cost figure of the network under consideration} \]

\[ F_0 = \text{cost figure of the iterative accumulator} \].
The results of applying this criterion are shown in Table 5. Using this criterion, the circuit for carry-completion detection appears best. However, the control circuitry for it is more complicated than for the others considered. The average accumulation time was calculated using carry propagation lengths resulting from the accumulation of numbers with random bit distribution. If this distribution is not random, the accumulation time for the carry-completion detection circuit could be greater than that for the carry-skip circuit.

**TABLE 5. COMPARISON OF ACCUMULATOR DESIGNS**

<table>
<thead>
<tr>
<th>Accumulator Design</th>
<th>$t$ (μs)</th>
<th>$t/t_0 \times 100$</th>
<th>F</th>
<th>$F/F_0$</th>
<th>Q</th>
</tr>
</thead>
<tbody>
<tr>
<td>Simple iterative accumulator</td>
<td>6.50</td>
<td>100</td>
<td>630</td>
<td>1</td>
<td>100</td>
</tr>
<tr>
<td>Accumulator with gated carry</td>
<td>2.14</td>
<td>33</td>
<td>756</td>
<td>1.2</td>
<td>40</td>
</tr>
<tr>
<td>Accumulator with carry-completion detection</td>
<td>(Avg)</td>
<td>14</td>
<td>818</td>
<td>1.3</td>
<td>18</td>
</tr>
<tr>
<td></td>
<td>0.880</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Accumulator with carry skips</td>
<td>1.24</td>
<td>19</td>
<td>816</td>
<td>1.3</td>
<td>25</td>
</tr>
</tbody>
</table>

In some applications the gated-carry circuit might be desirable. If the accumulation time for it is acceptable, it has the advantage of having fewer components.

The gated-carry circuit offers the fastest accumulation time for fixed-time operation. In general, the carry-completion detection circuit would be recommended for application when a variable accumulation rate is acceptable; the carry-skip circuit would be recommended when a fixed accumulation rate is desired.
A discussion of Majerski's methods for determining optimal and economical skip distributions in an adder with a maximum possible number of stages for a given maximum allowable propagation time will require the introduction of several definitions and notational conventions. One-layer skip distribution occurs when adder stages are included in carry skips, and every stage is included at most in one skip.

Two-layer skip distribution occurs when adder stages are included in two adder skips, and every stage is included at most in two skips. If two skips include common stages, all stages contained in the shorter skip are included in the longer skip. The longer skip is called first-layer skip, and the shorter one is called second-layer skip. If none of the stages in a skip are contained in another skip, the skip is treated as a first-layer skip.

An adder with one-layer skip distribution is divided into groups, each of which includes all stages in a skip circuit or a single stage not contained in a skip.

An adder with two-layer skip distribution is divided into sections, which include all stages composing one first-layer skip or a single stage not included in skips. A section is divided into groups. A group includes all stages in a second-layer skip or a single stage in a second-layer skip. Sections and groups are numbered successively, starting with the least significant bits.

The notation used is:

\[ n = \text{Number of adder stages} \]

\[ m = \text{Number of groups in an adder with one-layer skip distribution} \]

\[ k_i, (i = 1, 2, \ldots, m) = \text{Number of stages of the } i^{\text{th}} \text{ group of an adder with one-layer skip distribution} \]
\( p \) = Number of skips in an adder with one-layer skip distribution

\( M \) = Number of sections in an adder with two-layer skip distribution

\( K_i, (i = 1, 2, \ldots, M) \) = Number of stages in the \( i^{th} \) section of an adder with two-layer skip distribution

\( P \) = Number of first-layer skips in an adder with two-layer skip distribution

\( m_i, (i = 1, 2, \ldots, M) \) = Number of groups in the \( i^{th} \) section of an adder

\( k_{ij}, (i = 1, 2, \ldots, M); j = 1, 2, \ldots, m_i \) = Number of stages in the \( j^{th} \) group of the \( i^{th} \) section of an adder

\( p_i, (i = 1, 2, \ldots, M) \) = Number of second-layer skips in the \( i^{th} \) section of an adder

\( d \) = Maximum propagation time of a NOR gate

\( T \) = Maximum carry propagation time (mcpt); the greatest carry propagation time for all possible combinations of adder input variables.

The optimal skip distribution in an \( n \)-stage adder is defined as occurring when, for a given \( n \), \( T \) is minimal and the number of skips is minimal. The economical skip distribution is defined as occurring when, for a given \( n \) and \( T \), the number of skips is minimal.

The mean carry propagation time (mcpt) in the NOR gate adder with one-layer skip distribution is given as:

\[
T = \text{maximum } (T_1, T_2), \text{ where }
\]

\[
T_1 = \text{maximum } [ (k_\alpha - 2) + (\beta - \alpha - 1) + k_\beta ]_{\alpha, \beta} \tag{A-1}
\]
\[ T_2 = (m - 1) + \max_{\alpha} (k_\alpha - 1) \]  

and where the \( \alpha, \beta \) indexes of groups satisfy the conditions

\[ 1 \leq \alpha \leq m, \; \alpha < \beta < m + \alpha \text{ and } k_{m+1} + k_1 \]

The terms \( k_\alpha - 2 \), \( k_\beta \), and maximum \( k_\alpha - 1 \) denote maximum times of carry propagation through positions of the groups in which the carry propagation begins and ends. The -2 in the term \( k_\alpha - 2 \) is due to the carry propagation time beginning with the change of the carry control signal rather than at the change of states of the adder inputs.

For the optimal skip distribution in an \( n(T) \)-position adder, the maximum number of adder positions is the function of \( m e p T, \; T \geq 5 \), given by the formulas

\[ \nu_T = \left( \frac{T + 7}{4} \right)^2 \]  

\[ n(T) = 2E \frac{(\nu_T)^2}{2} = 2E \left[ \frac{(T + 7)^2}{32} \right] \]

where \( \nu_T \) is defined by equation (A-4).

For odd values of \( T \) for which \( T/8 \) does not produce a remainder equal 3, the optimal skip distribution is the one in which the numbers of stages in groups are \( k \) and \( 1 \) alternately. For this case, the following formulas apply:

\[ k = 2E \left[ \frac{1}{2} \left( K_T' + 1 + \sqrt{\nu_T - n(T)} \right) \right]^{-1} \]

\[ K_T' = \frac{T + 3}{4} \]
\[m = T - 2k + 5 \text{ (m \ldots even number)} \quad (A-8)\]

For a 36-stage adder, the values given by these formulas are a minimal value of \(T = 17\) and a skip distribution of stages as \(5, 1, 5, 1, 5, 1, 5, 1, 5, 1\).

For the values of \(T\) such that \(T/8\) gives a remainder of 3, the optimal skip distribution is the one with the numbers of stages in successive fours of groups equaling \(k-2, 1, k, 1\). In this case, the following formulas are used:

\[k = 2E \left[ \frac{1}{2} \left( K''' + 1 + \sqrt{\frac{T - n(T)}{n(T)}} \right) \right] - 1 \quad , \quad (A-9)\]

\[K''' = \frac{T + 7}{4} \quad , \quad (A-10)\]

\[m = T - 2K + 7 \text{ (m \ldots number divisible by 4)} \quad . \quad (A-11)\]

To determine an economical skip distribution, the distributions for two adders are computed and the one with the smaller number of skips is chosen.

The first adder, with \(k \text{ and } 1\) stages in groups alternately, is determined by formulas \((A-4), (A-6), (A-7), \text{ and } (A-8)\), with \(n\) replacing \(n(T)\) in \((A-6)\). An adder with \((m/2)(k + 1)\) stages is obtained. If \((m/2)(k + 1)\) is greater than \(n\), \((m/2)(k + 1) - n\) stages are canceled. This should be done in pairs in groups and preferably in such a way as to cancel one skip.

For the adder with \(k-2, 1, k, 1\) positions in successive fours of groups, formulas \((A-4), (A-5), (A-9), (A-10), \text{ and } (A-11)\), with \(n\) replacing \(n(T)\) in \((A-10)\), apply. The number of stages obtained is

\[m' (k - 1) + m'' (k + 1) \quad , \quad \]
The adder with the smaller number of skips is then the one with economical skip distribution.

The mcpt for an adder with two-layer skip distribution is given by

\[ T = \text{maximum} \ (T_1, T_2, T_3, T_4) \]

where

\[ T_1 = \underset{\phi, \psi, \alpha, \beta}{\text{maximum}} \ \left\{ \left[ (k_{\phi \alpha} - 2) (m_{\phi} - \alpha) \right] + (\psi - \phi - 1) \right. \]

\[ + \left. \left[ (\beta - 1) + k_{\psi \beta} \right] \right\} \]

\[ 1 \leq \phi \leq M, \ \phi < \psi < M + \phi, \ 1 \leq \alpha \leq m_{\phi}, \ 1 \leq \beta \leq m_{\psi}, \ \text{and} \ \ k_{M+i,j} = k_{ij} \]

\[ T_2 = \underset{\phi, \alpha, \beta}{\text{maximum}} \ \left[ (k_{\phi \alpha} - 2) + (\beta - \alpha - 1) k_{\phi \beta} \right] \]

\[ 1 \leq \phi \leq M, \ 1 \leq \alpha < \beta \leq m_{\phi} \]

\[ T_3 = (M - 1) + \underset{\phi, \alpha}{\text{maximum}} \ \left[ (m_{\phi} - 1) + (k_{\phi \alpha} - 1) \right] \]

\[ m' = E \left( \frac{m + 2}{4} \right), \ m'' = E (m/4) \]
The methods presented here apply exclusively to the NOR gate adders with an even number of stages, in which $T$ always takes odd values.

In an adder composed of an even number of $M$ sections, having $K$ and $I$ stages alternately, with identical skip distributions in multistage sections, $T$ may be expressed as:

$$T = t + M - 3 \quad (t \text{ an even number})$$  \hspace{1cm} (A-21)

where $t$ is a function of the number of stages and of the skip distribution in a section. Formula (A-21) defines $t$.

Optimal skip distribution in a section occurs when, for a given $t$, the number $K^{(t)}$ positions in a section is a maximum and, with maximum $K^{(t)}$, the number of skips in the section is a minimum. The parameters for such a distribution are given by the following formulas:

$$m^{(t)} = 2E\left(\frac{t}{6}\right) + 1 \quad \text{if } t \geq 16$$  \hspace{1cm} (A-22)

$$m^{(t)} = 2E\left(\frac{t + \frac{2}{3}}{6}\right) + 1 \quad \text{if } t < 16$$  \hspace{1cm} (A-23)
\[ k_j(t) = \frac{1}{2} \left[ 1 + (-1)^j \right] + \frac{1}{2} \left[ 1 - (-1)^j \right] \times \min \left[ 2E \left( \frac{t+4}{4} \right) - j, 2E \left( \frac{t-2m(t)+4}{4} \right) + j \right] \]
for \( j = 1, 2, \ldots, m(t) \) \hspace{1cm} (A-24)

\[ K(t) = 2E \left[ \frac{(2t - 3m(t)) m(t)}{16} + 2t + 6m(t) \right] + 1 \] \hspace{1cm} (A-25)

The parameters \( m(t), k_j(t), K(t) \) take exclusively odd values.

Majerski found for \( T = 13, t = 8, t_1 = 8, 1, 8, 1, 8, 1, 8, 1, n^{(T)} = 32 \), and for \( T = 15, t = 10, t_1 = 12, 1, 8, 1, 12, 1, 8, 1, n^{(T)} = 34 \).

The optimal skip distribution in an \( n^{(T)} \)-position adder with the maximum number of positions for a given \( T \geq 9 \) may be determined as follows:

\[ \frac{2T - 6}{3} \leq t \leq \frac{2T + 9}{3} \] \hspace{1cm} (A-26)

\[ M = T - t + 3 \] \hspace{1cm} (A-27)

For \( t = 6 \) and \( t = 8, 12, 16, \ldots \), the adder is made up of sections of \( K(t) \) and 1 stages alternately. The number of adder stages is

\[ n' = \frac{1}{2} M \left( K(t) + 1 \right) \] \hspace{1cm} (A-28)

For \( t = 10, 14, 18, \ldots \), successive adder sections are taken according to the parameters given by the terms of sequences
if $M$ is a number divisible by 4 and

$$t+2, 1, t-2, 1, t+2, 1, t-2, 1, \ldots, t+2, 1, t-2, t, 1 \quad (A-30)$$

if $M$ is a number not divisible by 4.

The 1 terms in formulas (A-29) and (A-30) correspond to one-stage sections. A cyclic change of adder stages is permitted. The number of adder stages is then

$$n'' = E\left(\frac{M}{4}\right)\left(K^{(t+2)} + K^{(t-2)} + 2\right) + \frac{1}{2} \left[1 - (-1)^{M/2}\right]K^{(t)} + 1.$$  

(A-31)

From among the skip distributions for all $t$ obtained from formula (A-26), the one with maximum $n^{(T)}$ stages and for $n^{(T)}$ with minimum number of skips is chosen.

For an economical skip distribution for an $n$-stage adder with $t \geq 9$, the skip distribution is determined for two adders with

$$t = 2E\ (T/3), \ 2E\ (T/3) + 2, \ 2E\ (T/3) + 4, \ldots$$

In the first adder, accept $K^{(t)}$ and 1-stage sections alternately. The number of adder sections and the number of adder stages are given in formulas (A-27) and (A-28).

In the second adder (only if $t \geq 8$), accept the same number of adder sections, but for successive sections use instead of $t$ the successive terms of the sequences

$$t+2, 1, t-2, 1, \ldots, t+2, 1, t-2, 1$$

if $M$ is a number divisible by 4,  

(A-32)
or
\[ t+2, 1, t-2, 1, t+2, 1, t-2, 1, \ldots, t+2, 1, t-2, 1, t, 1 \]  \hspace{1cm} (A-33)

if \( M \) is a number not divisible by 4 and \( t = 10, 14, 18, \ldots \)

or
\[ 2, 1, t+2, 1, t-2, 1, t+2, 1, \ldots, t-2, 1, t+2, 1, t, 1 \]

if \( M \) is a number not divisible by 4 and \( t = 8, 12, 16 \ldots \)

Such skip distributions are determined for at least three successive integers \( t \) and for further \( t \) until both numbers \( n' \) and \( n'' \) are smaller than \( n \). Then for all skip distributions in the adders for which \( n' > n \) or \( n'' > n \), cancel \( n' - n \) or \( n'' - n \) stages to cancel the greatest number of skips. From among the obtained skip distributions of \( n \)-position adders, the one with the minimum number of skips is chosen.
REFERENCES


REFERENCES (Concluded)


BIBLIOGRAPHY


FAST CARRY ACCUMULATOR DESIGN

By William C. Mastin

The information in this report has been reviewed for security classification. Review of any information concerning Department of Defense or Atomic Energy Commission programs has been made by the MSFC Security Classification Officer. This report, in its entirety, has been determined to be unclassified.

This document has also been reviewed and approved for technical accuracy.

Bobby F. Walls
Chief, Optical Sensors Section

C. H. Mandel
Chief, Guidance and Control Division

P. H. Broussard, Jr.
Chief, Sensors Branch

F. B. Moore
Director, Astronics Laboratory
DISTRIBUTION

INTERNAL

DIR
DEP-T
AD-S
PM-PR-M

S&E-DIR

S&E-CSE-DIR
  Dr. Haeussermann

S&E-ASTR-DIR
  Mr. Moore

S&E-ASTR-A
  Mr. Hosenthien
  Miss Flowers

S&E-ASTR-G
  Mr. Mandel
  Dr. Doane
  Mr. Wood
  Mr. Broussard
  Mr. Walls
  Mr. Jones
  Mr. Doran
  Mr. Mastin (3)
  Mr. Carter
  Mr. Turner
  Mr. Richards

S&E-ASTR-S
  Mr. Wojtalik

S&E-ASTR-C
  Mr. Swearingen
  Mr. White

S&E-ASTR-ZX

A&TS-MS-IL (8)
A&TS-MS-IP (2)
A&TS-MS-H
A&TS-PAT
  Mr. Wofford
A&TS-TU (15)
  Mr. Winslow

EXTERNAL

Dr. C. C. Carrol (2)
Dept. of Electrical Engineering
Auburn University
Auburn, Alabama

Scientific and Technical Information Facility (2)
P. O. Box 33
College Park, Maryland 20740
Attn: NASA Representative (S-AK/RKT)

NASA—MSFC