Instruction Set Commutivity

P. Windley
Laboratory for Applied Logic
Department of Computer Science
University of Idaho, Moscow, Idaho 83843

Abstract- We present a state property called congruence and show how it can be used to demonstrate commutivity of instructions in a modern load–store architecture. Our analysis is particularly important in pipelined microprocessors where instructions are frequently reordered to avoid costly delays in execution caused by hazards. Our work has significant implications to safety and security critical applications since reordering can easily change the meaning and an instruction sequence and current techniques are largely ad hoc. Our work is done in a mechanical theorem prover and results in a set of trustworthy rules for instruction reordering. The mechanization makes it practical to analyze the entire instruction set.

1 Introduction.

Instruction pipelining is critical to good performance in modern microprocessors. Almost every microprocessor developed in the last several years contains an instruction pipeline. Significant attention has been given to the development of scheduling algorithms to reduce the occurrence of pipeline hazards because of the performance degradation that they cause. Typically, scheduling involves reordering the instruction stream produced by a compiler.

The approaches to code reordering are largely ad hoc with little or no analysis showing whether the rules are correct and under what conditions they should be avoided. Indeed, when asked about the rules that he gave for avoiding semantic changes one researcher, who developed the code scheduling algorithms for the C compiler in a widely available commercial UNIX system, stated “Its all ad hoc. It never occurred to me that there might be any other way to do it.”

Clearly, the current approach to code scheduling is unacceptable in safety and security critical applications. On one hand, modern pipelined architectures perform poorly without the aid of a compiler that is smart enough to reschedule code to avoid pipeline hazards. On the other hand, the reordering that the scheduler performs has the potential to initiate semantic changes in the code stream. We must either give up performance or live with untrustworthy code. Neither of these approaches is satisfactory.

This paper describes the analysis of a microprocessor instruction set for commutivity. We are interested in establishing, by analysis, under what circumstances the instructions can be reordered while avoiding semantic changes. The next section describes related work and the following sections present our analysis.

1This work was sponsored by the Department of Defense under University Research Program contract No. MDA904–91–C–7054

2See [HP90] for an excellent introduction to pipelining and pipeline hazards.
2 Related Work

The formal analysis of code reordering is related to at least three active areas of research: microprocessor verification, compiler verification, and the automatic generation of optimizers. This section discusses these three areas of research and relates them to the work presented in this paper.

Microprocessor Verification There have been numerous efforts to verify microprocessors. These efforts have been mostly for research purposes and none have included any kind of analysis of their instruction sets regarding code reordering. Only one formally verified general purpose microprocessor has been fabricated and it has so few features as to be impractical for real use. Descriptions of these efforts can be found in [Coh88, Joy89a, Joy88, Hun89]. It is important to note that none of these projects involved verification of a pipelined processor.

In [SB90], Srivas et al. describe the formal verification of a pipelined microprocessor called Mini Cayuga, comparable in complexity of design to that of Hunt’s FM8501. However, the structure and behavior of the pipeline were hidden from the abstract specification. Only prefetching of the next instruction was incorporated into the specification. This precluded the possibility of formally reasoning about pipeline hazards and instruction scheduling.

We are designing, specifying, and verifying a microprocessor called AVM-2. AVM-2 has a load store architecture and will be pipelined. The architecture of AVM-2 is largely the same as that of AVM-1, but the design is substantially different. We described early results of this research in [Win91] where we demonstrated the integrity of the supervisory mode of AVM-1. In this paper, we describe results regarding instruction set commutivity for AVM-2.

Compiler Verification There has been much work on verified compilers. Space considerations do not present a full treatment here. Joyce [Joy89b] gives an excellent review. Most of the early work [Rus77, Coh80, CM86] on the compiler correctness problem used idealizations of the hardware. Recent work by Joyce [Joy89b], Moore [Moo88], and Young [You89] have looked at compiler verification under the constraints of a real instruction set. None of these efforts has addressed code reordering although Young states that initial work on an optimizer has begun. Even so, our approach is different from Young’s due to the nature of the specification. The specifications in Young’s work are operational while ours are denotational.

Optimizer Generation Current approaches to instruction scheduling is largely ad hoc. Current compiler technology utilizes rule-based, heuristic algorithms for optimizing code sequences. Representative of the state of the art, the IBM RISC System/6000 XL compiler family uses special flags associated with opcodes to indicate instructions which are “dangerous” to move [War90, War92]. To date, there has not been any published results describing the application of formal methods to pipeline scheduling.

There has been some work on the automatic generation of optimizers from specifications of one sort or another.

In [Kes84], Kessler describes a tool called Peep. Peep is an architectural description driven peephole optimizer. The description of the architecture, given in LISP, is used to generate a table of optimizable instructions that can be used in a an optimizing compiler.
In [DF84], Davidson and Fraser present a system that generates peephole optimizations called PO. PO uses productions which describe the effect of assembly language instructions in a simulator to determine substitutions for 2 and 3 assembly instruction sequences.

While this work is interesting and related to the work presented here, these efforts differ in several important ways:

- The descriptions used for generating optimizations are not related to the implementation in anyway. Our work uses a specification that is related through proof to the implementational specification.

- It is not clear whether or not any kind of theory underlies the generation of the optimizers. As we will show later, there are concepts regarding reordering that can be generalized and used as a basis for reasoning about reordering.

- It is not clear how much faith can be placed on the simulation used to determine equivalent sequences. Our work will be done in a widely accepted theorem proving environment.

3 \textit{AVM-2}

We have designed a computer designated \textit{AVM-2} (\textit{A Verified Microprocessor}). \textit{AVM-2} is a second generation design that will be implemented in CMOS. The design, specification, and verification of \textit{AVM-1}, the predecessor to \textit{AVM-2}, are given in [Win90] where it is used as an example to demonstrate the utility of generic models in hardware verification. \textit{AVM-2}, like \textit{AVM-1}, will feature a RISC-like instruction set and a large register file. Unlike \textit{AVM-1}, \textit{AVM-2} will have a pipelined implementation.

The Registers. \textit{AVM-2} has a load–store architecture based on a large register file. The register file is divided into seven supervisor-mode registers and twenty-four general purpose registers.

Two additional registers are visible at the architectural level: the program counter and the program status word. The program counter (denoted \texttt{pc}) is used to sequence the computer—it indicates which instruction in memory to execute next. The program status word (denoted \texttt{psw}) is used to keep track of the status of the last ALU operation, whether or not interrupts are enabled, and the privilege level of the CPU.

The Instruction Set. \textit{AVM-2} has 30 programming level instructions. There is a group of eight, 3-argument (source A, source B, and destination) arithmetic and logical instructions and another group of 8 arithmetic and logical instructions that use two arguments and a 16-bit immediate value. There are 4 instructions for loading and storing registers. Only the load and store instructions communicate with memory. In addition, there are instructions for performing user interrupts, jumps, subroutine calls, and shifts.
4 Instruction Set Specification

The instruction set for AVM-2 has been formally specified as part of the design process. The specification represents a denotational description of the machine language. In this section, we present several of the state transition functions representing instructions. These examples will be used in later sections describing the analysis.

The instructions are modeled by state transition functions. In general, each function operates on a state tuple and an environment tuple. The state tuple, contains variables representing the register file, reg, the program status word, psw, the program counter, pc, and the memory, mem. The environment tuple contains variables representing the interrupt vector, ivec, the interrupt line, int, and the reset line, reset. Each function returns a state tuple updated to reflect the behavior of the instruction being modeled.

The NOOP instruction updates the state tuple by incrementing the program counter. No other actions are performed.

\[ \begin{align*}
\text{NOOP} & \quad \text{(reg, psw, pc, mem)} \\
& \quad \text{(ivec, int, reset)} = \\
& \quad \text{let new\_pc = inc pc in} \\
& \quad \text{(reg, psw, new\_pc, mem)}
\end{align*} \]

Note that NOOP is not an identity function, although it is often thought of that way. The fact that NOOP does affect the state and resides in memory affects its commutivity.

Other instructions are quite a bit more complicated than the NOOP instruction. For example, the ADD instruction is shown below:

\[ \begin{align*}
\text{ADD} & \quad \text{(reg, psw, pc, mem)} \\
& \quad \text{(ivec, int, reset)} = \\
& \quad \text{let } a = \text{EL (GetSrcA pc mem)} \text{ reg and} \\
& \quad \text{b = EL (GetSrcB pc mem) reg and} \\
& \quad \text{d = GetDest pc mem in} \\
& \quad \text{let result = add (a, b) in} \\
& \quad \text{let cflag = addp (a, b, result) and} \\
& \quad \text{vflag = aovfl (a, b, result) and} \\
& \quad \text{nflag = negp result and} \\
& \quad \text{zflag = zerop result and} \\
& \quad \text{sm = get\_sm psw and} \\
& \quad \text{ie = get\_ie psw in} \\
& \quad \text{let new\_reg = UPDATE\_REG psw d reg result and} \\
& \quad \text{new\_psw = mk\_psw(sm, ie, vflag,} \\
& \quad \text{ nflag, cflag, zflag) and} \\
& \quad \text{new\_pc = inc pc in} \\
& \quad \text{(new\_reg, new\_psw, new\_pc, mem)}
\end{align*} \]

The ADD instruction updates every member of the state tuple except the memory. The primary action, summing two registers and updating the register file accordingly is reflected by the updated register file, reg. The instruction also calculated new values for the overflow, carry, negative, and zero flags of the program status word, psw. The supervisory mode bit
and the interrupt enable bit remain unchanged, as expected. The program counter, pc, is incremented.

5 State Congruence

Our notion of state correspondence is motivated by the denotational description of the instruction set. Because the specification is denotational, we are interested in showing that state transitions made by one sequence of instructions are equivalent to the state transitions made by another sequence of instructions.

Unfortunately, the instructions sequences themselves are part of the state and so we cannot use equivalence as the relation between states. Instead, we relate the states using a property we call congruence. We call this relationship congruence because the states are equivalent except for (i.e. modulo) the ordering of the instruction sequences in memory.

Figure 1 illustrates congruence. As the figure shows, we start with two states, S1 and S2, which are related by the relation ρ. After they have been transformed by a sequence of instructions, we are left with two modified states S1' and S2'. Ideally, these new states are still related by ρ, but as we will see in Section 5.2, this is not always the case and so we show them related by ρ'.

The primary congruence relation that we use in the examples is given by the following predicate:
The predicate operates over a location and two state tuples. We say that the state tuples are congruent if their register files, \( \text{reg1} \) and \( \text{reg2} \), program status words, \( \text{psw1} \) and \( \text{psw2} \), and program counters, \( \text{pc1} \) and \( \text{pc2} \), are the same. Additionally, we require that the memories, \( \text{mem1} \) and \( \text{mem2} \) be the same except that the words located at \( \text{loc} \) and \( \text{loc}+1 \) in \( \text{mem1} \) are swapped in \( \text{mem2} \).

### 5.1 Swapping NOOP

To start with, we examine the commutivity of the NOOP instruction. While this may seem like a trivial problem, the problem is not as straightforward as it seems since NOOP does affect the state. Also, the difficulties encountered in NOOP commutivity are typical of other commutivity proofs.

The following theorem shows that NOOP can be commuted with any instruction that does not modify memory or alter program flow:

\[
\text{let } s_1 = (\text{reg1},\text{psw1},\text{pc1},\text{mem1}) \text{ and } \\
\text{s2} = (\text{reg2},\text{psw2},\text{pc2},\text{mem2}) \text{ and } \\
\text{e} = (\text{ivec},\text{ireq},\text{reset}) \text{ in } \\
\forall \text{ inst.} \\
(\text{inst} = \text{macro_inst (Opcode } s_1 \text{ e})) \Rightarrow \\
\text{Congruent pc2 } s_1 \text{ s2 } \\
\text{NON_MEM_INST (Opcode } s_1 \text{ e}) \Rightarrow \\
\text{let } s_1' = (\text{NOOP (inst } s_1 \text{ e) e) and } \\
\text{s2'} = (\text{inst (NOOP s2 e) e) in } \\
\text{Congruent pc2 } s_1' \text{ s2'}
\]

We assume that the environment does not change during execution of the two instructions (\( e \), representing the environment, is used as an argument for both of them). The theorem states that for every instruction in the instruction set, if the initial states are congruent the modified states are also congruent.

The theorem is not true for instructions that alter flow control because the NOOP would never execute in one case and some other instruction would execute in its place. For instructions that modify memory, we can prove a more restricted version that assumes that the memory instruction does not interfere with the program (instructions and data are stored in the same memory). We will see an example using a non-interference condition in the next section.
5.2 Swapping Arithmetic Instructions

The arithmetic instructions of AVM-2 that do not use the carry flag commute under weak congruence. Weak congruence is the same as strong congruence (defined above), except that it does not require equivalence for the entire program status word. Rather we require only that the supervisory mode bit and the interrupt enable bit be the same. Most instructions that modify the overflow, carry, negative, and zero flags of the program status word do so without regard to their previous values, so the value of the two program status words cannot be equal.

\[
\text{Weak Congruent loc} \ (\text{reg1,psw1,pc1,mem1}) = \]
\[
\begin{align*}
& (\text{reg1} = \text{reg2}) \land \\
& (\text{pc1} = \text{pc2}) \land \\
& ((\text{get_sm psw1}) = (\text{get_sm psw2})) \land \\
& ((\text{get_ie psw1}) = (\text{get_ie psw2})) \land \\
& (\forall a \ . \ (a = \text{address loc}) \land \\
& \quad \neg (a = \text{address (inc loc)}) \Rightarrow \\
& \quad (\text{fetch (mem1,a)} = \text{fetch (mem2,a)})) \land \\
& (\text{fetch (mem1,address loc)} = \\
& \quad \text{fetch (mem2,address (inc loc))}) \land \\
& (\text{fetch (mem2,address loc)} = \\
& \quad \text{fetch (mem1,address (inc loc))})
\end{align*}
\]

In order to commute two arithmetic instructions, we require that they be non-interfering. That is, the destination registers cannot be the same as the source registers. For example, the following instructions do not commute:

\[
\begin{align*}
\text{a} & := \text{b} + \text{c} \\
\text{d} & := \text{a} + \text{e}
\end{align*}
\]

Since the second instruction uses the value computed in the first, we cannot swap them without changing the resulting state. In addition, we require that the destination registers be different. The following predicate defines the non-interference property in terms of the functions used by the instruction set definition to retrieve the source and destination register indices from memory.

\[
\text{Non Interfering pc mem} =
\begin{align*}
& \neg (\text{GetSrcA (inc pc)} \ \text{mem} = (\text{GetDest pc mem}) \land \\
& \neg (\text{GetSrcB (inc pc)} \ \text{mem} = (\text{GetDest pc mem}) \land \\
& \neg (\text{GetDest (inc pc)} \ \text{mem} = (\text{GetDest pc mem}))
\end{align*}
\]

Using the weak congruence predicate and the non-interference predicate, we can show, for example, that ADD and SUB commute:
\[ \begin{align*}
& \text{let } s_1 = (\text{reg}_1, \text{psw}_1, \text{pc}_1, \text{mem}_1) \text{ and} \\
& s_2 = (\text{reg}_2, \text{psw}_2, \text{pc}_2, \text{mem}_2) \text{ and} \\
& e = (\text{ivec}, \text{ireq}, \text{reset}) \text{ in} \\
& \text{Congruent } \text{pc}_2 \ s_1 \ s_2 \Rightarrow \\
& \text{Non-Interfering } \text{pc}_1 \ \text{mem}_1 \land \\
& \text{Non-Interfering } \text{pc}_2 \ \text{mem}_2 \Rightarrow \\
& \text{let } s_1' = (\text{ADD} (\text{SUB} s_1 e) e) \text{ and} \\
& s_2' = (\text{SUB} (\text{ADD} s_2 e) e) \text{ in} \\
& \text{Weak_Congruent } \text{pc}_2 \ s_1' \ s_2'
\end{align*} \]

Note that we use strong congruence in the assumptions, but can only show weak congruence between the resulting states.

We have presented only two theorems regarding instruction commutivity in AVM-2. We can prove more general theorems about the commutivity of arithmetic instructions. We can also show arithmetic instructions commute with load and store instructions provided they are non-interfering.

6 Discussion

We have presented only a few small theorems regarding the analysis of the AVM-2 instruction set. A more thorough analysis is presently underway. Even so, we believe the results to be interesting.

The fact that we can only show weak congruence when commuting arithmetic instructions is a function of the design of the instruction set. Other instruction sets would provide different results. The contribution of formal analysis is that this property is clearly and unambiguously stated in the resulting theorem.

Also, the weak congruence result affects when we can actually commute arithmetic instructions in a program. We cannot, for example, commute two arithmetic instructions that are followed by a conditional jump since the values of the flags are changed. Strong congruence is required to maintain the program meaning in this case.

We should note that our initial efforts in this area have had an affect on the architecture of AVM-2. In light of the results presented here regarding weak congruence, we have undertaken a modification of the instruction set semantics so that arithmetic instructions commute under strong congruence. This will allow greater freedom in code reordering to avoid pipeline hazards.

Certainly none of our specific discoveries regarding the AVM-2 instruction set will surprise veteran compiler writers. The rules that we have demonstrated for code reordering in the AVM-2 instruction set are well known. What is important, however, is that we are not veteran compiler writers. Analysis allowed us to show that they were correct rather than relying on years of experience and intuition. This, it seems, is the heart and soul of engineering [Sha90].
7 Future Work

We plan to extend our analysis of commutivity to explore code motion for larger code fragments. Our intent is to automate a complete analysis of instruction commutivity for all instruction pairs and use these results to determine when large instruction sequences are congruent.

As mentioned earlier, the behavioral specification of AVM-2 will be verified against its implementation. Because of the hierarchical nature of the specification (see [Win90]), the phase-level mode of the pipeline will have a similar structure to the behavioral model of the top-level. We believe that the techniques demonstrated in this paper will allow us to perform an analysis of the pipeline to identify hazards. This work is underway.

8 Conclusion

This paper has presented examples from an analysis of commutivity in a modern instruction set. Commutivity is start at a more general notion of instruction reordering that is important to both compiler optimizations and pipeline scheduling. Analysis of instruction set reordering is important in both safety and security critical applications because of the danger ad hoc approaches present to the semantic integrity of the instruction stream.

Our analysis could have been performed without the benefit of a formal specification of the instruction set or a formal statement of the desired properties. However, the formal analysis has several benefits:

- The theorems about commutivity form a set of rules for commuting instructions. These rules have a demonstrated correctness.
- The formal analysis was helpful in finding interferences between instructions that were not immediately obvious.
- The assumptions and results are unambiguously stated and can be used for further reasoning about optimization and scheduling.
- In providing a set of rules about commutivity, the instructions must be analyzed on a case-by-case basis whether the analysis is done by hand or automated. The formal analysis provides a tool for quickly analyzing the instructions.

References


6.5.10


