Locally optimal optimisation of code fragments

From ScienceZero

The problem

If we assume a code fragment (function) of 4 instructions with a 32 bit instruction word and two input registers with 32 bit words, then we have (2^32)^4 * 2^32 * 2^32 = 2^192 possible combinations of code and data. This number is so large that there is no way of testing all possible combinations to find the shortest code fragment with a given function. The full search space is practically infinite.

Example:

ADD  R0,R1,R2
EOR  R3,R1,R0
SUB  R4,R3,R2
OR   R4,R4,R3
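For scale, the count from the paragraph above can be computed directly (a quick Python check using the same numbers):

```python
# Search-space size for a 4-instruction fragment with a 32-bit
# instruction word and two 32-bit input registers.
instructions = (2**32) ** 4   # every possible 4-instruction sequence
inputs = 2**32 * 2**32        # every possible pair of input values
total = instructions * inputs
print(total == 2**192)        # True: 2^128 programs times 2^64 inputs
```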

The density of functions that do something "useful" is very low, and the density of functions that do something "useful" in an optimal way is extremely low. So if we can reduce the search space to include only the useful functions, it would help a lot.

A solution for boolean logic

Bitwise boolean logic is different from arithmetic in that every bit is computed completely independently of the others. A 32 bit AND operation is just 32 AND gates side by side. That means we only need to consider a single bit when reasoning about the code; once we have the conclusion we can just add 31 more copies of the same to get the full 32 bit operation. That leaves (2^32)^4 * 2^1 * 2^1 possible combinations of code and data, a huge improvement but still impossibly large. So it is clear that brute force alone will not work.

Since we have now reduced the input to two bits, there are just 2^2 = 4 possible data inputs.

A B
0 0
0 1
1 0
1 1

The result of each input combination is either 0 or 1, so there are as many output bits as there are input combinations. We can write down the result of the function in a separate column, and we have what is called a truth table.

The truth tables for some common functions:

AND      OR       EOR
A B  T   A B  T   A B  T
0 0  0   0 0  0   0 0  0   
0 1  0   0 1  1   0 1  1
1 0  0   1 0  1   1 0  1
1 1  1   1 1  1   1 1  0

The truth column T contains only 4 bits, so there are exactly 2^4 = 16 possible boolean functions with two inputs. Out of all those trillions of code fragments there are only 16 unique functions of two input variables, which means there must exist at least 16 unique code fragments that have no shorter equivalents.
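This can be checked directly. A small Python sketch prints the truth columns of AND, OR and EOR and counts the possible columns:

```python
from itertools import product

# Every two-input bitwise function is fully described by its truth
# column T: one output bit per input combination (A, B).
inputs = list(product([0, 1], repeat=2))   # (0,0), (0,1), (1,0), (1,1)

# The truth columns of the common functions from the tables above.
named = {
    "AND": lambda a, b: a & b,
    "OR":  lambda a, b: a | b,
    "EOR": lambda a, b: a ^ b,
}
for name, f in named.items():
    column = tuple(f(a, b) for a, b in inputs)
    print(name, column)

# A column is 4 bits, so there are exactly 2^4 = 16 possible functions.
all_columns = set(product([0, 1], repeat=4))
print(len(all_columns))   # 16
```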

If we can find those 16 optimal code fragments, and a mapping from every possible code fragment to them, we can use a small lookup table to optimise any code fragment.

The truth table is that mapping: it tells exactly what the code does, without any influence from how it is done. So to optimise a function we can generate its truth table and use it as the index into a table that holds the shortest code fragment for that truth table.

Generating the truth table

As an example we use a function with 4 inputs. To generate the truth table we simply feed the function all possible inputs and record the output. With 4 inputs there are 2^4 = 16 different combinations, so we run the function 16 times to get the complete truth table.

Truth table for the function T = not ((((not D) or A) or C) and B)
A B C D  T
0 0 0 0  1
0 0 0 1  1
0 0 1 0  1
0 0 1 1  1
0 1 0 0  0
0 1 0 1  1
0 1 1 0  0
0 1 1 1  0
1 0 0 0  1
1 0 0 1  1
1 0 1 0  1
1 0 1 1  1
1 1 0 0  0
1 1 0 1  0
1 1 1 0  0
1 1 1 1  0

Generate the truth table in parallel

If we turn the truth table by 90 degrees we can do all 16 iterations in parallel by loading a whole 16 bit column into each variable, making the process more than 16 times as fast.

 A 0000000011111111
 B 0000111100001111
 C 0011001100110011
 D 0101010101010101
 T 1111010011110000
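In Python the parallel evaluation looks like this: ordinary bitwise operators on the 16 bit columns stand in for the 16 separate runs, and a mask keeps NOT confined to 16 bits:

```python
# Evaluate all 16 rows of the truth table at once: each variable holds
# its whole 16-bit input column from the table above.
MASK = 0xFFFF            # keep "not" confined to 16 bits
A = 0b0000000011111111   # 0x00FF
B = 0b0000111100001111   # 0x0F0F
C = 0b0011001100110011   # 0x3333
D = 0b0101010101010101   # 0x5555

# T = not((((not D) or A) or C) and B), computed bitwise.
T = ~((((~D & MASK) | A) | C) & B) & MASK
print(format(T, "016b"))   # 1111010011110000
```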

Discovering the optimal bitwise logic code fragments

Since the full search space is too large to even imagine we have to be efficient and search where it is most likely we will find the code fragments. The first thing we do is to limit the search to operations that make sense, that is constants, variables and bitwise logical operations.

Example: LOAD 0, LOAD 1, LOAD A, LOAD B, LOAD C, LOAD D, AND, OR, EOR, NOT

In some cases the number of operations is 16 or less; then each instruction fits in 4 bits and a 64 bit integer can hold 16 instructions, making the search very fast. To get the next program to evaluate we just add 1 to the 64 bit integer.

To find the shortest code we test the shortest fragments first; that way the first one we find for a given truth table is at least as short as any other equivalent code fragment. It is also faster to test the short fragments, and they are the most commonly used. The longest fragments are rarely or never written by humans or generated by compilers.

For a stack based language or virtual machine all stack mover operations can be considered bitwise logical operations. For example the SWAP operation that swaps two cells on the stack is logically equivalent to:

a = a EOR b
b = a EOR b
a = a EOR b
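A quick check of the identity (plain Python; any two values work):

```python
# The three EORs above exchange a and b without a temporary,
# so SWAP can be treated as a bitwise-logic operation.
a, b = 0b1100, 0b1010
a = a ^ b
b = a ^ b
a = a ^ b
print(a == 0b1010 and b == 0b1100)   # True
```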

So any operation that moves variables or affects all bits identically and separately can be optimised using the same algorithm. The only difference is that the concept of truth table must be expanded to fit the type of the operations. In the following example a stack with one or more truth tables is used instead of a single truth table.

// This pseudocode shows how to create a dictionary of optimal bitwise code fragments
// The fragment size starts at 0 to catch no-operations and replace them with an empty fragment
for size = 0 to 6 do                                 // Try code fragments in increasing lengths
    for prg = 0 to (1 <<< (size * 4)) - 1 do
        dataStack.Clear                              // Start from an empty stack
        loadBitPatterns                              // Add the bit patterns for computing the truth table in parallel
        for j = 0 to size - 1 do                     // Execute the current code fragment
            evalPrg ((prg >>> (j * 4)) &&& 0x0F)     // One 4 bit instruction at a time

        if not (optimalFunc.ContainsKey dataStack) then  // Search the dictionary for a previous stack with the same content
            optimalFunc.Add dataStack prg                // If none is found then prg is the shortest possible code fragment,
                                                         // so add prg to the dictionary with the stack content as the key
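The pseudocode can be fleshed out into a runnable sketch. The following Python version makes some assumptions not fixed by the text: the ten operations listed earlier get opcodes 0-9, the unused opcodes 10-15 abort the program, fragments that underflow the stack are treated as invalid, and the search is limited to fragments of up to 4 instructions to keep it fast:

```python
MASK = 0xFFFF
# 16-bit input columns for the four variables A, B, C, D.
COLS = [0x00FF, 0x0F0F, 0x3333, 0x5555]

def run(prg, size):
    """Execute a packed program; return the final stack, or None if invalid."""
    stack = []
    for j in range(size):
        op = (prg >> (j * 4)) & 0xF          # one 4-bit instruction at a time
        if op == 0:                          # LOAD 0
            stack.append(0)
        elif op == 1:                        # LOAD 1 (all bits set)
            stack.append(MASK)
        elif op <= 5:                        # LOAD A..D
            stack.append(COLS[op - 2])
        elif op <= 8:                        # AND, OR, EOR
            if len(stack) < 2:
                return None
            b, a = stack.pop(), stack.pop()
            stack.append([a & b, a | b, a ^ b][op - 6])
        elif op == 9:                        # NOT
            if not stack:
                return None
            stack.append(~stack.pop() & MASK)
        else:                                # unused opcode
            return None
    return tuple(stack)

optimal = {}                                 # stack contents -> shortest fragment
for size in range(0, 5):                     # try fragments in increasing lengths
    for prg in range(1 << (size * 4)):
        key = run(prg, size)
        if key is not None and key not in optimal:
            optimal[key] = (size, prg)       # first hit is a shortest fragment

print(len(optimal))                          # number of distinct optimal fragments
```

For example, the entry for the stack (A AND B) ends up with size 3 (LOAD A, LOAD B, AND), since no one- or two-instruction fragment can produce that column.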

Why the boolean solution does not work for arithmetic functions

In bitwise logic, all bits in a register share the same history, so for two input registers and one output register of 32 bits there are only 2^(2^2) = 16 possible truth tables. In arithmetic operations each bit has a separate history, because information also flows sideways between bits, so the truth table grows to (2^32)^2 = 2^64 entries. Far too many to process.

A solution for arithmetic functions with constant input

All functions with constant input can be evaluated and replaced by the resulting constant.

Example:

(10 * 20) + 35 = 235
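Constant folding can be sketched as a recursive walk over an expression tree; the tuple node format used here is an illustrative choice, not a fixed representation:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Recursively evaluate ('op', left, right) trees whose leaves are constants."""
    if isinstance(node, int):
        return node
    op, left, right = node
    return OPS[op](fold(left), fold(right))

# (10 * 20) + 35 from the example above
expr = ("+", ("*", 10, 20), 35)
print(fold(expr))   # 235
```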

Possible solutions for general arithmetic functions

The method of using truth tables as a canonical representation of arithmetic functions failed because of the extreme size of the truth tables for even the simplest functions.

More advanced data structures

All truth tables with the same number of variables are the same size. We know that the information content of the truth tables we need is very low because they can be generated by just a few simple operations. So more advanced data structures may be able to compress the truth tables to a manageable size and still be capable of canonical representation. One such data structure is the binary decision diagram.

Solvers

Boolean satisfiability (SAT) solvers can determine whether two functions are equivalent. In theory this is an extremely difficult problem; in practice most simple functions can be solved in a realistic time frame.

Large database and hashing

All useful and optimal functions up to a specified complexity can be fingerprinted using a hash function. When a match is identified during optimisation, an automated solver can be used to verify that the functions are equivalent. The result can be cached to speed up future optimisations.
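The scheme can be sketched in a few lines. The fingerprint length, and the exhaustive truth-table comparison standing in for an automated solver (which only works for small inputs), are illustrative choices:

```python
import hashlib

def truth_table(f, n_inputs):
    """Truth table of an n-input bit function as a tuple of output bits."""
    return tuple(
        f(*((i >> k) & 1 for k in range(n_inputs)))
        for i in range(1 << n_inputs)
    )

def fingerprint(table):
    """Short hash fingerprint of a truth table."""
    return hashlib.sha256(bytes(table)).hexdigest()[:16]

f = lambda a, b: (a ^ b) & 1
g = lambda a, b: ((a | b) & ~(a & b)) & 1   # another way to write EOR

tf, tg = truth_table(f, 2), truth_table(g, 2)
if fingerprint(tf) == fingerprint(tg):      # cheap check first
    print(tf == tg)                         # verify: True, they are equivalent
```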