Locally optimal optimisation of code fragments

From ScienceZero
Revision as of 12:32, 2 October 2013 by Bjoern (Talk | contribs) (The problem)

Jump to: navigation, search

The problem

If we assume a code fragment of 4 instructions of 32 bit each and the two input registers to be 32 bit then we have: 2324 * 232 * 232 possible combinations of code and data. This number is so large that there is no way of testing all possible combinations to find the shortest code fragment with a given function. The search space is practically infinite.

Example:

ADD  R0,R1,R2
EOR  R3,R1,R0
SUB  R4,R3,R2
OR   R4,R4,R3

Some solutions

Bitwise boolean logic is different from arithmetic in that every bit is computed completely independently from the others. A 32 bit AND operation is just 32 AND gates side by side. That means we only need to consider 1 bit when reasoning about the code, when we have the conclusion we can just add 31 more copies of the same to get the full 32 bit operation. That means we now have 2324 * 21 * 21 possible combinations of code and data, a huge improvement but still impossibly large. So it is clear that using brute force alone will not work.

Since we now have reduced the input to two bits then there are just 22 = 4 possible data inputs.

A B
0 0
0 1
1 0
1 1

the result of each input is either 0 or 1 so there are as many output bits as there are input combinations, we can write down the result of the function in a separate column and we have what is called a truth table. The truth tables for some common functions:

AND      OR       EOR
A B  T   A B  T   A B  T
0 0  0   0 0  0   0 0  0   
0 1  0   0 1  1   0 1  1
1 0  0   1 0  1   1 0  1
1 1  1   1 1  1   1 1  0

The truth column T contains only 4 bits so we have exactly 24 = 16 possible boolean functions with two inputs. So out of all those trillions of code fragments there are only 16 unique functions with two input variables, that means that there must exist at least 16 unique code fragments that that have no shorter equivalents.

If we can find those 16 optimal code fragments and find a mapping between all possible code fragments and those optimal code fragments we can use a small look up table to optimise all code fragments.

The truth table is that mapping, it tells exactly what the code does without any influence of how it is done. So to optimise a function we can generate the truth table for it and use that as the index into a table that holds the shortest code fragment for that truth table.

Generating the truth table

As an example we use a function with 4 inputs. To generate the truth table we simply feed the function with all possible inputs and record the output. With 4 inputs we have 24 = 16 different combinations so we run the function 16 times to get the complete truth table.

Truth table for the function T = not ((((not D) or A) or C) and B)
A B C D  T
0 0 0 0  1
0 0 0 1  1
0 0 1 0  1
0 0 1 1  1
0 1 0 0  0
0 1 0 1  1
0 1 1 0  0
0 1 1 1  0
1 0 0 0  1
1 0 0 1  1
1 0 1 0  1
1 0 1 1  1
1 1 0 0  0
1 1 0 1  0
1 1 1 0  0
1 1 1 1  0

Generate the truth table in parallel

If we turn the truth table by 90 degrees we can do all 16 iterations in parallel by loading the whole 16 bit sequence into each variable. Making the process more than 16 times as fast.

 A 0000000011111111
 B 0000111100001111
 C 0011001100110011
 D 0101010101010101
 T 1111010011110000

Discovering the optimal code fragments

Since the search space is too large to even imagine we have to be efficient and search where it is most likely we will find the code fragments. The first thing we do is to limit the search to operations that make sense, that is constants, variables and bitwise logical operations.

Example: 0, 1, A, B, C, D, and, or, eor, not

In some cases the number of operations is 16 or less, then each instruction will fit in 4 bit and a 64 bit integer can hold 16 instructions, making the search very fast.

To find the shortest code we test the shortest fragments first, that way the first one we find for a given truth table is shorter or as short as any other equivalent code fragment. It is also faster to test the short fragments and they are the most commonly used. The longest fragments are rarely or never used by humans or generated by compilers.

For a stack based language or virtual machine all stack operations can be considered bitwise logical operations. For example the SWAP operation that swaps two cells on the stack is logically equivalent to:

a = a EOR b
b = a EOR b
a = a EOR b

So any operation that moves variables or affects all bits identically and separately can be optimised using the same algorithm. The only difference is that the concept of truth table must be expanded to fit the type of the operations. In the following example a stack with one or more truth tables is used instead of a single truth table.

// This pseudocode shows how to optimise bitwise operations from a stack based Forth like language
// The fragment size starts at 0 to catch no operations and replace them with an empty fragment
for size = 0 to 6 do                                 // Try code fragments in increasing lengths
    for prg = 0 to (1 <<< (size * 4)) - 1 do
        dataStack.Clear                              // Create an empty stack
        loadBitPatterns                              // Add the bit patterns for calculating the truth table in parallel
        for j = 0 to size - 1 do                     // Execute the current code fragment
            evalPrg ((prg >>> (j * 4)) &&& 0x0F)

        if optimalFunc.TryGetValue (dataStack) then  // Search the dictionary for any previous stacks with the same content
           optimalFunc.Add (dataStack) prg           // If none are found then prg is the shortest possible code fragment
                                                     // so add prg to the dictionary with stack content as the key