Locally optimal optimisation of code fragments

From ScienceZero
Revision as of 17:38, 6 September 2013 by Bjoern (Talk | contribs) (Discovering the optimal code fragments)

Jump to: navigation, search

The problem

If we assume a code fragment of 4 instructions of 32 bit each and the two input registers to be 32 bit then we have: 2324 * 232 * 232 possible combinations of code and data. This number is so large that there is no way of testing all possible combinations to find the shortest code fragment with a given function. The search space is practically infinite.

Some solutions

Boolean logic is different from arithmetic in that every bit is computed completely independently from the others. So it makes no difference if we use the full 32 bit or just one bit, because changing the value of one bit will never affect the others. That means we now have 2324 * 21 * 21 possible combinations of code and data, a huge improvement but still impossibly large. So it is clear that using brute force alone will not work.

Since we now have reduced the input to two bits then there are just 22 = 4 possible data inputs.

A B
0 0
0 1
1 0
1 1

the result of each input is either 0 or 1 there are as many output bits as there are input combinations, we can write down the result of the function in a separate column and we have what is called a truth table. For the Eor function (T = A EOR B) it would be:

A B  T
0 0  0
0 1  1
1 0  1
1 1  0

The truth column T contains only 4 bits so we have exactly 24 = 16 possible boolean functions with two inputs. So out of all those trillions of code fragments there are only 16 unique functions with two input variables and from that it follows that it must exist 16 unique code fragments that that have no shorter equivalents.

If we can find those 16 optimal code fragments and find a mapping between all possible code fragments and those optimal code fragments we can use a small look up table to optimise all code fragments.

The truth table is that mapping, it tells exactly what the code does without any influence of how it is done. So to optimise a function we can generate the truth table for it and use that as the index into a table that holds the shortest code fragment for that truth table.

Generating the truth table

As an example we use a function with 4 inputs. To generate the truth table we simply feed the function with all possible inputs and record the output. With 4 inputs we have 24 = 16 different combinations so we run the function 16 times to get the complete truth table.

Truth table for the function T = not ((((not D) or A) or C) and B)
A B C D  T
0 0 0 0  1
0 0 0 1  1
0 0 1 0  1
0 0 1 1  1
0 1 0 0  0
0 1 0 1  1
0 1 1 0  0
0 1 1 1  0
1 0 0 0  1
1 0 0 1  1
1 0 1 0  1
1 0 1 1  1
1 1 0 0  0
1 1 0 1  0
1 1 1 0  0
1 1 1 1  0

Generate the truth table in parallel

If we turn the truth table by 90 degrees we can do all 16 iterations in one operation by loading the whole 16 bit sequence into each variable. Making the process more than 16 times as fast.

 A 0000000011111111
 B 0000111100001111
 C 0011001100110011
 D 0101010101010101
 T 1111010011110000

Discovering the optimal code fragments

Since the search space is too large to even imagine we have to be efficient and search where it is most likely we will find the code fragments. The first thign we do is to limit the search to operations that make sense, that is constants, variables and bitwise logical operations.

0, 1, A, B, C, D, and, or, eor, not

In some cases the number of operations is 16 or less, then each instruction will fit in 4 bit and a 64 bit integer can hold 16 instructions, making the search very fast.

To find the shortest code we start the search with the shortest code fragments and make them longer and longer, that way the first one we find for a given truth table is the shortest or that there is no shorter ones.

For a stack based language or virtual machine all stack operations can be considered bitwise logical operations. For example the SWAP operation that swaps two cells on the stack is logically equivalent to:

a = a EOR b
b = a EOR b
a = a EOR b

So any operation that moves variables or affects all bits identically and separately can be optimised using the same algorithm. The only difference is that the concept of truth table must be expanded to fit the result of the operations. In the following example a stack with one or more truth tables is used instead of a single truth table.

// This pseudocode shows how to optimise bitwise operations from a stack based Forth like language
for size = 0 to 6 do                                 // Try code fragments in increasing lengths
    for prg = 0 to (1 <<< (size * 4)) - 1 do
        dataStack.Clear                              // Create an empty stack
        loadBitPatterns                              // Add the bit patterns for calculating the truth table in parallel
        for j = 0 to size - 1 do                     // Execute the current code fragment
            evalPrg ((prg >>> (j * 4)) &&& 0x0F)

        if optimalFunc.TryGetValue (dataStack) then  // Search the dictionary for any previous stacks with the same content
           optimalFunc.Add (dataStack) prg           // If none are found then prg is the shortest possible code fragment
                                                     // so add the stack content to the dictionary with the code that generated it