Difference between revisions of "Forth parser"
From ScienceZero
(→How to encode numbers) |
(→Implementation as a state machine) |
||
(44 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
+ | == Implementation as a state machine == | ||
+ | state = scan | ||
+ | do | ||
+ | word = "" | ||
+ | skip until not white space or eof | ||
+ | if word starts with " | ||
+ | then read word up to " or eof | ||
+ | if word does not end with " then exit loop | ||
+ | else read word up to white space or eof | ||
+ | if word ="" then exit loop | ||
+ | |||
+ | match state with | ||
+ | | scan -> match word with | ||
+ | | ":" -> state = func | ||
+ | | ";" -> failwith "Found end of function before definition" | ||
+ | | "(" -> push state | ||
+ | state = rem | ||
+ | | x -> failwith "Found illegal word %A outside definition" x ( this assumes that words can't exist outside functions ) | ||
+ | | rem -> match word with | ||
+ | | ")" -> pop state | ||
+ | | func -> match word with | ||
+ | | "(" -> push state | ||
+ | state = rem | ||
+ | | name -> print "Name of function is %A" name | ||
+ | state = def | ||
+ | | def -> match word with | ||
+ | | "(" -> push state | ||
+ | state = rem | ||
+ | | ";" -> print "End of function" | ||
+ | state = scan | ||
+ | | _ -> print "Assembling word %A" word | ||
+ | |||
+ | // Check if compilation was successful | ||
+ | match state with | ||
+ | | scan when word = "" -> print "Success" | ||
+ | | _ -> print "Error, exit in state %A" state | ||
+ | |||
+ | == Comment == | ||
+ | Anything between '''(''' and ''')''' that is not a part of a string or an immediate character should be ignored. | ||
+ | So after detecting '''(''' ignore all '''"''' until ''')''' | ||
+ | After an opening '''"''' ignore all '''(''' and ''')''' until closing '''"''' | ||
+ | |||
+ | == String == | ||
+ | When you reach '''"''', start copying characters into a string dictionary until the matching '''"''' | ||
+ | Emit '''LDC ... LEA''' instructions to point to where the string is located in the final binary | ||
+ | The length of the string is stored in front of it. | ||
+ | |||
== Variable inference == | == Variable inference == | ||
Variables are stored locally on the return stack. | Variables are stored locally on the return stack. | ||
Line 11: | Line 58: | ||
Emit '''LDL #<index in the ordered list>''' | Emit '''LDL #<index in the ordered list>''' | ||
− | + | If there are more than 0 variables emit '''DIM #<largest index>''' as the first instruction of the function. | |
− | + | This can be skipped if the function calls no other functions and do not use the stack for any or purpose. | |
+ | |||
+ | TODO: Check if every variable is read and eliminate the ones that are not in use and emit a warning. | ||
+ | This can be done with a simple bitmask to make sure a variable is written and then read. | ||
− | == How to encode | + | == How to encode constants == |
if number = 0 then | if number = 0 then | ||
emit '''LDC #0''' | emit '''LDC #0''' | ||
Line 24: | Line 74: | ||
emit '''LDE #n''' | emit '''LDE #n''' | ||
− | # | + | #0x1230 can be encoded as: |
− | '''LDC # | + | '''LDC #0x1''' |
− | '''LDE # | + | '''LDE #0x2''' |
− | '''LDE # | + | '''LDE #0x3''' |
− | '''LDE # | + | '''LDE #0x0''' |
− | More efficient code can sometimes be achieved by using | + | More efficient code can sometimes be achieved by using '''LDN LSL INC DEC DUP OVER''' and other opcodes. |
== Example 1 == | == Example 1 == | ||
Line 50: | Line 100: | ||
The parser should create an "object" in memory like: | The parser should create an "object" in memory like: | ||
− | "lsl",0xF0,0xE0,0xC0,0xF1 | + | "lsl",0xF0,0xE0,0xC0,0xF1 ( the '''return''' opcode is implied ) |
+ | |||
+ | == Example 2 == | ||
+ | |||
+ | : xorpattern | ||
+ | 0 >yval | ||
+ | 1080 for | ||
+ | 0 >xval | ||
+ | 1920 for | ||
+ | xval dup 1 + >xval yval | ||
+ | eor | ||
+ | xval yval plotpixel | ||
+ | next | ||
+ | yval 1 + >yval | ||
+ | next ; | ||
+ | |||
+ | Unique variable names being written to | ||
+ | 0 "yval" | ||
+ | 1 "xval" | ||
+ | |||
+ | |||
+ | '''DIM #1''' ( Largest index of a variable is 1 ) | ||
+ | '''LDC #0''' ( 0 ) | ||
+ | '''STL #0''' ( >yval ) | ||
+ | '''LDC #0x4''' ( 1080 = 0x483 ) | ||
+ | '''LDE #0x8''' | ||
+ | '''LDE #0x3''' | ||
+ | '''FOR''' ( for ) | ||
+ | '''LDC #0''' ( 0 ) | ||
+ | '''STL #1''' ( >xval ) | ||
+ | '''LDC #0x7''' ( 1920 = 0x780 ) | ||
+ | '''LDE #0x8''' | ||
+ | '''LDE #0x0''' | ||
+ | '''FOR''' ( for ) | ||
+ | '''LDC #1''' ( xval ) | ||
+ | '''LDC #0''' ( yval ) | ||
+ | '''LDC #<off>''' ( Load offset to the function plotpixel ) | ||
+ | '''...''' | ||
+ | '''CALL''' | ||
+ | '''NEXT''' | ||
+ | '''LDC #0''' ( yval ) | ||
+ | '''INC''' | ||
+ | '''STC #0''' ( yval ) | ||
+ | '''NEXT''' | ||
+ | '''RETURN''' | ||
== Default opcode equivalency for forth words == | == Default opcode equivalency for forth words == | ||
Line 59: | Line 153: | ||
* - mul | * - mul | ||
/ - div | / - div | ||
− | |||
− | |||
− | |||
== Questions == | == Questions == |
Latest revision as of 06:19, 27 April 2018
Contents
Implementation as a state machine
state = scan do word = "" skip until not white space or eof if word starts with " then read word up to " or eof if word does not end with " then exit loop else read word up to white space or eof if word ="" then exit loop match state with | scan -> match word with | ":" -> state = func | ";" -> failwith "Found end of function before definition" | "(" -> push state state = rem | x -> failwith "Found illegal word %A outside definition" x ( this assumes that words can't exist outside functions ) | rem -> match word with | ")" -> pop state | func -> match word with | "(" -> push state state = rem | name -> print "Name of function is %A" name state = def | def -> match word with | "(" -> push state state = rem | ";" -> print "End of function" state = scan | _ -> print "Assembling word %A" word // Check if compilation was successful match state with | scan when word = "" -> print "Success" | _ -> print "Error, exit in state %A" state
Comment
Anything between ( and ) that is not a part of a string or an immediate character should be ignored. So after detecting ( ignore all " until ) After an opening " ignore all ( and ) until closing "
String
When you reach ", start copying characters into a string dictionary until the matching " Emit LDC ... LEA instructions to point to where the string is located in the final binary The length of the string is stored in front of it.
Variable inference
Variables are stored locally on the return stack. Any word that starts with ">" is a write to a variable. For each word in function If word starts with ">" Add the word to an ordered list of unique words that starts with ">", remove the leading ">" If there are more than 16 variables emit error Emit STL #<index in the ordered list> If word is in the list but does not start with ">" Emit LDL #<index in the ordered list> If there are more than 0 variables emit DIM #<largest index> as the first instruction of the function. This can be skipped if the function calls no other functions and do not use the stack for any or purpose. TODO: Check if every variable is read and eliminate the ones that are not in use and emit a warning. This can be done with a simple bitmask to make sure a variable is written and then read.
How to encode constants
if number = 0 then emit LDC #0 else find the leftmost hex digit n that is not 0 emit LDC #n zero out that digit for each remaining digit n, zero or not emit LDE #n #0x1230 can be encoded as: LDC #0x1 LDE #0x2 LDE #0x3 LDE #0x0 More efficient code can sometimes be achieved by using LDN LSL INC DEC DUP OVER and other opcodes.
Example 1
: lsl for dup + next ; ( Simple complete function that implements left shift by repeated addition ) ":" this tells the parser to define a new function "lsl" is the name of the function Adr Value 0 0xF0 ( for ) 1 0xE0 ( dup ) 2 0xC0 ( add ) 3 0xF1 ( next ) ";" tells the parser that it has reached the end of the definition The parser should create an "object" in memory like: "lsl",0xF0,0xE0,0xC0,0xF1 ( the return opcode is implied )
Example 2
: xorpattern 0 >yval 1080 for 0 >xval 1920 for xval dup 1 + >xval yval eor xval yval plotpixel next yval 1 + >yval next ; Unique variable names being written to 0 "yval" 1 "xval" DIM #1 ( Largest index of a variable is 1 ) LDC #0 ( 0 ) STL #0 ( >yval ) LDC #0x4 ( 1080 = 0x483 ) LDE #0x8 LDE #0x3 FOR ( for ) LDC #0 ( 0 ) STL #1 ( >xval ) LDC #0x7 ( 1920 = 0x780 ) LDE #0x8 LDE #0x0 FOR ( for ) LDC #1 ( xval ) LDC #0 ( yval ) LDC #<off> ( Load offset to the function plotpixel ) ... CALL NEXT LDC #0 ( yval ) INC STC #0 ( yval ) NEXT RETURN
Default opcode equivalency for forth words
Forth string - opcode + - add - - sub * - mul / - div