Difference between revisions of "Forth parser"

From ScienceZero
Jump to: navigation, search
(Implementation as a state machine)
(Implementation as a state machine)
 
(11 intermediate revisions by the same user not shown)
Line 2: Line 2:
 
  state = scan
 
  state = scan
 
  do
 
  do
   skip whitespace
+
  word = ""
   read word up to white space
+
   skip until not white space or eof
   ( remember to check for end of file before reading each character and exit loop if eof )
+
   if word starts with "
 +
    then read word up to " or eof
 +
          if word does not end with " then exit loop
 +
    else read word up to white space or eof         
 +
   if word ="" then exit loop
 
   
 
   
 
   match state with
 
   match state with
 
   | scan -> match word with
 
   | scan -> match word with
 
             | ":" -> state = func
 
             | ":" -> state = func
   | func ->
+
            | ";" -> failwith "Found end of function before definition"
   
+
            | "(" -> push state
 +
                      state = rem
 +
            | x  -> failwith "Found illegal word %A outside definition" x ( this assumes that words can't exist outside functions )
 +
  | rem  -> match word with
 +
            | ")" -> pop state
 +
   | func -> match word with
 +
            | "(" -> push state
 +
                      state = rem
 +
            | name -> print "Name of function is %A" name
 +
                      state = def
 +
  | def -> match word with
 +
            | "(" -> push state
 +
                      state = rem
 +
            | ";" -> print "End of function"
 +
                      state = scan
 +
            | _  -> print "Assembling word %A" word
 +
 
 
  // Check if compilation was successful
 
  // Check if compilation was successful
 
  match state with
 
  match state with
 
  | scan when word = "" -> print "Success"
 
  | scan when word = "" -> print "Success"
  | _                  -> print "Error, exit in state %A with last word = %A" state word
+
  | _                  -> print "Error, exit in state %A" state
  
 
== Comment ==
 
== Comment ==

Latest revision as of 06:19, 27 April 2018

Implementation as a state machine

state = scan
do
  word = ""
  skip until not white space or eof
  if word starts with " 
    then read word up to " or eof
         if word does not end with " then exit loop
    else read word up to white space or eof          
  if word ="" then exit loop

  match state with
  | scan -> match word with
            | ":" -> state = func
            | ";" -> failwith "Found end of function before definition"
            | "(" -> push state
                     state = rem
            | x   -> failwith "Found illegal word %A outside definition" x ( this assumes that words can't exist outside functions )
  | rem  -> match word with
            | ")" -> pop state
  | func -> match word with
            | "(" -> push state
                     state = rem
            | name -> print "Name of function is %A" name
                      state = def
  | def  -> match word with
            | "(" -> push state
                     state = rem
            | ";" -> print "End of function"
                     state = scan
            | _   -> print "Assembling word %A" word
 
// Check if compilation was successful
match state with
| scan when word = "" -> print "Success"
| _                   -> print "Error, exit in state %A" state

Comment

 Anything between ( and ) that is not a part of a string or an immediate character should be ignored.
 So after detecting ( ignore all " until )
 After an opening " ignore all ( and ) until closing "

String

 When you reach ", start copying characters into a string dictionary until the matching "
 Emit LDC ... LEA instructions to point to where the string is located in the final binary
 The length of the string is stored in front of it.

Variable inference

Variables are stored locally on the return stack.
Any word that starts with ">" is a write to a variable.

For each word in function
  If word starts with ">" 
    Add the word to an ordered list of unique words that starts with ">", remove the leading ">"
    If there are more than 16 variables emit error 
    Emit STL #<index in the ordered list>
  If word is in the list but does not start with ">"
    Emit LDL #<index in the ordered list>

If there are more than 0 variables emit DIM #<largest index> as the first instruction of the function.
  This can be skipped if the function calls no other functions and do not use the stack for any or purpose.

TODO: Check if every variable is read and eliminate the ones that are not in use and emit a warning.
      This can be done with a simple bitmask to make sure a variable is written and then read.

How to encode constants

   if number = 0 then 
     emit LDC #0
   else
     find the leftmost hex digit n that is not 0
     emit LDC #n
     zero out that digit
     for each remaining digit n, zero or not
     emit LDE #n

     #0x1230 can be encoded as:
       LDC #0x1
       LDE #0x2
       LDE #0x3
       LDE #0x0

More efficient code can sometimes be achieved by using LDN LSL INC DEC DUP OVER and other opcodes.

Example 1

 : lsl for dup + next ;    ( Simple complete function that implements left shift by repeated addition )

 ":" this tells the parser to define a new function
 "lsl" is the name of the function
 
 
 Adr Value  
 0   0xF0  ( for )
 1   0xE0  ( dup )
 2   0xC0  ( add )
 3   0xF1  ( next )
 
 ";" tells the parser that it has reached the end of the definition
 
 
 The parser should create an "object" in memory like:
 "lsl",0xF0,0xE0,0xC0,0xF1 ( the return opcode is implied )

Example 2

 : xorpattern 
   0 >yval 
   1080 for 
     0 >xval
     1920 for 
       xval dup 1 + >xval yval 
       eor 
       xval yval plotpixel 
     next 
     yval 1 + >yval 
   next ;

Unique variable names being written to
  0 "yval"
  1 "xval"


DIM #1     ( Largest index of a variable is 1 )
LDC #0     ( 0 )
STL #0     ( >yval )
LDC #0x4   ( 1080 = 0x483 )
LDE #0x8
LDE #0x3
FOR        ( for )
LDC #0     ( 0 )
STL #1     ( >xval )
LDC #0x7   ( 1920 = 0x780 )
LDE #0x8
LDE #0x0
FOR        ( for )
LDC #1     ( xval )
LDC #0     ( yval )
LDC #<off> ( Load offset to the function plotpixel )
...
CALL
NEXT
LDC #0     ( yval )
INC
STC #0     ( yval )
NEXT
RETURN

Default opcode equivalency for forth words

Forth string - opcode 
+            - add
-            - sub
*            - mul
/            - div

Questions