The KILO LISP 22 System

T3X Part | LISP Part

This document describes the implementation of KILO LISP 22, a small, yet quite capable interpreter for purely symbolic LISP. The interpreter is written in a simple procedural language called T3X. A quick reference for the language can be found at t3x.org/t3x/uman.html.

Prelude

! KILO LISP, a kilo byte-sized LISP system
! Nils M Holm, 2019, 2020
! In the public domain

A module is a program. The KLISP module uses the classes T3X and TTYCtl. The T3X class contains methods for opening, closing, reading, and writing files, as well as for memory operations, such as copying, comparing, and scanning. The TTYCtl class is used to control a terminal by moving the cursor, clearing the screen, etc.

module klisp(t3x, ttyctl);

The classes have to instantiated, so messages can be sent to them.

object   t[t3x], tty[ttyctl];

Parameters

The NNODES constant specifies the total number of nodes in the system. Most nodes are cons cells. Atoms allocate one (atomic) node per two characters in their name.

const    NNODES  = 12288;

SYMLEN is the maximum length of a symbol/atom name. BUFLEN is the size of an input buffer for file and TTY I/O. EVDEPTH is the maximum number of times EVAL can be called recursively. This happens only during macro expansion. PRDEPTH is the maximum depth (number of nested CAR fields) in an S-expression when printing it. NLOAD is the maximum number of nested calls to LOAD. E.g., when NLOAD=3, a file can load a file that loads a file, but no more nesting is possible.

const    SYMLEN  = 64;
const    BUFLEN  = 128;
const    EVDEPTH = 512;
const    PRDEPTH = 64;
const    NLOAD   = 3;

Special Values

These are special values. They are represented by integers that are not valid offsets into the node pool. NIL is NIL. EOT is the end of file, DOT is a dot read by the reader, RPAREN is a right parenthesis read by the reader. UNDEF is an undefined value. It is the value of undefined or unbound symbols, and it is also used internally for different purposes.

const    SPCL    = NNODES;
const    NIL     = NNODES+1;
const    EOT     = NNODES+2;
const    DOT     = NNODES+3;
const    RPAREN  = NNODES+4;
const    UNDEF   = NNODES+5;

Some Useful Functions

Here are some string functions. The module could import the STRING class, but, to keep things obvious, they are defined here.

STRLEN scans the first 32767 characters of S for NUL and returns the offset of the first occurrence, which coincides with the length of a NUL-terminated string. STREQU returns truth (−1; sometimes written %1 in T3X) when the strings A and B are equal. STRCOPY copies B to A.

strlen(s) return t.memscan(s, 0, 32767);
strequ(a, b) return t.memcomp(a, b, strlen(a)+1) = 0;
strcopy(a, b) return t.memcopy(a, b, strlen(b)+1);

NTOA writes the decimal representation of the signed integer N to the global buffer NTOAB. It returns a pointer to the first character of the resulting string.

var Ntoab::20;

ntoa(n) do var q, p;
        p := @Ntoab::19;
        q := p;
        p::0 := 0;
        while (n \/ p = q) do
                p := p-1;
                p::0 := n mod 10 + '0';
                n := n / 10;
        end
        return p;
end

Input and Output

The global variables XPOS and YPOS keep track of the X and Y position of the cursor on the terminal screen.

var      Xpos, Ypos;

CLEAR clears the terminal screen and updates XPOS/YPOS.

clear() do
        tty.clear();
        Xpos := 0;
        Ypos := 0;
end

When the variable PASS is true (non-zero), the interpreter will use the SYSIN/SYSOUT interface instead of the terminal screen. SYSIN/SYSOUT is basically the same as stdin/stdout in C.

var      Pass;

The PR function writes the string S to the terminal screen and updates the XPOS and YPOS variables accordingly. It interprets carriage return and linefeed characters, handles wrap-around at the end of a line, and scrolls the screen up when the bottom line is reached.

Note that it reserves the bottom line itself for status messages. The bottom line does not scroll when the bottom of the screen is reached.

When PASS=%1, none of the above happens. The string S is simply written to T3X.SYSOUT (stdout).

pr(s) do var i;
        ie (Pass) do
                t.write(T3X.SYSOUT, s, strlen(s));
        end
        else do
                i := 0;
                while (s::i) do
                        ie (s::i = '\n') do
                                Ypos := Ypos+ 1;
                                tty.move(Xpos, Ypos);
                        end
                        else ie (s::i = '\r') do
                                Xpos := 0;
                                tty.move(Xpos, Ypos);
                        end
                        else do
                                tty.writec(s::i);
                                Xpos := Xpos + 1;
                        end
                        if (Xpos >= tty.columns()) do
                                Xpos := 0;
                                Ypos := Ypos + 1;
                                tty.move(Xpos, Ypos);
                        end
                        if (Ypos >= tty.lines()-1) do
                                tty.scroll(0, tty.lines()-2);
                                Ypos := tty.lines() - 2;
                                tty.move(Xpos, Ypos);
                        end
                        i := i+1;
                end
        end
end

The NL function moves the cursor to the beginning of the next line.

nl() do var b::3;
        ie (Pass)
                pr(t.newline(b));
        else
                pr("\r\n");
end

Input from files and the terminal is buffered, and the following is the structure of an input buffer. INPUT is the file descriptor being read, INBUF is a pointer to the vector in which input is being buffered; it points to BUFFER initially. INP is the offset of the next character to extract from the buffer, and INK (IN-K, not the liquid!) is the number of characters in the buffer. When INP=INK, the buffer is empty.

Note that INPUT is ignored unless LOADing a program or in PASS mode, and the terminal keyboard is read directly.

REJECTED is used to put characters back into the input buffer. When REJECTED is not EOT, the character in REJECTED will be returned by the next read operation.

var      Input, Inp, Ink;
var      Inbuf, Buffer::BUFLEN;
var      Rejected;

FLUSHINP clears the input buffer, points it back to the default vector, and connects it to the default input descriptor, T3X.SYSIN (stdin).

flushinp() do
        Input := T3X.SYSIN;
        Inbuf := Buffer;
        Inp := 0;
        Ink := 0;
        Rejected := EOT;
end

AUXB1 and AUXB2 store the two lines most recently submitted to the interpreter, thereby implementing a two-line history. The ROTATEBUFS function rotates the buffers through the parameter B, thereby loading a line from the history into B.

var      Auxb1::BUFLEN, Auxb2::BUFLEN;

rotatebufs(b) do var i, c;
        for (i=0, BUFLEN) do
                c := Auxb1::i;
                Auxb1::i := Auxb2::i;
                Auxb2::i := b::i;
                b::i := c;
        end
end

READCON reads a line from the terminal into the buffer B. B must have a size of at least BUFLEN bytes. The function provides cursor movement inside of the line, deletion of characters, history, erasing the entire input line, and aborting input. See the code for details, it is very straight-forward.

The following variables are used: P0 = X-position of the first character in the line on the terminal. P=position of the character under the cursor. N=number of characters in the line. R=redraw line when not zero. M=maximum number of characters to accept. K=key pressed.

The difference between erasing the input line and aborting input is that aborting kills all previous lines of a multi-line expression, too, while erasing the line only erases the current line.

readcon(b) do var p0, p, r, m, k, n;
        m := tty.columns() - 4;
        p0 := Xpos;
        p := 0;
        r := 1;
        n := 0;
        b::0 := 0;
        while (1) do
                if (r) do
                        tty.move(p0, Ypos);
                        tty.writes(b);
                        tty.writec('\s');
                        r := 0;
                end
                tty.move(p+p0, Ypos);
                k := tty.readc();
                ie (k = 'B'-'@' \/ k = ttyctl.K_LEFT) do
                        if (p > 0) p := p-1;
                end
                else ie (k = 'F'-'@' \/ k = ttyctl.K_RIGHT) do
                        if (p < n) p := p+1;
                end
                else ie (k = 'A'-'@') do
                        p := 0;
                end
                else ie (k = 'E'-'@') do
                        p := n;
                end
                else ie (k = 'L'-'@') do
                        clear();
                        if (p0) pr("* ");
                        r := 1;
                end
                else ie (k = 'P'-'@' \/ k = ttyctl.K_UP) do
                        rotatebufs(b);
                        n := strlen(b);
                        p := n;
                        r := 1;
                        tty.move(Xpos, Ypos);
                        tty.clreol();
                end
                else ie (k = '\r' \/ k = '\n') do
                        if (n > 0) do
                                if (Auxb1::0 \= 0)
                                        strcopy(Auxb2, Auxb1);
                                strcopy(Auxb1, b);
                        end
                        b::n := k;
                        n := n+1;
                        b::n := 0;
                        nl();
                        return n;
                end
                else ie (k = 'C'-'@') do
                        nl();
                        b::0 := 0;
                        error("abort", UNDEF);
                        return 0;
                end
                else ie (k = 'U'-'@') do
                        b::0 := 0;
                        n := 0;
                        p := 0;
                        r := 1;
                        tty.move(p0, Ypos);
                        tty.clreol();
                end
                else ie ((k = '\b' \/ k = ttyctl.K_BKSP \/ k = 127) /\ p > 0)
                do
                        n := n-1;
                        p := p-1;
                        t.memcopy(@b::p, @b::(p+1), n-p);
                        b::n := 0;
                        r := 1;
                end
                else ie (k = 4 /\ n = 0) do
                        b::n := 4;
                        b::(n+1) := 0;
                        pr("^D");
                        nl();
                        return n;
                end
                else if ('\s' <= k /\ k <= '~') do
                        if (n >= m) loop;
                        t.memcopy(@b::(p+1), @b::p, n-p);
                        b::p := k;
                        p := p+1;
                        n := n+1;
                        b::n := 0;
                        r := 1;
                end
        end
end

The RDCH function reads a character from the input, be it the terminal or an input file. It reads the input buffer and when it is empty it refills it by either reading a file descriptor (in PASS mode) or by calling the line editor (READCON). It returns the next character from the buffer or EOT when the end of input is reached. (You can press control-D in READCON to send an EOT.)

rdch() do var c;
        if (Rejected \= EOT) do
                c := Rejected;
                Rejected := EOT;
                return c;
        end
        if (Inp >= Ink) do
                ie (Input = T3X.SYSIN /\ \Pass)
                        Ink := readcon(Inbuf);
                else
                        Ink := t.read(Input, Inbuf, BUFLEN);
                if (Ink < 1) return EOT;
                Inp := 0;
        end
        c := Inbuf::Inp;
        Inp := Inp+1;
        if (c = 4) return EOT;
        return c;
end

RDCHCI is the case-insensitive variant of RDCH. It converts all input to lower case.

rdchci() do var c;
        c := rdch();
        if (c >= 'A' /\ c <= 'Z') return c+32;
        return c;
end

The MESSAGE function writes the string S to the message line at the bottom of the screen. In PASS mode it does nothing. MESSAGE does not update XPOS/YPOS, but returns to the original position after writing the message.

message(s) do
        if (Pass) return;
        tty.move(0, tty.lines()-1);
        tty.clreol();
        tty.writes(s);
        tty.move(Xpos, Ypos);
end

Error Handling

The FINI function shuts down the TTYCtl interface (when not in PASS mode), exits, and returns an exit code to the operating system.

fini(x) do
        if (\Pass) do
                tty.reset();
                tty.mode(0);
                tty.fini();
        end
        ie (x) halt 1; else halt 0;
end

The ERROR function writes the error message M to the terminal (or T3X.SYSOUT in PASS mode). When N does not equal UNDEF, it also prints a colon and the LISP object passed to it in N. It uses the forward-declared PRINT function to print the LISP object.

ERROR also flushes the input buffer and sets the variable ERRFLAG to true (−1) to notify the interpreter of the error. No further errors will be reported as long as ERRFLAG is set.

In PASS mode ERROR halts the interpreter after reporting an error.

var      Errflag;

decl print(1);

error(m, n) do
        if (Errflag) return;
        pr("? ");
        pr(m);
        if (n \= UNDEF) do
                pr(": ");
                print(n);
        end
        nl();
        flushinp();
        Errflag := %1;
        if (Pass) fini(1);
        return NIL;
end

FATAL reports an error and then halts the interpreter. There are not many fatal errors. Most are internal errors (which should never happen) or handle corrupt image files.

fatal(m) do
        message("");
        error(m, UNDEF);
        pr("? aborting");
        nl();
        fini(1);
end

Memory Allocation

Memory of the LISP system is organized in "nodes" or "cells". Each node consists of three fields called "car", "cdr", and "tag". The fields are kept in vectors and are addressed using their offsets in the vectors. For instance, given a LISP object N (which is an offset into the above vectors), CAR[N] would be the car of N and CDR[N] would be the cdr of N.

CDR[N] always points to another LISP object. CAR[N] points to a LISP object, if TAG::N does not have the ATOMB flag set. (X::N is T3X notation for X[N] when X is a byte vector.) When the ATOMB flag of TAG::N is set, CAR[N] contains a small integer value and not a LISP object.

A node with the ATOMB flag set is an "atomic node", which is used to build atoms/symbols. A node with the ATOMB flag cleared is a cons cell.

var      Car[NNODES],
         Cdr[NNODES];

var      Tag::NNODES;

There are two more flags in the TAG field of each node. They are used by the garbage collector and will be explained below.

const    ATOMB   = 0x01;
const    MARKB   = 0x02;
const    SWAPB   = 0x04;

The FREELIST contains a list of unused nodes that are linked together via their cdr fields. When FREELIST=NIL, memory is exhausted and a garbage collection will be initiated.

var      Freelist;

There is a small difference between atoms and symbols in KILO LISP 22, because there are some atoms that are not symbols, like NIL, or the end-of-file object, or the "undefined" object. Hence all symbols are atoms, but not all atoms are symbols. ATOMP and SYMBOLP test whether the given object is an atom or symbol.

atomp(n) return n >= SPCL \/ Car[n] < SPCL /\ Tag::Car[n] & ATOMB;

symbolp(n) return n < SPCL /\ Car[n] < SPCL /\ Tag::Car[n] & ATOMB;

The MARK function implements a Deutsch/Schorr/Waite graph marker. It traverses a tree of nodes N and sets the MARKB flags of all nodes in the tree. It performs the traversal in constant space, i.e. it only requires three integers worth of space addition to the tree itself.

The basic idea of the D/S/W algorithm is to reverse the direction of a link (edge) in the tree when following it, so the child points to the parent while the tree starting at the child is traversed. When returning from a subtree after traversal, the original direction is restored. The algorithm uses the MARKB and SWAPB flags to direct the traversal as follows (M=MARKB, S=SWAPB):

M=0, S=0: Set M=1, S=1, reverse CAR link, continue with CAR node

M=1, S=1: Set S=0, restore CAR link, reverse CDR link, continue with CDR node

M=1, S=0: restore CDR link, move to parent

The algorithms starts at a tree root N with parent P=NIL. It terminates when returning to P=NIL. When the MARK function exits, all nodes in the tree N have their MARKB flag set and their SWAPB flag cleared.

mark(n) do var p, x;
        p := NIL;
        while (1) do
                ie (n >= SPCL \/ Tag::n & MARKB) do
                        if (p = NIL) leave;
                        ie (Tag::p & SWAPB) do
                                x := Cdr[p];
                                Cdr[p] := Car[p];
                                Car[p] := n;
                                Tag::p := Tag::p & ~SWAPB;
                                n := x;
                        end
                        else do
                                x := p;
                                p := Cdr[x];
                                Cdr[x] := n;
                                n := x;
                        end
                end
                else ie (Tag::n & ATOMB) do
                        x := Cdr[n];
                        Cdr[n] := p;
                        p := n;
                        n := x;
                        Tag::p := Tag::p | MARKB;
                end
                else do
                        x := Car[n];
                        Car[n] := p;
                        Tag::n := Tag::n | MARKB;
                        p := n;
                        n := x;
                        Tag::p := Tag::p | SWAPB;
                end
        end
end

The "garbage collector" (GC) of a LISP system is the part that recycles allocated but no longer used objects, so that the space allocated by them can be reused for different objects. In order to find out whether an object is "unused", it marks all nodes that can be addressed by any program in memory. The nodes that are not marked after this step are inaccessible and hence can be recycled safely. The present implementation of the GC uses the D/S/W graph marker to mark trees.

The trees that are marked by the GC start at locations called the "GC roots". In KILO LISP 22, these locations are:

the list of symbols known to the system (SYMBOLS)
the accumulator/result register (ACC)
the current lexical environment (ENV)
the stack and mode stack (STACK, MSTACK)
two GC-safe locations (TMPCAR, TMPCDR)
the last-resort node (ZNODE)

Don't worry — these will be explained in the following!

var      Symbols;
var      Acc, Env;
var      Stack, Mstack;
var      Tmpcar, Tmpcdr;
var      Znode;

When the VERBOSE_GV variable is non-zero, the GC will print some interesting information after each collection.

var      Verbose_GC;

The GC function performs a garbage collection. It first marks the GC roots and then adds all still-unmarked nodes to the freelist. It also unmarks the nodes marked during traversal, thereby restoring their original state.

gc() do var i, k;
        if (Verbose_GC) pr("GC: ");
        mark(Znode);
        mark(Acc);
        mark(Env);
        mark(Symbols);
        mark(Stack);
        mark(Mstack);
        mark(Tmpcar);
        mark(Tmpcdr);
        k := 0;
        Freelist := NIL;
        for (i=0, NNODES) do
                ie (Tag::i & MARKB) do
                        Tag::i := Tag::i & ~MARKB;
                end
                else do
                        Cdr[i] := Freelist;
                        Freelist := i;
                        k := k+1;
                end
        end
        if (Verbose_GC) do
                pr(ntoa(k));
                pr(" NODES");
                nl();
        end
        return k;
end

The CONS3 function is the principal memory allocator of KILO LISP 22. It allocates a new node by removing it from the freelist and initializes its fields with car=A, cdr=D and tag=TA.

When the freelist is empty initially, it performs a garbage collection first. In this case the variables A and D are temporarily saved in the GC roots TMPCAR and TMPCDR, so they cannot be recycled by the GC. Therefore expressions like

cons3(a, cons3(b, c))

do not need to temporarily protect the values B or C or the return value of the inner CONS. Only A has to be protected while cons(b,c) runs.

When there are no free nodes after a garbage collection, CONS3 reports an error and returns the last-resort node ZNODE, which is linked to the LISP object (NIL . NIL). This is done to cover the case where the function calling CONS3 wants to modify the car or cdr field of the allocated object.

cons3(a, d, ta) do var n;
        if (Freelist = NIL) do
                Tmpcdr := d;
                if (\ta) Tmpcar := a;
                gc();
                Tmpcar := NIL;
                Tmpcdr := NIL;
                if (Freelist = NIL) do
                        error("out of nodes", UNDEF);
                        return Znode;
                end
        end
        n := Freelist;
        Freelist := Cdr[Freelist];
        Car[n] := a;
        Cdr[n] := d;
        Tag::n := ta;
        return n;
end

CONS allocates and returns a fresh cons cell.

cons(a, d) return cons3(a, d, 0);

More Useful Functions

CAAR and friends. Not all variants are used.

caar(x) return Car[Car[x]];
cadr(x) return Car[Cdr[x]];
cdar(x) return Cdr[Car[x]];
cddr(x) return Cdr[Cdr[x]];

caadr(x) return Car[Car[Cdr[x]]];
cadar(x) return Car[Cdr[Car[x]]];
caddr(x) return Car[Cdr[Cdr[x]]];
cdadr(x) return Cdr[Car[Cdr[x]]];
cddar(x) return Cdr[Cdr[Car[x]]];

caddar(x) return Car[Cdr[Cdr[Car[x]]]];
cdddar(x) return Cdr[Cdr[Cdr[Car[x]]]];

NREV reverses a list destructively and returns the reversed list.

nrev(n) do var m, h;
        m := NIL;
        while (n \= NIL) do
                h := Cdr[n];
                Cdr[n] := m;
                m := n;
                n := h;
        end
        return m;
end

SAVE saves a LISP object on the stack. UNSAVE removes K objects from the stack and returns the last one removed. It also reports a fatal error when the stack is empty. This should never happen.

save(n) Stack := cons(n, Stack);

unsave(k) do var n;
        while (k) do
                if (Stack = NIL) fatal("stack empty");
                n := Car[Stack];
                Stack := Cdr[Stack];
                k := k-1;
        end
        return n;
end

MSAVE saves a small integer value on the "mode stack", which is used by the evaluator to save its current "mode" (state). Hence the name of this stack. MUNSAVE works like UNSAVE, but always removes and returns the top element.

msave(n) Mstack := cons3(n, Mstack, ATOMB);

munsave() do var n;
        if (Mstack = NIL) fatal("mstack empty");
        n := Car[Mstack];
        Mstack := Cdr[Mstack];
        return n;
end

Symbols

The STRSYM function creates a new (uninterned) symbol from a string. The internal structure of a symbol is a chain of atomic nodes consed to the value of the symbol. The chain of atomic nodes carries two characters of the name per node. When the number of characters is not even, the last node will be filled with NUL.

Here is the structure of the symbol named LAMBDA:

   CONS
  +---+---+
  | O | O-----> VALUE
  +-|-+---+
    |
    V
+----+---+    +----+---+    +----+---+
| LA | O----->| MB | O----->| DA |NIL|
+----+---+    +----+---+    +----+---+
 ATOM          ATOM          ATOM

Note that the resulting symbol does not have a value box and hence cannot be used as a variable. Value boxes will be added by the ADDSYM function, which follows further below.

strsym(s) do var i, n;
        i := 0;
        if (s::i = 0) return NIL;
        n := cons3(NIL, NIL, ATOMB);
        save(n);
        while (1) do
                Car[n] := s::i << 8 | s::(i+1);
                if (s::(i+1) = 0 \/ s::(i+2) = 0) leave;
                Cdr[n] := cons3(NIL, NIL, ATOMB);
                n := Cdr[n];
                i := i + 2;
        end
        n := unsave(1);
        return cons(n, UNDEF);
end

SYMSTR is the inverse function of STRSYM: given a symbol it returns a string containing the name of the symbol. The name is returned in a global buffer (SYMB) that will be overwritten each time SYMSTR is called.

var Symb::SYMLEN+2;

symstr(n) do var i;
        i := 0;
        n := Car[n];
        while (n \= NIL) do
                Symb::i := Car[n] >> 8;
                Symb::(i+1) := Car[n] & 0xff;
                i := i + 2;
                n := Cdr[n];
        end
        Symb::i := 0;
        return Symb;
end

FINDSYM locates a symbol in the symbol list (SYMBOLS) and returns it. It identifies symbols by comparing their names, hence it can be used to locate not-yet-interned symbols. When the given symbol is not in the list, FINDSYM returns NIL.

findsym(s) do var p;
        p := Symbols;
        while (p \= NIL) do
                if (strequ(s, symstr(Car[p])))
                        return Car[p];
                p := Cdr[p];
        end
        return NIL;
end

ADDSYM interns a symbol S and adds a value box to it. "Interning a symbol" means to look the symbol up in the symbol list (SYMBOLS) and return the symbol in the list instead of S. When S is not in SYMBOLS, it is first added and then returned. The reader (READ) will return the return value of ADDSYM for symbols, hence the same symbol name will always return the same member of SYMBOLS. This is the reason why addresses of symbols are unique and can be compared by EQ.

ADDSYM also adds a "value box" to the symbol S and puts the value of V into the box, giving the following structure:

  +---+---+    +---+---+
  | O | O----->| O |NIL|
  +-|-+---+    +-|-+---+
    |            |
    V            V
  SYMBOL       VALUE
   NAME

When the special value SPCL is passed to ADDSYM as the value of V, the value box will refer to the symbol itself, thereby forming a "constant":

      +--------------------+
      |                    |
      V                    |
  +---+---+    +---+---+   |
  | O | O----->| O |NIL|   |
  +-|-+---+    +-|-+---+   |
    |            |         |
    V            +---------+
  SYMBOL
   NAME

(Note that constants are not "constant" in the usual sense: nothing stops you from modifying them!)

When V=UNDEF, the resulting symbol will remain "unbound".

addsym(s, v) do var n;
        n := findsym(s);
        if (n \= NIL) return n;
        n := strsym(s);
        Symbols := cons(n, Symbols);
        ie (v = SPCL)
                Cdr[n] := cons(n, NIL);
        else
                Cdr[n] := cons(v, NIL);
        return n;
end

The LOOKUP function looks up the value of the symbol X in the current "lexical environment" (ENV) as well as in the "global environment". Lexical bindings are independent from (global) symbol bindings! A lexical environment is a list of association lists. It has the following general form:

( ( (var-1,1 . (val-1,1)) ... (var-1,M . (val-1,M)) )
  ...
  ( (var-N,1 . (val-N,1)) ... (var-N,K . (val-N,K)) )

I.e., each lexical environment consists of a list of N association lists, each containing some number (M,K,...) of associations. Each association

(var . (val))

consists of a variable name (symbol) "var" and a value box "(val)". The value box will be used to store values in the variable.

The LOOKUP function traverses the environment and locates the symbol X. Because symbols are interned at this point, the operation X=Y can be used to find out if X and Y are the same symbol. When LOOKUP finds a variable associated with the given symbol, it returns its value box (not its value).

When no variable with the given name is found in the current lexical environment, the global binding (symbol binding) of the symbol is returned (also as a value box).

This approach uses deep binding (looking up symbols in lists) for local (lexical) variables and shallow binding (symbols are bound directly to value boxes) for global bindings. Shallow binding is much faster and lexical environments are typically small, so this method is almost as fast as shallow binding.

In case you wonder why not to use shallow binding for lexical variables, too: In lexically scoped systems there can be multiple, independent variables with the same name, and shallow bindings allows only one binding per symbol. This is a problem that deep binding solves, but at the price of linear-time lookup instead of constant-time lookup.

lookup(x) do var a ,e;
        e := Env;
        while (e \= NIL) do
                a := Car[e];
                while (a \= NIL) do
                        if (caar(a) = x) return cdar(a);
                        a := Cdr[a];
                end
                e := Cdr[e];
        end
        return Cdr[x];
end

The Reader

The variable PARENS keeps track of the number of open left parentheses.

var      Parens;

The RDLIST function reads the members of a list (which in turn may be lists) and returns it. It recurses through the READ function, which implements the LISP reader, in order to read each individual member of the list. Note that the name "READ" is not reserved for a system call in T3X, so it can be used to name the LISP reader.

RDLIST appends new elements to its result destructively by maintaining a pointer (A) that points to the last cons in the list. This is why it starts with LST=(NIL): the first member of the resulting list will be inserted in the place of the NIL. The append pointer (A) is initially NIL in order to identify the empty list, (). It will be pointed to the next free slot after reading and inserting the first member of a list. I.e., initially:

LST --> (NIL)
  A --> NIL

and after reading the first and second element:

                A                            A            
                |                            |            
                V                            V            
LST --> (first NIL)   LST --> (first second NIL)

The next element will always be inserted at car(A), another (NIL) will be attached at cdr(A), and then A will move to that (NIL).

RDLIST also reads dotted lists. There are lots of ugly edge cases to cover here, like:

finding a dot in the first slot of a list
finding a dot in the last slot of a list
finding multiple elements after a dot

decl read(0);

rdlist() do var n, lst, a, badpair;
        badpair := "bad pair";
        Parens := Parens+1;
        lst := cons(NIL, NIL);
        save(lst);
        a := NIL;
        while (1) do
                if (Errflag) return NIL;
                n := read();
                if (n = EOT) return error("missing ')'", UNDEF);
                if (n = DOT) do
                        if (a = NIL) return error(badpair, UNDEF);
                        n := read();
                        Cdr[a] := n;
                        if (n = RPAREN \/ read() \= RPAREN)
                                return error(badpair, UNDEF);
                        unsave(1);
                        Parens := Parens-1;
                        return lst;
                end
                if (n = RPAREN) leave;
                ie (a = NIL) 
                        a := lst;
                else
                        a := Cdr[a];
                Car[a] := n;
                Cdr[a] := cons(NIL, NIL);
        end
        Parens := Parens-1;
        if (a \= NIL) Cdr[a] := NIL;
        unsave(1);
        return a = NIL-> NIL: lst;
end

SYMBOLIC returns truth, if the character in its parameter, C, can appear in a symbol name. It also returns truth for the slash ('/') character, which escapes (slashifies) special characters in symbol names.

symbolic(c) return c >= 'a' /\ c <= 'z' \/
                   c >= 'A' /\ c <= 'Z' \/
                   c >= '0' /\ c <= '9' \/
                   c = '-'  \/ c = '/';

The following variables will be bound to internal symbols known to the KILO LISP 22 system. First special forms and then built-in functions (and the T symbol indicating truth).

var     S_apply, S_if, S_ifnot, S_lambda, S_lamstar, S_macro, S_progn,
        S_quote, S_quasiquote, S_unquote, S_unquote_splice, S_setq;

var     S_t, S_binding, S_cons, S_car, S_cdr, S_atom, S_eq, S_gensym,
        S_it, S_suspend, S_gc, S_eofp, S_load, S_rplaca, S_rplacd,
        S_read, S_prin1, S_print, S_error, S_symbols;

RDSYM reads a symbol name, which may consist of letters, decimal digits, and minus signs, as indicated by the SYMBOLIC function, above. When a slash ('/') is found in a symbol name, the following character is included, no matter what it is. For instance, the text /+/− will result in a symbol consisting of the characters '+' and '−'.

RDSYM folds all characters to lower case. When the symbol read by it is "nil", it will return the NIL object. In all other cases it will intern the accepted symbol and return it.

rdsym(c) do var s::SYMLEN+1, i;
        i := 0;
        while (symbolic(c)) do
                if (c = '/') c := rdchci();
                ie (SYMLEN = i)
                        error("long symbol", UNDEF);
                else if (i < SYMLEN) do
                        s::i := c;
                        i := i+1;
                end
                c := rdchci();
        end
        s::i := 0;
        Rejected := c;
        if (strequ(s, "nil")) return NIL;
        return addsym(s, UNDEF);
end

The SYNTAX function is called in various places where a syntactically wrong object (X) is encountered. This includes expressions like (IF X Y), where the third argument is missing, but also lexically wrong text, like *FOO* (which is not a valid symbol).

syntax(x) error("syntax", x);

QUOTE is used to construct expressions of the form (Q N), where N is any expression and Q is one out of QUOTE, QUASIQUOTE, UNQUOTE, and UNQUOTE-SPLICE. Q does not have to be protected from GC, because it is a symbol and all symbols are protected through the symbol list.

quote(q, n) return cons(q, cons(n, NIL));

RDWORD reads a "word". In KILO LISP 22 a word is a list of single-character symbols. Words may be abbreviated (condensed) with a # character followed by the symbols they contain. For example

(f o o b a r b a z)

may be written as

#foobarbaz

There is no empty word, so there must be at least one symbolic character after the # sign. Non-symbolic characters may be included by slashifying them. E.g.:

#hello/,/ world/!

rdword() do var s::2, n, c;
        s::1 := 0;
        c := rdchci();
        if (\symbolic(c)) syntax(UNDEF);
        n := cons(NIL, NIL);
        save(n);
        while (1) do
                if (c = '/') c := rdchci();
                s::0 := c;
                Car[n] := addsym(s, UNDEF);
                c := rdchci();
                ie (symbolic(c)) do
                        Cdr[n] := cons(NIL, NIL);
                        n := Cdr[n];
                end
                else
                        leave;
        end
        Rejected := c;
        n := unsave(1);
        return n;
end

The READ function implements the LISP reader, which parses textual representations of LISP objects and translates them to trees of nodes. This is what the function does:

skip over space characters (blank, tab, CR, LF) and comments
return EOT when the EOT or the '%' character is read
delegate parsing of lists, symbols, and words to the above functions
expand @x and `x to (QUASIQUOTE X)
expand ,x to (UNQUOTE X)
expand ,@x to (UNQUOTE-SPLICE X)
report dots outside of lists and extra right parentheses

READ returns the tree it generated from the parsed object. It reports a syntax error when it fails to parse its input.

read() do var c;
        c := rdchci();
        while (1) do
                while (c = ' ' \/ c = '\t' \/ c = '\n' \/ c = '\r') do
                        if (Errflag) return NIL;
                        c := rdchci();
                end
                if (c \= ';') leave;
                while (c \= '\n' /\ c \= '\r') c := rdch();
        end
        if (c = EOT \/ c = '%') return EOT;
        if (c = '(') do
                return rdlist();
        end
        if (c = '\'') do
                return quote(S_quote, read());
        end
        if (c = '@' \/ c = '`') do
                return quote(S_quasiquote, read());
        end
        if (c = ',') do
                c := rdchci();
                if (c = '@') return quote(S_unquote_splice, read());
                Rejected := c;
                return quote(S_unquote, read());
        end
        if (c = ')') do
                if (\Parens) error("extra paren", UNDEF);
                return RPAREN;
        end
        if (c = '.') do
                if (\Parens) error("free dot", UNDEF);
                return DOT;
        end
        if (symbolic(c)) do
                return rdsym(c);
        end
        if (c = '#') do
                return rdword();
        end
        syntax(UNDEF);
        return UNDEF;
end

The Printer

The PRINT and PRINT2 functions implement the LISP printer, which turns LISP objects — trees of nodes — into textual representations. It is (almost) the inverse function of READ. Almost the inverse function, because there are some outputs of PRINT that READ cannot read (like the string *CLOSURE* that will print when passing a function to PRINT).

The printer will stop printing when the error flag is set or when the maximum print depth is exceeded. Therefore, the values of expressions like

(let ((x '(1))) (rplaca x x))

will not overflow the stack. (Lists that cycle through cdr will print indefinetely, though.)

print2(n, d) do
        if (d > PRDEPTH) error("print depth", UNDEF);
        if (Errflag) return error("stop", UNDEF);
        ie (n = NIL) do
                pr("nil");
        end
        else ie (n = EOT) do
                pr("*eot*");
        end
        else ie (n >= SPCL \/ Tag::n & ATOMB) do
                pr("*unprintable*");
        end
        else ie (symbolp(n)) do
                pr(symstr(n));
        end
        else do   ! List
                if (\atomp(n) /\ Car[n] = S_lamstar) do
                        pr("*closure*");
                        return;
                end
                pr("(");
                while (n \= NIL) do
                        print2(Car[n], d+1);
                        n := Cdr[n];
                        ie (symbolp(n)) do
                                pr(" . ");
                                print2(n, d+1);
                                n := NIL;
                        end
                        else if (n \= NIL) do
                                pr(" ");
                        end
                end
                if (\Errflag) pr(")");
        end
end

print(n) print2(n, 0);

Image Files

An "image file" is a file containing the entire node pool of the LISP system plus some meta data. The layout of an image file is as follows ("W" is the size of a machine word in bytes, two-byte integers are stored in big-endian order):

Size (Bytes)	Content	Location
4	Magic ID ("KL22")	MAGIC
2	Number of nodes in pool	NNODES
2	Offset of freelist	FREELIST
2	Offset of symbol list	SYMBOLS
2	Next GENSYM ID	ID
W × NNODES	Car fields of node pool	CAR
W × NNODES	Cdr fields of node pool	CDR
NNODES	Tag fields of node pool	TAG

By saving an image file and reading it back later, the previous state of the LISP system is restored.

The SUSPEND function writes an image to the file specified in the string S and the FASLOAD function loads an image file from the given file. Both functions will report a fatal error when encountering an I/O error. In addition, FASLOAD will report a fatal error when the image file passed to it has the wrong magic ID or the wrong pool size.

var      Magic;
var      Id;

dowrite(fd, b, k)
        if (t.write(fd, b, k) \= k)
                error("image write error", UNDEF);

suspend(s) do var fd, k, buf::20;
        fd := t.open(s, T3X.OWRITE);
        if (fd < 0) error("suspend", strsym(s));
        k := strlen(Magic);
        t.memcopy(buf, Magic, k);
        buf::(k) := (NNODES >> 8);
        buf::(k+1) := NNODES;
        buf::(k+2) := (Freelist >> 8);
        buf::(k+3) := Freelist;
        buf::(k+4) := (Symbols >> 8);
        buf::(k+5) := Symbols;
        buf::(k+6) := (Id >> 8);
        buf::(k+7) := Id;
        dowrite(fd, buf, k+8);
        dowrite(fd, Car, NNODES * t.bpw());
        dowrite(fd, Cdr, NNODES * t.bpw());
        dowrite(fd, Tag, NNODES);
        t.close(fd);
end

doread(fd, b, k)
        if (t.read(fd, b, k) \= k)
                fatal("image read error");

fasload(s) do var fd, k, n, m, buf::20;
        fd := t.open(s, T3X.OREAD);
        if (fd < 0) return;
        k := strlen(Magic);
        doread(fd, buf, k+8);
        n := buf::k << 8 | buf::(k+1);
        if (n \= NNODES \/ t.memcomp(buf, Magic, k))
                fatal("bad image");
        Freelist := buf::(k+2) << 8 | buf::(k+3);
        Symbols := buf::(k+4) << 8 | buf::(k+5);
        Id := buf::(k+6) << 8 | buf::(k+7);
        doread(fd, Car, NNODES * t.bpw());
        doread(fd, Cdr, NNODES * t.bpw());
        doread(fd, Tag, NNODES);
        t.close(fd);
end

Built-In Functions

CHECK makes sure that the list X has at least K0 and at most KN elements. When KN=−1, it makes sure than X has at least K0 elements (and there is no upper limit). When neither case is given, it reports a syntax error. It also reports a syntax error when X is a dotted list (i.e. when it does not end with NIL).

The CHECK function is used to make sure that applications of built-in functions and special forms have the proper number of arguments. Therefore expressions like (CAR '1 '2) will be reported as syntax errors.

check(x, k0, kn) do var i, a;
        i := 0;
        a := x;
        while (\atomp(a)) do
                i := i+1;
                a := Cdr[a];
        end;
        if (a \= NIL \/ i < k0 \/ (kn \= %1 /\ i > kn))
                syntax(x);
end

The LOAD function reads and evaluates all LISP expressions contained in the file specified in the string S. All expressions evaluate as if entered at the interpreter prompt. When the given file does not exist, an error is reported.

Most of the code of LOAD saves, modifies, and restores the variables that comprise the input buffer. The state of the buffer is stored in local variables, then the state is reset, the buffer is linked to the given file, and a new buffer area (in TMPBUF) is allocated. After loading the file, the previous state of the buffer is restored.

Because stack space may be limited, recursive loading is limited to NLOAD levels of nesting.

var      Loads;
var      Tmpbuf::NLOAD*BUFLEN;

decl eval(1);

load(s) do var fd, ib, ip, ik, in, re;
        if (Loads >= NLOAD) return error("load limit", UNDEF);
        fd := t.open(s, T3X.OREAD);
        if (fd < 0) return error("load", strsym(s));
        in := Input;
        ip := Inp;
        ik := Ink;
        ib := Inbuf;
        re := Rejected;
        Input := fd;
        Inbuf := @Tmpbuf::(Loads*BUFLEN);
        Inp := 0;
        Ink := 0;
        Rejected := EOT;
        Loads := Loads+1;
        while (\Errflag) do
                Acc := read();
                if (Acc = EOT) leave;
                eval(Acc);
        end
        t.close(fd);
        Loads := Loads-1;
        Rejected := re;
        Inbuf := ib;
        Ink := ik;
        Inp := ip;
        Input := in;
end

The TYPE function is called whenever the wrong type is passed to a built-in function in an expression X.

type(x) error("type", x);

The function BUILTIN receives a list X that represents the application of a built-in function, like

(cons x y)

BUILTIN then identifies the function in the first slot of the list and runs the code that implements the function. The code fragments that implement built-in functions all look similar:

first the number of arguments is checked;
if the function expects certain types of arguments, their types are checked;
the function itself is computed and its result returned.

For instance, the code implementing CAR would make sure that X has two elements (the name CAR and the argument of CAR). It would then make sure that the argument of CAR is not an atom and, finally, it would return (CAADR X), the CAR of the first argument in X. The code for CONS or EQ would skip the type check, because these functions accept any types of arguments.

While BUILTIN is (formally) a large function, it is really a collection of small functions of the above form.

builtin(x) do var s, n;
        ie (S_car = Car[x]) do
                check(x, 2, 2);
                n := cadr(x);
                if (n = NIL) return NIL;
                if (atomp(n)) type(x);
                return Car[n];
        end
        else ie (S_cdr = Car[x]) do
                check(x, 2, 2);
                n := cadr(x);
                if (n = NIL) return NIL;
                if (atomp(n)) type(x);
                return Cdr[n];
        end
        else ie (S_eq = Car[x]) do
                check(x, 3, 3);
                return cadr(x) = caddr(x)-> S_t: NIL;
        end
        else ie (S_atom = Car[x]) do
                check(x, 2, 2);
                return atomp(cadr(x))-> S_t: NIL;
        end
        else ie (S_cons = Car[x]) do
                check(x, 3, 3);
                return cons(cadr(x), caddr(x));
        end
        else ie (S_rplaca = Car[x]) do
                check(x, 3, 3);
                if (atomp(cadr(x))) type(x);
                Car[cadr(x)] := caddr(x);
                return cadr(x);
        end
        else ie (S_rplacd = Car[x]) do
                check(x, 3, 3);
                if (atomp(cadr(x))) type(x);
                Cdr[cadr(x)] := caddr(x);
                return cadr(x);
        end
        else ie (S_gensym = Car[x]) do
                check(x, 1, 1);
                Id := Id+1;
                s := ntoa(Id);
                s := s-1;
                s::0 := 'G';
                return addsym(s, UNDEF);
        end
        else ie (S_eofp = Car[x]) do
                check(x, 2, 2);
                return cadr(x) = EOT-> S_t: NIL;
        end
        else ie (S_read = Car[x]) do
                check(x, 1, 1);
                return read();
        end
        else ie (S_prin1 = Car[x]) do
                check(x, 2, 2);
                print(cadr(x));
                return cadr(x);
        end
        else ie (S_print = Car[x]) do
                check(x, 2, 2);
                print(cadr(x));
                nl();
                return cadr(x);
        end
        else ie (S_binding = Car[x]) do
                check(x, 2, 2);
                n := cadr(x);
                if (\symbolp(n)) type(x);
                n := lookup(n);
                if (Car[n] = UNDEF) return NIL;
                return n;
        end
        else ie (S_load = Car[x]) do
                check(x, 2, 2);
                if (\symbolp(cadr(x))) type(x);
                load(symstr(cadr(x)));
                return S_t;
        end
        else ie (S_error = Car[x]) do
                check(x, 2, 3);
                if (\symbolp(cadr(x))) type(x);
                ie (cddr(x) = NIL)
                        error(symstr(cadr(x)), UNDEF);
                else
                        error(symstr(cadr(x)), caddr(x));
                return UNDEF;
        end
        else ie (S_gc = Car[x]) do
                check(x, 1, 2);
                if (Cdr[x] \= NIL)
                        Verbose_GC := cadr(x) \= NIL;
                return strsym(ntoa(gc()));
        end
        else ie (S_suspend = Car[x]) do
                check(x, 2, 2);
                if (\symbolp(cadr(x))) type(x);
                suspend(symstr(cadr(x)));
                return S_t;
        end
        else ie (S_symbols = Car[x]) do
                check(x, 1, 1);
                return Symbols;
        end
        else do
                syntax(x);
                return UNDEF;
        end
end

Special Forms

SPECIAL returns truth, if N is an atom denoting a special operator, like IF, PROGN, etc. S_LAMSTAR denotes the atom LAMBDA*, which is used internally to form closures. Obviously it is a bad idea to use LAMBDA/* in KILO LISP programs. Closures should only be generated by applications of LAMBDA.

specialp(n)
        return   n = S_quote \/
                n = S_if \/
                n = S_progn \/
                n = S_ifnot \/
                n = S_lambda \/
                n = S_lamstar \/
                n = S_apply \/
                n = S_setq \/
                n = S_macro;

CKLAM checks the syntax of a LAMBDA form: it must have at least two arguments and the first one must be either an atom or a list of atoms. A dotted list is fine.

cklam(x) do var p;
        check(x, 3, %1);
        p := cadr(x);
        while (\atomp(p)) do
                if (\symbolp(Car[p])) syntax(x);
                p := Cdr[p];
        end
end

The following STRUCT enumerates the "states" or "modes" of the evaluator. STRUCT is like ENUM in C (but it is most often used to give structure to vectors in T3X). The modes are as follows:

MHALT	evaluation is complete
MEXPR	start evaluation of an expression
MLIST	evaluate arguments of an application
MBETA	apply an operator (special, built-in, or closure)
MRETN	return from a closure application
MAPPL	evaluate arguments of APPLY
MPRED	select alternative or consequent of IF
MNOTP	select predicate or alternative of IFNOT
MSETQ	store a value in a variable
MPROG	evaluate arguments of PROGN

The modes will be explained in detail in the following sections.

struct MODES = MHALT, MEXPR, MLIST, MBETA, MRETN, MAPPL, MPRED,
                MNOTP, MSETQ, MPROG;

SPECIAL is like BUILTIN, but computes special forms. In addition to the special form in X it receives another argument, PM, which is an output parameter (pointer) that is used to update the evaluator mode. SPECIAL will return the (intermediate) result of its operation to the caller and also set the variable pointed to by PM to the mode of the subsequent computation.

All special forms change the mode of the evaluator. This is one of their defining qualities in KILO LISP. The other one is that special forms receive their arguments unevaluated. For instance, evaluating

(QUOTE x)

will pass exactly that form to SPECIAL; X will not be evaluated first. The code fragment interpreting QUOTE will then check the syntax of QUOTE and return the second element of the QUOTE form. It will also restore the previous evaluator state by removing it from the mode stack and storing it in PM[0]. When QUOTE is evaluated directly at the prompt, the previous evaluator state would be MHALT, causing the evaluator to halt and return the argument of QUOTE.

Evaluation of most special forms is more complicated, though.

For example, the IF special form saves the MPRED state on the mode stack, sets the mode to MEXPR, saves the cddr part (the second and third argument of IF) on the stack, and returns the first argument of the IF form. E.g. (IF A B C) would

save (B C) on the stack;
return A.

Because the mode is set to MEXPR, the evaluator would then evaluate A, leaving the result in ACC. When evaluation of A finishes, the previous state is popped from the mode stack. IF left MPRED on the mode stack, so either B or C is selected, depending on the value of ACC. The state will then be changed back to MEXPR so the selected expression (B or C) evaluates.

Most special forms work in this way:

push a subsequent mode to the mode stack;
select an argument;
save the other arguments on the stack;
store a new current mode in PM[0];
return the selected argument.

Many special forms set the mode to MEXPR, causing the value returned by them to be evaluated. Exceptions are QUOTE and LAMBDA*, which are self-quoting, and LAMBDA and MACRO, which return their results directly. These forms pop the previous mode from the mode stack.

special(x, pm) do
        if (S_quote = Car[x]) do
                check(x, 2, 2);
                pm[0] := munsave();
                return cadr(x);
        end
        if (S_if = Car[x]) do
                check(x, 4, 4);
                msave(MPRED);
                pm[0] := MEXPR;
                save(cddr(x));
                return cadr(x);
        end
        if (S_progn = Car[x]) do
                pm[0] := MEXPR;
                if (Cdr[x] = NIL) return NIL;
                if (cddr(x) = NIL) return cadr(x);
                msave(MPROG);
                save(cddr(x));
                return cadr(x);
        end
        if (S_ifnot = Car[x]) do
                check(x, 3, 3);
                msave(MNOTP);
                pm[0] := MEXPR;
                save(caddr(x));
                return cadr(x);
        end
        if (S_lambda = Car[x]) do
                cklam(x);
                pm[0] := munsave();
                return cons(S_lamstar, cons(Env, Cdr[x]));
        end
        if (S_lamstar = Car[x]) do
                ! check(x, 3, %1);
                pm[0] := munsave();
                return x;
        end
        if (S_apply = Car[x]) do
                check(x, 3, 3);
                msave(MAPPL);
                pm[0] := MEXPR;
                save(caddr(x));
                save(UNDEF);
                return cadr(x);
        end
        if (S_macro = Car[x]) do
                check(x, 2, 2);
                if (atomp(cadr(x)) \/ caadr(x) \= S_lambda)
                        syntax(x);
                cklam(cadr(x));
                pm[0] := munsave();
                return cons(S_macro,
                        cons(cons(S_lamstar, cons(Env, cdadr(x))),
                             NIL));
        end
        if (S_setq = Car[x]) do
                check(x, 3, 3);
                if (\symbolp(cadr(x))) syntax(x);
                msave(MSETQ);
                pm[0] := MEXPR;
                save(cadr(x));
                return caddr(x);
        end
        syntax(x);
        return UNDEF;
end

The Macro Expander

The EXPAND function receives a LISP form in X and returns its expanded form. An expanded form is a form that contains only function applications and applications of "primitive" special forms, but no applications of "derived special forms". A primitive special form is a form that is being processed by the SPECIAL function, above. A derived special form is a list with the name of a macro (a "keyword") in its first slot.

The process of macro expansion searches derived special forms in the form X and "expands" them by applying the function associated with the keyword of the form to the arguments of the form. Arguments are not evaluated. When a derived special form

(keyword argument-1 ... argument-N)

is found in X, EXPAND transforms it to the form

(apply *closure* (quote (argument-1 ... argument-N)))

where *CLOSURE* is the function associated with the macro KEYWORD.

It then evaluates the resulting form and replaces the application of the derived special operator with its result. Derived special forms are expanded recursively, i.e. the result of macro expansion is expanded again before substituting the application of the special operator.

Macro expansion never expands atoms or quoted objects. The result of expanding a form X that does not contain any derived special forms is the form X itself. Hence macro expansion eventually reaches a fixed point where its result is identical to its argument (unless expansion does not terminate due to a faulty macro).

The first half of EXPAND deals with the trivial cases and performs the transformation described above. The second half applies EXPAND recursively to every element of X, thereby performing a depth-first traversal of X. Note that dotted lists are never expanded.

expand(x) do var n, m, p;
        if (atomp(x)) return x;
        if (Car[x] = S_quote) return x;
        n := symbolp(Car[x])-> Car[lookup(Car[x])]: UNDEF;
        if (\atomp(n) /\ Car[n] = S_macro) do
                m := cons(Cdr[x], NIL);
                m := cons(S_quote, m);
                m := cons(m, NIL);
                m := cons(cadr(n), m);
                m := cons(S_apply, m);
                save(m);
                n := eval(m);
                Car[Stack] := n;
                n := expand(n);
                unsave(1);
                return n;
        end
        p := x;
        while (\atomp(p)) p := Cdr[p];
        if (symbolp(p)) return x;
        save(x);
        n := NIL;
        save(n);
        p := x;
        while (p \= NIL) do
                m := expand(Car[p]);
                n := cons(m, n);
                Car[Stack] := n;
                p := Cdr[p];
        end
        n := nrev(unsave(1));
        unsave(1);
        return n;
end

The Evaluator

The BINDARGS function binds the symbols in the list V to the values (arguments) in the list A. It does so by creating an association list with the elements of V as keys and value boxes containing the elements of A as values. The resulting association list is then added to the front to the current environment ENV. Existing variables in ENV which have the same names as symbols in V are thereby shadowed.

V and A must have the same length unless V is a dotted list. When V is a dotted list, there must be one corresponding element in A for each element of V that comes before the dot. Any excess elements of A will then be bound to the last symbol in V as a list. For example

V = (a b . c)
A = (1 2 3 4 5)

will result in the association list

((a . (1)) (b . (2)) (c . ((3 4 5))))

When number of symbols in V and values in A does not match, BINDARGS reports an error. In case of success, it extends ENV with the new bindings.

bindargs(v, a) do var e, n;
        e := NIL;
        save(e);
        while (\atomp(v)) do
                if (a = NIL) return error("too few args", Acc);
                n := cons(Car[v], cons(Car[a], NIL));
                e := cons(n, e);
                Car[Stack] := e;
                v := Cdr[v];
                a := Cdr[a];
        end
        ie (symbolp(v)) do
                n := cons(v, cons(a, NIL));
                e := cons(n, e);
                Car[Stack] := e;
        end
        else if (a \= NIL) do
                return error("extra args", Acc);
        end
        Env := cons(e, Env);
        unsave(1);
end

FUNAPP applies a function ("closure") to some arguments. It receives the application of a closure of the form

((lambda* vars env body ...) args ...)

where (LAMBDA* ...) is a closure as returned by the LAMBDA special form. A closure is a function containing the following components:

VARS, the variables of the function;
ENV, the lexical environment that was in effect when the closure was created by an application of LAMBDA;
BODY, one or multiple expressions forming the body of the function.

A closure is applied to the arguments ARGS by

saving the current environment, ENV, on the stack
reestablishing the environment saved in the closure
bindings the variables of the function to the corresponding arguments
pushing the MRETN evaluator mode to the mode stack
returning (PROGN body ...) for evaluation

When the top of the mode stack is MRETN when a function is applied, then the function application is a tail application. When the called (applied) function returns, the caller will itself return, so there is no need to carry along the saved environment of the caller. Therefore steps 1 and 4 will be omitted in the above list when performing a tail application. Subsequently tail-recursive functions will compute in constant space.

funapp(x) do
        Acc := x;
        if (atomp(Car[x]) \/ caar(x) \= S_lamstar)
                syntax(x);
        ie (Mstack \= NIL /\ Car[Mstack] = MRETN) do
                Env := cadar(Acc);
                bindargs(caddar(Acc), Cdr[Acc]);
        end
        else do
                save(Env);
                Env := cadar(Acc);
                bindargs(caddar(Acc), Cdr[Acc]);
                msave(MRETN);
        end
        return cons(S_progn, cdddar(x));
end

The EVAL function is the core of the LISP system. It receives a LISP expression and returns its "normal form", i.e. the result of interpreting the expression. There is a limit on the number of recursive applications of EVAL (EVDEPTH), but this limit only affects macro expansion. While evaluating an expression EVAL never recurses.

The first thing EVAL does is to macro-expand the expression (X) passed to it. The macro expander will in turn apply EVAL in order to expand any macros in X. This mutual recursion is what it limited to EVDEPTH iterations.

The evaluator has a current "state" or "mode" (M), which determines in which way it will transform the expression X. In principle the evaluator can be thought of as a set of functions, one for each mode, and the main loop of the interpreter passes data between these functions. When one internal function calls another internal function, the current mode is pushed to the mode stack. In order to return to the caller, the previous mode is popped from the stack. The mode stack is not only used to return to callers, though. Some functions push a "continuation" to the mode stack: a mode where the computation shall continue when completing the current mode. This has been illustrated in the section on "special forms".

The evaluator stores results (and intermediate results) in a register named the "accumulator" (ACC). This register is also used internally to pass data between the internal functions (modes).

Here follows a summary of the different modes of the evaluator.

MEXPR — Evaluate Expression

When the expression ACC is a symbol, load the value bound to it into ACC and go the previous mode. When the symbol is not bound, report an error.

When ACC is a non-symbolic atom (NIL, the EOT object, etc), go to the previous state (keeping the atom in ACC).

When ACC is a special form (its car part is a special operator or macro name), go to state MBETA.

When ACC is a function application, save cdr(ACC) on the stack, load car(ACC) to ACC, push cdr(ACC) and NIL to the stack and MLIST to the mode stack. The state remains MEXPR, which causes car(ACC) to be evaluated next. After evaluation of car(ACC), MLIST will be popped from the stack, so MLIST is the continuation of this case.

MLIST — Evaluate Argument List

In this state the remaining arguments to be processed will be in cadr(STACK) and the arguments processed so far will be in car(STACK) in reverse order. When there are no more arguments to process, MLIST will construct the final argument list by adding the last element (ACC) and reversing the list. The final argument list is then stored in ACC and both parameters are removed from the stack. The mode will then change to MBETA.

When there are more arguments to be processed, MLIST will cons the previously evaluated argument (ACC) to the top element on the stack and fetch the next element from cadr(STACK) and remove it from that list. It will then push MLIST to the mode stack again (because it has been popped by evaluating the current list element) and go to mode MEXPR in order to evaluate the next element.

MBETA — Function Application

This mode delegates application of special forms, built-in functions, and closures to the functions BUILTIN, SPECIAL, and FUNAPP. Special forms get their continuation mode from the function SPECIAL by passing the address of M (@M) to it. Built-in functions return to the previous mode after computing their result. Closures evaluate the body returned by FUNAPP by going to mode MEXPR. Note that FUNAPP pushes MRETN in order to return from the function later.

This mode is called MBETA because it loosely resembles the "beta" conversion of lambda calculus.

MRETN — Function Return

Returning from a function is done by restoring the environment (ENV) saved on the stack and the mode saved on the mode stack. Both are removed from their stacks.

MAPPL — APPLY operator

The APPLY operator has two arguments that need to be evaluated. When MAPPL is entered for the first time, car(STACK) is UNDEF, cadr(STACK) contains the (not yet evaluated) second argument, and ACC contains the (already evaluated) first argument. The first argument is then stored in car(STACK) and the second argument is prepared for evaluation (like in MLIST).

When MAPPL is entered for the second time, it conses the first argument to the second one and goes to mode MBETA. At this point ACC will contain the form

(first second-1 ... second-N)

where "first" is the (evaluated) first argument of APPLY and each "second-i" is a member of the list forming its (evaluated) second argument. The above form is then evaluated without evaluating its arguments again, by changing the mode directly to MBETA instead of going through MLIST.

MPRED — Predicate

When MPRED is entered, the predicate of the IF special form

(IF predicate consequent alternative)

has been evaluated and its result is in ACC. Car(STACK) contains the "consequent" and "alternative" parts of the special form. MPRED loads the alternative into ACC when ACC is NIL and otherwise it loads the consequent. It then changes the mode to MEXPR in order to evaluate the selected branch.

MNOTP — Negated Predicate

This mode evaluates the IFNOT special form, which is probably has no counterpart in other LISPs. The IFNOT special form

(IFNOT predicate alternative)

evaluates the predicate and if the predicate it not NIL it just returns the value of the predicate (without evaluating it again). If the predicate is NIL it evaluates and returns the alternative.

When MNOTP is entered, the value saved in car(STACK) is the alternative branch. When ACC is not NIL, MNOTP returns to the previous mode. When it is NIL, it will submit car(STACK) for evaluation.

IFNOT is very handy for implementing the OR special form or predicate-only clauses of COND. See the LISP part of this document for examples.

MSETQ — Set Variable

When MSETQ is entered, the value to be assigned to a variable is in ACC and the symbol naming the variable is in car(STACK). MSETQ removes the symbol from the stack, stores the value in its value box, and returns to the previous mode.

MPROG — Program Sequence

MPROG evaluates a sequence of expressions (a "program"). It discards the values of all expressions except for the last one, which it returns. Empty sequences result in NIL, but this case is handled by the function SPECIAL, which has been discussed earlier.

MPROG keeps the list of expressions to evaluate in car(STACK). When car(STACK) contains one single expression, it removes the list from the stack and submits the expression for evaluation. When there are multiple expressions in the list, it removes and submits one expression from the list and pushes MPROG to the mode stack as its continuation.

MHALT — Halt Evaluation

This mode returns from EVAL and delivers the value in ACC to the caller.

Note that MHALT is on top of the mode stack when the EVAL is entered, so returning to the previous mode after finishing evaluation of the expression passed to EVAL will cause evaluation to halt.

Evaluation will also halt when an error is reported (ERRFLAG is set) or when control-C is pressed on the terminal keyboard.

var      Evlev;

eval(x) do var n, m;
        if (Evlev >= EVDEPTH)
                return error("expansion limit", UNDEF);
        Evlev := Evlev + 1;
        Acc := expand(x);
        msave(MHALT);
        m := MEXPR;
        while (\Errflag) do
                ie (m = MEXPR) do
                        if (\Pass /\ tty.query() /\ tty.readc() = 3)
                                error("stop", UNDEF);
                        ie (symbolp(Acc)) do
                                n := Car[lookup(Acc)];
                                if (n = UNDEF)
                                        return error("undefined", Acc);
                                Acc := n;
                                m := munsave();
                        end
                        else ie (atomp(Acc)) do
                                m := munsave();
                        end
                        else ie (specialp(Car[Acc])) do
                                m := MBETA;
                        end
                        else do
                                save(Cdr[Acc]);
                                Acc := Car[Acc];
                                save(NIL);
                                msave(MLIST);
                        end
                end
                else ie (m = MLIST) do
                        ie (cadr(Stack) = NIL) do
                                Acc := nrev(cons(Acc, unsave(1)));
                                unsave(1);
                                m := MBETA;
                        end
                        else do
                                Car[Stack] := cons(Acc, Car[Stack]);
                                Acc := caadr(Stack);
                                Car[Cdr[Stack]] := cdadr(Stack);
                                msave(MLIST);
                                m := MEXPR;
                        end
                end
                else ie (m = MBETA) do
                        ie (specialp(Car[Acc])) do
                                Acc := special(Acc, @m);
                        end
                        else ie (symbolp(Car[Acc])) do
                                Acc := builtin(Acc);
                                m := munsave();
                        end
                        else do
                                Acc := funapp(Acc);
                                m := MEXPR;
                        end
                end
                else ie (m = MRETN) do
                        Env := unsave(1);
                        m := munsave();
                end
                else ie (m = MAPPL) do
                        ie (Car[Stack] = UNDEF) do
                                Car[Stack] := Acc;
                                Acc := cadr(Stack);
                                msave(MAPPL);
                                m := MEXPR;
                        end
                        else do
                                n := unsave(1);
                                unsave(1);
                                Acc := cons(n, Acc);
                                m := MBETA;
                        end
                end
                else ie (m = MPRED) do
                        n := unsave(1);
                        ie (Acc = NIL)
                                Acc := cadr(n);
                        else
                                Acc := Car[n];
                        m := MEXPR;
                end
                else ie (m = MNOTP) do
                        n := unsave(1);
                        ie (Acc = NIL) do
                                Acc := n;
                                m := MEXPR;
                        end
                        else do
                                m := munsave();
                        end
                end
                else ie (m = MSETQ) do
                        n := unsave(1);
                        Car[lookup(n)] := Acc;
                        Acc := n;
                        m := munsave();
                end
                else ie (m = MPROG) do
                        ie (cdar(Stack) = NIL) do
                                Acc := Car[unsave(1)];
                                m := MEXPR;
                        end
                        else do
                                Acc := caar(Stack);
                                Car[Stack] := cdar(Stack);
                                msave(MPROG);
                                m := MEXPR;
                        end
                end
                else if (m = MHALT) do
                        Evlev := Evlev - 1;
                        return Acc;
                end
        end
        return NIL;
end

System Initialization

The RESET function resets some variables to their initial values. For instance, it clears the registers (ACC, ENV, TMPCAR, TMPCDR), empties the stacks, and resets the various counters and the error flag. It prepares the evaluation of an expression. This cannot be done in EVAL itself, because EVAL recurses during macro expansion.

The lexical environment (ENV) must be NIL (empty) initially, because there are no local bindings at the top level. Aborted computation may leave spurious associations in ENV, so it is cleared here. TMPCAR, TMPCDR and the links of ZNODE are cleared so that they do not protect random objects from being recycled.

reset() do
        Evlev := 0;
        Parens := 0;
        Loads := 0;
        Errflag := 0;
        Acc := NIL;
        Env := NIL;
        Stack := NIL;
        Mstack := NIL;
        Tmpcar := NIL;
        Tmpcdr := NIL;
        Car[Znode] := NIL;
        Cdr[Znode] := NIL;
end

The INIT function initializes the state of the system. When not in PASS mode, it sets up the TTYCtl class and puts it in raw mode on Unix systems. It then sets up all internal variables: the freelist and symbol list are set to NIL, the history buffers of READCON are cleared, etc. The input buffer is flushed.

A large part of INIT creates symbols for internal functions, like CONS, CAR, ATOM, etc. All these symbols are defined as constants (symbols referring to themselves), because EQ is used to identify them in the evaluator. For instance, the ATOM in (ATOM 'X) evaluates to ATOM and is then identified using EQ in BUILTIN.

Special form symbols are bound to UNDEF, which leaves them unbound. This means that APPLY or SETQ, for instance, are undefined at the LISP level, but they exist internally and can still be identified by EQ, which is done in the function SPECIAL.

This section also creates the symbols QUASIQUOTE, UNQUOTE, and UNQUOTE-SPLICE, so that the reader can expand their abbreviations.

The canonical truth symbol, T, is also defined here.

init() do
        if (\Pass) do
                tty.init();
                if (tty.columns() > BUFLEN) do
                        clear();
                        fatal("screen too wide");
                end
                tty.mode(1);
                clear();
        end
        Magic := "KL22";
        Symbols := NIL;
        Freelist := NIL;
        Znode := NIL;
        Auxb1::0 := 0;
        Auxb2::0 := 0;
        Id := 0;
        flushinp();
        Verbose_GC := 0;
        t.memfill(Tag, 0, NNODES);
        Znode := cons(NIL, NIL);
        S_t := addsym("t", SPCL);
        S_apply := addsym("apply", UNDEF);
        S_if := addsym("if", UNDEF);
        S_ifnot := addsym("ifnot", UNDEF);
        S_lambda := addsym("lambda", UNDEF);
        S_lamstar := addsym("lambda*", UNDEF);
        S_macro := addsym("macro", UNDEF);
        S_progn := addsym("progn", UNDEF);
        S_quote := addsym("quote", UNDEF);
        S_quasiquote := addsym("quasiquote", UNDEF);
        S_unquote := addsym("unquote", UNDEF);
        S_unquote_splice := addsym("unquote-splice", UNDEF);
        S_setq := addsym("setq", UNDEF);
        S_it := addsym("it", UNDEF);
        S_binding := addsym("binding", SPCL);
        S_cons := addsym("cons", SPCL);
        S_car := addsym("car", SPCL);
        S_cdr := addsym("cdr", SPCL);
        S_atom := addsym("atom", SPCL);
        S_eq := addsym("eq", SPCL);
        S_eofp := addsym("eofp", SPCL);
        S_rplaca := addsym("rplaca", SPCL);
        S_rplacd := addsym("rplacd", SPCL);
        S_gensym := addsym("gensym", SPCL);
        S_read := addsym("read", SPCL);
        S_prin1 := addsym("prin1", SPCL);
        S_print := addsym("print", SPCL);
        S_error := addsym("error", SPCL);
        S_load := addsym("load", SPCL);
        S_gc := addsym("gc", SPCL);
        S_suspend := addsym("suspend", SPCL);
        S_symbols := addsym("symbols", SPCL);
        reset();
end

Main Program and REPL

A statement block without an associated function is the main program and entry point of a T3X module.

The main program first extracts the first command line argument (using T.GETARG) and activates PASS mode when it equals "−". Hence "kl −" can be used to run KILO LISP 22 in batch ("pass") mode with I/O connected to the SYSIN and SYSOUT descriptors.

After initializing the internal state of the system an image file is loaded. When an image file name is specified as the first command line argument, then the given file is loaded. When there is no argument or the argument equals "−" then the image file "klisp" is loaded. A non-existent file will be ignored and no image will be loaded. A corrupt or incompatible image file will cause a fatal error.

Before entering the "REPL" (the read-eval-print loop), the interpreter is heralded and the number of free nodes is reported (unless in PASS mode).

The REPL itself is simple. It

resets the internal state;
clears the message at the bottom of the screen;
prints a prompt;
reads an expression X;
(when an error occurred, starts over);
(when X is EOT, exits);
posts the message "[EVAL]";
evaluates the expression X, giving Y;
(when an error occurred, starts over);
prints the result Y;
binds the result Y to the symbol IT;
starts over.

do var a::14, c;
        c := t.getarg(1, a, 14);
        Pass := 0;
        if (c > 0 /\ strequ(a, "-")) Pass := %1;
        init();
        fasload(c>0 /\ \Pass-> a: "klisp");
        pr("KILO LISP "); pr(@Magic::2); nl();
        if (\Pass) do
                pr(ntoa(gc()));
                pr(" NODES"); nl();
        end
        while (1) do
                reset();
                message("");
                pr("*\s");
                Acc := read();
                if (Errflag) loop;
                if (Acc = EOT) leave;
                message("[EVAL]");
                Acc := eval(Acc);
                if (Errflag) loop;
                print(Acc);
                Car[Cdr[S_it]] := Acc;
                nl();
        end
        if (Pass) nl();
        fini(0);
end

The KILO LISP 22 source code continues with the LISP part of the system.

contact | privacy