# Scheme: An Interpreter for Extended Lambda Calculus/Section 5

Scheme: An Interpreter for Extended Lambda Calculus

Section 5: The Implementation of the Interpreter

## Section 5: The Implementation of the Interpreter

Here we present a real live SCHEME interpreter. This particular version was written primarily for expository purposes; it works, but not as efficiently as possible. The "production version" of SCHEME is coded somewhat more intricately, and runs about twice as fast as the interpreter presented below.

The basic idea behind the implementation is think machine language. In particular, we must not use recursion in the implementation language to implement recursion in the language being interpreted. This is a crucial mistake which has screwed many language implementations (e.g. Micro-PLANNER [Sussman]). The reason for this is that if the implementation language does not support certain kinds of control structures, then we will not be able to effectively interpret them. Thus, for example, if the control frame structure in the implementation language is constrained to be stack-like, then modelling more general control structures in the interpreted language will be very difficult unless we divorce ourselves from the constrained structures at the outset.

It will be convenient to think of an implementation machine which has certain operations, which are "micro-coded" in LISP; these are used to operate on various "registers", which are represented as free LISP variables. These registers are:

`**EXP**`
The expression currently being evaluated.
`**ENV**`
A pointer to the environment in which to evaluate `EXP`.
`**CLINK**`
A pointer to the frame for the computation of which the current one is a subcomputation.
`**PC**`
The "program counter". As each "instruction" is executed, it updates `**PC**` to point to the next instruction to be executed.
`**VAL**`
The returned value of a subcomputation. This register is not saved and restored in `**CLINK**` frames; in fact, its sole purpose is to pass values back safely across the restoration of a frame.
`**UNEVLIST**`, `**EVLIS**`
These are utility registers which are part of the state of the interpreter (they are saved in `**CLINK**` frames). They are used primarily for evaluation of components of combinations, but may be used for other purposes also.
`**TEM**`
A super-temporary register, used for random purposes, and not saved in `**CLINK**` frames or across interrupts. It therefore may not be used to pass information between "instructions" of the "machine", and so is best thought of as an internal hardware register.
`**QUEUE**`
A list of all processes other than the one currently being interpreted.
`**TICK**`
A magic register which a "hardware clock" sets to `T` every so often, used to drive the scheduler.
`**PROCESS**`
This register always contains the name of the process currently swapped in and running.

The following declarations and macros are present only to make the compiler happy, and to make the version number of the SCHEME implementation available in the global variable `VERSION`.

```(DECLARE (SPECIAL **EXP** **UNEVLIS** **ENV** **EVLIS** **PC** **CLINK** **VAL** **TEM**
**TOP** **QUEUE** **TICK** **PROCESS** **QUANTUM**
VERSION LISPVERSION))

(DEFUN VERSION MACRO (X)
(COND (COMPILER-STATE (LIST 'QUOTE (STATUS UREAD)))
(T (RPLACA X 'QUOTE)
(RPLACD X (LIST VERSION))
(LIST 'QUOTE VERSION))))

(SETQ VERSION ((LAMBDA (COMPILER-STATE) (VERSION)) T))
```

The function `SCHEME` initializes the system driver. The two `SETQ`'s merely set up version numbers. The top level loop itself is written in SCHEME, and is a `LABELS` which binds the function `**TOP**` to be a read-eval-print loop. The LISP global variable `**TOP**` is initialized to the closure of the `**TOP**` function for convenience and accessibility to user-defined functions.

```(DEFUN SCHEME ()
(SETQ VERSION (VERSION)  LISPVERSION (STATUS LISPVERSION))
(TERPRI)
(PRINC '|This is SCHEME |)
(PRINC VERSION)
(PRINC '| running in LISP |)
(PRINC LISPVERSION)
(SETQ **ENV** NIL  **QUEUE** NIL
**PROCESS** (CREATE!PROCESS '(**TOP** '|SCHEME -- Toplevel|)))
(SWAPINPROCESS)
(ALARMCLOCK 'RUNTIME **QUANTUM**)
(MLOOP))

(SETQ **TOP**
'(BETA (LAMBDA (**MESSAGE**)
(LABELS ((**TOP1**
(LAMBDA (**IGNORE1** **IGNORE2** **IGNORE3**)
(**TOP1** (TERPRI) (PRINC '|==> |)
(**TOP1** (TERPRI) (PRINC **MESSAGE**) NIL)))
NIL))
```

When the LISP alarmclock tick occurs, the global register `**TICK**` is set to `T`. `**QUANTUM**`, the amount of runtime between ticks, is measured in micro-seconds.

```(DEFUN SETTICK (X) (SETQ **TICK** T))

(SETQ **QUANTUM** 1000000.  ALARMCLOCK 'SETTICK)
```

`MLOOP` is the main loop of the interpreter. It may be thought of as the instruction dispatch in the micro-code of the implementation machine. If an alarmclock tick has occurred, and interrupts are allowed, then the scheduler is called to switch processes. Otherwise the "instruction" specified by `**PC**` is executed via `FASTCALL`.

```(DEFUN MLOOP ()
(DO ((**TICK** NIL)) (NIL)       ;DO forever
(AND **TICK** (ALLOW) (SCHEDULE))
(FASTCALL **PC**)))
```

`FASTCALL` is essentially a `FUNCALL` optimized for compiled "microcode". Note the way it pulls the `SUBR` property to the front of the property list if possible for speed.

```(DEFUN FASTCALL (ATSYM)
(COND ((EQ (CAR (CDR ATSYM)) 'SUBR)
(T ((LAMBDA (SUBR)
(COND (SUBR (REMPROP ATSYM 'SUBR)
(PUTPROP ATSYM SUBR 'SUBR)
(SUBRCALL NIL SUBR))
(T (FUNCALL ATSYM))))
(GET ATSYM 'SUBR)))))
```

Interrupts are allowed unless the variable `*ALLOW*` is bound to `NIL` in the current environment. This is used to implement the `EVALUATE!UNINTERRUPTIBLY` primitive.

```(DEFUN ALLOW ()
((LAMBDA (VCELL)
(T T)))
(ASSQ '*ALLOW* **ENV**)))
```

Next comes the scheduler. It is apparently interrupt-driven, but in fact is not. The key here is to think microcode! There is one place in the microcoded instruction interpretation loop which checks to see if there is an interrupt pending; in our "machine", this occurs in `MLOOP`, where `**TICK**` is checked on every cycle. This is another case where we must beware of using too much of the power of the host language; just as we must avoid using host recursion directly to implement recursion, so we must avoid using host interrupts directly to implement interrupts. We may not modify any register during a host language interrupt, except one (such as `**TICK**`) which is specifically intended to signal interrupts. Thus, if we were to add an interrupt character facility to SCHEME similar to that in MacLISP [Moon], the MacLISP interrupt character function would merely set a register like `**TICK**` and dismiss; `MLOOP` would eventually notice that this register had changed and dispatch to the interrupt handler. All this implies that the "microcode" for the interrupt handlers does not itself contain critical code that must be protected from host language interrupts.

When the scheduler is invoked, if there is another process waiting on the process queue, then the current process is swapped out and put on the end of the queue, and a new process swapped in from the front of the queue. The process stored on the queue consists of an atom which has the current frame and `**VAL**` register on its property list. Note that the `**TEM**` register is not saved, and so cannot be used to pass information between instructions.

```(DEFUN SCHEDULE ()
(COND (**QUEUE**
(SWAPOUTPROCESS)
(NCONC **QUEUE** (LIST **PROCESS**))
(SETQ **PROCESS** (CAR **QUEUE**)
**QUEUE** (CDR **QUEUE**))
(SWAPINPROCESS)))
(SETQ **TICK** NIL)
(ALARMCLOCK 'RUNTIME **QUANTUM**))

(DEFUN SWAPOUTPROCESS ()
(PUTPROP **PROCESS** **VAL** 'VAL))

(DEFUN SWAPINPROCESS ()
**VAL** (GET **PROCESS** 'VAL))
(RESTORE))
```

Primitive operators are LISP functions, i.e. `SUBR`s, `EXPR`s, and `LSUBR`s.

```(DEFUN PRIMOP (x) (GETL x '(SUBR EXPR LSUBR)))
```

`SAVEUP` conses a new frame onto the `**CLINK**` structure. It saves the values of all important registers. It takes one argument, `RETAG`, which is the instruction to return to when the computation is restored.

```(DEFUN SAVEUP (RETAG)
```

`RESTORE` restores a computation from the `CLINK`. The use of `TEMP` is a kludge to optimize the compilation of the "microcode".

```(DEFUN RESTORE ()
(PROG (TEMP)
(ERROR '|PROCESS RAN OUT - RESTORE|
**EXP**
'FAIL-ACT))
**EXP** (CAR TEMP)
TEMP (CDR TEMP)
**UNEVLIS** (CAR TEMP)
TEMP (CDR TEMP)
**ENV** (CAR TEMP)
TEMP (CDR TEMP)
**EVLIS** (CAR TEMP)
TEMP (CDR TEMP)
**PC** (CAR TEMP)
TEMP (CDR TEMP)
```
This is the central function of the SCHEME interpreter. This "instruction" expects `**EXP**` to contain an expression to evaluate, and `**ENV**` to contain the environment for the evaluation. The fact that we have arrived here indicates that `**PC**` contains `'AEVAL`, and so we need not change `**PC**` if the next instruction is also to be `AEVAL`. Besides the obvious objects likes numbers, identifiers, `LAMBDA` expressions, and `BETA` expressions (closures), there are also several other objects of interest. There are primitive operators (LISP functions); `AINT`s (which are to SCHEME as `FSUBR`s like `COND` are to LISP); and `AMACRO`s, which are used to implement `DO`, `AND`, `OR`, `COND`, `BLOCK`, etc.
```(DEFUN AEVAL ()
(COND ((ATOM **EXP**)
(COND ((NUMBERP **EXP**)
(SETQ **VAL** **EXP**)
(RESTORE))
((PRIMOP **EXP**)
(SETQ **VAL** **EXP**)
(RESTORE))
((SETQ **TEM** (ASSQ **EXP** **ENV**))
(RESTORE))
(T (SETQ **VAL** (SYMEVAL **EXP**))
(RESTORE))))
((ATOM (CAR **EXP**))
(COND ((SETQ **TEM** (GET (CAR **EXP**) 'AINT))
(SETQ **PC** **TEM**))
((EQ (CAR **EXP**) 'LAMBDA)
(SETQ **VAL** (LIST 'BETA **EXP** **ENV**))
(RESTORE))
((SETQ **TEM** (GET (CAR **EXP**) 'AMACRO))
(SETQ **EXP** (FUNCALL **TEM** **EXP**)))
(T (SETQ **EVLIS** NIL
**UNEVLIS** **EXP**
**PC** 'EVLIS))))
((EQ (CAAR **EXP**) 'LAMBDA)
(SETQ **EVLIS** (LIST (CAR **EXP**))
**UNEVLIS** (CDR **EXP**)
**PC** 'EVLIS))
(T (SETQ **EVLIS** NIL
**UNEVLIS** **EXP**
**PC** 'EVLIS))))
```

We come to `EVLIS` when a combination is encountered. The intention is to evaluate each component of the combination and then apply the resulting function to the resulting arguments. We use the register `**UNEVLIS**` to hold the list of components yet to be evaluated, and the register `**EVLIS**` to hold the list of evaluated components. We assume that these have been set up by `AEVAL`. Note that in the case of an explicit `LAMBDA` expression in the `CAR` of a combination `**UNEVLIS**` is initialized to be the list of unevaluated arguments and `**EVLIS**` is initialized to be the list containing the lambda expression.

`EVLIS` checks to see if there remain any more components yet to be evaluated. If not, it applies the function. Note that the primitive operators are applied using the LISP function `APPLY`. Note also how a `BETA` expression controls the environment in which its body is to be evaluated. `DELTA` expressions are `CATCH` tags (see `CATCH`). It is interesting that the evaluated components are collected in the reverse order from that which we need them in, and so we must reverse the list before applying the function. Do you see why we must not use side effects (e.g. the `NREVERSE` function) to reverse the list? Think about `CATCH`!

If there remain components yet to be evaluated, `EVLIS` saves up a frame, so that execution can be resumed at `EVLIS1` when the evaluation of the component returns with a value. It then sets up `**EXP**` to point to the component to be evaluated and dispatches to `AEVAL`.

```(DEFUN EVLIS ()
(COND ((NULL **UNEVLIS**)
(SETQ **EVLIS** (REVERSE **EVLIS**))
(COND ((ATOM (CAR **EVLIS**))
(SETQ **VAL** (APPLY (CAR **EVLIS**) (CDR **EVLIS**)))
(RESTORE))
((EQ (CAAR **EVLIS**) 'LAMBDA)
(SETQ **ENV** (PAIRLIS (CADAR **EVLIS**) (CDR **EVLIS**) **ENV**)
**PC** 'AEVAL))
((EQ (CAAR **EVLIS**) 'BETA)
(CDR **EVLIS**)
**PC** 'AEVAL))
((EQ (CAAR **EVLIS**) 'DELTA)
(RESTORE))
(T (ERROR '|BAD FUNCTION - EVARGLIST| **EXP** 'FAIL-ACT))))
(T (SAVEUP 'EVLIS1)
(SETQ **EXP** (CAR **UNEVLIS**)
**PC** 'AEVAL))))
```

The purpose of `EVLIS1` is to gobble up the value, passed in the `**VAL**` register, of the subexpression just evaluated. It saves this value on the list in the `**EVLIS**` register, pops off the unevaluated subexpression from the `**UNEVLIS**` register, and dispatches back to `EVLIS`.

```(DEFUN EVLIS1 ()
(SETQ **EVLIS** (CONS **VAL** **EVLIS**)
**UNEVLIS** (CDR **UNEVLIS**)
**PC** 'EVLIS))
```

Here is the code for the various `AINT`s. On arrival at the instruction for an `AINT`, `**EXP**` contains the expression whose functional position contains the name of the `AINT`. None of the arguments have been evaluated, and no new control frame has been created. Note that each `AINT` is defined by the presence of an `AINT` property on the property list of the LISP atom which is its name. The value of this property is the `LISP` function which is the first "instruction" of the `AINT`.

`EVALUATE` is similar to the LISP function `EVAL`; it evaluates its argument, which should result in a s-expression, which is then fed back into the SCHEME expression evaluator (`AEVAL`).

```(DEFPROP EVALUATE EVALUATE AINT)

(DEFUN EVALUATE ()
(SAVEUP 'EVALUATE1)
**PC** 'AEVAL))

(DEFUN EVALUATE1 ()
(SETQ **EXP** **VAL**
**PC** 'AEVAL))
```

`IF` evaluates its first argument, with a return address of `IF1`. `IF1` examines the resulting `**VAL**`, and gives either the second or third argument to `AEVAL` depending on whether the` **VAL**` was non-`NIL` or `NIL`.

```(DEFPROP IF IF AINT)

(DEFUN IF ()
(SAVEUP 'IF1)
**PC** 'AEVAL))

(DEFUN IF1 ()
(COND (**VAL** (SETQ **EXP** (CADDR **EXP**)))
(SETQ **PC** 'AEVAL))
```

As it was in the beginning, is now, and ever shall be: `QUOTE` without end. (Amen, amen.)

```(DEFPROP QUOTE AQUOTE AINT)

(DEFUN AQUOTE ()
(RESTORE))
```

`LABELS` merely feeds its second argument to `AEVAL` after constructing a fiendishly clever environment structure. This is done in two stages: first the skeleton of the structure is created, with null environments in the closures of the bound functions; next the created environment is clobbered into each of the closures.

```(DEFPROP LABELS LABELS AINT)

(DEFUN LABELS ()
(SETQ **TEM** (MAPCAR '(LAMBDA (DEF)
(LIST (CAR DEF)
(MAPC '(LAMBDA (VC) (RPLACA (CDDADR VC) **TEM**)) **TEM**)
(SETQ **ENV** (NCONC **TEM** **ENV**)
**PC** 'AEVAL))
```

We now come to the multiprocess primitives.

`CREATE!PROCESS` temporarily creates a new set of machine registers (by the lambda-binding mechanism of the host language), establishes the new process in those registers, swaps it out, and returns the new process id; returning causes the old machine registers to be restored.

```(DEFUN CREATE!PROCESS (EXP)
((LAMBDA (**PROCESS** **EXP** **ENV** **UNEVLIS** **EVLIS** **PC** **CLINK** **VAL**)
(SWAPOUTPROCESS)
**PROCESS**)
(GENSYM)
EXP
**ENV**
NIL
NIL
'AEVAL
(LIST NIL NIL NIL NIL 'TERMINATE NIL)
NIL))

(DEFUN START!PROCESS (P)
(COND ((OR (NOT (ATOM P)) (NOT (GET P 'CLINK)))
(ERROR '|BAD PROCESS -- START!PROCESS| **EXP** 'FAIL-ACT)))
(OR (EQ P **PROCESS**) (MEMQ P **QUEUE**)
(SETQ **QUEUE** (NCONC **QUEUE** (LIST P))))
P)

(DEFUN STOP!PROCESS (P)
(COND ((MEMQ P **QUEUE**)
(SETQ **QUEUE** (DELQ P **QUEUE**)))
((EQ P **PROCESS**) (TERMINATE)))
P)
```

`TERMINATE` is an internal microcode routine which terminates the current process. If the current process is the only one, then all processes have been stopped, and so a new SCHEME top level is created; otherwise `TERMINATE` pulls the next process off the scheduler queue and swaps it in. Note that we cannot use `SWAPINPROCESS` because a `RESTORE` will happen in `EVLIS` as soon as `TERMINATE` completes (this is a very deep global property of the interpreter, and a fine source of bugs; much care is required).

```(DEFUN TERMINATE ()
(COND ((NULL **QUEUE**)
(SETQ **PROCESS**
(CREATE!PROCESS '(**TOP** '|SCHEME -- QUEUEOUT|))))
(T (SETQ **PROCESS** (CAR **QUEUE**)
**QUEUE** (CDR **QUEUE**))))
(SETQ **VAL** (GET **PROCESS** 'VAL))
'TERMINATE-VALUE)
```

`EVALUATE!UNINTERRUPTIBLY` merely binds the variable `*ALLOW*` to `NIL`, and then evaluates its argument. This is why this primitive follows the scoping rules for variables!

```(DEFPROP EVALUATE!UNINTERRUPTIBLY EVALUATE!UNINTERRUPTIBLY AINT)

(DEFUN EVALUATE!UNINTERRUPTIBLY ()
(SETQ **ENV** (CONS (LIST '*ALLOW* NIL) **ENV**)
**PC** 'AEVAL))
```

`DEFINE` closes the function to be defined in the null environment, and installs the closure in the LISP value cell.

```(DEFPROP DEFINE DEFINE AINT)

(DEFUN DEFINE ()
(RESTORE))
```

`ASET` looks up the specified variable in the current environment, and clobbers the value cell in the environment with the new value. If the variable is not bound in the current environment, the LISP value cell is set. Note that `ASET` does not need to be an `AINT`, since it does not fool with order of evaluation; all it needs is access to the "machine register" `**ENV**`.

```(DEFUN ASET (VAR VALU)
(SETQ **TEM** (ASSQ VAR **ENV**))
(COND (**TEM** (RPLACA (CDR **TEM**) VALU))
(T (SET VAR VALU)))
VALU)
```

`CATCH` binds the tag variable to a `DELTA` expression which contains the current `CLINK`. When `AEVAL` applies such an expression as a function (of one argument), it makes the `**CLINK**` in the `DELTA` expression be the `**CLINK**`, places the value of the argument in `**VAL**`, and does a `RESTORE`. The effect is to return from the `CATCH` expression with the argument to the `DELTA` expression as its value (can you see why?).

```(DEFPROP CATCH ACATCH AINT)

(DEFUN ACATCH ()
**PC** 'AEVAL))
```

`PAIRLIS` is as in the LISP 1.5 Programmer's Manual [McCarthy][1].

```(DEFUN PAIRLIS (X Y Z)
(DO ((I X (CDR I))
(J Y (CDR J))
(L Z (CONS (LIST (CAR I) (CAR J)) L)))
((AND (NULL I) (NULL J)) L)
(AND (OR (NULL I) (NULL J))
(ERROR '|WRONG NUMBER OF ARGUMENTS - PAIRLIS|
**EXP**
'WRNG-NO-ARGS))))
```

`AMACRO`s are fairly complicated beasties, and have very little to do with the basic issues of the implementation of SCHEME per se, so the code for them will not be given here. `AMACRO`s behave almost exactly like MacLISP macros [Moon][2].

This is the end of the SCHEME interpreter!

## References

1. [McCarthy]
McCarthy, John, et al. LISP 1.5 Programmer's Manual. The MIT Press (Cambridge, 1965).
2. [Moon]
Moon, David A. MACLISP Reference Manual, Revision 0. Project MAC, MIT (Cambridge, April 1974).