Rabbit: A Compiler for Scheme/Chapter 7

From Wikisource
Jump to: navigation, search

[Page 47]

37 7. The Imperative Treatment of Applicative Constructs Given the characteristics of lexical scoping and tail-recursive invocations, it is possible to assign a peculiarly imperative interpretation to the applicative constructs of SCHEME, which consists primarily of treating a function call as a GOTO. More generally, a function call is a GOTO that can pass one or more items to its target; the special case of passing no arguments is precisely a GOTO. It is never necessary for a function call to save a return address of any kind. It is true that return addresses are generated, but we adopt one of two other points of view, depending on context. One is that the return address, plus any other data needed to carry on the computation after the called function has returned (such as previously computed intermediate values and other return addresses) are considered to be packaged up into an additional argument (the continuation) which is passed to the target. This lends itself to a nonfunctional interpretation of LAMBDA, and a method of expressing programs called the continuation-passing style (similar to the message-passing actors paradigm [Hewitt]), to be discussed further below. The other view, more intuitive in terms of the traditional stack implementation, is that the return address should be pushed before evaluating arguments rather than before calling a function. This view leads to a more uniform function-calling discipline, and is discussed in [Declarative] and [Steele].

We are led by these points of view to consider a compilation strategy in which function calling is to be considered very cheap (unlike the situation with PL/I and ALGOL, where programmers avoid procedure calls like the plague see [Steele] for a discussion of this). In this light the code produced by the sample macros above does not seem inefficient, or even particularly convoluted.

Consider the expansion of (OR a b c):

[Page 48]

38 ((LAMBDA (V R) (IF V V (R))) ?LAMBDA () ((LAMBDA (V R) (IF V V (R))) ?LAMBDA () ((LAMBDA (V R) (IF V V (R))) (LAMBDA () 'Nil-)))))) Then we might imagine the following (slightly contrived) compilation scenario.

First, for expository purposes, we shall rename the variables in order to be able to distinguish them.

((LAMBDA (V1 Rl) (IF V1 V1 (Rl))) (LAMBDA () ((LAMBDA (VZ RZ) (IF VZ VZ (RZ))) ?LAMBDA () ((LAMBDA (V3 R3) (IF V3 V3 (R3))) (LAMBDA () 'NIL)))))) we shall assign a generated name to each LAMBDA-expression, which we shall notate by writing the name after the word LAMBDA. These names will be used as tags in the output code.

((LAMBDA namel (Vl Rl) (IF V1 VI (R1))) a (LAMBDA nameZ () ((LAMBDA name3 (VZ RZ) (IF VZ VZ (RZ))) b (LAMBDA name4 () ((LAMBDA name5 (V3 R3) (IF V3 V3 (R3))) (LAMBDA name6 () 'NIL)))))) Next, a simple analysis shows that the variables Rl, RZ, and R3 always denote the LAMBDA-expressions named nameZ, name4, and name6, respectively. Now an optimizer might simply have substituted these values into the bodies of namel, name3, and name5 using the rule of beta-conversion, but we shall not apply that technique here. Instead we shall compile the six functions in a straightforward manner. we make use of the additional fact that all six functions are closed in identical

[Page 49]

39 environments (we count two environments as identical if they involve the same variable bindings, regardless of the number of "frames" involved; that is, the environment is the same inside and outside a (LAMBDA () ...)). Assume a simple target machine with argument registers called regl, regZ, etc. main: LOAD reg2,[nameZ] CALL-FUNCTION Z,[namel] namel: JUMP-IF-NIL reg1,namela RETURN namela: CALL-FUNCTION 0,reg2 nameZ: LOAD reg2,[name4] CALL-FUNCTION Z,[name3] name3: JUMP~IF-NIL regl,name3a RETURN name3a: CALL-FUNCTION 0,reg2 named: <code for c) LOAD reg2,[name6] CALL-FUNCTION Z,[name5] name5: JUMP-IF-NIL reg1,name5a RETURN name5a: CALL-FUNCTION 0,reg2 name6: LOAD regl,'NIL RETURN Now we make use of our knowledge that functions, and convert CALL-FUNCTION of ;result in regl ;[nameZ] is the closure for nameZ ;call namel with 2 arguments ;return the value in regl ;call function in regZ, 0 arguments ;result in regl ;[name4] is the closure for name4 ;call name3 with Z arguments ;return the value in regl ;call function in regZ, 0 arguments ;result in regl ;[name6] is the closure for name6 ;call name5 with 2 arguments ;return the value in regl ;call function in regZ, 0 arguments ;constant NIL in regl certain variables always denote certain a known function to a simple GOTO. (We have actually done things backwards here; in practice this knowledge is used before generating any code. We have fudged over this issue here, but will return to it later. Our purpose here is merely to demonstrate the treatment of function calls as GOTOs.) main:  ;result in regl LOAD reg2,[nameZ] GOTO namel ;[name2] is the closure for nameZ

[Page 50]

40 namel: JUMP-IF-NIL reg1,namela RETURN ' ;return the value in regl namelaz GOTO nameZ nameZ:  ;result in regl LOAD regZ,[name4] ;[name4] is the closure for name4 GOTO name3 name3: JUMP-IF-NIL reg1,name3a RETURN ;return the value in regl name3a: GOTO name4 name4:  ;result in regl LOAD regZ,[name6] ;[name6] is the closure for name6 GOTO name5 name5: JUMP-IF-NIL reg1,name5a RETURN ;return the value in regl name5a: GOTO name6 name6: LOAD reg1,'NIL ;constant NIL in regl RETURN The construction [foo] indicates the creation of a closure for foo in the current environment. This will actually require additional instructions, but we shall ignore the mechanics of this for now since analysis will remove the need for the construction in this case. The fact that the only references to the variables Rl, RZ, and R3 are function calls can be detected and the unnecessary LOAD instructions eliminated. (Once again, this would actually be determined ahead of time, and no LOAD instructions would be generated in the first place. All of this is determined by a general pre-analysis, rather than a peephole post-pass.) Moreover, a GOTO to a tag which immediately follows the GOTO can be eliminated.

[Page 51]

41 main:  ;result in regl namel: JUMP-IF-NIL regl,namela RETURN ;return the value in regl namela: nameZ:  ;result in regl name3: JUMP-IF-NIL reg1,name3a RETURN ;return the value in regl name3a: name4:  ;result in regl name5: JUMP-IF-NIL regl,name5a RETURN ;return the value in regl name5a: name6: LOAD reg1,'NIL ;constant NIL in regl RETURN This code is in fact about what one would expect out of an ordinary LISP compiler. (There is admittedly room for a little more improvement.) RABBIT indeed produces code of essentially this form, by the method of analysis outlined here.

Similar considerations hold for the BLOCK macro. Consider the expression (BLOCK a b c); conceptually this should perform a, b, and c sequentially. Let us examine the code produced: '

((LAMBDA (A B) (B)) (LAMBDA () ((LAMBDA (A B) (B)) b (LAMBDA () C)))) Renaming the variables and assigning names to LAMBDA-expressions:

((LANBDA namel (Al Bl) (Bl)) (LAMBDA nameZ () ((LAMBDA name3 (AZ BZ) (BZ)) b (LAMBDA name4 () c)))) Producing code for the functions:

[Page 52]

main: namel nameZ: name3 named 42 (code for a> LOAD regZ,[name2] CALL-FUNCTION Z,[nanel] CALL-FUNCTION 0,regZ (code for b> LOAD regZ,[nale4] CALL-FUNCTION Z.[n||e3] CALL-FUNCTION 0.rag2 (code for c> RETURN Turning general function calls into direct Go's, on the basis of analysts of what variables must refer to constant functions: main: (code for a> LOAD regZ,[nameZ] GOTO namel name) GOTO nameZ nameZ: LOAD regZ.[name4] GOTO name! name3: GOTO name4 name4 RETURN Eliminating useless GOTO and LOAD instructions: main: (code for a) namel name2: (code for b) nama3: named (code for c> RETURN What more could one ask for?

Notice that this has fallen out of a general strategy involving only an approach to compiling function calls, and has involved no special knowledge of OR

[Page 53]

43 or BLOCK not encoded in the macro rules. The cases shown so far are actually special cases of a more general approach, special in that all the conceptual closures involved_are closed in the same environment, and called from places that have not disturbed that environment, but only used "registers". In the more general case, the environments of caller and called function will be different.

This divides into two subcases, corresponding to whether the closure was created by ax simple LAMBDA or by a LABELS construction. The latter involves circular references, and so is somewhat more complicated; but it is easy to show that in the former case the environment of the caller must be that of the (known) called function, possibly with additional values added on. This is a consequence of lexical scoping. As a result, the function call can be compiled as a GOTO preceded by an environment adjustment which consists merely of lopping off some leading portion of the current one (intuitively, one simply 'pops the unnecessary crud off the stack"). LABELS-closed functions also can be treated in this way, if one closes all the functions in the same way (which RABBIT presently does, but this is not always desirable). If one does, then it is easy to see the effect of expanding a PROG into a giant LABELS as outlined in [Imperative] and elsewhere: normally, a GOTO to a tag at the same level of PROG will involve no adjustment of environment, and so compile into a simple GOTO instruction, whereas a GOTO to a tag at aux outer level of PROG probably will involve adjusting the environment from that of the inner PROG to that of the outer. All of this falls out of the proper imperative treatment of function calls.