C. Why Procedure Calls Have a Bad Reputation
The very notion of "procedure call" in a programming context was only worked out painfully after computers had already existed for some time. Indeed, the idea of automatically linked library procedures met with considerable opposition when first proposed. [Hop73] Since procedure calling instructions were not planned ahead of time in the way that arithmetic, conditional, and branch operations were, one would assume they were implemented on early computers in a rather clumsy fashion. While the basic arithmetic and conditional jump instructions have changed little in nature over the years, one can trace an evolution of special instructions used for procedure calls. Machines before 1960 (for example the IBM 1620 and 704) typically had only one instruction (if any) for subroutine calling, which saved the instruction counter in a register or in the first memory location of the called subroutine. The PDP-1 had three instructions which were all variants on this theme. The IBM 360 had only one such instruction (with two addressing modes), but the programmer had the additional choice of which register to store the program counter into. As far as I know offhand, stack instructions did not generally appear in non-stack-oriented machines until the mid-1960's, in such machines as the PDP-6, which in addition to PUSHJ
offered three other subroutine-calling instructions;
in retrospect, this seems to have been more out of uncertainty as to which would be used than out of necessity for a variety of instructions, for only two of the four are used at all extensively now on the PDP-10 (I am ignoring the "UUO" mechanism as not being a general subroutine-calling device). The PDP-ll (1970 or so) is the earliest machine I know of which was not essentially stack oriented (as some early Burroughs machines were) but which provided only a stack-pushing subroutine call instruction.
The interesting thing is that all of these examples have attempted to compress the procedure call operation into a single instruction. As may be inferred from the discussion above and in [Ste76b], this isn't necessarily the right way to do it. The procedure call may conceptually be divided into three phases: push a return address if necessary, set up arguments, go to called procedure. (Those familiar with spaghetti stacks [Bob73] will recognize this sequence as "create a frame, set up the arguments in the frame, go to called procedure".) It is important to note that the first step naturally comes first, and that it is conditional (but for lexically scoped programs this condition can be determined at compile time for any given procedure call as described above). The mistake that we make is to attempt to combine the first and third steps unconditionally into a single, universal instruction. The result is that the return address is always pushed whether it is necessary or not.
(It is appropriate to note here that the procedure call instruction might not itself push the return address onto the stack. It might put it into a register, in which case that register's previous contents must first be pushed, in general, as there are only a finite number of registers. Another case, often used in FORTRAN implementations, is that every procedure has a location reserved in memory for holding the return address for that procedure. This does not permit recursion, and wastes memory space compared to the use of a stack, because if recursion is not permitted the total stack space could not exceed the number of distinct procedures anyway.)
Compiler writers have often simplified their job at the expense of the procedure call by adopting certain standard protocols. One of these is that the called procedure should save all registers that it uses. This is in turn often simplified to "save all registers". It is seldom the case that all of these registers actually need to be saved; indeed, in statement-oriented