This source file is part of the SubC compiler, which is described in the book
Practical Compiler Construction.
SubC Compiler, Version 2012-03-08
By Nils M Holm, 2011--2012

Placed in the public domain


SUMMARY

SubC is a compiler for a (mostly) strict and sane subset of C as
described in "The C Programming Language", 2nd Ed. The language is
also known informally as "ANSI C" or "C89".

The compiler is described in great detail in the book "Practical
Compiler Construction", which can be purchased at Lulu.com. See the
end of this text for ordering information.

The SubC compiler can compile itself. Unlike many other small C
compilers, it does not bend the rules, though. Its code passes
"gcc -Wall -pedantic" with few or no warnings (depending on the gcc
version).

The compiler generates code for GAS-386, the GNU assembler for the
386 processor. Its runtime environment is designed to run on FreeBSD
systems, but it should be easy to port to other unixish 32-bit
systems. Even porting it to a 64-bit platform should not be too hard.

SubC is fast and simple. Its output is typically small (due to a
non-bloated library), but not very runtime efficient, because it
employs none of the code synthesis or optimization strategies
explained in the book.


DIFFERENCES BETWEEN SUBC AND FULL C89

(From Practical Compiler Construction)

o  The following keywords are not recognized: auto, const, double,
   float, goto, long, register, short, signed, struct, typedef,
   union, unsigned, volatile.

o  There are only two data types: the signed int and the unsigned
   char; there are also void pointers, and there is limited support
   for int(*)() (pointers to functions of type int).

o  No more than two levels of indirection are supported, and arrays
   are limited to one dimension, i.e. valid declarators are limited
   to x, x[], *x, *x[], **x (and (*x)()).

o  K&R-style function declarations (with parameter declarations
   between the parameter list and function body) are not accepted.

o  There are no "register", "volatile", or "const" variables. No
   register allocation takes place, so all variables are implicitly
   "volatile".

o  There is no typedef.

o  There are no unsigned integers and no long integers.

o  There are no structs or unions.

o  Only ints, chars and arrays of int and char can be initialized in
   their declarations; pointers can be initialized with 0 (but not
   with NULL).

o  Local arrays cannot have initializers.

o  There are no local externs or enums.

o  Local declarations are limited to the beginnings of function
   bodies (they do not work in other compound statements).

o  There are no static prototypes.

o  Arguments of prototypes must be named.

o  There is no goto.

o  There are no parameterized macros.

o  The #error, #if, #line, and #pragma preprocessor commands are not
   recognized.

o  The preprocessor does not recognize the # and ## operators.

o  There may not be any blanks between the # that introduces a
   preprocessor command and the subsequent command (e.g.: "# define"
   would not be recognized as a valid command).

o  The sizeof operator is limited to types and single identifiers;
   the operator requires parentheses.

o  The address of an array must be specified as "&array[0]" instead
   of "&array" (but just "array" also works).

o  Subscripting an integer with a pointer (e.g. 1["foo"]) is not
   supported.

o  Function pointers are limited to one single type, int(*)(), and
   they have no argument types.

o  There is no assert() due to the lack of parameterized macros.

o  The atexit() mechanism is limited to one function (this may even
   be covered by TCPL2).

o  Environments of setjmp() have to be defined as
   int[_JMP_BUFSIZ]; instead of jmp_buf due to the lack of typedef.

o  FILE is an alias of int due to the lack of typedef.

o  The signal() function returns int due to the lack of a more
   sophisticated type system; the return value must be cast to
   int(*)().

o  Most of the time-related functions are missing due to the lack of
   structs; in particular: asctime(), gmtime(), localtime(),
   mktime(), and strftime().

o  The clock() function is missing, because CLOCKS_PER_SEC varies
   among systems.

o  The ctime() function ignores the time zone.


COMPILING THE COMPILER

On a FreeBSD system just type "make".

Without "make" the compiler can be bootstrapped by running:

	cc -o scc0 *.c

To compile and package the runtime library:

	./scc0 -c lib/*.c
	ar -rc lib/libscc.a lib/*.o
	ranlib lib/libscc.a

To compile the startup module:

	as -o lib/crt0.o lib/crt0.s

To test the compiler either run "make test" or perform the following
steps:

	./scc0 -o scc1 *.c
	./scc1 -o scc *.c
	cmp scc1 scc

There should not be any differences between the scc1 and scc
executables.


INSTALLING THE COMPILER

If you want to install the SubC compiler on your system, you will
have to change the SCCDIR variable, which points to the base
directory containing the SubC headers and runtime library. SCCDIR
defaults to "." and can be overridden on the command line:

	./scc1 -o scc -D 'SCCDIR="$INSTALLDIR"' *.c

(where $INSTALLDIR is where the compiler will be installed.)

You can place the 'scc' executable wherever you want. The headers go
to $INSTALLDIR/include, the library 'lib/libscc.a' and the startup
module 'lib/crt0.o' go to $INSTALLDIR/lib.

To test the installation just re-compile the compiler:

	rm scc && scc -o scc *.c


FURTHER INFORMATION

For a thorough explanation of the internals of the compiler,
including theoretical backgrounds, see my book

	Practical Compiler Construction

which can be bought at Lulu.com:

	http://www.lulu.com/content/12610903  (Paperback)
	http://www.lulu.com/content/12685672  (PDF)


CONTACT

Send feedback, suggestions, etc to:

	n m h @ t 3 x . o r g

See http://t3x.org/contact.html for current ways through my spam
filter.
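

EXAMPLE

As an illustration of the restrictions listed under DIFFERENCES
BETWEEN SUBC AND FULL C89, the fragment below is a minimal sketch of
code written to stay inside the subset. The names BUFLEN, sum, and
demo are hypothetical and do not come from the compiler sources. The
fragment is ordinary C89 and was written against the list above; it
has not been checked against SubC itself.

```c
#include <string.h>

#define BUFLEN 16	/* object-like macro; no parameterized macros */

/* Arguments of prototypes must be named. */
int sum(int *v, int n);

/* Add up the first n elements of v. */
int sum(int *v, int n) {
	int	i, total;	/* locals only at the start of the body */

	total = 0;
	for (i = 0; i < n; i++)
		total = total + v[i];
	return total;
}

/* Hypothetical driver: uses only int, char, and 1-dimensional arrays. */
int demo(void) {
	int	data[4];	/* local arrays cannot have initializers */
	char	buf[BUFLEN];
	int	i;

	for (i = 0; i < 4; i++)
		data[i] = i + 1;
	strcpy(buf, "SubC");
	/* pass "&data[0]" (or just "data"), never "&data" */
	return sum(&data[0], 4) + (int) strlen(buf);
}
```

Here demo() fills the array with 1 through 4, so the sum is 10, and
the length of "SubC" adds 4. Note what the fragment avoids: no
structs, no typedef, no unsigned or long, no declarations inside
nested blocks, and no blank between "#" and "define".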