README - Book Version

This source file is part of the SubC compiler, which is described in the book
Practical Compiler Construction.

You might prefer to download the compiler source code. It is in the public domain.

	SubC Compiler, Version 2012-03-08
	By Nils M Holm, 2011--2012
	Placed in the public domain


	SubC is a compiler for a (mostly) strict and sane subset of
	C as described in "The C Programming Language", 2nd Ed.
	The language is also known informally as "ANSI C" or "C89".

	The compiler is described in great detail in the book
	"Practical Compiler Construction". See the end of this text
	for ordering information.

	The SubC compiler can compile itself. Unlike many other small C
	compilers, though, it does not bend the rules. Its code passes
	"gcc -Wall -pedantic" with few or no warnings (depending on
	the gcc version).

	The compiler generates code for GAS-386, the GNU assembler
	for the 386 processor. Its runtime environment is designed
	to run on FreeBSD systems, but it should be easy to port to
	other unixish 32-bit systems. Even porting it to a 64-bit
	platform should not be too hard.

	SubC is fast and simple. Its output is typically small (due
	to a non-bloated library), but not very runtime efficient,
	because it employs none of the code synthesis or optimization
	strategies explained in the book.


	(From Practical Compiler Construction)

	o  The following keywords are not recognized:
	   auto, const, double, float, goto, long, register, short,
	   signed, struct, typedef, union, unsigned, volatile.
	o  There are only two data types: the signed int and the
	   unsigned char; there are also void pointers, and there
	   is limited support for int(*)() (pointers to functions
	   of type int).
	o  No more than two levels of indirection are supported, and
	   arrays are limited to one dimension, i.e. valid declarators
	   are limited to x, x[], *x, *x[], **x (and (*x)()).
	o  K&R-style function declarations (with parameter
	   declarations between the parameter list and function body)
	   are not accepted.
	o  There are no "register", "volatile", or "const"
	   variables. No register allocation takes place, so all
	   variables are implicitly "volatile".
	o  There is no typedef.
	o  There are no unsigned integers and no long integers.
	o  There are no structs or unions.
	o  Only ints, chars and arrays of int and char can be
	   initialized in their declarations; pointers can be
	   initialized with 0 (but not with NULL).
	o  Local arrays cannot have initializers.
	o  There are no local externs or enums.
	o  Local declarations are limited to the beginnings of function
	   bodies (they do not work in other compound statements).
	o  There are no static prototypes.
	o  Arguments of prototypes must be named.
	o  There is no goto.
	o  There are no parameterized macros.
	o  The #error, #if, #line, and #pragma
	   preprocessor commands are not recognized.
	o  The preprocessor does not recognize the # and ## operators.
	o  There may not be any blanks between the # that introduces
	   a preprocessor command and the subsequent command (e.g.:
	   "# define" would not be recognized as a valid command).
	o  The sizeof operator is limited to types and single
	   identifiers; the operator requires parentheses.
	o  The address of an array must be specified as "&array[0]"
	   instead of "&array" (but just "array" also works).
	o  Subscripting an integer with a pointer (e.g. 1["foo"]) is
	   not supported.
	o  Function pointers are limited to one single type, int(*)(),
	   and they have no argument types.
	o  There is no assert() due to the lack of parameterized macros.
	o  The atexit() mechanism is limited to one function (this may
	   even be covered by TCPL2).
	o  Environments of setjmp() have to be defined as
	   int[_JMP_BUFSIZ]; instead of jmp_buf due to the lack of
	   typedef.
	o  FILE is an alias of int due to the lack of typedef.
	o  The signal() function returns int due to the lack of a more
	   sophisticated type system; the return value must be cast to
	   int(*)().
	o  Most of the time-related functions are missing due to the lack
	   of structs; in particular: asctime(), gmtime(), localtime(),
	   mktime(), and strftime().
	o  The clock() function is missing, because CLOCKS_PER_SEC
	   varies among systems.
	o  The ctime() function ignores the time zone.
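
	As a rough illustration of the restrictions listed above, the
	following sketch stays inside the subset as it is described
	here: only int and char, one-dimensional arrays, named
	prototype arguments, locals at the top of function bodies,
	sizeof with parentheses, and the single function pointer type
	int(*)(). The names sum, answer, call, and demo are made up
	for this example and do not come from the SubC sources. The
	code is also valid standard C; it has not been run through
	SubC itself, so treat it as an assumption that SubC accepts it.

```c
#include <string.h>

#define NVALUES 4	/* object-like macro; parameterized macros are not available */

/* Prototypes must name their arguments. */
int sum(int *v, int n);
int answer(void);
int call(int (*fn)());

/* Global arrays of int may be initialized; local arrays may not. */
int values[NVALUES] = { 1, 2, 3, 4 };

int sum(int *v, int n) {
	int i, total;	/* locals are declared at the top of the body */

	total = 0;
	for (i = 0; i < n; i++)
		total = total + v[i];
	return total;
}

int answer(void) {
	return 42;
}

/* The only function pointer type is int(*)(), without argument types. */
int call(int (*fn)()) {
	return (*fn)();
}

int demo(void) {
	int n, len;
	char buf[16];	/* local array without an initializer */
	char *p;

	strcpy(buf, "SubC");
	p = &buf[0];	/* "&buf" is not accepted; plain "buf" would also work */
	len = 0;
	while (*p) {
		len++;
		p++;
	}
	/* sizeof takes a type or a single identifier, in parentheses */
	n = sum(values, sizeof(values) / sizeof(int));
	return n + call(answer) + len;
}
```

	With the values above, demo() adds the array sum (10), the
	result of the indirect call (42), and the string length (4).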


	On a FreeBSD system just type "make".

	Without "make" the compiler can be bootstrapped by running:

	cc -o scc0 *.c

	To compile and package the runtime library:

	./scc0 -c lib/*.c
	ar -rc lib/libscc.a lib/*.o
	ranlib lib/libscc.a

	To compile the startup module:

	as -o lib/crt0.o lib/crt0.s

	To test the compiler either run "make test" or perform the
	following steps:

	./scc0 -o scc1 *.c
	./scc1 -o scc *.c
	cmp scc1 scc

	There should not be any differences between the scc1 and scc
	executables.
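
	The final "cmp" step checks that the compiler built by scc0
	and the compiler built by scc1 are byte-for-byte identical.
	A minimal sketch of such a comparison in standard C follows;
	files_differ is a hypothetical name, and the code uses FILE *
	as in full C89 rather than the SubC alias of int. It is meant
	to show the idea, not to replace cmp.

```c
#include <stdio.h>

/* Return 0 if both files have identical contents, 1 if they
   differ, and -1 if either file cannot be opened. */
int files_differ(char *path_a, char *path_b) {
	FILE *a, *b;
	int ca, cb, result;

	a = fopen(path_a, "rb");
	if (a == NULL)
		return -1;
	b = fopen(path_b, "rb");
	if (b == NULL) {
		fclose(a);
		return -1;
	}
	result = 0;
	do {
		ca = fgetc(a);
		cb = fgetc(b);
		if (ca != cb)	/* also true when only one file has ended */
			result = 1;
	} while (result == 0 && ca != EOF);
	fclose(a);
	fclose(b);
	return result;
}
```

	A call such as files_differ("scc1", "scc") returning 0 would
	correspond to cmp reporting no differences.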


	If you want to install the SubC compiler on your system, you
	will have to change the SCCDIR variable, which points to the
	base directory containing the SubC headers and runtime library.
	SCCDIR defaults to "." and can be overridden on the command
	line:

	./scc1 -o scc -D 'SCCDIR="$INSTALLDIR"' *.c

	(where $INSTALLDIR is where the compiler will be installed.)

	You can place the 'scc' executable wherever you want. The
	headers go to $INSTALLDIR/include, the library 'lib/libscc.a'
	and the startup module 'lib/crt0.o' go to $INSTALLDIR/lib.
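
	The way a -D option can replace a built-in default may be
	sketched as follows. This is an assumption about the general
	technique, not a copy of the SubC sources: the #ifndef guard
	and the helper scc_path are hypothetical and only illustrate
	how paths such as $INSTALLDIR/include and $INSTALLDIR/lib
	could be derived from SCCDIR.

```c
#include <stdio.h>
#include <string.h>

/* Assumed default; -D 'SCCDIR="..."' on the command line
   would define SCCDIR first and skip this definition. */
#ifndef SCCDIR
#define SCCDIR "."
#endif

/* Hypothetical helper: write "<SCCDIR>/<sub>" into buf, which
   holds n chars. Returns 0 on success and -1 if the result
   (including the terminating NUL) would not fit. */
int scc_path(char *buf, int n, char *sub) {
	if ((int) (strlen(SCCDIR) + 1 + strlen(sub) + 1) > n)
		return -1;
	sprintf(buf, "%s/%s", SCCDIR, sub);
	return 0;
}
```

	Compiled without -D, scc_path(buf, 64, "include") would
	produce "./include", which matches the default of "."
	described above.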

	To test the installation just re-compile the compiler:

	rm scc && scc -o scc *.c


	For a thorough explanation of the internals of the compiler,
	including the theoretical background, see my book

	Practical Compiler Construction

	which is available as a paperback and as a PDF.


	Send feedback, suggestions, etc to:

	n m h @ t 3 x . o r g

	See for current ways through my
	spam filter.
