C code for 4kb intros

Intro

You see a lot of discussion of compression, algorithmic design and PE header munging. However there is very little on how to code good tight C. Infact a search around the internet reveals very little information at all on this subject. Yet in experiments, I’ve found that I can save ~20% of compressed code space by writing C code specifically designed to compile to small assembler. Also in practice I have found that example source code demos do not follow such rules so there is probably a lack of knowledge about such things.

Firstly, four golden rules.

Goldens

Actually the truth is that something that seems obvious to a human is not obvious to a machine. Where possible do the optimisation yourself (yes, really).

There may be a flag to optimise for small code but its not where the compiler writers put most of their efforts. You need to do the optimisation yourself (yes, really).

Get used to life becoming more complex. No more For loops. 2D arrays are a bad thing. No more sloppy code, use every declarative construct (except register :-) that you can.

Not strictly true, and the law of diminishing returns kicks in, but hold to this principle with blind faith and the code size will miraculously drop and drop and drop. You will know when its not going to go any further but believe me its a long way from the code you first wrote.

C Rules (but doesn’t Rock)

Everything down here has been learnt when using GCC. VC++ may be smarter or have different rules so be careful.

static void myFunc (void) {};
constant float myarray[]={0.0,1.0,0.0,17.0};
--n; // NOT n--;
do {} while (--n); // NOT for loop or while loop

is the best sort of loop to write. One exception is that these two are the same size:

for (i=0; i<10; i++) {}
n = 10;
do { } while (n--); 

Indexing a 2d array introduces a multiply. 3d are worse.

a[3][3]=b[3][3]*d[2][2]; // This is VERY BAD
register int i;  // this has NO EFFECT

Anything done with shorts produces longer code than with ints. The exception would be calling glColor4ubv with stored byte variables (or even one integer :-)

do { *p++ = *q++ } while (--n);

(unless you use memcpy of course)

This reduces the load/stores as the compiler can keep a variable in a register

This allows the compiler to assign one area of memory for all your variables in one go, not multiple leading to much shorter code

The compiler may not spot that two constants can be combined unless they are moved together in the code. This means making them the same order of precedence of course too.

p=q=r=s=0.0;
pos+=vel+=acc;

This allows the compiler to load a constant value to a register just once.

if (a^b)...; // Not if (a==b)...
a &= ~b;     // Not a%=b; useful when b is a power of two only and a is monotonically increasing
a ^= a;      // Not a=0;

Go ahead look up strength reduction, there is a lot of compiler work on this and lots of tricks.

doSomething(3); // this is better than: 
doSomething(2); // i=3;
doSomething(1); // do {} while (i--);

The effect of this is variable but generally unrolling 2-3 times will be better than writing a loop, no matter how efficient your loop is.

x*=0.0008f; // not x*=0.0008;

Coding for compression

I have discovered very little here but some rules are:

In other words if you have an array: {0.1, 0.2, 0.11, 0.3, 0.29} perhaps it will work just as well with {0.1, 0.2, 0.1, 0.3, 0.3} I saved 15 bytes once by altering one number!

Improving compressibility by consistency

Something worth a note…

When trying to minimize the size of the compressed code, it is often useful to be consistent with the code. That is, choose to do the same things always with the same code, so that there’s more repetition for the compressor to take advantage from.

When coding in asm, this means things like “always use the same register for the same purpose”. However, with C you tend to lose the control because C compilers usually just allocate the registers in a predefined order. Some compilers, however, let the programmer give hints on what registers to use, and you can put these hints consistently into your source code.

An example with GCC and x86:

#define isCtr asm("ecx")
#define isAcc asm("eax")
register int i isCtr;
i = 256;
do {
   register int a isAcc;
   a = *s++^p;
   *d++ = a;
} while (--i);

Note: with GCC and x86, registers other than eax, ecx and edx will be saved with separate stack pushes, so you may want to recheck all manual allocations to registers other than these.

Checking all compiler optimizations individually also helps to improve compressibility. For example, the GCC flags -O2 and -Os activate the “schedule-insns” optimizations that reorder the instructions and therefore break the consistent chunks you have carefully built. (In the example code above, the incs get separated from the related memory reads and writes when using these optimizations). This may be one of the reasons why -O1 has been noted to be better than -Os.

–viznut

Some OpenGL Tricks