Linux (static executables and obsolete or niche tricks)

4K programming without libraries

Linux (like most other Unix-like systems) has the advantage over many other operating systems that you don’t need to use any libraries at all if you don’t want to, because everything can be done through kernel-level system calls (the int 0x80 instruction in 32-bit x86 assembly, or syscall in 64-bit mode). This is a totally different approach from the previously mentioned one.

Advantages compared to using libraries

Disadvantages

How to do it

The exact mechanics of how to make a syscall can be found in the “Whirlwind Tutorial on Creating Teensy ELF Executables”.

Audio output can be achieved easily by creating a pipe (using the pipe syscall), forking, using dup2 in the child to turn the read end of the pipe into its stdin, and then calling execve on /usr/bin/aplay. The parent process can then simply write PCM data to the pipe, and sound will appear. Here is a prod that does just this.
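The order of the calls can be sketched in Python, whose os module maps almost one-to-one onto the syscalls named above. This only illustrates the control flow and is not size-optimised code: a real intro would issue the same syscalls from assembly. The helper names (spawn_player, write_all, square_wave, play) and the explicit sample-format flags passed to aplay are assumptions made for this example.

```python
import os

def spawn_player(argv):
    """Fork a child whose stdin is the read end of a new pipe.

    Returns (child_pid, write_fd) to the parent.
    """
    read_fd, write_fd = os.pipe()        # pipe(2)
    pid = os.fork()                      # fork(2)
    if pid == 0:                         # child process
        try:
            os.dup2(read_fd, 0)          # dup2(2): pipe read end becomes stdin
            os.close(read_fd)
            os.close(write_fd)           # otherwise the child never sees EOF
            os.execv(argv[0], argv)      # execve(2); returns only on failure
        finally:
            os._exit(127)
    os.close(read_fd)                    # parent keeps only the write end
    return pid, write_fd

def write_all(fd, data):
    """write(2) may accept fewer bytes than offered, so loop until done."""
    while data:
        data = data[os.write(fd, data):]

def square_wave(seconds=1, rate=8000, freq=440):
    """Unsigned 8-bit mono PCM square wave (hypothetical test signal)."""
    period = rate // freq
    return bytes(255 if i % period < period // 2 else 0
                 for i in range(seconds * rate))

def play(pcm):
    """Pipe raw PCM into aplay; assumes /usr/bin/aplay exists."""
    pid, fd = spawn_player(["/usr/bin/aplay", "-q", "-t", "raw",
                            "-f", "U8", "-r", "8000", "-c", "1"])
    write_all(fd, pcm)
    os.close(fd)                         # EOF tells aplay to finish
    os.waitpid(pid, 0)
```

Calling play(square_wave()) should produce a one-second tone on a machine with ALSA set up. Closing the unused pipe ends on both sides matters: if the child keeps its copy of the write end open, it never receives end-of-file.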

The visual side is more complicated. Framebuffer devices (/dev/fb* for pixels, /dev/vcsa for tiles/’characters’, or direct /dev/dri/card0 access) are simple to use for pixel rendering, but they’re not easily accessible on desktops or laptops. Therefore, if you want to do decent visuals in a compatible way (and text terminal output is not enough for you), you must learn how to talk the X11 protocol at a low level. Basically, you establish a connection to localhost:6000, send the initial request, grab the IDs you require from the response, open a window, and so on. It may be a bit long-winded, but you also get low-level access to extensions such as XVideo and GLX. However, if you want access to the GPU (and don’t want to code your own GPU drivers), the library-free approach isn’t really an option, as the GPU drivers need to be loaded from dynamic libraries.
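The first step of that conversation, the connection setup, can be sketched as follows. This is a minimal illustration of the byte layout in Python, not intro-ready code. It assumes a little-endian client, and the function names are made up for this example. Only the first three fields of the success reply are decoded; the rest (vendor string, pixmap formats, screens) is left unparsed.

```python
import struct

def pad4(data):
    """Pad to a multiple of 4 bytes, as the X11 wire format requires."""
    return data + b"\0" * (-len(data) % 4)

def setup_request(auth_name=b"", auth_data=b""):
    """Build the client's initial connection setup request."""
    header = struct.pack(
        "<BxHHHHxx",
        0x6C,              # 'l': least-significant byte first
        11, 0,             # protocol major and minor version
        len(auth_name),    # length of the authorization protocol name
        len(auth_data),    # length of the authorization data
    )
    return header + pad4(auth_name) + pad4(auth_data)

def recv_exact(sock, count):
    """Read exactly `count` bytes from a stream socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("server closed the connection")
        data += chunk
    return data

def parse_setup_reply(head, body):
    """Return (resource_id_base, resource_id_mask) from a success reply."""
    status = head[0]
    if status != 1:        # 0 = failed, 2 = further authentication needed
        raise ConnectionError("X11 setup refused: %r" % body)
    _release, id_base, id_mask = struct.unpack_from("<III", body, 0)
    return id_base, id_mask

def handshake(sock, auth_name=b"", auth_data=b""):
    """Send the setup request and read the variable-length reply."""
    sock.sendall(setup_request(auth_name, auth_data))
    head = recv_exact(sock, 8)
    words = struct.unpack_from("<H", head, 6)[0]   # extra length, 4-byte units
    return parse_setup_reply(head, recv_exact(sock, 4 * words))
```

With a server that accepts TCP connections, handshake(socket.create_connection(("localhost", 6000))) would return the base and mask from which the client picks IDs for windows and other resources. On many current setups the server may only listen on a Unix socket and may require an authorization cookie, so treat the connection details as an assumption to verify on the target machine.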

Self-compiling

Another approach usable for 4K intros on Unix platforms is self-compiling. That is, you actually distribute the intro as an executable source code package that decompresses, compiles, executes and deletes the actual intro. A shell script stub attached to the compressed source code of an SDL-based intro could perhaps be something like this:

a=/tmp/I;sed 1d $0|zcat>$a.c;cc -o $a $a.c `sdl-config --cflags --libs`;$a;rm $a $a.c;exit

Advantages

Disadvantages

Fighting against the C compiler

See the main Linux page

Dynamic ELF, using libraries

As ‘vanilla’ ELF files can get quite large because of the import and relocation tables, a few tricks need to be used to decrease the file size.

Approaches

Postprocessing GNU ld output

This is arguably the easiest and most stable option, but isn’t the most effective, size-wise. However, ld doesn’t like to output small binaries, so you have to bully it a little bit:

  1. Use a custom linker script that merges the .text (code) and .rodata (read-only data) sections and discards .gnu.warning, .note.*, .gnu_debuglink, .comment, .gnu.attributes, .eh_frame*, .gcc_except*, .gnu_extab*, .exception*, etc. Get the default linker script using -Wl,--verbose, then modify it. See the GNU ld docs for the syntax.
  2. Remove symbol version data using objcopy -R .gnu.version -R .gnu.version_r.
  3. Run noelfver on the output of objcopy. This actually disables the use of the versioning info.
  4. strip -s. This removes symbol and debug data etc, but leaves the section headers in. Also removes remaining versioning data now that it is no longer used thanks to noelfver.
  5. sstrip -z (from ELF Kickers). This also removes the section headers; -z additionally discards trailing zero bytes.
  6. chmod -x
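To check whether the section headers are really gone after steps 4 and 5, one can read the relevant fields straight from the ELF header. The sketch below does this in Python. The function names are made up for this example, and it assumes the stripping tool clears e_shoff and e_shnum when it drops the table, which is worth confirming against your own binaries.

```python
import struct

def section_header_info(image):
    """Return (e_shoff, e_shnum) from the ELF header of `image` (bytes)."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    is_64 = image[4] == 2                     # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if image[5] == 1 else ">"    # EI_DATA: 1 = little-endian
    # Fields after the 16-byte e_ident: e_type, e_machine, e_version, e_entry,
    # e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize,
    # e_shnum, e_shstrndx.
    layout = "16xHHIQQQIHHHHHH" if is_64 else "16xHHIIIIIHHHHHH"
    fields = struct.unpack_from(endian + layout, image, 0)
    return fields[5], fields[11]

def has_section_headers(image):
    """True if the header still points at a section header table."""
    e_shoff, e_shnum = section_header_info(image)
    return e_shoff != 0 and e_shnum != 0
```

For example, has_section_headers(open("intro", "rb").read()) on a hypothetical output file named intro would be expected to be true after strip -s and false after sstrip.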

Using a special-purpose minimizing linker

See the main Linux page

Compression

A quick and easy way to compress your executable is to use a shell script dropper: a small shell script that unpacks and runs itself, followed directly by your compressed intro.

Using a shell dropper, you can compress your intro using only tools whose decompressors are installed by default: gzip, xz, lzma, bzip2, etc. Gzip (gzip -cnk9 for compression, zcat for decompression) usually isn’t very good. zopfli might get you slightly better results, but xz (xz for compression, xzcat for decompression, see the manpage for flags) is usually much better. However, standard xz files have large headers compared to gzip. Therefore, it’s a better idea to use xz with the legacy LZMA1 format (xz --format=lzma, with --lzma1 to tune the filter options), whose compression is slightly worse, but whose headers are much smaller. At these small sizes, the LZMA1 output usually ends up slightly smaller than the xz output because of this.
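The header overhead is easy to measure with Python’s standard gzip and lzma modules, which write the same container formats as the command-line tools. This is a rough sketch for comparison only: the function name is made up, and the command-line tools, with hand-tuned options, may produce somewhat different sizes.

```python
import gzip
import lzma

def packed_sizes(payload):
    """Compressed size in bytes of `payload` in three container formats."""
    preset = 9 | lzma.PRESET_EXTREME
    return {
        # gzip container; mtime=0 keeps the output reproducible
        "gzip": len(gzip.compress(payload, compresslevel=9, mtime=0)),
        # .xz container (LZMA2 plus stream header, index and footer)
        "xz": len(lzma.compress(payload, format=lzma.FORMAT_XZ,
                                preset=preset)),
        # legacy .lzma container (LZMA1 with a short header)
        "lzma1": len(lzma.compress(payload, format=lzma.FORMAT_ALONE,
                                   preset=preset)),
    }
```

Running packed_sizes(open("intro", "rb").read()) on a hypothetical binary named intro shows how much of each result is container overhead. For inputs of a few kilobytes, the difference between the "xz" and "lzma1" entries is mostly header size.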

The smallest shell dropper I’ve come across is this one, which takes 42 bytes:

cp $0 /tmp/M;(sed 1d $0|xzcat)>$_;$_;exit
<compressed data>

Of course, this assumes that /tmp is executable. One could change the /tmp/M to just M to put it in the current directory, but that isn’t guaranteed to be writable either!
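Gluing the stub and the compressed data together can be sketched as below. The stub is the one shown above. The function names and the use of Python’s lzma module in place of the xz command-line tool are assumptions made for this example.

```python
import lzma
import os

# The shell stub from above, including its terminating newline.
STUB = b"cp $0 /tmp/M;(sed 1d $0|xzcat)>$_;$_;exit\n"

def build_dropper(payload):
    """Return the stub followed by `payload` packed as a legacy .lzma stream."""
    packed = lzma.compress(payload, format=lzma.FORMAT_ALONE,
                           preset=9 | lzma.PRESET_EXTREME)
    return STUB + packed

def write_dropper(path, payload):
    """Write the dropper to `path` and mark it executable."""
    with open(path, "wb") as out:
        out.write(build_dropper(payload))
    os.chmod(path, 0o755)
```

Everything after the first newline is the compressed stream, which is why the stub can recover it with sed 1d. The stub must stay on a single line for that to work.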

However, there is a known ‘fix’ for this: using an in-memory dropper written in assembly. vondehi is currently the best of those. It basically does the same things as the shell script, except everything is performed in RAM. Simply take the output of the NASM file and append your compressed data.

If you want to tune the compression parameters to get a slightly smaller output file, there’s a script called autovndh.py that bruteforces all options and can optionally prepend a vondehi stub.

This still isn’t the best compression ratio possible, however. Crinkler on Windows performs much better because it uses a PAQ-style algorithm (as opposed to LZMA), but equivalent tooling for Linux isn’t finished yet. Work has been done towards a Crinkler equivalent: there are some obsolete tools that do this, as well as a current work-in-progress project called XLINK by unlord/xylem. The latter isn’t usable yet, though.