I’m primarily a graphics code guy. Recently I burned more than a month trying to write a synth with Saida for a 4k demo. I started from scratch, knowing nothing at all, and the final version of the synth has been written 6 times. It’s been long and painful, and there is really very little written on this subject, so here is an article to help those of you going through the same thing.
Learning to write a synth often requires learning what a synth is first. I had no idea what a voice, an oscillator or a tracker was 6 weeks ago. Now I’m fairly clear, but knowing before I started might have made reading articles much easier. I hope this article helps you.
Let’s first identify the parts of a 4k synth that you might need. Most of this is my own terminology, but hopefully it’s not too far from what you need to know to read articles. Simply put, there are two parts to a synth: something that generates sounds and something that plays music. I distinguish sounds from music. Apart from the obvious reason (a bell is a sound but it’s hardly music), it’s a useful way to think about your synth. I’ve heard amazing music produced from simple sounds like the Windows boot sound cut into pieces, and on the other hand, I’m perfectly able to create really bad music if you give me the best sound generator in the world!
Sound components of a synth:
Music components of a synth:
There is a third element to a synth project: tools. Saida created two tools for musicians to use with the synth. The first creates voices; we call it the instrument builder. It allows the musician to combine sounds in meaningful ways. The second is a tracker, which allows a musician to create music or, if you like, a sequence of notes. This is the most important tool for musicians. Often, outside of the 4k world, they don’t create instruments at all but use samples (recordings of instruments). However, this is 4k and samples are out of the question. So, tools:
Let’s say right away that there are two basic options for sound generation: midi or roll-your-own. There are still 4k demos that use midi (Purple Pills), though generally this is frowned upon and will result in bad votes most of the time. Midi gives us a set of voices or instruments to use, so we don’t need to write a synth core or fx engine, and we don’t need to create voices.
NOTE: For a time, DirectSound changed everything. 80% of the 4k intros at Assembly 2006 appeared to have used DirectSound with gm.dls, a General MIDI sample file standard on all Windows platforms. It seems to originate from Roland synths and is high quality (or at least better than normal midi), and gm.dls with clever postprocessing from DirectSound is still a very acceptable sound source today. The rest of this article, however, will focus primarily on building a synthesizer. If you would like to see some information regarding gm.dls and how to use it, iq has written a fantastic article here.
Of course, nothing worth doing is easy, so non-gm.dls midi is almost entirely rejected at 4k now. Instead we create our own synth core and fx engine. The ruling 4k synth seems to be the one by Mentor, which can be heard in productions like receptor & elevated.
Notes must be stored, perhaps with volume, and certainly with voices (which instrument the note is played on). We may also wish to indicate where the note is to be played (left, right, rear?). There are maybe two choices here. Firstly, we could consider midi, which is not only a set of voices but also has commands for playing notes at specific frequencies with all of the above information.
The important point is that you don’t have to use midi voices to use midi commands for tracking.
In fact, using midi tracking data is quite common for 4k synths. Sources are available for such synths (see the resource section).
The second choice is to use your own format. In the end we chose this technique. By doing so we could get a far greater density of notes using hand-crafted compression than we observe in typical 4ks these days. One example might be a drum track with simple on-off beats. The drum is the same frequency and volume every time it is hit, so it could be represented as a simple binary pattern (1000111011001010). Relying on the compressor to do this would be unwise, as this could be a very poor sequence of bytes to compress.
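The on-off drum idea above can be sketched in a few lines of C. The one-word-per-16-steps layout and the names here are my own illustration, not our synth’s actual format:

```c
#include <stdint.h>

/* Unpack a 16-step drum pattern stored as one 16-bit word
   (bit 15 = first step). Writes 1 for "hit", 0 for "rest". */
void unpack_pattern(uint16_t bits, uint8_t steps[16])
{
    for (int i = 0; i < 16; i++)
        steps[i] = (bits >> (15 - i)) & 1;
}
```

The article’s example pattern 1000111011001010 would be stored as the single word 0x8ECA — two bytes for a whole bar, before the compressor even sees it.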
For a midi solution, playback is simply submitting the correct midi commands at the right times to the midi player (your SoundBlaster card). You would create a separate thread with high priority and play the commands in real time. Easy. In fact, here is an example of using this technique.
For our own synth engine, though, we would most likely choose either the DirectSound API or the much simpler PlaySound. Whenever you see 4k demos sitting for a while before playing, it’s a good bet they are using PlaySound. PlaySound works by taking an entire wav file of sound and playing it in one go. Thus, at the start of the demo, it’s necessary to create this wav file, which means a long wait while the wav file buffer fills up.
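For the PlaySound route, the “wav file” can live entirely in memory: render all your samples, prepend a standard 44-byte PCM WAV header, and hand the whole buffer to PlaySound with the SND_MEMORY flag. A sketch of building that header (assuming a little-endian machine, 16-bit mono PCM; the function name is mine):

```c
#include <stdint.h>
#include <string.h>

/* Fill a 44-byte canonical WAV header for 16-bit mono PCM.
   PlaySound(buf, 0, SND_MEMORY) can then play header+samples in one go.
   The multi-byte fields are memcpy'd raw, so this assumes little-endian. */
void wav_header(uint8_t h[44], uint32_t sample_rate, uint32_t num_samples)
{
    uint32_t data_bytes = num_samples * 2;            /* 16-bit mono      */
    uint32_t riff_len   = 36 + data_bytes;
    uint32_t fmt_len    = 16;
    uint32_t byte_rate  = sample_rate * 2;
    uint16_t pcm = 1, mono = 1, bits = 16, block_align = 2;

    memcpy(h,      "RIFF", 4);      memcpy(h + 4,  &riff_len, 4);
    memcpy(h + 8,  "WAVEfmt ", 8);  memcpy(h + 16, &fmt_len, 4);
    memcpy(h + 20, &pcm, 2);        memcpy(h + 22, &mono, 2);
    memcpy(h + 24, &sample_rate, 4);
    memcpy(h + 28, &byte_rate, 4);
    memcpy(h + 32, &block_align, 2);
    memcpy(h + 34, &bits, 2);
    memcpy(h + 36, "data", 4);      memcpy(h + 40, &data_bytes, 4);
}
```

In a 4k you would of course generate this header with as little code as possible, or store the fixed parts as data and patch only the lengths.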
A better option is to use DirectSound and provide buffers to it. Again, a high-priority thread would fill buffers, almost in real time, perhaps with a safety margin at the start, and submit those to DirectSound. DirectSound also has some fx built in, such as stereo reverb. Northern Dragons use this technique in their synth. One very strong advantage here is that if short buffers are used, synchronisation between sound and graphics is easy. This is much harder with the PlaySound technique.
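The core of the streaming approach is deciding which part of the buffer is safe to refill. A common scheme is a double buffer: the filler thread polls the play cursor and rewrites whichever half is not currently being played. A minimal sketch of that decision (names are mine; in a real DirectSound loop this would sit between GetCurrentPosition and Lock/Unlock calls on the secondary buffer):

```c
/* Given the play cursor position in a double buffer of 'buf_size'
   bytes, return the byte offset of the half that is safe to refill
   (the half the cursor is NOT currently playing). */
unsigned safe_half(unsigned play_cursor, unsigned buf_size)
{
    return (play_cursor < buf_size / 2) ? buf_size / 2 : 0;
}
```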
Your instrument builder will most likely be proprietary. However, the tracker is more complicated. Musicians have their own favourites, such as http://www.flstudio.com/. It is possible to develop plugins for these trackers, for which the musicians will love you, but be careful. There are several problems with this approach:
We tried this and rejected it for these two reasons. However, one great advantage of doing it this way is that to include the output music data in your 4k as a large array, you simply include it as a header file and let Crinkler or your favourite crunching tool do its work.
Another advantage for the musician is a familiar interface.
Your other option here is something proprietary. Here you can really control the bytes, even offering a byte count to help your musician know where he or she is going. This is perhaps the only advantage of something proprietary, but at 4k, it’s a big one.
Probably at 4k you will be coding graphics in C or perhaps asm. I know 5 synths designed for 4k and, with one exception, they have all been recoded in asm. If you know my articles at all, you know I try to do everything in C. However, in the synth core, there are routines written in asm by Saida just to save bytes. Overall, where we introduced asm, we saved about 10% of the bytes. I guess, but don’t know, that the saving would be larger (15%?) if everything were recoded in asm, as we could load registers intelligently with data.
For the tools, Saida chose Delphi. This allowed very quick development of the GUI but did cause one problem. I use GCC, under Windows, for the synth core, and we needed to create a DLL for the Delphi code to access. Many hours were wasted trying to get the DLL to load into Delphi. There was very little documented on the web about this problem. After fruitlessly tuning compile flags, defines and so forth, I eventually discovered that removing all const and static declarations from function headers and parameters allowed Delphi to use the DLL.
See my article on coding dlls for delphi elsewhere.
It is possible to create a synth with just integers. For example, the synth in anorgatronikum could all be done in integer form (I don’t know if it was, but it could be). It’s possible also to create oldskool sounds from pure integer arithmetic (chippy, if that’s a good word). Integers are also fine for high-frequency (high-pitch) sounds.
However, we chose floating point accuracy for three reasons:
However, although we chose float, I have a nagging doubt. It’s possible to limit the range of the floating point numbers used in a synth quite dramatically. For example, you may only ever do floating point calculations in the range -1..1. With such a limited range, scaled ints (or fixed point) would do a terrific job of maintaining accuracy. I’d be very interested in chatting with you if you have experience with this in your synth.
Floating point code uses a lot of bytes compared to integer code, and we had to cut quite a few features from the synth core to make it small enough.
The information I’ve been able to gather on 4ks suggests the following rough guidelines for compressed code:
Total: about 1.3k
These are rough, and there are swings and roundabouts. For example, a really sophisticated renderer might mean less music data and vice versa. Roughly speaking, though, you can think of it as 50% of your bytes for sound generation and 50% for music.
A synth core can be built in two ways. Many 4ks use fixed-functionality sounds (hardcoded, if you will). For example, a drum can be hardcoded but have a number of parameters that allow it to sound different. A few hardcoded instruments with parameters (voice data) can go a long way at 4k.
An alternative is to try to reproduce a basic synthesiser, the components of which are roughly:
An oscillator is really just a mathematical function. It’s best to think of it as having an input range of 0..1 and an output in the same range (though this could be any output range you choose). One example could be:
sinosc = (sin(freq) + 1.0f) / 2.0f;
This returns the sin of the input, but mapped to the 0..1 range.
It’s possible just to use sin. Many sin waves can produce great effects. Noise or square waves can be achieved with just sin waves. If you are trying to achieve ambient music, then sin may be enough. However, oscillators can be coded very cheaply, and 7 are included in my synth at 600 bytes. Simple oscillators are sin, square wave, sawtooth, triangle, noise and so on.
We need more than one sound in our voices. One instance of an oscillator is no use; we need many of them, combined in some way. Even if we only use a sin oscillator, we need many to make interesting sounds. How many is many? Well, 3-4 seems to suffice at 4k.
This part of a synth is not always given much thought; I’ve seen no articles on it. Standard Fourier theory says that a good way to combine oscillators would be repeated addition of the following:
amplitude * oscillator(freq + offset), i.e. a * sin(b*x + c);
That’s an awful lot of variables to store and compute. I tried this and decided that, although “correct” in some sense, it’s too heavy for a 4k synth core in 600 bytes.
An alternative is not to add the oscillators but to multiply them together. This is where oscillators returning a value between 0..1 are useful: the output of one oscillator can be multiplied by another, and we are still in the same range. In this scenario, amplitude doesn’t make sense, so we can cut that from the voice data.
However, the problem is that if you keep multiplying functions in the range 0..1 together, things become very quiet very quickly. Worse, instruments vary a lot in volume depending on how many oscillators they contain.
In the end my solution was to add oscillators using a very simple equation:
finaloutput = osc1(freq) + osc2(freq) + osc3(freq) + osc4(freq);
finaloutput = finaloutput * volume;
Nothing fancy. I added quite a few different oscillators to compensate a little but the real key was adding some cool effects.
One note: there is another way to combine oscillators, function of function. For example, we could do:
finaloutput = osc1(osc2(osc3(osc4(freq))));
However, I didn’t explore this option except to put a tangent function on the very outside. This gave some amazing bounce to drum sounds, but in the end I didn’t use it.
Once again, there are many solutions. The envelope is important. No matter what oscillators are used, an envelope gives an overall style to your sound. Briefly, an envelope is an amplitude multiplier. It makes your voices quiet when they start, loud in the middle and quiet when they end.
I can’t say this too strongly: make sure your voices start at zero volume, ramp up in volume and then ramp back down to zero. If not, you will get clicking or slapping sounds, and it will be offensive to listeners. Do NOT skip the envelope.
Firstly, here is a typical full synth envelope: http://en.wikipedia.org/wiki/ADSR. It has four phases and is often shown with linear gradients. There are synthesisers at 4k which implement this. In my opinion, it’s overkill and a waste of bytes at 4k.
Northern Dragons describe their system: they ramp up linearly and then use a quarter sin wave to ramp down. As they say, it gives a nice non-linear fall-off. The idea could be extended to use half a sin wave for both ramp up and ramp down. However, drums require volume to ramp up quickly, so this would only work in some circumstances.
In our synth we simplified the ADSR envelope to attack, sustain and release only, and allowed these to be linear or exponential in shape. The algorithm for doing this in very few bytes is one of the coolest parts of the synth. Storing a complete envelope was also possible in just one byte!
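A minimal sketch of such an attack/sustain/release envelope follows. To be clear, this is my own guess at the shape of the scheme, not the synth’s actual algorithm or byte packing:

```c
/* Attack/sustain/release envelope over a note of unit length.
   t is the position in the note, 0..1; a and r are the attack and
   release lengths as fractions of the note. If 'exponential' is
   nonzero, the ramps are squared for a softer, non-linear curve. */
float envelope(float t, float a, float r, int exponential)
{
    float v;
    if (t < a)             v = t / a;            /* attack  */
    else if (t > 1.0f - r) v = (1.0f - t) / r;   /* release */
    else                   v = 1.0f;             /* sustain */
    if (exponential)
        v *= v;
    return v;
}
```

With a and r quantised to, say, 4 bits each, the whole envelope really does fit in one byte.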
Effects are very important. They can be expensive or cheap. Most 4k synths try to include a hi/lo-pass filter with cutoff and resonance. This isn’t cheap. However, I believe Northern Dragons were smart when they identified a few effects which are very cheap to code. In the end we chose not to go with a hi/lo-pass filter but to experiment a bit and go for cheap ones.
Some possible effects:
Try negating values or using binary operators on them before putting them into the final wav output!
You can go on endlessly so beware of the byte count.
If you create a midi synth, then voice data is really a matter of storing midi commands to set the correct voices. Midi commands are just integers, so the format reduces to just a few integers stored in a file. When creating your own sound components, though, it is necessary to create a proprietary instrument format. Each synth will be different, so further general discussion is irrelevant. In our synth, we have instruments composed of banks of oscillators running at different frequencies. We store the oscillator type and frequency together with some effects bits and the length of the instrument. In all, we are around 10 bytes per instrument. This is not very good, but it works. After compression our instruments can be smaller, which means in 50-100 bytes we can store 6-12 instruments.
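One possible ~10-byte record along the lines just described might look like this. The field names and split are my own illustration; our actual format differs in detail:

```c
#include <stdint.h>

/* A ~10-byte instrument: four oscillator banks (type + frequency
   byte each), effect flags and an overall length. All-byte fields,
   so no compiler padding is introduced. */
typedef struct {
    uint8_t osc_type[4];  /* which oscillator each bank uses */
    uint8_t osc_freq[4];  /* per-bank frequency, scaled      */
    uint8_t fx_bits;      /* which effects are switched on   */
    uint8_t length;       /* instrument length, scaled       */
} instrument_t;
```

Keeping every field a single byte means the instrument bank is just a flat array the compressor can chew on, and similar instruments compress against each other very well.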