Fifteen dissection part two: Instruments

Posted by HEx 2014-02-19 at 04:45

[Previous instalments: part zero, part one.]

So we need to calculate PCM audio, and we have very little space to do it in. First we need some instruments.

I wrote a tool to play around with generating sounds algorithmically. (I've tidied it up a bit since the compo; here is the original for the curious. Particularly of note is the fact that sample values for both the old version and the entry itself were in the range -32768..32767, whereas now they're -1..1. Hence the code samples differ slightly.)

In the tune we have the following: lead, bass, bass drum, hihat, snare, two bongoids and two thud-type things.[1]


Melodic instruments can be pretty simple: just take a periodic waveform and decay it exponentially over time.

We start with a sine wave, which is pretty periodic. var freq = 440; return Math.sin(i * freq * 2*Math.PI / 32000);

And add some decay: var freq = 440, decay = 0.0004; return Math.sin(i * freq * 2*Math.PI / 32000) * Math.exp(-i*decay);

Sine waves sound really "thin", having just a single frequency. We need harmonics. A cheap way of doing this I discovered was to generate sin^n(x) instead of sin(x): higher values of n give more harmonics. Here's n=9, the value used for the lead melody:

var freq = 440, decay = 0.0004, exponent = 9; return Math.pow(Math.sin(i * freq * 2*Math.PI / 32000), exponent) * Math.exp(-i*decay);
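To inspect the waveform outside the tool, the per-sample expression can be wrapped in a loop. This is a minimal sketch, assuming the tool evaluates the snippet once per sample index i at 32000 samples per second; renderNote and SAMPLE_RATE are names made up for illustration and are not part of the entry.

```javascript
const SAMPLE_RATE = 32000; // assumed: matches the /32000 in the snippets above

// Hypothetical helper: evaluates the decaying sin^n expression for each sample.
function renderNote(length, freq, decay, exponent) {
  const out = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    const phase = i * freq * 2 * Math.PI / SAMPLE_RATE;
    out[i] = Math.pow(Math.sin(phase), exponent) * Math.exp(-i * decay);
  }
  return out;
}

// One event's worth (3840 samples) of the lead sound at 440 Hz.
const note = renderNote(3840, 440, 0.0004, 9);
```

Because the sine is at most 1 in magnitude and the decay factor never exceeds 1, every sample stays within -1..1 whatever the exponent.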

Frequency isn't too useful by itself; instead we need pitches in more useful units such as semitones. The actual function I ended up with was: function r(o,p,f,e) { return 4000*Math.pow(Math.sin(o*Math.pow(2,(p-53)/12)),e) * Math.exp(-o * f); } Here o is the offset in samples from the start of the note, p is the pitch in semitones (adjusted so that useful values are just above zero), f is the decay constant, and e is the exponent. This generates sample values in the range -4000..4000, which will fit about 8 such notes in the 16-bit output range.
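The pitch mapping in r can be made concrete by converting it back to Hz. This is a sketch only: pitchToHz is a hypothetical helper, and it assumes the 32kHz sample rate used for the output. The sine argument advances by 2^((p-53)/12) radians per sample, so dividing by 2π and multiplying by the sample rate gives the frequency of the underlying sine.

```javascript
const SAMPLE_RATE = 32000; // assumed: the 32kHz output format

// Hypothetical helper: frequency in Hz of sin(o * 2^((p-53)/12)) for pitch p.
function pitchToHz(p) {
  const radiansPerSample = Math.pow(2, (p - 53) / 12);
  return radiansPerSample * SAMPLE_RATE / (2 * Math.PI);
}

// Twelve semitones up is one octave, i.e. double the frequency.
const octaveRatio = pitchToHz(25) / pitchToHz(13);
```

Pitch 53 corresponds to exactly one radian per sample (roughly 5093 Hz), which is why the pitches actually used sit far below it.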

And the calling code: z += (t||(l==3))? (r(u, p[l], 3e-4,25) + 2*r(u, p[l]-24, 2e-4, 80)): r(u, p[l]-10?p[l]:13+((i/960)&2), 4e-4,9);

This takes some breaking down.

(t||(l==3)) This determines whether we want the bass sound (when channel == 3, or always during the thuddy chords) or the lead sound.

(r(u, p[l], 3e-4,25) + 2*r(u, p[l]-24, 2e-4, 80)) The bass sound is made up of two calls to r, one with an exponent of 25, and one twice as loud and two octaves (24 semitones) lower with an exponent of 80. When the exponent is even the negative half of each cycle is flipped positive, so the fundamental frequency is doubled and the second voice is actually only one octave lower. The decay constants are slightly different so that the timbre changes over time.
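The frequency doubling for even exponents follows from sin(x + π) = -sin(x): an even power discards the sign, so the waveform repeats after π rather than 2π, while an odd power only flips sign. A small numeric check, with shape as an illustrative name and the phase x picked arbitrarily:

```javascript
// The waveform shape used by r, without the amplitude and decay terms.
function shape(x, exponent) {
  return Math.pow(Math.sin(x), exponent);
}

const x = 1.2; // arbitrary phase, chosen only for the demonstration

// Even exponent (80): the value half a period later is the same.
const evenRepeats = Math.abs(shape(x + Math.PI, 80) - shape(x, 80)) < 1e-9;

// Odd exponent (25): the value half a period later is negated.
const oddFlips = Math.abs(shape(x + Math.PI, 25) + shape(x, 25)) < 1e-9;
```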

r(u, p[l]-10?p[l]:13+((i/960)&2), 4e-4,9); This is the melody sound, previously dissected.

p[l]-10?p[l]:13+((i/960)&2) A hack for the mordent. p[l] is the pitch stored in the pattern data, which is updated every event, or 3840 samples. If the pitch is not equal to 10, then it is used directly. Pitch 10—a value not needed for the tune—is used as a sentinel to mean "pitch 13 for half an event, then pitch 15 for half an event".
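Spelled out as a function, the mordent hack looks something like the following. effectivePitch is a hypothetical name, and the sketch assumes i is the global sample index with events starting at multiples of 3840.

```javascript
// Hypothetical wrapper around: p[l]-10 ? p[l] : 13+((i/960)&2)
function effectivePitch(stored, i) {
  if (stored !== 10) return stored; // ordinary note: use the pattern value
  // Sentinel: bit 1 of floor(i/960) is clear for 1920 samples, then set
  // for the next 1920, so the pitch is 13 for half an event and then 15.
  return 13 + ((i / 960) & 2);
}

const firstHalf = effectivePitch(10, 0);     // start of an event
const secondHalf = effectivePitch(10, 1920); // halfway through the event
```

The & operator truncates i/960 to an integer before masking, which is what lets the original get away without an explicit Math.floor.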


The bass drum is simply a sine wave with descending frequency: return Math.sin(2e5/(i+1948)); The number 1948 was chosen so that after 3840 samples (one event) the result will be approximately zero to avoid clicking.
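The choice of 1948 can be checked numerically. A minimal sketch, with bassDrum and EVENT_LENGTH as illustrative names:

```javascript
const EVENT_LENGTH = 3840; // samples per event, as stated above

// The bass drum expression from the text.
function bassDrum(i) {
  return Math.sin(2e5 / (i + 1948));
}

// Value at the moment the next event begins; close to zero, so cutting the
// sound off here should not produce an audible step.
const endValue = bassDrum(EVENT_LENGTH);
```

At that point the sine argument is 2e5/5788, which lands very close to a multiple of π.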

The remaining percussion (hi-hat and snare; I had to jettison the bongoids for space reasons) was something resembling white noise with exponential decay. I decided to avoid Math.random() in favour of some deterministic bit-twiddling.

Here's the snare: return ((8e10/(i+6e3))&14)*Math.exp(-i/900)/14;

The hat has three different variants with slightly different decay constants. Here's the longest: return ((8e10/(i+3e5)*(i*17|5)&511))*Math.exp(-i/1000)/500;

The percussion sequences are simple enough to be hard-coded. The hi-hats in context: var o = i%3840; var n = (i/3840) |0; return ((8e10/(o+3e5)*(o*17|5)&511)) * Math.exp(-o*(n%3+1)/1000)/500;

And finally the percussion in its entirety: var z = 0; var m = ((i / 3840 / 15) | 0) + 9; var o = i%3840; var n = (i/3840) |0; if(m<20) { /* hat */ if(m>9) z += ((8e10/(o+3e5)*(o*17|5)&511)) * Math.exp(-o*(n%3+1)/1000) * 8; /* bd, snare */ z += [m>11 && Math.sin(2e5/(o+1948))*10,0,m>15 && ((8e10/(o+6e3))&14) * Math.exp(-o/900),0][n&3] * 800; } return z / 32768;
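For readability, here is the same computation with the pieces named. This is an unrolled sketch rather than the code that shipped: the helper names are invented, and reading m as a bar counter (15 events per bar, offset by 9) and n as an event counter is an interpretation of the one-liner above.

```javascript
const EVENT = 3840;        // samples per event
const EVENTS_PER_BAR = 15; // assumed meaning of the /15 in the original

// Pseudo-noise hi-hat; the decay rate cycles through three values with n.
function hat(o, n) {
  const noise = (8e10 / (o + 3e5) * (o * 17 | 5)) & 511;
  return noise * Math.exp(-o * (n % 3 + 1) / 1000);
}

// Pseudo-noise snare with a fixed decay.
function snare(o) {
  return ((8e10 / (o + 6e3)) & 14) * Math.exp(-o / 900);
}

// Descending sine bass drum, boosted by 10 as in the original.
function bassDrum(o) {
  return Math.sin(2e5 / (o + 1948)) * 10;
}

function percussion(i) {
  const o = i % EVENT;                                // offset within the event
  const n = (i / EVENT) | 0;                          // event counter
  const m = ((i / EVENT / EVENTS_PER_BAR) | 0) + 9;   // bar counter, offset by 9
  if (m >= 20) return 0;                              // percussion stops here
  let z = 0;
  if (m > 9) z += hat(o, n) * 8;                      // hat on every event
  const slot = n & 3;                                 // four-event drum cycle
  if (slot === 0 && m > 11) z += bassDrum(o) * 800;   // bass drum on slot 0
  if (slot === 2 && m > 15) z += snare(o) * 800;      // snare on slot 2
  return z / 32768;                                   // scale to roughly -1..1
}
```

The array lookup in the original does the same job as the two slot checks: slots 1 and 3 hold 0, and a false condition multiplies out to 0 as well.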

So there you have it. Despite all of this, the instruments were easily the weakest part of the demo.

Still to come: pattern encoding.

[1] One of the fun things about making music electronically is that it's entirely possible to get by without knowing what your instruments are conventionally called, or whether they even have a physical counterpart at all.

Fifteen dissection part one: Audio in a browser

Posted by HEx 2013-09-27 at 21:44

[Part zero is here].

Phase 1 of making a music demo in javascript is to work out how to play js-generated PCM audio in a browser. There are two widely available APIs for synthesizing audio in real time, namely the Firefox-specific MozAudio and the much more heavyweight Web Audio (Chrome, Safari). Opera supports neither. Even neglecting Opera, supporting both would require separate code for each, which is a Really Bad Idea when the space constraints are this tight.[1]

So real time is out. It's possible to put a base64-encoded WAV in a data: URI and pass that to an audio element. This has been fairly widely exploited by this point, and works just about everywhere. is a good example of this at work. Neglecting the quantization noise that comes from using 8-bit samples, the main problem here is that notes are triggered using setInterval, which is not a precise timing method, and at least on my setup it sounds very juddery.

Which leaves the final option: generating the entire tune as a single WAV. This has its own problems: there's a delay at startup while megabytes of data are precalculated, the tune can't loop indefinitely (unless you use setInterval again, and that won't be seamless), and memory usage for storing the data: URI is quite high. (Some browsers (*cough* IE) place restrictions on the size of data: URIs too.) Still, it's the best we can do.

I settled on mono 16-bit 32kHz audio for a data rate of 64kB/sec (64,000 bytes). (8-bit audio sounds terrible; see above.) Delightfully, browsers offer the ancient and arcane btoa() method for base64 encoding, which at 6 bytes can't be beat. Then new Audio("data:audio/wav;base64,"+btoa(header+pcmdata)).play(); will make noises. The data chunk is built up by iterating the following a few million times (z is a number in the range -32768..32767; the bitwise ops force integer conversion): pcmdata += String.fromCharCode(z&255,(z>>8)&255);[2]
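The sample packing can be sketched in isolation. packSample and encodePcm are hypothetical names introduced here; the entry itself inlines the expression to save bytes.

```javascript
// Hypothetical helper: one 16-bit sample as two characters, low byte first
// (WAV PCM data is little-endian). The bitwise ops also truncate fractions.
function packSample(z) {
  return String.fromCharCode(z & 255, (z >> 8) & 255);
}

// Hypothetical helper: packs a list of samples and base64-encodes the result.
// Every character code is at most 255, which is what btoa() requires.
function encodePcm(samples) {
  let pcmdata = "";
  for (const z of samples) pcmdata += packSample(z);
  return btoa(pcmdata);
}

const silence = packSample(0);    // two zero bytes
const loudest = packSample(32767); // 0xff then 0x7f
```

Negative samples come out in two's complement for free, since & 255 and >> 8 operate on the 32-bit integer form of z.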

Here is the WAV header:

00000000  52 49 46 46 24 00 00 01  57 41 56 45 66 6d 74 20  |RIFF$...WAVEfmt |
00000010  10 00 00 00 01 00 01 00  00 7d 00 00 00 00 00 00  |.........}......|
00000020  02 00 10 00 64 61 74 61  3a 61 75 64              |....data:aud|

To save space the header is stored as a raw string rather than base64-encoded, so we can't use any byte values greater than 0x7f as UTF-8 bloat would more than offset any gains from avoiding base64. Hence 32kHz (0x7d00), which is the highest common rate that is less than 32768. The lengths 0x01000024 and 0x01000000 are simply "sufficiently large" and wildly inaccurate. Similarly the four bytes after "data" are the data length, but since browsers don't check this, we reuse part of the "data:audio/wav" string to increase compression. (It's just as well there's no space to include an <audio> element as the seek bar would get very confused.)
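The byte constraint on the sample rate can be checked mechanically. In this sketch the list of candidate rates is an assumption about what counts as "common", and le32 and isAsciiSafe are illustrative names.

```javascript
// Little-endian bytes of a 32-bit header field.
function le32(value) {
  return [value & 255, (value >> 8) & 255, (value >> 16) & 255, (value >>> 24) & 255];
}

// True if every byte is at most 0x7f, i.e. stays a single byte in UTF-8.
function isAsciiSafe(value) {
  return le32(value).every(function (b) { return b <= 0x7f; });
}

const candidateRates = [22050, 32000, 44100, 48000]; // assumed set of common rates
const safeRates = candidateRates.filter(isAsciiSafe);
```

44100 is 0xac44 and 48000 is 0xbb80, so both contain a byte above 0x7f, which leaves 32000 as the highest of these candidates that fits.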

Finally, I discovered at the last minute that submitting entries containing null bytes doesn't work. Sadly there was no time to do anything other than replace them with \0.

Next up: the calculation of those few million z values.

[1] Good news! In the six months since the contest ended, both Firefox and Opera are now shipping with Web Audio. So things will be different next year.

[2] String.fromCharCode? That's 19 bytes. Nineteen! For shame, javascript, for shame. Perl manages with three, and the parens are optional.

Fifteen dissection part zero: History

Posted by HEx 2013-04-14 at 15:58

This is part zero of a dissection of my recent JS1K submission Fifteen, a 1K javascript audio demo. In this part: history of the tune.

In August 2004 I wrote an unnamed tune using soundtracker. It got the temporary name "f", because it's in 15/8 time and 15 is 0xf in hex. As was my custom, snapshots got an incrementing version number stuck on the end, and the "final" version was called f4.xm. I never got round to properly naming or distributing it, but it seemed well received by the few friends I showed it to.

Here's the original xm (or rendered in-browser for your convenience).

Fast forward four years to July 2008. My friend Kinetic had just started serious hacking on a project he'd had in mind for a long while, namely modding the Amiga game Lemmings with new levels, graphics and music. Since he liked my tune, I set myself the challenge of squeezing it into the constraints that would allow playback within the game.

Lemmings has a particularly unsophisticated playroutine. Its capabilities are a tiny subset of those of Protracker: the only supported effect is "set volume", although an initial speed can be set. Only three channels are available as the fourth is reserved for sound effects; maximum 15 samples per tune, and no finetunes. In addition, the entire tune had to fit into 47000 bytes―the game ran on a 512K Amiga so memory was tight. Nonetheless, to my (and Kinetic's!) surprise and delight, I succeeded in making something that sounded very much like the original 8-channel tune.

Here's the 3-channel version (browser).[1]

And here's a video of it playing in Lemmings.

Fast forward another four years. So JS1K came round and I'd been musing over the idea of submitting something audio-related. Since rule number one of optimization is to have a stable starting point, I needed a tune already written. After my previous success, f4.xm seemed worth trying, although I was under no illusions that it would survive such a drastic excision unscathed.

Next up: so how do you squeeze something like this into 1K?

[1] Alert readers might spot that this file is larger than 47000 bytes. The game's internal file format stored a stream of three-byte (note, sample, volume) tuples with RLE of empty events, making the pattern data smaller than Protracker's encoding. Samples were of course uncompressed to allow Paula to suck them directly out of RAM.