Fifteen dissection part one: Audio in a browser

Posted by HEx 2013-09-27 at 21:44

[Part zero is here].

Phase 1 of making a music demo in javascript is to work out how to play js-generated PCM audio in a browser. There are two widely available APIs for synthesizing audio in real time, namely the Firefox-specific MozAudio and the much more heavyweight Web Audio (Chrome, Safari). Opera supports neither. Even neglecting Opera, supporting both would require separate code for each, which is a Really Bad Idea when the space constraints are this tight.1

So real time is out. It's possible to put a base64-encoded WAV in a data: URI and pass that to an audio element. This has been fairly widely exploited by this point, and works just about everywhere. http://js1k.com/2013-spring/demo/1388 is a good example of this at work. Neglecting the quantization noise that comes from using 8-bit samples, the main problem here is that notes are triggered using setInterval, which is not a precise timing method, and at least on my setup it sounds very juddery.

Which leaves the final option: generating the entire tune as a single WAV. This has its own problems: there's a delay at startup while megabytes of data are precalculated, the tune can't loop indefinitely (unless you use setInterval again, and that won't be seamless), and memory usage for storing the data: URI is quite high. (Some browsers (*cough* IE) place restrictions on the size of data: URIs too.) Still, it's the best we can do.

I settled on mono 16-bit 32kHz audio for a data rate of 64KiB/sec. (8-bit audio sounds terrible; see above.) Delightfully, browsers offer the ancient and arcane btoa() method for base64 encoding, which at 6 bytes can't be beat. Then new Audio('data:audio/wav;base64,"+btoa(header+pcmdata)).play(); will make noises. The data chunk is built up by iterating the following a few million times (z is a number in the range -32768..32767; the bitwise ops force integer conversion): pcmdata += String.fromCharCode(z&255,(z>>8)&255);2

Here is the WAV header:

00000000  52 49 46 46 24 00 00 01  57 41 56 45 66 6d 74 20  |RIFF$...WAVEfmt |
00000010  10 00 00 00 01 00 01 00  00 7d 00 00 00 00 00 00  |.........}......|
00000020  02 00 10 00 64 61 74 61  3a 61 75 64              |....data:aud    |

To save space the header is stored as a raw string rather than base64-encoded, so we can't use any byte values greater than 0x7f as UTF-8 bloat would more than offset any gains from avoiding base64. Hence 32kHz (0x7d00), which is the highest common rate that is less than 32768. The lengths 0x01000024 and 0x01000000 are simply "sufficiently large" and wildly inaccurate. Similarly the four bytes after "data" are the data length, but since browsers don't check this, we reuse part of the "data:audio/wav" string to increase compression. (It's just as well there's no space to include an <audio> element as the seek bar would get very confused.)

Finally, I discovered at the last minute that submitting entries containing null bytes doesn't work. Sadly there was no time to do anything other than replace them with \0.

Next up: the calculation of those few million z values.


[1] Good news! In the six months since the contest ended, both Firefox and Opera are now shipping with Web Audio. So things will be different next year.

[2] String.fromCharCode? That's 19 bytes. Nineteen! For shame, javascript, for shame. Perl manages with three, and the parens are optional.

Leave a comment

Comments