Posted by HEx 2019-02-18 at 20:22

So I wrote a flag[0] for BSides Leeds 2019.

The aim was simple: load and display an image from an audio source on a BBC micro, as inspired by an Easter egg in the recent Black Mirror episode Bandersnatch which did the same on the Spectrum. But I'm nothing if not a perfectionist so I went kind of nuts on the size optimization, turning what should have taken an hour or two into a week-long odyssey.

There's also a tune. More on that later.

An aside: assembly fragments in this article are mostly generated by beebasm, which has all-caps mnemonics and hex digits, uses Beeb-style & to mean hexadecimal, and marks labels with a preceding dot. OS disassemblies are by dxa, which has lower-case mnemonics and hex digits, uses $ to mean hexadecimal, and suffixes labels with a colon. Code is on github for those following along at home.

Stage 0: the header

Acorn's OS stores files on tape in blocks of up to 256 bytes.[1] Each block has a header, containing the filename, block number, the load and execution address (in every block, although only meaningful for the first), and a CRC.[2]

Since we'd like to minimize loading time this is overhead we'd really like to dispense with, so our first order of business will be to replace the OS's tape loading routines with our own that have none of this guff. Happily, now that all the mundane annoyances of using cassette tapes as a reliable data storage medium (wobble, hiss, dropouts, &c.) have vanished into history this works just fine.

Milestones achieved: prints our filename, and its length in (hex) bytes.

2a synchronisation byte, indicates start of header 42 65 65 62 53 69 64 65 73 00 filename "BeebSides", zero-terminated 00 05 ff ff load address: &FFFF0500 (FFs mean "load in I/O space if on a second processor") 09 05 ff ff exec address: &FFFF0509 00 00 block number: 0 75 00 block length: &75 80 flags: &80 for "last block of the file" e1 8c e2 00 reserved bytes, see later 96 cc CRC16: &CC96

Stage 1: data block

The actual block of data we promised would follow the header. Ideally we'd like only one block, which restricts us to 256 bytes of code. But this is doable.

Milestones achieved: OS beeps to indicate loading has finished, turns the tape off, and starts running our code!

Maybe. See, there's a kind of hitch. We don't actually know what the OS is gonna do. It depends upon what the user asked for. Now ideally we'd like it to have just loaded our code at the address we specified, and now go and execute it (again at the address we specified). Indeed it's possible to enforce that this is the only thing the OS will do with our code, by setting a "protection" bit in the header. This ensures that loading-but-not-executing it, loading it somewhere else, loading only part of it &c. are all verboten, and also does some convenient other things to ensure you can't easily read what was just loaded, such as disabling aborting loading with the Escape key, and ensuring that all memory gets zeroed out if you reset the machine by pressing Break.[3]

Sadly, we're not gonna do that. It's unlikely that the command for run-this-machine-code-program would be the first thing someone would try. Instead, approximately 99% of software on tape had a BASIC program at the beginning, and was run by asking BASIC, not the OS. BASIC would load the file into the BASIC program storage area starting at PAGE, ignoring the load address, and proceed to look for BASIC instructions to interpret. Almost everyone who ever used tapes on a Beeb will have used the command to do this, whereas it was entirely possible to get by without knowing that there even existed other ways of running a program.

So with a protected file, people would likely try load-as-BASIC first and get the I'm-afraid-I-can't-do-that-Dave error Locked. And as a CTF at a security conference such an error is less likely to elicit a reaction of "guess I shouldn't do that then" and more "so how do I unlock it?", the answer to which Google is perfectly happy to provide[4]. And thus would people be led astray.

Instead, we're going for the more user-friendly approach of making a BASIC program that just calls our machine code. TOP is a BASIC variable with the address of the end of the current program, where we can stash our code, and CALL TOP will jump there. But since BASIC can load it anywhere[5] we need position-independent code, which the 6502 doesn't exactly excel at. Happily, we can find our current address by calling a subroutine and looking in the stack debris for the address that the CPU pushed and subsequently returned to. PAGE is 256-byte aligned (hence the name) so we only need the high byte, and from there we can construct a loop to copy ourselves to a known address and resume from there.

What subroutine do we call? We can't call any of our own code (because we don't know where it is yet) so that leaves OS API calls. We ask the OS to acknowledge an escape condition, which turns off the beep begun when loading finished. Since we're about to disable interrupts the OS will never have the chance to unbeep otherwise.

.basic 0500 0D ; start of BASIC line 0501 07 E3 ; line number &7E3 = 2019 0503 07 ; line length 0504 D6 ; token for CALL 0505 B8 ; token for TO 0506 50 ; ascii P 0507 0D ; start of BASIC line 0508 FF ; end of program marker .reloc 0509 38 SEC ; execution starts here 050A 66 FF ROR &FF ; set top bit of &FF (escape flag) 050C A9 7E LDA #&7E 050E 20 F4 FF JSR &FFF4 ; OSBYTE 126 0511 A0 00 LDY #&00 ; 0513 84 B9 STY &B9 ; zero low byte of pointer 0515 BA TSX 0516 BD 00 01 LDA &0100,X ; get high byte of PC from stack 0519 85 BA STA &BA ; and store into high byte of pointer .relocloop 051B B1 B9 LDA (&B9),Y ; copy byte 051D 99 00 05 STA &0500,Y 0520 C8 INY 0521 D0 F8 BNE relocloop 0523 4C 26 05 JMP main

Then we turn the tape back on, clear the screen and print the empty progress bar.

.main ; main code starts here 0526 78 SEI 0527 A9 85 LDA #&85 0529 8D 10 FE STA &FE10 052C A9 D5 LDA #&D5 052E 8D 08 FE STA &FE08 ; turn tape on 0531 A0 0A LDY #vduend-vdutab 0533 84 9F STY progress_count .vduloop 0535 B9 6B 05 LDA vdutab,Y ; fetch character 0538 20 CB FF JSR nvwrch ; and print it 053B 88 DEY 053C 10 F7 BPL vduloop ; loop until done [...] .vdutab ; string to be printed. read backwards! 056B 5B ; print "[". cursor left in place for further printing 056C 1F 11 1F ; move cursor to (17,31) 056F 5D ; print "]" 0570 1F 3E 1F ; move cursor to (62,31) 0573 00 16 ; set screen mode 0 (80x32 characters) .vduend

Now, this takes some explaining. Portable tape recorders in the 1980s typically had a remote input whereby recording or playback could be paused by a remote switch, often on an external microphone. Computers used this to enable the tape motor iff they were expecting to read or write data. This was useful to prevent users having to remember to stop the tape themselves or risk losing their tape position or even erasing the entire rest of the tape by accident.

Obviously playback from a device without such a remote control (such as the audio output from almost any other device, be it a laptop, phone, stereo, &c.) would end up with the audio continuing after loading finished with no means for the computer to stop it. But emulators all implement remote control for their virtual tapes. Since clearing the screen takes a non-negligible amount of time, and this code was developed on an emulator with remote control but was intended to be played back to real hardware from a device without remote control, we turn the tape on immediately so that calibrating how long a delay would be necessary while the CPU was busy clearing is possible without needing to test on real hardware.

Interlude: clearing the screen

We're using the high-resolution monochrome screen mode for the logo, which is 20K of data to clear. How does the clearing happen? The screen memory is located from &3000 to &8000, and we want to set it all to a given byte. The obvious way is the following 6502 code:

2000 A2 30 LDX #&30 ; start from &3000 2002 8E 09 20 STX clearloop+2 ; changes instruction below to "STA &3000,X" 2005 A2 00 LDX #0 .clearloop 2007 9D 00 EE STA &EE00,X ; EE gets overwritten 200A E8 INX 200B D0 FA BNE clearloop 200D EE 09 20 INC clearloop+2 ; increment high byte 2010 10 F5 BPL clearloop ; loop while it's less than &80 2012 60 RTS

This weighs in at 19 bytes and ~205k clock cycles, or about 100 milliseconds on a 2MHz processor. (It's also self-modifying, which rules it out for inclusion in ROM. Self-modifying code is common on 6502-based systems: it's often both faster and smaller than the alternatives.)

Instead, the Beeb has the following loop, unrolled an astounding eighty times and reproduced here in its entirety:

cc02 9d 00 30 sta $3000,x ; store byte to &3000+X cc05 9d 00 31 sta $3100,x ; ... cc08 9d 00 32 sta $3200,x cc0b 9d 00 33 sta $3300,x cc0e 9d 00 34 sta $3400,x cc11 9d 00 35 sta $3500,x cc14 9d 00 36 sta $3600,x cc17 9d 00 37 sta $3700,x cc1a 9d 00 38 sta $3800,x cc1d 9d 00 39 sta $3900,x cc20 9d 00 3a sta $3a00,x cc23 9d 00 3b sta $3b00,x cc26 9d 00 3c sta $3c00,x cc29 9d 00 3d sta $3d00,x cc2c 9d 00 3e sta $3e00,x cc2f 9d 00 3f sta $3f00,x cc32 9d 00 40 sta $4000,x cc35 9d 00 41 sta $4100,x cc38 9d 00 42 sta $4200,x cc3b 9d 00 43 sta $4300,x cc3e 9d 00 44 sta $4400,x cc41 9d 00 45 sta $4500,x cc44 9d 00 46 sta $4600,x cc47 9d 00 47 sta $4700,x cc4a 9d 00 48 sta $4800,x cc4d 9d 00 49 sta $4900,x cc50 9d 00 4a sta $4a00,x cc53 9d 00 4b sta $4b00,x cc56 9d 00 4c sta $4c00,x cc59 9d 00 4d sta $4d00,x cc5c 9d 00 4e sta $4e00,x cc5f 9d 00 4f sta $4f00,x cc62 9d 00 50 sta $5000,x cc65 9d 00 51 sta $5100,x cc68 9d 00 52 sta $5200,x cc6b 9d 00 53 sta $5300,x cc6e 9d 00 54 sta $5400,x cc71 9d 00 55 sta $5500,x cc74 9d 00 56 sta $5600,x cc77 9d 00 57 sta $5700,x cc7a 9d 00 58 sta $5800,x cc7d 9d 00 59 sta $5900,x cc80 9d 00 5a sta $5a00,x cc83 9d 00 5b sta $5b00,x cc86 9d 00 5c sta $5c00,x cc89 9d 00 5d sta $5d00,x cc8c 9d 00 5e sta $5e00,x cc8f 9d 00 5f sta $5f00,x cc92 9d 00 60 sta $6000,x cc95 9d 00 61 sta $6100,x cc98 9d 00 62 sta $6200,x cc9b 9d 00 63 sta $6300,x cc9e 9d 00 64 sta $6400,x cca1 9d 00 65 sta $6500,x cca4 9d 00 66 sta $6600,x cca7 9d 00 67 sta $6700,x ccaa 9d 00 68 sta $6800,x ccad 9d 00 69 sta $6900,x ccb0 9d 00 6a sta $6a00,x ccb3 9d 00 6b sta $6b00,x ccb6 9d 00 6c sta $6c00,x ccb9 9d 00 6d sta $6d00,x ccbc 9d 00 6e sta $6e00,x ccbf 9d 00 6f sta $6f00,x ccc2 9d 00 70 sta $7000,x ccc5 9d 00 71 sta $7100,x ccc8 9d 00 72 sta $7200,x cccb 9d 00 73 sta $7300,x ccce 9d 00 74 sta $7400,x ccd1 9d 00 75 sta $7500,x ccd4 9d 00 76 sta $7600,x ccd7 9d 00 77 sta $7700,x ccda 9d 00 78 sta $7800,x ccdd 9d 00 79 sta $7900,x cce0 9d 00 7a sta $7a00,x cce3 9d 00 7b sta $7b00,x cce6 9d 00 7c sta $7c00,x cce9 9d 00 7d sta $7d00,x ccec 9d 00 7e sta $7e00,x ccef 9d 00 7f sta $7f00,x ccf2 e8 inx ; increment X ccf3 f0 70 beq exit ; and branch if it wraps around to zero ccf5 6c 5d 03 jmp ($035d) ; loop again [...] cd65 exit: cd65 60 rts ; return

This takes about 105k cycles, which is almost twice the speed of the first routine. The downside is that it's over ten times the size at 246 bytes, and that's excluding the initialization code (not shown) that sets up the jump vector at &035D.[6]

To put this in perspective: the BBC Micro's operating system is 15616 bytes (16K minus three pages for memory-mapped I/O). Thus over 1.5% of the OS is devoted to making clearing the screen fast.

This code has another interesting characteristic, namely that it clears the screen nonlinearly. Addresses ending in &00 are cleared first, followed by addresses ending in &01, and so on. This is easily demonstrated with the following program:

10 MODE 1 320x256 graphics, four colours (20K screen memory) 20 *FX16 30 *FX151,70,64 slow down the machine so we can see what's happening 40 *FX151,69,1 50 *FX11 disable autorepeat so we can still type afterwards 60 COLOUR 135:CLS clear to white and black alternately 70 COLOUR 128:CLS 80 GOTO 60

Click to emulate

Eighty STAs gives eighty starting points. Hold down a key to pause.[7]

Also demonstrated is the odd screen memory layout, a consequence of the Beeb's use of the character-oriented Motorola MC6845 address generator chip.[8] Each character line is 8 pixels high. Acorn's later Archimedes machines ditched this layout for a linear framebuffer, which is now universal.

The BBC Master had a different, more conventional screen clearing routine (again some initialization omitted):

cb9a loop: cb9a 91 d8 sta ($d8),y ; write a byte to screen pointer at D8-D9 cb9c c8 iny ; and increment cb9d 91 d8 sta ($d8),y ; write another byte (2x unrolled loop) cb9f c8 iny ; and increment cba0 d0 f8 bne loop ; loop if low byte hasn't wrapped to zero cba2 e6 d9 inc $d9 ; increment high byte of pointer cba4 ca dex ; decrement high byte of counter cba5 10 f3 bpl loop ; and loop if still >=0 cba7 60 rts ; return

Smaller and more flexible than the original Beeb's, this code nonetheless loses some of the blazing speed, taking about 195k cycles—about as fast as the self-modifying version.

Stage 2: load the decompressor

After the OS has cleared the screen for us we repeatedly call a routine to wait for and return a byte read from tape.

This routine does a couple of other things, as it will be called by other code further down the line. Every 35th byte read it prints a # (also ASCII 35: reusing the constant results in slightly smaller code, and we can adjust the length of the progress bar so that the number of bytes we have to read fills it up exactly). It also checks a flag to see whether the music has been loaded yet, and if so interleaves checking the tape with polling the music player. To avoid having to explicitly initialise the flag we reuse a location that we had to set to zero as part of the earlier relocation loop.

.loadinitial 053E 20 4D 05 JSR get_crunched_byte 0541 99 00 04 STA &0400,Y ; read decompressor 0544 88 DEY 0545 D0 F7 BNE loadinitial 0547 20 1E 04 JSR decrunch ; decompress music player 054A 4C 09 28 JMP runner ; and jump to it .get_crunched_byte ; read byte from tape 054D 08 PHP 054E C6 9F DEC progress_count 0550 D0 07 BNE poll_things 0552 A9 23 LDA #&23 ; '#' .nop_this_out 0554 20 CB FF JSR nvwrch ; this gets modified later 0557 85 9F STA progress_count .poll_things 0559 A5 B9 LDA music_loaded 055B F0 03 BEQ nomusic 055D 20 9F 27 JSR poll ; poll the music player .nomusic 0560 AD 08 FE LDA &FE08 ; check for byte from tape 0563 4A LSR A 0564 90 F3 BCC poll_things ; no? repeat 0566 AD 09 FE LDA &FE09 ; yes? return byte read 0569 28 PLP 056A 60 RTS

In total we read 255 bytes from tape (hence only needing a single 8-bit register as a counter) and store them verbatim (albeit backwards; another space-saving measure). This is our decompression code. Then we jump there.

Milestones achieved: our progress bar is about a fifth of the way along. Not much to see.

Stage 3: decompressing the music player

The decompressor is the wonderful Exomizer. This uses an LZ77-based algorithm that, when given a stream of bytes, will decompress to a buffer in memory. The target location is stored at the beginning of the data stream, and decompression is self-terminating (i.e. it knows when it's done without needing to be told explicitly). Exo has other modes of operation such as outputting a byte stream which need extra temporary storage, but in our case no other buffers are needed for the decompression other than 3 52-byte tables.

We locate the tables in zero page, which is faster and takes less code to address than elsewhere in memory. This brings our total decompression code size down to 258 bytes. So close! Another 3-byte table (pre-initialized, hence usually part of the decompression code) can be moved elsewhere. But where? It could be part of the earlier data block, but instead we elect to make use of four "reserved" bytes in the block header, which is stored from &3B2-&3D0.[9] Now we're down to the 255 bytes mentioned previously.

Exo will call our "read a byte from tape" routine whenever it needs more data, and fill the output buffer from the end backwards. The music player is 2114 bytes, which compresses down to 1275 bytes, and will end up stored from &2000 to &2842.

1275+255 is 1530 bytes, which should take our progress bar all the way to the end. 1530/35 is 43.7, which means we want it to be 44 characters wide, which fits nicely on an 80-column screen.

Milestones: we're done with the progress bar.

Stage 4: decompressing the image

Since the logo is nearly square, we use a custom screen mode of 576×272 pixels. This has 16 pixels of vertical overscan but since most of the logo pixels at the top and bottom are near the centre this looks OK on a CRT with rounded corners. 576×272 at 1 bit per pixel is 19584 bytes, which is smaller than the 20480 bytes for the normal 640×256 screen. We keep screen memory at &3000, but since our framebuffer is smaller now the progress bar at the bottom is no longer within it and handily vanishes as soon as we reprogram the 6845.

We also turn shadow screen memory off, disabling the feature on later machines where the framebuffer could be paged out to leave more RAM for user applications.

We want to turn the progress bar off, so we patch the opcode within the much-abused "get byte from tape" routine of the call to print the #. Before we had JSR nvwrch (20 CB FF), which calls the OS-provided "write character to screen" routine. After, we have BIT nvwrch (2C CB FF), which does nothing useful other than change processor flags. (We save and restore the flags anyway as Exo doesn't like them changing out from under it.) Using BIT as a no-op this way was common in 6502 code where size was at a premium, and was used in Microsoft BASIC among other places.

We initialise the music player (whose workspace just so happens to overlap the "music loaded" flag...) and call Exomizer again to decompress the image. The image compresses quite well, from 19584 bytes to just 6225. Since exo decompresses backwards, it loads from the bottom up (leaving the all-important flag text till last).

Once we're done decompressing, we turn the tape off and just call the music player in a loop.

.runner 2809 A0 0A LDY #10 .regloop 280B 8C 00 FE STY &FE00 ; select 6845 register 280E B9 37 28 LDA regs-1,Y ; load value from table 2811 8D 01 FE STA &FE01 ; and send to the 6845 2814 88 DEY 2815 D0 F4 BNE regloop ; loop while >1 2817 A9 0F LDA #&0F 2819 8D 34 FE STA &FE34 ; turn off shadow RAM 281C A9 2C LDA #&2C ; opcode for BIT with 2-byte address 281E 8D 54 05 STA nop_this_out ; disable progress bar 2821 20 3E 27 JSR playinit ; initialize music player 2824 20 1E 04 JSR decrunch ; load image 2827 A9 45 LDA #&45 2829 8D 10 FE STA &FE10 ; turn off tape .runloop 282C A9 40 LDA #&40 282E 2C 4D FE BIT &FE4D 2831 F0 F9 BEQ runloop ; wait for timer 2833 20 9F 27 JSR poll ; poll music player 2836 D0 F4 BNE runloop ; and loop (always) .regs ; register table, 1-indexed 2838 48 ; 1: 72 horizontal characters displayed (=72*8=576 pixels) 2839 5E ; 2: 94 horizontal sync position 283A 28 ; 3: 40 horizontal sync width 283B 26 ; 4: 38+1=39 vertical character lines (=39*8=312 lines) 283C 00 ; 5: 0 vertical adjust 283D 22 ; 6: 34 vertical characters displayed (=34*8=272 pixels) 283E 24 ; 7: vertical sync position 283F 00 ; 8: interlace disabled 2840 07 ; 9: 7+1=8 max scanline address 2841 20 ; 10: disable cursor

Stage 5: the tune

The Beeb has a Texas Instruments SN76489AN Programmable Sound Generator (PSG) chip. It has 4 voices: 3 voices of 50%-duty-cycle square wave and 1 voice of noise generated by a 15-bit LFSR. Each voice can be independently attenuated by anywhere from 0 to 28dB in 2dB increments, or disabled entirely.

We set the system VIA timer 1 to run freely with a period of &6262 1MHz cycles, which is about 39.7Hz. Each time the timer expires we advance the music pointer by one event. Each event can have a new note or volume or both.

Events are chunked into "patterns" of 16 events each. Approximately ¾ of the patterns are duplicates, but each is only stored once. The list of patterns and the events themselves are lightly compressed using a custom run length encoding (RLE) scheme.

Patterns contain data for a single voice, and consist of a stream of alternating bytes. one for pitch (0-63, semitones) and one for volume (0-15 in units of 2dB attenuation, where 15 means "silent"; this is the 76489's internal volume register format[10]). &FF means "don't write anything". So, e.g. 4300 ff02 ff03 ff06 4100 ff02 has two notes at full volume at events 0 and 4, and they fade a bit afterwards.

All the pitch and volume values have the top bit clear, so the top bit is repurposed for RLE. Binary 0xxxxxxxx means byte x and 1xxxyyyy means xxx bytes of &FF (&FF means "empty", so occurs frequently). It would seem a shame to waste the four yyyy bits, so if yyyy is non-zero, yyyy means "output an extra byte 0000yyyy" after the run of &FFs. Zero is used a lot as a volume but some of the other values aren't, thus the volume is xored with a tweakable constant so that zero can be packed too. Pitches are also xored for similar reasons: if common pitches can be stored in four bits, they can potentially be tagged onto the end of an RLE run too. The noise voice simply has no pitches.

The sequence data consists of another stream of bytes, again per-voice. It contains two bytes of pointer-into-pattern-data for each sequence. But many patterns follow on from each other, so often the pattern data pointer would already be pointing to what we would just be about to load.

The pattern data is <2K, which is only 11 bits. There are 16 to play with, again leaving some spare for RLE. So, a single &FF byte means "don't bother loading anything", and if we shift the top bit of the low byte of the two-byte pointer into the high byte we can use the existing RLE code (recall that it can only output bytes with the top bit clear). The sequence pointers are stored straight after the voice pointers, and the RLE routine gets called with a voice number from 4-7 instead of 0-3 for the actual voices. Thus runs of adjacent patterns (such as the melody, which isn't very repetitive) get encoded as a single byte too. So we have 11111111 for nothing, 01xxxxxx 0yyyyyyy for a two-byte pointer, and 00zzzzzzz spare. The 6 zzzzzz bits we can use as a signed offset from the current pointer.

If zzzzzz fits in 4 bits, again we can use the tagging-onto-the-end-of-an-RLE-run scheme. The offset calculation is chosen carefully so that this happens often, which is why bit 6 is clear for a single-byte offset.

One final evil hack: when the melody cuts out at the end just before the loop, it's cheaper to compare the sequence position with that position and ignore the melody voice from then onwards than it is to store repeated pointers to an empty pattern.

The source XM is by my reclusive friend K circa 2008 and was written specifically with the 76489 in mind. I wrote the conversion script and Beeb playroutine (demoscene-speak for "an embeddable music player you can use in your demo") a few days later, and much of this explanation of it is cribbed from decade-old chat logs.


So how did people get on?

Before I do that, here is a Beeb configured similarly to the one at the conference should you like to try it yourself.









The BBC Master on display at the conference handily already had a 3.5mm audio jack plugged into the cassette port, to which was attached to an MP3 player in shuffle mode playing an image slideshow. So all people actually needed to do after plugging in their laptop was to switch filesystem to tape (from the default disc) and run the first program found. The former is canonically done with *TAPE (abbreviated *T., case-insensitive) but can also be accomplished by holding down T and pressing Break. The latter: CHAIN"" (abbreviated CH."", case-sensitive: run BASIC program). Most games and other software titles were originally supplied with these instructions. In our case *RUN (abbreviated */, case-insensitive: run machine code program) would also work.

Stairway to Hell is probably the best web reference that tells you this succinctly. (Their suggested PAGE=&E00 is redundant for us but necessary for some very early software predating disc systems that would otherwise run out of memory due to making assumptions about OSHWM that would later prove unwise.)

Instead the booby prize is chapter 35 of the User Guide, the original 1981 documentation supplied with the Beeb, which is the top google hit for "bbc micro load tape". This is... not the best document for figuring out how to do this, particularly if you're in a hurry because it's the middle of a conference.

I saw one group trying to work out what to do with this as their only guide. That was fun. Here's what they did (I egged them on only slightly):

  • Type LOAD "PROG" without switching to tape first. This hangs as it waits for the non-existent disc drive to respond.
  • Figure out *TAPE and type LOAD "PROG", not realising that PROG was merely an example filename and that such a command would result in any other file being catalogued but not loaded. To be fair, the User Guide does mention that it's only an example in the previous section devoted to saving files. Clearly the authors assumed that people would have nothing to load unless they'd saved it themselves (or that commercial software would be supplied with its own instructions).
  • Type LOAD "BeebSides", painstakingly typing in the filename listed as as result of the previous step, rather than using the common shortcut of a empty filename to mean "whatever you find next". This is not even mentioned as a possibility in the User Guide. (To be fair, this shortcut works only on tape and not on disc, which has no concept of "next").

  • Successfully load the BASIC program with LOAD, get a beep and prompt back, and be unsure what to do next.

  • Figure out to type RUN to run the program but not consider that (a) the WAV file is about a minute long and loading the BASIC program took about a second, ergo (b) there was more data that it was expecting to load that had passed by while they were figuring and typing.[11]
  • Watch it work anyway because I'd not asked the OS to clear memory contents on Break, so all the code was already in RAM from the previous guys who'd run it successfully...
Mark revealing all

At least one other person got it running but didn't recognise it as a flag before Mark revealed all at the end of the day. (In my defence, I've never participated in a CTF before so had only a single-sentence description of what flags typically look like.)

I also witnessed it fail in an unexpected manner when loaded correctly (screen didn't clear before printing the progress bar; bar progressed normally but it subsequently hung when about to load the image). Again I'm chalking this up to not having cleared the memory. Lesson learned.

Total number of successful flag captures: three (not counting Mark, who'd solved it a week or so earlier, as well as solving the (sadly) much more demanding challenge of getting the WAV file into an emulator).

[0] "Capture the flag" contests, typically involving some kind of hacking challenge that, when solved, reveals some text redeemable for points and/or prizes from the organizers, are a staple at security conferences.

[1] The block length field is 2 bytes. Does this mean you can have blocks longer than 256 bytes? No, no it does not.

[2] The CRC covers only the header. The actual data in the block has a separate CRC, following the data.

[3] Actually, for fascinating reasons I'm not gonna get into, it skips clearing every 256th byte of memory (those whose addresses end in &00). This was corrected on the BBC Master.

[4] E.g. This is a classic time-of-check-to-time-of-use (TOCTOU) attack: only after the header is read in its entirety does the OS start to decode it. As it trickles in at 1200 baud, there's plenty of time to modify its contents before it's used simply by hooking the 50Hz vertical blanking interrupt.

[5] "OK, so maybe BASIC can load it anywhere, but surely there's a default load address that we can assume? Right?"

Unfortunately not. The default is the prettily-named OSHWM (OS high water mark), which varies depending on what expansions you have fitted. &E00 is the norm on cassette-based BBC Micros and Electrons, and all Masters, whereas &1900 is usual on BBC Micros with a disc system (i.e. most of them). Other, rarer values include &1D00 on Electron with Plus 3 fitted (the official disc system for the Electron, now with hierarchical filesystem!) or BBC Micro with same, &1F00 for BBC Micros or Electrons with support for both disc types, and &800 on a second processor. Add anywhere from &100 to &500 if you decide you want to define lots of character bitmaps. You get the idea.

[6] The point of the indirect jump is to allow changing the amount of screen memory cleared. While always ending at &8000, the screen starts at &3000 or &4000 or &5800 or &6000 or even &7C00 depending on video mode.

[7] The slowdown is achieved by changing the system VIA timer 1 frequency, normally 100Hz, to about 5kHz. The Beeb can barely cope with interrupts at this rate, and the extra processing required when a key is held down renders it completely stuck until the key is released.

[8] Datasheet, with registers, here. We will be needing these later.

[9] Specifically, they are reserved when storing files on tape but are used when storing files on a ROM cartridge (the OS code for which is largely shared with the tape loading code). These locations and more are documented at

[10] More details on register formats can be found in the 76489 datasheet. Note that the frequency of the square wave voices are controlled by a simple 10-bit counter, so we need a lookup table to convert from semitones. The table is only 12 entries long as an octave corresponds to dividing the counter period by two.

It may seem odd that we're doing this manually rather than using the OS's extensive sound facilities, but there's a good reason. Playing the same note on multiple voices simultaneously would result in the phase between voices remaining constant but undefined, producing unpredictable total volume or even complete silence if you were unlucky. Acorn went to remarkable trouble to avoid this counterintuitive behaviour being exposed to users: they added the voice number (0, 1 or 2) to the period register, thus detuning some notes slightly but ensuring that when playing multiple copies of the same note the phase was continuously varying, fixing the volume problem and making a fuller sound with a characteristic "beating".

Sadly, because the periods get closer together as notes get higher, this becomes much more noticeable the higher you go. We can avoid playing several of the same note but sadly the workaround cannot be disabled, other than by bypassing the OS entirely.

[11] I considered having the tape loader check for some known bytes before blindly loading and executing whatever it read, but ultimately decided keeping it small was more important.

Leave a comment