Memory mapping the TiLDA

Posted by HEx 2016-09-15 at 01:31

Last time I looked at the TiLDA and how to make it run native code. Anyone who's played with this device for any length of time (read: five minutes) knows that there's a serious memory shortage. There's only 128K of RAM total and half of that's used by MicroPython. By the time your app is running you have perhaps 30K to play with.

It gets worse. Because the garbage collector can't move blocks around, the heap gets fragmented incredibly quickly. You want to load a 10K file? Nope. 5K? On a good day.

Here gc.mem_free() is reporting 30944 bytes free, which is technically correct (the best kind of correct!), but the memory map below tells the true story. Each line is one kilobyte, and free space is represented by dots:

GC memory layout; from 20003c80:
00000: MDhhhhBLhhDh======BBBhh=============h=ShSh=ShSh=Sh=BhBBhBB=BBBBh
00400: =====h=======h=====BTh===BhDBhhh=B=h============================
00800: ====================================================h========h==
00c00: ===BBBB=hB=BhBhBhBBhShBhBhBhhBh=BhSBhSSB=BBSh===B=hSB=hB=h=B=BBB
01000: Sh========Bh=BBhhhShShhShSDh=hShSFhShFShSFh=hLhShh=ShhBSh===hSSh
01400: ===========hShShh=====ShShShShDh=ShLhSShShShShShShShShShShShShSh
01800: Sh=========================h==Sh==Sh=Sh=Sh=h====================
01c00: h==ShShShShShShh=hh=hh===hh=hh============h=Sh====ShShShShSh=h==
02000: hh==hh=hh==============h=Sh====ShShShShSh=h======hSh=Sh=h=hh=hh=
02400: ===================================hShh=h=h=h=hShShShShShShSh=hh
02800: hShShShShSh=h=Bh=hhBh=hBhhh=hBhShSh=hhShSh=hShShShShShShShShShSh
02c00: ShShShShShhhh=====hShShShShShShShShShShShShShShShShSh=hBhhBh....
03000: ...h=....h=.....................................................
       (2 lines all free)
03c00: ...................h........MD..................................
       (6 lines all free)
05800: ............................................MD.h=...............
05c00: .BD..B.B=BB.B..B=Bh===B=h====B..h===.BBBBh=======.........h=====
06000: =====B.B=.SMD.h=======.DDF.Bh=====....MD.BBB....MD...MDF........
06400: ...hSFFhh.hhhh...hh.....h.Bh=======h.B=B.......BDDBhh=======h===
06800: ===..hh..hh.h===hh..B=B.Bh=======h=======B=............h==hh====
06c00: ===.....h========B=BBBBM......h======h============hh============
07000: ============================h=======h=hh=======h=======h========
07400: ===M.h==h==hh=======hSh===h==hh==hh==hh=====hh=hh=hD..D.Mh=.....
07800: DBB..Lhh=======h.h=====B...hBh=D...Bh=.BBB.....h===========..h==
07c00: =====BB.....B=B....Sh..h=hhShh=====h=====h========h==S.......BMB
08000: D.h=S.....h=======h====================h========...............h
08400: ===.......h==hSh==.......h==============hh==hh===.h=..........h=
08800: ========.............h=======h====h====h==h.h=======h===h=B==...
08c00: ........B==.B==..................h====h====h====h=======h=....h=
09000: =h============h==h=====h==hh======.h==h=h=hF............B==DB..h
09400: S....MDB=.BB....hBBh===h===.....hSh=h=h=====...h========.h======
09800: =h==hShFh========....hh.........................................
09c00: ........h....................................h===hShh=........h=
0a000: h==h====h===h===================================================
0a400: ==...h========h=====h===========h=h=Sh=======h==========hhS..h==
0a800: ==.....................................................h=======.
       (2 lines all free)
0b400: ..............................................Sh=======.........
0b800: ................................................................
0bc00: h================hh===========h=h=======h=hh==h=================
0c000: ================================================================
0c400: ================================h==Sh=ShSh===Sh==Sh=ShShSh===h==
0c800: =hShSh....h.......h...................................Sh........
0cc00: ................................................................
0d000: ....................................h=======....................
0d400: ................................................................
0d800: ..................................h...................h=========
0dc00: ===================================================.............
0e000: .................................hh.............................
0e400: h=======.....h..................................................
0e800: ..............h===h====h==hh=====hSh==h==hh==hh=====hh==h======h
0ec00: h=hh=hh==hh===hh==hh==hh==hh===hh==hh========hh==hh=====hSh==h=h
0f000: h=hh===hh==hh==h................................................
0f400: ................................................................
0f800: ...........h==h===h=h===========================h=h=============
0fc00: ========================h=h==================================h=.
10000: .......h================hh===hh=hh=hh=hh=hh==h.h=h=.....

We'd be lucky to find 2K of contiguous memory free! Once you start delving into nativity, how objects are laid out in memory starts to become really important. Last time's app was reduced with a bit of effort to 3200 bytes (from about twice that), but those still need to be 3200 consecutive bytes. Writing anything bigger would be a problem. In particular, graphics take up an awful lot of space.
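To make the fragmentation concrete, here's a sketch (plain Python, not from the original post) that scans a map like the one above — one character per 16-byte GC block, with dots marking free blocks — and reports the largest contiguous free region:

```python
# Illustrative only: each character of a MicroPython GC map is one 16-byte
# block; '.' is free, anything else is allocated.

BLOCK_SIZE = 16  # bytes per GC block on this port (assumption)

def largest_free_run(gc_map):
    """Return the size in bytes of the largest contiguous free region."""
    best = run = 0
    for ch in gc_map:
        if ch == '.':
            run += 1
            best = max(best, run)
        else:
            run = 0
    return best * BLOCK_SIZE

# A heap with plenty free in total, but little of it contiguous:
fragmented = "h=..h==...h=....h==..h=...."
print(largest_free_run(fragmented))  # 64
```

Run over the real map above, a function like this is what tells you that your 3200-byte allocation has nowhere to go.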


But stepping back a moment. The data we want is already stored in the filesystem. The filesystem is in flash. Flash is memory-mapped. Furthermore, the filesystem is FAT, which is primitive and doesn't have any annoying gotchas such as compression. So the data we want should be there for the taking.

The FAT implementation in MicroPython is a generic device-independent filesystem. It can use filesystems stored anywhere (e.g. on an SD card), for any device for which a "get block" and optionally "put block" function are provided. It assumes that block reads and writes are slow, and caches blocks in RAM when it thinks this is useful. For our flash filesystem, this means a lot of unnecessary block copies and memory usage. There is of course no mmap() function, and it's not even clear how the filesystem could usefully provide one with the existing block device API.
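For illustration, the "get block / put block" contract looks something like the sketch below. The class and its RAM-backed storage are my own invention; the method names follow MicroPython's block-device protocol:

```python
# A toy block device of the kind MicroPython's FAT layer drives. Real flash
# would be read in place; here a bytearray stands in for it.

BLOCK = 512

class RamFlash:
    def __init__(self, nblocks):
        self.data = bytearray(nblocks * BLOCK)

    def readblocks(self, n, buf):
        # The FAT driver copies every block it touches into a RAM buffer
        # like `buf` -- exactly the copying this post is trying to avoid.
        buf[:] = self.data[n * BLOCK:(n + 1) * BLOCK]

    def writeblocks(self, n, buf):
        self.data[n * BLOCK:(n + 1) * BLOCK] = buf

dev = RamFlash(4)
dev.writeblocks(1, bytes(range(256)) * 2)
out = bytearray(BLOCK)
dev.readblocks(1, out)
```

Note that nothing in this interface lets the filesystem hand back a pointer into the underlying storage — hence the lack of mmap().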

But maybe we can make our own.

Thus the following idea. Open a file and read its contents. Loop through the entire flash (it's only a megabyte) looking for where it's stored. The cluster size is 512 bytes, so files will start on a 512-byte boundary.

def mmap(file):
    buf=bytearray(open(file).read())
    for ptr in range(0x8000000,0x8100000,512):
        ## it'd be so great if this worked
        #if stm.mem8[ptr:ptr+len(buf)]==buf:
        #    return ptr
        for i in range(len(buf)):
            if stm.mem8[ptr+i]!=buf[i]:
                break
        else:
            return ptr

Because of the previously-mentioned limitations of the stm.mem* functions we have to iterate by hand rather than using slices, which is hideously slow. But we can of course write a native version[1]:

def mmap(file):
    @micropython.asm_thumb
    def findblk(r0,r1,r2):
        data(2,0xf500,0x7000,0xf1b0,0x6f01,0xd208,0x2300,0x58c4,0x58cd)
        data(2,0x42ac,0xd1f5,0x3304,0x4293,0xd1f8,0xe000,0x2000)
    buf=open(file).read()
    return findblk(1<<27,buf,len(buf))

There are complications. Firstly, filesystems suffer fragmentation just as the heap does, so files may not be contiguous either. Secondly, reading the entire file contents takes up the same amount of RAM, temporarily, as the file size. The second problem could be solved by only reading and comparing small chunks of the file at a time, but then either we require that the first chunk of the file uniquely identifies it, or we're willing to backtrack (possibly multiple times!) if we found a false positive early on in the search. Also possible is precomputing a hash for the file rather than actually reading it in (we wouldn't need to bother with filenames at all!), but then we'd need to pick an algorithm, and implement it, and hope there are no collisions... Messy, all told.
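As a sketch of the chunk-at-a-time variant (hypothetical code, not what the post ships — note the backtracking when the first cluster matches but a later one doesn't):

```python
# Search a flat flash image for a file one 512-byte cluster at a time,
# so at most one cluster of the file need be in RAM at once.

CLUSTER = 512

def find_in_flash(flash, read_chunk, nchunks):
    """flash: bytes; read_chunk(i) -> the i-th cluster of the file."""
    first = read_chunk(0)
    for base in range(0, len(flash) - nchunks * CLUSTER + 1, CLUSTER):
        if flash[base:base + len(first)] != first:
            continue
        # Candidate found: verify the remaining clusters. A false positive
        # here is the backtracking cost mentioned above -- extra reads.
        if all(flash[base + i * CLUSTER:base + (i + 1) * CLUSTER] == read_chunk(i)
               for i in range(1, nchunks)):
            return base
    return None

# Demo fixture: a 1K "file" whose two clusters differ, planted at offset
# 1536, with a decoy first cluster at 512 to force one backtrack.
file_data = bytes(range(256)) * 2 + bytes(reversed(range(256))) * 2
flash = bytearray(b'\xff' * 4096)
flash[512:1024] = file_data[:512]
flash[1536:2560] = file_data

def chunk(i):
    return file_data[i * CLUSTER:(i + 1) * CLUSTER]

found = find_in_flash(bytes(flash), chunk, 2)  # 1536
```

This still assumes the file is contiguous in flash, which is the real problem; the post's answer — many small files — sidesteps it entirely.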

The first is the biggie though, and there's no easy solution. But we can mitigate both problems simultaneously by simply having small files, but lots of them—say, one per sprite. This turns out to work beautifully in practice.

Is it hacky? Why yes. But this is a microcontroller. If you're not prepared to get your hands dirty you're doing it wrong!

Next time: let's put mmap to good use by writing a game.

[1] There are a couple of gotchas with this code. Word comparisons are used, so ensure your files are a multiple of four bytes long. It takes an address to start searching at, but the first block is skipped. Here's the actual code encoded in the hex constants:

   0: f500 7000   add.w r0, r0, #512        ; 0x200
   4: f1b0 6f01   cmp.w r0, #135266304      ; 0x8100000
   8: d208        bcs.n 0x1c
   a: 2300        movs r3, #0
   c: 58c4        ldr r4, [r0, r3]
   e: 58cd        ldr r5, [r1, r3]
  10: 42ac        cmp r4, r5
  12: d1f5        bne.n 0x0
  14: 3304        adds r3, #4
  16: 4293        cmp r3, r2
  18: d1f8        bne.n 0xc
  1a: e000        b.n 0x1e
  1c: 2000        movs r0, #0
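The multiple-of-four requirement is easy to satisfy by padding a file before writing it to the filesystem, so the stored copy is already word-sized. A possible helper (my convention, not the post's):

```python
# Pad a buffer to a multiple of four bytes, as the word-compare search
# requires. The 0xff fill is an arbitrary choice.
def pad4(buf, fill=b'\xff'):
    return buf + fill * (-len(buf) % 4)
```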

TiLDA March: how to run native code from MicroPython

Posted by HEx 2016-08-13 at 15:51

Another EMF, another badge! This one is the TiLDA MK3, an STM32-based board running MicroPython. It's similar enough to the pyboard that most of the following might be of use even to non-EMF attendees.

The MK3 is very much an embedded platform, with 128K of RAM, 1MByte of flash and an 80MHz Cortex-M4. How do you get performance out of such a device? Not by writing python, that's for sure.

We have two requirements for running native code. We need to be able to load code at a known address, and we need to be able to branch to it. The built-in assembler is terrible. It only supports a subset of instructions (and in a non-standard format, too). Nonetheless it has a data() function to insert arbitrary bytes into the instruction stream. Problem solved! Just write a script to convert gcc output into a series of data() calls and we're done. Right?

@micropython.asm_thumb
def arbitrary_code():
    data(4,0x11111111,0x22222222,0x33333333...)
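Such a conversion script might look like this (an illustrative sketch; the function name is made up):

```python
# Turn a flat binary into a data(4, ...) line for the inline assembler.
import struct

def to_data_call(raw):
    assert len(raw) % 4 == 0, "pad to a multiple of four bytes first"
    words = struct.unpack("<%dI" % (len(raw) // 4), raw)  # little-endian words
    return "data(4," + ",".join("0x%08x" % w for w in words) + ")"

print(to_data_call(bytes([0x2a, 0x20, 0x70, 0x47])))  # data(4,0x4770202a)
```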

This works. But it's kind of inefficient, and since we have so little space it would be foolish to incur a 300% bloat penalty[1]. But it is enough to investigate the environment in which we find ourselves.


Here's a useful function. It doesn't do anything itself.[2] But nonetheless you can pass arbitrary python objects to it and get their addresses back.

@micropython.asm_thumb
def addr(r0):
    mov(r0,r0)

We also have stm.mem* which lets us read and write arbitrary memory locations from Python. So we can build ourselves a hex dumper:

def dump(addr):
    # copy memory just in case it changes out from under us[3]
    a = bytearray(16)
    for i in range(16):
        a[i] = stm.mem8[addr+i]
    out = "%08x: " % addr
    for b in a:
        out = out + ("%02x " % b)
    for b in a:
        out = out + (("%c" % b) if b > 31 and b < 127 else ".")
    print(out)

addr(addr) returns its own address, which turns out to be some kind of struct with a pointer to the code at offset 8.

>>> dump(addr(addr))
20004a10: 4c fb 06 08 01 00 00 00 70 9f 00 20 02 00 00 00 L.......p.. ....
>>> dump(stm.mem32[addr(addr)+8])
20009f70: f2 b5 00 46 f2 bd 00 00 00 00 00 00 00 00 00 00 ...F............

Disassemble the code and we get to see the prologue and epilogue:

$ perl -e 'print chr hex for @ARGV' f2 b5 00 46 f2 bd >/tmp/dump
$ arm-none-eabi-objdump -b binary -m arm -M force-thumb,reg-names-std -D /tmp/dump
[...]
   0: b5f2        push {r1, r4, r5, r6, r7, lr}
   2: 4600        mov r0, r0
   4: bdf2        pop {r1, r4, r5, r6, r7, pc}

Which looks fairly standard.

The goal though is to be able to get the address of a buffer that contains code we want to run. It turns out that python bytearrays are passed directly to assembly functions. Handy.

>>> test = bytearray([0x2a,0x20,0x70,0x47]) # "movs r0, #42; bx lr"
>>> dump(addr(test))
2000f950: 2a 20 70 47 00 00 00 00 00 00 00 00 00 00 00 00 * pG............

Now we can read a buffer from file and branch to it. So, assuming we would actually like to return control to python afterwards, how do we do the branching?

blx is the standard way to jump to an address and return afterwards. blx requires the instruction set (ARM or Thumb) to be encoded in the LSB of the address; while the Cortex-M4 supports only Thumb mode, the processor does indeed verify that this is the case, so we must set the LSB correctly or Bad Things happen.[4] Sadly the older bl instruction (which doesn't change instruction set) can't branch to an address in a register. We could always push the return address ourselves and load the PC directly, but then we'd have to set the LSB on the return address!
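The rule in miniature, as a Python helper (names and the ValueError convention are mine, not MicroPython's):

```python
# Compute a blx-able target address: the buffer must be word-aligned,
# and the LSB must be set to select the Thumb instruction set.
def thumb_target(addr):
    if addr & 3:
        raise ValueError("buffer not word-aligned")
    return addr | 1  # LSB = 1 means Thumb

print(hex(thumb_target(0x2000f950)))  # 0x2000f951
```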

The assembler refuses to assemble blx(r0) so we have to do it ourselves.

@micropython.asm_thumb
def call(r0):
    data(2,0x4780) # blx r0

Then we can call code in our test buffer.

>>> assert(not(addr(test)&3)) # ensure the buffer is aligned
>>> call(addr(test)+1) # set the LSB and branch
42


Of course, if we do the incrementing from within the call function we can dispense with addresses entirely:

@micropython.asm_thumb
def call_inc(r0):
    data(2,0x3001,0x4780) # adds r0,#1; blx r0

buf = open("code").read()
call_inc(buf) # assumes buf is word-aligned, which seems to always be the case

Now to get gcc on board.

To C

Here's “hello world” without the hello (or the world).

$ cat test.c
int test(void) { return 42; }
$ arm-none-eabi-gcc -mthumb -mcpu=cortex-m4 -ffreestanding -nostdlib -fPIC test.c -o test
ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000
$ arm-none-eabi-objcopy test -O binary test-raw

(Why not just get gcc to output a flat binary directly?

$ arm-none-eabi-gcc -Wl,--oformat=binary -mthumb -mcpu=cortex-m4 -ffreestanding -nostdlib -fPIC test.c -o test
ld: error: Cannot change output format whilst linking ARM binaries.

Apparently this is deliberate. I have no idea why.)

Let's have a look at the result:

$ arm-none-eabi-objdump -b binary -m arm -M force-thumb,reg-names-std -D test-raw
[...]
   0: b480        push {r7}
   2: af00        add r7, sp, #0
   4: 232a        movs r3, #42 ; 0x2a
   6: 4618        mov r0, r3
   8: 46bd        mov sp, r7
   a: f85d 7b04   ldr.w r7, [sp], #4
   e: 4770        bx lr

A bit verbose perhaps, but we didn't ask for any optimization. Try it:

>>> f = open("test-raw")
>>> test2 = f.read()
>>> f.close()
>>> call_inc(test2)
42

Looking good. Things go badly very quickly though:

$ cat count.c
int arr[] = { 1, 2, 3, 4, 5 };
int count(void)
{
    int i, total=0;
    for(i=0; i<5; i++) {
        total += arr[i];
    }
    return total;
}
$ arm-none-eabi-gcc -Wall -mthumb -mcpu=cortex-m4 -ffreestanding -nostdlib -fPIC count.c -o count
ld: warning: cannot find entry symbol _start; defaulting to 0000000000008000
$ arm-none-eabi-objcopy count -O binary count-raw

Again we compile without optimization, in this case to avoid gcc simply returning a constant instead of performing calculations at runtime.

Suddenly our binary is 64K in size! And it contains some code that will never work:

$ arm-none-eabi-objdump -b binary -m arm -M force-thumb,reg-names-std -D count-raw
[...]
     0: b480        push {r7}
     2: b083        sub sp, #12
     4: af00        add r7, sp, #0
     6: 4a0e        ldr r2, [pc, #56] ; (0x40)
     8: 447a        add r2, pc
[...]
    14: 4b0b        ldr r3, [pc, #44] ; (0x44)
    16: 58d3        ldr r3, [r2, r3]
[...]
    40: 003c        movs r4, r7
    42: 0001        movs r1, r0
    44: 000c        movs r4, r1
[...]
 10054: 8058        strh r0, [r3, #2]
 10056: 0001        movs r1, r0

Here it is trying to set R2 to the contents of PC+0x10054, which is 0x18058, and later reading from that address. Disassembling the original ELF output reveals what's going on: the ldr r2, [pc, #56]; add r2, pc sequence is attempting to load the global offset table. GCC has assumed (despite -ffreestanding!) that we're running in an environment with virtual memory where code and data are held in separate pages.

We need to create a custom linker script to tell it not to do that. Not being any kind of ld expert, here's what I cribbed together:

$ cat linkscr
SECTIONS {
    .text 0 : AT(0) {
        code = .;
        *(.text.startup)
        *(.text)
        *(.rodata)
        *(.data)
    }
}

108 bytes, plus our warning has gone away! That's more like it. We also need to replace -fPIC with -fPIE. The tempting -mpic-data-is-text-relative option is, happily, already enabled by default in gcc 4.8 and up.

$ arm-none-eabi-gcc -Tlinkscr -Wall -mthumb -mcpu=cortex-m4 -nostdlib -fPIE count.c -o count
$ arm-none-eabi-objcopy count -O binary count-raw
>>> call_inc(open("count-raw").read())
15


So having got this far I made a little app, a quick-and-dirty port of some existing C code I've been working on.

What's the code? And why is it so arcane? That's a story for another day.

Code is on github.

[1] Four bytes of code written in hex is "0x12345678," (11 bytes of source). The parser will then allocate another four bytes for the resulting value, for a total of 15 bytes. Then there's the space for the data() call itself (which almost certainly has a limit on the number of parameters, so multiple statements would be required). Result: about four times the space.
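The footnote's accounting, spelled out (a sketch that ignores the data() call overhead):

```python
# Storage cost per 32-bit word of code when embedded as hex source text.
src_per_word = len("0x12345678,")      # 11 bytes of source text
alloc_per_word = 4                     # plus four bytes for the parsed value
total = src_per_word + alloc_per_word  # 15 bytes to carry 4 bytes of code
print(total / 4)                       # 3.75, i.e. "about four times"
```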

[2] Indeed, we only need the no-op at all to avoid confusing Python with an empty function body.

[3] Sadly the implementation of stm.mem* contains the following line:

// TODO support slice index to read/write multiple values at once

so this somewhat unpythonic code is actually the best we can do.

[4] Bad Things include but are not limited to: LEDs flashing ominously, unresponsiveness (necessitating a reboot), filesystem corruption (necessitating a reinstall, with loss of data), and nasal demons.

Rainbow Noise

Posted by HEx 2014-12-08 at 00:25

A friend of mine (hi K!) revealed to me recently that he uses a "background noise" app, and that he had some strange observations. He could hear patterns in the noise, right at the edge of audibility.

I was skeptical. Humans are notorious for imagining patterns everywhere: it's how we're wired. What other explanations could there be, if he wasn't just making it up?[1] Perhaps the app was using a poor random number generator? Not exactly: further investigation revealed the surprising fact that the patterns seemed periodic—every 30 seconds to be precise—and it wasn't long from there until the discovery that yes, the app simply uses prerecorded segments of noise, and yes, they're exactly 30 seconds long.

This both astounded and disturbed me. That someone could detect long-range periodicity in randomness by ear was pretty neat. That someone would make an app that explicitly stored 5 megabytes of random data and played it back in a loop instead of writing a dozen lines of code was just depressing[2], especially as it defeated the entire point (the supposed soothing properties of the noise were demonstrably being eroded by distracting patterns).
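For comparison, the "dozen lines of code" might look something like this (Python for illustration — the actual app would be JavaScript, and the sample rate is an assumption):

```python
# An endless, non-repeating white noise generator: one uniform sample
# per tick, no 30-second loop to hear.
import random

def white_noise(seed=None):
    rng = random.Random(seed)
    while True:
        yield rng.uniform(-1.0, 1.0)  # one sample in [-1, 1]

gen = white_noise(seed=42)
samples = [next(gen) for _ in range(32000)]  # one second at 32 kHz
```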

Since the app offered various colours of noise I found myself idly wondering how to generate them algorithmically, then idly playing about, and before I'd quite realised what I was doing I'd somehow written an app of my own.

Rainbow Noise offers six varieties of mathematically-well-defined noise. It also contains an easter egg. More accurately, it is an easter egg that just happens to contain a noise generator. Since bloating a simple app by over an order of magnitude just for the lulz is crazy (not to mention that unnecessary bloat was one of the motivating factors behind this in the first place), I have gone to some effort to keep the size down.[3]

Sadly the egg does not yet work in Chrome. Given the pathological loathing of legally-unencumbered codecs apparently shared by Apple and Microsoft, it doesn't work in Safari or IE either, but at least Google has acknowledged the problem. So maybe hooray for Google but definitely hooray for Firefox, where everything just works. (Why no, I will not be working around this bug.)

Code is on github.

[1] Presumably subconsciously, but you never know.

[2] Apparently randomness as a service is a thing these days. The notion that entropy is sufficiently hard to come by that you would ever need to look to a third party to provide you with some—and any would-be cryptographers who think this is a good idea should be beaten to death with their own S-boxes—is one of the dafter things I've ever heard. But not, perhaps, as daft as having a fixed entropy pool and outputting it verbatim over and over again.

[3] Not all-out. Some.

On Google Analytics

Posted by HEx 2014-12-03 at 03:13

On the web, the "traditional" method of visitor tracking is at the origin web server. This is perfectly reliable and trustworthy: it is impossible to visit a site without contacting the origin server[1], and no other parties need be involved. Tools such as webalizer perform logfile analysis and produce visitor statistics.[2]

However, in the past decade a new model of visitor tracking has achieved prominence: third-party tracking via javascript, often with a fallback to transparent 1x1 GIFs ("web bugs"). This model is exemplified by the most widespread tracking tool, Google Analytics.

Google Analytics (GA) is overwhelmingly popular. At the time of writing, approximately half of the web uses GA.[3] There is no user benefit to participating in GA tracking, but there are costs both in resource usage and in information leakage. As a consequence, it is also one of the most blacklisted domains on the web.

How blacklisted? Here is a sampling of popular ad-blocking and other privacy-centric browser addons, along with their approximate userbase.

All of the above tools block GA by default. This is not counting other, more niche tools, or the individuals and organisations who block GA manually in their browsers or at their firewalls.

This is not a negligible number of people. Notwithstanding the fact that an awful lot of people seem to not want GA to be a part of their browsing experience, these are all people who will be invisible to GA, and absent in the data it provides. That it is possible (and even desirable) to opt out of third-party tracking undermines the entire concept of third-party tracking.

That Google offers such a deeply flawed service is easy to understand. The question for Google is not whether the data it collects from GA is complete, but whether it gets to collect it at all. Google is the biggest infovore in history: of course websites outsourcing their tracking to Google is a win for Google.

And yet, in part because it is easy to use, GA is incredibly widespread. Given its popularity, GA would perhaps seem to be a win for site operators too. I think not, and here's why: deploying GA sends a message. If a site uses GA, it seems reasonable to conclude the following about the site operator:

  • You care about visitor statistics, but not enough to ensure that they are actually accurate (by doing them yourself).[4] Perhaps you don't understand the technology involved, or perhaps you value convenience more than correctness. Neither bodes well.
  • Furthermore, you're willing to violate visitors' privacy by instructing their browsers to inform Google every time they visit your site. Since in return you get only some questionable statistics, it seems visitors' privacy is not important to you.

If you run a website, this is probably not the message you want to be sending.

[1] The increasing use of CDNs to serve high-bandwidth assets such as images and videos does not change the fact that 100% of visits to a modern, dynamic site pass through the origin server(s). Frontend caches such as Varnish merely change the location of logs, not their contents or veracity.

[2] There are very probably more modern server-side tools available nowadays; I've been out of this game for a while now. Certainly CLF leaves a lot to be desired. But since web servers have all the data available to them this is strictly a quality-of-implementation issue.

[3] claims 49.9%. confirms this number, and offers a breakdown by site popularity. Of the most popular 10,000, 100,000 and 1,000,000 sites, the proportion using GA rises from 45% of the top million to nearly 60% of the top ten thousand.

[4] This is an assumption. Deploying GA does not preclude the option of tracking visitors at your web server. But since web server tracking is strictly better than GA, why would anyone bother to use both?

What's wrong with Threes 1

Posted by HEx 2014-04-16 at 15:18

A few weeks ago, when the 2048 craze was at its height, this Mobile Mavericks article (mirror) was posted on HN. (Go read it. Go read it now. I'll wait.)

To say I disagree with this article would be putting it mildly. But hey, someone is wrong on the internet. It happens all the time.

However, here's the Threes team weighing in on the topic. They say essentially the same thing, only more diplomatically: woo, something we made became popular! But there's all these "rip-offs" that are more popular still, and we're not happy about that. Especially since we think they're inferior.

Why, they don't even have "save game syncing across devices, a beautiful top screen and gorgeous little sharing cards for social media"!

You have to pay for Threes, and you can only play it on your phone, and furthermore only if that phone is made by Apple.[1] Spending fourteen months polishing your flawless jewel, releasing the result under those kind of restrictions and expecting people to be content to look but not touch is, well, naïve. That someone spends a weekend reimplementing a version that works everywhere just for fun is hardly surprising, and thus the year-long Threes development process surely counts as a Very Poor Business Decision Indeed.

That a Threes-like game proceeded to take over the internet is something that nobody could've predicted. The internet is nothing if not fickle. But if any version was to take over the internet, it would most certainly not be Threes, unless Threes ran in a web browser and was available for free. Those requirements are mandatory for the kind of exposure that 2048 has garnered.

2048 is absolutely an improvement over Threes, in every way that counts. But 2048 went one step further. It's open source. The sheer bewilderment at this is evident in the MM article:

"What isn’t alright by me is a game that releases for free and makes no attempt to make money, which is what 2048 has done. It does nothing to monetise: it makes no advertising revenue; it has no broader cross promotional purpose and it certainly has no in app purchases to make money."

This guy made a game and just gave it away? What is wrong with him?

This mindset is what I loathe about the mobile world[2], and part of why I don't own a smartphone. Everyone has an ulterior motive, usually money. It's natural and assumed. Nobody would ever do anything that wasn't to their advantage, and users are there to be exploited.

Let's be clear here. 2048 is not a "rip-off".[3] 2048 is not destroying value, it's creating it. 2048 is someone building on what has come before. The many 2048 variants are people building on what has come before. This is how culture works. This is how culture has always worked. This human tendency is what made the technology on which Threes depends possible!

Why is this news to anyone? Particularly the author of the MM article, but also the Threes developers. That someone considered Threes important enough, inspirational enough to build on should be cause for celebration, not consternation!

They aren't the only ones missing the point, of course. Ownership of ideas is accepted and commonplace today. Imagine for a moment what might've happened had the Threes team spent some of their fourteen months applying for a patent on the mechanics of their game. Likely they would've got their "no rip-offs" wish: 1024 could have been effectively nipped in the bud. Its successor 2048 would never have existed. And nobody would ever have heard of Threes.

If you don't consider that outcome an improvement (I don't), maybe it's worth pondering how we as a society could start encouraging this kind of creativity instead of denigrating it. But first we need to start accepting that "derivative" is not a dirty word.

(Full disclosure: I made a 2048 variant. And I have never played Threes.)

[1] Yes, there's an official Android port now, but not until over a month after the iOS release, which was quite long enough. Also iPads count as big phones for the purposes of this rant.

[2] And before that, the Windows world. Happily, there are communities where people cooperate to make and give away software, even entire operating systems, without any motivation beyond wanting to make something awesome. And for that I am truly grateful.

[3] rip-off /ˈrɪpɔf/ n. a copy, and that's bad.


Posted by HEx 2014-03-20 at 03:50

I've never been much of a fan of memes, much less participated in one. I really should know better. Still, I've had fun with this one.

First there was 2048, which by now needs no introduction. Then there was 2048-AI. The AI's minimax strategy resulted in it routinely calculating the worst possible positions for new tiles, so it could avoid potential disasters. This prompted a lengthy discussion on HN[1] about whether minimax was really a good idea when in the game new tiles were placed randomly, and that it was arguably optimizing for the wrong thing, resulting in needlessly conservative play.
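The minimax-versus-expectimax point can be made with a toy sketch (not either AI's actual code): for a chance node where a new tile may land in one of several cells, minimax scores the node by the worst placement, while expectimax scores it by the average over placements.

```python
# Toy chance-node evaluation. `outcomes` holds hypothetical scores of the
# position after each possible random tile placement.
def minimax_value(outcomes):
    return min(outcomes)          # assume an adversarial tile

def expectimax_value(outcomes):
    return sum(outcomes) / len(outcomes)  # assume a random tile

outcomes = [10, 80, 90, 100]
print(minimax_value(outcomes), expectimax_value(outcomes))  # 10 70
```

Against a random placer, minimax's pessimism makes it pass up positions that are excellent on average but have one bad placement — the "needlessly conservative play" from the HN discussion.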

Then there was 2048-Hard, which demonstrated that purposeful tile placement really did make the game significantly more difficult. Almost impossible, in fact, even for the AI.

Whether 2048-Hard is an argument for or against minimaxing isn't clear. But a different idea occurred to me: rather than fixing the AI so that it assumes random tile placement, why not fix the game so that it matches the AI's model more closely?

Thus was born 8402, an inside-out variation where the AI is not on your side at all. As well as an interesting (and almost entirely unlike 2048!) game in its own right, hopefully 8402 offers some insight into the original too.

[1] See also this Stack Overflow question.

Fifteen dissection part two: Instruments

Posted by HEx 2014-02-19 at 04:45

[Previous instalments: part zero, part one.]

So we need to calculate PCM audio, and we have very little space to do it in. First we need some instruments.

I wrote a tool to play around with generating sounds algorithmically. (I've tidied it up a bit since the compo; here is the original for the curious. Particularly of note is the fact that sample values for both the old version and the entry itself were in the range -32768..32767, whereas now they're -1..1. Hence the code samples differ slightly.)

In the tune we have the following: lead, bass, bass drum, hihat, snare, two bongoids and two thud-type things.1


Melodic instruments can be pretty simple: just take a periodic waveform and decay it exponentially over time.

We start with a sine wave, which is pretty periodic.

var freq = 440;
return Math.sin(i * freq * 2*Math.PI / 32000);

And add some decay:

var freq = 440, decay = 0.0004;
return Math.sin(i * freq * 2*Math.PI / 32000) * Math.exp(-i*decay);

Sine waves sound really "thin", having just a single frequency. We need harmonics. A cheap way of doing this I discovered was to generate sinⁿx instead of sin x: higher values of n give more harmonics. Here's n=9, the value used for the lead melody:

var freq = 440, decay = 0.0004, exponent = 9;
return Math.pow(Math.sin(i * freq * 2*Math.PI / 32000), exponent) * Math.exp(-i*decay);

Frequency isn't too useful by itself; instead we need pitches in more useful units such as semitones. The actual function I ended up with was:

function r(o,p,f,e) {
    return 4000*Math.pow(Math.sin(o*Math.pow(2,(p-53)/12)),e) * Math.exp(-o * f);
}

Here o is the offset in samples from the start of the note, p is the pitch in semitones (adjusted so that useful values are just above zero), f is the decay constant, and e is the exponent. This generates sample values in the range -4000..4000, which will fit about 8 such notes in the 16-bit output range.
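A quick Python port of r (translation mine, not from the demo) confirms the stated range:

```python
import math

# Python port of the JS note generator r(o, p, f, e) above.
def r(o, p, f, e):
    return 4000 * math.sin(o * 2 ** ((p - 53) / 12)) ** e * math.exp(-o * f)

# Peak amplitude over the first 4000 samples of a p=53 note never
# exceeds 4000, so eight simultaneous notes fit in 16-bit output.
peak = max(abs(r(o, 53, 3e-4, 25)) for o in range(4000))
```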

And the calling code:

z += (t||(l==3))? (r(u, p[l], 3e-4,25) + 2*r(u, p[l]-24, 2e-4, 80)): r(u, p[l]-10?p[l]:13+((i/960)&2), 4e-4,9);

This takes some breaking down.

(t||(l==3)) This determines whether we want the bass sound (when channel == 3, or always during the thuddy chords) or the lead sound.

(r(u, p[l], 3e-4,25) + 2*r(u, p[l]-24, 2e-4, 80)) The bass sound is made up of two calls to r, one with exponent of 25, and one twice as loud and two octaves (24 semitones) lower with an exponent of 80. When the exponent is even the fundamental frequency is doubled so it's actually only one octave lower. The decay constants are slightly different so that the timbre changes over time.

r(u, p[l]-10?p[l]:13+((i/960)&2), 4e-4,9); This is the melody sound, previously dissected.

p[l]-10?p[l]:13+((i/960)&2) A hack for the mordent. p[l] is the pitch stored in the pattern data, which is updated every event, or 3840 samples. If the pitch is not equal to 10, then it is used directly. Pitch 10—a value not needed for the tune—is used as a sentinel to mean "pitch 13 for half an event, then pitch 15 for half an event".
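The timing inside that expression comes from (i/960)&2: the bitwise AND truncates its operand to a 32-bit integer, so the term steps through 0, 0, 2, 2, … as i crosses 960-sample boundaries, giving pitch 13 for the first 1920 samples of the event and pitch 15 for the second 1920. As a standalone sketch (mordentPitch is my name):

```javascript
// Effective pitch for the sentinel value 10, as a function of the
// global sample index i. (i/960)&2 truncates to an int32 and masks
// bit 1, yielding 0,0,2,2,0,0,... in 960-sample steps.
function mordentPitch(i) {
  return 13 + ((i / 960) & 2);
}
```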


The bass drum is simply a sine wave with descending frequency: return Math.sin(2e5/(i+1948)); The number 1948 was chosen so that after 3840 samples (one event) the result will be approximately zero to avoid clicking.
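The claim about 1948 can be verified directly: at i = 3840 the phase 2e5/(3840+1948) ≈ 34.554 lands within a few thousandths of 11π, so the last sample sits almost exactly on a zero crossing. Wrapped as a function for checking (bd is my name for it):

```javascript
// Bass drum: a sine whose phase 2e5/(i+1948) falls as i grows,
// i.e. a downward frequency sweep ending near a zero crossing.
function bd(i) {
  return Math.sin(2e5 / (i + 1948));
}
```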

The remaining percussion (hi-hat and snare—I had to jettison the bongoids for space reasons) was something resembling white noise with exponential decay. I decided to avoid Math.random() in favour of some deterministic bit-twiddling.

Here's the snare: return ((8e10/(i+6e3))&14)*Math.exp(-i/900)/14;
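The mask is what makes this noise-like: 8e10/(i+6e3) changes by a couple of thousand per sample, so its low bits after the &14 truncation look random even though they're fully deterministic; dividing by 14 normalises the masked value to [0,1] before the decay is applied. Wrapped as a function for inspection (snare is my name for it):

```javascript
// Snare: deterministic pseudo-noise. The &14 keeps even values 0..14
// of a fast-varying quotient, normalised to [0,1] and decayed.
function snare(i) {
  return ((8e10 / (i + 6e3)) & 14) * Math.exp(-i / 900) / 14;
}
```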

The hat has three different variants with slightly different decay constants. Here's the longest: return ((8e10/(i+3e5)*(i*17|5)&511))*Math.exp(-i/1000)/500;

The percussion sequences are simple enough to be hard-coded. The hi-hats in context: var o = i%3840; var n = (i/3840) |0; return ((8e10/(o+3e5)*(o*17|5)&511)) * Math.exp(-o*(n%3+1)/1000)/500;

And finally the percussion in its entirety: var z = 0; var m = ((i / 3840 / 15) | 0) + 9; var o = i%3840; var n = (i/3840) |0; if(m<20) { /* hat */ if(m>9) z += ((8e10/(o+3e5)*(o*17|5)&511)) * Math.exp(-o*(n%3+1)/1000) * 8; /* bd, snare */ z += [m>11 && Math.sin(2e5/(o+1948))*10,0,m>15 && ((8e10/(o+6e3))&14) * Math.exp(-o/900),0][n&3] * 800; } return z / 32768;

So there you have it. Despite all of this, the instruments were easily the weakest part of the demo.

Still to come: pattern encoding.

[1] One of the fun things about making music electronically is that it's entirely possible to get by without knowing what your instruments are conventionally called, or whether they even have a physical counterpart at all.

Crystal Maze theme

Posted by HEx 2014-02-02 at 14:35

I've been getting my nostalgia on recently by watching The Crystal Maze. My main takeaway—other than that surely sliding block puzzles aren't that hard?—is that the theme tune badly needs remixing. But before I embarked on such a project, I remembered there was a computer adaptation for the Archimedes, a platform with which I'm deeply familiar. I'd played the game (well, the demo) back in the day: I even still had a copy, although it had suffered bitrot and would crash on startup. It presumably contained the theme tune. Now I just had to extract it.

It's been many years since I did any Acorn hacking, but this turned out to be remarkably straightforward. The presence of TrackerModule in the game's Modules directory was a dead giveaway. TrackerModule can play precisely three file formats: Soundtracker, Protracker, and its native format, the almost-lost-to-history Archimedes Tracker. Soundtracker doesn't have any magic numbers to speak of, but even in 1993 nobody used such an obsolete format. Happily the others do: in particular, Archimedes Tracker files start with the string "MUSX", and lo, three data files contain that string. Not quite that easy though, as they're embedded in some kind of custom archive format that I'm not about to reverse engineer.1

Instead I turned to the game itself. Will it unpack the tune for me? I found a pristine copy of the demo—pristine enough to get to the title screen before crashing anyway—and listened for the first time in about two decades to its remarkably poor quality rendition of a fragment of the theme tune. Hardly worth ripping, but no point leaving a job half done. It turned out the crashing was actually an asset: the tune would be left in memory; I didn't even have to break out a debugger! The game normally cleaned up after itself, but that was easily remedied.2

So, the game has quit, and TrackerModule is still loaded. *PlayStatus? Address exception. Well, of course, the tune was in application workspace. Retry outside of desktop, *Modules gives me the workspace address, *PlayStatus gives me the length (also "Converted from Amiga" *sigh*), save it out and we're done.

Since nothing can read Archimedes Tracker these days3, since it started off as an Amiga format anyway, and since I just happen, once upon a time, to have written a converter, I converted the tune back to Protracker. So here it is in all its non-glory. The main executable contains the string "Thanks to Mark Vanstone for the tracker music"—so now we know.

I wonder what the other tunes were.

[1] I'm also struck by how much data compression has come along since the early nineties. These days no compressor I can imagine would leave recognizable ASCII strings behind: what a waste of entropy!

[2] RISC OS's command line interpreter (and its predecessor on the BBC micro) is, at least in my experience, unique in using a single vertical bar as a comment character.

[3] Except xmp, of course.

It's about time

Posted by HEx 2014-01-13 at 00:57

{blog.,} now has SSL (indeed, TLS) enabled.1 After the last mishap, this time everything went smoooth.

I've stopped short of redirecting HTTP requests to the HTTPS version, but I have enabled HSTS, so if you visit the above domains with HTTPS once, your browser will remember to always use encrypted connections in the future regardless of what protocol you request. This is a Good Thing™.

While I often post links to or include resources from sphere on this blog, happily I had the foresight to not use protocol-specific URLs.2 Thus even cross-domain resources will be fetched using whatever protocol the originating request used, for maximum compatibility, security and avoidance of tedious warnings and broken padlocks. This doesn't help with embedding resources from domains not under my control of course—I can't require that other sites support encryption—but that's a fool's errand anyway.

[1] Also

[2] The syntax for this is simply "//domain/path", e.g. <a href="//">my home page</a>. This trick ought to be more widely known.

Getting familiar with MIDI

Posted by HEx 2014-01-11 at 03:45

So I recently came into possession of a Yamaha P-80 digital piano. Now that my MIDI cable has finally arrived I've been toying with controlling it by computer. MIDI is a terrible protocol, with no concept of plug-and-play whatsoever. And since the P-80 has a very limited set of instruments, with only a token attempt at mapping them onto General MIDI, any software trying to control it needs to have a preset specifically tailored for the P-80.

The program I tried using, Rosegarden, did not. So I made one: p80.rgd. (I haven't played much with the controllers but all the instruments are present and correct.)

Here's a MIDI file I used for testing, and here's how it sounds in FluidSynth with the Fluid-R3-GM soundfont1:

Here's a tweaked version for the P-80, and here's the P-80 rendering it:

The tune is the in-game tune from Spring Weekend, a demo version of which was included on a Windows 98 CD I had many moons ago.2

[1] Well, how the first quarter of it sounds, anyway. For some reason (workaround for poor looping facilities?) the file contains four identical copies of the fragment I recorded.

[2] And while I was googling, I came across Raymond Chen on MEP:TPC.