Memory mapping the TiLDA

Posted by HEx 2016-09-15 at 01:31

Last time I looked at the TiLDA and how to make it run native code. Anyone who's played with this device for any length of time (read: five minutes) knows that there's a serious memory shortage. There's only 128K of RAM total and half of that's used by MicroPython. By the time your app is running you have perhaps 30K to play with.

It gets worse. Because the garbage collector can't move blocks around, the heap gets fragmented incredibly quickly. You want to load a 10K file? Nope. 5K? On a good day.

Here gc.mem_free() is reporting 30944 bytes free, which is technically correct (the best kind of correct!), but pyb.info(1) reveals the true story. Each line is one kilobyte, and free space is represented by dots:

GC memory layout; from 20003c80: 00000: MDhhhhBLhhDh======BBBhh=============h=ShSh=ShSh=Sh=BhBBhBB=BBBBh 00400: =====h=======h=====BTh===BhDBhhh=B=h============================ 00800: ====================================================h========h== 00c00: ===BBBB=hB=BhBhBhBBhShBhBhBhhBh=BhSBhSSB=BBSh===B=hSB=hB=h=B=BBB 01000: Sh========Bh=BBhhhShShhShSDh=hShSFhShFShSFh=hLhShh=ShhBSh===hSSh 01400: ===========hShShh=====ShShShShDh=ShLhSShShShShShShShShShShShShSh 01800: Sh=========================h==Sh==Sh=Sh=Sh=h==================== 01c00: h==ShShShShShShh=hh=hh===hh=hh============h=Sh====ShShShShSh=h== 02000: hh==hh=hh==============h=Sh====ShShShShSh=h======hSh=Sh=h=hh=hh= 02400: ===================================hShh=h=h=h=hShShShShShShSh=hh 02800: hShShShShSh=h=Bh=hhBh=hBhhh=hBhShSh=hhShSh=hShShShShShShShShShSh 02c00: ShShShShShhhh=====hShShShShShShShShShShShShShShShShSh=hBhhBh.... 03000: ...h=....h=..................................................... (2 lines all free) 03c00: ...................h........MD.................................. (6 lines all free) 05800: ............................................MD.h=............... 05c00: .BD..B.B=BB.B..B=Bh===B=h====B..h===.BBBBh=======.........h===== 06000: =====B.B=.SMD.h=======.DDF.Bh=====....MD.BBB....MD...MDF........ 06400: ...hSFFhh.hhhh...hh.....h.Bh=======h.B=B.......BDDBhh=======h=== 06800: ===..hh..hh.h===hh..B=B.Bh=======h=======B=............h==hh==== 06c00: ===.....h========B=BBBBM......h======h============hh============ 07000: ============================h=======h=hh=======h=======h======== 07400: ===M.h==h==hh=======hSh===h==hh==hh==hh=====hh=hh=hD..D.Mh=..... 07800: DBB..Lhh=======h.h=====B...hBh=D...Bh=.BBB.....h===========..h== 07c00: =====BB.....B=B....Sh..h=hhShh=====h=====h========h==S.......BMB 08000: D.h=S.....h=======h====================h========...............h 08400: ===.......h==hSh==.......h==============hh==hh===.h=..........h= 08800: ========.............h=======h====h====h==h.h=======h===h=B==... 08c00: ........B==.B==..................h====h====h====h=======h=....h= 09000: =h============h==h=====h==hh======.h==h=h=hF............B==DB..h 09400: S....MDB=.BB....hBBh===h===.....hSh=h=h=====...h========.h====== 09800: =h==hShFh========....hh......................................... 09c00: ........h....................................h===hShh=........h= 0a000: h==h====h===h=================================================== 0a400: ==...h========h=====h===========h=h=Sh=======h==========hhS..h== 0a800: ==.....................................................h=======. (2 lines all free) 0b400: ..............................................Sh=======......... 0b800: ................................................................ 0bc00: h================hh===========h=h=======h=hh==h================= 0c000: ================================================================ 0c400: ================================h==Sh=ShSh===Sh==Sh=ShShSh===h== 0c800: =hShSh....h.......h...................................Sh........ 0cc00: ................................................................ 0d000: ....................................h=======.................... 0d400: ................................................................ 0d800: ..................................h...................h========= 0dc00: ===================================================............. 0e000: .................................hh............................. 0e400: h=======.....h.................................................. 0e800: ..............h===h====h==hh=====hSh==h==hh==hh=====hh==h======h 0ec00: h=hh=hh==hh===hh==hh==hh==hh===hh==hh========hh==hh=====hSh==h=h 0f000: h=hh===hh==hh==h................................................ 0f400: ................................................................ 0f800: ...........h==h===h=h===========================h=h============= 0fc00: ========================h=h==================================h=. 10000: .......h================hh===hh=hh=hh=hh=hh==h.h=h=.....

We'd be lucky to find 2K of contiguous memory free! Once you start delving into nativity, how objects are laid out in memory starts to become really important. Last time's app was reduced with a bit of effort to 3200 bytes (from about twice that), but they still need to be consecutive bytes. Writing anything bigger would be a problem. In particular graphics take up an awful lot of space.

FAT

But stepping back a moment. The data we want is already stored in the filesystem. The filesystem is in flash. Flash is memory-mapped. Furthermore, the filesystem is FAT, which is primitive and doesn't have any annoying gotchas such as compression. So the data we want should be there for the taking.

The FAT implementation in micropython is a generic device-independent filesystem. It can use filesystems stored anywhere (e.g. on an SD card), for any device for which a "get block" and optionally "put block" function are provided. It assumes that block reads and writes are slow and will cache blocks in RAM when it thinks this is useful. For the case of our flash filesystem, this means a lot of unnecessary block copies and memory usage. There is of course no mmap() function, and it's not even clear how the filesystem could usefully provide one with the existing block device API.

But maybe we can make our own.

Thus the following idea. Open a file and read its contents. Loop through the entire flash (it's only a megabyte) looking for where it's stored. The cluster size is 512 bytes, so files will start on a 512-byte boundary.

def mmap(file): buf=bytearray(open(file).read()) for ptr in range(0x8000000,0x8100000,512): ## it'd be so great if this worked #if stm.mem8[ptr:ptr+len(buf)]==buf: # return ptr for i in range(len(buf)): if stm.mem8[ptr+i]!=buf[i]: break else: return ptr

Because of the previously-mentioned limitations of the stm.mem* functions we have to iterate by hand rather than using slices, which is hideously slow. But we can of course write a native version1:

def mmap(file): @micropython.asm_thumb def findblk(r0,r1,r2): data(2,0xf500,0x7000,0xf1b0,0x6f01,0xd208,0x2300,0x58c4,0x58cd) data(2,0x42ac,0xd1f5,0x3304,0x4293,0xd1f8,0xe000,0x2000) buf=open(file).read() return findblk(1<<27,buf,len(buf))

There are complications. Firstly, filesystems suffer fragmentation just as the heap does, so files may not be contiguous either. Secondly, reading the entire file contents takes up the same amount of RAM, temporarily, as the file size. The second problem could be solved by only reading and comparing small chunks of the file at a time, but then either we require that the first chunk of the file uniquely identifies it, or we're willing to backtrack (possibly multiple times!) if we found a false positive early on in the search. Also possible is precomputing a hash for the file rather than actually reading it in (we wouldn't need to bother with filenames at all!), but then we'd need to pick an algorithm, and implement it, and hope there are no collisions... Messy, all told.

The first is the biggie though, and there's no easy solution. But we can mitigate both problems simultaneously by simply having small files, but lots of them—say, one per sprite. This turns out to work beautifully in practice.

Is it hacky? Why yes. But this is a microcontroller. If you're not prepared to get your hands dirty you're doing it wrong!

Next time: let's put mmap to good use by writing a game.


[1] There are a couple of gotchas with this code. Word comparisons are used, so ensure your files are a multiple of four bytes long. It takes an address to start searching at, but the first block is skipped. Here's the actual code encoded in the hex constants:

0: f500 7000 add.w r0, r0, #512 ; 0x200 4: f1b0 6f01 cmp.w r0, #135266304 ; 0x8100000 8: d208 bcs.n 0x1c a: 2300 movs r3, #0 c: 58c4 ldr r4, [r0, r3] e: 58cd ldr r5, [r1, r3] 10: 42ac cmp r4, r5 12: d1f5 bne.n 0x0 14: 3304 adds r3, #4 16: 4293 cmp r3, r2 18: d1f8 bne.n 0xc 1a: e000 b.n 0x1e 1c: 2000 movs r0, #0

Leave a comment

Comments