On Google Analytics

Posted by HEx 2014-12-03 at 03:13

On the web, the "traditional" method of visitor tracking is at the origin web server. This is perfectly reliable and trustworthy: it is impossible to visit a site without contacting the origin server1, and no other parties need be involved. Tools such as webalizer perform logfile analysis and produce visitor statistics.2

However, in the past decade a new model of visitor tracking has achieved prominence: third-party tracking via javascript, often with a fallback to transparent 1x1 GIFs ("web bugs"). This model is exemplified by the most widespread tracking tool, Google Analytics.

Google Analytics (GA) is overwhelmingly popular. At the time of writing, approximately half of the web uses GA.3 There is no user benefit to participating in GA tracking, but there are costs both in resource usage and in information leakage. As a consequence, google-analytics.com is also one of the most blacklisted domains on the web.

How blacklisted? Here is a sampling of popular ad-blocking and other privacy-centric browser addons, along with their approximate userbase.

All of the above tools block GA by default. This is not counting other, more niche tools, or the individuals and organisations who block GA manually in their browsers or at their firewalls.

This is not a negligible number of people. Notwithstanding the fact that an awful lot of people seem to not want GA to be a part of their browsing experience, these are all people who will be invisible to GA, and absent in the data it provides. That it is possible (and even desirable) to opt out of third-party tracking undermines the entire concept of third-party tracking.

That Google offers such a deeply flawed service is easy to understand. The question for Google is not whether the data it collects from GA is complete, but whether it gets to collect it at all. Google is the biggest infovore in history: of course websites outsourcing their tracking to Google is a win for Google.

And yet, in part because it is easy to use, GA is incredibly widespread. Given its popularity, GA would perhaps seem to be a win for site operators too. I think not, and here's why: deploying GA sends a message. If a site uses GA, it seems reasonable to conclude the following about the site operator:

  • You care about visitor statistics, but not enough to ensure that they are actually accurate (by doing them yourself).4 Perhaps you don't understand the technology involved, or perhaps you value convenience more than correctness. Neither bodes well.
  • Furthermore, you're willing to violate visitors' privacy by instructing their browsers to inform Google every time they visit your site. Since in return you get only some questionable statistics, it seems visitors' privacy is not important to you.

If you run a website, this is probably not the message you want to be sending.

[1] The increasing use of CDNs to serve high-bandwidth assets such as images and videos does not change the fact that 100% of visits to a modern, dynamic site pass through the origin server(s). Frontend caches such as Varnish merely change the location of logs, not their contents or veracity.

[2] There are very probably more modern server-side tools available nowadays; I've been out of this game for a while now. Certainly CLF leaves a lot to be desired. But since web servers have all the data available to them this is strictly a quality-of-implementation issue.

[3] http://w3techs.com/technologies/overview/traffic_analysis/all claims 49.9%. http://trends.builtwith.com/analytics/Google-Analytics confirms this number, and offers a breakdown by site popularity. Of the most popular 10,000, 100,000 and 1,000,000 sites, the proportion using GA rises from 45% of the top million to nearly 60% of the top ten thousand.

[4] This is an assumption. Deploying GA does not preclude the option of tracking visitors at your web server. But since web server tracking is strictly better than GA, why would anyone bother to use both?

Crystal Maze theme 1

Posted by HEx 2014-02-02 at 14:35

I've been getting my nostalgia on recently by watching The Crystal Maze. My main takeaway—other than that surely sliding block puzzles aren't that hard?—is that the theme tune badly needs remixing. But before I embarked on such a project, I was aware of a computer adaptation to the Archimedes, a platform with which I'm deeply familiar. I'd played the game (well, the demo) back in the day: I even still had a copy, although it had suffered bitrot and would crash on startup. It presumably contained the theme tune. Now I just had to extract it.

It's been many years since I did any Acorn hacking, but this turned out to be remarkably straightforward. The presence of TrackerModule in the game's Modules directory was a dead giveaway. TrackerModule can play precisely three file formats: Soundtracker, Protracker, and its native format, the almost-lost-to-history Archimedes Tracker. Soundtracker doesn't have any magic numbers to speak of, but even in 1993 nobody used such an obsolete format. Happily the others do: in particular, Archimedes Tracker files start with the string "MUSX", and lo, three data files contain that string. Not quite that easy though, as they're embedded in some kind of custom archive format that I'm not about to reverse engineer.1

Instead I turned to the game itself. Will it unpack the tune for me? I found a pristine copy of the demo—pristine enough to get to the title screen before crashing anyway—and listened for the first time in about two decades to its remarkably poor quality rendition of a fragment of the theme tune. Hardly worth ripping, but no point leaving a job half done. It turned out the crashing was actually an asset: the tune would be left in memory, I didn't even have to break out a debugger! The game normally cleaned up after itself, but that was easily remedied.2

So, the game has quit, and TrackerModule is still loaded. *PlayStatus? Address exception. Well, of course, the tune was in application workspace. Retry outside of desktop, *Modules gives me the workspace address, *PlayStatus gives me the length (also "Converted from Amiga" *sigh*), save it out and we're done.

Since nothing can read Archimedes Tracker these days3, since it started off as an Amiga format anyway, and since I just happen, once upon a time, to have written a converter, I converted the tune back to Protracker. So here it is in all its non-glory. The main executable contains the string "Thanks to Mark Vanstone for the tracker music"—so now we know.

I wonder what the other tunes were.

[1] I'm also struck by how much data compression has come along since the early nineties. These days no compressor I can imagine would leave recognizable ASCII strings behind: what a waste of entropy!

[2] RISC OS's command line interpreter (and its predecessor on the BBC micro) is, at least in my experience, unique in using a single vertical bar as a comment character.

[3] Except xmp, of course.