WebDAV and the HTTP PATCH nightmare

Posted by HEx 2012-11-21 at 22:04

HTTP 1.1 defines a way of retrieving part of a file1, namely the Content-Range header (plus Accept-Ranges and the 206 status code). This was widely implemented by web servers and is now ubiquitous, meaning that clients can resume partial downloads and/or seek in remote files (think video) with impunity. HTTP 1.1 also introduced the PUT method to create and update files, which was later used in WebDAV.

Clients might want to do partial updates for much the same reasons they want to perform partial retrieval: because network connections might die part way through big requests.2 The obvious way to perform partial updates is to send a PUT request with a Content-Range header, and indeed Apache supports such requests and behaves as expected.3

This seemed so clear and straightforward a solution to me that, when implementing a WebDAV client, I tried this idea before even reading the spec, was gratified that it worked on my test Apache server, and called it “done”.

Sadly, there's a snag.4 HTTP 1.1 defined PUT as being “idempotent”, which is allegedly incompatible with partial updates. RFC 5789, thirteen years after HTTP 1.1, decided to address this with the HTTP PATCH method.  It says:

“The PUT method is already defined to overwrite a resource with a complete new body, and cannot be reused to do partial changes. Otherwise, proxies and caches, and even clients and servers, may get confused as to the result of the operation.”

OK, let's assume for the minute that this is true, and this RFC is an earnest attempt to solve the problem.

RFC 5789 documents a way of sending “patch” requests to existing files. It allows multiple patch formats, and documents how to enumerate the list of formats, but—and this boggles my mind—does not document, or even hint at, any examples of such formats.5 Indeed, nowhere does such a list exist. Again from the RFC:

“Further, it is expected that different patch document formats will be appropriate for different types of resources and that no single format will be appropriate for all types of resources.  Therefore, there is no single default patch document format that implementations are required to support.”

Why, why would they do this?  It's HTML5 <video> all over again.

Unsurprisingly, given such an underspecified standard (yes, RFC 5789 is Standards Track!), no WebDAV servers have bothered to implement it.

Actually, that's not quite true. SabreDAV is making a valiant attempt. Since there's not enough information in the standard to actually implement,  PartialUpdate defines a SabreDAV-specific patch format with the same functionality that Apache has supported for PUT with Content-Range for the past twelve years, namely updating a simple byte range.

How many clients support this SabreDAV-specific behaviour?  To the best of my (and google's) knowledge: none.

But is PATCH even solving a real problem? RFC 2616 has the following to say about PUT requests:

“The recipient of the entity MUST NOT ignore any Content-* (e.g. Content-Range) headers that it does not understand or implement and MUST return a 501 (Not Implemented) response in such cases.”

To me, this reads as “Content-Range is just fine for PUT requests, but if you don't implement it, make sure you throw an error instead of silently ignoring it”.  For this to actually mean “you MUST return 501 if the request contains a Content-Range that is not the entire file” seems perverse. The possibility exists that the authors of RFC 2616 overlooked something in their implicit approval of Content-Range for PUT requests, but I have my doubts.

As for idempotence: it is true that PUT requests are defined as being idempotent, and thus proxies, caches, clients and servers are allowed to optimize accordingly. But partial PUT requests are idempotent! Writing part of a file more than once has precisely the same effect as writing it once.

Multiple partial PUT requests for different parts of a file may of course not be idempotent (depending on whether the ranges overlap), but this isn't a problem! Multiple differing complete PUT requests are not idempotent either, indeed, HTTP 1.1 explicitly states that “it is possible that a sequence of several requests is non-idempotent, even if all of the methods executed in that sequence are idempotent”.

Where I think the confusion lies is the notion that the state of a file after a (successful) PUT request is completely specified by the request, and does not rely on the previous file contents. Nowhere that I have found has this been claimed, so presumably any software that assumes it to be true can expect nasal demons.

I do not know whether such broken software exists. It might do. Nonetheless, I cannot think of a single case where, given the assumptions in HTTP 1.1, partial PUT requests might cause a problem. Clearly the authors of Apache considered it sufficiently non-problematic to support them. lighttpd supports them too. Whether other WebDAV servers that declined to did so because of fear of misbehaviour or because they considered resumable uploads a corner case too obscure to be worthwhile implementing is anyone's guess (though if it's the latter I will happily denounce them for skimping).

That I can't think of any possibilities for badness doesn't mean they don't exist, but examples would go a long way to making me a believer. Meanwhile I can't help but wonder what the authors of RFC 5789 (which, as a standard, “represents the consensus of the IETF community”) considered so worrying, and why they proposed such a baroque non-solution to a seeming non-problem.

So where do things stand if you want resumable uploads over WebDAV?

  • If your WebDAV server is Apache (or lighttpd!): use PUT with Content-Range, and ignore what RFC 5789 says about this being forbidden.
  • If your WebDAV server is a recent SabreDAV: use PATCH with a SabreDAV-specific Content-Type.
  • If your WebDAV server is anything else (nginx, IIS, ...) you're probably out of luck.

And this is why we still don't have a sane standards-based network filesystem.

[1] For the pedants: yes, the things that an HTTP server serves up needn't be files, and indeed the official terminology is “entity” or “resource”.  But I'm going to call them “files” for simplicity, in deference to the 99% of the time that this is the case.

[2] There are other reasons, of course.  But this is the biggie.

[3] WebDAV code first appeared in Apache's repository in June 2000, with this functionality already present. See here, around line 1120.

[4] You saw that one coming, right?

[5] Well. “The determination of what constitutes a successful PATCH can vary depending on the patch document and the type of resource(s) being modified.  For example, the common 'diff' utility can generate a patch document that applies to multiple files in a directory hierarchy.”—wait, so web servers are supposed to understand diffs? Diffs that apply to multiple files? Talk about scope creep!