An Illusory Intertwingling of Reason and Response

Tech: Yes, I’m a geek. I admit it. At least I’m not a nerd!

Tafel :: tech

Friday, August 04, 2006

URL Escaping and GET Requests

Just discovered something: the vital difference between escape() and encodeURI() in JavaScript.

Now, up until recently, I'd always used escape() to finagle text into a form suitable for sending via GET requests. However, I was trying to send some Unicode, and Apache kept throwing "406 Not Acceptable"s at me.

A little digging around the Apache dev lists and bug reports, and it turns out that the "%u####" escapes that escape() produces for Unicode characters in most JavaScript implementations are not in accordance with the W3C standards on the subject. Apache's rejection is therefore considered proper behaviour, and will not be changed.
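To see what escape() actually emits, here is a small sketch that runs in Node.js or a browser console (the sample strings are arbitrary). It uses JavaScript's own decodeURIComponent(), which follows the standard UTF-8 escaping scheme, as a stand-in for a standards-following decoder; it is an analogy for Apache's behaviour, not Apache's code.

```javascript
// Compare escape() with encodeURI() for a few sample strings.
// escape() writes U+0080..U+00FF as a single Latin-1 "%XX" byte and
// anything above U+00FF in the non-standard "%uXXXX" form;
// encodeURI() writes the UTF-8 bytes of each character as "%XX" escapes.
const samples = ["abc", "é", "€"];

for (const s of samples) {
  console.log(`${s}\tescape: ${escape(s)}\tencodeURI: ${encodeURI(s)}`);
}

// A decoder that follows the standard scheme rejects the "%u" form outright.
try {
  decodeURIComponent(escape("€"));
} catch (e) {
  console.log(`decodeURIComponent("%u20AC") threw ${e.name}`);
}
```

Note that even the "%E9" that escape() gives for "é" is not valid UTF-8 escaping, so decodeURIComponent() rejects it as well; encodeURI() gives "%C3%A9" instead.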

Turns out the W3C-kosher way to do it is to encode each non-ASCII character as UTF-8 and write each resulting byte as a normal "%##" escape within the usual ASCII URI escaping scheme. Depending on the character, that is anywhere from two to four escapes: "é" becomes "%C3%A9", "€" becomes "%E2%82%AC". Fortunately, there exists encodeURI(), which does much the same job as escape(), except that it follows the W3C recommendations on encoding Unicode in URIs.
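The UTF-8-then-"%##" scheme can be sketched by hand to show where the two-to-four escapes per character come from. The helper name percentEncodeUtf8() is hypothetical; it relies on the standard TextEncoder (available in modern browsers and Node.js 11+), and for brevity it escapes every byte, including plain ASCII, which encodeURI() and encodeURIComponent() would leave alone.

```javascript
// Hypothetical helper: UTF-8 encode a string, then write every byte as "%HH".
// For non-ASCII characters this matches what encodeURIComponent() produces.
function percentEncodeUtf8(str) {
  const bytes = new TextEncoder().encode(str); // Uint8Array of UTF-8 bytes
  return Array.from(bytes, (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")).join("");
}

// One character from each UTF-8 length class: 2, 3 and 4 bytes.
for (const ch of ["é", "€", "😀"]) {
  const encoded = percentEncodeUtf8(ch);
  const escapes = encoded.length / 3; // each escape is "%" plus two hex digits
  console.log(`${ch}: ${encoded} (${escapes} escapes), encodeURIComponent: ${encodeURIComponent(ch)}`);
}
```

Characters up to U+007F take one byte, U+0080 to U+07FF take two, U+0800 to U+FFFF take three, and anything above that takes four.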

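So remember: if there's even the merest smear of a chance of Unicode characters getting into your GET string, use encodeURI() (or, for an individual parameter name or value, encodeURIComponent(), which also escapes characters such as & and = that encodeURI() leaves untouched), and you'll save yourself many a headache.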
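As a minimal sketch of putting this into practice, here is a hypothetical buildQuery() helper that assembles a GET query string from a plain object; the "/search" path and the parameter names are made up for illustration. Each key and value goes through encodeURIComponent(), and the last line shows what goes wrong if encodeURI() is applied to a value containing "&".

```javascript
// Hypothetical helper: build a GET query string from a plain object.
// encodeURIComponent() escapes non-ASCII text as UTF-8 "%HH" bytes and,
// unlike encodeURI(), also escapes the delimiters & = + ? # inside values.
function buildQuery(params) {
  return Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`)
    .join("&");
}

const params = { q: "café & crème", lang: "fr" };
console.log("/search?" + buildQuery(params));

// encodeURI() leaves the "&" inside the value as-is, so a server would
// see q="café " plus a stray parameter " crème".
console.log("/search?q=" + encodeURI(params.q));
```

The first line prints "/search?q=caf%C3%A9%20%26%20cr%C3%A8me&lang=fr"; the second prints "/search?q=caf%C3%A9%20&%20cr%C3%A8me", where the unescaped "&" splits the value in two.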

And that, ladies and gentlemen, is a Good Thing.