The Tangled Web


Michal Zalewski's The Tangled Web is an excellent introduction to web application security. The book is structured in three parts. The first part explains how the web works. I thought I knew this stuff, but I was surprised at how many seemingly harmless quirks gain security significance in the hands of an imaginative attacker. For example, paypal.com and paypaI.com look the same, but are they really? (The second one is spelled with a capital I in place of the lowercase l. And by the way, did you know there is a Unicode character called "cat face with tears of joy"?)

Part two dives into the various security features that browsers implement. This includes a comprehensive discussion of the same-origin policy, cookies, frames, content types, and browser plug-ins like Flash. The big picture is rather depressing, as so many complex components inevitably interact in unexpected ways, leading to vulnerabilities. When a new security hole is discovered, it often has to be fixed using a messy workaround, because a drastic security-minded redesign would break backwards compatibility with too many websites. On the bright side, each chapter ends with a "cheat sheet" of practical advice, which makes the web-security landscape a little less daunting.

Part three ends the book with a brief look at upcoming browser features and their security implications. Then there is a two-page epilogue which I found deeply intriguing. Here is an excerpt, where the emphasis is my own:

I am haunted by the uncomfortable observation that in real life, modern societies are built on remarkably shaky ground. Every day, each of us depends on the sanity, moral standards, and restraint of thousands of random strangers--from cab drivers, to food vendors, to elevator repair techs. [...] In this sense, our world is little more than an incredibly elaborate honor system that most of us voluntarily agree to participate in. [...] A degree of trust is simply essential to advancing our civilization at a reasonable pace. [...] What if our pursuit of perfection in the world of information security stems from nothing but a fundamental misunderstanding of how human communities can emerge and flourish? The experts of my kind preach a model of networked existence based on complete distrust, but perhaps wrongly so: As the complexity of our online interactions approaches that of real life, the odds of designing perfectly secure software are rapidly diminishing. Meanwhile, the extreme paranoia begins to take a heavy toll on how quickly we can progress.

Michal Zalewski in The Tangled Web

The rest of this post lists some tidbits from the book that I found particularly interesting.

  • Tim Berners-Lee wanted to create a semantic web, where chunks of usable information would be enclosed in machine-readable tags, such as <cite>, <code>, and <address>. Most developers don't care about this, and they just use <span> and <div> for everything. As pages become increasingly dynamic and as HTML is reduced to the role of a canvas for CSS and JavaScript, the vision of the semantic web may become irrelevant.

  • Do not try to sanitize an untrusted document in place. Attackers will always find ways around your safety checks. Instead, parse the document into some in-memory representation (such as a document tree), sanitize that representation, and then serialize it back into a clean and valid document. In the same vein, always use whitelists, because if you use a blacklist, someone will figure out a way around it.

  • Did you ever wonder why Gmail prepends while(1); to all its JSON responses? This is done to protect against a clever way of stealing data by including Gmail's JSON on some evil site, after redefining the Array or Object constructor. It turns out that even this might be insufficient, since infinite loops can terminate in JavaScript! Remember those dialogs that say "A script on this page has become unresponsive. Do you want to stop the script or let it continue running?" Zalewski recommends )}]'\n as a "reliable parser-busting prefix" instead.

  • When you install a browser extension, you often have to wait a few seconds before the "Install" button becomes clickable. The delay is not there just to annoy you: it's a security feature! Without it, an attacker could get you to install a rogue extension by showing you a harmless button, and then opening the "install extension" dialog a split-second before you click. This is also probably why there's no keyboard shortcut for downloading a PDF in Chrome after the "potentially harmful file" warning has been triggered.
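
To make the parse, sanitize, serialize advice from the list above more concrete, here is a minimal Python sketch built on the standard library's html.parser. It is my own illustration, not code from the book. The whitelist contents and the names ALLOWED_TAGS, ALLOWED_ATTRS, WhitelistSanitizer, and sanitize are assumptions made up for this example, and a real application should rely on a well-tested sanitizer library rather than this toy.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist, chosen only for illustration.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "a", "code"}
ALLOWED_ATTRS = {"a": {"href"}}
SAFE_URL_PREFIXES = ("http://", "https://")
# Elements whose text content is dropped entirely, not just the tag.
DROP_CONTENT = {"script", "style"}


class WhitelistSanitizer(HTMLParser):
    """Parses untrusted HTML and re-emits only whitelisted markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the clean, re-serialized document
        self.open_tags = []  # whitelisted tags that are currently open
        self.skip = 0        # nesting depth inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip += 1
            return
        if tag not in ALLOWED_TAGS:
            return  # unknown tag: drop the tag, keep its text
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            # Links must start with an explicitly allowed scheme.
            if name == "href" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip:
                self.skip -= 1
            return
        if tag in self.open_tags:
            # Close everything opened after this tag so the output nests properly.
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        if not self.skip:
            self.out.append(escape(data, quote=False))

    def result(self):
        self.close()  # flush any buffered text
        while self.open_tags:
            self.out.append(f"</{self.open_tags.pop()}>")
        return "".join(self.out)


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    return parser.result()
```

The output is always rebuilt from scratch out of tags and attributes that were explicitly allowed, so anything the parser did not recognize as whitelisted (event handlers, javascript: links, comments) simply never gets written.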
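
The parser-busting prefix mentioned in the list above can be sketched in a few lines of Python as well. The idea, as I understand it, is that the prefix turns the response into a syntax error when a foreign page loads it as a script, while the legitimate client fetches the response as text and strips the prefix before parsing. The function names wrap_json and unwrap_json are hypothetical; only the prefix itself comes from the book.

```python
import json

# The prefix quoted from the book: )}]' followed by a newline.
PREFIX = ")}]'\n"


def wrap_json(payload):
    """Server side: serialize the payload and prepend the prefix."""
    return PREFIX + json.dumps(payload)


def unwrap_json(body):
    """Client side: insist on the prefix, strip it, then parse the rest."""
    if not body.startswith(PREFIX):
        raise ValueError("response is missing the expected prefix")
    return json.loads(body[len(PREFIX):])
```

Rejecting responses that lack the prefix is a design choice in this sketch: it keeps the client from silently accepting data that came from an endpoint without the protection.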