Browser Security Handbook, part 2.5 - 10 February 2010

Simultaneous connection limits

For performance reasons, most browsers regularly issue requests simultaneously, by opening multiple TCP connections. To prevent overloading servers with excessive traffic, and to minimize the risk of abuse, the number of connections to the same target is usually capped. The following table captures these limits, as well as default read timeouts for network resources:

Test description	MSIE6	MSIE7	MSIE8	FF2	FF3	Safari	Opera	Chrome	Android
Maximum number of same-origin connections	4	4	6	2	6	4	4	6	4
Network read timeout	5 min	5 min	2 min	5 min	10 min	1 min	5 min	5 min	2 min
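
The cap described above can be pictured with a small simulation. This is only a sketch under stated assumptions, not how any listed browser implements it: the `PerHostLimiter` class, the `run_simulation` helper, and the use of `asyncio.sleep` as a stand-in for a network transfer are all hypothetical.

```python
import asyncio
from collections import defaultdict


class PerHostLimiter:
    """Hypothetical model of a per-host connection cap."""

    def __init__(self, max_per_host):
        self._max = max_per_host
        self._slots = {}                 # host -> asyncio.Semaphore
        self.active = defaultdict(int)   # requests currently in flight, per host
        self.peak = defaultdict(int)     # highest in-flight count seen, per host

    async def fetch(self, host, delay=0.01):
        # One semaphore per host; created lazily inside the running event loop.
        sem = self._slots.setdefault(host, asyncio.Semaphore(self._max))
        async with sem:
            self.active[host] += 1
            self.peak[host] = max(self.peak[host], self.active[host])
            await asyncio.sleep(delay)   # stand-in for the actual transfer
            self.active[host] -= 1


async def _simulate(limit, requests):
    limiter = PerHostLimiter(limit)
    await asyncio.gather(*(limiter.fetch("example.com") for _ in range(requests)))
    return limiter.peak["example.com"]


def run_simulation(limit, requests):
    """Return the peak number of simultaneous 'connections' to one host."""
    return asyncio.run(_simulate(limit, requests))
```

With a limit of 6 (the MSIE8, FF3, and Chrome column values), queuing 20 requests to one host never puts more than 6 in flight at once; the rest wait for a free slot.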

Third-party cookie rules

Another interesting security control built on top of the existing mechanisms is the concept of restricting third-party cookies. For the privacy reasons noted earlier, there appeared to be a demand for a seemingly simple improvement: preventing any domain other than the top-level one displayed in the URL bar from setting cookies while a page is being visited. This was to prevent third-party content (advertisements, etc.) included via <IMG>, <IFRAME>, <SCRIPT>, and similar tags, from setting tracking cookies that could identify the user across unrelated sites relying on the same ad technology.

A setting to disable third-party cookies is available in many browsers, and in several of them, the option is enabled by default. Microsoft Internet Explorer is a particularly interesting case: it rejects third-party cookies with the default "automatic" setting, and refuses to send existing, persistent ones to third-party content ("leashing"), but permits sites to override this behavior by declaring a proper, user-friendly intent through compact P3P privacy policy headers (a mechanism discussed in more detail here and here). If a site specifies a privacy policy and the policy implies that personally identifiable information is not collected (e.g., P3P: CP=NOI NID NOR), with default security settings, session cookies are permitted to go through regardless of third-party cookie security settings.

The purpose of this design is to force legitimate businesses to make a (hopefully) binding legal statement through this mechanism, so that violations could be prosecuted. Sadly, the approach has the unfortunate property of being a legislative solution to a technical problem, placing potential liability on site owners who often simply copy-and-paste P3P header examples from the web without understanding their intended meaning; the mechanism also does nothing to stop shady web sites from making arbitrary claims in these HTTP headers and betting on the mechanism never being tested in court - or even simply disavowing any responsibility for untrue, self-contradictory, or nonsensical P3P policies.

The question of what constitutes a "first-party" domain introduces yet another, incompatible same-origin check, called minimal domains. The idea is that www1.eu.example.com and www2.us.example.com should be considered first-party to each other, which is not the case for the same-origin logic used elsewhere. Unfortunately, these implementations are generally even more buggy than cookies for country-code TLDs: for example, in Safari, test1.example.cc and test2.example.cc are not the same minimal domain, while in Internet Explorer, domain1.waw.pl and domain2.waw.pl are.

Although third-party cookie restrictions are not sufficient to prevent cross-domain user tracking, they prove rather effective at disrupting, or weakening the security of, some legitimate web site features, most notably certain web gadgets and authentication mechanisms.

Test description	MSIE6	MSIE7	MSIE8	FF2	FF3	Safari	Opera	Chrome	Android
Are restrictions on third-party cookies on in default config?	YES	YES	YES	NO	NO	YES	NO	NO	NO
Option to change third-party cookie handling?	YES	YES	YES	NO	YES	YES	persistent only	YES	NO
Is P3P policy override supported?	YES	YES	YES	n/a	NO	NO	n/a	NO	n/a
Does interaction with the IFRAME override cookie blocking?	NO	NO	NO	n/a	NO	YES^*	n/a	NO	n/a
Are third-party cookies permitted within same domain?	YES	YES	YES	n/a	YES	YES	n/a	YES	n/a
Behavior of minimal domains in ccTLDs (3 tests)	1/3 FAIL	1/3 FAIL	3/3 PASS	n/a	3/3 PASS	1/3 FAIL	n/a	3/3 PASS	n/a

^* This includes script-initiated form submissions.

Content handling mechanisms

The task of detecting and handling various file types and encoding schemes is one of the most hairy and broken mechanisms in modern web browsers. This situation stems from the fact that for a long while, virtually all browser vendors were trying both to ensure backward compatibility with HTTP/0.9 servers (the protocol included absolutely no metadata describing any of the content returned to clients), and to compensate for incorrectly configured HTTP/1.x servers that would return HTML documents with nonsensical Content-Type values, or unspecified character sets. In fact, having as many content detection hacks as possible would be perceived as a competitive advantage: the user would not care whose fault it was if example.com rendered correctly in Internet Explorer, but did not open in Netscape - Internet Explorer would be the winner.

As a result, each browser accumulated a unique and very poorly documented set of obscure content sniffing quirks that - because of no pressure on site owners to correct the underlying configuration errors - are now required to keep compatibility with existing content, or at least appear to be risky to remove or tamper with.

Unfortunately, all these design decisions preceded the arrival of complex and sensitive web applications that would host user content - be it baby photos or videos, rich documents, source code files, or even binary blobs of unknown structure (mail attachments). Because of the limitations of same-origin policies, these very applications would critically depend on having the ability to reliably and accurately instruct the browser on how to handle such data, without ever being second-guessed and having what was meant to be an image rendered as HTML - and no mechanism to ensure this would be available.

This section includes a quick survey of key file handling properties and implementation differences seen on the market today.

Survey of content sniffing behaviors

The first and only method for web servers to clearly indicate the purpose of a particular hosted resource is through the Content-Type response header. This header should contain a standard MIME specification of document type - such as image/jpeg or text/html - along with some optional information, such as the character set. In theory, this is a simple and bullet-proof mechanism. In practice, not very much so.

The first problem is that - as noted on several occasions already - when loading many types of sub-resources, most notably for <OBJECT>, <EMBED>, <APPLET>, <SCRIPT>, <IMG>, <LINK REL="...">, or <BGSOUND> tags, as well as when requesting some plugin-specific, security-relevant data, the recipient would flat out ignore any values in Content-Type and Content-Disposition headers (or, amusingly, even HTTP status codes). Instead, the mechanism typically employed to interpret the data is as follows:

General class of the loaded sub-resource is derived from tag type. For example, <IMG> narrows the options down to a handful of internally supported image formats; and <EMBED> permits only non-native plugins to be invoked. Depending on tag type, a different code path is typically taken, and so it is impossible for <IMG> to load a Flash game, or <EMBED> to display a JPEG image.

The exact type of the resource is then decided based on MIME type hints provided in the markup, if supported in this particular case. For example, <EMBED> permits a TYPE= parameter to be specified to identify the exact plugin to which the data should be routed. Some tags, such as <IMG>, offer no provisions to provide any hints as to the exact image format used, however.

Any remaining ambiguity is then resolved in an implementation- and case-specific manner. For example, if TYPE= parameter is missing on <EMBED>, server-returned Content-Type may be finally examined and compared with the types registered by known plugins. On the other hand, on <IMG>, the distinction between JPEG and GIF would be made solely by inspecting the returned payload, rather than interpreting HTTP headers.
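
The three steps above can be sketched as follows. This is a simplified model under stated assumptions, not any browser's actual code: `resolve_subresource`, the `PLUGIN_TYPES` registry, and the handler name `flash-plugin` are hypothetical, only <IMG> and <EMBED> are modeled, and a TYPE= value that is present is assumed to be used on its own.

```python
# Well-known leading byte signatures for three image formats.
IMAGE_SIGNATURES = {
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
    b"\x89PNG\r\n\x1a\n": "image/png",
}

# Hypothetical plugin registry: MIME type -> handler name.
PLUGIN_TYPES = {"application/x-shockwave-flash": "flash-plugin"}


def resolve_subresource(tag, type_attr, content_type, payload):
    """Pick a handler for a sub-resource; returns None if nothing matches."""
    tag = tag.upper()
    if tag == "IMG":
        # Step 1: the tag restricts us to images. Step 2: <IMG> takes no type
        # hint. Step 3: only the payload is inspected; headers are ignored.
        for signature, mime in IMAGE_SIGNATURES.items():
            if payload.startswith(signature):
                return mime
        return None
    if tag == "EMBED":
        # Step 2: a TYPE= hint in the markup selects the plugin directly.
        if type_attr:
            return PLUGIN_TYPES.get(type_attr.strip().lower())
        # Step 3: only without TYPE= is the server's Content-Type consulted.
        if content_type:
            mime = content_type.split(";", 1)[0].strip().lower()
            return PLUGIN_TYPES.get(mime)
        return None
    raise ValueError("tag not modeled in this sketch: " + tag)
```

Note how, in this model, a response served as text/html but loaded through <IMG> is still treated as a GIF if it starts with a GIF signature, and how a third-party page can route any response to a plugin simply by setting TYPE= on <EMBED>.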

This mechanism makes it impossible for any server to opt out from having its responses passed to a variety of unexpected client-side interpreters, if any third-party page decides to do so. In many cases, misrouting the data in this manner is harmless - for example, while it is possible to construct a quasi-valid HTML document that also passes off as an image, and then load it via <IMG> tag, there is little or no security risk in allowing such a behavior. Some specific scenarios pose a major hazard, however: one such example is the infamous GIFAR flaw, where well-formed, user-supplied images could be also interpreted as Java code, and executed in the security context of the serving party.

The other problem is that although Content-Type is generally honored for any top-level content displayed in browser windows or within <IFRAME> tags, browsers are prone to second-guessing the intent of a serving party, based on factors that could be easily triggered by the attacker. Whenever any user-controlled file that was never meant to be interpreted as HTML is nevertheless displayed this way, an obvious security risk arises: any JavaScript embedded therein would execute in the security context of the hosting domain.

The exact logic implemented here is usually contrived and poorly documented - but based on our current knowledge, it could be generalized as:

If HTTP Content-Type header (or other origin-provided MIME type information) is available and parses cleanly, it is used as a starting point for further analysis. The syntax for Content-Type values is only vaguely outlined in RFC 2045, but generally the value should match a regex of "[a-z0-9\-]+/[a-z0-9\-]+" to work properly. Note that protocols such as javascript:, file://, or ftp:// do not carry any associated MIME type information, and hence will not satisfy this requirement. Among other things, this property causes the behavior of downloaded files to be potentially very different from that of the same files served over HTTP.

If Content-Type data is not available or did not parse, most browsers would try to guess how to handle the document, based on implementation- and case-specific procedures, such as scanning the first few hundred bytes of a resource, or examining the apparent file extension at the end of the URL path (or in query parameters), then matching it against a system-wide list (/etc/mailcap, Windows registry, etc.), or a built-in set of rules. Note that due to mechanisms such as PATH_INFO, mod_rewrite, and other server and application design decisions, the apparent path - used as a content sniffing signal - may often contain bogus, attacker-controlled segments.

If Content-Type matches one of generic values, such as application/octet-stream, application/unknown, or even text/plain, many browsers treat this as a permission to second-guess the value based on the aforementioned signals, and try to come up with something more specific. The rationale for this step is that some badly configured web servers fall back to these types on all returned content.

If Content-Type is valid but not recognized - for example, not handled by the browser internally, not registered by any plugins, and not seen in system registry - some browsers may again attempt to second-guess how to handle the resource, based on a more conservative set of rules.

For certain Content-Type values, browser-specific quirks may also kick in. For example, Microsoft Internet Explorer 6 would try to detect HTML on any image/png responses, even if a valid PNG signature is found (this was recently fixed).

At this point, the content is either routed to the appropriate renderer, or triggers an open / download prompt if no method to internally handle the data could be located. If the appropriate parser does not recognize the payload, or detects errors, it may cause the browser to revert to last-resort content sniffing, however.

An important caveat is that if Content-Type indicates any of the XML document varieties, the content may be routed to a general XML parser and interpreted in a manner inconsistent with the apparent Content-Type intent. For example, image/svg+xml may be rendered as XHTML, depending on top-level or nested XML namespace definitions, despite the fact that Content-Type clearly states a different purpose.
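
The generalized flow above can be sketched in code. This is a composite of the listed steps under stated assumptions, not the logic of any specific browser: the `KNOWN_TYPES` set, the HTML tokens searched for, the 1024-byte window, and all function names are hypothetical, and extension matching, per-type quirks, and the XML caveat are left out.

```python
import re

MIME_RE = re.compile(r"[a-z0-9\-]+/[a-z0-9\-]+")  # the pattern quoted in the text
GENERIC_TYPES = {"application/octet-stream", "application/unknown", "text/plain"}
# Hypothetical set of types the imaginary browser can render itself.
KNOWN_TYPES = {"text/html", "text/plain", "image/gif", "image/jpeg", "image/png"}
# Hypothetical markers; real sniffers use longer, browser-specific lists.
HTML_TOKENS = (b"<html", b"<body", b"<script", b"<!doctype html")


def parse_content_type(header):
    """Return the bare MIME type, or None if absent or not cleanly parseable."""
    if not header:
        return None
    mime = header.split(";", 1)[0].strip().lower()
    return mime if MIME_RE.fullmatch(mime) else None


def looks_like_html(payload, window=1024):
    """Crude payload check over the first `window` bytes."""
    head = payload[:window].lower()
    return any(token in head for token in HTML_TOKENS)


def choose_handling(header, payload):
    """Return the type the content is rendered as, or 'download'."""
    mime = parse_content_type(header)
    if mime is None:
        # No usable Content-Type: guess from the payload.
        return "text/html" if looks_like_html(payload) else "download"
    if mime in GENERIC_TYPES:
        # Generic type: treated as permission to second-guess the server.
        return "text/html" if looks_like_html(payload) else mime
    if mime not in KNOWN_TYPES:
        # Valid but unrecognized: this sketch takes the conservative branch.
        return "download"
    return mime
```

In this model, `choose_handling("text/plain", b"<script>alert(1)</script>")` yields `"text/html"`, which is exactly the hazard for sites that serve user-supplied text files.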

As is probably apparent by now, not much thought or standardization was given to browser behavior in these areas previously. To further complicate matters, the documentation available on the web is often outdated, incomplete, or inaccurate (Firefox docs are an example). Following widespread complaints, current HTML 5 drafts attempt to take at least some content handling considerations into account - although these rules are far from being comprehensive. Likewise, some improvements to specific browser implementations are being gradually introduced (e.g., image/* behavior changes), while others were resisted (e.g., fixing text/plain logic).

Some of the interesting corner cases of content sniffing behavior are captured below:

Test description	MSIE6	MSIE7	MSIE8	FF2	FF3	Safari	Opera	Chrome	Android
Is HTML sniffed when no `Content-Type` received?	YES	YES	YES	YES	YES	YES	YES	YES	YES
Content sniffing buffer size when no `Content-Type` seen	256 B	∞	∞	1 kB	1 kB	1 kB	~130 kB	1 kB	∞
Is HTML sniffed when a non-parseable `Content-Type` value received?	NO	NO	NO	YES	YES	NO	YES	YES	YES
Is HTML sniffed on `application/octet-stream` documents?	YES	YES	YES	NO	NO	YES	YES	NO	NO
Is HTML sniffed on `application/binary` documents?	NO	NO	NO	NO	NO	NO	NO	NO	NO
Is HTML sniffed on `unknown/unknown` documents?	NO	NO	NO	NO	NO	NO	NO	YES	NO
Is HTML sniffed on MIME types not known to browser?	NO	NO	NO	NO	NO	NO	NO	NO	NO
Is HTML sniffed on unknown MIME when `.html`, `.xml`, or `.txt` seen in URL parameters?	YES	NO	NO	NO	NO	NO	NO	NO	NO
Is HTML sniffed on unknown MIME when `.html`, `.xml`, or `.txt` seen in URL path?	YES	YES	YES	NO	NO	NO	NO	NO	NO
Is HTML sniffed on `text/plain` documents (with or without file extension in URL)?	YES	YES	YES	NO	NO	YES	NO	NO	NO
Is HTML sniffed on GIF served as `image/jpeg`?	YES	YES	NO	NO	NO	NO	NO	NO	NO
Is HTML sniffed on corrupted images?	YES	YES	NO	NO	NO	NO	NO	NO	NO
Content sniffing buffer size for second-guessing MIME type	256 B	256 B	256 B	n/a	n/a	∞	n/a	n/a	n/a
May `image/svg+xml` document contain HTML `xmlns` payload?	(YES)	(YES)	(YES)	YES	YES	YES	YES	YES	(YES)
HTTP error codes ignored on sub-resources?	YES	YES	YES	YES	YES	YES	YES	YES	YES

In addition, the behavior for non-HTML resources is as follows (to test for these, please put sample HTML inside two files with extensions of .TXT and .UNKNOWN, then attempt to access them through the browser):

Test description	MSIE6	MSIE7	MSIE8	FF2	FF3	Safari	Opera	Chrome	Android
File type detection for `ftp://` resources	content sniffing	content sniffing	content sniffing	content sniffing	content sniffing	content sniffing	extension matching	content sniffing	n/a
File type detection for `file://` resources	content sniffing	sniffing w/o HTML	sniffing w/o HTML	content sniffing	content sniffing	content sniffing	content sniffing	extension matching	n/a

Microsoft Internet Explorer 8 gives an option to override some of its quirky content sniffing logic with a new X-Content-Type-Options: nosniff option (reference). Unfortunately, the feature is somewhat counterintuitive, disabling not only dangerous sniffing scenarios, but also some of the image-related logic; and has no effect on plugin-handled data.
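
As a rough illustration, a server hosting user-supplied files might attach the header along these lines. This is a sketch only: `headers_for_user_file` is a hypothetical helper, and pairing the option with `Content-Disposition: attachment` is an added assumption (a common extra precaution), not something the text above prescribes.

```python
def headers_for_user_file(mime, filename):
    """Build response headers for a user-supplied file (hypothetical helper)."""
    # Keep only a conservative ASCII subset so the quoted filename stays valid.
    safe_name = "".join(
        c for c in filename if c.isascii() and (c.isalnum() or c in "._-")
    ) or "download"
    return {
        "Content-Type": mime,
        # Asks MSIE8 to skip some of its content sniffing logic.
        "X-Content-Type-Options": "nosniff",
        # Assumption: offer the file for download rather than inline display.
        "Content-Disposition": 'attachment; filename="%s"' % safe_name,
    }
```

As noted above, plugin-handled data is not affected by the option, so headers like these do not help against sub-resource misrouting.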

An interesting study of content sniffing signatures is given on this page.

Added by: b1zz4rd