As
the initial HTML designs evolved to include a growing body of eye
candy, the language ended up mixing very specific visual presentation
cues with content classification directives; for example, a similar
inline syntax would denote that the following piece of text is an
element of a numbered list, or that it needs to be shown in 72 pt Comic Sans
font. This notation, although convenient to use on new pages, made it
very difficult to automatically adjust appearance of a document without
altering its structure (for example to account for mobile devices,
printable views, or accessibility requirements), or to accurately
preserve document structure when moving the data to non-HTML media or
new site layouts.

Cascading Style Sheets
is a simple concept that corrects this problem, and also provides a
much more uniform and feature-rich set of tools to alter the visual
appearance of any portion of the document, far surpassing the original
set of kludgy HTML tag attributes. A stylesheet outlines visual
rendering rules for various functional types of document elements, such
as lists, tables, links, or quotations, using a separate block of data
with a relatively simple syntax. Although the idea of style sheets as
such predates HTML, the current design used for the web stabilized in
the form of W3C proposals only around 1998 - and because of the complex
changes required to renderers, it took several years for reasonably
robust implementations to become widespread.

There are three distinct ways to place CSS directives in HTML documents:

- The use of an inline STYLE="..." parameter attached to HTML tags of any type; attributes specified this way apply to this and nested tags only (and largely defeat the purpose of the entire scheme when it comes to making it easy to alter the appearance of a document),
- Introduction of a block of CSS code with <STYLE>...</STYLE> in any portion of the document. This block may change the default appearance of any tag, or define named rulesets that may be explicitly applied to specific tags with a CLASS="..." parameter,
- Inclusion of a remote stylesheet with a <LINK REL="stylesheet" HREF="...">, with the same global effect as a <STYLE> block.
In
the first two modes, the stylesheet generally inherits the character
set of its host document, and is interpreted accordingly; in the last
mode, the character set may be derived from Content-Type headers, @charset directives, or auto-detected if all other options fail.

Because
CSS provides a powerful and standardized method to control the visual
appearance of a block of HTML, many applications strive to let third
parties control a subset of CSS syntax emitted in documents.
Unfortunately, the task is fairly tricky; the most important security
consequences of attacker-controlled stylesheets are:

- The risk of JavaScript execution. As a little-known feature, some CSS implementations permit JavaScript code to be embedded in stylesheets. There are at least three ways to achieve this goal: by using the expression(...) directive, which gives the ability to evaluate arbitrary JavaScript statements and use their value as a CSS parameter; by using the url('javascript:...') directive on properties that support it; or by invoking browser-specific features such as the -moz-binding mechanism of Firefox.
- The ability to freely position text. If user-controlled stylesheets are permitted on a page, various powerful CSS positioning directives may be invoked to move text outside the bounds of its current container, and mimic trusted UI elements or approximate them very accurately. Examples of directives usable for this purpose include z-index, margin, padding, bottom, left, position, right, top, and text-indent (many of them with sub-variants, such as margin-left).
- The ability to reuse trusted classes. If user-controlled CLASS="..." attributes are permitted in HTML syntax, the attacker may have luck "borrowing" a class used to render elements of the trusted UI and impersonate them.
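
Given these risks, one conservative way to accept third-party styling is to allowlist individual declarations rather than filter raw CSS. The following is a minimal sketch, not a complete sanitizer: the property set ALLOWED_PROPERTIES, the SAFE_VALUE pattern, and the function name is_declaration_allowed are all hypothetical choices made for illustration. Positioning properties and -moz-binding are rejected simply by not being listed, and values may not contain parentheses or backslashes, so expression(...), url(...), and escaped spellings of them cannot be formed.

```python
import re

# Hypothetical allowlist: purely cosmetic properties that cannot move content
# outside its container. A real application would pick its own set.
ALLOWED_PROPERTIES = {
    "color",
    "background-color",
    "font-weight",
    "font-style",
    "text-decoration",
}

# Values are limited to letters, digits, '#', '%', '.', '-' and spaces.
# Parentheses, quotes, colons, and backslashes are deliberately excluded.
SAFE_VALUE = re.compile(r"[A-Za-z0-9#%.\- ]{1,64}")


def is_declaration_allowed(prop: str, value: str) -> bool:
    """Return True only for an allowlisted property with a plain value."""
    if prop.strip().lower() not in ALLOWED_PROPERTIES:
        return False
    return SAFE_VALUE.fullmatch(value.strip()) is not None
```

With these assumptions, is_declaration_allowed("color", "red") is accepted, while "color: expression(alert(1))" and "position: absolute" are both rejected.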
Much like JavaScript, stylesheets are also tricky to sanitize, because they follow CDATA-style parsing: a literal sequence of </STYLE> ends the stylesheet regardless of its location within the CSS syntax as such. For example, the following stylesheet would be ended prematurely, and lead to JavaScript code being executed:

    <STYLE>
    body {
      background-image: url('http://example.com/foo.jpg?</STYLE><SCRIPT>alert(1)</SCRIPT>');
    }
    </STYLE>

Another interesting property specific to <STYLE> handling is that when two CDATA-like types overlap, no particular outcome is guaranteed; for example, the following may be interpreted as a script, or as an empty stylesheet, depending on the browser:

    <STYLE>
    <!-- </STYLE><SCRIPT>alert(1)</SCRIPT> -->
    </STYLE>

Yet another characteristic that sets stylesheets apart from JavaScript is the fact that although the CSS parser is very strict, a syntax error does not cause it to bail out completely. Instead, recovery from the next top-level syntax element is attempted. This makes handling user-controlled strings even harder - for example, if a stray newline in a user-supplied string is not properly escaped, it would actually permit the attacker to freely tinker with the stylesheet in most browsers:

    <STYLE>
    .example {
      content: '*** USER STRING START ***
      } .example { color: red; } .bar { *** USER STRING END ***';
    }
    </STYLE>
    <SPAN CLASS="example">Hello world (in red)!</SPAN>

Additional examples along these lines are explored in more detail on this page.

Several fundamental differences in style parsing between common browsers are outlined below:

Test description | MSIE6 | MSIE7 | MSIE8 | FF2 | FF3 | Safari | Opera | Chrome | Android |
Is JavaScript expression(...) supported? | YES | YES | YES | NO | NO | NO | NO | NO | NO |
Is script-targeted url(...) supported? | YES | NO | NO | NO | NO | NO | NO | NO | NO |
Is script-executing -moz-binding supported? | NO | NO | NO | YES | NO | NO | NO | NO | NO |
Does </STYLE> take precedence over comment block parsing? | NO | NO | NO | YES | YES | NO | NO | NO | NO |
Characters permitted as CSS field-value separators (excluding \t \r \n \x20) | \x0B \x0C \ \xA0 | \x0B \x0C \ \xA0 | \x0B \x0C \ \xA0 | \x0B \x0C \ | \x0C \ | \x0C \ | \x0C \ \xA0 | \x0C \ \xA0 | \x0C \ |
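
Both the premature </STYLE> termination and the stray-newline problem arise when raw user characters reach the stylesheet. The following is a minimal sketch of one mitigation, assuming the value is destined for a quoted CSS string: every character other than an ASCII letter or digit is encoded with the backslash-plus-hexadecimal escape that CSS supports, always padded to six digits so that the escape cannot absorb a following hexadecimal character. The function name css_escape_string is hypothetical, and the sketch does not cover values placed outside quoted strings.

```python
def css_escape_string(value: str) -> str:
    """Encode a user string for use inside a quoted CSS string literal.

    ASCII letters and digits pass through unchanged; everything else,
    including quotes, angle brackets, backslashes, and newlines, becomes
    a six-digit CSS hexadecimal escape such as \\00000a.
    """
    out = []
    for ch in value:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            # Six digits is the maximum escape length, so the character
            # that follows is never read as part of the escape.
            out.append("\\%06x" % ord(ch))
    return "".join(out)
```

A caller would then emit something like "content: '" + css_escape_string(user_value) + "';", so that neither a literal </STYLE> sequence nor a raw newline can appear in the output.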
In many cases, as with JavaScript, there is a need for web applications to render certain user-controlled strings within stylesheets in a safe manner. To handle various reserved characters, a method for escaping potentially troublesome values is required; confusingly, however, the CSS format supports neither HTML entity encoding, nor any of the common methods of encoding characters seen in JavaScript. Instead, a rather unusual and incompatible scheme consisting of \ followed by a non-prefixed, variable-length, one- to six-digit hexadecimal value is employed; for example, "test" may be encoded as "t\65st" or "t\000065st" - but not as "t\est", "t\x65st", "t\u0065st", nor "t&#x65;st".

A very important and little-known oddity unique to CSS parsing is that escape sequences are also accepted outside strings, and confusingly, may substitute for some syntax control characters in the stylesheet; so for example, color: red and color: \072\065\064 have the same meaning - and so do color: expression(alert(1)) and color: expression\028\061lert\028\031\029\029. To add insult to injury, syntax such as color: \027red' or color: 'red\027 would not work, but color: \027red\027 is OK. With the exception of Internet Explorer, stray multi-line string literals are not supported; but a lone \ at the end of a line may be used to seamlessly break long lines.

HTML aside, modern browser renderers usually natively support an additional set of media formats that may be displayed as standalone documents.
These can be generally divided into two groups:

- Rich data formats. This category is primarily populated by non-HTML XML namespace parsers (SVG, RSS, Atom); beyond raw data, these document formats contain various rendering instructions, hints, or conditionals. Because of how XML works, each of the XML-based formats has two important security consequences:
  - Firstly, nested XML namespaces may be defined, and are usually not verified against MIME type intents, permitting HTML to be embedded for example inside image/svg+xml.
  - Secondly, these formats may actually come with built-in provisions for non-standard embedded HTML or JavaScript payloads or scripts, permitting HTML injection even if the attacker has no direct control over XML document structure.
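
To make these two consequences more concrete, a site that accepts XML-based uploads could walk the parsed tree and flag anything that may carry HTML or script. The sketch below is illustrative only and rests on several assumptions: the function names are hypothetical, the checks (XHTML-namespace elements, any element named script, any attribute starting with "on") are a heuristic rather than a complete list, and Python's standard XML parser is not hardened against maliciously constructed input, so a real deployment would need a safer parser.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def split_name(name):
    """Split ElementTree's '{namespace}local' form into (namespace, local)."""
    if name.startswith("{"):
        namespace, _, local = name[1:].partition("}")
        return namespace, local
    return "", name


def find_script_capable_nodes(xml_text):
    """Return the tags of elements that could carry HTML or script payloads."""
    suspicious = []
    for element in ET.fromstring(xml_text).iter():
        namespace, local = split_name(element.tag)
        # Event handler attributes such as onload or onclick.
        has_handler = any(
            split_name(attr)[1].lower().startswith("on")
            for attr in element.attrib
        )
        if namespace == XHTML_NS or local.lower() == "script" or has_handler:
            suspicious.append(element.tag)
    return suspicious
```

An empty result does not prove that a document is harmless; it only means none of these particular patterns were found.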
One example of a document that, even if served as image/svg+xml, would still execute scripts in many current browsers despite the MIME type clearly stating a different intent, is as follows:

    <?xml version="1.0"?>
    <container>
      <svg xmlns="http://www.w3.org/2000/svg">
        [...]
      </svg>
      <html xmlns="http://www.w3.org/1999/xhtml">
        <script>alert('Hello world!')</script>
      </html>
    </container>

Furthermore, SVG natively permits embedded scripts and event handlers; in all browsers that support SVG, these scripts execute when the image is loaded as a top-level document - but are ignored when rendered through <IMG> tags.

Some of the non-HTML builtin document type behaviors are documented below:

Test description | MSIE6 | MSIE7 | MSIE8 | FF2 | FF3 | Safari | Opera | Chrome | Android |
Supported bitmap formats (excluding JPG, GIF, PNG) | BMP ICO WMF | BMP ICO WMF | BMP ICO WMF | BMP ICO TGA* | BMP ICO TGA* | BMP TIF | BMP* | BMP ICO | BMP ICO |
Is generic XML document support present? | YES | YES | YES | YES | YES | YES | YES | YES | YES |
Is RSS feed support present? | NO | YES | YES | YES | YES | YES | YES | NO | NO |
Is ATOM feed support present? | NO | YES | YES | YES | YES | YES | YES | NO | NO |
Does JavaScript execute within feeds? | (YES) | NO | NO | NO | NO | NO | NO | (YES) | (YES) |
Are javascript: or data: URLs permitted in feeds? | n/a | NO | NO | NO | NO | YES | YES | n/a | n/a |
Are CSS specifications permitted in feeds? | n/a | NO | YES | YES | YES | NO | YES | n/a | n/a |
Is SVG image support present? | NO | NO | NO | YES | YES | YES | YES | YES | NO |
May image/svg+xml document contain HTML xmlns payload? | (YES) | (YES) | (YES) | YES | YES | YES | YES | YES | (YES) |

* Format support limited, inconsistent, or broken.

Trivia: curiously, Microsoft's XML-based Vector Markup Language (VML) is not natively supported by the renderer, and rather implemented as a plugin; whereas Scalable Vector Graphics (SVG) is implemented as a core renderer component in all browsers that support it.

Unlike
the lean and well-defined set of natively supported document formats,
the landscape of browser plugins is extremely diverse, hairy, and
evolving very quickly. Most of the common content-rendering plugins are
invoked through the use of <OBJECT> or <EMBED> tags (or <APPLET>, for Java), but other types of integration may obviously take place.

Common document-embeddable plugins can be generally divided into several primary categories:

- Web programming languages. This includes technologies such as Adobe Flash, multi-vendor Java, or Microsoft Silverlight, together present on the vast majority of desktop computers. These languages generally permit extensive scripting, including access to the top-level document and other DOM information, or the ability to send network requests to at least some locations. The security models implemented by these plugins generally diverge from the models of the browser itself in sometimes counterintuitive or underspecified ways (discussed in more detail later on). This property makes such web development technologies a major attack surface by themselves if used legitimately. It also creates a security risk whenever a desire to host user content arises, as extra steps must be taken to make sure the data could never be accidentally interpreted as a valid input to any of the popular plugins.
- Drop-in integration for non-HTML document formats. Some popular document editors or viewers, including Acrobat Reader and Microsoft Office, add the ability to embed their supported document formats as objects on HTML pages, or to view content, full-window, directly within the browser. The interesting property of this mechanism is that many such document formats either feature scripting capabilities, or offer interactive features (e.g., clickable links) that may result in data passed back to the browser in unusual or unexpected ways. For example, methods exist to navigate the browser's window to javascript: URLs from PDF documents served inline - and this code will execute in the security context of the hosting domain.
- Specialized HTML-integrated markup languages. In addition to kludgy drop-in integration for document formats very different from HTML, specialized markup languages such as VRML, VML, or MathML are also supported in some browsers through plugins, and meant to discretely supplement regular HTML documents; these may enable HTML smuggling vectors similar to those of renderer-handled XML formats (see previous section).
- Rich multimedia formats. A variety of plugins brings the ability to play video and audio clips directly within the browser, without opening a new media player window. Plugin-driven browser integration is offered by almost all contemporary media players, including Windows Media Player, QuickTime, RealPlayer, or VLC, again making it a widely available capability. Most such formats bear no immediate security consequences for the hosting party per se, although in practice, this type of integration is plagued by implementation-specific bugs.
- Specialized data manipulation widgets. This includes features such as DHTML ActiveX editing gizmos (2D360201-FFF5-11D1-8D03-00A0C959BC0A). Some such plugins ship with Windows and are marked as safe despite receiving very little security scrutiny.

Trivia: there is reportedly a sizeable and generally underresearched market of plugin-based crypto clients implemented as ActiveX controls in certain Asian countries.
One of the more important security properties of object-embedding tags is that, as with many of their counterparts intended for embedding other non-HTML content, no particular attention is paid to server-supplied Content-Type and Content-Disposition headers, despite the fact that many embedded objects may in fact execute in a security context related to the domain that served them.

The precedence of inputs for deciding how to interpret the content in various browsers is as follows:

Input signal | MSIE6 | MSIE7 | MSIE8 | FF2 | FF3 | Safari | Opera | Chrome | Android |
Tag type and TYPE= / CLASSID= values | #1 | #1 | #1 | #1 | #1 | #1 | #1 | #1 | n/a |
Content-Type value if TYPE= is not recognized | ignored | ignored | ignored | ignored | ignored | ignored | ignored | ignored | n/a |
Content-Type value if TYPE= is missing | #2 | #2 | #2 | #2 | #2 | #2 | #2 | #2 | n/a |
Content sniffing if TYPE= is not recognized | ignored | ignored | ignored | ignored | ignored | ignored | ignored | ignored | n/a |
Content sniffing if TYPE= is missing | ignored | ignored | ignored | ignored | ignored | (#3) | ignored | (#3) | n/a |
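
The table can be read as a small decision procedure. The sketch below is a simplified model of it, not the logic of any actual browser: the function name, its parameters, and the `recognized` set of plugin-handled types are assumptions made for illustration, and the fall-through from an unrecognized Content-Type to sniffing is a guess that the table does not spell out.

```python
def choose_interpretation(type_attr, content_type, sniffed_type, recognized,
                          sniff_when_type_missing=False):
    """Pick the type used to interpret an embedded object, or None.

    type_attr    -- TYPE= / CLASSID= value from the embedding tag, or None
    content_type -- Content-Type header value sent by the server, or None
    sniffed_type -- type guessed from the payload itself, or None
    recognized   -- set of types that some installed plugin handles
    sniff_when_type_missing -- models the "(#3)" entries in the table
    """
    if type_attr is not None:
        # Signal #1: the embedding page decides. If its value is not
        # recognized, neither the header nor sniffing is consulted.
        return type_attr if type_attr in recognized else None
    if content_type in recognized:
        # Signal #2: the server header matters only when TYPE= is absent.
        return content_type
    if sniff_when_type_missing and sniffed_type in recognized:
        return sniffed_type
    return None
```

In this model, a page that specifies a recognized TYPE= always wins over the server's Content-Type header, which is the property the surrounding text describes as problematic.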
The approach to determining how to handle the data with no attention to the intent indicated by the hosting server is a problematic design concept, as it makes it impossible for servers to opt out of having any particular resource treated as a plugin-interpreted document in response to actions of a malicious third-party site. In conjunction with lax syntax parsers within browser plugins, this resulted in a number of long-standing vulnerabilities, such as valid Java archive files (JARs) doubling as valid images; or, more recently, Flash applets doing the same.

Another interesting property worth noting here is that many plugins have their own HTTP stacks and caching systems, which makes them a good way to circumvent the browser's privacy settings; Metasploit maintains an example site illustrating the problem.