Web Security Training

Navigating the web security landscape

Leveraging 20-year-old technology to build more secure Web applications

Ever thought about the security consequences of including JavaScript files from just about anywhere? Or why Cross-Site Scripting attacks are so dangerous? It all comes down to the core security model of the browser, where resources from different origins are separated from each other by the Same-Origin Policy. An understanding of the Same-Origin Policy, the protection it offers and, most importantly, its limits, is crucial for building secure Web applications. In this post, we look into the history of the Same-Origin Policy, and we show how it falls short of protecting Web sites from malicious, third-party code. We conclude with concrete advice on how to integrate third-party code in a more secure way, something you should take to heart.

So, what’s the story behind the Same-Origin Policy? For that, we need to take a trip down memory lane, all the way back to 1995. Going online in 1995 involved a lot of screeching noises from a modem and a lot of waiting for stuff to load, and websites looked vastly different. If you don’t believe me, check out a few blasts from the past. 1995 was also the year that Netscape Navigator 2 was released, back then by far the most popular browser. Surprisingly, this ancient browser from 1995 included myriad features that are still present in modern-day browsers. A few examples are JavaScript support, HTML frames, and plugin support (e.g. Macromedia Shockwave, the predecessor to Flash). Now, 20 years later, we all know that these features have had a major impact on the evolution of the Web, in more ways than you might think at first glance.

A screenshot of Netscape Navigator 2, courtesy of Wikipedia

Naturally, the strongest impact of these features has been on the user experience of the Web. Less obvious is the impact on the security model of the Web as we know it today. Introducing the capability to frame other pages within a page, and to use a scripting language to inspect and modify page contents, created a potential security risk.

Imagine the consequences if there were no restrictions on how different contexts can interact with each other. In such a world, a Web site could easily include any other page in a frame, and use JavaScript to inspect and modify its contents. Definitely not something you want on the Web, with plenty of sensitive pages, such as login forms.

Without the Same-Origin Policy, a page could easily include another page in a frame, and start inspecting its contents. Stealing a username and password would be child’s play. (Disclaimer: no Google pages were harmed making this image)

Fortunately, such scenarios are not possible, because the browser is built to isolate contexts from each other by default. This behavior is known as the Same-Origin Policy (SOP), and has been built into browsers since the first introduction of JavaScript and frames, back in 1995.

Origin-based Isolation

The essence of the SOP is actually very straightforward. Every context in the browser is assigned an origin, which is defined as the triple (scheme, host, port), and is derived from the URL. For example, the origin associated with the page you are currently reading is (https, www.websec.be, 443). The SOP dictates that only contexts that have the same origin are allowed to interact freely. Interactions between contexts that have different origins are very constrained, and are essentially limited to an opt-in message passing mechanism. In practice, this means that the interactions in our illustration from before will be prevented by the SOP. The SOP will not prevent the framing, but it will prevent the top level page from reaching into the frame, and inspecting its contents. A violation of the SOP will cause the browser to throw an error, as shown below. In a nutshell, the SOP ensures that within the browser, nobody from outside your origin can inspect and modify your pages.
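To make the origin triple concrete, here is a minimal sketch in Python of how an origin can be derived from a URL and compared. The helper names origin_of and same_origin are our own, and the sketch only handles http and https URLs; real browsers follow the full origin rules of the HTML specification.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes this sketch handles (an assumption:
# other schemes are simply rejected here).
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> tuple[str, str, int]:
    """Return the (scheme, host, port) triple for an http(s) URL."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS or parts.hostname is None:
        raise ValueError(f"cannot derive an origin from {url!r}")
    # A URL without an explicit port uses the default port of its scheme.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    return (scheme, parts.hostname.lower(), port)


def same_origin(url_a: str, url_b: str) -> bool:
    """Two URLs share an origin only if all three parts of the triple match."""
    return origin_of(url_a) == origin_of(url_b)
```

With these helpers, origin_of("https://www.websec.be/blog") yields ("https", "www.websec.be", 443). A page on http://www.websec.be/ or on https://websec.be/ has a different origin, because the scheme or the host differs.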

Attempts to breach the Same-Origin Policy are met with an error thrown by the browser

That’s exactly what the SOP was all about back in 1995, but a lot has changed since then. As already stated in the beginning of this post, the SOP is still one of the most relevant security policies in the Web’s architecture. The reason that the SOP is still so relevant and so important is that modern browsers associate a lot more than a page and its contents with an origin. Newly introduced client-side storage features, such as Web Storage, Indexed Database and the File API, isolate the stored data per origin. Each origin has its own little data store, and there is no way to access another origin’s data. Similarly, permissions granted by the user to retrieve location information, to enable full-screen mode and to record audio and video are associated with an origin. Any context associated with such an origin can take advantage of these granted permissions, while other origins cannot. Additionally, the ability to make network requests from JavaScript using the XMLHttpRequest object is also constrained by the SOP. Same-origin connections are not subject to any constraints, while cross-origin requests are subject to the rules defined by the Cross-Origin Resource Sharing (https://www.w3.org/TR/cors/) policy.
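On the server side, Cross-Origin Resource Sharing boils down to an opt-in: the response tells the browser which origin is allowed to read it. The following is a minimal sketch in Python, assuming a hypothetical cors_headers helper and a hand-maintained allow-list of origins; it only covers simple requests and leaves out preflight handling and credentials.

```python
from typing import Optional


def cors_headers(request_origin: Optional[str], allowed_origins: set[str]) -> dict[str, str]:
    """Return the response headers that opt a trusted origin in.

    request_origin is the value of the request's Origin header, or None
    when the header is absent. allowed_origins is a hypothetical
    allow-list kept by the application.
    """
    if request_origin in allowed_origins:
        return {
            # Echo the trusted origin back so the browser lets its script read the response.
            "Access-Control-Allow-Origin": request_origin,
            # The response differs per origin, so caches must key on the Origin header.
            "Vary": "Origin",
        }
    # No CORS headers: the browser keeps the response away from the calling script.
    return {}
```

An origin that is not on the allow-list gets no header at all, so the browser falls back to the default and refuses to hand the response to the calling script.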

As you can see, an origin has become a primary security principal within the browser, and is used in access control decisions to various sensitive resources. Fortunately, the SOP is there to prevent unauthorized access from contexts with another origin. However, there is one important caveat: the SOP only applies to interactions between contexts with different origins, and not to interactions within one origin. Translated to HTML lingo, this means that the SOP is enforced on interactions between windows and frames, but not on interactions between an HTML document and its styles or scripts.

Script Madness

When an HTML page includes a script tag, either with inline code or from a JavaScript file, the code is loaded and executed within the context of that page. This means that the script runs within the page’s origin, where it has full access to that origin’s resources, can take full advantage of the permissions assigned to that origin, and has the possibility to send any kind of XMLHttpRequest to the origin’s backend. This is exactly the reason why Cross-Site Scripting (XSS) is so dangerous: an attacker able to include malicious scripts into your origin gets full access, without restrictions.

Using the script mechanism in this way was perfectly legitimate in 1995, since Web sites only relied on their own script files to enable dynamic behavior. However, in the past 10 years, the way we develop Web applications has changed drastically. With the introduction of jQuery in 2006, and plenty of other libraries since, we started integrating third-party libraries into our applications. A modern Web application today depends on dozens of external libraries, which load millions of lines of code, all directly into your origin. One might wonder whether that is still a good idea…

While these dependencies introduce a certain security risk, one could argue that a similar risk can be found in other development environments, such as a Maven project that includes various libraries. This argument actually makes sense for essential libraries that belong to the core of your application, especially because we have long passed the point where you could do it all yourself, without relying on third-party libraries. Including libraries into your origin to build better Web applications has become the default way of doing things. Fortunately, security initiatives such as Subresource Integrity allow you to verify the integrity of included libraries coming from a CDN.
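Subresource Integrity works by pinning a cryptographic hash of the expected file in the script tag, so the browser can refuse a file that has been tampered with. As a minimal sketch, the integrity value can be computed in Python as follows. The helper names sri_value and script_tag and the example CDN URL are our own assumptions, not part of any library.

```python
import base64
import hashlib
from html import escape

# Hash algorithms accepted in an integrity attribute.
SRI_ALGORITHMS = ("sha256", "sha384", "sha512")


def sri_value(content: bytes, algorithm: str = "sha384") -> str:
    """Return an integrity value of the form '<algorithm>-<base64 digest>'."""
    if algorithm not in SRI_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def script_tag(src: str, content: bytes) -> str:
    """Build a script tag that pins the exact bytes expected from src."""
    return (
        f'<script src="{escape(src, quote=True)}" '
        f'integrity="{sri_value(content)}" crossorigin="anonymous"></script>'
    )
```

Calling script_tag("https://cdn.example.com/lib.js", library_bytes) with the bytes of the reviewed library produces a tag that the browser only executes if the downloaded file hashes to the same value. If the CDN serves a modified file, the script is blocked.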

There is however another script inclusion pattern that is at least as common as the inclusion of libraries, and, unfortunately, significantly more dangerous: the inclusion of non-essential components as script files. Primary examples are advertisements, social media buttons, discussion widgets, etc. Virtually all of these are easily included by copy/pasting a few lines of HTML, which will in turn load a piece of JavaScript, which then starts loading content and injecting that into the page. You can add every piece of functionality with virtually no effort … Cool, right?

Well, from a development perspective, this is definitely a major step forward. However, from a security perspective, this quickly becomes a nightmare, as sites have started to include third-party code and components from everywhere. A study from 2012 shows that of the Alexa top 10,000, 88.45% include scripts from at least one remote host, and one site even from 295 remote hosts. This behavior is worrisome, since you have absolutely no control over what code will be included into your origin, where it has full access to your origin and all its associated resources and permissions.

Advertisements are a good example to illustrate the dangers. If you include an advertisement through an advertising network, you actually include a piece of code from the advertising network. This network code will then fetch the actual advertisements that have been supplied by their clients, and include them in your page. These advertisements can contain just about anything, ranging from Flash files to JavaScript code. And if you are thinking something along the lines of “yeah, but they will check the contents of these advertisements, it’ll be fine”, then you’re gonna have a bad time.

The list of cases where things go wrong is endless. There’s even a specific term for malicious advertisements: malvertisements. I’m only including a few highlights here:

By now, we’ve established that including third-party scripts directly into your origin is not such a great idea, regardless of whether they are advertisements, widgets or something else. The next section covers a few alternative integration strategies, where you leverage the protection of the SOP to isolate the third-party code from your own.

Effective Content Isolation

The most effective way to prevent third-party content from taking advantage of your origin’s resources or permissions is to avoid including it directly into your origin. By loading the third-party content into a frame with a different origin, you essentially leverage the protection of the SOP to isolate the content from your own origin. This approach is well suited for isolating components such as a chat widget or a social media timeline, which do not really need context information from the page itself. An illustration of this approach in practice is offered by Dropbox. On their main website, they include a support chat widget, offered by a third-party provider. Since they deem the risk of including third-party code in their main origin unacceptable, they isolate the content in an iframe. To enable communication between the main page and the widget, they use the Web Messaging specification, which offers an opt-in communication mechanism between contexts. This approach is the recommended way of including third-party components, and sounds a lot harder than it actually is!

If you’re ready to take frame-based isolation a step further, then you’ll want to hear about the HTML5 sandbox. This sandbox lets you put additional restrictions on content running in a frame, allowing you to apply the principle of least privilege. If you enable the sandbox by setting the sandbox attribute on an iframe element, you enable all the restrictions offered by the sandbox, which include restrictions on autoplaying audio/video, submitting forms, running scripts, loading plugin content, … Most of these restrictions can be lifted selectively by adding options to the sandbox attribute, as explained in this tutorial. One particularly interesting feature is the possibility to assign a unique origin to a sandbox. This unique origin will never match any other existing origin, which effectively allows you to load potentially untrusted content from your own origin. A perfect mechanism to safely isolate untrusted content, such as a forum post that may be riddled with XSS attacks.
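As a minimal sketch of this least-privilege idea, the following Python helper generates the markup for a sandboxed frame. The name sandboxed_iframe and the example paths are our own, and the token list is only a subset of the options defined for the sandbox attribute.

```python
from html import escape
from typing import Iterable

# A subset of the tokens that lift individual sandbox restrictions.
KNOWN_TOKENS = {"allow-forms", "allow-popups", "allow-same-origin", "allow-scripts"}


def sandboxed_iframe(src: str, allow: Iterable[str] = ()) -> str:
    """Build an iframe tag with every restriction on, minus the tokens in allow."""
    tokens = set(allow)
    unknown = tokens - KNOWN_TOKENS
    if unknown:
        raise ValueError(f"unknown sandbox tokens: {sorted(unknown)}")
    # An empty sandbox attribute applies all restrictions, including the unique origin.
    value = " ".join(sorted(tokens))
    return f'<iframe src="{escape(src, quote=True)}" sandbox="{value}"></iframe>'
```

For example, sandboxed_iframe("/posts/42") returns an iframe with sandbox="", the most restrictive setting, while sandboxed_iframe("/widget", ["allow-scripts", "allow-forms"]) lets the content run scripts and submit forms but still keeps it in a unique origin, because allow-same-origin is left out. Be careful when granting allow-scripts together with allow-same-origin to content served from your own origin, since such content can reach its parent page and remove the sandbox again.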

A third technique that can help you gain control over your own and third-party content is Content Security Policy (CSP). One of the main goals of CSP is to stop XSS attacks from being executed, as explained here. The CSP specification allows the server to define a policy, stating the source of remote content (e.g. scripts, styles, images, …), and the destination of outgoing requests (e.g. form submissions, XMLHttpRequests, …). Essentially, CSP puts you in control over what happens on one of your pages, and allows you to block undesired behavior. Therefore, you can configure your CSP policy so that it allows the loading of the third-party code, but disallows the loading of additional, unknown files. This offers fewer security guarantees than origin-based content isolation, but can be a viable alternative where the use of frames is out of the question.
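A policy of this kind is sent as the value of the Content-Security-Policy response header. The following minimal Python sketch assembles such a value from a dictionary of directives. The helper name csp_header and the widget host https://widgets.example.com are assumptions made for illustration only.

```python
def csp_header(policy: dict[str, list[str]]) -> str:
    """Serialize directives into a Content-Security-Policy header value."""
    directives = []
    for name, sources in policy.items():
        # Each directive is its name followed by a space-separated source list.
        directives.append(" ".join([name, *sources]))
    return "; ".join(directives)


# Hypothetical policy: own content, plus scripts from one known third party.
EXAMPLE_POLICY = {
    "default-src": ["'self'"],
    "script-src": ["'self'", "https://widgets.example.com"],
    "connect-src": ["'self'"],
    "form-action": ["'self'"],
}
```

Here csp_header(EXAMPLE_POLICY) produces default-src 'self'; script-src 'self' https://widgets.example.com; connect-src 'self'; form-action 'self'. Scripts may then only be loaded from the page's own origin and the one listed widget host, and XMLHttpRequests and form submissions can only go back to your own origin.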

Conclusion

Every time you include a third-party script into your origin, you’re enlarging the trusted computing base of your application. Every one of these files is an attack vector, and sneaking in malicious code through one of them is sufficient to take full control of your Web site, and all its associated resources.

Fortunately, you can leverage the protection of the Same-Origin Policy to effectively isolate third-party components from the rest of your Web site. Further restrictions are available through the HTML5 sandbox attribute, and the incredibly powerful Content Security Policy.
