How HTML Encoding prevents XSS

Cross-Site Scripting (XSS) is a security vulnerability that allows an attacker to execute malicious JavaScript code in a victim’s browser.

HTML encoding is one of the most common methods used to prevent XSS vulnerabilities. It is an effective, easy-to-implement fix that can protect web applications from the malicious JavaScript payloads used in XSS attacks. However, in certain scenarios your application may still be exposed to XSS attacks even with HTML encoding in place.

Consider a scenario where you search for something on a vulnerable website whose URL looks like this:

vulnerable.site/?q=hacktest

and the server returns the following HTML code fragment:

<p>You searched for 'hacktest':</p>

As you can see, the value of the parameter ‘q’ is inserted into the page returned by the server. An attacker can therefore attempt an XSS attack as follows:

vulnerable.site/?q=%3Cscript%3Ealert("XSS");%3C/script%3E

The server will return a page with the following code fragment:

<p>You searched for '<script>alert("XSS");</script>':</p>

The browser executes the JavaScript inside the <script> tag, which displays a pop-up with the message ‘XSS’.
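To make the flaw concrete, here is a minimal Python sketch of a server-side function that builds the fragment by plain string interpolation. The function name render_search_unsafe is hypothetical, and a real application would normally receive q already decoded from its web framework rather than decoding it by hand with urllib.parse.unquote.

```python
from urllib.parse import unquote


def render_search_unsafe(raw_query_value: str) -> str:
    """Hypothetical vulnerable handler: decodes the percent-encoded 'q'
    value and inserts it into the HTML without any encoding."""
    q = unquote(raw_query_value)
    return f"<p>You searched for '{q}':</p>"


print(render_search_unsafe("hacktest"))
# <p>You searched for 'hacktest':</p>
print(render_search_unsafe('%3Cscript%3Ealert("XSS");%3C/script%3E'))
# <p>You searched for '<script>alert("XSS");</script>':</p>
```

Whatever the attacker puts in q ends up as raw markup in the response, which is exactly what lets the <script> tag run.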

To prevent this type of attack, HTML encoding is applied on output: all potentially dangerous data, such as user input or values stored on the server or in a database, is HTML-encoded before it is sent to the client’s browser.
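A minimal sketch of encoding on output in Python, assuming a hypothetical search handler named render_search_safe. It uses the standard library's html.escape, which by default also encodes double and single quotes, going a step beyond angle brackets; encoding quotes matters when the value ends up inside an attribute.

```python
import html
from urllib.parse import unquote


def render_search_safe(raw_query_value: str) -> str:
    """Hypothetical search handler that HTML-encodes q before inserting it."""
    q = unquote(raw_query_value)
    # html.escape replaces &, < and >, and (quote=True is the default) " and ' too
    return f"<p>You searched for '{html.escape(q)}':</p>"


print(render_search_safe('%3Cscript%3Ealert("XSS");%3C/script%3E'))
# <p>You searched for '&lt;script&gt;alert(&quot;XSS&quot;);&lt;/script&gt;':</p>
```

The encoding happens at the moment the value is written into HTML, not when it is received, so the same stored value can still be used safely in other contexts.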

Consider the example above again. If an attacker sends the malicious parameter value:

vulnerable.site/?q=%3Cscript%3Ealert("XSS");%3C/script%3E

Without encoding, the web application returns the following code fragment, and the pop-up appears:

<p>You searched for '<script>alert("XSS");</script>':</p>

If HTML encoding on output is implemented on the same server, it responds with a different code fragment instead:

<p>You searched for '&lt;script&gt;alert("XSS");&lt;/script&gt;':</p>

The angle brackets have been replaced with their HTML entities (only the angle brackets are encoded in this example). This breaks the <script> tag, so the script does not execute.
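Python's html.escape with quote=False encodes only &, < and >, so it is a close stand-in for the angle-bracket-only encoding described here (it also encodes &, which does not affect this payload). A minimal sketch applying it to the decoded payload:

```python
import html

payload = '<script>alert("XSS");</script>'
# quote=False: only &, < and > are replaced; the double quotes are left as-is
encoded = html.escape(payload, quote=False)
fragment = f"<p>You searched for '{encoded}':</p>"
print(fragment)
# <p>You searched for '&lt;script&gt;alert("XSS");&lt;/script&gt;':</p>
```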

The reasoning is this: although the browser automatically decodes &lt; to display a < sign and &gt; to display a > sign, the decoded characters are treated as plain text rather than as active code. Injecting a <script> tag requires literal angle brackets in the HTML source, so the encoded payload is simply displayed as text.
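One way to see the text-versus-code distinction is to run both fragments through Python's html.parser.HTMLParser, which tokenizes markup and decodes entities in text content. This is only an approximation of a real browser's parser, and the TagRecorder class and parse helper are hypothetical, written just for this illustration.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Records which start tags the parser sees and the text content it produces."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default: &lt; etc. are decoded in text
        self.start_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def parse(fragment: str) -> TagRecorder:
    recorder = TagRecorder()
    recorder.feed(fragment)
    recorder.close()
    return recorder


raw = parse("<p>You searched for '<script>alert(\"XSS\");</script>':</p>")
enc = parse("<p>You searched for '&lt;script&gt;alert(\"XSS\");&lt;/script&gt;':</p>")

print(raw.start_tags)     # ['p', 'script']  -> a real <script> element
print(enc.start_tags)     # ['p']            -> no script element at all
print("".join(enc.text))  # You searched for '<script>alert("XSS");</script>':
```

The encoded version still shows the < and > characters to the user, but only as decoded text inside the <p> element.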

Note: The browser automatically decodes HTML entities in the value of an attribute or in the text content of an element.
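The same standard-library parser also decodes entities inside attribute values. In this sketch (AttrRecorder is a hypothetical helper), the encoded payload comes back as its original characters, yet the encoded &quot; characters did not close the value attribute early, so the whole payload stays inside the attribute as data.

```python
from html.parser import HTMLParser


class AttrRecorder(HTMLParser):
    """Collects the (name, value) attribute pairs of every start tag."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with entities already decoded
        self.attrs.extend(attrs)


p = AttrRecorder()
p.feed('<input value="&lt;script&gt;alert(&quot;XSS&quot;);&lt;/script&gt;">')
p.close()
print(p.attrs)
# [('value', '<script>alert("XSS");</script>')]
```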


Blog at WordPress.com.