Wednesday, February 25, 2009

Encoding is "the process of transforming information from one format into another" [Wikipedia]

In the web development world, when we talk about encoding text, we are normally talking about taking some input text and making it appropriate to use in a given context. For example, taking the user's first name and last name, and making it safe to put in a <b> tag within an HTML page.

We care about encoding most when we take input that we don't trust from our users. If we ever display that input, we have to be careful to neutralise any characters that may interfere with the display of our web pages, cause JavaScript to run, or allow other malicious actions.

This article will help you understand what encoding is, why you need to do it and how that helps prevent cross-site scripting, and give a little introduction to the AntiXSS library.

A bold example

As a running example, let's say we are letting the user enter anything they want for their name - in an input box like this on our website:

Text box to collect name from the user

We then take the text they enter and store it in our database. Later on when we display it on the web page, we wrap the text in bold tags so that it stands out:

Welcome to the website, Kirk!

In ASP.NET one way of doing this would be to put an ASP.NET label between <b> tags:

Welcome to the website, <b><asp:Label ID="NameLabel" runat="server"></asp:Label></b>!

...and then in the code behind, take the name from our database and assign it to the Text property:

User user = GetFromDatabase();

NameLabel.Text = user.Name;

Trust no-one

The problem is, we've received this name directly from our user (who, of course, we shouldn't trust), and we've stored it in a column in our database (which we now can't trust either), so we can't safely display it on our website without sanitising it or otherwise making it trustworthy.

The number one lesson I try to give in my presentations on web security is "Don't trust...". You can't trust your user, you can't trust your employees, your students, or even your mother. There is no such thing as "safe input" that you receive over the Internet; everything you receive is suspect.

(Even people who are otherwise trustworthy might not be in control of their faculties if they have spyware or are virus-infected.)

Everything is fine if the user enters only ordinary letters:

User enters

But what happens if the user enters some HTML into the input box?

The user enters HTML, the page layout changes.

The user is now able to change how our page looks! Indeed, they can inject HTML, script or other content directly into pages on our website!

This is known as Cross-site scripting, or XSS, and is the bane of our existence as web developers.

What went wrong?

The ASP.NET label outputs the Text directly into the HTML output of the page:

    Welcome to the website, <b><span id="NameLabel">Kirk </b><i>Jackson</i></span></b>!

The problem here is that the ASP.NET label is not encoding the text before outputting it. The text is not appropriate to use in an HTML context, as it contains characters that have meaning in HTML (namely the characters making the </b> and <i> tags).

To make the user's name safe to use in an HTML context, we need to encode those characters so the browser displays them as text instead of interpreting them as markup:

Kirk &lt;/b>&lt;i>Jackson&lt;/i>

HTML Encoding

HTML encoding is turning a string into a safe block of text for insertion in an HTML web page.

This means it should not use any of the special characters that are used to mark the beginning or end of tags (< and >), attribute values (") or the ampersand character on its own (&). If those characters are left in the string, then they could be used to start or stop HTML tags and change the behaviour of our page.

To remove these characters, HTML encoding requires them to be turned into character entity references, or numeric entity references. This stops them from being treated as special characters for formatting an HTML page, and just treats them as a character to be displayed.

Original character        Character Entity Reference   Numeric Entity Reference
< (less-than sign)        &lt;                         &#60;
> (greater-than sign)     &gt;                         &#62;
" (double quote)          &quot;                       &#34;
& (ampersand)             &amp;                        &#38;

The above table shows a few examples of how to encode special characters. For a more complete reference, see Wikipedia or W3C.
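As a rough illustration of the table above, here is a minimal sketch of an HTML encoder in C#. The class and method names are made up for this example, and it handles only the four characters listed in the table; it is not how HttpUtility or the AntiXSS library are implemented.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of any library.
public static class HtmlEncodingSketch
{
    // Replaces the four characters from the table with their character
    // entity references; every other character is copied unchanged.
    public static string Encode(string input)
    {
        if (input == null) return string.Empty;

        var output = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            switch (c)
            {
                case '<': output.Append("&lt;"); break;
                case '>': output.Append("&gt;"); break;
                case '"': output.Append("&quot;"); break;
                case '&': output.Append("&amp;"); break;
                default: output.Append(c); break;
            }
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Prints: Kirk &lt;/b&gt;&lt;i&gt;Jackson&lt;/i&gt;
        Console.WriteLine(Encode("Kirk </b><i>Jackson</i>"));
    }
}
```

Because the input is processed one character at a time, an ampersand that the encoder itself emits (as part of &lt;, for example) is never encoded a second time.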

Note that since the ampersand character is used to start an encoded character sequence, it can't be used on its own as a regular character. This is why ampersands should be encoded as &amp; in HTML.

Once the user's name is encoded, it will then be in the HTML as &lt;i> instead of <i>, which means that in the above example, italic mode won't turn on:

The user's text is now encoded correctly.

The screenshot above looks a little weird, but the page now displays the text exactly as the user typed it in, without treating the user's input as special HTML markup.

Attribute Encoding

Attribute encoding is turning a string into a safe block of text for use within an attribute of an HTML tag.

Attributes are the name/value pairs on a tag node in HTML (or SGML and XML, for that matter). For example, in the following HTML, the a tag has a title attribute:

<a href="foo.html" title="test">thing</a>

The title attribute is displayed as a tooltip

The text inside the title attribute is used to create a tool tip when the mouse pointer hovers over the hyperlink.

This HTML contains an a tag (an anchor tag), which has two attributes set: href and title. The a tag also contains some HTML within it: the text 'thing'. The contained text must be HTML encoded if you only want text within the a tag, and the two attributes must be attribute encoded.

At a simplistic level, text is valid inside an attribute as long as it doesn't contain double quotes ("), ampersands (&) or less-than symbols (<), as the double quote would prematurely end the attribute, and the other two characters must be encoded anywhere they are used within an HTML document (except when creating tags).

To extend our earlier example, imagine the user's name is used as the tooltip of a link, to pop up before they follow the link. If we naively output the user's name as a title attribute without encoding it, the user could inject some additional behaviour into our page, e.g.

<a href="foo.html" title="<%= User.Name %>">thing</a>

If the user enters something malicious, for example by entering a double-quote followed by some javascript, then they have managed to inject extra HTML or javascript behaviour into our site:

User enters script into Name field

The hover for the hyperlink looks okay, but when the user clicks the link, malicious JavaScript can run:

Malicious javascript running

This is because the HTML that we have sent to the client's browser actually contains an onclick attribute that we didn't intend:

<a href="foo.html" title="Kirk" onclick="alert('Hi')">thing</a>

Encoding the user's data before sending it to the browser would have protected us from this, and then the HTML sent would look like this:

<a href="foo.html" title="Kirk&quot; onclick=&quot;alert('Hi')">thing</a>

Which correctly displays exactly what the user entered:

Tooltip now shows complete text entered
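To make the before-and-after concrete, the following sketch builds the link both ways. EncodeAttribute and BuildLink are hypothetical helpers written for this example, and the encoder covers only the three characters discussed in this section; a real application should use a library method instead.

```csharp
using System;

// Hypothetical helpers for illustration only.
public static class AttributeEncodingSketch
{
    // Encodes only the characters discussed in the text: & " and <.
    public static string EncodeAttribute(string value)
    {
        return value
            .Replace("&", "&amp;")    // must run first so the entities added below are not re-encoded
            .Replace("\"", "&quot;")
            .Replace("<", "&lt;");
    }

    // Naive string concatenation, as in the inline ASPX example.
    public static string BuildLink(string title)
    {
        return "<a href=\"foo.html\" title=\"" + title + "\">thing</a>";
    }

    public static void Main()
    {
        string name = "Kirk\" onclick=\"alert('Hi')";

        // Unencoded: the double quote ends the title attribute early.
        Console.WriteLine(BuildLink(name));

        // Encoded: the whole input stays inside the title attribute.
        Console.WriteLine(BuildLink(EncodeAttribute(name)));
    }
}
```

The first line printed matches the unintended onclick markup shown earlier, and the second matches the safely encoded version.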

URL Encoding

URL encoding is turning a string into a safe block of text for appending on the query string of a URL.

The original specification for URLs (RFC 1738) specifies that URLs should only include certain characters, and all others must be encoded. This is similar to the case of HTML encoding, but there is a much smaller set of characters allowed, and the way you encode them is different.

To encode a character to append to a URL, you use a percent sign followed by the two-digit hexadecimal value of that character. For example:

Original character     URL-encoded form
space                  %20
/ (forward slash)      %2F
" (double quote)       %22
? (question mark)      %3F

The above table shows a few examples of how to URL encode special characters. For a more complete reference, see Brian Wilson's URL Encoding page.
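The percent-encoding rule can be sketched in a few lines of C#. This is an illustrative helper, not a library method: it assumes UTF-8 for the byte values, and it treats only letters, digits and the characters - _ . ~ as safe (the "unreserved" set from the later RFC 3986), which is a stricter choice than the text above requires.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
public static class PercentEncodingSketch
{
    // Characters copied through unchanged (assumption: RFC 3986 unreserved set).
    private const string Unreserved =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~";

    public static string Encode(string input)
    {
        var output = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(input))
        {
            char c = (char)b;
            if (b < 128 && Unreserved.IndexOf(c) >= 0)
                output.Append(c);                              // safe character
            else
                output.Append('%').Append(b.ToString("X2"));   // % plus two hex digits
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Prints: a%20b%2Fc%22d%3F
        Console.WriteLine(Encode("a b/c\"d?"));
    }
}
```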

We need to encode strings before appending them to a URL, to make sure that untrusted input is not able to change the URL.

For example, if our page above constructed a URL to search Google for the name of the user entered into the website, it could look like this:

Construct a search url by joining two strings together

When the user clicks the link, they will search Google for their name.

Here the naive code is just constructing a url by joining the two strings together:

User user = GetFromDatabase();

string url = "" + user.Name;

But if a name with spaces is entered, then we're generating an invalid URL:

Create a url with spaces in it

The URL is invalid because it contains an illegal character - a space that should be encoded as %20.

We could also be opening our users up to cross-site scripting bugs, because we are effectively letting them create any url they want. For example:

Create a url with ampersands in it

Here we are appending the ampersand (&) that the user entered directly to the end of the url, so rather than their text being passed to the server as the "q" parameter, we're letting them add other query string parameters (in this case, the "I'm feeling lucky!" button). The solution in this case is to encode the ampersand as %26.
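A hedged sketch of the fix, using the .NET Framework's Uri.EscapeDataString to encode the value before it is appended. The base URL and the "lucky" parameter below are placeholders invented for this example; the AntiXSS library's UrlEncode method, covered below, is another option.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class SearchUrlSketch
{
    // Placeholder base URL; substitute your real search URL.
    private const string BaseUrl = "https://example.com/search?q=";

    public static string BuildSearchUrl(string name)
    {
        // EscapeDataString percent-encodes spaces, ampersands, equals signs
        // and other reserved characters, so the whole name stays inside "q".
        return BaseUrl + Uri.EscapeDataString(name);
    }

    public static void Main()
    {
        // Prints: https://example.com/search?q=Kirk%20Jackson
        Console.WriteLine(BuildSearchUrl("Kirk Jackson"));

        // Prints: https://example.com/search?q=Kirk%26lucky%3D1
        Console.WriteLine(BuildSearchUrl("Kirk&lucky=1"));
    }
}
```

In the second call, the ampersand and equals sign arrive at the server as part of the search text instead of starting a new query string parameter.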

The AntiXSS library

The AntiXSS library (currently at version 3.0 beta) has been built by the Microsoft ACE Security and Performance Team [ooops! By the Connected Information Security Group, sorry!]

The library provides two related functions:

  • Encoding methods to make text safe for a variety of contexts
  • An HttpModule (the Security Runtime Engine) to automatically encode your ASP.NET controls

I'll cover the Security Runtime Engine HttpModule in another post.

The encoding methods have been built using more robust and secure coding practices than the existing methods in the HttpUtility class of the .NET framework, so you should use them in preference when encoding your data.

// Signatures only; method bodies omitted.
public class AntiXss
{
    public static string HtmlAttributeEncode(string input);
    public static string HtmlEncode(string input);
    public static string JavaScriptEncode(string input);
    public static string UrlEncode(string input);
    public static string VisualBasicScriptEncode(string input);
    public static string XmlAttributeEncode(string input);
    public static string XmlEncode(string input);
}

You need to decide which context you're outputting text into, and then choose the appropriate method to encode the text.

  • HtmlEncode - use for all HTML output, except for when you're adding text inside an attribute of a tag (e.g. use for <b>...</b>)
  • HtmlAttributeEncode - use for text that will appear inside attributes of tags (e.g. <a title="...">)
  • UrlEncode - use for text that you are appending as a value in a url query string (e.g.
  • JavaScriptEncode - use when you want to put the string into a JavaScript variable (e.g. var foo = '...'). This method will also create the surrounding quotes.
  • VisualBasicScriptEncode - use if you're unlucky enough to be creating pages with VBScript on them
  • XmlEncode, XmlAttributeEncode - the XML equivalents of the above HTML methods
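To show why the JavaScript context needs its own encoding, here is a small sketch of one possible approach: wrap the value in single quotes and escape everything except letters, digits and spaces as \xHH or \uHHHH. This is an assumption-laden illustration written for this post, not the AntiXSS library's actual implementation, and the class name is made up.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not the AntiXSS implementation.
public static class JavaScriptEncodingSketch
{
    public static string Encode(string input)
    {
        var output = new StringBuilder("'");   // opening quote
        foreach (char c in input)
        {
            bool isSafe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                       || (c >= '0' && c <= '9') || c == ' ';
            if (isSafe)
                output.Append(c);
            else if (c < 256)
                output.Append("\\x").Append(((int)c).ToString("x2"));   // e.g. ' becomes \x27
            else
                output.Append("\\u").Append(((int)c).ToString("x4"));   // e.g. \u20ac
        }
        return output.Append("'").ToString();   // closing quote
    }

    public static void Main()
    {
        // Prints: 'Kirk\x27\x3b alert\x281\x29\x3b\x2f\x2f'
        Console.WriteLine(Encode("Kirk'; alert(1);//"));
    }
}
```

The quote the attacker typed is emitted as \x27, so it can no longer close the string literal and start new script.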

To use inline in your ASPX page, you can call the library methods directly:

<a href="foo.html" title="<%= AntiXss.HtmlAttributeEncode(User.Name) %>">thing</a>

To use from your code-behind, decide whether your control outputs its content as an attribute or in an HTML context, and then call the appropriate method:

Label1.Text = AntiXss.HtmlEncode(User.Name);

Deciding which context you're in and which encoding method to use is a major annoyance, so be sure to look at the Security Runtime Engine which does it for you. I'll write more about that in a future blog post, so please subscribe to my RSS.

Hopefully this article has helped you understand what encoding is; why you need to encode untrusted input and how that helps prevent cross-site scripting; and has given a little intro to the AntiXSS library.


posted on Wednesday, February 25, 2009 3:57:16 PM (New Zealand Standard Time, UTC+12:00)
 Saturday, February 21, 2009

Join The New Zealand Internet Blackout to protest against the Guilt Upon Accusation law 'Section 92A' that calls for internet disconnection based on accusations of copyright infringement without a trial and without any evidence held up to court scrutiny. This is due to come into effect on February 28th unless immediate action is taken by the National Party.

It's not about downloading illegal content. Copyright laws exist for a reason, and protect creators of content (and even users of GPL software). It's about laws that have been drafted foolishly and that reduce our rights.


posted on Saturday, February 21, 2009 12:41:12 PM (New Zealand Standard Time, UTC+12:00)
 Wednesday, February 18, 2009

Developer survey from Microsoft. Each answer you put in displays a different cartoon reflecting your choice. Fill in the survey here.


posted on Wednesday, February 18, 2009 9:34:37 PM (New Zealand Standard Time, UTC+12:00)

I'll post the slides from my AntiXSS talk later, once I've cleaned them up. In the meantime, here's a couple of links:



posted on Wednesday, February 18, 2009 9:20:27 PM (New Zealand Standard Time, UTC+12:00)
 Friday, February 13, 2009

The Twitter "don't click" messages are spreading like wildfire. It's a relatively benign form of clickjacking (analysis here) that tricks you into clicking a button when you're actually clicking on a hidden button on the Twitter site that posts a tweet.

I've talked about clickjacking in Wellington, Auckland, Christchurch and Nelson, and while I don't know of a fool-proof way to protect yourself against clickjacking, you should do what Twitter has done (and what I suggested at those talks) and include some frame-busting JavaScript at the top of every page in your site. Details are here: Framebusting in Javascript

Frame-busting works by unwrapping your site from being hosted inside an iframe. It won't stop all click-jacking attacks, and it won't protect all users, but like many security mitigations it's about layering several 90% solutions on top of each other to protect your users and your websites.


posted on Friday, February 13, 2009 9:02:39 AM (New Zealand Standard Time, UTC+12:00)
 Thursday, February 12, 2009

It was a nice sunny day in Nelson yesterday, and it was nice to have a little look at the scenery afterwards (thanks, Daniel!).

I presented a talk similar to the "Overcoming your web insecurity" talk that I gave in Auckland recently [slides], and it was good fun diving into some depth in the extra time we had... hopefully I managed to scare some people!


Next Wednesday at the Wellington .NET Users Group, Owen Evans (who also works at Xero) and I will be presenting two sessions.

Owen will be doing a LINQ Refresher to get us up to speed with the LINQ syntax for selecting, grouping, where-ing and more.

I will be talking about the Anti-XSS library, which is now in beta. The library is pretty cool and helps a lot with encoding data before it ends up on your website :)

More details of the event are here: LINQ Refresher, Anti-XSS and SDE Libraries


Hope to see you on Wednesday!


posted on Thursday, February 12, 2009 10:09:55 PM (New Zealand Standard Time, UTC+12:00)
 Friday, February 06, 2009

Oisín Grehan has a good list of the new cmdlets in PowerShell 2 (currently in CTP3 and the Windows 7 beta):

It's cool having a list of all 106 new cmdlets, including such useful ones as:

  • Test-Connection (ping)
  • ConvertFrom/To-CSV
  • Start/Stop/etc Jobs in the background
  • Get-Random (useful for drawing prize winners at user groups!)
  • ConvertTo-Xml

PowerShell 2 has a bunch of cool new features, and feels like it's getting real close now :)


posted on Friday, February 06, 2009 9:32:39 AM (New Zealand Standard Time, UTC+12:00)

I've got the afternoon off work this Wednesday 11 Feb, and am popping over to Nelson to present on web security (details below).

I hope to see you there!


Daniel Ballinger wrote:
> Hi All,
> Kirk Jackson from the Wellington .NET user group will be in town on
> Wednesday the 11th of February and is giving a presentation.
> Title:
> Overcoming your web insecurity
> Abstract:

> As an ASP.NET developer, there are many things to think about while
> developing your web application. Come along to understand the
> fundamentals of developing a secure web application, and learn how to
> protect your site against the dangers of cross-site scripting, cross
> domain request forging and click-jacking.
> This session will be suitable for all levels of experience, and
> developers who use other web development platforms such as PHP or Java.

> Presenter:
> Kirk Jackson
> Useful links:
> - Kirk's blog
> - The home of Microsoft communities in New Zealand
> When:
> Wednesday 11th February 2009
> Gather at 2:50 pm, starting at 3:00 pm.
> Approximately 1 hour 15 minutes plus pizza afterward.
> Where:
> FuseIT Ltd,
> Ground Floor,
> 7 Forests Rd,
> Stoke,
> Nelson
> (Off Nayland Rd and behind Carters)
> or
> If you are parking on site, please use the parks marked FuseIT that
> are at the back of the site.
> Giveaways:
> A single copy Microsoft Office 2007 Professional
> Catering: Pizza & Drinks
> Door Charge: Free
> RSVP to me if you are going to attend so I can guesstimate the food
> and drink requirements.
> However, feel free to turn up on the day though if you can't commit at
> the moment.
> Please feel free to invite anyone who may be interested in attending.
> Cheers,
> Daniel
> Daniel Ballinger
> Developer
> FuseIT ™

posted on Friday, February 06, 2009 9:17:38 AM (New Zealand Standard Time, UTC+12:00)