How to filter user input: An overview

December 21, 2012 by Patrick Lambert

If you build web sites, online apps, or even just your own personal blog, chances are that you've heard the phrase "Don't trust user input!" This is one of the key security concepts on the Internet, and failing to follow it is one of the most common reasons sites get hacked, users get infected with malware, and web pages get defaced. There are many types of user input, some of which are not obvious. There are also plenty of books and articles that focus on one particular aspect, such as cross-site scripting, cross-site request forgery, or SQL injection. Here, we're going to go over each type of user input and the basic security checks you should perform whenever you write code that runs online. By following these simple concepts from the start of your development process all the way to the end, you can make the result much safer from potential threats.

Cross-site scripting (XSS)

Cross-site scripting, referred to as XSS, is a fairly popular way for trolls and script kiddies to exploit a vulnerable web app, whether it be an online forum, a commenting system, or any site which accepts user input. XSS by itself doesn't exploit the actual server; instead it exploits other users of the site. Because the server isn't affected, a lot of developers used to consider this a low-priority issue. Unfortunately, as web apps have become more popular and more complex, the things that can be done with XSS have become more serious.

The basic idea behind XSS is to insert HTML or, more commonly, JavaScript into a place that should only accept text, or some other type of input. For example, let's take a simple commenting system that accepts a name, web site, email, and comment. Each of these fields is an input tag that accepts user text. Then, when the user clicks on the submit button, the information is sent to the server and added to the database. Usually, you would expect the user to place some type of comment in the comment box, but instead they may try to submit code, such as a script element.

If the server doesn't check for XSS, that code gets added to the database, and every time a new user goes to the site, their browser will actually execute the code. Solving this issue is really simple, but it needs to be implemented for every field. It also needs to be done on the server, not in JavaScript, because using tools like TamperData, a user can easily submit the information directly to the server, bypassing any kind of JavaScript verification code.

So the core of the fix is to look for the two characters that delimit a tag, < and >, along with their URL-encoded versions, %3C and %3E. Be sure to check both upper and lower case (%3c and %3e)! By replacing them with &lt; and &gt; you tell the browser to print the actual characters, instead of interpreting what's inside as valid HTML. If user input can end up inside an HTML attribute, the quote characters and & need the same treatment.
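As a minimal sketch of this replacement in PHP, the built-in htmlspecialchars function performs the conversion for <, >, &, and both quote characters in one call. The helper name escapeForHtml and the sample comment are assumptions made up for illustration.

```php
<?php
// Hypothetical helper: escape user-supplied text before printing it into HTML.
function escapeForHtml(string $text): string
{
    // ENT_QUOTES also converts single and double quotes, which matters
    // when the value is placed inside an HTML attribute.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// An illustrative payload a visitor might submit as a "comment".
$comment = "<script>alert('XSS');</script>";

// The browser will display this text instead of running it.
echo escapeForHtml($comment), PHP_EOL;
// prints: &lt;script&gt;alert(&#039;XSS&#039;);&lt;/script&gt;
```

PHP decodes %3C and %3E in request parameters before the script sees them, so escaping the decoded value at the point where it is printed covers both forms.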

Cross-site request forgeries (CSRF)

CSRF is a fairly new term that describes a fairly ingenious way to exploit persistent logins. The idea is for an attacker to craft a malicious web site that will exploit another site where you may already be logged in. For example, let's say you go to your banking site at www.bank.com and stay logged in. Then, you browse the web and end up at a malicious page designed to attempt a CSRF on that bank's site. The page will try to make your browser request a specific URL on the bank's site. Your browser will try to load that resource, unknowingly triggering an action on the site, and the request will be sent with your own cookies for the bank. If you're still logged in, then that action will be valid.

For example, the site could have an img tag which, instead of containing the URL to an actual image, will try to load:

www.bank.com/?q=withdrawal&amount=2000

When your browser tries to load the image, it will instead request that URL. Assuming that this is the script that performs a withdrawal, you may end up losing money. One simple measure that helps protect your site involves understanding the difference between GET and POST requests. When you create a form, you have to designate the method with which the form will be submitted to your script. If you use GET, the arguments are added to the URL, so any page that can make a browser load a URL can trigger the action. You should only use GET when you're actually retrieving information, or if you really need to send information to another site which uses this type of query. Anything that changes data should be sent using POST.

Note, however, that a malicious page can still get around this by using JavaScript to send POST content. The session cookie alone cannot stop that, because the browser attaches it to the forged request automatically. It is best to tie each form to the individual session with an unpredictable token that the server stores in the session and checks on every submission, along with requesting a password confirmation for any sensitive account modification. Finally, it's a good idea to keep users logged into your site only for as long as it's reasonable. If you're making a forum, maybe it's fine to keep the cookie alive for a week. But if you're making a banking site, chances are it shouldn't be valid for more than an hour, at the most.
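One common way to tie a form submission to an individual session is a per-session anti-CSRF token. The sketch below is a rough illustration of that technique, not a complete implementation: the function names and the csrf_token field name are hypothetical, and a plain array stands in for PHP's $_SESSION so the example runs on its own.

```php
<?php
// Hypothetical helpers; $session stands in for PHP's $_SESSION array.

// Create a random token once per session and remember it on the server.
function csrfToken(array &$session): string
{
    if (empty($session['csrf_token'])) {
        $session['csrf_token'] = bin2hex(random_bytes(32));
    }
    return $session['csrf_token'];
}

// Accept a submission only if its token matches the one stored in the session.
function csrfTokenIsValid(array $session, ?string $submitted): bool
{
    return isset($session['csrf_token'])
        && is_string($submitted)
        && hash_equals($session['csrf_token'], $submitted);
}

$session = [];   // in a real script: session_start(), then use $_SESSION
$token = csrfToken($session);

// Embed the token in the form as a hidden field.
echo '<input type="hidden" name="csrf_token" value="',
    htmlspecialchars($token, ENT_QUOTES, 'UTF-8'), '">', PHP_EOL;

var_dump(csrfTokenIsValid($session, $token));     // bool(true): the genuine form
var_dump(csrfTokenIsValid($session, 'guessed'));  // bool(false): a forged request
```

A malicious page can make the browser send the cookie, but it cannot read the token out of your pages, so its forged request fails the check.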

SQL injections

When we hear about web sites being hacked, the method that bad guys seem to use most often is SQL injection. This is a very powerful way to gain access to unauthorized information or to inject code into a site. Yet, it's a very easy thing to protect yourself against.

Let's first see how a SQL injection works. Basically, almost all modern sites work off of a database. This means that all the content on the backend is stored in a separate place, usually a database server running MySQL, Microsoft SQL Server, Oracle, or any of the other database servers out there. Your site communicates with that database by connecting to it and then sending SQL commands. This is a very simple and efficient way to store and retrieve data, but it does have one huge drawback. Whoever controls those SQL commands can do anything they want with the data, very quickly. This means it's very important that only your scripts decide which commands are sent. If a user can, somehow, make your script send a query of their choosing to the database, that's a SQL injection, and then all bets are off.

These injections are done by simply adding SQL commands to an input field. As your script tries to add that information to the database, or check the database for confirmation, the user can break out of that initial command and run their own queries. For example, let's say you have a login script that accepts a user name. Then, the script checks if the name is valid like this:

// Vulnerable: the user's input is pasted directly into the SQL text.
$query = "SELECT * FROM users WHERE name='" . $input_name . "';";

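This would be a fairly common way to check a database from a PHP script. The problem is that it's incredibly insecure. All the user has to do is enter, in place of a name, a value that begins with a ' character and continues with a condition that is always true, such as OR 1=1.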
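As a hedged illustration of such an input, assuming a payload of the form ' OR '1'='1 (one classic variant, not necessarily the exact string the author had in mind), the sketch below shows the query that the login script would end up sending.

```php
<?php
// Illustrative payload (an assumption chosen for this example).
$input_name = "' OR '1'='1";

// The vulnerable pattern: input concatenated straight into the SQL text.
$query = "SELECT * FROM users WHERE name='" . $input_name . "';";

echo $query, PHP_EOL;
// prints: SELECT * FROM users WHERE name='' OR '1'='1';
```

The WHERE clause now reads "name is empty OR one equals one", which is true for every row in the table.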

This will escape the query thanks to the first ' character, and then check whether or not 1 = 1. Since one always equals one, the database will always say that this user is valid, and the script will log the malicious user in. Now you may be tempted to add sanity checks and, if you're thorough, that may work. But the problem is that there are many different ways to inject SQL code, and you can't test for every possible combination. The same attack can be rewritten in many other forms, including encoded or obfuscated ones that look nothing like this one.

You would be hard pressed to detect them all. So instead, the way to defeat SQL injection is to use prepared statements. Whenever you add user input to a database query, you should do so through a prepared statement, like this:

// The ? placeholder must not be quoted; the value is supplied separately.
$stmt = $db->prepare("SELECT * FROM users WHERE name = ?");
$stmt->bindParam(1, $input_name);
$stmt->execute();

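This is a PHP example (using PDO's prepare and bindParam) that does the same thing as the statement above but doesn't allow SQL injection to happen, because the user's input is sent to the database as data and is never parsed as part of the SQL command.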
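To see the difference in practice, here is a small self-contained sketch. It assumes the pdo_sqlite extension is available and uses an in-memory database; the users table, its single row, and the findUsers helper are made up for illustration.

```php
<?php
// Assumption: pdo_sqlite is installed; the table and its contents are invented.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE users (name TEXT)");
$db->exec("INSERT INTO users (name) VALUES ('alice')");

// Hypothetical helper that looks a user up with a bound parameter.
function findUsers(PDO $db, string $input_name): array
{
    $stmt = $db->prepare("SELECT * FROM users WHERE name = ?");
    $stmt->bindParam(1, $input_name);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

var_dump(count(findUsers($db, 'alice')));        // int(1): a real user is found
var_dump(count(findUsers($db, "' OR '1'='1")));  // int(0): the payload is just an odd name
```

The injection attempt is compared against the name column as an ordinary string, so it matches nothing.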

Sanitizing input types

We've seen the most common types of exploits when it comes to trusting user input, but there are many other variations. For example, a user may not be trying to actually exploit your server or users, but instead may be trying to spam them. Even if you filter out HTML tags, they may still be able to insert a lot of useless junk in your commenting scripts. Or, someone may want to exploit your code in another way. For example, let's say you create a banking app, and you have a variable that accepts an amount to withdraw. What happens if the user sends a negative value instead? If the script isn't checking for that, then it may end up adding money to the user's account. Or if your web app uses integer values for some type of ID system, what happens if the user puts in a very high value, larger than 64 bits can hold, and tries to overflow your integer? Or what if they send text instead of a number?

This brings us to the basic concept of sanitizing input types. When you're making a web app, or writing any type of code that accepts user input, it's always useful to make sure the resulting values are close to what you expect. There are many checks you can do here. For example, if your ID numbers are always 8 characters long, then you could keep only the first 8 characters of the input string. If you're expecting a positive number, then convert your input value to a number, and discard any negative values. If you're looking for a text-based comment, then use a regexp function to remove any character that isn't a letter, number, space, or punctuation mark. These little checks will thwart many attempts.
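The three checks just described might be sketched in PHP as follows. The function names are hypothetical, the ID is assumed to be plain ASCII, and the set of punctuation allowed in comments is an arbitrary choice for the example.

```php
<?php
// Hypothetical helpers illustrating the checks described above.

// Keep at most the first 8 characters of an ID (assumes ASCII IDs).
function cleanId(string $id): string
{
    return substr($id, 0, 8);
}

// Return a positive integer amount, or null if the input is not one.
// Text, negative numbers, zero, and values too large for an int are rejected.
function cleanAmount(string $amount): ?int
{
    $value = filter_var($amount, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    return $value === false ? null : $value;
}

// Strip everything except ASCII letters, digits, whitespace, and basic punctuation.
function cleanComment(string $comment): string
{
    return preg_replace('/[^A-Za-z0-9\s.,!?;:\'"()\-]/', '', $comment) ?? '';
}

var_dump(cleanId('1234567890'));   // string(8) "12345678"
var_dump(cleanAmount('2000'));     // int(2000)
var_dump(cleanAmount('-2000'));    // NULL
var_dump(cleanAmount('lots'));     // NULL
```

Returning null for a bad amount, rather than quietly turning it into some other number, lets the calling script refuse the request outright.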

Other tips

Finally, there are various other things you should consider when it comes to user input. One particularly risky thing you can do with a web app is to accept files from users. By allowing uploads, you're opening your server to many types of issues, but sometimes you may want users to be able to upload their own files. Maybe you want them to have their own profile pictures, or perhaps you want them to share videos on your site. Either way, special care should be taken here.

The first thing you need to do is enforce a strict size limit. You don't want someone to upload huge files and fill up your server. Then, you need to make sure they upload only specific types of files. If you allow people to upload images to display on their profiles, someone could instead upload malware, and serve it to every browser that sees the user's profile. The way to protect against that is to check the file type after upload by inspecting the file itself, not just its extension. For example, in PHP, you may be tempted to check $_FILES["file"]["type"], but that's user-submitted information, and it can be faked. Instead, use the Fileinfo extension (the finfo class or the finfo_file function), which examines the file's contents.
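A rough sketch of both checks, size and detected type, could look like this. It assumes the Fileinfo extension is enabled; the isAllowedUpload helper, the allow list, and the two throwaway demo files are invented for the example. In a real upload handler the path would come from $_FILES["file"]["tmp_name"].

```php
<?php
// Hypothetical helper; assumes the Fileinfo extension is enabled.
function isAllowedUpload(string $path, int $maxBytes, array $allowedTypes): bool
{
    $size = filesize($path);
    if ($size === false || $size > $maxBytes) {
        return false;   // unreadable or too large
    }
    // Detect the type from the file's contents, not from what the client claims.
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $type = $finfo->file($path);
    return $type !== false && in_array($type, $allowedTypes, true);
}

// Demo with two throwaway files whose contents are made up for illustration.
$gif = tempnam(sys_get_temp_dir(), 'up');
file_put_contents($gif, "GIF89a\x01\x00\x01\x00\x00\x00\x00;");
$script = tempnam(sys_get_temp_dir(), 'up');
file_put_contents($script, "<?php echo 'hi';");

$allowed = ['image/gif', 'image/jpeg', 'image/png'];
var_dump(isAllowedUpload($gif, 1024 * 1024, $allowed));     // starts with a GIF header
var_dump(isAllowedUpload($script, 1024 * 1024, $allowed));  // a script, not an image
var_dump(isAllowedUpload($gif, 4, $allowed));               // over the 4-byte limit

unlink($gif);
unlink($script);
```

Content sniffing raises the bar but is not proof that a file is harmless, which is why locking down the upload folder still matters.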

Another thing you should do is make sure you place all uploads in a specific folder on your server, and lock that folder down. For example, if you use Apache, then you can add an .htaccess file with the following code which will prevent scripts from being executed:

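# Map these extensions to the CGI handler...
AddHandler cgi-script .php .pl .jsp .asp .sh .cgi
# ...then switch CGI execution off, so such files are refused instead of run.
Options -ExecCGI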

Of course, make sure that users can't upload a file named .htaccess, or one with the name of any other crucial file in that folder. In fact, .htaccess files (or the Web.config equivalent for IIS) can host a lot of useful security directives that you should take a look at. You can make sure your script files are always executed rather than sent out as plain text, in case a configuration mistake ever stops them from being recognized as scripts. You can also deny access to configuration or database files. And you can add restrictions based on where the user comes from; for instance, you can reject requests that arrive from another site for scripts that should only be called from your own pages.
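One way to rule out name clashes altogether is to ignore the name the user supplied and let the server pick one. The sketch below assumes the file type has already been detected on the server; the storedFileName helper and its extension table are hypothetical.

```php
<?php
// Hypothetical helper: build the stored name from random bytes plus an
// extension chosen by the server, ignoring the name the user supplied.
function storedFileName(string $detectedMimeType): ?string
{
    $extensions = [
        'image/gif'  => 'gif',
        'image/jpeg' => 'jpg',
        'image/png'  => 'png',
    ];
    if (!isset($extensions[$detectedMimeType])) {
        return null;   // type is not on the allow list
    }
    return bin2hex(random_bytes(16)) . '.' . $extensions[$detectedMimeType];
}

echo storedFileName('image/png'), PHP_EOL;  // 32 hex characters followed by .png
var_dump(storedFileName('text/x-php'));     // NULL
```

Because the name never comes from the user, an upload cannot be called .htaccess or overwrite an existing file on purpose.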

Finally, web application firewalls and programming frameworks have become very popular in recent years and can help you immensely with all of these issues. By standardizing all of your checks inside of a framework, you don't have to worry so much while you're actually writing code, and that's a great help. Take a look around the web for the available frameworks and WAFs that could work with your development environment. In the end, you need to make sure you protect both yourself and your users.

Patrick Lambert boasts over 15 years of experience in creating online content from designing Websites and writing articles for various technology magazines, to managing campaigns on both Facebook and Twitter. He is also certified in many Microsoft products and has worked in diverse computer-related fields such as customer support, software quality assurance, and IT.