Securing User Input in Web Based Applications

Justin Klein Keane
July 1, 2007

User input validation is consistently one of the most widespread software problems contributing to security incidents. Software developers often assume that users will only provide valid input, or that users will only provide input in one form. Many web application developers fail to understand the hostile environment their code will be exposed to. Gathering input via a form doesn't guarantee that the form is the only source of data passed to the form processing script. Developers should not expect that input types, names, or formats will match those laid out in the form they produce.

In general, a malicious user attempting to exploit a web application will mangle input so that it escapes the intended handling and gives the attacker control of the handling process. For the purposes of example, let us suppose a Perl CGI script collects a user's name from form input and then prints a customized greeting message to the screen. The actual Perl code might look something like:

print "Hello ", qx/$user_name/;

This appears to be fairly innocuous code, but qx// hands its contents to the shell for execution. Consider what would happen if the user passed in the value:

`ls`

for $user_name. The malicious user has gained access to shell commands at this point. This is the same principle used in SQL injection attacks. The malicious user endeavors to break out of the confines of command execution that the developer has put in place. In the case of SQL injection the malicious user wants to craft their own SQL statements, independent of those the developer crafted. In the above example the malicious user intends to hijack the entire program process.

Weeding out bad input is a challenging task that varies depending on the complexity of the input data and the usage of that data in the form processing script. In most cases it is easier to list what can be considered valid user input than it is to enumerate what could constitute malicious or malformed input.

Some scripting languages provide easy tools with which to handle user input. For instance, if user input is going to be passed into a MySQL database via PHP, PHP's native mysql_real_escape_string() works quite well to mitigate dangers. (That function belongs to the original mysql extension, which was removed in PHP 7.0; mysqli_real_escape_string() and prepared statements are its counterparts in current PHP.) In fact, PHP, since it was designed for use on the web, has many handy functions with which to escape strings so they cannot be mangled. Functions such as htmlentities() and addslashes() can be used to handle user input.
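As a rough illustration of keeping user input out of the SQL text altogether, here is a minimal sketch in Python using the standard sqlite3 module. It is an analogue of the PHP functions named above, not a reproduction of them, and the users table, its columns, and the function names are hypothetical. The same placeholder idea is available in PHP and Perl database libraries.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user(conn: sqlite3.Connection, user_name: str) -> list:
    """Look up a user by name, binding the input instead of splicing it into the SQL."""
    # The "?" placeholder keeps user_name as data; it is never parsed as SQL.
    cursor = conn.execute("SELECT id, name FROM users WHERE name = ?", (user_name,))
    return cursor.fetchall()
```

With this approach a classic injection string such as x' OR '1'='1 is simply compared against the name column as a literal value and matches nothing.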

In lieu of these functions, however, it is possible to quantify what could be considered valid user input. Whenever possible, check the user input against known 'good' inputs. For instance, if a variable is intended to be a filename, say of a configuration file, check the files that exist in the expected location and compare them to the value in the input. Suppose a user is supplying the value of a variable $config_file, which should point to one of four files in the /config directory: normal.conf, extended.conf, testing.conf, and down.conf. Simply do a directory listing, put those four filenames into an array, and check whether the user supplied input matches any of the names. This preserves the flexibility of the program while maximizing security. The system can easily tell the program which values are acceptable, and that information can be used to filter out bad inputs.
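The directory listing check described above might look something like the following Python sketch. The function name resolve_config_file and the choice to raise ValueError are assumptions made for illustration; the point is only that the list of acceptable values comes from the system, not from the user.

```python
import os


def resolve_config_file(requested: str, config_dir: str) -> str:
    """Return the full path for requested only if it names a file in config_dir."""
    # Ask the system which files actually exist; that listing is the set of good inputs.
    allowed = {
        name
        for name in os.listdir(config_dir)
        if os.path.isfile(os.path.join(config_dir, name))
    }
    if requested not in allowed:
        raise ValueError("unknown configuration file")
    return os.path.join(config_dir, requested)
```

Because the input must exactly equal one of the listed names, a value such as ../../etc/passwd is rejected without any special handling for slashes or dots.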

Another trick is to filter input strings to get rid of characters you know you don't want. For instance, if the input supplied is supposed to be a filename, you can safely enumerate the valid characters that can be used to make up filenames. That is to say, check to make sure the input consists only of alphanumeric characters, dashes, underscores, and periods. Doing this sort of checking can mitigate much of the threat your applications will face.
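A character check of this kind could be sketched in Python as follows. The name is_valid_filename is hypothetical, and the rejection of "." and ".." is an extra assumption on top of the character list given above, since both consist only of periods yet name directories.

```python
import re

# Letters, digits, dashes, underscores and periods, as listed above.
_VALID_FILENAME = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_filename(candidate: str) -> bool:
    """Accept only names built entirely from the allowed characters."""
    if candidate in (".", ".."):
        # Assumption beyond the character list: these pass the pattern but are directories.
        return False
    # fullmatch requires the whole string to match, so one bad character rejects it.
    return _VALID_FILENAME.fullmatch(candidate) is not None
```

Note that the pattern has no slash in it, so any attempt to supply a path instead of a bare filename fails the check.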

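Many applications quote user input in order to pass it as parameters to functions or other system calls. In these cases it is important to ensure that the user input doesn't contain any quotes that aren't properly escaped. The easiest way to do this is to encode the user's input so that quote characters no longer appear literally, for example by URL encoding it. This is especially effective if the user input is only going to be used as presentation material for HTML, where encoding the input as HTML entities (as htmlentities() does in PHP) has the same effect and still displays as the user typed it.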
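To make the effect of encoding concrete, here is a small Python sketch using the standard urllib.parse and html modules. The wrapper names url_encode and html_encode are hypothetical; the HTML entity variant is included as an assumption about what is most useful when the input is only going to be displayed in a page.

```python
import html
from urllib.parse import quote


def url_encode(value: str) -> str:
    """Percent-encode everything except letters, digits and the characters _ . - ~"""
    # safe="" means even "/" is encoded, so no quote or separator survives literally.
    return quote(value, safe="")


def html_encode(value: str) -> str:
    """Replace &, <, > and both quote characters with HTML entities."""
    return html.escape(value, quote=True)
```

After either call the result contains no literal single or double quotes, so it cannot close a quoted string it is placed into.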

Properly escaping quoted strings can be challenging as well. Using native functionality in the scripting language can alleviate much of the headache. For instance, in a double-quoted string the developer must find every double quote and replace it with a backslash followed by a double quote. However, if a malicious user passes in a double quote preceded by a backslash, such a filter produces two backslashes followed by a double quote. Many systems will interpret this as an escaped backslash followed by an unescaped double quote, which ends the string. Thus, any home-grown routine for escaping strings must find all the quotes and then count the backslashes immediately preceding each one. If that count is odd, the quote is already escaped; otherwise the quote must be escaped with an additional backslash. Malicious users looking to exploit your application are well aware of the difficulty of this sort of process and will certainly attempt to exploit any weakness in the checking algorithms developers employ.
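The backslash problem described above can be demonstrated with a short Python sketch. All three function names are hypothetical. Note that escape_double_quoted uses a simpler technique than the per-quote counting described in the text: it escapes every backslash first and only then escapes the quotes, which leaves every quote preceded by an odd number of backslashes. It assumes a target that treats backslash as the only escape character.

```python
def naive_escape(value: str) -> str:
    """Flawed filter: only puts a backslash in front of each double quote."""
    return value.replace('"', '\\"')


def escape_double_quoted(value: str) -> str:
    """Escape backslashes first, then double quotes, so neither can cancel the other."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def quote_is_escaped(text: str, index: int) -> bool:
    """Report whether the character at index is preceded by an odd number of backslashes."""
    count = 0
    position = index - 1
    while position >= 0 and text[position] == "\\":
        count += 1
        position -= 1
    return count % 2 == 1
```

Feeding a backslash followed by a double quote through naive_escape yields two backslashes and a quote, and quote_is_escaped reports that quote as live. The same input through escape_double_quoted yields three backslashes and a quote, which stays escaped.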

The developer must be aware that in some circumstances a quote isn't the only dangerous character that can be provided in user input. Even in a quoted string, characters such as the dollar sign ($) can be used to reference variables from inside and outside the program if the quoted string is passed on to the command line. Be sure to consult the documentation for your scripting language as well as for any environments to which you are passing your program commands. For instance, when using PHP's exec() function on a typical Linux platform, the exec() statement might pass the command to the Bash shell. Even if the string is properly quoted as far as PHP is concerned, a semicolon in the input ends the intended shell command and starts a new one.

Consider the following PHP:

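<?php
// Deliberately vulnerable: the raw query string is spliced into a shell command.
// The braces are required for PHP to interpolate a quoted array key in a string.
exec("echo {$_SERVER['QUERY_STRING']} >> log.txt");
?>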

This PHP is used to take the user input and append it to a log file. Suppose a malicious user called this script with a URL query string that included:

?something;rm -rf *

The system would echo 'something', then proceed to delete files, starting in whatever directory the script was executing in and working its way down. Because rm prints nothing unless asked to, log.txt would not even hold a record of what had been deleted, and little would be left of the web application. As you can see, in these situations it is imperative to screen the user input.

PHP provides several useful functions for this type of situation, including escapeshellarg() and escapeshellcmd(). Perl is a little more difficult, and you may need to write more extensive input checking routines to ensure that your scripts are safe.
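For comparison, the following Python sketch shows the same two defenses for the logging example: quoting the input as a single shell word, and avoiding the shell entirely. It uses shlex.quote from the standard library, which targets POSIX shells and plays a role similar to escapeshellarg(). The function names and the log.txt default are assumptions for illustration.

```python
import shlex


def build_log_command(entry: str, log_path: str = "log.txt") -> str:
    """Build a shell command line that appends entry to log_path as one quoted word."""
    # shlex.quote wraps unsafe values in single quotes, so ; $ ` and * lose their meaning.
    return f"echo {shlex.quote(entry)} >> {shlex.quote(log_path)}"


def append_log_entry(entry: str, log_path: str = "log.txt") -> None:
    """Sidestep the shell entirely by writing the file from the program itself."""
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(entry + "\n")
```

Where the task can be done without a shell, as in append_log_entry, that is usually the safer design, since there is then no command line for the input to escape from.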

In all cases developers need to understand that user-supplied input is a great unknown. Developers cannot expect to get good input from users. Malicious users abound, as do rogue spidering agents, spammers, and other nefarious elements on the web. Developers need to set strict guidelines for acceptable user input and enforce them rigorously. It is impossible to enumerate every bad input and therefore impossible to write a programmatic screen that removes them all. It is, however, possible to enumerate known good inputs. If developers restrict input to known good values, they will find their applications' security posture greatly enhanced.

Ultimately developers must make it a priority to understand all the relevant functions in their chosen programming language as well as the security features afforded them. Insecure applications often result from a lack of understanding of either the programming language or the particulars of certain function calls. It is imperative that developers research the ramifications of their code, especially if it transcends platforms. It is not enough for a Perl or PHP developer to understand their chosen language. If their applications interact with the shell, a database server, or even a mail server, the developer must also take the time to understand how those systems work in order to ensure that they only pass safe input to those secondary systems. Since the data passed to those systems is often transmitted with the elevated permissions of the web server, and implicitly trusted by the secondary system, it is critical that developers shield these underlying applications from unexpected user input.

One of the most effective strategies to mitigate this threat is code review. There are several commercial tools that can assist in checking how web based applications handle user input. Barring this option, however, a thorough code review by developers not associated with the code's development is often sufficient to spot potential problems. It is important that all developers are well aware of the security risks involved in writing web based applications. As a rule of thumb, it is always better to err on the side of caution when validating user input. Web application security is an evolving field and it is likely there are attack vectors that have not yet been discovered. Good coding practice with an eye towards security can help mitigate these threats before they ever become a problem.