Writing Safer Database Queries from PHP
One of the most powerful features of web scripting languages is database interactivity. Databases are optimized to sort data and retrieve that data with great efficiency. Combining a PHP web based application with a database back end makes perfect sense. Leveraging a database's power to organize, sort, search, and retrieve data makes a web application more flexible and dynamic. The open source MySQL database makes a perfect compliment to PHP scripting language and is the database of choice on the LAMP platform.
The architectural choices afforded PHP developers with respect to database interaction have come a long way since PHP version 3. There are several ways in which a PHP application can structure database interaction. Unfortunately many of the ways in which PHP can derive and execute MySQL queries lead to serious security vulnerabilities. The aim of this article is to explore some of the ways in which PHP applications interact with MySQL databases, and draw lessons about the safety of various application structures.
For the purposes of our discussion let us consider an extremely common database query task performed by PHP applications. There is content for a web site stored in a database. Each piece of content is identified by a unique identification number for storing and retrieving this content. For the purposes of this discussion we'll simplify the table, called "content", to two columns, an "id" column of type "int" that auto increments and is the primary key. The second column will be called "data" and will be of data type "text".
One of the simplest ways to retrieve data from the database is via a URL query. For instance, the page "content.php?id=4" might be the resource locator for the page that displays content with the id of 4. In version 3 PHP defaulted to a model that allowed automatic variable naming such that a script could utilize the previously specified URL to derive an $id variable that would be set to 4. This could then be used to construct a MySQL query by directly interlacing the variable into the statement like so:
<?php $query = "select * from content where id = $id"; ?>
This statement is dangerous for a number of reasons. This is a classic SQL injection vulnerability. An attacker is able to manipulate the id variable in the URL to manipulate the SQL query that will be passed on to the database. Fortunately this behavior is not allowed by default in later PHP versions. Unfortunately it can be disabled, and I have even seen it emulated programmatically, using code like:
<?php foreach($_REQUEST as $key=>$val) { ${$key}=$val; } ?>
While clever, this sort of thing defeats the security provided by prohibiting global variables.
The most common solution for SQL injection in PHP is the mysql_real_escape_string() function, which automatically escapes input to sanitize it. This prevents some SQL injection vectors, but is not a panacea. For instance, in the query listed above an attacker could still bypass mysql_real_escape_string() because the query does not utilize quoting to isolate the variable. Thus, an input of "1 or 1=1" would inject arbitrary SQL that would not be prevented by the mysql_real_escape_string() function. A better function in this example would be the intval() function, which would force the $id variable to be numeric, thus eliminating any opportunity to inject arbitrary SQL into the query.
Architecturally it is cleaner to remove all such SQL query statements from the PHP display code and encapsulate them in objects. This allows for a database abstraction layer, providing an API for display, and an enforced opportunity for input/output sanitizing. To do this, a Content object could be utilized which structured the types of data returned from the database. Such a class would resemble:
<?php Class Content { var $id; var $data; public function __construct($id='') { if ($id != '') $this->id = intval($id); } public function get_data() { if ($this->id > 0) { $query = 'select data from content where id = ' . $this->id; } return htmlentities($this->data); } } ?>
You'll notice that this pseudo-code class performs data validation on input, only assigning integer values to the internal $id variable. This variable is then used to safely construct query statements. The class further protects the application by using display sanitizing htmlentities() functionality before returning database contents. Ideally the class would sanitize data on the way into the database as well.
While this structure has several advantages it can be improved upon further by enforcing typing in the query statement itself. While there are many methods to do this, including the PEAR MDB2 library, one of the simplest ways is to simply utilize the PHP mysqli (MySQL Improved Extension) prepared statement functionality. This allows queries to be prepared with variable placeholders that enforce type, then bind PHP variables to the placeholders before execution. Using mysqli would result in a safer query as it enforced data types in the query, like so:
<?php $mysqli = new mysqli("host","user","pass","db"); $stmt = $mysqli->prepare("select data from content where id=?"); $stmt->bind_param('i',$id); $stmt->execute(); ?>
By using the bind_param() functionality in mysqli the variable is forced to be of the type integer (using the 'i' designation, 'd' is used for double, 's' for string, and 'b' for blob). This additional layer of safety can be useful, especially when queries are being reused and you may not be able to insure the safety mechanisms of the class containing the query.
The one major security weakness left in this approach is the actual MySQL user that executes the query. Because the application needs permissions to access the database and contents, MySQL credentials must be supplied in order to make the connection. These credentials include a username and password that are stored in plain text in the PHP script. Because these credentials are exposed on the filesystem, and because they are used by anonymous internet application users, the privileges on the account should be strictly limited. Under no circumstances should the MySQL root user account be utilized for these purposes. A distinct, application specific, user account should be created with only the privileges required by the application, in this case, the SELECT privilege. Even this privilege may be too broad, however.
This example application only actually requires privileges to run specific queries against the database. Rather than needing to be able to run any SELECT statement, this application needs only the privileges to run the specific select statement we've crafted. In the spirit of "least privilege" why grant this application user more rights than absolutely necessary?
To do this we'll first create a stored procedure called get_conent (). To do this we'll utilize:
mysql> DELIMITER // mysql> CREATE PROCEDURE get_content ( IN inid INT) SQL SECURITY INVOKER BEGIN SELECT data FROM content WHERE id = inid; END;//
One item to note is that by default MySQL stored procedures run with the privileges of the account used to create them rather than the account used to 'execute' them. This means that if you create stored procedures as the root user they will execute by default with those privileges. You can specify that the procedure run with the security context of the caller using the "SQL SECURITY INVOKER" statement.
Next we'll allow a user to execute this procedure using the GRANT statement:
mysql> GRANT EXECUTE ON database.get_content TO user@localhost;
This then allows us to reuse the query in a tightly controlled security context. Calling the stored procedure from within the PHP class provides extremely robust "defense in depth" with data validation at several steps in the application stack.
By using MySQL stored procedures we can strictly limit the queries that an application user can run within the database. We can also restrict the security context of the MySQL account used in a web application to the execution of only pre-defined queries. Furthermore, by using this model in conjunction with an object oriented database abstraction layer in PHP we can verify all the data passed to the application as well as data returned from the application. This structure helps defend against all forms of SQL injection, including those that may only be used to display malicious data without actually altering the database contents. By enforcing multiple layers of security you substantially raise the bar for attackers, increase the likelihood of attacks being stopped at one or many of the layers, and increase the overall security posture of your PHP web based application.