Open source software security

Brute Forcing PHP MD5 Hashed Passwords

30 November -0001

Web Application Passwords

Many PHP based web applications use md5 hashing in order to obscure stored passwords. At first glance this seems like an effective security measure, however upon further examination it becomes clear that this approach does little to secure a password. Let us assume that an attacker somehow captures the md5 hash of a users password. This could happen in many ways, the most obvious being a SQL injection that reveals the password.

MD5

MD5 or message digest algorithm 5, is a 128 bit hash. A hash is a one way mathematical function, that given a value will return a standard, unique return value (a hash) that cannot be used to recover the original value. Thus the hash of an input doesn't reveal any information about that input. However, the hash is unique for the input. Thus the hash of "apple" is different from the hash of "orange" but neither hash reveals any information about the word used to produce it. Hashes are often used to store passwords because they don't reveal any information about the password but can be used to check a password. For instance, the md5 hash of "apple" is "1f3870be274f6c49b3e31a0c6728957f". There is no way to tell that "apple" produced that hash, but if a user chose the password "apple" then an application could prompt a user for their password, hash that password and compare it to the stored value, verify they were the same then permit or reject a user based on the results. (Interesting sidebar: if you Google "1f3870be274f6c49b3e31a0c6728957f" you can instantly find that it is the hash of "apple")

Attacks on MD5 Hashes

The most obvious attack against the md5 password hash is a simple brute force attack. In this scenario an attacker creates a script that cycles through password possibilities, hashes each through an md5 algorithm and then compares the result to the stolen password. This is a time consuming process. To crack a single lower case character password only takes 26 tries, but to crack a two characters lower case password takes 26^2 tries. The exponential nature of brute forcing makes it time consuming. Add to this the fact that the brute forcing program has to run an md5 hash on each possibility and you can see how the computational overhead can become quite large. Brute forcing an eight character password's md5 hash could take months on a regular computer.

Some strategies exist to ameliorate this situation. If the attacker has more than one computer available they can divide up the cracking process. For instance, if the cracking process is focusing on a two character lower case password there are 576 possibilities. With two computers the attacker can have one machine work on the first 288 possibilities and the second computer can work on the other 288 possibilities. An attacker can also exploit CPU architecture to take advantage of multiple cores on one processor or multiple processors on a single machine. This makes cracking the password much easier. For example, if an attacker with 2 machines is trying to crack an 8 character password (52 possible characters, upper and lower case) there are about a trillion possibilities. This seems daunting, but if the attacker has two computers they each only have to try 500 billion possibilities. With 100 machines each computer only has to try 10 billion. With 1000 computers it's down to 1 billion, and so on. With a large botnet this could become quite feasible.

Another strategy exists though, one that can be employed by an attacker with only one machine. Such an attacker can generate a rainbow table. This strategy involves pre-calculating the password hash possibilities. While the initial generation of the trillion password choices could take some time (especially as each password will have to be md5 hashed) - this operation only has to occur once. What's more, once the rainbow table is produced it can be published on the internet or distributed via a peer to peer file sharing system. Once the user has the tables it is quite easy to look up the password that matches a particular hash.

Other Weakness

The other weakness of storing md5 hash values of passwords is one of information leakage. If two users who chose the same password they will have the same hash value stored in the database. This will be instantly apparent, and effectively leaks the value of the password. For instance, Bob chooses the password "foo", which is md5 hashed to:

mysql> select md5('foo');
+----------------------------------+
| md5('foo')                       |
+----------------------------------+
| acbd18db4cc2f85cedef654fccc4a4d8 |
+----------------------------------+
1 row in set (0.00 sec)

If Bob can query the user table and sees something like this:

mysql> select user, pass from users;
+-------+----------------------------------+
| user  | pass                             |
+-------+----------------------------------+
| Bob   | acbd18db4cc2f85cedef654fccc4a4d8 |
| Alice | 4202a5f87a68583e20aaa6917c8c38d1 |
| Joe   | acbd18db4cc2f85cedef654fccc4a4d8 |
+-------+----------------------------------+
3 rows in set (0.00 sec)

Bob now knows Joe's password, without even having to do any password cracking. One way around this strategy is to use a salting scheme.

Defensive Strategies

A strategy does exist to defeat this type of attack. If a password is salted, that is a unique sequence is appended to the password in order to randomize it, then the attacker will have to generate a rainbow table for each salt for each possible password - exponentially increasing the effort an attacker must make. Unfortunately PHP's md5 hash function does not natively have a salt argument. This deficiency has two impacts. The first is that passwords are weaker and more easily cracked using rainbow tables. The second is that two values that are md5 hashed using PHP (or MySQL for that matter) will always produce the same output. Using a salt adds extra pseudo random data to a password making it stronger. For instance, instead of storing the hash of the users' password you can store the value of the user's password prefixed by the first two characters of the user's username. Such a scheme would store the value of Bob's password as the md5 hash of 'Bofoo', and the value of Joe's password (which is also "foo") as 'Jofoo'. In such a scheme the above query would look more like:

mysql> select user, pass from users;
+-------+----------------------------------+
| user  | pass                             |
+-------+----------------------------------+
| Bob   | 2340c42969317d88c1636af7c849246b |
| Alice | 4202a5f87a68583e20aaa6917c8c38d1 |
| Joe   | 85434bb39ebb3731447891a7f989354c |
+-------+----------------------------------+
3 rows in set (0.00 sec)

You would still run into problems with Joe's and Josh's password being the same but you can see how this type of scheme easily obfuscates the passwords even further. A good strategy for generating really random salts is to use the unix time. It is even safe to store the salt in the same table as the password hash, so it can be used to recover the password, but strengthens the hash by increasing the length of the hashed value and preventing collision. So if our user table was architected like so:

mysql> desc users;
+-------+-------------+------+-----+-------------------+----------------+
| Field | Type        | Null | Key | Default           | Extra          |
+-------+-------------+------+-----+-------------------+----------------+
| id    | int(11)     | NO   | PRI | NULL              | auto_increment |
| user  | varchar(60) | NO   |     |                   |                |
| pass  | varchar(60) | NO   |     |                   |                |
| salt  | timestamp   | NO   |     | CURRENT_TIMESTAMP |                |
+-------+-------------+------+-----+-------------------+----------------+
4 rows in set (0.00 sec)

We can insure that no two users have the same hashed password stored, even if they both choose the same password. We also protect against rainbow attacks since we make the complexity of each password exponentially more complex. Assume we have three users, each who have chosen the password 'foo'. Looking at our new table you see that each users password hash is unique:

mysql> select * from users;
+----+-------+---------------------+----------------------------------+
| id | user  | salt                | pass                             |
+----+-------+---------------------+----------------------------------+
|  2 | Bob   | 2008-10-15 12:18:34 | 3ef41c0e711dab040cc95df1a1bf597f |
|  3 | Joe   | 2008-10-15 12:20:40 | b022e71bd47266c3087cadd5b909d57a |
|  4 | Alice | 2008-10-15 12:20:47 | d244edd670d6f85e18dc40d2f50100e6 |
+----+-------+---------------------+----------------------------------+
3 rows in set (0.00 sec)

We could easily compare a password to the hashed value by adding the hash to the password then computing the hash and doing a comparison like so:

mysql> select id from users where user = 'Bob' and pass = md5(concat(salt,'foo'));
+----+
| id |
+----+
|  2 |
+----+
1 row in set (0.00 sec)

Using this type of strategy preserves the users password but defends effectively against information leak and rainbow table attacks with the added benefit that it increases password strength, thus making a brute force attack more difficult as well.

Conclusion

Storing md5 values of passwords in a database is not a safe scheme for password storage. Although it adds overhead to an attacker, a rainbow table attack still presents a significant threat. Without a salt of some sort an attacker can pre-calculate the hash of a user password. Since most systems require a complex 8 character password and an 8 character rainbow table is fairly easy to generate, it make sense to use a large salt to effectively increase the strength of the hashes without forcing users to create more complex passwords. This allows users to remember an 8 character password while increasing the strength of the stored hash.