Using the Google Safe Browsing API from PHP


Google's new Safe Browsing API is a neat service that lets you download the MD5 hashes of known malware and phishing sites. This is especially handy because you can check URLs that users submit to your site or service and make sure they don't include malicious links. The API is reasonably well documented at http://code.google.com/apis/safebrowsing/developers_guide.html, so this tutorial focuses on how to implement it in PHP. If you use Firefox you are probably familiar with the malware or phishing warning screen that shows up when you visit suspicious sites. That feature is built on the Safe Browsing API.

Screenshot of Firefox warning screen

Making calls to the Safe Browsing API is pretty straightforward. First, register with Google to get a developer key, which you need in order to access the service. Once you have a key, you simply request a particular URL, which responds with a list of MD5 hash values of suspected malware sites. The first thing you should do is set up a local database to store these values. In MySQL you can use the following to set up a simple table:

# mysql -u root
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.1.35 Source distribution

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create database malware;
Query OK, 1 row affected (0.00 sec)

mysql> use malware
Database changed

mysql> create table malware (malware_hash varchar(32) NOT NULL primary key);
Query OK, 0 rows affected (0.01 sec)

mysql> grant all privileges on malware.* to 'malware'@localhost identified by 'malware';
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

Once you've got your MySQL table set up you're ready to populate it with values from the API. Using the following PHP code snippet you can pull down these hash values and parse them into the database:

<?php

// Connect as the 'malware' user and select the 'malware' database.
// mysqli replaces the old mysql_* functions, which were removed in PHP 7.
$conn = mysqli_connect('localhost', 'malware', 'malware', 'malware')
	or die(mysqli_connect_error());

$api_key = "your_google_developer_key_here";
$version = "goog-malware-hash";
$google_url = "http://sb.google.com/safebrowsing/update";

//open the remote URL
$target = "$google_url?client=api&apikey=$api_key&version=$version:1:-1";
$handle = fopen($target, 'r')
	or die("Couldn't open file handle " . $target);

//populate the db
while (($line = fgets($handle)) !== false) {
    $line = trim($line); //strip the trailing newline
    //ignore blank lines and the header line [goog-malware-hash 1.14879]
    if ($line == '' || substr($line,0,1) == '[') continue;
    $operation = substr($line,0,1); //get the '+' or '-'
    $hash = substr($line,1); //get the md5 hash
    $hash = mysqli_real_escape_string($conn, $hash); //just to be safe
    if ($operation == '+') {
        //'ignore' skips hashes that are already in the table
        $sql = "insert ignore into malware set malware_hash = '$hash'";
    } elseif ($operation == '-') {
        $sql = "delete from malware where malware_hash = '$hash'";
    } else {
        continue; //not a line we understand
    }
    mysqli_query($conn, $sql) or die(mysqli_error($conn));
}
fclose($handle);
mysqli_close($conn);
?>

This script will handle the initial import but may need some tweaking for polling updates to the list. Note that allow_url_fopen must be set to 'On' in your php.ini file for this script to work (otherwise you'll get an error because the PHP engine can't open remotely hosted files).
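
One way to approach the update polling is to remember the version from the header line (the [goog-malware-hash 1.14879] line that the import script skips) and send it back in place of -1 on the next request. The sketch below assumes that the header carries a major.minor version and that the version parameter takes the form name:major:minor, as the :1:-1 in the request suggests; check the developer's guide before relying on it. parse_list_header() and build_update_url() are hypothetical helper names, not part of the API.

```php
<?php
// Hypothetical helper: pull the list name and version out of a header line
// such as "[goog-malware-hash 1.14879]". Returns null for any other line.
// ASSUMPTION: the header format is "[name major.minor ...]".
function parse_list_header($line) {
    if (!preg_match('/^\[(\S+) (\d+)\.(\d+)/', trim($line), $m)) {
        return null;
    }
    return array(
        'name'  => $m[1],
        'major' => (int) $m[2],
        'minor' => (int) $m[3],
    );
}

// Hypothetical helper: build the request URL for a stored version.
// ASSUMPTION: the version parameter is "name:major:minor"; passing -1 as
// the minor version mirrors the initial import.
function build_update_url($base, $api_key, $name, $major, $minor) {
    $version = $name . ':' . $major . ':' . $minor;
    return $base . '?client=api&apikey=' . urlencode($api_key)
        . '&version=' . $version;
}
```

The import loop could call parse_list_header() on each line that starts with '[' and store the result (in a one-row table, for example) so that the next run requests only the changes since that version.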

Note that this script shouldn't be run every time a user submits a URL. According to Google your client (the database) should only refresh its list of suspected malware sites every half hour. Scheduling this script from cron is probably the easiest way to implement it.
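
If the import script might also be triggered by hand or by more than one job, a small guard can keep it to the half-hour interval. This is a minimal sketch that records the last refresh time in a file; refresh_is_due(), mark_refreshed() and the stamp file location are all hypothetical choices, not something the API requires.

```php
<?php
// Hypothetical helper: true when at least $min_interval seconds have passed
// since the time stored in $stamp_file, or when no stamp exists yet.
// $now can be passed in explicitly, which makes the function easy to test.
function refresh_is_due($stamp_file, $min_interval = 1800, $now = null) {
    if ($now === null) {
        $now = time();
    }
    if (!is_file($stamp_file)) {
        return true;
    }
    $last = (int) file_get_contents($stamp_file);
    return ($now - $last) >= $min_interval;
}

// Hypothetical helper: record the time of a successful refresh.
function mark_refreshed($stamp_file, $now = null) {
    if ($now === null) {
        $now = time();
    }
    file_put_contents($stamp_file, (string) $now, LOCK_EX);
}
```

The import script would then exit early when refresh_is_due() returns false and call mark_refreshed() once the database has been updated.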

Once the data has been pulled into your local database you can implement a simple service using the following PHP code snippet. I haven't bothered to implement all the permutations for checks suggested by Google, but it should be more than enough for a proof of concept:

<?php
/**
 * PHP demonstration of the Google Safe Browsing API
 * http://code.google.com/apis/safebrowsing/developers_guide.html
 * If you're looking for a good test URL try 
 * http://malware.testing.google.test/testing/malware
 * 
 * @author Justin C. Klein Keane <justin@madirish.net>
 */
$debug = false;  //change to true to see progressive info messages
$message = "<p><em>Warning</em>- Visiting this web site may harm your computer. " .
		"This page appears to contain malicious code that could be downloaded " .
		"to your computer without your consent. You can learn more about harmful " . 
		"web content including viruses and other malicious code and how to " . 
		"protect your computer at <a href='http://www.stopbadware.org'>StopBadware.org</a>.</p>" . 
		"<p>This evaluation was made possible using <a href='http://www.google.com'>Google's</a> " . 
		"Safe Browsing API.</p>" .
		"Google works to provide the most accurate and up-to-date phishing and malware information. " .
		"However, it cannot guarantee that its information is comprehensive and error-free: " .
		"some risky sites may not be identified, and some safe sites may be identified in error.";

if (! isset($_GET['lookup'])) die('Must input a string to look up.');
else $url = strtolower($_GET['lookup']);

$url_parsed = parse_url($url);
if (! is_array($url_parsed) || ! isset($url_parsed['host']))
	die('Must input a full URL, including the http:// prefix.');

// Implement the Google guidelines from
// http://code.google.com/apis/safebrowsing/developers_guide.html

//follow canonicalization rules
$host = urldecode($url_parsed['host']); //remove hex encodings
$host = preg_replace('/^\.+/','', $host); //remove leading dots
$host = preg_replace('/\.+$/','', $host); //remove trailing dots
$host = preg_replace('/\.+/','.', $host); //replace consecutive dots
//an empty path becomes a single slash
$path = isset($url_parsed['path']) ? urldecode($url_parsed['path']) : '/';
$path = preg_replace('/\/+/','/', $path); //replace consecutive slashes
//parse_url drops the '?', so put it back in front of the query
$query = isset($url_parsed['query']) ? '?' . urldecode($url_parsed['query']) : '';

//append a trailing slash if no resource is specified
$has_resource = false;
$path_array = explode('/', $path);
$target = $path_array[count($path_array)-1];
if ($target != '') { //an empty target means the path already ends in a slash
	if (strpos($target,'.') === false) $path .= '/';
	else $has_resource = true;
}

$url = $host . $path . $query;

if ($debug) echo "<p>URL: " . htmlspecialchars($url) . "</p>";
if ($debug) echo "<p>Path: " . htmlspecialchars($path) . "</p>";
if ($debug) echo "<p>Hash: " . md5($url) . "</p>";

// mysqli replaces the old mysql_* functions, which were removed in PHP 7.
$conn = mysqli_connect('localhost', 'malware', 'malware', 'malware')
	or die(mysqli_connect_error());

//look for the whole URL
$sql = 'select * from malware where malware_hash = \'' . md5($url) . '\'';
check_query($sql);

//strip off the query params
$url = $host . $path;
$sql = 'select * from malware where malware_hash = \'' . md5($url) . '\'';
check_query($sql);

//strip off the resource
if ($has_resource) {
	$url = substr($url, 0, strrpos($url, '/') + 1); //keep the trailing slash
	if ($debug) echo "<p>Got " . htmlspecialchars($url) . "</p>";
	$sql = 'select * from malware where malware_hash = \'' . md5($url) . '\'';
	check_query($sql);
}

//and so on according to docs - you should have the idea by now :)

//run a lookup query and print the warning if the hash is in the table
function check_query($sql) {
	global $conn, $message;
	$retval = mysqli_query($conn, $sql) or die(mysqli_error($conn));
	if ($row = mysqli_fetch_row($retval)) {
		echo $message;
		die(); //we should stop at a hit
	}
}
echo "Nothing found.";
mysqli_close($conn);
?>

Try pulling up this script with the URL ?lookup=http://malware.testing.google.test/testing/malware and you should be presented with the warning message if everything is working properly. You can probably tweak this functionality to better support your projects (depending on whether you need AJAX support or whatnot), but in its current form it demonstrates the functionality and can be used to test feasibility.
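
For the remaining permutations, one option is to generate every candidate string up front and hash each one in turn. The sketch below assumes the scheme is "each host suffix combined with each path prefix", which is my reading of the approach rather than a confirmed rule; check the developer's guide for the exact list and any limits on how many combinations to try. lookup_candidates() is a hypothetical helper, and its arguments are assumed to be canonicalized already, with the query passed without its leading '?'.

```php
<?php
// Hypothetical helper (not part of the API): list the strings to hash for
// one URL, most specific first.
// ASSUMPTION: the lookup scheme is host suffixes crossed with path prefixes.
function lookup_candidates($host, $path = '/', $query = '') {
    // Make sure the path starts with a slash so the loop below terminates.
    if ($path === '' || $path[0] !== '/') {
        $path = '/' . $path;
    }

    // Hosts: the full host, then parent domains down to two labels.
    // IP addresses are used as-is.
    $hosts = array($host);
    if (filter_var($host, FILTER_VALIDATE_IP) === false) {
        $labels = explode('.', $host);
        for ($i = 1; $i <= count($labels) - 2; $i++) {
            $hosts[] = implode('.', array_slice($labels, $i));
        }
    }

    // Paths: with the query, without it, then each parent directory.
    $paths = array();
    if ($query !== '') {
        $paths[] = $path . '?' . $query;
    }
    $paths[] = $path;
    $dir = $path;
    while ($dir !== '/') {
        // Drop the last segment but keep its parent's trailing slash.
        $dir = substr($dir, 0, strrpos(rtrim($dir, '/'), '/') + 1);
        $paths[] = $dir;
    }

    // Combine every host with every path.
    $candidates = array();
    foreach ($hosts as $h) {
        foreach ($paths as $p) {
            $candidates[] = $h . $p;
        }
    }
    return $candidates;
}
```

Each returned string would then be passed through md5() and looked up in the malware table, stopping at the first hit just as the script above does.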