Exploring JPEG Metadata

30 November -0001

About JPEG

JPEG is a lossy compression method for digital images, standardized as ISO/IEC 10918-1 and ITU-T Recommendation T.81 (http://www.w3.org/Graphics/JPEG/itu-t81.pdf). The name is also used loosely for the file formats that carry JPEG-compressed data, which tell a computer how to render an image from the bits that make up the file. Because all data on a computer is stored as ones and zeros, a translation framework is needed so the computer can render human-readable images from the binary data. JPEG is one of the most common image formats, but several others exist, including GIF, PNG, and TIFF. Each format was developed for different reasons. JPEG was designed for compact storage and transmission of continuous-tone photographic images, which made it a natural fit for the web. JPEG is optimized for multi-colored imagery with smooth tonal variation, so it is a poor choice for icons or images with sharp edges and a few flat colors, but it works wonderfully for digital photography.

The JPEG specification defines how data is laid out within the image file as a series of marker-delimited segments. These cover the delimiters of the actual image data, details about compression, image size, and other parameters. The specification also defines a comment (COM) segment that can hold free-form text within each JPEG.
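The segment layout described above can be explored with a short Python sketch. This is a minimal illustration, not a full parser: the function name iter_jpeg_segments is hypothetical, and the code assumes a well-formed file with no fill bytes between segments and no standalone markers before the image data begins.

```python
import struct

SOI = b"\xff\xd8"   # Start of Image
SOS = 0xDA          # Start of Scan: compressed image data follows
APP1 = 0xE1         # application segment used by Exif
COM = 0xFE          # comment segment

def iter_jpeg_segments(data: bytes):
    """Yield (marker, payload) for each segment up to and including SOS."""
    if data[:2] != SOI:
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        # The 16-bit length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        yield marker, data[pos + 4:pos + 2 + length]
        if marker == SOS:
            break  # the rest of the file is entropy-coded image data
        pos += 2 + length
```

Filtering the yielded pairs for marker == COM or marker == APP1 picks out the comment text and the Exif block, respectively.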

About Metadata

Metadata is descriptive information used to classify or summarize data. Metadata is commonly used to help organize larger, more amorphous data. For instance, metadata often includes information like author, date, title, and summary. This information can be encoded into larger files and used for indexing and searching. This saves computational time, as a computer can sift through metadata much faster than it can through an entire file's data. Metadata can also be useful for recording additional information about a file that cannot otherwise be included in it. This is particularly true of imagery. Metadata can be used to record information about a digital photograph's subject, when the picture was taken, and even a summary or keywords about the photograph. Image metadata is analogous to the labels that many people used to write on the back of physical photographs.

JPEG Exif (Exchangeable image file format for digital still cameras) is a common specification (http://www.exif.org/specifications.html) for encoding metadata into JPEG imagery. The Exif IFD (Image File Directory) structure provides a standard definition for many common tags, including timestamp, camera, Exif version, and so on. This data is stored in the Application Marker Segment 1 (APP1) portion of the JPEG file and cannot exceed 64 KB, because a segment's length is stored in a 16-bit field. Several of these tags are useful for providing relevant metadata. The DateTime tag specifies the image date and time in "YYYY:MM:DD HH:MM:SS" format. The ImageDescription tag is generally used for the title of the image. There are also tags for Make, Model, and Software. The most open-ended tag is UserComment, which is generally used for descriptions. Several other tags store information about the camera settings in effect when the image was taken.
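Because the DateTime format uses colons in the date portion, generic date parsers tend to reject it. The following Python sketch shows one way to convert such a value. The function name parse_exif_datetime is hypothetical, and the code assumes the value has already been read out of the file as text.

```python
from datetime import datetime

def parse_exif_datetime(value: str) -> datetime:
    """Convert an Exif 'YYYY:MM:DD HH:MM:SS' string to a datetime.

    Exif strings are NUL-terminated, so trailing NULs and spaces are
    stripped first. The result carries no time zone, because the
    DateTime tag does not record one.
    """
    return datetime.strptime(value.strip("\x00 "), "%Y:%m:%d %H:%M:%S")
```

A blank or malformed value raises ValueError, which callers may want to treat as "date unknown".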

On Windows XP

On a standard Windows XP SP3 machine it is fairly straightforward to view and modify image metadata. Most digital images contain quite a bit of metadata by default. The most basic metadata, including file size, modification time, and image dimensions, is displayed by default in the "Details" section of the Windows XP Explorer sidebar when you select an image:

This simple metadata can be modified by right-clicking an image and selecting Properties. Under the "Summary" tab you can review the metadata for the image. There are title, subject, keywords, and comments fields, along with an author field.

Clicking the "Simple" button will allow you to fill in form fields to supply or modify this data. Once saved, this data can be used in searches for the image.

Because this data is encoded in the actual image, it persists even if the image is transferred to a different machine. For instance, filling in the metadata for the "Blue hills.jpg" sample image, we can move that image to a Linux machine and view it with F-Spot and easily observe the image metadata:

Now, even though the word "Microsoft" never appears anywhere in the image filename, we can find the image using the Windows XP Search functionality because that word appears in the metadata:

Although this data is encoded into the file itself, it isn't stored in any sort of mysterious format. If you open the image in a hex viewer you can clearly see the encoded data right after the start of the file:
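Where no hex viewer is at hand, a comparable view can be produced with a few lines of Python. This is a minimal sketch, and the function name hexdump is hypothetical.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values, and printable ASCII per row."""
    pad = width * 3 - 1  # width of a full row of hex pairs plus spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as dots, as most hex viewers do.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {text}")
    return "\n".join(lines)
```

Reading the first few hundred bytes of a file such as "Blue hills.jpg" in binary mode and printing hexdump() of them should show the start-of-image marker (ff d8) followed by the metadata segments.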

Privacy Implications

Because image metadata is encoded into the actual bits of the file, it can be retrieved from an image regardless of where the file ends up. This means that pictures posted on the web can carry revealing metadata and potentially pose a threat to personal privacy. Often people are unaware that imagery even contains this extra data. Even seemingly innocuous data, such as camera model or software, could be used to erode personal privacy.

Security Implications

JPEG metadata has also had direct security implications. One such incident involved a denial of service related to the way KDE, a popular desktop environment for Linux, parsed metadata (http://www.kde.org/info/security/advisory-20061129-1.txt). Exposures such as this are made more threatening by the ease with which a user can alter image metadata.

Exif formatting also includes a specification for the image thumbnail. This allows a thumbnail of an image to be embedded in the Exif data of the image itself. This is useful for browsing images, especially on a device like a digital camera. However, users might not be aware that altering the image itself doesn't necessarily alter the embedded thumbnail. This could lead to situations in which an image might be altered in some way to protect privacy (such as redacting a face or text) but the thumbnail might not be altered, thus defeating the attempts at obfuscation. (For further information see http://www.securiteam.com/securitynews/5FP011FFPY.html.)
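One way to check whether an image carries such a thumbnail is to follow the Exif directory chain to its second directory (IFD1), which holds the thumbnail's offset and length. The Python sketch below does this under stated assumptions: the function name extract_exif_thumbnail is hypothetical, the input is the raw APP1 payload (starting with the "Exif" identifier), the two thumbnail tags are stored as 32-bit LONG values, and truncated input is not handled gracefully.

```python
import struct

THUMB_OFFSET = 0x0201  # JPEGInterchangeFormat
THUMB_LENGTH = 0x0202  # JPEGInterchangeFormatLength

def extract_exif_thumbnail(exif_payload: bytes):
    """Return the embedded JPEG thumbnail bytes, or None if absent."""
    if not exif_payload.startswith(b"Exif\x00\x00"):
        return None
    tiff = exif_payload[6:]  # all offsets are relative to this header
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])
    if order is None:
        return None
    (ifd0,) = struct.unpack(order + "I", tiff[4:8])
    (count,) = struct.unpack(order + "H", tiff[ifd0:ifd0 + 2])
    # The pointer to IFD1 sits right after IFD0's 12-byte entries.
    next_pos = ifd0 + 2 + 12 * count
    (ifd1,) = struct.unpack(order + "I", tiff[next_pos:next_pos + 4])
    if ifd1 == 0:
        return None  # no second directory, so no thumbnail
    (count,) = struct.unpack(order + "H", tiff[ifd1:ifd1 + 2])
    fields = {}
    for i in range(count):
        start = ifd1 + 2 + 12 * i
        # Assumption: values are LONGs stored inline in the entry.
        tag, _type, _n, value = struct.unpack(order + "HHII",
                                              tiff[start:start + 12])
        fields[tag] = value
    offset, length = fields.get(THUMB_OFFSET), fields.get(THUMB_LENGTH)
    if offset is None or length is None:
        return None
    return tiff[offset:offset + length]
```

Saving the returned bytes to a file and opening it next to the edited image makes it easy to see whether the thumbnail still shows the unredacted original.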

Programming Interfaces

Many programming languages offer easy interfaces for analyzing Exif metadata. Perl offers the Image::EXIF module (http://search.cpan.org/~ccpro/Image-EXIF-0.99.4/EXIF.pm). Java doesn't have any native libraries for handling Exif data, but there are numerous third-party libraries that will analyze JPEG Exif data. Pyexif is a Python Exif parser available from SourceForge.net (http://pyexif.sourceforge.net/). Many others abound. Because Exif is a well-documented and widely supported format, an Exif parser could be implemented in just about any language that can read binary file data.
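To illustrate how little is needed, here is a minimal Python sketch that reads a handful of text tags from the first Exif directory (IFD0) using only the standard library. The function name read_ifd0_ascii_tags and the ASCII_TAGS table are hypothetical. The input is assumed to be the raw APP1 payload, only ASCII-typed tags are handled, and truncated or malicious input is not guarded against, so this is not a substitute for the libraries above.

```python
import struct

# Tag numbers for the text fields discussed in this article.
ASCII_TAGS = {
    0x010E: "ImageDescription",
    0x010F: "Make",
    0x0110: "Model",
    0x0131: "Software",
    0x0132: "DateTime",
}

def read_ifd0_ascii_tags(exif_payload: bytes) -> dict:
    """Return the known ASCII tags found in IFD0 as a name -> text dict."""
    if not exif_payload.startswith(b"Exif\x00\x00"):
        raise ValueError("payload does not start with the Exif identifier")
    tiff = exif_payload[6:]  # all offsets are relative to this header
    order = {b"II": "<", b"MM": ">"}.get(tiff[:2])  # little or big endian
    if order is None:
        raise ValueError("unknown TIFF byte order")
    magic, ifd_offset = struct.unpack(order + "HI", tiff[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    (count,) = struct.unpack(order + "H", tiff[ifd_offset:ifd_offset + 2])
    tags = {}
    for i in range(count):
        start = ifd_offset + 2 + 12 * i
        tag, typ, n, value = struct.unpack(order + "HHI4s",
                                           tiff[start:start + 12])
        if typ != 2 or tag not in ASCII_TAGS:  # type 2 is ASCII
            continue
        if n <= 4:
            raw = value[:n]  # short strings are stored inline
        else:
            (offset,) = struct.unpack(order + "I", value)
            raw = tiff[offset:offset + n]
        tags[ASCII_TAGS[tag]] = raw.rstrip(b"\x00").decode("ascii", "replace")
    return tags
```

The same structure (byte-order mark, directory count, fixed 12-byte entries) carries over directly to any other language with binary file access.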

Conclusions

JPEG metadata is a powerful tool that can be used to help organize and tag imagery with useful descriptive material. It is important to realize that capture devices such as digital cameras encode quite a bit of metadata into images on their own. This data is often irrelevant to the viewer and can introduce privacy leaks, so it is important to understand the nature of the Exif data present in imagery. Metadata that is added to an image file travels with the file and could reveal information that the owner never intended to share. The Exif format also includes a thumbnail image that editing software does not necessarily update, so a redacted image may still carry an unredacted thumbnail. Despite these potential pitfalls, image metadata extends imagery and gives users greater flexibility in providing portable, cross-platform, descriptive information that is embedded within the image file and preserved for the life of the file.