Open source software security

How I (Ideally) Approach Work on a PHP Application


The first and, in my opinion, most important step in approaching a new PHP project is to identify the goals and business rules of the project. This means understanding the purpose of the application as well as the real world system that it will be mimicking.

Obviously any application begins with client meetings and requirement gathering. Assuming all of that is complete and I have a good understanding of what I'm aiming for, I start thinking about the code.

There are two main pieces of any PHP web-based application: the database and the code. The database, ideally, should be designed to be a pristine repository for user data. The data should be reliable and solid. An appropriate data model should be designed to store all the required application data as simply and elegantly as possible. Repetition is the bane of the data model, so the model should be as lightweight as possible.

Data modeling

Creating an approachable data model is also very important. Reading a data model diagram for the application should be easy and self explanatory. Table names and column names should be in plain English and easily understood. While I don't follow many hard and fast rules for data modeling, consistency is of the utmost importance. If you name some tables with underscores don't name others with camel case. So if you have a table 'registrant_details' you shouldn't have a table named 'EventDetails'. Other than this I generally follow a few more rules:

  1. Table names should always be singular. This helps speed writing code and cuts down on errors.
  2. Column names should always be preceded by the table name. So the 'id' field on a table for 'event' should be 'event_id'. This makes reading complex SQL much simpler and prevents any name space collision (like using a reserved word for a column name).
  3. Mapping tables should have the name of one table, an underscore, an 'x' and then the name of the other table. So 'student_x_event' is easily recognizable as a mapping table between students and events.
  4. All tables should have an independent, computer generated primary key. Even if it doesn't seem necessary you never know down the line.
  5. Naming should be consistent. If you're tracking first and last name in a student table and in a teacher table don't call the student first name field 'student_firstname' and the teacher first name field 'teacher_first_name'.
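
The naming rules above are mechanical enough to express in a few lines of PHP. This is only a sketch: the helper functions are hypothetical names made up for illustration, it assumes PHP 7 or later for the type declarations, and it assumes that foreign key columns in a mapping table keep the names they have in their home tables.

```php
<?php
// Hypothetical helpers for illustration only; they are not part of any library.

// Rule 2: a column is prefixed with its (singular) table name.
function column_name(string $table, string $field): string {
    return $table . '_' . $field;
}

// Rule 3: a mapping table is named "<table>_x_<table>".
function mapping_table_name(string $left, string $right): string {
    return $left . '_x_' . $right;
}

// Rules 3 and 4: a mapping table gets its own generated key plus the two
// foreign keys (assumed here to keep their home table names).
function mapping_table_columns(string $left, string $right): array {
    $table = mapping_table_name($left, $right);
    return [
        column_name($table, 'id'),
        column_name($left, 'id'),
        column_name($right, 'id'),
    ];
}

echo mapping_table_name('student', 'event'), "\n"; // student_x_event
echo column_name('event', 'id'), "\n";             // event_id
echo implode(', ', mapping_table_columns('student', 'event')), "\n";
```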

Building the Application Itself

Building the PHP portion of the application begins with trying to figure out what objects exist in your application. For instance, if you are building an event registration application you'll want to decide what objects you want to create to mimic the real world process that is used to register for events. This involves a little bit of thinking in an 'object oriented' fashion and may be a bit difficult at first, but over time it becomes the norm.

With an event registration system we can probably assume that we'll have a student object. This is the actual person who is registering for the event (assuming it's limited to students, otherwise it might be a registrant object). We'll also probably have an event object, and perhaps a few others, but for now we'll focus on the student object.

Objects have two essential parts: their methods (things they do) and their attributes (their properties). These can roughly correspond to real world verbs (for methods) and nouns (for attributes). There's a lot of academic debate about how to form objects and the relation of verbs and nouns to objects that may be of interest for further investigation, but I won't jump into that here. The student object has a couple of properties that are quickly obvious. It will have a name property (perhaps first and last), an id property, and perhaps an email property. These are things that all the students will have.

Next it's time to think about the methods for the student. These are things that the student object will do when it is created. First let's think about the actual creation process. Obviously we can create a blank student object as a template, but that's rather useless. Rather, when we create a new student object we're likely to use an id to create that object. The first thing we'll do is look up that id to find out if it corresponds to a valid student, then we'll load up the attributes of that student based on what we find. PHP 5 has a predefined __construct() method that runs whenever the object is created. It seems obvious that we're going to want a method to look up the student data. A good name for this method would be get_student(). Perhaps we'll require that an id be passed to __construct() so that each student is required to have an id. At this point our student object looks like:

class Student {
  public $id;
  public $email;
  public $name;

  public function __construct($id) {
    $this->id = $id;
    $this->get_student();
  }

  public function get_student() {
    if (! $this->id) {
      die('No id property set');
    }
    // Go to the database, look up the student, and assign
    // $this->email and $this->name based on what we find.
  }
}

You can quickly see how the code is beginning to grow based on a process that we're defining very abstractly, based on objects that aren't really all that concrete yet. You should continue this process until you feel you can identify all the major objects (classes) in your application. This can all be done before you really write any PHP code.

You may also notice that at this point we're identifying the major parts of the code but not necessarily identifying how they will work together. Once we have all our base objects we might want to think how they're going to interact. For instance, once we have a student, and an event, how does the student register for the event? Perhaps we need a registration object that is loaded with a student and all the events for which they have registered. When a student registers for a new event we'll create a new registration object using that student, then call an add_registration() function and pass it the corresponding event object.
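
One way the registration object described above might look is sketched here. The Student and Event classes are bare stand-ins so the example runs on its own, the duplicate check is an assumption of mine, and the comments mark where a real version would talk to the database.

```php
<?php
// Minimal stand-ins; the real classes would load their data from the database.
class Student {
    public $id;

    public function __construct($id) {
        $this->id = $id;
    }
}

class Event {
    public $id;
    public $name;

    public function __construct($id, $name) {
        $this->id = $id;
        $this->name = $name;
    }
}

class Registration {
    public $student;
    public $events = array();

    public function __construct(Student $student) {
        $this->student = $student;
        // A real version would load the student's existing registrations here.
    }

    // Registers the student for an event, ignoring duplicates (an assumption).
    public function add_registration(Event $event) {
        if (! isset($this->events[$event->id])) {
            $this->events[$event->id] = $event;
            // A real version would write the new row to the mapping table here.
        }
    }
}

$registration = new Registration(new Student(42));
$registration->add_registration(new Event(7, 'Orientation'));
$registration->add_registration(new Event(7, 'Orientation')); // duplicate, ignored
echo count($registration->events), "\n"; // 1
```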

You should begin to notice a layer of abstraction developing at this point. Without writing any code you've begun to think about your process and how it interacts. You should also notice that by approaching the application in this manner you're remaining database agnostic. You're creating classes that are PHP representations of the underlying data model. The application then changes these classes by updating attributes or calling methods, which then write the changes back to the database. This helps to insulate the database since you're not writing anything but very specific code to pull data into objects and methods to write data from existing objects back to the database. By insulating the database you'll find that a lot of the code can be written without any knowledge of the underlying data model at all.

For instance, I know I've got my student object (class) and it has an email property, but there is no reference to what table this property is being pulled out of in the data model. Similarly, if we include a function called set_student() that writes the email property back to the data model then we can reset the object's email property, call the set_student() method, and assume that the data has been updated, all without any knowledge of the database. This helps to subdivide work very nicely.

Thus, if I'm writing a page that processes a form that is responsible for updating a user's email address I don't have to write any SQL at all! I simply find out what id I'm working with, create a new student object with that id, change the $email property of that object, and call the set_student() method like so:

$student = new Student($id);
$student->email = $_POST['new_email'];
$student->set_student();

And I've changed the data without ever directly touching the data model. This means one developer can write the back end class code while another writes the front end logic. As long as the front end coder has a specification for the objects available, with their methods and properties, that coder never needs to write any SQL or know anything at all about the database.

This method turns relational data into object data. One developer works on the transformation while the other interacts solely with the objects.
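
To make that division of labor concrete, here is a hedged sketch of how get_student() and set_student() might hide the storage details. The Student_Store interface and the in-memory Array_Student_Store are hypothetical names I made up so the example runs without a database, and passing the store to the constructor is my assumption; the examples above pass only an id.

```php
<?php
// Hypothetical storage contract: the only code that knows where the data lives.
interface Student_Store {
    public function load($id);
    public function save($id, array $row);
}

// In-memory stand-in for the database so the sketch runs on its own.
class Array_Student_Store implements Student_Store {
    private $rows;

    public function __construct(array $rows) {
        $this->rows = $rows;
    }

    public function load($id) {
        return isset($this->rows[$id]) ? $this->rows[$id] : null;
    }

    public function save($id, array $row) {
        $this->rows[$id] = $row;
    }
}

class Student {
    public $id;
    public $email;
    public $name;
    private $store;

    public function __construct($id, Student_Store $store) {
        $this->id = $id;
        $this->store = $store;
        $this->get_student();
    }

    // Loads the properties for this id, or fails if the id is unknown.
    public function get_student() {
        $row = $this->store->load($this->id);
        if ($row === null) {
            throw new InvalidArgumentException('No student found for id ' . $this->id);
        }
        $this->email = $row['email'];
        $this->name = $row['name'];
    }

    // Writes the current properties back to storage.
    public function set_student() {
        $this->store->save($this->id, array('email' => $this->email, 'name' => $this->name));
    }
}

$store = new Array_Student_Store(array(1 => array('email' => 'old@example.com', 'name' => 'Ada')));
$student = new Student(1, $store);
$student->email = 'new@example.com';
$student->set_student();

$reloaded = new Student(1, $store);
echo $reloaded->email, "\n"; // new@example.com
```

Swapping the array-backed store for one that runs SQL would not change a line of the front end code, which is the point of the separation.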

Application Modeling

I prefer to approach a new PHP application from several different perspectives. I am a proponent of three tier architecture and MVC design.

Three tier architecture means separating your application into three distinct layers: the data layer, the logic layer, and the presentation layer. You should try to insulate each of these layers as much as possible and regulate interactions between them. Three tiered architecture aids in long-term maintainability, modularity, and ease of debugging. It is a well documented and widely used system for creating web based applications and a host of information about the design exists online.

A certain layer of abstraction is incredibly helpful when using a three tiered architecture. Using tools like a templating system (such as Smarty - http://www.smarty.net/) and PEAR DB (http://pear.php.net/package/DB) helps you to insulate the various parts of the application, keeping your systems modular and robust. Also, using widely documented systems allows for easy code maintainability by other developers.

MVC stands for Model, View, Controller and it is a design pattern. It is a conceptual framework from which to build your application. The concept was originally developed for desktop applications but is also relevant to web based applications (even though ideally they should follow REST, but that's another debate).

The three parts of the MVC pattern are actually quite simple. The model is the back end, or the data. The View is the presentation, or the part the user sees. The Controller is the logic that handles interactions between the Model and the View (responding to changes on both sides).

Combining MVC with three tiered architecture leads to a somewhat complex, but very modular and maintainable system. The basic idea is that you have a presentation, or an HTML page. You get input from the user through that HTML page (the View). This input is passed to a Controller, which decides what to do with that input. Typically the Controller will examine the data, determine what the user intends, and hand control of the application off to a more specialized Controller for that request. For instance, an action requesting a list of all students enrolled in a specific event will have a separate and distinct Controller from an action that tries to update a user record.

Ideally there should be a hierarchy of Controllers (each represented most commonly by a single PHP page that checks input, builds objects, and includes sub pages based on the requested action) that hand off to one another based on the intent of the request. If you design your application so that each controller does one and only one thing then your code becomes very granular and easy to maintain. If there is a problem updating a user, for instance, it's easy to pinpoint that controller and troubleshoot. The Controllers build objects, change objects, and request object interaction, but are not necessarily objects themselves. They are PHP pages that simply guide the flow of the application and eventually govern the View returned to the user.

For instance, let's say we begin with the following directory structure for our event registration system:

index.php
lib/
    student.class.php
    event.class.php
    event_collection.class.php
    event_registration.class.php
controllers/
    view_events.php
    register_for_event.php
templates/
    event_list.tpl
    registration_confirm.tpl

A student enters the system and requests index.php. This file is the controller. It examines the student request and has a list of conditionals. Let's assume the user hasn't specified any request and the default action is to view a list of events that the student can enroll in. The Controller (index.php) includes the file view_events.php, which in turn becomes the new Controller. This page creates a new object called Event_Collection out of the class by the same name found in lib/event_collection.class.php. The view_events.php page passes the student's id to this newly created object and queries the object to get a list of all events that the student might be able to enroll in (by calling a method called get_all_events() for instance). The Event_Collection object actually consults the database and creates an array of new Event objects (using the event.class.php file, which contains the class Event) and passes those back to the Controller (view_events.php). This file takes that array and passes it to event_list.tpl. This file is a Smarty template (and also the View in this case) and completes its operation.

From this point the user views a list of events and can click on one to register. This might POST a form to index.php, which would call the register_for_event.php controller. That controller would create a new student object and a new event object, and pass them both to a new Event_Registration object. Then this page might call the Event_Registration object's register() method, which would write the registration to the database, and then load the registration_confirm.tpl template that would thank the user.
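
The list of conditionals in index.php could be sketched roughly as follows. The resolve_controller() function and the idea of an 'action' request parameter are assumptions made for illustration; the file names come from the directory structure above. Using a fixed list of known actions, rather than building a path from user input, keeps a request from including an arbitrary file.

```php
<?php
// Hypothetical helper: maps a requested action to its controller file.
function resolve_controller($action) {
    $controllers = array(
        'view_events'        => 'controllers/view_events.php',
        'register_for_event' => 'controllers/register_for_event.php',
    );
    // Unknown or missing actions fall back to the default event list.
    if (! is_string($action) || ! isset($controllers[$action])) {
        $action = 'view_events';
    }
    return $controllers[$action];
}

// In index.php the result would be handed to include, for example:
//   include resolve_controller(isset($_REQUEST['action']) ? $_REQUEST['action'] : null);
echo resolve_controller(null), "\n";                 // controllers/view_events.php
echo resolve_controller('register_for_event'), "\n"; // controllers/register_for_event.php
echo resolve_controller('../../etc/passwd'), "\n";   // controllers/view_events.php
```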

You can easily see how this type of structure sub-divides the task of the application. The actual file and directory structure might seem large, but the amount of code in each file will likely be quite succinct, making it easy to identify pieces of the application and their function. I try to follow a few other guidelines when creating PHP web based applications as well.

Other Guidelines - Overall Coding Guidelines

Each PHP page should be named using underscores to separate words. Each page should be logically named so that its function is immediately apparent. Classes should be capitalized and methods all lower case.

All variables should be descriptively named. If seen in a vacuum, each variable's usage should be immediately apparent. The only exception to this rule should be integers for incrementing or decrementing (in loops).

Classes should be constructed to allow for auto generation of PHP Docs (http://www.phpdoc.org/). Comments should likewise be modeled upon the phpdoc guidelines.

Methods in classes should be listed in alphabetic order. This helps maintainers who are using text editors (like vi) to quickly identify methods and trace code flow.

Only one class should be listed per file. Each filename should correspond to the class it contains.

Braces should always be included in code, even when they are not necessary. Consistency is the most important rule with braces: if you choose one brace style, stick to it. Single quotes should be used for string literals, since PHP does not scan them for variables to interpolate (the speed gain is small in practice, but the intent is clearer). If variables must be inserted into the string, use the concatenation operator.

Operators (+, -, /, *) should be surrounded by spaces for legibility. For instance, utilize "$foo * $bar" rather than "$foo*$bar" in PHP code.
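
A short, hypothetical function shows several of these guidelines together: a phpDocumentor style comment block, descriptive variable names, braces on every conditional, single-quoted strings joined with the concatenation operator, and spaces around operators. The function name and the fee calculation are invented for the example.

```php
<?php
/**
 * Describes the total fee for a group registration.
 *
 * @param int   $attendee_count   Number of people registering.
 * @param float $fee_per_attendee Fee charged for each person.
 * @return string Human readable summary of the total.
 */
function describe_group_fee($attendee_count, $fee_per_attendee) {
    $total_fee = $attendee_count * $fee_per_attendee;
    if ($total_fee > 0) {
        return 'Total fee: ' . number_format($total_fee, 2);
    }
    return 'No fee due';
}

echo describe_group_fee(3, 12.5), "\n"; // Total fee: 37.50
```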

Drupal.org has an excellent set of PHP coding standards at http://drupal.org/coding-standards.

Other Guidelines - Controller Guidelines

Each controller page should accomplish one and only one action (note that 'determine appropriate include' is only one action even though it may have several branching results). This way it is easy to identify the logic page controlling each action of the application. Thus, a page that displays a form for the student to fill in their long name should be separate from the page that processes the form. It is not necessary for each controller to have a display, some controllers may simply process data then hand off control to a different script for display.

Other Guidelines - Template Guidelines

Ideally the templates will be strict XHTML constructs, with every element identified by an 'id' attribute. These ids will then be used in conjunction with a style sheet (CSS) to control display.

Each template usually contains two include files, one for a global header and one for a global footer. The global header may include a metas template as well. The global header includes a central link to a style sheet.

Use of a central style sheet will allow for ease of maintenance and uniform presentation.

Every effort should be made to conform to section 508 suggested guidelines (http://www.section508.gov). This includes using markup sparingly to accommodate screen readers, giving images alt attributes, and sticking to clean, valid XHTML code.

Each form should have a name attribute. I usually use the name 'theForm' by default.

Each page should have a distinct title for ease of navigation. Templates should, wherever possible, follow W3C Web Accessibility Guidelines (http://www.w3.org/WAI/) and Dublin Core (http://dublincore.org/).

Other Guidelines - JavaScript Guidelines

A top level javascript directory should be used to centrally locate all JavaScript files. Each script should be loaded with a src attribute in the script tag to allow for maximum code reuse. All scripts should be abstracted as much as possible to allow for reuse on any page.

Other Guidelines - Style Sheet Guidelines

Styles should be listed alphabetically. Each style should be broken up with line breaks after each value pair. Elements should be listed alphabetically as well.

For instance:

#nav_bar {
  background-color: blue;
  border: 1px solid;
  padding: 2px;
}

Words in element names should be separated with an underscore. Fonts should be sized in ems (the em unit) rather than points or pixels, as this allows for better user control over display. Similarly, using percentage based, rather than absolute, layout is preferable to allow users a greater degree of control over application display. Ideally the user should be able to resize fonts and view pages at their chosen resolution rather than be forced to adhere to some developer imposed guideline.

Finally, all CSS should validate using http://jigsaw.w3.org/css-validator/ or a similar service.