Open source software security

Designing a Data Driven Website - Part II

30 November -0001

Designing a Data Driven Website

Part 2

21 February 2002

Data Modeling - Setting Up Your Systems

Data modeling is essential to any dynamic data driven website. Set up a good data model, and you'll never have to touch it again, set up a hasty, poorly conceived data model and it will haunt your site for its lifetime. Unfortunately data modeling is a skill that depends in large part on experience, but there are a few tips you can remember to make your data modeling experience more effective. At its core, data models consist of the setup of database architecture and the tables and their relationships in your database. Of course your data model will in some part be determined by your selection of database and scripting language, but in general your data model will be fairly independent of your operating system, database engine, and scripting language.

NOTE: Before even attempting to begin a data model it is essential to have at least a rudimentary understanding of SQL (Server Query Language). If you need a primer check the MySQL section on Madirish.net.

Data modeling is easiest to learn by example. I'd like to set up a simple scenario to use for the purposes of this article. Our task will be to design a simple data model to handle user information for a website. This information will include a user's name, contact information, and a password. Although this all seems rather simple, we will soon see that this impression is a bit deceptive. We will call this project the User Manager.

The first thing to determine when choosing a data model is how many databases you will need. For our purposes and for example we will assume that only one database is needed. For most projects (even very large ones) only one database is necessary (although the number of tables in your database may be extremely large). Databases are the containers into which you place tables that hold information. You can think of your database as a storage bin for related tables. For our User Manager we will use one database, called 'Website' that will eventually hold most of our information for the data driven site. Thus far our data model consists of only the database. Once we have created the database (this process will vary greatly depending on the database software you are using) we must begin to create tables to hold our data.

Before we begin creating the actual tables in our database we should develop a rough outline of all our tables and how they will interact. This mapping goes to the heart of data modeling. On first glance it seems we should only need one table (called Users) for our User Manager that looks something like this:

User_Namevarchar(255)
User_Passwordvarchar(255)
User_Addressvarchar(255)
User_Phonevarchar(255)
User_Emaivarchar(255)

While this table would work fine for a very simple version of our Contact Manager it has several major deficiencies. First of all all the data types are of the same size (255 ' the maximum size for a MySQL varchar). Many of the fields in our table do not need to be quite that big (how many people do you know with a name longer than 100 characters?). We can probably pare the fields down significantly to save on space (which may become a consideration down the road. After we do this we may have:

User_Namevarchar(75)
User_Password varchar(20)
User_Addressvarchar(255)
User_Phonevarchar(20)
User_Emailvarchar(75)

We pared down the password field significantly and we should probably note this so that when new accounts are created we check to make sure people only enter passwords of up to 20 characters.

The table still has several major flaws in it, however. We should consider how we are going to select users. Based on the current model we would probably select users based on their names. Well, what if we only want to select users' last names, how can we distinguish them from the first name? While we could write a clever text string search to parse the User Name field based on blank spaces, it would probably be easier to restructure the table to look like this:

User_FirstNamevarchar(50)
User_LastNamevarchar(50)
User_Passwordvarchar(20)
User_Addressvarchar(255)
User_Phonevarchar(20)
User_Emailvarchar(75)

At this point we can organize users based on their first or last name. However, we still have a problem if we get two users named John Smith. How can we distinguish the users? While we could, say, check other criteria in addition to names (such as passwords and addresses), it is easier in the long run to distinguish each user with a unique id. We'll see how this will become useful later on as well. This unique id should be the primary key and should auto increment so that the database assigns the key. Now our table looks like:

User_IDint
User_LastNamevarchar(50)
User_Passwordvarchar(20)
User_Addressvarchar(255)
User_Phonevarchar(20)
User_Emailvarchar(75)

At this point our table is beginning to look pretty good. Lets say, however, that a we want to delete a user (because they go on vacation for instance) but later on we find we want to reinstate the user. Well, to avoid the hassle of actually deleting and re-entering all their information we can shortcut the process by putting a 'flag' on their account to show whether they are active or not. We can use this flag later in our code to only display or use records that are flagged as active. We'll use a simple bit type (0 or 1, 0 for inactive, 1 for active) for now:

User_IDint
User_LastNamevarchar(50)
User_Passwordvarchar(20)
User_Addressvarchar(255)
User_Phonevarchar(20)
User_Emailvarchar(75)
User_Activebit

Ok, now we can deactivate and reinstate user accounts at will, distinguish different users, etc. We will certainly encounter a problem down the road, however, if someone wants to list more than one phone, email, or address. Right now our system can't handle that information. In order to expand the User Manager, lets add that capability by adding additional tables to our data model. Lets start with phone entries. We'll create a table called 'Phone':

Phone_IDint
Phone_numbervarchar(20)
User_IDint

You'll notice the field User_ID has the same name as the primary key in our Users table. This field will be used as a foreign key (an entry that is a primary key in another table). This way we can keep track of who's phone number is whose. For instance, if we have a user with the following information:

User_ID1
User_FirstNameJoe
User_LastnameCool
User_PasswordSecret
User_Active1

We can list his phone number in the Phone table like this:

Phone_ID2
Phone_number222.222.2222
User_ID1

In order to use this table effectively we should remove the Phone_Number Entry in the original users table. You may be tempted to keep it and use it as a foreign key (mapped to the Phone_ID field in the Phone table) but if we did this we wouldn't be able to list multiple phones for one user. In the current architecture we have a 'one to many' relationship between the Users and the Phone table. Right now our data model looks like this:

UsersPhone
User_IDintPhone_IDint
User_LastNamevarchar(50)Phone_Numbervarchar(20)
User_Passwordvarchar(20)User_IDint
User_Addressvarchar(255)
User_Emailvarchar(75)
User_Activebit

If we want to further expand our data model so that we can distinguish multiple types of phone numbers (cell, home, work, fax, etc.) we can do can do so by adding a new table. Your first instinct might be to add an additional field in the Phone table (something like Phone_Type varchar(20)) but if we do this then there is a chance that we could get all sorts of scattered data (like, home, primary, fax, mobile, cell and any number of variations). To keep the data clean we'll create a new table called Phone_Type that will look like this

Phone_Type_IDint
Phone_Type_Typevarchar(20)

And we'll modify the Phone table so that it looks like this:

Phone_IDint
Phone_Numbervarchar(20)
Phone_Type_IDint
User_IDint

You'll notice we're using two foreign keys in the table now, one for User_ID (to tell who the phone number belongs to) and one for Phone_Type_ID (to keep track of what type of number it is).

Right now we've got a pretty good data model for our user manager. It still can't handle multiple e-mail addresses and the User_Address field is a bit sloppy (it's not broken down into Street Address, City, State, Country, Zip or anything), and we might want to add a datetime field to the table to keep track of when we entered people but its perfectly functional. Hopefully this short example has shown you a bit about what to do with data modeling. Our finished data model looks like this:

Database: Website

Table: Users

User_IDint
User_LastNamevarchar(50)
User_Passwordvarchar(20)
User_Addressvarchar(255)
User_Emailvarchar(75)
User_Activebit

Table: Phone

Phone_IDint
Phone_Numbervarchar(20)
Phone_Type_IDint
User_IDint

Table: Phone_Type

Phone_Type_IDint
Phone_Type_Typevarchar(20)