Open source software security

Long Time... or Taxonomies in CMSes

30 November -0001

I've been away from the blog for a while (which is bad) because I've been upgrading servers (which is good?). I haven't had much time to devote to personal writing, but I figured I'd get back in the saddle now that things on the hardware and operating system fronts have settled down.

I've been giving a lot of thought to MiGCMS v3 as part of some work I'm doing to evaluate Drupal as a CMS. One thing I've run into while investigating various CMS options (both open and closed source) is the complete lack of a unified taxonomy for describing web pages. Perhaps this is just symptomatic of the roiling nature of the web, but it's a problem, and not just for developers. Referring to the various structures and types of content becomes difficult and cluttered with buzzwords, which leads to needless confusion. A concept as simple as a piece of text on a site can go by many names, each with different and often highly personal connotations: pages, stories, highlights, comments, nodes, content, posts, and so on. Because there is no standard way of referring to these classifications of what ultimately boils down to content, there is no easy way to distinguish one type of content from another. Looking through the source code of various CMS solutions, you can see that this confusion extends down into the architectures that support many websites. The table names in their data models often make it painfully clear that developers are wrestling with the same confusion right down to the core of their systems.

When developing content management systems it is important to provide a framework abstract enough to let users customize the presentation to their tastes. It is difficult, though, to develop an abstraction of an abstraction, and the lack of conceptual terminology for what appears on a website makes it nearly impossible to conceptualize the structural underpinnings of that content. Thus, most content management systems are hobbled before they are even deployed. Because the data model ultimately constrains the system, an imprecise or overly precise data model can kill flexibility from the outset. Without some sort of language to describe the various types of content, or even a system to categorize that content, it becomes virtually impossible to build a strong structural foundation for managing content.

The roots of this problem may explain why certain content management systems succeed in very specific areas. Blogging software springs to mind as one of the most prominent examples. Because a blog rests on a strong conceptual foundation and has a fairly strict taxonomy, it is easy to develop good systems to support one. Notions like posts and comments are easily articulated and understood, and blogging software is generally well structured to support these taxonomies. Forum software is another example of a system with a strong, consistent vocabulary.
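To make the contrast concrete, here is a minimal sketch of how directly a strict blog vocabulary can map onto a data model: each term in the taxonomy becomes a type, and the relationship between the terms (a comment belongs to a post) becomes a reference. This is an illustration in Python, not code from any real blogging package; the class names, fields, and the `add_comment` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

# Hypothetical blog data model: each term in the blog's vocabulary
# ("post", "comment") gets its own type, so the structure follows
# directly from the taxonomy.

@dataclass
class Comment:
    author: str
    body: str
    created: datetime

@dataclass
class Post:
    title: str
    body: str
    created: datetime
    comments: List[Comment] = field(default_factory=list)

    def add_comment(self, author: str, body: str, created: datetime) -> Comment:
        """Attach a comment; in a blog, a comment only exists as a reply to a post."""
        comment = Comment(author, body, created)
        self.comments.append(comment)
        return comment
```

Because everyone agrees on what a post and a comment are, there is little room for ambiguity in the schema: there is nowhere else for a comment to live.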

Confusion enters as soon as you consider a CMS that supports both blogging and forum-like features. Suddenly the definitions blur. What traditionally would be considered a 'story' might now also be a 'post', and vice versa. As the vocabulary becomes muddled, it becomes difficult for the developer to create an abstraction that can easily be supported at the back-end level. How does the developer produce a data model to support this very vague notion of 'textual stuff' without any clear sense of context?
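The 'textual stuff' problem can be illustrated with a minimal sketch, assuming a single generic content record whose type is nothing more than a free-form label. The names (`ContentItem`, `content_type`, `items_of_type`) and the sample labels are hypothetical, chosen only to show where the vocabulary problem ends up: the schema can hold anything, but nothing in it says whether a 'story' and a 'post' are the same kind of thing.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ContentItem:
    # One generic record for all "textual stuff"; the type is just a label
    # (hypothetical labels such as "story", "post", "page").
    content_type: str
    title: str
    body: str

def items_of_type(items: List[ContentItem], *labels: str) -> List[ContentItem]:
    """Select items whose label matches any of the given labels.

    The labels carry no agreed meaning, so the caller must already know
    that, say, "story" and "post" should be treated alike.
    """
    wanted = set(labels)
    return [item for item in items if item.content_type in wanted]

site = [
    ContentItem("story", "Server upgrade", "..."),
    ContentItem("post", "Back in the saddle", "..."),
    ContentItem("page", "About", "..."),
]

# Asking only for "post" silently misses the "story", even if the
# site's editors think of both as the same kind of content.
print(len(items_of_type(site, "post")))           # 1
print(len(items_of_type(site, "post", "story")))  # 2
```

Every consumer of this data has to know, out of band, which labels belong together; a shared classification would move that knowledge out of individual queries and into the vocabulary itself.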

It is imperative that some sort of shared vocabulary be developed in order to foster the continued development of strong software for governing web content. With a solid classification system it becomes easy to organize and abstract the data that will appear on a site. More highly organized data leads to tighter, yet more flexible, back-end systems that are easier to develop and maintain. A unified vocabulary would also provide a common foundation for different systems. This would enable more accurate comparisons between them, as well as a general familiarity that would cut down on training time. Just as web users have come to rely on certain conventions (such as navigation items and cursor indicators), content management users, developers, and administrators would be well served by a standard classification system for the content that appears on websites.