Dynamic data layout with vertical tables
Developers today are faced with poorly defined requests and constantly changing requirements. It is not uncommon for a developer to spend countless meetings and design sessions getting the database and object layout just right, only to have the client request just one more field when the first beta application is delivered.
That one more field requires you to change your database structure, add extra code to your object, adjust your database queries, and add additional form elements to your web site or application. This can be very tedious and often introduces bugs.
Often times, these new fields are simply that, new fields. While they do require basic validation and application plumbing to accommodate them, more often than not they do not require extra business logic (e.g. adding a second phone number field for a client list).
With the concept of vertical tables, you can avoid much of this overhead. In addition, vertical tables can help accommodate new features later.
What is a vertical table?
Normally you store data in a database table like so:
Name – Phone Number
Bob – 555-5555
Frank – 333-3333
You have one entry per row with 1 field for each piece of data you want to hold. When you decide you need to hold a work number also, you would add a new field.
Name – Phone Number – Work Phone
Bob – 555-5555 – 111-1111
Frank – 333-3333 – 222-2222
This would require you to adjust your code in the application and the corresponding database queries. Even if the work number field isn’t displayed on the form (maybe you have a new part of your application that assigns that?) you still run the risk of breaking previously written code due to the change in table structure.
A vertical table changes things around. You now get multiple rows for one person. For example.
Name – Key – Value
bob – name – bob
bob – phone_number – 555-5555
frank – name – frank
frank – phone_number – 111-1111
You add a new row for each value you want to store about bob. The name in this case would be the key to knowing if the data belonged to bob or frank. To add a new “field” in this case, you would add a new row.
bob – work_number – 333-3333
frank – work_number – 777-7777
The field “Key” becomes your field name. The field “Value” becomes your actual data. You can now add as many bits of information as you want about the person without having to change the structure of your table.
It changes the way you code
For this to work, you need to change the way you think and code. Often web programmers will do raw SQL queries and output the results directly to a web page. This layout makes that difficult. You will need some kind of intermediate object to handle the loading and saving of values for the table. This makes it no longer a simple CRUD application. Instead, you create objects that will persist to the database rather than working with raw data stored in the database.
Given the table of people above, in PHP we would create an object that has a hash array of values. In PHP they are also called associative arrays. This hash array would use the key field as the hash, and the value field as the value of the array. To load a user object, you would give it the name (bob) and tell it to load all values for bob. The object would loop through all values in the database and load them into the array. This way, as new values were stored in the database, the object would automatically load them. You aren’t required to create static properties for each value, just read the array which means adding new fields requires no change to the middle tier object definition.
When saving, the same thing occurs. We loop through the array and save out each value to the database. In our own code, we also keep track of “changed” values and only save the ones marked as changed to minimize re-writing information that hasn’t been modified. Also, keep in mind that in most code, you won’t change values very much, so often it is just a read operation. In any case, this gives us automatic object persistence as long as we put all the values we want to keep in the array.
Further, we store loaded objects in sessions where appropriate so that we don’t have to reload the object on each page view. This means refreshing the current page results in fewer database queries.
Here is an example of the people class:
edfa
class Person {
// The array to hold our values
var $values = array();
function LoadPerson($name) {
// Load the person
$SQL = "SELECT * FROM table_people WHERE name='$name'";
$rs = mysql_query($SQL);
while ($record = mysql_fetch_assoc($rs)) {
$this->array[$record["key"]] = $record["value"];
}
}
function GetValue($value_name, $default="") {
if (isset($this->array[$value_name])) {
// This value exists, return it
return $this->array[$value_name];
}
// Value does not exist, return the default value
return $default;
}
}
This code is super simple, but this is enough features to illustrate the point.
$person = new Person();
$person->LoadPerson("bob");
print_r($person->values);
Run that, then add more values to your database and run that again. It will automatically load your new values.
The real change is in how you think about databases. It is easy to think of databases as “THE” data. This methodology is supported by the concept of CRUD applications, where forms are just windows to the database table with validators. Vertical tables requires you have a different perspective on data storage.
In short, this method is a good way to make your objects persistent using a database and build your application based on your objects. You create objects to work with, not “data”.
For example, applications that store data directly in a file only use that file to save data and load it as needed, not to directly run reports etc… This method works well if you think of the database as a file. It just so happens that in this case the file is indexed for super speed and can remove your file format and parsing code in favor of SQL calls.
Automatic Persistence
We used a constructor and destructor to auto load and auto save values. Using this method you can store data for most of your structures. If you setup a parent class that has basic loading/saving/etc built in, you can have persistent data structures very easily by having your objects extend that class. Wouldn’t it be nice to not have to re-debug your loading and saving code each time you create a new class?
Extending your structures
Writing your GUI code to use these objects allows you to divorce your UI code from the underlying database structure. This means adding new values won’t mess up your existing code as easily. This also means that as you want to extend your application, that you can hook into the objects and store new data just by adding it to the array.
For example. If you decided to add e-mail alerts to your web site, the alert GUI code could save all its data in the Person object just by setting new values. Bob could now login and add e-mail addresses to the alert form. The alert code won’t need its own alert table. Just make sure the key names are unique so you don’t overwrite any data.
It is faster than you think, and wastes less space than you might imagine
First – It can be quite fast. You need to handle it properly in your code. If you limit your round trips to the database, it can be as fast as grabbing single rows from a normal table. You can do one query per user to load information (see example above). This works well for circumstances where you are only loading information for a few users at a time. If you are loading them in bulk, you will want to re-think your loading technique, but even then this could be optimized. Think about one query and have a loading function where you can pass the object a record rather than have each object make it’s own query.
As for space, yes it will take more space than a normal table. But more space is often not an issue if you are working with smaller data sets. Should Google build its search index this way? I wouldn’t recommend it. But for smaller web sites, database storage space generally isn’t your problem. Web sites getting > 10,000,000 hits a month and having several hundred thousand records can utilize this method (yes, we have used this technique in real world scenarios on sites with some real traffic). If your data storage is 200 meg, using this method is still possible. In the right circumstances, using this method is feasible for much larger data sets.
You have to decide if you are a minimalist – has to be the most space efficient storage method just because – or if you can spare a few K per item to save yourself some time.
Just make sure to add proper indexing to the fields. MySQL 5.0 is very sensitive to proper indexing. Previous versions did a better job of guessing and still running fast. 5.0 is wicked fast, but you need to be explicit with what is indexed and how. Index all fields except for the Value field, and maybe that too if you ware finding information stored there.
Advantages of this method
This method takes you away from simple CRUD/Data binding mentality and forces you to create a useful object model to code against. This is part of the idea of N-Tier data access. Too often though N-Tier data models become simply wrapper classes to CRUD style applications. In this case, the objects are a must as simple things can become unwieldy if you are trying to do data queries for everything.
Add new values at any time for any reason. It is nice to be able to just add new values to an object without re-coding the object or the database. This makes adding a new field to a form quite fast as you just need to hook the UI code to the object via the Get and Set methods. For example adding a second email address for a user would require adding the text element and a line to get/set the value. The object at that point would just deal with loading/saving the addition value. No changes to the object and no changes to the database structure.
I can create new classes that are persistent very quickly and have them be solid and stable with minimal effort. This can save many man hours! You can either copy/paste the get/set/load/save methods and setup a new table using the same format, or you could possibly write a base class that has the core functionality and have new classes extend/inherit.
Disadvantages of this method
This is NOT a good method for you data binders out there. The exception to that would be for those who data bind to objects instead. This can be effective. We have used this technique in .net on a few different occasions. Most data binding tools will not bind properly to the vertical tables as they assume that tables are one row per item rather than multiple rows per item.
Be careful with this method. Don’t try it right away on a big project. You need to play with it a bit to understand it and to figure out when it is appropriate. If coded and managed properly, it can be an elegant solution. If not, it can be your worst nightmare. I had some coders re-design some stuff I wrote who didn’t quite grasp what I was doing. Their “enhanced” version of what I made ended up harder to use than before and lost usability and some existing features in order to gain a few features that would require a senior level programmer to utilize. In short, it was harder to understand and work with instead of easier.
Storage space requirements are bigger, but not that much bigger. You will have to decide what your speed and space requirements really are. As with any application, test it! Proper coding techniques will almost always result in more performance improvements than picking the “perfect” database layout.
Conclusion
There are many enhancements that are needed to make this truly useful. We have auto saving, and various helper functions to make the code easier. Ideally you only actually call database queries in the load and save functions to read or store the array. The rest is abstracted from the database. A base class is also not a bad idea so that classes that extend from that class can automatically get the load and save functionality.
Our new tool kit will be using this method to allow us to plug in new features on an existing website. Check it out. The programming library will be released free! http://cmtk.cmagic.biz/. The new website is now live. New examples and information will be added regularly.