Get it All
Together

Another post in my Development Walk-Through Series, following me through the development of my own application to share my thoughts and observations. Please consider subscribing to my feed if you find this information helpful.

Now that you’re working on your Models in Phase Two of your development process, you have a second opportunity to trim up your database. Once you start putting together CakePHP queries, it quickly becomes obvious that CakePHP will arrange your queries according to it’s own sense of MySQL rather than what you might have originally thought. This means that the database development you did in Phase One will have to be altered a bit.

But the way you put together queries is also important. For example, when assembling the [‘conditions’] array within your query, the order in which you specify the conditions is directly reflected in the way the conditions are built into the query.

Why is this important? Well for a start, because if you’re using any multi-column indexes, those indexes rely on the order of columns in the query being in the same order as they appear in the index. Without the correct order, they simply cannot be used. I don’t have any proof to back this up, but it stands to reason that this fact may point the way to another tweaking possibility: that the order of columns probably matters regardless of whether or not you use multi-column indexes.

The take away from this is that when you create queries in the CakePHP manner, it’s important to look at the resultant query and make sure it’s behaving the way you expected it to. If you can tweak the query, do it. If you cannot, it’s probably time to rethink some of your optimization strategy to better reflect the reality of the system in which you are working.

It will doubtless seem over-obvious to those veterans out there, but for those who are diving into CakePHP for the first time, let a fellow first-timer let you in on the first, most important secret of proper CakePHP building: nothing matters until you have your database.

“Sure,” you say. “Of course I know I need a database to build a project from.” No. I’m not saying you need one. I’m saying that until you’ve got your database designed, built, tested, benchmarked and completely sound, there’s really no point in building the CakePHP aspect of it. This may not be of monumental importance to those doing simple things, such as building a rudimentary blog or content management system (why aren’t you using WordPress or Drupal?). But since the real purpose of a framework like CakePHP is to develop applications other than those already provided an Open Source solution, the majority of developers do need to consider this all-important fact before they start hacking away at some high-value project or another.

Really, its at the moment when you begin to think about developing with Cake that you take your first important step towards being some kind of database administrator. CakePHP will only provide the presentation layer for your application, along with some basic calculations and such. The real meat of most applications will happen inside the database, or at least, the efficiency with which you design your database will determine the ability to gather the data you’ve previously stored and therefore determine the efficiency of your site to its users.

In my opinion, this is especially true because CakePHP’s methods for dealing with data tend to be something on the less-efficient side of the curve. Regardless of your opinion on the efficiency of CakePHP’s models, there is no question that those models are meant to reflect – not influence! – your database design.  As your understanding of your database increases, your understanding of how best to form the models will also increase at the same rate. Once you can easily extract the right data in the right order directly from MySQL (or phpMyAdmin, if you like), you’re well on your way towards creating efficient, practical code in your models and controllers to deal with same.

Thus it pays to spend a lot of time carefully planning out your database and normalizing it to the best of your abilities. When considering table structures, give a bit of thought to the following:

  1. What are the broad-strokes, raw data types you will be using? Orders? SKUs? Ratings? Users? This helps determine what the tables aught to be.
  2. What are the attributes of each data type that you need to capture? Think deeply on this one. Is the attribute in question really associated to the data type, or is there a better data type you need? Is there a new table to add in/modify?
  3. Math matters! Performing mathematical formulas on data with MySQL is more efficient in many cases than drawing the data out and then performing math, but it means that query results cannot be cached. Which method works best for your database and application depends greatly on your circumstances. Plan and test this aspect of your system very carefully.
  4. Math matters, part the second: if performing mathematical queries seems the best option for certain functions of your system, consider isolating those attributes of a given data type in their own table. Basically, you might get a performance boost from dividing a subject into two tables, one containing the non-math attributes and the other containing the math attributes.
  5. Have you considered cache tables? Even those things, like COUNT(*), which don’t seem like math, strictly speaking, take efficiency away from your database. Just plain old queries take time. Perhaps those tables that get hit hard with queries should have summary cache tables to support them? A multi-table, multi-join query might for example be run every ten minutes, with the data and time of update saved to a cache table for every query in between.
  6. Indexes! Its the ever-present challenge of database design: having enough indexes for enough query types, but not too many. A good tool for this is to just start writing down the anticipated queries you think you’ll be running on the data. What fields make sense to index in this case? Can you use values like timestamps to serve as primary keys instead of single-use IDs? Can you combine fields into a single index and have that be a relevant index for a sufficient number of queries?

And planning is only one part of the process of getting a nice database on which to build an application. The next step is to fill it full of data (even just garbage data, as long as it conforms to what you’d expect in your application) and start running queries against the tables. How well do they perform with a gigabyte or so of data to sift through? Can you knock together a few new indexes and make it perform better?

Of course, as an application evolves, you may well find that the design you started with is not adequate over time. This happens. But starting with something tuned precisely to your needs will get you a lot farther once you get to the CakePHP side of things, either way.