Get it All

It will doubtless seem over-obvious to those veterans out there, but for those who are diving into CakePHP for the first time, let a fellow first-timer let you in on the first, most important secret of proper CakePHP building: nothing matters until you have your database.

“Sure,” you say. “Of course I know I need a database to build a project from.” No. I’m not saying you need one. I’m saying that until you’ve got your database designed, built, tested, benchmarked and completely sound, there’s really no point in building the CakePHP aspect of it. This may not be of monumental importance to those doing simple things, such as building a rudimentary blog or content management system (why aren’t you using WordPress or Drupal?). But since the real purpose of a framework like CakePHP is to develop applications other than those already provided an Open Source solution, the majority of developers do need to consider this all-important fact before they start hacking away at some high-value project or another.

Really, its at the moment when you begin to think about developing with Cake that you take your first important step towards being some kind of database administrator. CakePHP will only provide the presentation layer for your application, along with some basic calculations and such. The real meat of most applications will happen inside the database, or at least, the efficiency with which you design your database will determine the ability to gather the data you’ve previously stored and therefore determine the efficiency of your site to its users.

In my opinion, this is especially true because CakePHP’s methods for dealing with data tend to be something on the less-efficient side of the curve. Regardless of your opinion on the efficiency of CakePHP’s models, there is no question that those models are meant to reflect – not influence! – your database design. ¬†As your understanding of your database increases, your understanding of how best to form the models will also increase at the same rate. Once you can easily extract the right data in the right order directly from MySQL (or phpMyAdmin, if you like), you’re well on your way towards creating efficient, practical code in your models and controllers to deal with same.

Thus it pays to spend a lot of time carefully planning out your database and normalizing it to the best of your abilities. When considering table structures, give a bit of thought to the following:

  1. What are the broad-strokes, raw data types you will be using? Orders? SKUs? Ratings? Users? This helps determine what the tables aught to be.
  2. What are the attributes of each data type that you need to capture? Think deeply on this one. Is the attribute in question really associated to the data type, or is there a better data type you need? Is there a new table to add in/modify?
  3. Math matters! Performing mathematical formulas on data with MySQL is more efficient in many cases than drawing the data out and then performing math, but it means that query results cannot be cached. Which method works best for your database and application depends greatly on your circumstances. Plan and test this aspect of your system very carefully.
  4. Math matters, part the second: if performing mathematical queries seems the best option for certain functions of your system, consider isolating those attributes of a given data type in their own table. Basically, you might get a performance boost from dividing a subject into two tables, one containing the non-math attributes and the other containing the math attributes.
  5. Have you considered cache tables? Even those things, like COUNT(*), which don’t seem like math, strictly speaking, take efficiency away from your database. Just plain old queries take time. Perhaps those tables that get hit hard with queries should have summary cache tables to support them? A multi-table, multi-join query might for example be run every ten minutes, with the data and time of update saved to a cache table for every query in between.
  6. Indexes! Its the ever-present challenge of database design: having enough indexes for enough query types, but not too many. A good tool for this is to just start writing down the anticipated queries you think you’ll be running on the data. What fields make sense to index in this case? Can you use values like timestamps to serve as primary keys instead of single-use IDs? Can you combine fields into a single index and have that be a relevant index for a sufficient number of queries?

And planning is only one part of the process of getting a nice database on which to build an application. The next step is to fill it full of data (even just garbage data, as long as it conforms to what you’d expect in your application) and start running queries against the tables. How well do they perform with a gigabyte or so of data to sift through? Can you knock together a few new indexes and make it perform better?

Of course, as an application evolves, you may well find that the design you started with is not adequate over time. This happens. But starting with something tuned precisely to your needs will get you a lot farther once you get to the CakePHP side of things, either way.