Phase Two Development: Order Matters!
Another post in my Development Walk-Through Series, following me through the development of my own application to share my thoughts and observations. Please consider subscribing to my feed if you find this information helpful.
Now that you’re working on your Models in Phase Two of your development process, you have a second opportunity to trim up your database. Once you start putting together CakePHP queries, it quickly becomes obvious that CakePHP will arrange your queries according to it’s own sense of MySQL rather than what you might have originally thought. This means that the database development you did in Phase One will have to be altered a bit.
But the way you put together queries is also important. For example, when assembling the ['conditions'] array within your query, the order in which you specify the conditions is directly reflected in the way the conditions are built into the query.
Why is this important? Well for a start, because if you’re using any multi-column indexes, those indexes rely on the order of columns in the query being in the same order as they appear in the index. Without the correct order, they simply cannot be used. I don’t have any proof to back this up, but it stands to reason that this fact may point the way to another tweaking possibility: that the order of columns probably matters regardless of whether or not you use multi-column indexes.
The take away from this is that when you create queries in the CakePHP manner, it’s important to look at the resultant query and make sure it’s behaving the way you expected it to. If you can tweak the query, do it. If you cannot, it’s probably time to rethink some of your optimization strategy to better reflect the reality of the system in which you are working.
Testing Your Models: Phase Two Development
If you’ve been reading through my Development Walk-Through Series, this is Part II. Part I can be found here.
I’ve spent long hours tweaking my database, reading up on relatively arcane MySQL tuning techniques, filling the database with dummy values, querying, checking, requerying. I’m reasonably certain that the database structure I’ve chosen is the best option for my application. At least, that’s the way it appears for now: databases always change over time, so it’s worth keeping a loose hold on your concept of “acceptable.” Now it’s time to start putting together the CakePHP side of the show.
And what’s the first step? Well, once you’ve uploaded your files and established a connection to the database, it’s time to start building the models1. Models are the CakePHP expression of your database structure and relationships. By properly building and configuring your database model files, you can speed the development of new database queries by allowing CakePHP to do a lot of the lexical heavy lifting. Instead of spending time getting frustrated trying to figure out what’s wrong with this code:
SELECT ratings.rating ratings.category FROM `ratings_entries` JOIN `ratings` ON ratings.rating_id = ratings_entries.rating_id WHERE ratings_entries.cmp_id = %cmp% AND ratings_entries.zipcode = %zip% AND ratings.cat_id = %cat%"
You can simply create the relationship in rating.php, add the WHERE conditions to an array of variables and let CakePHP build the query for you2.
But there is an inherent difficulty and almost contradiction to the way CakePHP’s documentation explains building your first app. Namely, the CakePHP docs show you how to build a model, then a controller and finally three separate view files for each individual database table. This is a simple setup aimed at producing a working model, not an optimal example. For a start, you may not necessarily need to use controllers or views for every single database table. Beyond that, it is not at all practical to have to create a model, controller and view for every single table just to find out if the models you’re building are working or not. The contradiction is that, while models, controllers and views should be independent, the example forces them into an uncomfortable marriage.
The solution for your model testing needs is to create one controller and use the $uses variable to its best advantage, including all the models you need into a single location. This may be – and almost certainly will be – entirely too many models for any one practical controller in your application. But this is a short-cut method aimed at bringing in all your data models and working with them. Whereas the example application shows you the following basic configuration:
var $uses = array('Ratings');
You can instead load all your models into this single controller:
var $uses = array('Ratings','RatingEvents','KitchenSink');
Now you can test your basic queries and make sure they respond the way you expect. You can have a quick check for lexical errors in your PHP without having to build three other PHP files before you get started. In short, you can develop just the models and leave the controller and view development where they belong: in the controller and view development steps.
Calculation with PHP or MySQL?
I must stress at the outset that this post is merely my thinking out loud about a subject that has been on my mind with my new CakePHP project for some time. I do not claim any particular prowess with MySQL especially, but have discovered something rather interesting while trying to tune up my database.
I have made the point in posts past that building a new application with CakePHP is all about creating a proper database. Well, over the course of months (since, having a full-time job, I don’t get to work on my project as much as I’d like), I’ve been developing what I’d hoped would be a well-normalized and efficient database. However, the question remains unanswered until you’re able to build a test fixture and start benchmarking your queries. In my case, since the new project is in fact a ratings system, the application relies heavily on performing averages.
Thus one additional question is: does calculating in MySQL perform better or worse than calculating in PHP? It stands to reason that there might be some performance variance between the two, just because they’re two different things and therefore one might assume that they might approach the same relatively simple task in different ways. However, using my test fixture, I’ve found that at least in the relatively simple case of doing averages, there’s really no performance difference between the two.
The test fixture I created selects a random but still predictably-valid set of criteria for a given query and performs the query one hundred times in a row. In the case of the MySQL calculations, it then simply returns the microtime difference between the start and end of the query each time (to create a stopwatch effect). In the case of the PHP calculation routine, it performs the averaging calculation and then returns the same stopwatch value. In each case, the stop watch times are then averaged after the loop is finished to arrive at the final average calculation time. This test is then performed ten times in a row to account for variations between the various random samplings. Pseudocode is below:
for 10 iterations:
for 100 iterations:
get microtime1
query database
(if php) average values
get microtime2
stopwatch[] = microtime2-microtime1
end iteration
average stopwatch values
return stopwatch average
end iteration
It should be noted that the database I’m using is very simple and holds a relatively trivial amount of data. I filled it with over half a million records among 7 tables, but the lion’s share are in the tables that require calculation, which is just two tables. The database after all that data still only holds about 19Mb of data. Its also worth noting that the table on which I’m performing calculations holds only an ID, a category ID and the calculated data. This was by design because of MySQL performance tips provided in this well-viewed video by MySQL guru Jay Pipes.
I may find that other configurations, such as adding or removing indexes, may yield different results. I’ll be posting my findings as I go, since that’s what this blog is all about.
It’s All About the Database
It will doubtless seem over-obvious to those veterans out there, but for those who are diving into CakePHP for the first time, let a fellow first-timer let you in on the first, most important secret of proper CakePHP building: nothing matters until you have your database.
“Sure,” you say. “Of course I know I need a database to build a project from.” No. I’m not saying you need one. I’m saying that until you’ve got your database designed, built, tested, benchmarked and completely sound, there’s really no point in building the CakePHP aspect of it. This may not be of monumental importance to those doing simple things, such as building a rudimentary blog or content management system (why aren’t you using WordPress or Drupal?). But since the real purpose of a framework like CakePHP is to develop applications other than those already provided an Open Source solution, the majority of developers do need to consider this all-important fact before they start hacking away at some high-value project or another.
Really, its at the moment when you begin to think about developing with Cake that you take your first important step towards being some kind of database administrator. CakePHP will only provide the presentation layer for your application, along with some basic calculations and such. The real meat of most applications will happen inside the database, or at least, the efficiency with which you design your database will determine the ability to gather the data you’ve previously stored and therefore determine the efficiency of your site to its users.
In my opinion, this is especially true because CakePHP’s methods for dealing with data tend to be something on the less-efficient side of the curve. Regardless of your opinion on the efficiency of CakePHP’s models, there is no question that those models are meant to reflect – not influence! – your database design. As your understanding of your database increases, your understanding of how best to form the models will also increase at the same rate. Once you can easily extract the right data in the right order directly from MySQL (or phpMyAdmin, if you like), you’re well on your way towards creating efficient, practical code in your models and controllers to deal with same.
Thus it pays to spend a lot of time carefully planning out your database and normalizing it to the best of your abilities. When considering table structures, give a bit of thought to the following:
- What are the broad-strokes, raw data types you will be using? Orders? SKUs? Ratings? Users? This helps determine what the tables aught to be.
- What are the attributes of each data type that you need to capture? Think deeply on this one. Is the attribute in question really associated to the data type, or is there a better data type you need? Is there a new table to add in/modify?
- Math matters! Performing mathematical formulas on data with MySQL is more efficient in many cases than drawing the data out and then performing math, but it means that query results cannot be cached. Which method works best for your database and application depends greatly on your circumstances. Plan and test this aspect of your system very carefully.
- Math matters, part the second: if performing mathematical queries seems the best option for certain functions of your system, consider isolating those attributes of a given data type in their own table. Basically, you might get a performance boost from dividing a subject into two tables, one containing the non-math attributes and the other containing the math attributes.
- Have you considered cache tables? Even those things, like COUNT(*), which don’t seem like math, strictly speaking, take efficiency away from your database. Just plain old queries take time. Perhaps those tables that get hit hard with queries should have summary cache tables to support them? A multi-table, multi-join query might for example be run every ten minutes, with the data and time of update saved to a cache table for every query in between.
- Indexes! Its the ever-present challenge of database design: having enough indexes for enough query types, but not too many. A good tool for this is to just start writing down the anticipated queries you think you’ll be running on the data. What fields make sense to index in this case? Can you use values like timestamps to serve as primary keys instead of single-use IDs? Can you combine fields into a single index and have that be a relevant index for a sufficient number of queries?
And planning is only one part of the process of getting a nice database on which to build an application. The next step is to fill it full of data (even just garbage data, as long as it conforms to what you’d expect in your application) and start running queries against the tables. How well do they perform with a gigabyte or so of data to sift through? Can you knock together a few new indexes and make it perform better?
Of course, as an application evolves, you may well find that the design you started with is not adequate over time. This happens. But starting with something tuned precisely to your needs will get you a lot farther once you get to the CakePHP side of things, either way.
2 comments so far (is that a lot?)
One App, Many Subdomains: My Solution
The application that I’m in the process of building is designed around the idea of large metropolitan areas being grouped together in their own subdomains. So for example, I would like all people in the Rochester, NY metro area to go to rochester.example.com and all the Buffalo traffic to go to buffalo.example.com. So, I had to figure out how, exactly, I was going to go about doing this and I started looking around. After some trials and tribulations, I’m going to commit my process to the blog for the benefit of those who are searching for similar solutions.
I began my quest by reading this post at the CakePHP Bakery. A very good article and well documented, but it has a few problems for my application. Primarily, this example creates completely separate instances of the application for each domain, with different databases in each instance. That’s not at all what I had in mind for my application, which I expect to be cross-pollinating.
Apache and Your Domain Host
It is important to note at this point that hosted services such as BlueHost or 1and1.com will not be able to accomodate this kind of setup. Or at least, it will require a lot of work that is beyond the scope of this article. But I wanted to quickly cover the needed parameters for Apache and your domain host, assuming you’re using a VPS of some kind. You’ll need to setup any additional requirements with your web host as needed.
You can setup subdomains in one of two ways: either setting each domain up on your host separately, or else using a wild card to cover all the bases. I used this second option, which with GoDaddy.com, requires the use of an ANAME record. Note that if you use a wildcard, this will necessitate having some sort of “catch all” clause somewhere in your code to accomodate erroneous subdomains.
As for Apache, you’ll need to setup your VirtualHost entries for each anticipated subdomain. I personally use two VirtualHost files: one for my domain and a default file for stuff that comes in that I didn’t intend, which points to a generic “Whoopsies!” kind of message. The VirtualHost entries for your subdomains can just point to your Cake directory like so:
<VirtualHost *:80>
ServerName domain1.example.com
DocumentRoot /var/www/path/to/cake
</VirtualHost>
<VirtualHost *:80>
ServerName domain2.example.com
DocumentRoot /var/www/path/to/cake
</VirtualHost>
. . . and so on. Remember to reload Apache once this is done to get it to work.
CakePHP Part the First: Bootstrap.php
Based on the suggestion from the Bakery article I initially read, I setup my bootstrap.php file for subdomains. The bootstrap runs before everything else and is a great way to define some basic variables. Since many different controllers and methods will be available on a single domain, it’s important to at least verify that the correct operation is happening on the correct subdomain. Thus I defined a CLIENT_NAME per the original Bakery example, though with a few modifications, here:
if ( isset($_SERVER['SERVER_NAME']) ) {
// define values that should NOT be affected by this test:
$subdomain = substr($_SERVER["HTTP_HOST"], 0, strpos($_SERVER["HTTP_HOST"], "."));
$donot = array(
'productionserver',
'developmentserver',
'cake'
'www');
if (!in_array($subdomain, $donot)) {
define('CLIENT_NAME', $subdomain);
} else {
define('CLIENT_NAME', 'home');
}
}
As you can see, I’ve used an array to define some known values that I’d like the system to define as “home.” If a person comes to productionserver.com, I obviously want them to be routed as coming to the home page. I also want to avoid misinterpreting my development server’s name as a metro, so I’ve specified a couple possible dev domain names as “home,” as well. Even though it is my habit to avoid using the ‘www’ on a domain name, for security’s sake, I’ve also defined this as “home” as well. The default option if none of these values is identified assumes that the segment of the URL passed is going to be a valid metro area and specify that as the current CLIENT_NAME.
This provides me the verification that I need whenever a new request is processed for a subdomain. But I have a controller to handle metro requests which I’d like to use in place of the generic pageController I use for the home page. For this, I need to use the Routes configuration:
CakePHP Part the Second: Routes Configuration
routes.php is a file you use to define what controllers get used with what URL strings. This is a hugely flexible and powerful piece of code which I have found you should use sparingly and cautiously. But there’s no question that Route configuration can be your best friend when building a functional application.
In our case, we need to snag the subdomain name once more and if it is indeed a subdomain, call the metroController with the host name as the first parameter. If this code looks familiar, that’s because it should! It’s practically the same code as the bootstrap.php code:
$subdomain = substr($_SERVER["HTTP_HOST"], 0, strpos($_SERVER["HTTP_HOST"], "."));
// define values that should NOT be affected by this test:
$donot = array(
'potholepatrol',
'holisticnetworking',
'cake');
if (!in_array($subdomain, $donot)) {
Router::connect('/', array('controller' => 'metros', 'action' => 'view', $subdomain));
} else {
Router::connect('/', array('controller' => 'pages', 'action' => 'home'));
}
Why not just see if the ‘CLIENT_NAME’ is defined? Well, because these are two separate checks on the same thing, and we don’t want to allow an error in one to affect the other.
This concludes my setup example for using CakePHP with multiple domains. I hope other developers find it interesting and useful. Please add any comments you have if you think there’s a better way!
2 comments so far (is that a lot?)
Hello world!
Welcome to Holisticnetworking.net. This is your first post. Edit or delete it, then start blogging!