I must stress at the outset that this post is merely my thinking out loud about a subject that has been on my mind with my new CakePHP project for some time. I do not claim any particular prowess with MySQL especially, but have discovered something rather interesting while trying to tune up my database.
I have made the point in posts past that building a new application with CakePHP is all about creating a proper database. Well, over the course of months (since, having a full-time job, I don’t get to work on my project as much as I’d like), I’ve been developing what I’d hoped would be a well-normalized and efficient database. However, the question remains unanswered until you’re able to build a test fixture and start benchmarking your queries. In my case, since the new project is in fact a ratings system, the application relies heavily on performing averages.
Thus one additional question is: does calculating in MySQL perform better or worse than calculating in PHP? It stands to reason that there might be some performance variance between the two, just because they’re two different things and therefore one might assume that they might approach the same relatively simple task in different ways. However, using my test fixture, I’ve found that at least in the relatively simple case of doing averages, there’s really no performance difference between the two.
The test fixture I created selects a random but still predictably-valid set of criteria for a given query and performs the query one hundred times in a row. In the case of the MySQL calculations, it then simply returns the microtime difference between the start and end of the query each time (to create a stopwatch effect). In the case of the PHP calculation routine, it performs the averaging calculation and then returns the same stopwatch value. In each case, the stop watch times are then averaged after the loop is finished to arrive at the final average calculation time. This test is then performed ten times in a row to account for variations between the various random samplings. Pseudocode is below:
[code]
for 10 iterations:
for 100 iterations:
get microtime1
query database
(if php) average values
get microtime2
stopwatch[] = microtime2-microtime1
end iteration
average stopwatch values
return stopwatch average
end iteration
[/code]
It should be noted that the database I’m using is very simple and holds a relatively trivial amount of data. I filled it with over half a million records among 7 tables, but the lion’s share are in the tables that require calculation, which is just two tables. The database after all that data still only holds about 19Mb of data. Its also worth noting that the table on which I’m performing calculations holds only an ID, a category ID and the calculated data. This was by design because of MySQL performance tips provided in this well-viewed video by MySQL guru Jay Pipes.
I may find that other configurations, such as adding or removing indexes, may yield different results. I’ll be posting my findings as I go, since that’s what this blog is all about.