Apr 24, 2008 @ 10:36 am

For many of us, databases are no fun… surely not as fun as ActiveRecord or Rails in general. Every railers I know love the “dot notation” offered by ActiveRecord associations immensely… while refering to the vast majority of raw SQL queries as “Some ugly non-sense gibberish that we would all prefer not know the existence”.

Our hate towards the SQL language and databases is not a valid excuse to not assume our responsabilities as application developers. Recently I was working on a Rails plugin when the sentiment of being incompetent struck my body exactly like the lightning strikes an adventurous golfer who happily plays golf during a thunderstorm.

It all happened when I had a query that was generating a lot of results… and was slow as hell to process… even though I was using the :include option for Eager Loading.

@stuff = ParentStuff.find(:all, :include => :child_stuff, :order => “created_at DESC”)

Then, in my view, I had something like :

  1. @stuff.each do |parent_stuff|
  2.   bla bla bla <%=parent_stuff.name%>
  3.   parent_stuff.child_stuff.each do |stuff|
  4.      bla bla bla <%=stuff.title%>
  5.   end
  6. end

It was awfully slow! I started looking at the logs and saw this typical monstruous sql query. Look at it… you have to look at it. I know you don’t want to… but it won’t go away :

SELECT parent_stuff.`id` AS t0_r0, parent_stuff.`field1` AS t0_r1, parent_stuff.`field2` AS t0_r2, parent_stuff.`field3` AS t0_r3, parent_stuff.`field4` AS t0_r4, child_stuff.`id` AS t1_r0, child_stuff.`field5` AS t1_r1, child_stuff.`field6` AS t1_r2, child_stuff.`field7` AS t1_r3, child_stuff.`field8` AS t1_r4, child_stuff.`field9` AS t1_r5, child_stuff.`field10` AS t1_r6, child_stuff.`field11` AS t1_r7 FROM parent_stuff LEFT OUTER JOIN child_stuff ON child_stuff.the_foreign_key = parent_stuff.id WHERE (parent_stuff.created_at >= ‘2008-04-24 00:00:00′ AND parent_stuff.created_at <= '2008-04-24 09:44:56') ORDER BY parent_stuff.created_at

What's the problem with this? It's just one query... cannot take that much time. I thought it was a "Rails problem" until I ran the query directly in a MySql web interface. It took the same amount of time, meaning that the problem was within the query itself. Oh sh**... it was a database problem! Like many Rails Developers, my database skills are, while not inexistant, inadequate.

I tried to figure out what could be the problem with this query. I replaced every LEFT OUTER JOIN with INNER JOIN and Boom! the query executed in less than half a second.

My urgent desire of leaving the MySql web interface made me go back to my Rails code. I then tried some stupid random stuff like :

@stuff = ParentStuff.find(:all, :include => :child_stuff, :joins => “INNER JOIN child_stuff ON child_stuff.parent_stuff_id=id” :order => “created_at DESC”) but the slow LEFT OUTER JOINs remain in the query instead of being replace by the faster INNER JOIN. Then, I learned that in Rails 2.0, you could do Eager Loading with INNER JOIN by passing an association name to the :joins option, like that :

@stuff = ParentStuff.find(:all, :joins => :child_stuff, :order => “created_at DESC”)

Problem is I was working on a plugin… and I’d like it to be compatible with older versions of Rails. So I did what I was scared of doing since the beginning : Going to some MySql forums to read the advice of some DBA’s …

Then I found the answer to my problem… and this answer proved me that I needed to stop playing blindly with ActiveRecord like if the backend database was none of my business…. because it is. The answer is so obvious… but given the fact that I never wanted to care about databases since Rails exists, I didn’t think about it :

Answer : Add an index to the foreign key in the child table!

This is just one case that shows the dangers of not being aware of what’s going on in the database. It’s sad, but you cannot blindly let ActiveRecord manage everything that is Database related for you. It’s only a question of being in control of the entire application… and the relational database is a part of it.

UPDATE
This post has kind of turned against me. The only thing I wanted to say with this article is that ActiveRecord needs our supervision… not that I am a moron who can’t understand databases. I know what joins are, I know what indexes are… but in most situations I don’t have to “care” about them with ActiveRecord. Sorry but I felt the need to defend myself about this issue :)

Bookmark this post : These icons link to social bookmarking sites where readers can share and discover new web pages.
  • DZone
  • Reddit
  • del.icio.us
  • Digg
  • Furl
  • Technorati
  • StumbleUpon
Rate this post :
1 Star2 Stars3 Stars4 Stars5 Stars (14 votes, average: 3.57 out of 5)
Loading ... Loading ...
Posted under : Easy reading
19 Comments
MyAvatars 0.2

How can we be so unrespectful about databases? I’ve been working in Rails since 1 year and never I looked at the indexes that are created (or not). Shame on me…

Comment by : Dan
— April 24, 2008 @ 10:58 am

MyAvatars 0.2

Nice article. I’m working with databases full time so I definitely appreciate indexes and using correct joins. Jamis Buck covered this issue a while back here (should be mandatory reading for rails developers) http://www.therailsway.com/2006/11/21/tracks-part-4

Comment by : Shai
— April 24, 2008 @ 11:21 am

MyAvatars 0.2

Some related reading, I did a presentation on ActiveRecord gotchas with include/select/find_by_sql. Not only joins, but limiting your selects are really important here http://work.rowanhick.com/2008/02/12/how-to-avoid-hanging-yourself-with-rails/

Comment by : Rowan
— April 24, 2008 @ 11:43 am

MyAvatars 0.2

It’s very true that ActiveRecord causes too many Rails coders to not think about the database. ActiveRecord can be painfully slow

One question: Is it right that in many cases, switching an outer join to an inner join could cause different data to be selected from the database?

Comment by : Robin Watson
— April 24, 2008 @ 12:17 pm

MyAvatars 0.2

@Robin: yes, they can provide different results. Here’s a pretty good write up on the subject: http://www.wellho.net/mouth/158_MySQL-LEFT-JOIN-and-RIGHT-JOIN-INNER-JOIN-and-OUTER-JOIN.html

This is why I like the stuff that RedHill on Rails has. Allowing me to add my indexes and foreign key associations in my migrations.

Comment by : kabari
— April 24, 2008 @ 12:34 pm

MyAvatars 0.2

You seem to have run into the same problem like many other developers. Just because you’re doing Rails, that doesn’t mean you don’t have to understand the works _behind_ Rails. Every web developer working with any programming language or framework should be at least familiar with the basics of joins (i.e. differences between outer and inner joins), basic database optimization (like using indexes) and how to track inefficient queries (using EXPLAIN in MySQL, for example).
Read. Learn. Remember. ;-)

Comment by : Clemens Kofler
— April 24, 2008 @ 1:19 pm

MyAvatars 0.2

[…] other news, most Rails apps are slow because the developers are ignorant of database basics. Not slamming the blog’s author, but it is a good example of how easy it is to mess up a Rails […]


MyAvatars 0.2

On the other hand, you lose a lot of the ActiveRecord advantages if you pay *too much* attention to the database. Don’t optimize prematurely. But yeah — if there’s slowness, tail the db log and look for inefficiencies.

Comment by : Joe Grossberg
— April 24, 2008 @ 1:43 pm

MyAvatars 0.2

[…] have just read two posts (here and here), both of them discussing the sometimes formidable challenges faced when attempting to […]


MyAvatars 0.2

After reading this post and seeing the comment “don’t optimize prematurely”, many database developers and seasoned programmers are shaking their heads right now.

I’ve seen many a code (in rails, python, or any language for that matter) that uses an ORM library which doesn’t generate efficient SQL. This is fine, as long as those developer using the ORM know what the generated SQL is and how to make it more efficent. The guys from iBATIS have a good idea of how to interface with a DB, but it’s missing the glamour of ActiveRecord. Sometimes custom SQL is the best option, regardless of how ugly it makes the ORM/ActiveRecord code look.

Comment by : typicalrunt
— April 24, 2008 @ 2:02 pm

MyAvatars 0.2

Rails DOES care about the database, in fact Rails cares more than most other web frameworks out there. ActiveRecord is designed to keep you CLOSE to SQL, so you never forget that in the end that’s what being executed.

You should think of ActiveRecord as something to help you construct and issue SQL commands that would be otherwise complex and troublesome to write, not as an excuse to not know anything about your database.

Comment by : Pazu
— April 24, 2008 @ 4:25 pm

MyAvatars 0.2

Rails is a calculator, and the database is math. Teach a child math before giving them a calculator!

Comment by : NIck Fessel
— April 24, 2008 @ 9:15 pm

MyAvatars 0.2

I got exactly the opposite from you I do a great merise relationship model (i try …) but a sh***s rails models…
I learn merise with a teacher that doesn’t know ruby nor rails …(and who think c# THE programming language ..)so i start with complicated diagrams but poor(inefficient) relation in my apps.
well sometimes it’s hard to make rails/relationship diagrams match…

Comment by : florentin
— April 24, 2008 @ 11:40 pm

MyAvatars 0.2

Indexes are good for more than speeding up SELECTs, as I’m sure you’re aware. One ActiveRecord feature bit me pretty nicely. The validates_uniqueness_of constraint is definitely not guaranteed as the skateboard book says on p. 369. Further, it does a SELECT every time it does an INSERT. This can be costly. A “guaranteed” way to enforce uniqueness and to help with efficiency is to create a unique index on the field(s) in question at the database level. You’ll need to trap the database error yourself though. YMMV.

Comment by : Michael
— April 25, 2008 @ 8:33 am

MyAvatars 0.2

Thanks everyone for your comments

The morale of the story for me is that you cannot blindly delegate everything that is “database related” to ActiveRecord or any other O/R mapping tool. Maybe one day tools like ActiveRecord will be so advanced that playing at the database level will not be a necessity anymore … but this time hasn’t come yet. Until then we have to keep an eye on the generated queries from time to time as well as do some optimizing when our app ask for it.

You know what? Before switching to Rails I was a convinced .NET (1.1) developer that used to write all his SQL queries by hand. I remember how much it was repetitive and tedious to do this for every single project. I hated it so much that I made the mistake of embracing ActiveRecord with too much passion when I made the switch to Rails. There is an old saying that love is blind… I should have think about it.

Comment by : Frank
— April 25, 2008 @ 12:20 pm

MyAvatars 0.2

You should ignore the database, until you can’t ignore it any longer. I’ve seen the SQL that Rails has generated and figured I could write better by hand. But then I’ve seen the assembler generated by C compilers and reckoned that I could write better by hand. That is not the point though. Ultimately what you are missing here is a tool on the database that *tells* you when you should put an index on something. I’ve alwasy been surpirsed at how poor the tooling is at the database level. Perhaps there is too much ‘hand optimisation’ going on, not too ittle

Comment by : Neil Wilson
— April 27, 2008 @ 6:44 am

MyAvatars 0.2

Do you really know the difference between ‘inner’ and ‘outer’ joins? They are quite striking and you can’t replace one for the other with no adequate reasoning. Rails jas both :include and :joins keyword (and :joins accept an array or strings as well) and you have to do the reasoning on what do you want. It seems to me that you don’t know those differences and what do you really want.

Comment by : AkitaOnRails
— April 27, 2008 @ 1:58 pm

MyAvatars 0.2

@Akita,

*sigh*… enough with this already… this is getting on my nerves. I know the difference between INNER and OUTER. A left outer join includes the rows in the right tables even if there are no match (when it happens the columns in the right table will all be filled with NULLs) while an INNER join only returns the rows “that matches”. In my specific case it just could not happen that a row from the right table does not match a row from the left.

Is this explanation convince you that I know what joins are or do you still need more clarifications?

Comment by : Frank
— April 28, 2008 @ 9:01 am

MyAvatars 0.2

This is so very true: Rails is pretty damned magic, but sometimes the rubber meets the road.

SQL makes me laugh since a design goal was to be usable by typical business people … yeah, right. I consider myself a black-belt SQLer (arguable, but invite smackdowns from any interested combatant), and despite my >20 years experience with the language, I still find SQL to be the most infuriating and obtuse syntax on the face of the earth (this includes C and C++, as well as perl). Rails insulates us in perhaps 90% of normal cases from the gnarly details. The problem is those 10% of the rest.

I wonder if there’s a need for a Rails “sniff check” of a given controller action, or query? Wouldn’t it be cool if Rails warned “this query takes a really long time — try these things before hiring a $250/hour consultant: 1) use eager loading, 2) check to make sure these columns are indexed (”foo”, “bar”, “fubar”), and 3) learn SQL, you lazy, incompetent so-called software engineer. OK, maybe just 2 things.

We dealt with this issue in the last few days for a really complicated polymorphic query. It is as real as it gets. Rails doesn’t solve all problems … yet.

Comment by : Green Rails
— April 29, 2008 @ 7:20 pm




Leave a comment
Name (required)
Email (will not be publish) (required)
Website