Wednesday, March 28, 2012

Performance Quirks of MySQL Information Schema

I decided to start documenting my experiences related to what I do. And what I do, is all kinds of tech related stuff at Enkora. I was considering starting a blog for a while, but somehow I could never get around to it. I was never into writing and always hated literature classes, but one's gotta learn!

Anyways, here goes the first post!

MySQL tends to provide an infinite source of entertainment. This time it was the information schema. Making queries against the information schema can be incredibly slow, and our application is of course caching them all. However, once the cache expires, it could take up to 30 seconds or more to repopulate the cache.

First, I found out that one needs to specify the table schema very explicitly, since if MySQL decides to go through all the schemas, it will take a while. Adding a schema table condition to all the tables did help, however there was one remaining query that stayed slow.

Dima Tisnek (my colleague and a local Python guru) was doing occasional testing/development against his own copy of the system, which meant that the cache was pretty empty most of the time. This annoyed him greatly. He asked me yet again to do something about the terrible waiting times and suggested more caching.

Here is the offending query:

SELECT
 r.table_name,
 r.column_name,
 r.referenced_table_name,
 r.referenced_column_name,
 c.column_key = 'PRI' as is_primary
FROM information_schema.KEY_COLUMN_USAGE r
LEFT JOIN information_schema.COLUMNS c
 USING (table_name, column_name, table_schema)
WHERE r.table_schema = 'some_schema' AND
 c.table_schema = 'some_schema' AND
 r.referenced_table_name = 'some_table'

After running EXPLAIN it became clear that the second table is joined with no keys! Apparently MySQL cannot use indexes for joins of information schema tables. This means that if one has a lot of table schemas running on one MySQL instance, the query would take a while.

I did not manage to make MySQL use the correct key (or actually any key) for information_schema.COLUMNS table and almost gave up on it, when it came to me: temporary tables! The solution was to copy the relevant rows of the  information_schema.COLUMNS into a temporary table, and join information_schema.KEY_COLUMN_USAGE with the temporary table.

Here is the final query:

DROP TEMPORARY TABLE IF EXISTS information_schema_tmp_columns;

CREATE TEMPORARY TABLE information_schema_tmp_columns
SELECT * FROM information_schema.COLUMNS c
WHERE c.table_schema = :table_schema;

SELECT
 r.table_name,
 r.column_name,
 r.referenced_table_name,
 r.referenced_column_name,
 c.column_key = 'PRI' as is_primary
FROM information_schema.KEY_COLUMN_USAGE r
LEFT JOIN information_schema_tmp_columns c
 USING (table_name, column_name, table_schema)
WHERE r.table_schema = 'some_schema' AND
 r.referenced_table_name = 'some_table'

This SQL works orders of magnitude faster! Just because it seems like there's a bug in the way MySQL is joining the information schema tables.

For the record I was using MySQL 5.5.14 .

No comments:

Post a Comment