A couple of weeks ago I posted an alternative to ORDER BY RAND() in MySQL, which uses the LIMIT syntax to jump straight to a record chosen at random based on the number of records in the table. I received a Tweet earlier this week pointing out that LIMIT is slow when dealing with large offsets, so in this post I take a look at just how slow it can be.
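For reference, here is a minimal sketch of that technique, expressed entirely in SQL with a prepared statement (the names @count, @offset and stmt are just placeholders; in practice the random offset would usually be calculated in application code instead):

-- Count the rows and pick a random offset between 0 and count - 1.
-- The CAST keeps the offset an integer, which LIMIT requires.
SELECT COUNT(*) INTO @count FROM mytable;
SET @offset = CAST(FLOOR(RAND() * @count) AS UNSIGNED);

-- LIMIT only accepts a placeholder inside a prepared statement.
PREPARE stmt FROM 'SELECT * FROM mytable LIMIT ?, 1';
EXECUTE stmt USING @offset;
DEALLOCATE PREPARE stmt;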
Table used
I tested this on InnoDB and MyISAM versions of the same table with 170432 records. The InnoDB table uses 172 MB and the MyISAM table 105 MB. This is real data and not some random database table I created for the purposes of this post, although it's not the sort of table you'd normally pull random records from.
Using LIMIT to fetch the last record
The easiest way to see how slow it can be is to use LIMIT to select the very last record. This gives us the worst case: the maximum time the query will take when the record chosen at random happens to be the last one.
The first query gets the count from the table:
SELECT COUNT(*) FROM mytable;
This returned 170432. Now to select the last record (note the offset is 1 less than the total number of records):
SELECT * FROM mytable LIMIT 170431, 1;
I ran this a number of times and it took around 1.8 seconds for InnoDB and 0.5 seconds for MyISAM on the machine I used. This is far too slow for something that might be called frequently in a web application, although it might be a little better on beefier hardware.
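EXPLAIN hints at why the big offset hurts: MySQL has no way to seek straight to row 170432, so it reads and discards every row before it. The exact output varies by version, but with no WHERE clause you can expect a full scan (the comment below is illustrative, not actual output from this table):

EXPLAIN SELECT * FROM mytable LIMIT 170431, 1;
-- type: ALL -- a full scan; the first 170431 rows are fetched and
-- thrown away just to return the one row we want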
Comparing LIMIT with ORDER BY RAND()
The next thing to do was to benchmark LIMIT against ORDER BY RAND() to see if I had been correct in my previous post that using LIMIT really is faster. So:
SELECT * FROM mytable ORDER BY RAND() LIMIT 1;
I ran this query a number of times as well, and it took around 33 seconds for InnoDB and 30 seconds for MyISAM each time. So using LIMIT is clearly much faster, although it's still not a suitable solution if random data needs to be selected frequently in an on-demand application.
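The gap isn't surprising: ORDER BY RAND() has to generate a random value for every one of the 170432 rows and then sort them all just to keep one. EXPLAIN makes this visible (again, illustrative output):

EXPLAIN SELECT * FROM mytable ORDER BY RAND() LIMIT 1;
-- Extra: Using temporary; Using filesort -- every row is assigned a random
-- value, written to a temporary table and sorted before LIMIT keeps just one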
Another alternative
I have another alternative which is almost instant, even on large tables, but it does require an adjustment to the table and some additional business logic in your application. However, it does solve the speed problem when selecting random records from a large table frequently. It will be posted this time next week.