[nycphp-talk] MySQL doubt
Ophir Prusak
prusak at gmail.com
Sat Jan 15 17:01:36 EST 2005
thanx for the explanations everyone!
fyi, i did the following two queries which use the same columns but
return a different number of results. Indeed the %30 rule seems to be
the case.
mysql> explain select rev_regdate from review where rev_id > 30000;
+--------+-------+---------------+---------+---------+------+-------+------------+
| table | type | possible_keys | key | key_len | ref | rows |
Extra |
+--------+-------+---------------+---------+---------+------+-------+------------+
| review | range | PRIMARY | PRIMARY | 4 | NULL | 11032 |
where used |
+--------+-------+---------------+---------+---------+------+-------+------------+
1 row in set (0.04 sec)
mysql> explain select rev_regdate from review where rev_id > 10000;
+--------+------+---------------+------+---------+------+-------+------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra |
+--------+------+---------------+------+---------+------+-------+------------+
| review | ALL | PRIMARY | NULL | NULL | NULL | 43896 | where used |
+--------+------+---------------+------+---------+------+-------+------------+
1 row in set (0.01 sec)
On Sat, 15 Jan 2005 13:49:30 -0500, Mark Horton <mark at nostromo.net> wrote:
> Ophir Prusak wrote:
> > As long as we're on the subject, could someone shed some light on this behavior?
> > It caused me a good amount of hair pulling until I realized what mysql
> > was doing.
> >
> > Why doesn't MySQL use the primary key in the second query?
> > I would think that count doesn't care what column I use, but obviously it does.
>
> I think what many have said on this is accurate. I wanted to further
> say that the short answer to your question is that if the optimizer
> detects it will scan more than 30% of the index tree then it may decide
> not to use the index at all and do a full table scan. (According to the
> recent docs there are other variables taken into account, but I believe
> the 30% rule is a significant factor.)
>
> You table has 43894 rows and your where clause indicates it will scan
> about 75% of the index tree, thus your second query does a full table scan.
>
> However, the results of your first query can be wholly computed without
> touching the data at all, so it decides not to do a table scan. You are
> fetching rev_id and using rev_id in the where clause, so it gets
> everything it needs from the index tree. (It doesn't always work like
> this, but I believe it does for integer based indexes.)
>
> Why would it decide to do a full table scan? My guess is that it
> doesn't take into account that rev_id is an auto_increment column. So
> it doesn't know that it can simply start at 10001 and work its way up
> using the index. For all it knows your index could be a normal index
> with duplicates. I believe this is related to cardinality. IOW it
> decides it would be faster to only reference the data (1 disk seek)
> instead of the index tree then also fetch the data (possibly 2 disk
> seeks).
>
> You can hand optimize the second query with 'force index' if you want.
> I've also read that you can use 'max_seeks_for_key' to manipulate some
> of the behavior, although I've never used it myself.
>
> Mark
> _______________________________________________
> New York PHP Talk
> Supporting AMP Technology (Apache/MySQL/PHP)
> http://lists.nyphp.org/mailman/listinfo/talk
> http://www.nyphp.org
>
More information about the talk
mailing list