Thread: bad select performance for where (x=1 or x=3)
[PostgreSQL 6.5.0 on i586-pc-linux-gnu, compiled by gcc egcs-2.91.66] I have a 'select' including 'where (x=1 or x=3)' that takes 16 times as long as with just 'where x=1'. Here's a (somewhat simplified) example: table opset_steps (name text, id int2, ver int2) [1400 rows] non-unique index is on (id, ver) table run_opsets (status int2, id int2, ver int2, run_id int2, seq int2) [17000 rows] pkey is (id, seq), second index on(status, id, ver, run_id) select count(*) from run_opsets where status=1; --> 187 select count(*) from run_opsets where status=3; --> 10564 table runs (run_name text, run_id int2, status int2) [900 rows] pkey is run_name, second index(run_id, status) I have vacuum analyzed all relevant tables. This query takes 16 seconds (without the explain of course): explain select os.name,r.run_name,ro.status from opset_steps os,runs r,run_opsets ro where (ro.status=1 or ro.status=3)and ro.opset_id=os.opset_id and ro.run_id=r.run_id and ro.opset_ver=os.opset_ver and r.status=1; Hash Join (cost=1793.58 rows=14560 width=38) -> Hash Join (cost=1266.98 rows=14086 width=24) -> Seq Scan on run_opsets ro (cost=685.51 rows=13903 width=8) -> Hash (cost=70.84 rows=1389 width=16) -> Seq Scan on opset_steps os (cost=70.84 rows=1389 width=16) -> Hash (cost=47.43 rows=374 width=14) -> Seq Scan on runs r (cost=47.43 rows=374 width=14) This query takes just under one second:[diff is status=1 instead of (1 or 3)] explain select os.name,r.run_name,ro.status from opset_steps os,runs r,run_opsets ro where ro.status=1 and ro.opset_id=os.opset_idand ro.run_id=r.run_id and ro.opset_ver=os.opset_ver and r.status=1; Hash Join (cost=1359.57 rows=7719 width=38) -> Hash Join (cost=1051.39 rows=7467 width=24) -> Seq Scan on run_opsets ro (cost=685.51 rows=7370 width=8) -> Hash (cost=70.84 rows=1389 width=16) -> Seq Scan on opset_steps os (cost=70.84 rows=1389 width=16) -> Hash (cost=47.43 rows=374 width=14) -> Seq Scan on runs r (cost=47.43 rows=374 width=14) Why should it take over 16 times as long for (status=1 or 3) as for status=1? I have indexes on exactly all the fields used in the where clause. I tried "status in (1,3)" with no improvement. George Young, Rm. L-204 gry@ll.mit.edu MIT Lincoln Laboratory 244 Wood St. Lexington, Massachusetts 02420-9108 (781) 981-2756
George Young <gry@ll.mit.edu> writes: > table run_opsets > (status int2, id int2, ver int2, run_id int2, seq int2) [17000 rows] > pkey is (id, seq), second index on(status, id, ver, run_id) > select count(*) from run_opsets where status=1; --> 187 > select count(*) from run_opsets where status=3; --> 10564 > Why should it take over 16 times as long for (status=1 or 3) as for status=1? Offhand it looks like the former would produce 57 times as many possible rows from the run_opsets table as the latter (187+10564 vs 187), which the system would then have to try to match against the other tables. You didn't say how many tuples actually get returned, but certainly the number of iterations through each of the join loops is likely to be much higher. I'm surprised the cost differential isn't more than 16:1. A more interesting question might be "why doesn't the system's cost estimator realize that the second case will be much cheaper?" The answer to that is that VACUUM ANALYZE doesn't keep sufficiently detailed statistics to let it realize that there are far more x=3 than x=1 rows. regards, tom lane
Hi there, I´ve a table with about 142000 rows like shown below and I want to set field "divis" to "unknown" by executing following update command: update ipacct set divis = 'unknown'; However this seems to take hours, I´ve a PII 350MHz with 192Mb memory and Quantum UIDE disk, is there a way for me to speed up the update process except for switching to a SCSI hard drive? Does indexing the field "divis" speed up the update performance? The load on the machine is about 1,5 to 2 all the time and it´s hacking on the hard drive like hell. I´ve the same problem when I do "UPDATE ipacct SET traf = (src+dst)", it takes hours. Is this normal? Is there another way to do this? However reading from the database doing select is very fast. Many thanks in advance for any hint, best regards, Gunnar Ingvi Thorisson CCIE 4826 Iceland gunni@if.is venus:~$ psql ipacct Welcome to the POSTGRESQL interactive sql monitor: Please read the file COPYRIGHT for copyright terms of POSTGRESQL [PostgreSQL 6.5.0 on i686-pc-linux-gnulibc1, compiled by gcc 2.7.2.3] [cut] ipacct=> select * from ipacct where id = 1; id| datetime|pcol|saddr |sport|daddr |dport|pcnt| size|if |login |mac |src|dst|traf|depart|divis --+---------+----+-----------------+-----+------------+-----+----+-----+---- +-------+------------+---+---+----+------+----- 1|928368167| 6|209.85.127.151/32| 80|10.10.2.3/32| 2448| 57|74856|eth0|unknown|00805FC1525C|100| 3| 103| | (1 row) ipacct=> update ipacct set divis = 'unknown'; The table: ID INT4 UNIQUE, DATETIME TEXT, PCOL INTEGER, SADDR CIDR, SPORT INTEGER, DADDR CIDR, DPORT INTEGER, PCNT INT4 SIZE INT4; IF TEXT; LOGIN TEXT; MAC TEXT; SRC INTEGER; DST INTEGER; TRAF INTEGER; DEPART TEXT; DIVIS TEXT;
"Gunnar Ingvi Thorisson" <gunni@if.is> writes: > I�ve a table with about 142000 rows like shown below and I want to > set field "divis" to "unknown" by executing following update command: > update ipacct set divis = 'unknown'; > However this seems to take hours, Well, updating 142000 rows is going to take a little while... Are you starting the postmaster with -o -F ? That makes for a pretty considerable speedup in most cases (at the cost of trouble if you have a system crash during an update). > Does indexing the field "divis" speed up the update performance? No, it would not help a query like that --- though if you added a clause like "where divis = 'oldvalue'" then an index would help to find the rows that need updated. Actually, every index you add *slows down* updates, since all the indexes must be updated along with the table. regards, tom lane