Thread: bad select performance for where (x=1 or x=3)

bad select performance for where (x=1 or x=3)

From

George Young

Date:

20 July 1999, 10:03:12

[PostgreSQL 6.5.0 on i586-pc-linux-gnu, compiled by gcc egcs-2.91.66]

I have a 'select' including 'where (x=1 or x=3)' that takes 16 times as long
as with just 'where x=1'.  Here's a (somewhat simplified) example:

table opset_steps
      (name text, id int2, ver int2) [1400 rows]
      non-unique index is on (id, ver)

table run_opsets
      (status int2, id int2, ver int2, run_id int2, seq int2) [17000 rows]
      pkey is (id, seq), second index on(status, id, ver, run_id)
      select count(*) from run_opsets where status=1; --> 187
      select count(*) from run_opsets where status=3; --> 10564

table runs
      (run_name text, run_id int2, status int2) [900 rows]
      pkey is run_name, second index(run_id, status)

I have vacuum analyzed all relevant tables.

This query takes 16 seconds (without the explain of course):
  explain select os.name,r.run_name,ro.status from opset_steps os,runs r,run_opsets ro where (ro.status=1 or
ro.status=3)and ro.opset_id=os.opset_id and ro.run_id=r.run_id and ro.opset_ver=os.opset_ver and r.status=1; 

Hash Join  (cost=1793.58 rows=14560 width=38)
  ->  Hash Join  (cost=1266.98 rows=14086 width=24)
        ->  Seq Scan on run_opsets ro  (cost=685.51 rows=13903 width=8)
        ->  Hash  (cost=70.84 rows=1389 width=16)
              ->  Seq Scan on opset_steps os  (cost=70.84 rows=1389 width=16)
  ->  Hash  (cost=47.43 rows=374 width=14)
        ->  Seq Scan on runs r  (cost=47.43 rows=374 width=14)


This query takes just under one second:[diff is status=1 instead of (1 or 3)]
  explain select os.name,r.run_name,ro.status from opset_steps os,runs r,run_opsets ro where ro.status=1 and
ro.opset_id=os.opset_idand ro.run_id=r.run_id and ro.opset_ver=os.opset_ver and r.status=1; 

Hash Join  (cost=1359.57 rows=7719 width=38)
  ->  Hash Join  (cost=1051.39 rows=7467 width=24)
        ->  Seq Scan on run_opsets ro  (cost=685.51 rows=7370 width=8)
        ->  Hash  (cost=70.84 rows=1389 width=16)
              ->  Seq Scan on opset_steps os  (cost=70.84 rows=1389 width=16)
  ->  Hash  (cost=47.43 rows=374 width=14)
        ->  Seq Scan on runs r  (cost=47.43 rows=374 width=14)


Why should it take over 16 times as long for (status=1 or 3) as for status=1?
I have indexes on exactly all the fields used in the where clause.
I tried "status in (1,3)" with no improvement.


George Young,  Rm. L-204        gry@ll.mit.edu
MIT Lincoln Laboratory
244 Wood St.
Lexington, Massachusetts  02420-9108    (781) 981-2756

Re: [SQL] bad select performance for where (x=1 or x=3)

From

Tom Lane

Date:

20 July 1999, 10:54:56

George Young <gry@ll.mit.edu> writes:
> table run_opsets
>       (status int2, id int2, ver int2, run_id int2, seq int2) [17000 rows]
>       pkey is (id, seq), second index on(status, id, ver, run_id)

>       select count(*) from run_opsets where status=1; --> 187
>       select count(*) from run_opsets where status=3; --> 10564

> Why should it take over 16 times as long for (status=1 or 3) as for status=1?

Offhand it looks like the former would produce 57 times as many possible
rows from the run_opsets table as the latter (187+10564 vs 187), which
the system would then have to try to match against the other tables.
You didn't say how many tuples actually get returned, but certainly the
number of iterations through each of the join loops is likely to be much
higher.  I'm surprised the cost differential isn't more than 16:1.

A more interesting question might be "why doesn't the system's cost
estimator realize that the second case will be much cheaper?"  The
answer to that is that VACUUM ANALYZE doesn't keep sufficiently detailed
statistics to let it realize that there are far more x=3 than x=1 rows.

            regards, tom lane

Bad update performance?

From

"Gunnar Ingvi Thorisson"

Date:

20 July 1999, 20:04:02

Hi there,

I´ve a table with about 142000 rows like shown below and I want to set field
"divis" to "unknown" by executing following update command:
update ipacct set divis = 'unknown';

However this seems to take hours, I´ve a PII 350MHz with 192Mb memory and
Quantum UIDE disk, is there a way for me to speed up the update process
except  for switching to a SCSI hard drive? Does indexing the field "divis"
speed up the update performance?

The load on the machine is about 1,5 to 2 all the time and it´s hacking on
the hard drive like hell.
I´ve the same problem when I do "UPDATE ipacct SET traf = (src+dst)", it
takes hours. Is this normal? Is there another way to do this?

However reading from the database doing select is very fast.

Many thanks in advance for any hint,
best regards,
Gunnar Ingvi Thorisson

CCIE 4826
Iceland
gunni@if.is



venus:~$ psql ipacct
Welcome to the POSTGRESQL interactive sql monitor:
  Please read the file COPYRIGHT for copyright terms of POSTGRESQL
[PostgreSQL 6.5.0 on i686-pc-linux-gnulibc1, compiled by gcc 2.7.2.3]

[cut]

ipacct=> select * from ipacct where id = 1;
id| datetime|pcol|saddr            |sport|daddr       |dport|pcnt| size|if
|login  |mac         |src|dst|traf|depart|divis
--+---------+----+-----------------+-----+------------+-----+----+-----+----
+-------+------------+---+---+----+------+-----
 1|928368167|   6|209.85.127.151/32|   80|10.10.2.3/32| 2448|
57|74856|eth0|unknown|00805FC1525C|100|  3| 103|      |
(1 row)

ipacct=> update ipacct set divis = 'unknown';

The table:
   ID               INT4  UNIQUE,
   DATETIME         TEXT,
   PCOL             INTEGER,
   SADDR            CIDR,
   SPORT            INTEGER,
   DADDR            CIDR,
   DPORT            INTEGER,
   PCNT             INT4
   SIZE             INT4;
   IF               TEXT;
   LOGIN            TEXT;
   MAC              TEXT;
   SRC              INTEGER;
   DST              INTEGER;
   TRAF             INTEGER;
   DEPART           TEXT;
   DIVIS            TEXT;

Re: [SQL] Bad update performance?

From

Tom Lane

Date:

21 July 1999, 09:40:14

"Gunnar Ingvi Thorisson" <gunni@if.is> writes:
> I�ve a table with about 142000 rows like shown below and I want to
> set field "divis" to "unknown" by executing following update command:
> update ipacct set divis = 'unknown';
> However this seems to take hours,

Well, updating 142000 rows is going to take a little while...

Are you starting the postmaster with -o -F ?  That makes for a pretty
considerable speedup in most cases (at the cost of trouble if you
have a system crash during an update).

> Does indexing the field "divis" speed up the update performance?

No, it would not help a query like that --- though if you added a clause
like "where divis = 'oldvalue'" then an index would help to find the
rows that need updated.  Actually, every index you add *slows down*
updates, since all the indexes must be updated along with the table.

            regards, tom lane