Thread: Problem with a rule on upgrade to v7.1.1

Problem with a rule on upgrade to v7.1.1

From

Jon Lapham

Date:

10 May 2001, 17:35:44

Hello all-

Yesterday I upgraded my database from Pg v7.1RC1 to v7.1.1.  Since this
upgrade, I have been having unbelievable performance problems with updates
to a particular table, and I've tracked the problem down to a rule within
that table.

I've enclosed a simple case study at the end of this email (the real
example is basically the same, except that there are many more fields in
the tables).  I will send the real table definitions if anyone thinks it
would be useful.

The problem is that in Pg v7.1RC1 (and previously with Pg v7.0.3) a simple
update to the child table, changing the boolean active='t' to active='f'
would be basically instantaneous.  Now, it takes about an hour.  The real
database has ~10000 records in total between the "child" and "parent"  
tables.

Basically, the rule "r_inactivate_child" below is the problem.  If I drop 
that rule, everything runs fast again.

The idea of this rule is to set active='f' in the parent table whenever
all of the children (things in the child table) are inactive.

Any suggestions would be *greatly* appreciated!  Thanks!

PS: Most likely the problem is in the design of the rule (I'm sure it 
could be done better), but I would remind you that these same updates were 
very, very fast in the older versions of Pg.

PSS: I'm running linux, kernel v2.4.4, RH7.1, homerolled PG.

-----------------------------------
Tables and rules:

CREATE TABLE parent ( parentid  int4 PRIMARY KEY, active    boolean 
);
CREATE TABLE child ( childid  int4 PRIMARY KEY, parentid int4 references parent(parentid),active   boolean 
);

CREATE RULE r_inactivate_childAS ON UPDATE TO child    WHERE NEW.active='f' AND OLD.active='t'DO UPDATE parent SET
active='f'   WHERE parentid=NEW.parentid        AND (SELECT count(*) FROM child        WHERE parentid=NEW.parentid AND
          childid<>NEW.childid AND active='t') = 0;
 

CREATE RULE r_activate_child AS ON UPDATE TO child    WHERE NEW.active='t' AND OLD.active='f'DO UPDATE parent SET
active='t'   WHERE parentid=NEW.parentid AND active='f';
 

-----------------------------------
Populate with data:
INSERT INTO parent (parentid, active) VALUES (1, 't');
INSERT INTO child  (childid, parentid, active) VALUES (1, 1, 't');
INSERT INTO child  (childid, parentid, active) VALUES (2, 1, 't');
INSERT INTO child  (childid, parentid, active) VALUES (3, 1, 't');

(note, you will need *a lot* more data like this to see the slow 
updates... but you get the idea, I hope).

-----------------------------------
Perform an update:
UPDATE child SET active='f' WHERE childid=2;

(this would take an hour on a ~8000 record child, ~3000 record parent 
database)


-----------------------------------
Explain:

test=# explain update child set active='t' where childid=2;
NOTICE:  QUERY PLAN:

Result  (cost=0.00..30020.00 rows=1000000 width=10) ->  Nested Loop  (cost=0.00..30020.00 rows=1000000 width=10)
-> Seq Scan on parent  (cost=0.00..20.00 rows=1000 width=10)       ->  Seq Scan on child  (cost=0.00..20.00 rows=1000
width=0)

NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..49.28 rows=25 width=14) ->  Index Scan using child_pkey on child  (cost=0.00..8.16 rows=5 
width=4) ->  Index Scan using parent_pkey on parent  (cost=0.00..8.16 rows=5 
width=10)

NOTICE:  QUERY PLAN:

Index Scan using child_pkey on child  (cost=0.00..8.14 rows=10 width=14)

-- 

-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---Jon LaphamExtracta Moléculas Naturais, Rio de
Janeiro,Brasilemail: lapham@extracta.com.br      web: http://www.extracta.com.br/
 
***-*--*----*-------*------------*--------------------*---------------

Re: Problem with a rule on upgrade to v7.1.1

From

Tom Lane

Date:

10 May 2001, 18:06:31

Jon Lapham <lapham@extracta.com.br> writes:
> Yesterday I upgraded my database from Pg v7.1RC1 to v7.1.1.  Since this
> upgrade, I have been having unbelievable performance problems with updates
> to a particular table, and I've tracked the problem down to a rule within
> that table.

Uh, have you VACUUM ANALYZEd yet?  Those EXPLAIN numbers look
suspiciously like default statistics ...
        regards, tom lane

Re: Problem with a rule on upgrade to v7.1.1

From

Jon Lapham

Date:

10 May 2001, 18:29:17

On Thu, May 10, 2001 at 05:56:11PM -0400, Tom Lane wrote:
> Jon Lapham <lapham@extracta.com.br> writes:
> > Yesterday I upgraded my database from Pg v7.1RC1 to v7.1.1.  Since this
> > upgrade, I have been having unbelievable performance problems with updates
> > to a particular table, and I've tracked the problem down to a rule within
> > that table.
> 
> Uh, have you VACUUM ANALYZEd yet?  Those EXPLAIN numbers look
> suspiciously like default statistics ...
> 
>             regards, tom lane

Nope, forgot to on the little demonstration tables I made.  I tacked the 
post-VACUUM ANALYZE explain results (they look much better) at the end of 
this email.

However, I did run a VACUUM ANALYZE on my real database.  And, just to be 
sure, I just ran it again.  The updates still take a very, very long time 
(actually it is about 12 minutes, not an hour as I previously stated, it 
just feels like an hour).

I also included the explain output for my real database (main_v0_8).

Thanks Tom!
-Jon

PS: anything else I should try?

---------------------------------
test=# vacuum analyze;
VACUUM
test=# explain update child set active='t' where 
childid=2;
NOTICE:  QUERY PLAN:

Result  (cost=0.00..2.07 rows=3 width=10) ->  Nested Loop  (cost=0.00..2.07 rows=3 width=10)       ->  Seq Scan on
parent (cost=0.00..1.01 rows=1 width=10)       ->  Seq Scan on child  (cost=0.00..1.03 rows=3 width=0)
 

NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..2.07 rows=1 width=14) ->  Seq Scan on parent  (cost=0.00..1.01 rows=1 width=10) ->  Seq Scan
onchild  (cost=0.00..1.04 rows=1 width=4)
 

NOTICE:  QUERY PLAN:

Seq Scan on child  (cost=0.00..1.04 rows=1 width=14)

EXPLAIN

-------------------------------------------
main_v0_8=# VACUUM ANALYZE;
VACUUM
main_v0_8=# explain update tplantorgan set active='f' where 
sampleid=100430;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..2243933.76 rows=1 width=239) ->  Seq Scan on tplantorgan  (cost=0.00..2243931.72 rows=1
width=4)      SubPlan         ->  Aggregate  (cost=258.96..258.96 rows=1 width=0)               ->  Seq Scan on
tplantorgan (cost=0.00..258.96 rows=1 
 
width=0) ->  Index Scan using tplant_pkey on tplant  (cost=0.00..2.03 rows=1 
width=235)
NOTICE:  QUERY PLAN:

Result  (cost=0.00..1112558.20 rows=31883520 width=235) ->  Nested Loop  (cost=0.00..1112558.20 rows=31883520
width=235)      ->  Seq Scan on tplant  (cost=0.00..167.80 rows=3680 width=235)       ->  Seq Scan on tplantorgan
(cost=0.00..215.64rows=8664 width=0)
 

NOTICE:  QUERY PLAN:

Seq Scan on tplantorgan  (cost=0.00..237.30 rows=1 width=103)

EXPLAIN


-- 

-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---Jon LaphamExtracta Moléculas Naturais, Rio de
Janeiro,Brasilemail: lapham@extracta.com.br      web: http://www.extracta.com.br/
 
***-*--*----*-------*------------*--------------------*---------------

Re: Problem with a rule on upgrade to v7.1.1

From

Tom Lane

Date:

10 May 2001, 18:47:41

Jon Lapham <lapham@extracta.com.br> writes:
>> Uh, have you VACUUM ANALYZEd yet?  Those EXPLAIN numbers look
>> suspiciously like default statistics ...

> Nope, forgot to on the little demonstration tables I made.  I tacked the 
> post-VACUUM ANALYZE explain results (they look much better) at the end of 
> this email.

Ah.  The plans for the little demo tables are uninteresting anyway,
but the plans for the real tables are interesting.

Next question: do you still have your 7.0.* DB up?  Can you get an
EXPLAIN that shows how it did it (on the real tables)?
        regards, tom lane

Re: Problem with a rule on upgrade to v7.1.1

From

"Jon Lapham"

Date:

10 May 2001, 22:08:16

Tom Lane <tgl@sss.pgh.pa.us> escreveu:
> [snip]
> Next question: do you still have your 7.0.* DB up?  Can you get an
> EXPLAIN that shows how it did it (on the real tables)?

Setting it up right now... unfortunately I will have to do a recompile /
reinstall as I have rm -r'ed the old versions.  So, this may take a little bit
of time, as I will also have to remove a few minor Pg v7.1 only items
enhancements to the tables.  I'll get back to you soon.

I should be able to also give you the EXPLAIN data for a v7.1RC1 db.

-- 
-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---Jon LaphamExtracta Mol�culas Naturais, Rio de
Janeiro,Brasilemail: lapham@extracta.com.br      web: http://www.extracta.com.br/
 
***-*--*----*-------*------------*--------------------*---------------

Re: Problem with a rule on upgrade to v7.1.1

From

Jon Lapham

Date:

11 May 2001, 09:27:39

On Thu, May 10, 2001 at 06:44:39PM -0400, Tom Lane wrote:
> Next question: do you still have your 7.0.* DB up?  Can you get an
> EXPLAIN that shows how it did it (on the real tables)?

Tom-

Okay.  I started from a clean slate, by recompiling both Pgv7.1.1 and
Pgv7.1RC1, initdb'ing each (after appropriately changing /etc/ld.so.conf,
running ldconfig, etc, etc), and restoring my real DB from a previously
created dump file.  I didn't do Pgv7.0.3 b/c I think it may be unnecessary
since 7.1RC1 doesn't show this problem, while 7.1.1 does.  But, if you
really think it necessary, I will repeat his using 7.0.3.

Notes: 
1) As usual, the 7.1RC1 returns from the "UPDATE ... " command as fast
as I press enter.  The 7.1.1 returns from the "UPDATE ... " command in
about 10 minutes.
2) The two explains are identical.
3) Both updates succeed, it is only the time difference that is the 
problem
4) Running "UPDATE tplantorgan SET active='t' WHERE sampleid=100430;" 
(setting the boolean to true, instead of false) is instantaneous for both 
7.1RC1 and 7.1.1
5) There are 8664 and 3680 tuples in the "tplantorgan" and "tplant" tables 
respectively.  So this is a relatively small DB.

-Jon

The actual results:
----------------------------------
Pg v7.1RC1 (restored from 2001-05-10 db dump):

main_v0_8=# vacuum ANALYZE ;
VACUUM
main_v0_8=# explain update tplantorgan set active='f' where 
sampleid=100430;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..2243933.76 rows=1 width=239) ->  Seq Scan on tplantorgan  (cost=0.00..2243931.72 rows=1
width=4)      SubPlan         ->  Aggregate  (cost=258.96..258.96 rows=1 width=0)               ->  Seq Scan on
tplantorgan (cost=0.00..258.96 rows=1 
 
width=0) ->  Index Scan using tplant_pkey on tplant  (cost=0.00..2.03 rows=1 
width=235)
NOTICE:  QUERY PLAN:

Result  (cost=0.00..1112558.20 rows=31883520 width=235) ->  Nested Loop  (cost=0.00..1112558.20 rows=31883520
width=235)      ->  Seq Scan on tplant  (cost=0.00..167.80 rows=3680 width=235)       ->  Seq Scan on tplantorgan
(cost=0.00..215.64rows=8664 width=0)
 

NOTICE:  QUERY PLAN:

Seq Scan on tplantorgan  (cost=0.00..237.30 rows=1 width=103)

EXPLAIN
main_v0_8=# update tplantorgan set active='f' where sampleid=100430;
UPDATE 1

----------------------------------
Pg v7.1.1 (restored from 2001-05-10 db dump):

main_v0_8=# VACUUM ANALYZE ;
VACUUM
main_v0_8=# explain update tplantorgan set active='f' where 
sampleid=100430;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..2243933.76 rows=1 width=239) ->  Seq Scan on tplantorgan  (cost=0.00..2243931.72 rows=1
width=4)      SubPlan         ->  Aggregate  (cost=258.96..258.96 rows=1 width=0)               ->  Seq Scan on
tplantorgan (cost=0.00..258.96 rows=1 
 
width=0) ->  Index Scan using tplant_pkey on tplant  (cost=0.00..2.03 rows=1 
width=235)
NOTICE:  QUERY PLAN:

Result  (cost=0.00..1112558.20 rows=31883520 width=235) ->  Nested Loop  (cost=0.00..1112558.20 rows=31883520
width=235)      ->  Seq Scan on tplant  (cost=0.00..167.80 rows=3680 width=235)       ->  Seq Scan on tplantorgan
(cost=0.00..215.64rows=8664 width=0)
 

NOTICE:  QUERY PLAN:

Seq Scan on tplantorgan  (cost=0.00..237.30 rows=1 width=103)

EXPLAIN
main_v0_8=# update tplantorgan set active='f' where sampleid=100430;
UPDATE 1
main_v0_8=# select active from tplantorgan where sampleid=100430;active 
--------f
(1 row)

-- 

-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---Jon LaphamExtracta Moléculas Naturais, Rio de
Janeiro,Brasilemail: lapham@extracta.com.br      web: http://www.extracta.com.br/
 
***-*--*----*-------*------------*--------------------*---------------

Re: Problem with a rule on upgrade to v7.1.1

From

Tom Lane

Date:

11 May 2001, 12:29:46

Jon Lapham <lapham@extracta.com.br> writes:
> I didn't do Pgv7.0.3 b/c I think it may be unnecessary
> since 7.1RC1 doesn't show this problem, while 7.1.1 does.  But, if you
> really think it necessary, I will repeat his using 7.0.3.

No, that seems like useless work.

> 1) As usual, the 7.1RC1 returns from the "UPDATE ... " command as fast
> as I press enter.  The 7.1.1 returns from the "UPDATE ... " command in
> about 10 minutes.
> 2) The two explains are identical.

Oh, *that's* interesting.

> 5) There are 8664 and 3680 tuples in the "tplantorgan" and "tplant" tables 
> respectively.  So this is a relatively small DB.

Would you be willing to send me a dump of the whole DB (or at least the
tables needed for this query)?
        regards, tom lane

Re: Problem with a rule on upgrade to v7.1.1

From

Tom Lane

Date:

11 May 2001, 14:34:45

Jon Lapham <lapham@extracta.com.br> writes:
> But, there is definitely something wrong here, b/c the rule that is 
> causing this *should* only need to run the subselect [SELECT count(*) FROM 
> tplantorgan WHERE tplantid=NEW.tplantid AND sampleid<>NEW.sampleid AND 
> active='t'] one time!  My understanding is that the first conditional in 
> the statement (WHERE tplantid=NEW.tplantid) would be evaluated before the 
> subselect, and there is only 1 tuple in which tplantid=NEW.tplantid.

Well, I've figured it out.  The problem is that the expensive subselect
is actually being evaluated *first* among the several available WHERE
clauses, and thus it's run at basically every row of tplantorgan.  The
tplantid=NEW.tplantid clause does not help because it's a join clause
and is not evaluated until we do the join between tplant and
tplantorgan.  The subselect is a restriction clause and so is evaluated
lower down in the plan tree.

There are other restriction clauses available, however: OLD.active='t'
from the rule qual and sampleid=100430 from the original query both
end up in the same list of restriction clauses for the tplantorgan scan.
But in 7.1.1 they get put behind the subselect clause.  Had they come
first, the subselect would get evaluated at very few tplantorgan rows.

Postgres has never been particularly careful about the ordering of WHERE
clauses, and depending on release version, phase of the moon, etc etc
it's perfectly possible that the subselect would have ended up last in
the other versions you happened to try.  I was able to make current
sources run quickly by backing out the rev 1.59 change seen at
http://www.ca.postgresql.org/cgi/cvsweb.cgi/pgsql/src/backend/optimizer/plan/initsplan.c
This explains why you saw different results between 7.1RC1 and 7.1.1.
There was probably some other change between 7.0.2 and 7.0.3 that caused
7.0.3 to put the clauses in the "right" order whereas 7.0.2 didn't, but
I don't feel like trawling the revision history right now to find it.

The long-term solution to this is for the planner to pay attention to
the execution cost of WHERE clauses and try to put the expensive ones
last in whatever list they end up in.

Meanwhile, I don't really recommend that you hack up the code to reverse
the ordering yet again.  The query is a mess anyway, and rewriting it
seems the better pathway.

> I'm beginning to suspect that my rule is just simply designed poorly...

Yes.  Why not replace both of those rules with

ON UPDATE to tplantorgan DO UPDATE tplant SET active = EXISTS (SELECT 1 FROM tplantorgan WHERE
tplantid=NEW.tplantidAND active) WHERE tplantid=NEW.tplantid;

which seems a lot more obvious as well as quicker.  BTW, an index on
tplantorgan(tplantid) would likely help too...
        regards, tom lane

Re: Problem with a rule on upgrade to v7.1.1

From

Jon Lapham

Date:

11 May 2001, 16:07:58

Tom-

Thanks for the detective work.  This makes perfect sense now, and explains 
the radically different behaviour between 7.1RC1 and 7.1.1.  I will change 
my rules to not have to depend on Pg choosing which WHERE clause to 
evaluate first.

-Jon

PS: (Talking *way* above my head now) Would be possible to have 
EXPLAIN flag this type of problem?  Remember, the EXPLAIN output for 
7.1RC1 and 7.1.1 were identical.

-- 

-**-*-*---*-*---*-*---*-----*-*-----*---*-*---*-----*-----*-*-----*---Jon LaphamExtracta Moléculas Naturais, Rio de
Janeiro,Brasilemail: lapham@extracta.com.br      web: http://www.extracta.com.br/
 
***-*--*----*-------*------------*--------------------*---------------

Re: Problem with a rule on upgrade to v7.1.1

From

Tom Lane

Date:

12 May 2001, 21:59:07

Jon Lapham <lapham@extracta.com.br> writes:
> PS: (Talking *way* above my head now) Would be possible to have 
> EXPLAIN flag this type of problem?  Remember, the EXPLAIN output for 
> 7.1RC1 and 7.1.1 were identical.

Yeah, because EXPLAIN doesn't show the individual qual clauses attached
to each plan node.  EXPLAIN VERBOSE does ... but it's, um, too verbose
for most people.

I've speculated to myself about designing some intermediate level of
EXPLAIN display that would show things like qual clauses and indexscan
conditions in a readable fashion (unlike EXPLAIN VERBOSE).  It could use
the ruleutils.c code to produce the output, so there's not that much
coding involved, just some tasteful selection of exactly what details to
show and how to format the output.  But I haven't gotten around to
actually doing anything ...
        regards, tom lane