Re: feeding big script to psql - Mailing list pgsql-general
From: Peter Wilson <petew@yellowhawk.co.uk>
Subject: Re: feeding big script to psql
Msg-id: dcokg3$31d5$1@news.hub.org
In response to: Re: feeding big script to psql (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: feeding big script to psql
List: pgsql-general
Tom Lane wrote:
> Peter Wilson <petew@yellowhawk.co.uk> writes:
>> I found a while ago that after inserting a lot of rows into a clean
>> Postgres table it would take several minutes just to analyse a command,
>> not even starting the execution.
>
> Oh? Could you provide a test case for this? I can certainly believe
> that the planner might choose a bad plan if it has no statistics, but
> it shouldn't take a long time to do it.
>
> 			regards, tom lane

Hi Tom,

I've spent half an hour restoring the database from the backup I took. This was back in December '04. The database dump is about 95 Mbytes; none of the tables are particularly big.

On investigation the problem occurs on 'EXPLAIN ANALYZE' - which is what pgAdmin III does when you press the explain button. According to the manual, this actually executes the command and gives actual times. (I think pgAdmin is wrong here - if you ask it to explain a command you want the plan it's going to use, not to wait for it to finish!)

That said, I'll include the details here, at least partly because the tables are small in database terms, and I don't understand how this query, even with lots of scanning, can take 100% CPU for over 10 minutes (654 seconds for explain/analyze, 651 seconds for plain execution).

OK - the query is:

SELECT DISTINCT c.client_id, c.instance, c.contact_id, c.uname
FROM contacts c
WHERE c.client_id = 'gadget'
  AND c.instance = '0'
  AND (c.type = 'COMMUNITY' OR c.type = 'OU' OR c.type = 'INDIVIDUAL')
  AND c.contact_id IN (
        SELECT subb.community_id
        FROM contact_att subb
        WHERE subb.client_id = 'gadget'
          AND subb.instance = '0'
          AND subb.contact_id = 3854.000000
  );

By itself the sub-select executes in 235 ms and yields a set that does not vary depending on the outer select - it's a constant set in effect.
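(A side observation, not something raised in the original thread: the literal 3854.000000 is a numeric constant, while contact_att.contact_id is declared bigint in the table definition below, so the comparison casts the column to numeric instead of matching it as a key column. A sketch of the same sub-select with an integer literal, which avoids that cast:)

```sql
-- Sketch only: an integer literal compares against the bigint column
-- directly, with no ::numeric cast on contact_id.
SELECT subb.community_id
FROM contact_att subb
WHERE subb.client_id = 'gadget'
  AND subb.instance  = '0'
  AND subb.contact_id = 3854;
```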
It would seem that in the worst case, assuming no index or size information, the planner should spot the invariance of the subselect, and a sequential scan of the 'contacts' table would be the worst result I would expect.

There are two tables involved in this query: 'contacts' contains 3728 records; 'contact_att' contains 19217 records.

The query plan yields:

Unique  (cost=12.05..12.06 rows=1 width=90) (actual time=654491.967..654491.984 rows=3 loops=1)
  ->  Sort  (cost=12.05..12.05 rows=1 width=90) (actual time=654491.958..654491.959 rows=3 loops=1)
        Sort Key: c.client_id, c.instance, c.contact_id, c.uname
        ->  Nested Loop IN Join  (cost=0.00..12.04 rows=1 width=90) (actual time=577763.884..654491.864 rows=3 loops=1)
              Join Filter: ("outer".contact_id = "inner".community_id)
              ->  Index Scan using cos_uk on contacts c  (cost=0.00..6.02 rows=1 width=90) (actual time=0.227..123.862 rows=2791 loops=1)
                    Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
                    Filter: ((("type")::text = 'COMMUNITY'::text) OR (("type")::text = 'OU'::text) OR (("type")::text = 'INDIVIDUAL'::text))
              ->  Index Scan using ca_pk on contact_att subb  (cost=0.00..6.01 rows=1 width=8) (actual time=0.207..234.423 rows=3 loops=2791)
                    Index Cond: ((instance = '0'::bpchar) AND ((client_id)::text = 'gadget'::text))
                    Filter: ((contact_id)::numeric = 3854.000000)
Total runtime: 654492.320 ms

Definitions for the two relevant tables are:

create table contacts (
    INSTANCE    CHARACTER(1) NOT NULL,
    client_id   varchar(50) not null,
    contact_id  bigint default nextval('contacts_contact_id_seq'::text),
    UNAME       VARCHAR(32) NOT NULL,
    TYPE        VARCHAR(20) NOT NULL DEFAULT 'INDIVIDUAL',
    parent      bigint,
    NAME        VARCHAR(240),
    PHONE       VARCHAR(32),
    FAX         VARCHAR(32),
    EMAIL       VARCHAR(240),
    BASIC_PW    VARCHAR(128),
    DESCRIPTION VARCHAR(240),
    custom_data varchar(8192),
    CONSTRAINT COS_PK PRIMARY KEY(INSTANCE, CLIENT_ID, CONTACT_ID)
);

create table contact_att (
    instance     character(1),
    client_id    varchar(50) not null,
    contact_id   bigint,
    community_id bigint,
    inherited    smallint,
    CONSTRAINT CA_PK PRIMARY KEY(INSTANCE, CLIENT_ID, CONTACT_ID, COMMUNITY_ID)
);

CREATE INDEX CO_PA_IND ON CONTACTS (PARENT);
CREATE INDEX CO_TY_IND ON CONTACTS (TYPE);

Hope you find that useful! If there is anything else I can provide you with then please do let me know.

Pete
--
Peter Wilson - YellowHawk Ltd, http://www.yellowhawk.co.uk
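(Another side note, not part of the original exchange: the plan's own numbers appear to account for the runtime. The inner index scan on contact_att takes ~234.423 ms per pass and is repeated once per outer row, loops=2791. A quick arithmetic check, runnable in any psql session:)

```sql
-- 2791 repetitions of the ~234.423 ms inner scan:
SELECT 2791 * 234.423 / 1000.0 AS approx_total_seconds;
-- yields roughly 654 seconds, matching the reported Total runtime
```

If the plan was built with no statistics, as the quoted reply suggests, running `ANALYZE contacts; ANALYZE contact_att;` after the bulk load should replace the rows=1 estimates with realistic row counts, and would likely lead the planner away from this nested-loop strategy.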