Thread: foreign key constraint, planner ignore index.
Greetings, List. Environment: Linux, (PostgreSQL) 8.3beta4 or (PostgreSQL) 8.2.4, same results. Billing database with two tables. 1. Small table with nodes (23 rows) inms=> \d nodes Table "public.nodes" Column | Type | Modifiers -------------+------------------------+-------------------------------------------------- id | integer | not null default nextval('nodesidseq'::regclass) description | character varying(256) | identifier | character varying(256) | not null Indexes: "nodes_pkey" PRIMARY KEY, btree (id) "NodeIdentifierIndex" UNIQUE, btree (identifier) inms=> analyze verbose nodes; INFO: analyzing "public.nodes" INFO: "nodes": scanned 1 of 1 pages, containing 23 live rows and 4 dead rows; 23 rows in sample, 23 estimated total rows 2. Large table with collected traffic ( 15795383 rows ) inms=> \d sf_ipv4traffic Table "public.sf_ipv4traffic" Column | Type | Modifiers ---------------------+--------------------------+----------- timeframe | integer | not null timemark | timestamp with time zone | not null node | integer | not null source_address | bytea | not null source_port | integer | not null destination_address | bytea | not null destination_port | integer | not null protocol_type | integer | not null octets_counter | bigint | packets_counter | integer | Indexes: "sf_ipv4traffic_pkey" PRIMARY KEY, btree (timeframe, timemark, node, source_address, source_port, destination_address, destination_port, protocol_type) "fki_nodes" btree (node) "sf_ipv4traffic_idx" btree (source_port, timeframe, source_address) "sf_ipv4traffic_idx1" btree (timeframe, node, timemark) "sf_ipv4traffic_idx3" btree (destination_address, destination_port, timeframe) "sf_ipv4traffic_idx4" btree (protocol_type, timeframe) Foreign-key constraints: "nodes" FOREIGN KEY (node) REFERENCES nodes(id) ON UPDATE RESTRICT ON DELETE RESTRICT "sf_ipv4traffic_timeframe_fkey" FOREIGN KEY (timeframe) REFERENCES sf_timeframes(id) ON UPDATE CASCADE ON DELETE RESTRICT inms=> ANALYZE verbose sf_ipv4traffic; INFO: analyzing "public.sf_ipv4traffic" INFO: "sf_ipv4traffic": scanned 3000 of 162839 pages, containing 291000 live rows and 0 dead rows; 3000 rows in sample, 15795383 estimated total rows Problem is. Planner ignore index when delete some node from nodes tables. Test script: begin; --set enable_seqscan to off; delete from decimalnodeattributes where node=2003; delete from stringnodeattributes where node=2003; delete from datenodeattributes where node=2003; delete from topology where fromnode=2003 or tonode=2003; explain analyze delete from nodes where id=2003; rollback; QUERY PLAN ----------------------------------------------------------------------------------------------- Seq Scan on nodes (cost=0.00..1.29 rows=1 width=6) (actual time=0.046..0.047 rows=1 loops=1) Filter: (id = 2003) Trigger for constraint booleannodeattributes_node_fkey: time=1.315 calls=1 Trigger for constraint datenodeattributes_node_fkey: time=0.361 calls=1 Trigger for constraint decimalnodeattributes_node_fkey: time=0.288 calls=1 Trigger for constraint node: time=28.109 calls=1 Trigger for constraint nodes: time=71011.395 calls=1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Trigger for constraint snmp_nodes_access_nodeid_fkey: time=372.504 calls=1 Trigger for constraint stringnodeattributes_node_fkey: time=7.008 calls=1 Trigger for constraint topology_fromnode_fkey: time=0.368 calls=1 Trigger for constraint topology_tonode_fkey: time=0.274 calls=1 Total runtime: 71430.159 ms (12 rows) ------- !!!!!!!! --------------- Trigger for constraint nodes: time=71011.395 calls=1 -------------------------------- But if, turn off <seqscan>, same test begin; set enable_seqscan to off; -- !!! delete from decimalnodeattributes where node=2003; delete from stringnodeattributes where node=2003; delete from datenodeattributes where node=2003; delete from topology where fromnode=2003 or tonode=2003; explain analyze delete from nodes where id=2003; rollback; Index Scan using nodes_pkey on nodes (cost=0.00..8.27 rows=1 width=6) (actual time=0.029..0.033 rows=1 loops=1) Index Cond: (id = 2003) Trigger for constraint booleannodeattributes_node_fkey: time=1.365 calls=1 Trigger for constraint datenodeattributes_node_fkey: time=0.359 calls=1 Trigger for constraint decimalnodeattributes_node_fkey: time=0.252 calls=1 Trigger for constraint node: time=28.197 calls=1 Trigger for constraint nodes: time=1.911 calls=1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Trigger for constraint snmp_nodes_access_nodeid_fkey: time=0.611 calls=1 Trigger for constraint stringnodeattributes_node_fkey: time=0.310 calls=1 Trigger for constraint topology_fromnode_fkey: time=0.351 calls=1 Trigger for constraint topology_tonode_fkey: time=0.289 calls=1 Total runtime: 42.930 ms (12 rows) ------ Additional info: Termination Ctrl+C - during execution of psql dumps BEGIN DELETE 1 DELETE 1 DELETE 1 DELETE 2 Cancel request sent psql:test.sql:7: ERROR: canceling statement due to user request CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."sf_ipv4traffic" x WHERE $1 OPERATOR(pg_catalog.=) "node" FOR SHARE OF x" --- And if i'm execute same statement without access to nodes table planer chose to use index "fki_nodes"!!! explain analyze SELECT 1 FROM ONLY "public"."sf_ipv4traffic" x WHERE 2003 OPERATOR(pg_catalog.=) "node" FOR SHARE OF x; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- Index Scan using fki_nodes on sf_ipv4traffic x (cost=0.00..9.65 rows=1 width=6) (actual time=0.019..0.019 rows=0 loops=1) Index Cond: (2003 = node) Total runtime: 0.089 ms (3 rows) --- Any suggesions?, thanks. -- __________________________________ WBR, Andrew Nesheret ICQ:10518066
Andrew Nesheret wrote: > > And if i'm execute same statement without access to nodes table planer > chose to use index "fki_nodes"!!! > > explain analyze SELECT 1 FROM ONLY "public"."sf_ipv4traffic" x WHERE > 2003 OPERATOR(pg_catalog.=) "node" FOR SHARE OF x; > > QUERY > PLAN > ---------------------------------------------------------------------------------------------------------------------------- > > Index Scan using fki_nodes on sf_ipv4traffic x (cost=0.00..9.65 rows=1 > width=6) (actual time=0.019..0.019 rows=0 loops=1) > Index Cond: (2003 = node) > Total runtime: 0.089 ms > (3 rows) If you PREPARE .... then EXECUTE the same query, does it still use the index? The only thing I can think of is that the trigger is planning the query to work for any given value and you have a lot of rows with e.g. node=2004. -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > If you PREPARE .... then EXECUTE the same query, does it still use the > index? Done, same result. --- Code --- stmt = connection.prepareStatement( "explain analyze SELECT 1 FROM ONLY sf_ipv4traffic x WHERE ? OPERATOR(pg_catalog.=) node FOR SHARE OF x" ); stmt.setLong( 1, 2004 ); rs = stmt.executeQuery(); while(rs.next()) { System.out.println( rs.getString(1) ); } -------------- Output: Index Scan using fki_nodes on sf_ipv4traffic x (cost=0.00..9.65 rows=1 width=6) (actual time=0.063..0.063 rows=0 loops=1) Index Cond: ($1 = node) Total runtime: 0.131 ms ---- Test again with node=165 Seq Scan on sf_ipv4traffic x (cost=0.00..360281.29 rows=15795383 width=6) (actual time=0.041..105833.129 rows=15795376 loops=1) Filter: ($1 = node) Total runtime: 414955.170 ms Planner/optimiser choice different strategies on different node IDS! But 'foreign key constraint checker' always use one strategy. > The only thing I can think of is that the trigger is planning the > query to work for any given value and you have a lot of rows with e.g. > node=2004. > Child table sf_ipv4traffic, contains only ONE value (in node column) and this node never deleted ALL 15 millions rows referenced to node 156 inms=> select node from sf_ipv4traffic limit 10; node ------ 156 156 156 156 156 156 156 156 156 156 (10 rows) -- __________________________________ WBR, Andrew Nesheret ICQ:10518066
Andrew Nesheret wrote: > Richard Huxton wrote: >> If you PREPARE .... then EXECUTE the same query, does it still use the >> index? > Done, same result. > --- Code --- > stmt = connection.prepareStatement( "explain analyze SELECT 1 FROM > ONLY sf_ipv4traffic x WHERE ? OPERATOR(pg_catalog.=) node FOR SHARE OF > x" ); > stmt.setLong( 1, 2004 ); > rs = stmt.executeQuery(); > while(rs.next()) > { > System.out.println( rs.getString(1) ); > } No, this will produce different plans - what happens if you supply each parameter in turn - the second plan should stay the same as the first. -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > Andrew Nesheret wrote: >> Richard Huxton wrote: >>> If you PREPARE .... then EXECUTE the same query, does it still use >>> the index? Tested, comments? --- code ----- --- result --- Result for node #2007 Index Scan using fki_nodes on sf_ipv4traffic x (cost=0.00..9.65 rows=1 width=6) (actual time=0.060..0.060 rows=0 loops=1) Index Cond: ($1 = node) Total runtime: 0.131 ms Result for node #156 Seq Scan on sf_ipv4traffic x (cost=0.00..360281.29 rows=15795383 width=6) (actual time=0.048..108939.142 rows=15795376 loops=1) Filter: ($1 = node) Total runtime: 365989.953 ms ---- UP(subject), we are go away from my problem, in case of integrity check with foreign key. -- __________________________________ WBR, Andrew Nesheret ICQ:10518066
Forget to add test source code. --- code --- long [] IDs = new long[] { 2007, // no references 156 // all rows in sf_ipv4traffic referenced to }; stmt = connection.prepareStatement( "explain analyze SELECT 1 FROM ONLY sf_ipv4traffic x WHERE ? OPERATOR(pg_catalog.=) node FOR SHARE OF x" ); for ( long nodeID : IDs ) { stmt.setLong( 1, nodeID ); rs = stmt.executeQuery(); System.out.println( "Result for node #" + nodeID ); while ( rs.next() ) System.out.println( " " + rs.getString( 1 ) ); } rs.close(); connection.close(); --- -- __________________________________ WBR, Andrew Nesheret ICQ:10518066
Andrew Nesheret wrote: > Richard Huxton wrote: >> Andrew Nesheret wrote: >>> Richard Huxton wrote: >>>> If you PREPARE .... then EXECUTE the same query, does it still use >>>> the index? > Tested, comments? > > --- code ----- > > --- result --- > Result for node #2007 > Index Scan using fki_nodes on sf_ipv4traffic x (cost=0.00..9.65 rows=1 > width=6) (actual time=0.060..0.060 rows=0 loops=1) > Index Cond: ($1 = node) > Total runtime: 0.131 ms > Result for node #156 > Seq Scan on sf_ipv4traffic x (cost=0.00..360281.29 rows=15795383 > width=6) (actual time=0.048..108939.142 rows=15795376 loops=1) > Filter: ($1 = node) > Total runtime: 365989.953 ms Since you are getting different plans, this can't be a planned query. Just try feeding a text-file with some SQL PREPARE/EXECUTEs to psql - you can EXPLAIN ANALYSE EXECUTE ... -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > Since you are getting different plans, this can't be a planned query. > Just try feeding a text-file with some SQL PREPARE/EXECUTEs to psql - > you can EXPLAIN ANALYSE EXECUTE ... Test 1. with set enable_seqscan to on; set enable_seqscan to on; prepare testStatement (int) as SELECT 1 FROM ONLY sf_ipv4traffic x WHERE $1 OPERATOR(pg_catalog.=) node FOR SHARE OF x; EXPLAIN ANALYZE execute testStatement( 2007 ); EXPLAIN ANALYZE execute testStatement( 156 ); SET PREPARE QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Seq Scan on sf_ipv4traffic x (cost=0.00..360281.29 rows=15795383 width=6) (actual time=129082.768..129082.768 rows=0 loops=1) Filter: ($1 = node) Total runtime: 129131.315 ms (3 rows) QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- Seq Scan on sf_ipv4traffic x (cost=0.00..360281.29 rows=15795383 width=6) (actual time=0.042..105788.088 rows=15795376 loops=1) Filter: ($1 = node) Total runtime: 420342.751 ms (3 rows) Test 2. with set enable_seqscan to on; set enable_seqscan to off; prepare testStatement (int) as SELECT 1 FROM ONLY sf_ipv4traffic x WHERE $1 OPERATOR(pg_catalog.=) node FOR SHARE OF x; EXPLAIN ANALYZE execute testStatement( 2007 ); EXPLAIN ANALYZE execute testStatement( 156 ); SET PREPARE QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Index Scan using fki_nodes on sf_ipv4traffic x (cost=0.00..577918.84 rows=15795383 width=6) (actual time=93.810..93.810 rows=0 loops=1) Index Cond: ($1 = node) Total runtime: 93.944 ms (3 rows) QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using fki_nodes on sf_ipv4traffic x (cost=0.00..577918.84 rows=15795383 width=6) (actual time=13.725..157090.878 rows=15795376 loops=1) Index Cond: ($1 = node) Total runtime: 445145.901 ms (3 rows) PS: Test 2 is more applicable, for both types of nodes a) for nodes with no statistics b) for nodes with statistics in sf_ipv4traffic too. -- __________________________________ WBR, Andrew Nesheret ICQ:10518066
Andrew Nesheret wrote: > set enable_seqscan to on; > > prepare testStatement (int) as > SELECT 1 FROM ONLY sf_ipv4traffic x WHERE $1 OPERATOR(pg_catalog.=) > node FOR SHARE OF x; > > EXPLAIN ANALYZE execute testStatement( 2007 ); > EXPLAIN ANALYZE execute testStatement( 156 ); > QUERY PLAN > Seq Scan on sf_ipv4traffic x (cost=0.00..360281.29 rows=15795383 > Total runtime: 129131.315 ms > Seq Scan on sf_ipv4traffic x (cost=0.00..360281.29 rows=15795383 > Total runtime: 420342.751 ms > set enable_seqscan to off; > Index Scan using fki_nodes on sf_ipv4traffic x (cost=0.00..577918.84 > Total runtime: 93.944 ms > Index Scan using fki_nodes on sf_ipv4traffic x (cost=0.00..577918.84 > Total runtime: 445145.901 ms OK, so your cost estimates are about 360,000 for seq-scan and 578,000 for index-scan. Of course the row estimates are fixed regardless of the value you test so the estimated cost is the same for both 2007 and 156. However, the timings aren't in line with the costs for node=156, and you'd hope they would be. That should be fixable by tweaking the planner cost settings (see chapter "17.6.2 planner cost constants"), particularly effective_cache_size and random_page cost I'd guess. I'm putting together a small test case to see if I can reproduce your behaviour here. -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > > I'm putting together a small test case to see if I can reproduce your > behaviour here. Does the attached small script misbehave in the same way as your real data? From here it works fine when the fkey is ON ... RESTRICT. I'm right in thinking that your "nodes" fkey is RESTRICT on update and delete? -- Richard Huxton Archonet Ltd CREATE TABLE testnode (node integer, nodename text, PRIMARY KEY (node)); CREATE TABLE traffic (id SERIAL, node integer NOT NULL, ts TIMESTAMP(0) WITH TIME ZONE NOT NULL, msg text); INSERT INTO testnode SELECT s, 'node number ' || s FROM generate_series(1,25) s; INSERT INTO traffic (id, node, ts, msg) SELECT s, 11, 'epoch'::timestamptz + s * '1 second'::interval, null FROM generate_series(1,9999999)s; ALTER TABLE traffic ADD CONSTRAINT traffic_node_fkey FOREIGN KEY (node) REFERENCES testnode (node) ON UPDATE RESTRICT ONDELETE RESTRICT; CREATE INDEX traffic_node_idx ON traffic (node); BEGIN; EXPLAIN ANALYSE DELETE FROM testnode WHERE node=9; ROLLBACK; BEGIN; EXPLAIN ANALYSE DELETE FROM testnode WHERE node=11; ROLLBACK;
Richard Huxton wrote: > Does the attached small script misbehave in the same way as your real > data? From here it works fine when the fkey is ON ... RESTRICT. > > I'm right in thinking that your "nodes" fkey is RESTRICT on update and > delete? > Yes, data is real and my my foreign key contraint is "restrict" (i'm understand this is very important). -- __________________________________ WBR, Andrew Nesheret ICQ:10518066
Andrew Nesheret wrote: > Richard Huxton wrote: >> Does the attached small script misbehave in the same way as your real >> data? From here it works fine when the fkey is ON ... RESTRICT. >> >> I'm right in thinking that your "nodes" fkey is RESTRICT on update and >> delete? >> > Yes, data is real and my my foreign key contraint is "restrict" (i'm > understand this is very important). Well, the RESTRICT bit is because with that test script in my last email it's instant here on PostgreSQL 8.2 (not bothered testing 8.3beta). The test script generates 10 million rows in the "traffic" table which should be close enough to replicate your problem. -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > Richard Huxton wrote: >> >> I'm putting together a small test case to see if I can reproduce your >> behaviour here. > > Does the attached small script misbehave in the same way as your real > data? From here it works fine when the fkey is ON ... RESTRICT. > > I'm right in thinking that your "nodes" fkey is RESTRICT on update and > delete? > You script is worked fine on same database, where is difference? -- output --- psql xxxx xxxxx -f fkey_index_prob.sql psql:fkey_index_prob.sql:2: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "testnode_pkey" for table "testnode" CREATE TABLE psql:fkey_index_prob.sql:3: NOTICE: CREATE TABLE will create implicit sequence "traffic_id_seq" for serial column "traffic.id" CREATE TABLE INSERT 0 25 INSERT 0 9999999 ALTER TABLE CREATE INDEX BEGIN QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Index Scan using testnode_pkey on testnode (cost=0.00..8.27 rows=1 width=6) (actual time=43.480..43.485 rows=1 loops=1) Index Cond: (node = 9) Trigger for constraint traffic_node_fkey: time=106.053 calls=1 Total runtime: 165.925 ms ~~~~~~~~~~~~~~~~~~~~~~~~~~ GOOD! (4 rows) ROLLBACK BEGIN psql:fkey_index_prob.sql:16: ERROR: update or delete on table "testnode" violates foreign key constraint "traffic_node_fkey" on table "traffic" DETAIL: Key (node)=(11) is still referenced from table "traffic". ROLLBACK --- My database --------------- CREATE TABLE nodes ( id integer NOT NULL DEFAULT nextval('nodesidseq'::regclass), description character varying(256), identifier character varying(256) NOT NULL, CONSTRAINT nodes_pkey PRIMARY KEY (id) ) WITH (OIDS=FALSE); ALTER TABLE nodes OWNER TO inms; -- Index: "NodeIdentifierIndex" -- DROP INDEX "NodeIdentifierIndex"; CREATE UNIQUE INDEX "NodeIdentifierIndex" ON nodes USING btree (identifier); CREATE TABLE sf_ipv4traffic ( timeframe integer NOT NULL, timemark timestamp with time zone NOT NULL, node integer NOT NULL, source_address bytea NOT NULL, source_port integer NOT NULL, destination_address bytea NOT NULL, destination_port integer NOT NULL, protocol_type integer NOT NULL, octets_counter bigint, packets_counter integer, CONSTRAINT sf_ipv4traffic_pkey PRIMARY KEY (timeframe, timemark, node, source_address, source_port, destination_address, destination_port, protocol_type), CONSTRAINT nodes FOREIGN KEY (node) REFERENCES nodes (id) MATCH SIMPLE ON UPDATE RESTRICT ON DELETE RESTRICT, CONSTRAINT sf_ipv4traffic_timeframe_fkey FOREIGN KEY (timeframe) REFERENCES sf_timeframes (id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE RESTRICT ) WITH (OIDS=FALSE); ALTER TABLE sf_ipv4traffic OWNER TO inms; -- Index: fki_nodes -- DROP INDEX fki_nodes; CREATE INDEX fki_nodes ON sf_ipv4traffic USING btree (node); -- Index: sf_ipv4traffic_idx -- DROP INDEX sf_ipv4traffic_idx; CREATE INDEX sf_ipv4traffic_idx ON sf_ipv4traffic USING btree (source_port, timeframe, source_address); -- Index: sf_ipv4traffic_idx1 -- DROP INDEX sf_ipv4traffic_idx1; CREATE INDEX sf_ipv4traffic_idx1 ON sf_ipv4traffic USING btree (timeframe, node, timemark); -- Index: sf_ipv4traffic_idx3 -- DROP INDEX sf_ipv4traffic_idx3; CREATE INDEX sf_ipv4traffic_idx3 ON sf_ipv4traffic USING btree (destination_address, destination_port, timeframe); -- Index: sf_ipv4traffic_idx4 -- DROP INDEX sf_ipv4traffic_idx4; CREATE INDEX sf_ipv4traffic_idx4 ON sf_ipv4traffic USING btree (protocol_type, timeframe); Other tables definition skipped..... --- test1.sql --------- begin; --set enable_seqscan to off; delete from decimalnodeattributes where node=2007; delete from stringnodeattributes where node=2007; delete from datenodeattributes where node=2007; delete from topology where fromnode=2007 or tonode=2007; explain analyze delete from nodes where id=2007; rollback; --- output --- inms=> \i test1.sql BEGIN DELETE 0 DELETE 1 DELETE 1 DELETE 2 QUERY PLAN ----------------------------------------------------------------------------------------------- Seq Scan on nodes (cost=0.00..1.29 rows=1 width=6) (actual time=0.034..0.036 rows=1 loops=1) Filter: (id = 2007) Trigger for constraint booleannodeattributes_node_fkey: time=89.885 calls=1 Trigger for constraint datenodeattributes_node_fkey: time=0.466 calls=1 Trigger for constraint decimalnodeattributes_node_fkey: time=0.268 calls=1 Trigger for constraint node: time=369.983 calls=1 Trigger for constraint nodes: time=64278.862 calls=1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~ BAD. Trigger for constraint snmp_nodes_access_nodeid_fkey: time=41.661 calls=1 Trigger for constraint stringnodeattributes_node_fkey: time=0.408 calls=1 Trigger for constraint topology_fromnode_fkey: time=0.308 calls=1 Trigger for constraint topology_tonode_fkey: time=0.277 calls=1 Total runtime: 64814.359 ms ~~~~~~~~~~~~~~~~~~~~~~~~~~~ BAD. (12 rows) ROLLBACK -- __________________________________ WBR, Andrew Nesheret ICQ:10518066
Andrew Nesheret wrote: > Richard Huxton wrote: >> Richard Huxton wrote: >>> >>> I'm putting together a small test case to see if I can reproduce your >>> behaviour here. >> >> Does the attached small script misbehave in the same way as your real >> data? From here it works fine when the fkey is ON ... RESTRICT. >> >> I'm right in thinking that your "nodes" fkey is RESTRICT on update and >> delete? >> > You script is worked fine on same database, where is difference? Good question. There *must* be a difference. Why can't either of us see it? 1. Try adding another 5 million rows to the test "traffic" table and see if that makes any difference. It shouldn't. 2. Run a "vacuum verbose sf_ipv4traffic" and see if there's a lot of dead rows. I shouldn't have thought there are. We might want to move this to the performance mailing-list, and see if anyone has an idea over there. -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > > > 1. Try adding another 5 million rows to the test "traffic" table and > see if that makes any difference. It shouldn't. Opps. 1. Step drop table testnode cascade; drop table traffic; CREATE TABLE testnode (node integer, nodename text, PRIMARY KEY (node)); CREATE TABLE traffic (id SERIAL, node integer NOT NULL, ts TIMESTAMP(0) WITH TIME ZONE NOT NULL, msg text); INSERT INTO testnode SELECT s, 'node number ' || s FROM generate_series(1,25) s; INSERT INTO traffic (id, node, ts, msg) SELECT s, 11, 'epoch'::timestamptz + s * '1 second'::interval, null FR OM generate_series(1,15999999) s; ALTER TABLE traffic ADD CONSTRAINT traffic_node_fkey FOREIGN KEY (node) REFERENCES testnode (node) ON UPDATE RESTRICT ON DELETE RESTRICT; CREATE INDEX traffic_node_idx ON traffic (node); BEGIN; EXPLAIN ANALYSE DELETE FROM testnode WHERE node=9; ROLLBACK; BEGIN; EXPLAIN ANALYSE DELETE FROM testnode WHERE node=11; ROLLBACK; --- output --- inms=> \i fkey_index_prob.sql psql:fkey_index_prob.sql:1: NOTICE: drop cascades to constraint traffic_node_fkey on table traffic DROP TABLE DROP TABLE psql:fkey_index_prob.sql:4: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "testnode_pkey" for table "testnode" CREATE TABLE psql:fkey_index_prob.sql:5: NOTICE: CREATE TABLE will create implicit sequence "traffic_id_seq" for serial column "traffic.id" CREATE TABLE INSERT 0 25 INSERT 0 15999999 ALTER TABLE CREATE INDEX BEGIN QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Index Scan using testnode_pkey on testnode (cost=0.00..8.27 rows=1 width=6) (actual time=45.494..45.509 rows=1 loops=1) Index Cond: (node = 9) Trigger for constraint traffic_node_fkey: time=459.164 calls=1 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Good. Total runtime: 656.148 ms (4 rows) ROLLBACK BEGIN psql:fkey_index_prob.sql:18: ERROR: update or delete on table "testnode" violates foreign key constraint "traffic_node_fkey" on table "traffic" DETAIL: Key (node)=(11) is still referenced from table "traffic". ROLLBACK 2. Step Run script again w/o creating data. *NO ANY MODIFICATIONS* to database. BEGIN; EXPLAIN ANALYSE DELETE FROM testnode WHERE node=9; ROLLBACK; BEGIN; EXPLAIN ANALYSE DELETE FROM testnode WHERE node=11; ROLLBACK; --- output ---- BEGIN QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Index Scan using testnode_pkey on testnode (cost=0.00..8.27 rows=1 width=6) (actual time=0.041..0.043 rows=1 loops=1) Index Cond: (node = 9) Trigger for constraint traffic_node_fkey: time=41469.620 calls=1 ~~~~~~~~~~~~~~~~~~~~ BAD Total runtime: 41497.467 ms (4 rows) ROLLBACK BEGIN psql:fkey_index_prob.sql:18: ERROR: update or delete on table "testnode" violates foreign key constraint "traffic_node_fkey" on table "traffic" DETAIL: Key (node)=(11) is still referenced from table "traffic". ROLLBACK ----- 3. Step Recreate data with 4999999 rows (SAME DIFFERENT results on first execute and second!!!) First: CREATE TABLE INSERT 0 25 INSERT 0 4999999 ALTER TABLE CREATE INDEX BEGIN QUERY PLAN -------------------------------------------------------------------------------------------------------------------------- Index Scan using testnode_pkey on testnode (cost=0.00..8.27 rows=1 width=6) (actual time=25.050..25.054 rows=1 loops=1) Index Cond: (node = 9) Trigger for constraint traffic_node_fkey: time=18.339 calls=1 ~~~~~~~~~~~~~~~~~~~~~~ GOOD! Total runtime: 43.519 ms (4 rows) Second: BEGIN QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Index Scan using testnode_pkey on testnode (cost=0.00..8.27 rows=1 width=6) (actual time=0.114..0.116 rows=1 loops=1) Index Cond: (node = 9) Trigger for constraint traffic_node_fkey: time=7183.677 calls=1 ~~~~~~~~~~~~~~~~~~~ Perfomance degradation!!! Total runtime: 7183.928 ms (4 rows) ROLLBACK You comments? > > 2. Run a "vacuum verbose sf_ipv4traffic" and see if there's a lot of > dead rows. I shouldn't have thought there are. No dead rows. inms=> VACUUM VERBOSE sf_ipv4traffic; INFO: vacuuming "public.sf_ipv4traffic" INFO: index "sf_ipv4traffic_pkey" now contains 15795376 row versions in 122709 pages DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 3.88s/0.52u sec elapsed 38.44 sec. INFO: index "fki_nodes" now contains 15795376 row versions in 34664 pages DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 1.04s/0.07u sec elapsed 13.34 sec. INFO: index "sf_ipv4traffic_idx" now contains 15795376 row versions in 60822 pages DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 1.69s/0.14u sec elapsed 16.71 sec. INFO: index "sf_ipv4traffic_idx1" now contains 15795376 row versions in 60822 pages DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 1.90s/0.19u sec elapsed 17.77 sec. INFO: index "sf_ipv4traffic_idx3" now contains 15795376 row versions in 60822 pages DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 1.90s/0.19u sec elapsed 16.97 sec. INFO: index "sf_ipv4traffic_idx4" now contains 15795376 row versions in 43311 pages DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 1.38s/0.13u sec elapsed 12.09 sec. INFO: "sf_ipv4traffic": found 0 removable, 15795376 nonremovable row versions in 162839 pages DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 1 pages contain useful free space. 0 pages are entirely empty. CPU 21.91s/4.93u sec elapsed 252.77 sec. INFO: vacuuming "pg_toast.pg_toast_16555" INFO: index "pg_toast_16555_index" now contains 0 row versions in 1 pages DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "pg_toast_16555": found 0 removable, 0 nonremovable row versions in 0 pages DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.01 sec. -- __________________________________ WBR, Andrew Nesheret ICQ:10518066
Andrew Nesheret wrote: > Richard Huxton wrote: >> >> >> 1. Try adding another 5 million rows to the test "traffic" table and >> see if that makes any difference. It shouldn't. > Opps. > 1. Step [snip re-running of script] > -------------------------------------------------------------------------------------------------------------------------- > > Index Scan using testnode_pkey on testnode (cost=0.00..8.27 rows=1 > width=6) (actual time=45.494..45.509 rows=1 loops=1) > Index Cond: (node = 9) > Trigger for constraint traffic_node_fkey: time=459.164 calls=1 > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Good. > Total runtime: 656.148 ms > 2. Step > Run script again w/o creating data. *NO ANY MODIFICATIONS* to database. > ------------------------------------------------------------------------------------------------------------------------ > > Index Scan using testnode_pkey on testnode (cost=0.00..8.27 rows=1 > width=6) (actual time=0.041..0.043 rows=1 loops=1) > Index Cond: (node = 9) > Trigger for constraint traffic_node_fkey: time=41469.620 calls=1 > ~~~~~~~~~~~~~~~~~~~~ BAD > Total runtime: 41497.467 ms Hmm - not seeing that here. Is it just that your machine has a very variable workload? The times above are far enough apart from the times below that I'm not sure they can be trusted. What if you run it 10 times - do the times stay consistent? > 3. Step Recreate data with 4999999 rows (SAME DIFFERENT results on first > execute and second!!!) > -------------------------------------------------------------------------------------------------------------------------- > > Index Scan using testnode_pkey on testnode (cost=0.00..8.27 rows=1 > width=6) (actual time=25.050..25.054 rows=1 loops=1) > Index Cond: (node = 9) > Trigger for constraint traffic_node_fkey: time=18.339 calls=1 > ~~~~~~~~~~~~~~~~~~~~~~ GOOD! > Total runtime: 43.519 ms > ------------------------------------------------------------------------------------------------------------------------ > > Index Scan using testnode_pkey on testnode (cost=0.00..8.27 rows=1 > width=6) (actual time=0.114..0.116 rows=1 loops=1) > Index Cond: (node = 9) > Trigger for constraint traffic_node_fkey: time=7183.677 calls=1 There's no reason for changes in timing here - the traffic table isn't updated by the delete, only the testnode table and that's small enough not to matter. >> 2. Run a "vacuum verbose sf_ipv4traffic" and see if there's a lot of >> dead rows. I shouldn't have thought there are. > No dead rows. [snip vacuum verbose output] No, so that can't be anything to do with it. -- Richard Huxton Archonet Ltd
Richard Huxton wrote: > Hmm - not seeing that here. Is it just that your machine has a very > variable workload? The times above are far enough apart from the times > below that I'm not sure they can be trusted. > > What if you run it 10 times - do the times stay consistent? stay consistent and this test has exclusive access to postgres and no other activites on this system. Thank you Richard! And my question is. What should i do further ? -- __________________________________ WBR, Andrew Nesheret ICQ:10518066