Performance of query (fwd) - Mailing list pgsql-general
From | Edmund Dengler |
---|---|
Subject | Performance of query (fwd) |
Date | |
Msg-id | Pine.BSO.4.44.0306102315180.9781-100000@cyclops4.esentire.com Whole thread Raw |
Responses |
Re: Performance of query (fwd)
|
List | pgsql-general |
Hmmm, it's been a couple of days, and this still hasn't appeared on the mailing list as far as I can tell. This is a resend. ---------- Forwarded message ---------- Date: Mon, 9 Jun 2003 00:23:38 -0400 (EDT) From: Edmund Dengler <edmundd@eSentire.com> To: pgsql-general@postgresql.org Subject: Performance of query Ok, now I am really confused. I have a query that runs really slow, yet when I code it up as a function it runs very fast! The optimizer seems to not evaluate a good method for performing the query. Background: This is a database based off of Snort. There is a base table called <event> (with <sid>=sensor ID, <cid>=event ID, <timestamp>=when event collected, <signature>=which event was recognized), and a number of depending tables for various bits and pieces of network packet data (<iphdr>=IP header info, <tcphdr>=TCP header info, <data>=packet payload) and <signature> (what event). Notes: Running 7.3.3 Database is around 20GB on disk. => select version(); version --------------------------------------------------------------------- PostgreSQL 7.3.3 on i386-unknown-openbsd3.0, compiled by GCC 2.95.3 (1 row) => \d event Table "public.event" Column | Type | Modifiers -----------+--------------------------+----------- sid | integer | not null cid | bigint | not null signature | integer | not null timestamp | timestamp with time zone | not null Indexes: event_pkey primary key btree (sid, cid), event_pull_idx unique btree ("timestamp", sid, cid), signature_idx btree (signature), timestamp_idx btree ("timestamp") Triggers: event_summary => \d signature Table "public.signature" Column | Type | Modifiers --------------+---------+---------------------------------------------------------- sig_id | integer | not null default nextval('"signature_sig_id_seq"'::text) <<...more columns...>> Indexes: signature_pkey primary key btree (sig_id), sig_class_id btree (sig_class_id), sig_name_idx btree (sig_name) Rest of the tables are primary keyed with (sid, cid). For example, => \d iphdr Table "public.iphdr" Column | Type | Modifiers ----------+----------+----------- sid | integer | not null cid | bigint | not null <<...more columns...>> Indexes: iphdr_pkey primary key btree (sid, cid), ip_dst_idx btree (ip_dst), ip_src_idx btree (ip_src) Triggers: iphdr_summary If I run the query: explain analyze select event.sid, event.cid, event.timestamp, signature.sig_name, iphdr.ip_src, iphdr.ip_dst, icmphdr.icmp_type, icmphdr.icmp_code, icmphdr.icmp_csum, icmphdr.icmp_id, icmphdr.icmp_seq, udphdr.udp_sport, udphdr.udp_dport, udphdr.udp_len, udphdr.udp_csum, tcphdr.tcp_sport, tcphdr.tcp_dport, tcphdr.tcp_seq, tcphdr.tcp_ack, tcphdr.tcp_off, tcphdr.tcp_res, tcphdr.tcp_flags, tcphdr.tcp_win, tcphdr.tcp_csum, tcphdr.tcp_urp, sensor.hostname, sensor.interface, data.data_payload from ( select * from event where timestamp > (select now() - '2 hours'::interval) and exists (select 1 from hack_pull_sid where sid = event.sid) ) as event left join signature on signature.sig_id = event.signature left join iphdr on iphdr.sid = event.sid and iphdr.cid = event.cid left join icmphdr on icmphdr.sid = event.sid and icmphdr.cid = event.cid left join udphdr on udphdr.sid = event.sid and udphdr.cid = event.cid left join tcphdr on tcphdr.sid = event.sid and tcphdr.cid = event.cid left join sensor on sensor.sid = event.sid left join data on data.sid = event.sid and data.cid = event.cid order by timestamp desc, event.sid desc, event.cid desc ; (note: <hack_pull_sid> is a table of SIDs I am interested in so that I avoid the issues with IN) I get the following output: -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=1712181.78..1712776.52 rows=237893 width=853) (actual time=163484.81..163484.97 rows=129 loops=1) Sort Key: public.event."timestamp", public.event.sid, public.event.cid InitPlan -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.04..0.04 rows=1 loops=1) -> Merge Join (cost=1471148.73..1551631.57 rows=237893 width=853) (actual time=152693.86..163484.26 rows=129 loops=1) Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid)) -> Merge Join (cost=728456.82..802284.20 rows=237893 width=258) (actual time=33652.33..39050.11 rows=129 loops=1) Merge Cond: ("outer".sid = "inner".sid) -> Merge Join (cost=728454.78..798713.57 rows=237893 width=226) (actual time=33635.79..39031.06 rows=129loops=1) Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid)) -> Merge Join (cost=728454.78..736750.59 rows=237893 width=172) (actual time=22767.67..24071.84rows=129loops=1) Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid)) -> Merge Join (cost=728454.78..731219.48 rows=237893 width=144) (actual time=22164.44..22681.23rows=129 loops=1) Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid)) -> Sort (cost=701134.84..701729.57 rows=237893 width=116) (actual time=15343.87..15344.05rows=129 loops=1) Sort Key: public.event.sid, public.event.cid -> Merge Join (cost=590896.58..671684.37 rows=237893 width=116) (actual time=10857.65..15343.57rows=129 loops=1) Merge Cond: (("outer".sid = "inner".sid) AND ("outer".cid = "inner".cid)) -> Index Scan using iphdr_pkey on iphdr (cost=0.00..73422.89 rows=1432042width=28) (actual time=23.19..13810.58 rows=687918 loops=1) -> Sort (cost=590896.58..591491.31 rows=237893 width=88) (actual time=101.79..101.95rows=129 loops=1) Sort Key: public.event.sid, public.event.cid -> Hash Join (cost=185.07..563108.55 rows=237893 width=88) (actual time=97.42..101.39rows=129 loops=1) Hash Cond: ("outer".signature = "inner".sig_id) -> Index Scan using timestamp_idx on event (cost=0.00..558165.62rows=237893 width=24) (actual time=0.18..3.05 rows=129 loops=1) Index Cond: ("timestamp" > $0) Filter: (subplan) SubPlan -> Seq Scan on hack_pull_sid (cost=0.00..1.15 rows=1 width=0)(actual time=0.01..0.01 rows=1 loops=171) Filter: (sid = $1) -> Hash (cost=167.26..167.26 rows=7126 width=64) (actual time=97.12..97.12rows=0 loops=1) -> Seq Scan on signature (cost=0.00..167.26 rows=7126width=64)(actual time=0.01..80.86 rows=7126 loops=1) -> Sort (cost=27319.94..27897.87 rows=231171 width=28) (actual time=6606.52..7073.38 rows=110717loops=1) Sort Key: icmphdr.sid, icmphdr.cid -> Seq Scan on icmphdr (cost=0.00..3784.71 rows=231171 width=28) (actual time=19.81..2082.01rows=231303 loops=1) -> Index Scan using udphdr_pkey on udphdr (cost=0.00..4411.44 rows=115228 width=28) (actualtime=60.23..1269.61 rows=54669 loops=1) -> Index Scan using tcphdr_pkey on tcphdr (cost=0.00..56586.06 rows=1139141 width=54) (actual time=88.72..13878.52rows=484419 loops=1) -> Sort (cost=2.04..2.11 rows=30 width=32) (actual time=16.46..16.64 rows=136 loops=1) Sort Key: sensor.sid -> Seq Scan on sensor (cost=0.00..1.30 rows=30 width=32) (actual time=16.26..16.34 rows=30 loops=1) -> Sort (cost=742691.92..745059.80 rows=947154 width=595) (actual time=108643.60..123322.67 rows=417145 loops=1) Sort Key: data.sid, data.cid -> Seq Scan on data (cost=0.00..51821.54 rows=947154 width=595) (actual time=9.67..54494.61 rows=947324loops=1) Total runtime: 164147.78 msec (43 rows) For some runs, I was getting over 800,000 msec!! Now, this seems very strange for only 129 rows. The inner select returns exactly the 129 rows I am interested in, and now we only need to do a few left joins to get the corresponding rows (which may or may not exist, which is why the LEFT JOIN's). Note that the (<sid>,<cid>) is the primary key for most of these joins, and so the LEFT JOIN should be able to recognize that there will only be one or zero rows matching. In addition, (<sid>,<cid>) is the primary key for the <event> table. Similar discussion with <signature> as <sig_id> is it's primary key. Notes: - The table <event> has 1,427,411 rows - <event>.<timestamp> ranges from 2002-01-19 12:28:29-05 to 2003-06-08 23:40:01-04 - The last 2 hours is a very small range assuming an even distribution - All tables are using 100 bins for statistics - A VACUUM ANALYZE was performed on the entire database before this test As can be seem, the estimator seems to think that 237,893 rows will be returned from the INNER SELECT out of 1,427,411 rows. Can anyone explain how this number could be calculated? Even assuming a straight linear assumption, this should fall into 2hours/(24hours/day*400days) rough estimate which is (very roughly) 0.02% of the table. So, I wrote a function. (Please, no remarks on style, it was a quick hack to compare performance. I am sure it could be made better and more optimized). It simply performs the inner SELECT, and then for each row, does the outer select for the match. CREATE TYPE hack_pull_type AS ( sid int, cid bigint, timestamp timestamp with time zone, sig_name text, ip_src bigint, ip_dst bigint, icmp_type smallint, icmp_code smallint, icmp_csum int, icmp_id int, icmp_seq int, udp_sport int, udp_dport int, udp_len int, udp_csum int, tcp_sport int, tcp_dport int, tcp_seq bigint, tcp_ack bigint, tcp_off smallint, tcp_res smallint, tcp_flags smallint, tcp_win int, tcp_csum int, tcp_urp int, hostname text, interface text, data_payload text ); CREATE OR REPLACE FUNCTION hack_pull_func() RETURNS SETOF hack_pull_type AS ' DECLARE event_rec event%ROWTYPE; -- pull_rec hack_pull_type; pull_rec RECORD; BEGIN FOR event_rec IN SELECT * FROM event WHERE timestamp > (SELECT now() - ''2 hours''::interval) AND EXISTS (SELECT 1 FROM hack_pull_sid WHERE sid = event.sid) ORDER BY timestamp desc, sid desc, cid desc LOOP SELECT event.sid, event.cid, event.timestamp, sig_name, ip_src, ip_dst, icmp_type, icmp_code, icmp_csum, icmp_id, icmp_seq, udp_sport, udp_dport, udp_len, udp_csum, tcp_sport, tcp_dport, tcp_seq, tcp_ack, tcp_off, tcp_res, tcp_flags, tcp_win, tcp_csum, tcp_urp, sensor.hostname, sensor.interface, data.data_payload INTO pull_rec FROM event left join signature on signature.sig_id = event.signature left join iphdr on iphdr.sid = event.sid and iphdr.cid = event.cid left join icmphdr on icmphdr.sid = event.sid and icmphdr.cid = event.cid left join udphdr on udphdr.sid = event.sid and udphdr.cid = event.cid left join tcphdr on tcphdr.sid = event.sid and tcphdr.cid = event.cid left join sensor on sensor.sid = event.sid left join data on data.sid = event.sid and data.cid = event.cid WHERE event.cid = event_rec.cid and event.sid = event_rec.sid LIMIT 1; RETURN NEXT pull_rec; END LOOP; RETURN; END; ' LANGUAGE 'plpgsql'; => explain analyze select * from hack_pull_func(); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Function Scan on hack_pull_func (cost=0.00..12.50 rows=1000 width=238) (actual time=5259.46..5259.62 rows=23 loops=1) Total runtime: 5259.70 msec (2 rows) => explain analyze select * from hack_pull_func(); QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Function Scan on hack_pull_func (cost=0.00..12.50 rows=1000 width=238) (actual time=38.02..38.18 rows=23 loops=1) Total runtime: 38.26 msec (2 rows) (Sorry, the data has changed as time moves forward and I type this email. However, the numbers are reflective of the performance difference I observed during testing). So, what is happening? How come my function (ugly as it may be), is orders of magnitude faster than what should be a clean query in SQL? Why is the estimator so far off the real issue? Is there a way of writing the query to get the good performance? Note that there are a number of other queries I run which exhibit similar performance/estimation issues, and I wonder if I will need to rwrite all of them into custom functions (thanks for the SETOF piece as this makes it possible to write this stuff now). Thanks! Ed
pgsql-general by date: