Subselect query enhancement - Mailing list pgsql-performance
From | Michael Artz |
---|---|
Subject | Subselect query enhancement |
Date | |
Msg-id | e9c163070702010842ja639378v75e8c0eb757d9b72@mail.gmail.com Whole thread Raw |
Responses |
Re: Subselect query enhancement
Re: Subselect query enhancement Re: Subselect query enhancement |
List | pgsql-performance |
I'm needing help determining the best all-around query for the following situation. I have primary table that holds ip information and two other tables that hold event data for the specific IP in with a one-to-many mapping between them, ie: CREATE TABLE ip_info ( ip IP4, --other data ); CREATE TABLE network_events ( ip IP4 NOT NULL REFERENCES ip_info(ip), name VARCHAR, port INTEGER, --other data ); CREATE TABLE host_events ( ip IP4 NOT NULL REFERENCES ip_info(ip), name VARCHAR port INTEGER, --other data ); There is quite a bit of commonality between the network_events and host_events schemas, but they do not currently share an ancestor. ip_info has about 13 million rows, the network_events table has about 30 million rows, and the host_events table has about 7 million rows. There are indexes on all the rows. The query that I would like to execute is to select all the rows of ip_info that have either network or host events that meet some criteria, i.e. name='blah'. I have 3 different possibilities that I have thought of to execute this. First, 2 'ip IN (SELECT ...)' statements joined by an OR: SELECT * FROM ip_info WHERE ip IN (SELECT ip FROM network_events WHERE name='blah') OR ip IN (SELECT ip FROM host_events WHERE name='blah'); Next, 1 'ip IN (SELECT ... UNION SELECT ...) statement: SELECT * FROM ip_info WHERE ip IN (SELECT ip FROM network_events WHERE name='blah' UNION SELECT ip FROM host_events WHERE name='blah'); Or, finally, the UNION statment with DISTINCTs: SELECT * FROM ip_info WHERE ip IN (SELECT DISTINCT ip FROM network_events WHERE name='blah' UNION SELECT DISTINCT ip FROM host_events WHERE name='blah'); From what I have read, the UNION statement does an implicit DISTINCT, but I thought that doing it on each of the tables would result in slightly faster execution. Can you think of any other ways to implement the previous query? I have explained/analyzed all the queries but, unfortunately, they are on an isolated computer. The gist is that, for relatively low-incidence values of name, the UNION performs better, but for queries on a common name, the dual-subselect query performs better. The explains look something like: Dual-subselect: Seq scan on ip_info Filter: ... AND ((hashed_subplan) OR (hashed_subplan)) Subplan -> Result -> Append -> various scans on host_events -> Result -> Append -> various scans on network_events UNION SELECT DISTINCT: Nested Loop -> Unique -> Sort -> Append -> Unique -> Sort -> Result -> Append -> various scans on host_events -> Unique -> Sort -> Result -> Append -> various scans on network_events If it would help to have more information, I could retype some of numbers in the explain. Any ideas? Thanks, -Mike
pgsql-performance by date: