slow view - Mailing list pgsql-sql
From | Stuart |
---|---|
Subject | slow view |
Date | |
Msg-id | egk560$p53$1@sea.gmane.org Whole thread Raw |
Responses |
Re: slow view
|
List | pgsql-sql |
I am having a problem with the performance of a view that will be a critical part of a database system I am working on, and would appreciate some advice. Apologies for the length of this posting! I have a parent table P and to child tables A and B,: CREATE TABLE p ( id INT NOT NULL PRIMARY KEY); CREATE TABLE a ( id INT NOT NULL PRIMARY KEY, p INT REFERENCES p(id)); CREATE INDEX ON a(p); CREATE TABLE b ( id INT NOT NULL PRIMARY KEY, p INT REFERENCES p(id)); CREATE INDEX ON b(p); Each "p" row has between 1 and 5 (or so) child rows in the "a" table, and between 0 and 4 rows in the "b" table. Now for most p's the a's and the b's are independent, and all combinations of a's and b's for that p are ok. But for a small percentage of p's (<5%) there are some combinations of a's and b's are distinguished (I will call them "invalid"). So I created a table to record these "invalid" combinations: CREATE TABLE x ( a INT NOT NULL REFERENCES a(id), b INT NOT NULL REFERENCES b(id), PRIMARY KEY (a,b)); CREATE INDEXON x(a); CREATE INDEX ON x(b); Here is some sample data with a single p: # Create one parent item... INSERT INTO p VALUES(1) # Create 4 a-items for that parent... INSERT INTO a VALUES(1,1) INSERT INTO a VALUES(2,1) INSERT INTO a VALUES(3,1) INSERT INTO a VALUES(4,1) # Create 3 b-items for that parent... INSERT INTO b VALUES(11,1) INSERT INTO b VALUES(12,1) INSERT INTO b VALUES(13,1) So for parent p=1, there are 12 combinations of a and b items (each of the 4 a items can be paired with any of the 3 b items). Now, make some combinations of a items and b items "invalid"... # For a=2, make b=13 invalid, i.e only b=11 and b=12 are valid. INSERT INTO x VALUES(2,13) # For a=3, only b=11 is valid. INSERT INTO x VALUES(3,12) INSERT INTO x VALUES(3,13) # For a=4, no b's are valid. INSERT INTO x VALUES(4,11) INSERT INTO x VALUES(4,12) INSERT INTO p VALUES(4,13) Now I need a view that will display, for each p, its a's and for each a, only the valid b's, that is, the combinations of a and b that are *not* in table x. OK, no problem... (#1) SELECT p.id AS pid, a.id AS aid, b.id AS bid FROM p JOIN a ON a.p=p.id LEFT JOIN b ON b.p=a.p LEFT JOINx ON x.a=a.id AND x.b=b.id WHERE x.a IS NULL AND p.id=1; Here is the result on the data given above:pid | aid | bid -----+-----+----- 1 | 1 | 11 1 | 1 | 12 1 | 1 | 13 1 | 2 | 11 1 | 2 | 13 1 | 3 | 11 1 | 3 | 12 Ok, but I want all a's in the output, even when they have no valid b's. aid=4 is not in the output. So I did what I thought was the obvious answer, a left join between a, and the above query... (#2) SELECT p.id AS pid, a.id AS aid, sub.bid AS bid FROM p JOIN a ON a.p=p.id LEFT JOIN ( SELECT a.idAS aid, b.id as bid FROM a LEFT JOIN b ON b.p=a.p LEFT JOIN x ON x.a=a.idAND x.b=b.id WHERE x.a IS NULL ) AS sub ON sub.aid=a.id WHERE p.id=1; Results:pid | aid | bid -----+-----+----- 1 | 1 | 11 1 | 1 | 12 1 | 1 | 13 1 | 2 | 11 1 | 2 | 13 1 | 3 | 11 1 | 3 | 12 1 | 4 | Exactly what I want. The problem is that when there are ~100K parent entries the above query (#2) takes ~10 seconds to run but the first query (#1) runs in a few tens of milliseconds. Is there any way I can get postgresql to better optimize query #2, or rewrite it to that is is more "postgresql friendly"? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If it helps, here is a python script I used to generate enough pseudo-data to show the time difference... #!/usr/bin/python import psycopg2, random def main(): cn = psycopg2.connect (database="test", user="postgres",password="") c =cn.cursor() pkp = 1; pka = 1; pkb = 1; while pkp < 30000: c.execute ("INSERT INTO p VALUES(%s)",(pkp,)) na = random.randint(1,5) for a in range(na): c.execute ("INSERT INTOa VALUES(%s,%s)", (pka+a,pkp)) nb = random.randint(0,4) for b in range(nb): c.execute("INSERT INTO b VALUES(%s,%s)", (pkb+b,pkp)) if na*nb > 1 and random.randint (0,99) < 10: zlst = [(a,b) for a in range(pka,pka+na) for b in range(pkb,pkb+nb)] for zin random.sample (zlst, random.randint(1,na*nb-1)): c.execute ("INSERT INTO x VALUES(%s,%s)", z) pkp += 1; pka += na; pkb += nb cn.commit() if __name__ == '__main__': main ()