Thread: I feel the need for speed. What am I doing wrong?
I have a query using two postgres tables. One is called "CNX_DS_53_SIS_STU_OPT_FEE_TB" and the other is called "CNX_DS2_53_SIS_STU_OPT_FEE_TB". I am getting 3 times slower performance than Microsoft Access when performing a left outer join. Here are the tables in question: connxdatasync=# \d "CNX_DS_53_SIS_STU_OPT_FEE_TB" Table "CNX_DS_53_SIS_STU_OPT_FEE_TB" Attribute | Type | Modifier ----------------+---------------+----------RT_REC_KEY | character(14) |cnxarraycolumn | integer |CRC | bigint | not null connxdatasync=# \d "CNX_DS2_53_SIS_STU_OPT_FEE_TB" Table "CNX_DS2_53_SIS_STU_OPT_FEE_TB" Attribute | Type |Modifier ----------------+---------------+----------RT_REC_KEY | character(14) |cnxarraycolumn | integer |CRC | bigint | not null Here is the query: select a."RT_REC_KEY", a."cnxarraycolumn", a."CRC" from "CNX_DS_53_SIS_STU_OPT_FEE_TB" a left outer join "CNX_DS2_53_SIS_STU_OPT_FEE_TB" b on ( a."RT_REC_KEY" = b."RT_REC_KEY" and a."cnxarraycolumn" = b."cnxarraycolumn") where b.oid is null ; Creating the following index had no effect on performance! create unique index i1 on "CNX_DS2_53_SIS_STU_OPT_FEE_TB" ("RT_REC_KEY", "cnxarraycolumn", "CRC"); Both tables had 6139062 rows of data. In this query ... all rows of data match perfectly, so no results are returned. Is there a way to reformulate this query so that it will use the index?
Added -general list so that the next followup can remove -hackers and everyone there will have had notice. On Mon, 6 Jan 2003, Dann Corbit wrote: > > I have a query using two postgres tables. > One is called "CNX_DS_53_SIS_STU_OPT_FEE_TB" and the other is called > "CNX_DS2_53_SIS_STU_OPT_FEE_TB". > > I am getting 3 times slower performance than Microsoft Access when > performing a left outer join. > > ... > > Here is the query: > select a."RT_REC_KEY", a."cnxarraycolumn", a."CRC" from > "CNX_DS_53_SIS_STU_OPT_FEE_TB" a left outer join > "CNX_DS2_53_SIS_STU_OPT_FEE_TB" b on ( a."RT_REC_KEY" = b."RT_REC_KEY" > and a."cnxarraycolumn" = b."cnxarraycolumn") where b.oid is null ; > > > Creating the following index had no effect on performance! > create unique index i1 on "CNX_DS2_53_SIS_STU_OPT_FEE_TB" ("RT_REC_KEY", > "cnxarraycolumn", "CRC"); > > Both tables had 6139062 rows of data. > > In this query ... all rows of data match perfectly, so no results are > returned. I suspect you get no results because it's unlikely b.oid will be null. Are you sure the query is how it should be since you seem to be expecting no rows to be returned and yet your reason for that doesn't match the query as shown. Without the oid test I'd bet you get a result set of 6139062 rows. > Is there a way to reformulate this query so that it will use the index? Given the above comment I'd say no since the entirety of both tables will be tested to make the result set. Alternatively, if the query is right try something along the lines of: SELECT a.blah, a.foo, FROM a, b WHERE a.blah = b.blah AND a.foo = b.foo AND b.oid IS NULL if that doesn't use a query try pushing the null test into a subselect like: SELECT a.blah, a.foo, FROM a, (SELECT * FROM b WHERE oid IS NULL) b WHERE a.blah = b.blah AND a.foo = b.foo After that let's hope I haven't embarrassed myself. -- Nigel J. Andrews
> -----Original Message----- > From: Nigel J. Andrews [mailto:nandrews@investsystems.co.uk] > Sent: Monday, January 06, 2003 4:58 PM > To: Dann Corbit > Cc: pgsql-hackers@postgresql.org; pgsql-general@postgresql.org > Subject: Re: [HACKERS] I feel the need for speed. What am I > doing wrong? > > Added -general list so that the next followup can remove > -hackers and everyone there will have had notice. > > > On Mon, 6 Jan 2003, Dann Corbit wrote: > > > > I have a query using two postgres tables. > > One is called "CNX_DS_53_SIS_STU_OPT_FEE_TB" and the other is called > > "CNX_DS2_53_SIS_STU_OPT_FEE_TB". > > > > I am getting 3 times slower performance than Microsoft Access when > > performing a left outer join. > > > > ... > > > > Here is the query: > > select a."RT_REC_KEY", a."cnxarraycolumn", a."CRC" from > > "CNX_DS_53_SIS_STU_OPT_FEE_TB" a left outer join > > "CNX_DS2_53_SIS_STU_OPT_FEE_TB" b on ( a."RT_REC_KEY" = > b."RT_REC_KEY" > > and a."cnxarraycolumn" = b."cnxarraycolumn") where b.oid is null ; > > > > > > Creating the following index had no effect on performance! create > > unique index i1 on "CNX_DS2_53_SIS_STU_OPT_FEE_TB" ("RT_REC_KEY", > > "cnxarraycolumn", "CRC"); > > > > Both tables had 6139062 rows of data. > > > > In this query ... all rows of data match perfectly, so no > results are > > returned. > > I suspect you get no results because it's unlikely b.oid will > be null. Are you sure the query is how it should be since you > seem to be expecting no rows to be returned and yet your > reason for that doesn't match the query as shown. Without the > oid test I'd bet you get a result set of 6139062 rows. There are supposed to be no results for this particular query. The data is unique with only RT_REC_KEY and cnxarraycolumn (I tried using that as an index also). The basic gist of it is like this: I want to know where the keys match (e.g.: RT_REC_KEY + cnxarraycolumn) but the CRC has changed (which will tell me what data has changed). This is for a data synchronization system that uses PostgreSQL as a join engine. I store primary key data together with a 64 bit CRC in data tables stored in PostgreSQL. In this particular case, there were no changes but there will be changes at other times. > > Is there a way to reformulate this query so that it will use the > > index? > > Given the above comment I'd say no since the entirety of both > tables will be tested to make the result set. > > Alternatively, if the query is right try something along the lines of: > > SELECT a.blah, a.foo, > FROM a, b > WHERE a.blah = b.blah AND a.foo = b.foo AND b.oid IS NULL OID is never null. I don't think that this query is equivalent. This Oid is just the standard PostgreSQL Oid, and not some arbitrary field. > if that doesn't use a query try pushing the null test into a > subselect like: > > SELECT a.blah, a.foo, > FROM a, (SELECT * FROM b WHERE oid IS NULL) b > WHERE a.blah = b.blah AND a.foo = b.foo > OID is never null. I don't think that this query is equivalent. > After that let's hope I haven't embarrassed myself. Probably, I did not explain myself clearly enough.
"Dann Corbit" <DCorbit@connx.com> writes: > Creating the following index had no effect on performance! > create unique index i1 on "CNX_DS2_53_SIS_STU_OPT_FEE_TB" ("RT_REC_KEY", > "cnxarraycolumn", "CRC"); What does EXPLAIN ANALYZE have to say about the query? If you set enable_seqscan = 0, does the explain result change? regards, tom lane
"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes: >> select a."RT_REC_KEY", a."cnxarraycolumn", a."CRC" from >> "CNX_DS_53_SIS_STU_OPT_FEE_TB" a left outer join >> "CNX_DS2_53_SIS_STU_OPT_FEE_TB" b on ( a."RT_REC_KEY" = b."RT_REC_KEY" >> and a."cnxarraycolumn" = b."cnxarraycolumn") where b.oid is null ; > I suspect you get no results because it's unlikely b.oid will be null. Try "it's impossible for b.oid to be null --- unless a dummy b row is being provided by the LEFT JOIN". I interpret the purpose of the query to be to look for "a" rows that have no matching "b" row. Using OID for this is kind of cute, I guess, since it is guaranteed not-null in a real row; he doesn't have to think about whether any of his regular columns are not-null. regards, tom lane