Re: Query Performance SQL Server vs. Postgresql - Mailing list pgsql-performance
From | Pavel Stehule |
---|---|
Subject | Re: Query Performance SQL Server vs. Postgresql |
Date | |
Msg-id | AANLkTinQmaQ5Ui-vQwfqWFtOTCj76NWFBr6VDKMdVUCs@mail.gmail.com Whole thread Raw |
In response to | Re: Query Performance SQL Server vs. Postgresql (Humair Mohammed <humairm@hotmail.com>) |
Responses |
Re: Query Performance SQL Server vs. Postgresql
Re: Query Performance SQL Server vs. Postgresql |
List | pgsql-performance |
Hello, there should be a problem in a statistic, they are out of reality. Please, try to use a DISTINCT OF operator now - maybe a statistic will be better. Next - try to increase a work_mem. Hash join is untypically slow in your comp. Regards Pavel Stehule 2010/11/17 Humair Mohammed <humairm@hotmail.com>: > Yes strange indeed, I did rerun ANALYZE and VACCUM. Took 70 seconds to rerun > the query. Results from EXPLAIN ANALYZE below: > "Hash Join (cost=16212.30..52586.43 rows=92869 width=17) (actual > time=43200.223..49502.874 rows=3163 loops=1)" > " Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = > (pg.question)::text))" > " Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text > <> (COALESCE(pg.response, 'MISSING'::character varying))::text)" > " -> Seq Scan on pivotbad pb (cost=0.00..2804.96 rows=93496 width=134) > (actual time=0.009..48.200 rows=93496 loops=1)" > " -> Hash (cost=7537.12..7537.12 rows=251212 width=134) (actual > time=42919.453..42919.453 rows=251212 loops=1)" > " Buckets: 1024 Batches: 64 Memory Usage: 650kB" > " -> Seq Scan on pivotgood pg (cost=0.00..7537.12 rows=251212 > width=134) (actual time=0.119..173.019 rows=251212 loops=1)" > "Total runtime: 49503.450 ms" > >> From: pavel.stehule@gmail.com >> Date: Wed, 17 Nov 2010 05:47:51 +0100 >> Subject: Re: Query Performance SQL Server vs. Postgresql >> To: humairm@hotmail.com >> CC: pgsql-performance@postgresql.org >> >> 2010/11/17 Humair Mohammed <humairm@hotmail.com>: >> > >> > There are no indexes on the tables either in SQL Server or Postgresql - >> > I am >> > comparing apples to apples here. I ran ANALYZE on the postgresql tables, >> > after that query performance times are still high 42 seconds with >> > COALESCE >> > and 35 seconds with IS DISTINCT FROM. >> > Here is the execution plan from Postgresql for qurey - select pb.id from >> > pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question = >> > pg.question and coalesce(pb.response,'MISSING') <> >> > coalesce(pg.response,'MISSING') >> > Execution Time: 42 seconds >> > "Hash Join (cost=16212.30..48854.24 rows=93477 width=17)" >> > " Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text >> > = >> > (pg.question)::text))" >> > " Join Filter: ((COALESCE(pb.response, 'MISSING'::character >> > varying))::text >> > <> (COALESCE(pg.response, 'MISSING'::character varying))::text)" >> > " -> Seq Scan on pivotbad pb (cost=0.00..2804.96 rows=93496 >> > width=134)" >> > " -> Hash (cost=7537.12..7537.12 rows=251212 width=134)" >> > " -> Seq Scan on pivotgood pg (cost=0.00..7537.12 rows=251212 >> > width=134)" >> >> this is little bit strange - did you ANALYZE and VACUUM? >> >> please send result of EXPLAIN ANALYZE >> >> Pavel >> >> > >> > And here is the execution plan from SQL Server for query - select pb.id >> > from >> > pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question = >> > pg.question and isnull(pb.response,'ISNULL')<> >> > isnull(pg.response,'ISNULL') >> > Execution Time: < 1 second >> > Cost: 1% |--Parallelism(Gather Streams) >> > Cost: 31% |--Hash Match(Inner Join, HASH:([pb].[ID], >> > [pb].[Question])=([pg].[ID], [pg].[Question]), >> > RESIDUAL:([master].[dbo].[pivotbad].[ID] as >> > [pb].[ID]=[master].[dbo].[pivotgood].[ID] as [pg].[ID] AND >> > [master].[dbo].[pivotbad].[Question] as >> > [pb].[Question]=[master].[dbo].[pivotgood].[Question] as [pg].[Question] >> > AND >> > [Expr1006]<>[Expr1007])) >> > Cost: 0% |--Bitmap(HASH:([pb].[ID], [pb].[Question]), >> > DEFINE:([Bitmap1008])) >> > Cost: 0% |--Compute >> > Scalar(DEFINE:([Expr1006]=isnull([master].[dbo].[pivotbad].[Response] as >> > [pb].[Response],'ISNULL'))) >> > Cost: 6% |--Parallelism(Repartition Streams, Hash >> > Partitioning, PARTITION COLUMNS:([pb].[ID], [pb].[Question])) >> > Cost: 12% |--Table Scan(OBJECT:([master].[dbo].[pivotbad] >> > AS >> > [pb])) >> > Cost: 0% |--Compute >> > Scalar(DEFINE:([Expr1007]=isnull([master].[dbo].[pivotgood].[Response] >> > as >> > [pg].[Response],'ISNULL'))) >> > Cost: 17% |--Parallelism(Repartition Streams, Hash >> > Partitioning, PARTITION COLUMNS:([pg].[ID], [pg].[Question])) >> > Cost: 33% |--Table >> > Scan(OBJECT:([master].[dbo].[pivotgood] AS [pg]), >> > WHERE:(PROBE([Bitmap1008],[master].[dbo].[pivotgood].[ID] as >> > [pg].[ID],[master].[dbo].[pivotgood].[Question] as [pg].[Question]))) >> > >> > >> > >> >> From: pavel.stehule@gmail.com >> >> Date: Tue, 16 Nov 2010 08:12:03 +0100 >> >> Subject: Re: [PERFORM] >> >> To: humairm@hotmail.com >> >> CC: pgsql-performance@postgresql.org >> >> >> >> 2010/11/15 Humair Mohammed <humairm@hotmail.com>: >> >> > I have 2 tables with a 200,000 rows of data 3 character/string >> >> > columns >> >> > ID, >> >> > Question and Response. The query below compares the data between the >> >> > 2 >> >> > tables based on ID and Question and if the Response does not match >> >> > between >> >> > the left table and the right table it identifies the ID's where there >> >> > is >> >> > a >> >> > mismatch. Running the query in SQL Server 2008 using the ISNULL >> >> > function >> >> > take a few milliseconds. Running the same query in Postgresql takes >> >> > over >> >> > 70 >> >> > seconds. The 2 queries are below: >> >> > SQL Server 2008 R2 Query >> >> > select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and >> >> > t1.question = t2.question and isnull(t1.response,'ISNULL') <> >> >> > isnull(t2.response,'ISNULL') >> >> >> >> > Postgres 9.1 Query >> >> > select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and >> >> > t1.question = t2.question and coalesce(t1.response,'ISNULL') <> >> >> > coalesce(t2.response,'ISNULL') >> >> > What gives? >> >> >> >> I think, so must problem can be in ugly predicate >> >> coalesce(t1.response,'ISNULL') <> >> >> > coalesce(t2.response,'ISNULL') >> >> >> >> try use a IS DISTINCT OF operator >> >> >> >> ... AND t1.response IS DISTINCT t2.response >> >> >> >> Regards >> >> >> >> Pavel Stehule >> >> >> >> p.s. don't use a coalesce in WHERE clause if it is possible. >> > >
pgsql-performance by date: