Re: Query Performance SQL Server vs. Postgresql - Mailing list pgsql-performance
From | Pavel Stehule |
---|---|
Subject | Re: Query Performance SQL Server vs. Postgresql |
Date | |
Msg-id | AANLkTi=yFV69yNZpHzKiZRtZ0X0PtKcpOrMNNkAo+Si6@mail.gmail.com Whole thread Raw |
In response to | (Humair Mohammed <humairm@hotmail.com>) |
Responses |
Re: Query Performance SQL Server vs. Postgresql
Re: Query Performance SQL Server vs. Postgresql |
List | pgsql-performance |
2010/11/17 Humair Mohammed <humairm@hotmail.com>: > > There are no indexes on the tables either in SQL Server or Postgresql - I am > comparing apples to apples here. I ran ANALYZE on the postgresql tables, > after that query performance times are still high 42 seconds with COALESCE > and 35 seconds with IS DISTINCT FROM. > Here is the execution plan from Postgresql for qurey - select pb.id from > pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question = > pg.question and coalesce(pb.response,'MISSING') <> > coalesce(pg.response,'MISSING') > Execution Time: 42 seconds > "Hash Join (cost=16212.30..48854.24 rows=93477 width=17)" > " Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = > (pg.question)::text))" > " Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text > <> (COALESCE(pg.response, 'MISSING'::character varying))::text)" > " -> Seq Scan on pivotbad pb (cost=0.00..2804.96 rows=93496 width=134)" > " -> Hash (cost=7537.12..7537.12 rows=251212 width=134)" > " -> Seq Scan on pivotgood pg (cost=0.00..7537.12 rows=251212 > width=134)" this is little bit strange - did you ANALYZE and VACUUM? please send result of EXPLAIN ANALYZE Pavel > > And here is the execution plan from SQL Server for query - select pb.id from > pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question = > pg.question and isnull(pb.response,'ISNULL')<> isnull(pg.response,'ISNULL') > Execution Time: < 1 second > Cost: 1% |--Parallelism(Gather Streams) > Cost: 31% |--Hash Match(Inner Join, HASH:([pb].[ID], > [pb].[Question])=([pg].[ID], [pg].[Question]), > RESIDUAL:([master].[dbo].[pivotbad].[ID] as > [pb].[ID]=[master].[dbo].[pivotgood].[ID] as [pg].[ID] AND > [master].[dbo].[pivotbad].[Question] as > [pb].[Question]=[master].[dbo].[pivotgood].[Question] as [pg].[Question] AND > [Expr1006]<>[Expr1007])) > Cost: 0% |--Bitmap(HASH:([pb].[ID], [pb].[Question]), > DEFINE:([Bitmap1008])) > Cost: 0% |--Compute > Scalar(DEFINE:([Expr1006]=isnull([master].[dbo].[pivotbad].[Response] as > [pb].[Response],'ISNULL'))) > Cost: 6% |--Parallelism(Repartition Streams, Hash > Partitioning, PARTITION COLUMNS:([pb].[ID], [pb].[Question])) > Cost: 12% |--Table Scan(OBJECT:([master].[dbo].[pivotbad] AS > [pb])) > Cost: 0% |--Compute > Scalar(DEFINE:([Expr1007]=isnull([master].[dbo].[pivotgood].[Response] as > [pg].[Response],'ISNULL'))) > Cost: 17% |--Parallelism(Repartition Streams, Hash > Partitioning, PARTITION COLUMNS:([pg].[ID], [pg].[Question])) > Cost: 33% |--Table > Scan(OBJECT:([master].[dbo].[pivotgood] AS [pg]), > WHERE:(PROBE([Bitmap1008],[master].[dbo].[pivotgood].[ID] as > [pg].[ID],[master].[dbo].[pivotgood].[Question] as [pg].[Question]))) > > > >> From: pavel.stehule@gmail.com >> Date: Tue, 16 Nov 2010 08:12:03 +0100 >> Subject: Re: [PERFORM] >> To: humairm@hotmail.com >> CC: pgsql-performance@postgresql.org >> >> 2010/11/15 Humair Mohammed <humairm@hotmail.com>: >> > I have 2 tables with a 200,000 rows of data 3 character/string columns >> > ID, >> > Question and Response. The query below compares the data between the 2 >> > tables based on ID and Question and if the Response does not match >> > between >> > the left table and the right table it identifies the ID's where there is >> > a >> > mismatch. Running the query in SQL Server 2008 using the ISNULL function >> > take a few milliseconds. Running the same query in Postgresql takes over >> > 70 >> > seconds. The 2 queries are below: >> > SQL Server 2008 R2 Query >> > select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and >> > t1.question = t2.question and isnull(t1.response,'ISNULL') <> >> > isnull(t2.response,'ISNULL') >> >> > Postgres 9.1 Query >> > select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and >> > t1.question = t2.question and coalesce(t1.response,'ISNULL') <> >> > coalesce(t2.response,'ISNULL') >> > What gives? >> >> I think, so must problem can be in ugly predicate >> coalesce(t1.response,'ISNULL') <> >> > coalesce(t2.response,'ISNULL') >> >> try use a IS DISTINCT OF operator >> >> ... AND t1.response IS DISTINCT t2.response >> >> Regards >> >> Pavel Stehule >> >> p.s. don't use a coalesce in WHERE clause if it is possible. >
pgsql-performance by date: