Home > mailing lists

Re: Query Performance SQL Server vs. Postgresql - Mailing list pgsql-performance

From	Pavel Stehule
Subject	Re: Query Performance SQL Server vs. Postgresql
Date	November 17, 2010 00:48:40
Msg-id	AANLkTi=yFV69yNZpHzKiZRtZ0X0PtKcpOrMNNkAo+Si6@mail.gmail.com Whole thread
In response to	(Humair Mohammed <humairm@hotmail.com>)
Responses	Re: Query Performance SQL Server vs. Postgresql Re: Query Performance SQL Server vs. Postgresql
List	pgsql-performance

Tree view

2010/11/17 Humair Mohammed <humairm@hotmail.com>:
>
> There are no indexes on the tables either in SQL Server or Postgresql - I am
> comparing apples to apples here. I ran ANALYZE on the postgresql tables,
> after that query performance times are still high 42 seconds with COALESCE
> and 35 seconds with IS DISTINCT FROM.
> Here is the execution plan from Postgresql for qurey - select pb.id from
> pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question =
> pg.question and coalesce(pb.response,'MISSING') <>
> coalesce(pg.response,'MISSING')
> Execution Time: 42 seconds
> "Hash Join  (cost=16212.30..48854.24 rows=93477 width=17)"
> "  Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text =
> (pg.question)::text))"
> "  Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text
> <> (COALESCE(pg.response, 'MISSING'::character varying))::text)"
> "  ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134)"
> "  ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134)"
> "        ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212
> width=134)"

this is little bit strange - did you ANALYZE and VACUUM?

please send result of EXPLAIN ANALYZE

Pavel

>
> And here is the execution plan from SQL Server for query - select pb.id from
> pivotbad pb inner join pivotgood pg on pb.id = pg.id and pb.question =
> pg.question and isnull(pb.response,'ISNULL')<>  isnull(pg.response,'ISNULL')
> Execution Time: < 1 second
> Cost: 1%  |--Parallelism(Gather Streams)
> Cost: 31%       |--Hash Match(Inner Join, HASH:([pb].[ID],
> [pb].[Question])=([pg].[ID], [pg].[Question]),
> RESIDUAL:([master].[dbo].[pivotbad].[ID] as
> [pb].[ID]=[master].[dbo].[pivotgood].[ID] as [pg].[ID] AND
> [master].[dbo].[pivotbad].[Question] as
> [pb].[Question]=[master].[dbo].[pivotgood].[Question] as [pg].[Question] AND
> [Expr1006]<>[Expr1007]))
>     Cost: 0%  |--Bitmap(HASH:([pb].[ID], [pb].[Question]),
> DEFINE:([Bitmap1008]))
>             Cost: 0%    |--Compute
> Scalar(DEFINE:([Expr1006]=isnull([master].[dbo].[pivotbad].[Response] as
> [pb].[Response],'ISNULL')))
>             Cost:  6%   |--Parallelism(Repartition Streams, Hash
> Partitioning, PARTITION COLUMNS:([pb].[ID], [pb].[Question]))
>             Cost: 12%  |--Table Scan(OBJECT:([master].[dbo].[pivotbad] AS
> [pb]))
>             Cost: 0% |--Compute
> Scalar(DEFINE:([Expr1007]=isnull([master].[dbo].[pivotgood].[Response] as
> [pg].[Response],'ISNULL')))
>                 Cost: 17% |--Parallelism(Repartition Streams, Hash
> Partitioning, PARTITION COLUMNS:([pg].[ID], [pg].[Question]))
>                     Cost: 33% |--Table
> Scan(OBJECT:([master].[dbo].[pivotgood] AS [pg]),
> WHERE:(PROBE([Bitmap1008],[master].[dbo].[pivotgood].[ID] as
> [pg].[ID],[master].[dbo].[pivotgood].[Question] as [pg].[Question])))
>
>
>
>> From: pavel.stehule@gmail.com
>> Date: Tue, 16 Nov 2010 08:12:03 +0100
>> Subject: Re: [PERFORM]
>> To: humairm@hotmail.com
>> CC: pgsql-performance@postgresql.org
>>
>> 2010/11/15 Humair Mohammed <humairm@hotmail.com>:
>> > I have 2 tables with a 200,000 rows of data 3 character/string columns
>> > ID,
>> > Question and Response. The query below compares the data between the 2
>> > tables based on ID and Question and if the Response does not match
>> > between
>> > the left table and the right table it identifies the ID's where there is
>> > a
>> > mismatch. Running the query in SQL Server 2008 using the ISNULL function
>> > take a few milliseconds. Running the same query in Postgresql takes over
>> > 70
>> > seconds. The 2 queries are below:
>> > SQL Server 2008 R2 Query
>> > select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and
>> > t1.question = t2.question and isnull(t1.response,'ISNULL') <>
>> > isnull(t2.response,'ISNULL')
>>
>> > Postgres 9.1 Query
>> > select t1.id from table1 t1 inner join table2 t2 on t1.id = t2.id and
>> > t1.question = t2.question and coalesce(t1.response,'ISNULL') <>
>> > coalesce(t2.response,'ISNULL')
>> > What gives?
>>
>> I think, so must problem can be in ugly predicate
>> coalesce(t1.response,'ISNULL') <>
>> > coalesce(t2.response,'ISNULL')
>>
>> try use a IS DISTINCT OF operator
>>
>> ... AND t1.response IS DISTINCT t2.response
>>
>> Regards
>>
>> Pavel Stehule
>>
>> p.s. don't use a coalesce in WHERE clause if it is possible.
>

pgsql-performance by date:

From: Robert Haas
Date: 16 November 2010, 23:07:57
Subject: Re: Defaulting wal_sync_method to fdatasync on Linux for 9.1?

From: Eric Comeau
Date: 17 November 2010, 11:01:33
Subject: How to achieve sustained disk performance of 1.25 GB write for 5 mins

Re: Query Performance SQL Server vs. Postgresql - Mailing list pgsql-performance

Previous

Next