Home > mailing lists

Re: Improving executor performance - tidbitmap - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: Improving executor performance - tidbitmap
Date	July 14, 2016 03:06:27
Msg-id	20160714030616.dvlkboz6cw555ixb@alap3.anarazel.de Whole thread Raw
In response to	Improving executor performance (Andres Freund <andres@anarazel.de>)
Responses	Re: Improving executor performance - tidbitmap Re: Improving executor performance - tidbitmap
List	pgsql-hackers

Tree view

Hi,

On 2016-06-24 16:29:53 -0700, Andres Freund wrote:
> 4) Various missing micro optimizations have to be performed, for more
>    architectural issues to become visible. E.g. [2] causes such bad
>    slowdowns in hash-agg workloads, that other bottlenecks are hidden.

One such issue is the usage of dynahash.c in tidbitmap.c. In many
queries, e.g. tpch q7, the bitmapscan is often the bottleneck. Profiling
shows that this is largely due to dynahash.c being slow. Primary issues
are: a) two level structure doubling the amount of indirect lookups b)
indirect function calls c) using separate chaining based conflict
resolution d) being too general.

I've quickly hacked up an alternative linear addressing hashtable
implementation. And the improvements are quite remarkable.

Example Query:
EXPLAIN ANALYZE SELECT SUM(l_extendedprice) FROM lineitem WHERE (l_shipdate >= '1995-01-01'::date) AND (l_shipdate <=
'1996-12-31'::date);

before:

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                    QUERY PLAN
                          │ 

├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Aggregate  (cost=942046.72..942046.73 rows=1 width=8) (actual time=4647.908..4647.909 rows=1 loops=1)
                          │ 
│   ->  Bitmap Heap Scan on lineitem  (cost=193514.84..919246.17 rows=9120222 width=8) (actual time=2667.821..3885.598
rows=9113219loops=1)       │ 
│         Recheck Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date))
                          │ 
│         Heap Blocks: exact=585542
                          │ 
│         ->  Bitmap Index Scan on i_l_shipdate  (cost=0.00..191234.78 rows=9120222 width=0) (actual
time=2461.622..2461.622rows=9113219 loops=1) │ 
│               Index Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date))
                          │ 
│ Planning time: 0.170 ms
                          │ 
│ Execution time: 4648.921 ms
                          │ 

└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

timing without analyze: 4136.425 4101.873 4177.441

after:

┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                   QUERY PLAN
                        │ 

├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Aggregate  (cost=942046.72..942046.73 rows=1 width=8) (actual time=3218.422..3218.423 rows=1 loops=1)
                        │ 
│   ->  Bitmap Heap Scan on lineitem  (cost=193514.84..919246.17 rows=9120222 width=8) (actual time=1153.707..2430.500
rows=9113219loops=1)     │ 
│         Recheck Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date))
                        │ 
│         Heap Blocks: exact=585542
                        │ 
│         ->  Bitmap Index Scan on i_l_shipdate  (cost=0.00..191234.78 rows=9120222 width=0) (actual
time=952.161..952.161rows=9113219 loops=1) │ 
│               Index Cond: ((l_shipdate >= '1995-01-01'::date) AND (l_shipdate <= '1996-12-31'::date))
                        │ 
│ Planning time: 1.075 ms
                        │ 
│ Execution time: 3221.861 ms
                        │ 

└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(8 rows)

timing without analyze: 2647.364 2674.456 2680.197

as you can see the the time for the bitmap index scan goes from 2461.622
to 952.161.

I'm not proposing to apply the patch as-is, but I think it's a good
starting point to discuss how to evolve our use of hash tables.

I'm wondering whether we can do 'macro based templates' or
something. I.e. have something like the simplehash in the patch in
simplehash.h, but the key/value widths, the function names, are all
determined by macros (oh, this would be easier with C++ templates...).

Does anybody have a better idea?

The major issue with the simplehash implementation in the path is
probably the deletion; which should rather move cells around, rather
than use toombstones. But that was too complex for a POC ;). Also, it'd
likely need a proper iterator interface.

FWIW, the dynahash usage in nodeAgg.c is a major bottleneck in a number
of other queries.

Regards,

Andres

Attachment

0008-WIP-Use-more-efficient-hashtable-for-tidbitmap.c.patch

pgsql-hackers by date:

From: Andres Freund
Date: 14 July 2016, 01:19:19
Subject: Re: Improving executor performance

From: Robert Haas
Date: 14 July 2016, 04:34:47
Subject: Re: Showing parallel status in \df+

Re: Improving executor performance - tidbitmap - Mailing list pgsql-hackers

Attachment

Previous

Next