Wrong example in the bloom documentation - Mailing list pgsql-docs
From | Daniel Westermann (DWE) |
---|---|
Subject | Wrong example in the bloom documentation |
Date | |
Msg-id | ZR0P278MB0122119FAE78721A694C30C8D2340@ZR0P278MB0122.CHEP278.PROD.OUTLOOK.COM Whole thread Raw |
Responses |
Re: Wrong example in the bloom documentation
|
List | pgsql-docs |
Hi, I've briefly discussed this with Bruce some time ago in [1]. Replaying the example referenced in the documentation does not give a Bitmap Heap Scan on tbloom but a parallel seq scanwith the default configuration: -- tested on head postgres=# CREATE TABLE tbloom AS postgres-# SELECT postgres-# (random() * 1000000)::int as i1, postgres-# (random() * 1000000)::int as i2, postgres-# (random() * 1000000)::int as i3, postgres-# (random() * 1000000)::int as i4, postgres-# (random() * 1000000)::int as i5, postgres-# (random() * 1000000)::int as i6 postgres-# FROM postgres-# generate_series(1,10000000); SELECT 10000000 postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6); CREATE INDEX postgres=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6); CREATE INDEX postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.00..127220.00 rows=250 width=24) (actual time=2134.851..2221.836 rows=0 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on tbloom (cost=0.00..126195.00 rows=104 width=24) (actual time=1770.691..1770.692 rows=0 loops=3) Filter: ((i2 = 898732) AND (i5 = 123451)) Rows Removed by Filter: 3333333 Planning Time: 0.895 ms JIT: Functions: 6 Options: Inlining false, Optimization false, Expressions true, Deforming true Timing: Generation 65.512 ms, Inlining 0.000 ms, Optimization 46.328 ms, Emission 40.658 ms, Total 152.499 ms Execution Time: 2288.056 ms (12 rows) As bloom was introduced in 9.6 I quickly tried with 9.6.17 and indeed for this version the example is correct: postgres=# select version(); version ---------------------------------------------------------------------------------------------------------- PostgreSQL 9.6.17 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.3.1 20190507 (Red Hat 8.3.1-4), 64-bit (1 row) postgres=# CREATE TABLE tbloom AS postgres-# SELECT postgres-# (random() * 1000000)::int as i1, postgres-# (random() * 1000000)::int as i2, postgres-# (random() * 1000000)::int as i3, postgres-# (random() * 1000000)::int as i4, postgres-# (random() * 1000000)::int as i5, postgres-# (random() * 1000000)::int as i6 postgres-# FROM postgres-# generate_series(1,10000000); SELECT 10000000 postgres=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6); CREATE INDEX postgres=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6); CREATE INDEX postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on tbloom (cost=178436.06..179392.83 rows=250 width=24) (actual time=2279.363..2279.363 rows=0 loops=1) Recheck Cond: ((i2 = 898732) AND (i5 = 123451)) Rows Removed by Index Recheck: 2329 Heap Blocks: exact=2288 -> Bitmap Index Scan on bloomidx (cost=0.00..178436.00 rows=250 width=0) (actual time=994.406..994.406 rows=2329 loops=1) Index Cond: ((i2 = 898732) AND (i5 = 123451)) Planning time: 282.059 ms Execution time: 2286.138 ms (8 rows) The reason is that parallel execution is disabled by default in 9.6, and if that is turned on the plan changes there as well: postgres=# set max_parallel_workers_per_gather = 2; SET postgres=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.00..127194.29 rows=1 width=24) (actual time=1148.047..1148.206 rows=0 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on tbloom (cost=0.00..126194.19 rows=1 width=24) (actual time=1039.501..1039.501 rows=0 loops=3) Filter: ((i2 = 898732) AND (i5 = 123451)) Rows Removed by Filter: 3333333 Planning time: 0.580 ms Execution time: 1148.247 ms (8 rows) Starting with PostgreSQL 10 the example in the documentation is therefore wrong. Attached a proposal to fix this. The newexample starts with 100x reduced rows (as suggested by Bruce in [1] and adds a note that the behavior changes as soonas parallel execution is cheaper than the index access. Thoughts? Regards Daniel [1] https://www.postgresql.org/message-id/flat/20191105231854.GA26542%40momjian.us#7859b106ce614dd9530941196dad5bbc
Attachment
pgsql-docs by date: