Thread: BUG #18588: Cannot force/let database use parallel execution in simple case.


From: PG Bug reporting form

The following bug has been logged on the website:

Bug reference:      18588
Logged by:          Maxim Boguk
Email address:      maxim.boguk@gmail.com
PostgreSQL version: 15.6
Operating system:   Ubuntu
Description:

Hi,

I found a very strange case where the database simply does not use a parallel plan for
a trivial query (where it should provide a huge benefit).
At the same time, if I add one NOT EXISTS (subselect), the database executes the
query in parallel mode (providing an 8x speedup).

Initial query:
explain analyze select topic_id as y0_ from
public.negotiation_topic_archive_p005 this_ where
this_.employer_id='816144';

                                                                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using negotiation_topic_archive_p005_employer_id_resume_id_idx on negotiation_topic_archive_p005 this_  (cost=0.11..2501972.13 rows=505757 width=8) (actual time=0.020..699.322 rows=510040 loops=1)
   Index Cond: (employer_id = 816144)
 Planning Time: 0.132 ms
 Execution Time: 717.005 ms

Setting

set min_parallel_index_scan_size to '8kB';
set min_parallel_table_scan_size to '8kB';
set parallel_tuple_cost to 0;
set parallel_setup_cost to 0;

has no effect.

Even with force_parallel_mode set to on, there is no effect:
negotiation_chat_archive=# explain analyze select topic_id as y0_ from
public.negotiation_topic_archive_p005 this_ where
this_.employer_id='816144';

                                                                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.11..2501972.13 rows=505757 width=8) (actual time=5.234..748.606 rows=510040 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Index Scan using negotiation_topic_archive_p005_employer_id_resume_id_idx on negotiation_topic_archive_p005 this_  (cost=0.11..2501972.13 rows=505757 width=8) (actual time=0.021..662.798 rows=510040 loops=1)
         Index Cond: (employer_id = 816144)
 Planning Time: 0.117 ms
 Execution Time: 768.117 ms


At the same time, a slightly more complicated query over the same data happily
works in parallel (with an added AND NOT EXISTS (SELECT FROM applicant_black_list
WHERE this_.employer_id=applicant_black_list.employer_id AND
this_.applicant_id=applicant_black_list.applicant_id) condition):

explain analyze select topic_id as y0_ from
public.negotiation_topic_archive_p005 this_ where this_.employer_id='816144'
AND NOT EXISTS (SELECT FROM applicant_black_list WHERE
this_.employer_id=applicant_black_list.employer_id AND
this_.applicant_id=applicant_black_list.applicant_id);

                                                                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=10.27..2498445.60 rows=505754 width=8) (actual time=1.452..106.335 rows=509901 loops=1)
   Workers Planned: 7
   Workers Launched: 7
   ->  Parallel Hash Anti Join  (cost=10.27..2498445.60 rows=72251 width=8) (actual time=0.224..94.754 rows=63738 loops=8)
         Hash Cond: ((this_.employer_id = applicant_black_list.employer_id) AND (this_.applicant_id = applicant_black_list.applicant_id))
         ->  Parallel Index Scan using negotiation_topic_archive_p005_employer_id_resume_id_idx on negotiation_topic_archive_p005 this_  (cost=0.11..2497637.07 rows=72251 width=16) (actual time=0.019..85.305 rows=63755 loops=8)
               Index Cond: (employer_id = 816144)
         ->  Parallel Hash  (cost=10.13..10.13 rows=2 width=8) (actual time=0.027..0.027 rows=5 loops=8)
               Buckets: 1024  Batches: 1  Memory Usage: 40kB
               ->  Parallel Index Only Scan using employer_to_user_index on applicant_black_list  (cost=0.08..10.13 rows=2 width=8) (actual time=0.013..0.022 rows=41 loops=1)
                     Index Cond: (employer_id = 816144)
                     Heap Fetches: 5
 Planning Time: 2.550 ms
 Execution Time: 121.810 ms
(14 rows)

I feel something is wrong there and I am happy to help debug this issue (I
can use gdb with a separate instance of the same data).
Maybe I am missing some obvious restriction, but I cannot find anything like
that in the documentation, and the speedup provided by parallel query would
be enormous in my case.

Kind Regards,
Maxim


On Fri, 23 Aug 2024 at 01:14, PG Bug reporting form
<noreply@postgresql.org> wrote:
> set min_parallel_index_scan_size to '8kB';
>
> set min_parallel_table_scan_size to '8kB';
> set parallel_tuple_cost to 0;
> set parallel_setup_cost to 0;
>
> have no effect.
>
> Even with the set force_parallel_mode to on - no effect:
> negotiation_chat_archive=# explain analyze select topic_id as y0_ from
> public.negotiation_topic_archive_p005 this_ where
> this_.employer_id='816144';

Does it choose a parallel plan if you do:

ALTER TABLE public.negotiation_topic_archive_p005 SET (parallel_workers = 2);

I assume it must be a btree index given the other plan does use a
parallel scan. So I wondered if something weird was happening in
compute_parallel_worker() and it was choosing 1 worker, which might
cause the path to be rejected.

You might need to come up with a self-contained test case here as when
I tried this on PG15, I do get a parallel index scan.

create table t1 (a int);
insert into t1 select x/1000 from generate_Series(1,1000000)X;
create index on t1(a);
set min_parallel_index_scan_size to '8kB';
set min_parallel_table_scan_size to '8kB';
set parallel_tuple_cost to 0;
set parallel_setup_cost to 0;
explain select * from t1 where a < 10;
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Gather  (cost=0.42..179.40 rows=8912 width=4)
   Workers Planned: 2
   ->  Parallel Index Only Scan using t1_a_idx on t1  (cost=0.42..179.40 rows=3713 width=4)
         Index Cond: (a < 10)
(4 rows)

select version();
                                                version
-------------------------------------------------------------------------------------------------------
 PostgreSQL 15.6 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
(1 row)

David



On Thu, Aug 22, 2024 at 5:34 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Fri, 23 Aug 2024 at 01:14, PG Bug reporting form
> <noreply@postgresql.org> wrote:
> > set min_parallel_index_scan_size to '8kB';
> >
> > set min_parallel_table_scan_size to '8kB';
> > set parallel_tuple_cost to 0;
> > set parallel_setup_cost to 0;
> >
> > have no effect.
> >
> > Even with the set force_parallel_mode to on - no effect:
> > negotiation_chat_archive=# explain analyze select topic_id as y0_ from
> > public.negotiation_topic_archive_p005 this_ where
> > this_.employer_id='816144';
>
> Does it choose a parallel plan if you do:
>
> ALTER TABLE public.negotiation_topic_archive_p005 SET (parallel_workers = 2);
>
> I assume it must be a btree index given the other plan does use a
> parallel scan. So I wondered if something weird was happening in
> compute_parallel_worker() and it was choosing 1 worker, which might
> cause the path to be rejected.
>
> You might need to come up with a self-contained test case here as when
> I tried this on PG15, I do get a parallel index scan.
>
> create table t1 (a int);
> insert into t1 select x/1000 from generate_Series(1,1000000)X;
> create index on t1(a);
> set min_parallel_index_scan_size to '8kB';
> set min_parallel_table_scan_size to '8kB';
> set parallel_tuple_cost to 0;
> set parallel_setup_cost to 0;
> explain select * from t1 where a < 10;
>                                          QUERY PLAN
> --------------------------------------------------------------------------------------------
>  Gather  (cost=0.42..179.40 rows=8912 width=4)
>    Workers Planned: 2
>    ->  Parallel Index Only Scan using t1_a_idx on t1
> (cost=0.42..179.40 rows=3713 width=4)
>          Index Cond: (a < 10)
> (4 rows)
>
> select version();
>                                                 version
> -------------------------------------------------------------------------------------------------------
>  PostgreSQL 15.6 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
> 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
> (1 row)
>
> David

Hi David,

Some additional info about table:
\dt+ negotiation_topic_archive_p005
                                                List of relations
 Schema |              Name              | Type  |  Owner   | Persistence | Access method |  Size  | Description
--------+--------------------------------+-------+----------+-------------+---------------+--------+-------------
 public | negotiation_topic_archive_p005 | table | postgres | permanent   | heap          | 112 GB |

select relpages from pg_class where relname='negotiation_topic_archive_p005';
 relpages
----------
 14688625

select count(*) from negotiation_topic_archive_p005;
   count
-----------
 499710323
Quite a big table, but nothing unusual for contemporary hardware.
Hmm, an integer(?) overflow somewhere in the planner's costing model for parallel scans?
I looked into the compute_parallel_worker code (
https://doxygen.postgresql.org/paths_8h.html#a64fa1e38a91614994697a7404cea242a
) but I didn't find anything obviously wrong there (though my C knowledge
is pretty limited).

Here are the results of the different tests:

0. ALTER TABLE public.negotiation_topic_archive_p005 SET (parallel_workers = 2); - has no effect.

1. Create a fresh table in the same DB with only the required data (a local
minimal test case).
Unfortunately, no luck:

--trying to make a minimal isolated case
create table user_tmp.negotiation_topic_archive_test as select
employer_id, applicant_id from negotiation_topic_archive_p005;
SELECT 499710323

create index negotiation_topic_archive_test_employer_id_key on
user_tmp.negotiation_topic_archive_test(employer_id);
CREATE INDEX

analyze verbose user_tmp.negotiation_topic_archive_test;
INFO:  analyzing "user_tmp.negotiation_topic_archive_test"
INFO:  "negotiation_topic_archive_test": scanned 30000 of 2211108 pages, containing 6780000 live rows and 0 dead rows; 30000 rows in sample, 499710408 estimated total rows
ANALYZE

select relpages from pg_class where relname='negotiation_topic_archive_test';
 relpages
----------
  2211108

--no luck
explain
select * from user_tmp.negotiation_topic_archive_test this_
where
this_.employer_id=816144;

                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using negotiation_topic_archive_test_employer_id_key on negotiation_topic_archive_test this_  (cost=0.11..2047834.92 rows=449739 width=8)
   Index Cond: (employer_id = 816144)

--still no luck
set min_parallel_index_scan_size to 0;
set min_parallel_table_scan_size to 0;
set parallel_tuple_cost to 0;
set parallel_setup_cost to 0;
set force_parallel_mode to on;

explain
select * from user_tmp.negotiation_topic_archive_test this_
where
this_.employer_id=816144;


                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.11..2047834.92 rows=449739 width=8)
   Workers Planned: 1
   Single Copy: true
   ->  Index Scan using negotiation_topic_archive_test_employer_id_key on negotiation_topic_archive_test this_  (cost=0.11..2047834.92 rows=449739 width=8)
         Index Cond: (employer_id = 816144)

--with not exists - still ok
explain
select * from user_tmp.negotiation_topic_archive_test this_
where
this_.employer_id=816144
    AND NOT EXISTS (
        SELECT FROM applicant_black_list
        WHERE this_.employer_id=applicant_black_list.employer_id
        AND this_.applicant_id=applicant_black_list.applicant_id
      );

                                                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=10.27..2045714.44 rows=449737 width=8)
   Workers Planned: 4
   ->  Parallel Hash Anti Join  (cost=10.27..2045714.44 rows=112434 width=8)
         Hash Cond: ((this_.employer_id = applicant_black_list.employer_id) AND (this_.applicant_id = applicant_black_list.applicant_id))
         ->  Parallel Index Scan using negotiation_topic_archive_test_employer_id_key on negotiation_topic_archive_test this_  (cost=0.11..2044461.88 rows=112435 width=8)
               Index Cond: (employer_id = 816144)
         ->  Parallel Hash  (cost=10.13..10.13 rows=2 width=8)
               ->  Parallel Index Only Scan using employer_to_user_index on applicant_black_list  (cost=0.08..10.13 rows=2 width=8)
                     Index Cond: (employer_id = 816144)

So isolated tests on a table without inheritance parents or any
legacy/forgotten custom settings show the same results.

2. Now let's try a dump/reload to a test host running PostgreSQL 16.4 with the
stock pgdg default postgresql.conf:

pg_dump -Fc --compress=2 --table=user_tmp.negotiation_topic_archive_test --table=applicant_black_list db > negotiation_topic_archive_test.dump

ls -la negotiation_topic_archive_test.dump
-rw-r--r-- 1 postgres postgres 4028520703 Aug 22 19:10 negotiation_topic_archive_test.dump

--on test host... and still no luck
pg_restore -Fc --port=5433 --dbname=test negotiation_topic_archive_test.dump

test=# analyze applicant_black_list;
ANALYZE
test=# analyze user_tmp.negotiation_topic_archive_test;
ANALYZE

set min_parallel_index_scan_size to 0;
set min_parallel_table_scan_size to 0;
set parallel_tuple_cost to 0;
set parallel_setup_cost to 0;
set debug_parallel_query to on;
explain
select * from user_tmp.negotiation_topic_archive_test this_
where
this_.employer_id=816144;

                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.57..1529820.39 rows=416420 width=8)
   Workers Planned: 1
   Single Copy: true
   ->  Index Scan using negotiation_topic_archive_test_employer_id_key on negotiation_topic_archive_test this_  (cost=0.57..1529820.39 rows=416420 width=8)
         Index Cond: (employer_id = 816144)


At the same time, a modified version of your test case on the same 16.4 test
DB seems fine even with a large dataset:
insert into t1 select x/10000 from generate_Series(1,100000000)X;
create index on t1(a);
analyze t1;
set min_parallel_index_scan_size to 0;
set min_parallel_table_scan_size to 0;
set parallel_tuple_cost to 0;
set parallel_setup_cost to 0;
set debug_parallel_query to on;
explain select * from t1 where a=10;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Gather  (cost=0.57..4636.40 rows=10786 width=4)
   Workers Planned: 2
   ->  Parallel Index Only Scan using t1_a_idx on t1  (cost=0.57..4636.40 rows=4494 width=4)
         Index Cond: (a = 10)



--
Maxim Boguk
Senior Postgresql DBA

Phone UA: +380 99 143 0000
Phone AU: +61  45 218 5678



Re: BUG #18588: Cannot force/let database use parallel execution in simple case.

From: Andrei Lepikhov

On 22/8/2024 14:58, PG Bug reporting form wrote:
>   Index Scan using negotiation_topic_archive_p005_employer_id_resume_id_idx
> on negotiation_topic_archive_p005 this_  (cost=0.11..2501972.13 rows=505757
> width=8) (actual time=0.020..699.322 rows=510040 loops=1)
>     Index Cond: (employer_id = 816144)
>   Planning Time: 0.132 ms
>   Execution Time: 717.005 ms
Can you disable IndexScan and Bitmapscan step-by-step and show explain 
analyze verbose with costs?

-- 
regards, Andrei Lepikhov




OK, here is a reproducer test case on the stock PostgreSQL 16.4 config:

--good case with small relation
drop table t1;
create table t1 (a int);
insert into t1 select (random()*100)::int from generate_Series(1,1000000)X;
create index on t1(a);
analyze t1;
set min_parallel_index_scan_size to 0;
set min_parallel_table_scan_size to 0;
set parallel_tuple_cost to 0;
set parallel_setup_cost to 0;
set debug_parallel_query to on;
set enable_bitmapscan to 0;
set enable_seqscan to 0;
explain analyze select * from t1 where a<10;

                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.42..19070.70 rows=91047 width=4) (actual time=0.527..66.086 rows=95306 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Index Only Scan using t1_a_idx on t1  (cost=0.42..19070.70 rows=37936 width=4) (actual time=0.046..19.293 rows=31769 loops=3)
         Index Cond: (a < 10)
         Heap Fetches: 95306
 Planning Time: 0.227 ms
 Execution Time: 69.944 ms


--bad case with big relation
--let's make relation 100x larger (and supposedly better suited for parallel scans)
drop table t1;
create table t1 (a int);
insert into t1 select (random()*10000)::int from generate_Series(1,100000000)X;
create index on t1(a);
analyze t1;
set min_parallel_index_scan_size to 0;
set min_parallel_table_scan_size to 0;
set parallel_tuple_cost to 0;
set parallel_setup_cost to 0;
set debug_parallel_query to on;
set enable_bitmapscan to 0;
set enable_seqscan to 0;
explain analyze select * from t1 where a<10;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.57..357316.03 rows=98718 width=4) (actual time=59.419..651.500 rows=95072 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Index Only Scan using t1_a_idx on t1  (cost=0.57..357316.03 rows=98718 width=4) (actual time=0.030..585.517 rows=95072 loops=1)
         Index Cond: (a < 10)
         Heap Fetches: 95072
 Planning Time: 0.144 ms
 JIT:
   Functions: 2
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.396 ms, Inlining 0.000 ms, Optimization 0.000 ms, Emission 0.000 ms, Total 0.396 ms
 Execution Time: 660.359 ms






Maxim Boguk <maxim.boguk@gmail.com> writes:
> Ok there are reproducer test case on the stock postgresql 16.4 config:

Hmm, you can see both behaviors on the small version of t1, just by
varying the comparison constant in the WHERE clause.  For me, it'll
use only one worker with "where a<1", and not parallelize at all
with "where a<0".  It looks like it's deciding that it's not worth
starting workers when too few rows are expected to be returned.  That
would be unsurprising with a normal setting of parallel_setup_cost,
but it does seem odd with parallel_setup_cost=0.

In any case, I think this isn't about the big table being big but
about changing the range of values of "a", which changes the
selectivity of "where a<10" 100-fold.

(I tested on HEAD not v16)

            regards, tom lane





On Thu, Aug 22, 2024 at 10:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Maxim Boguk <maxim.boguk@gmail.com> writes:
> > Ok there are reproducer test case on the stock postgresql 16.4 config:
>
> Hmm, you can see both behaviors on the small version of t1, just by
> varying the comparison constant in the WHERE clause.  For me, it'll
> use only one worker with "where a<1", and not parallelize at all
> with "where a<0".  It looks like it's deciding that it's not worth
> starting workers when too few rows are expected to be returned.  That
> would be unsurprising with a normal setting of parallel_setup_cost,
> but it does seem odd with parallel_setup_cost=0.
>
> In any case, I think this isn't about the big table being big but
> about changing the range of values of "a", which changes the
> selectivity of "where a<10" 100-fold.
>
> (I tested on HEAD not v16)
>
>                         regards, tom lane

I specifically designed both tests so that the expected number of rows is the same in both cases.

With a sufficiently large table, I didn't manage to find a combination of settings to force parallel execution (in my real case the database is huge and mostly cold, on an SSD RAID, so parallel execution provides an almost linear speedup with the number of parallel workers even on 1000 rows, not to mention 10k-10m rows; but I cannot convince PostgreSQL to use a parallel index scan even for the case where 2m rows are returned).




I wrote:
> Hmm, you can see both behaviors on the small version of t1, just by
> varying the comparison constant in the WHERE clause.  For me, it'll
> use only one worker with "where a<1", and not parallelize at all
> with "where a<0".  It looks like it's deciding that it's not worth
> starting workers when too few rows are expected to be returned.  That
> would be unsurprising with a normal setting of parallel_setup_cost,
> but it does seem odd with parallel_setup_cost=0.

Ah, I traced through it, and here's what's happening: at small enough
estimated rowcounts, the parallel and non-parallel plans have fuzzily
the same cost (parallel is a shade cheaper, but only a shade).
Their other properties such as pathkeys are the same too.  So we get
to the tie-breaking logic in add_path, and what breaks the tie is
the difference in parallel safety: the non-parallel plan is marked
parallel_safe and the parallel one (which by this point is a Gather)
is not.

That tie-break rule is not wrong, because preserving parallel safety
can be a good thing when we come to consider use of the path at higher
plan levels.  So I think there's nothing to see here.

            regards, tom lane



Maxim Boguk <maxim.boguk@gmail.com> writes:
> With sufficiently large table - I didn't manage find a combinations of
> settings to force parallel execution (im my real case - the database huge
> and mostly cold on ssd raid - so parallel execution provides almost linear
> speedup with amount of parallel workers even on 1000 rows... not talking
> about 10k-10m rows, but I cannot convince the PostgreSQL use parallel index
> scan for 2m returned rows case).

Well, there may be something going on in your real case that isn't
apparent from this toy example.  Perhaps look for a better example?

            regards, tom lane





On Thu, Aug 22, 2024 at 11:52 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Maxim Boguk <maxim.boguk@gmail.com> writes:
> > With sufficiently large table - I didn't manage find a combinations of
> > settings to force parallel execution (im my real case - the database huge
> > and mostly cold on ssd raid - so parallel execution provides almost linear
> > speedup with amount of parallel workers even on 1000 rows... not talking
> > about 10k-10m rows, but I cannot convince the PostgreSQL use parallel index
> > scan for 2m returned rows case).
>
> Well, there may be something going on in your real case that isn't
> apparent from this toy example.  Perhaps look for a better example?
>
>                         regards, tom lane

Just some results from sample table copied from production DB:

postgres=# \c test
You are now connected to database "test" as user "postgres".
test=# set enable_seqscan to off;
SET
test=# set enable_bitmapscan to off;
SET
test=# set jit to off;
SET
test=# select employer_id, count(*) from user_tmp.negotiation_topic_archive_test group by 1 order by 2 desc limit 20;
 employer_id |  count  
-------------+---------
     5516123 | 6213979
        2180 | 2871578
        3529 | 2007333
     9498112 | 1952964
     5120112 | 1475554
       39305 | 1428762
        1740 | 1386494
       78638 | 1318288
     1947314 | 1256758
     1689259 | 1221705
       49357 | 1182208
     9498120 | 1061869
     3036416 | 1044110
          80 |  953178
     1942330 |  952285
        4181 |  921238
        4934 |  903208
        7172 |  825062
       87021 |  805812
        3776 |  773594
(20 rows)

test=# explain analyze  select * from user_tmp.negotiation_topic_archive_test where employer_id=5516123;
                                                                                              QUERY PLAN                                                                                              
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using negotiation_topic_archive_test_employer_id_key on negotiation_topic_archive_test  (cost=0.57..20867791.00 rows=6412935 width=8) (actual time=0.094..10117.874 rows=6213979 loops=1)
   Index Cond: (employer_id = 5516123)
 Planning Time: 0.221 ms
 Execution Time: 10278.577 ms
(4 rows)

test=# explain analyze  select * from user_tmp.negotiation_topic_archive_test where employer_id=9498120;
                                                                                             QUERY PLAN                                                                                            
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using negotiation_topic_archive_test_employer_id_key on negotiation_topic_archive_test  (cost=0.57..3684011.82 rows=1082703 width=8) (actual time=0.165..4350.693 rows=1061869 loops=1)
   Index Cond: (employer_id = 9498120)
 Planning Time: 0.129 ms
 Execution Time: 4385.463 ms
(4 rows)

test=# explain analyze  select * from user_tmp.negotiation_topic_archive_test where employer_id=3776;
                                                                                            QUERY PLAN                                                                                            
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using negotiation_topic_archive_test_employer_id_key on negotiation_topic_archive_test  (cost=0.57..2341528.08 rows=666279 width=8) (actual time=0.152..3491.172 rows=773594 loops=1)
   Index Cond: (employer_id = 3776)
 Planning Time: 0.127 ms
 Execution Time: 3517.418 ms
(4 rows)

test=# set min_parallel_index_scan_size to 0;
SET
test=# set min_parallel_table_scan_size to 0;
SET
test=# set parallel_tuple_cost to 0;
SET
test=# set parallel_setup_cost to 0;
SET
test=# set debug_parallel_query to on;
SET
test=# explain analyze  select * from user_tmp.negotiation_topic_archive_test where employer_id=5516123;
                                                                                                QUERY PLAN                                                                                                
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.57..20867791.00 rows=6412935 width=8) (actual time=12.920..8932.146 rows=6213979 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Index Scan using negotiation_topic_archive_test_employer_id_key on negotiation_topic_archive_test  (cost=0.57..20867791.00 rows=6412935 width=8) (actual time=0.185..8814.097 rows=6213979 loops=1)
         Index Cond: (employer_id = 5516123)
 Planning Time: 0.128 ms
 Execution Time: 9424.556 ms
(8 rows)

test=# explain analyze  select * from user_tmp.negotiation_topic_archive_test where employer_id=9498120;
                                                                                                QUERY PLAN                                                                                                
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.57..3684011.82 rows=1082703 width=8) (actual time=14.698..3837.904 rows=1061869 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Index Scan using negotiation_topic_archive_test_employer_id_key on negotiation_topic_archive_test  (cost=0.57..3684011.82 rows=1082703 width=8) (actual time=0.179..3808.156 rows=1061869 loops=1)
         Index Cond: (employer_id = 9498120)
 Planning Time: 0.130 ms
 Execution Time: 3933.561 ms
(8 rows)

test=# explain analyze  select * from user_tmp.negotiation_topic_archive_test where employer_id=3776;
                                                                                               QUERY PLAN                                                                                              
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.57..2341528.08 rows=666279 width=8) (actual time=14.010..3093.991 rows=773594 loops=1)
   Workers Planned: 1
   Workers Launched: 1
   Single Copy: true
   ->  Index Scan using negotiation_topic_archive_test_employer_id_key on negotiation_topic_archive_test  (cost=0.57..2341528.08 rows=666279 width=8) (actual time=0.119..3067.923 rows=773594 loops=1)
         Index Cond: (employer_id = 3776)
 Planning Time: 0.127 ms
 Execution Time: 3163.729 ms
(8 rows)

No amount of *_cost changes provides any benefit
(not even extreme values such as set random_page_cost to 110; plus set seq_page_cost to 100;, which I expected would make the parallel plan far cheaper thanks to parallelized disk access).

I feel this is the same issue as you described, but in the case of an I/O-latency-bound database (which describes most huge datasets on EBS or local SSD storage) the results seem suboptimal.






On Thu, Aug 22, 2024 at 11:52 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Maxim Boguk <maxim.boguk@gmail.com> writes:
> > With sufficiently large table - I didn't manage find a combinations of
> > settings to force parallel execution (im my real case - the database huge
> > and mostly cold on ssd raid - so parallel execution provides almost linear
> > speedup with amount of parallel workers even on 1000 rows... not talking
> > about 10k-10m rows, but I cannot convince the PostgreSQL use parallel index
> > scan for 2m returned rows case).
>
> Well, there may be something going on in your real case that isn't
> apparent from this toy example.  Perhaps look for a better example?
>
>                         regards, tom lane

On the production DB, even in the fully cached case, there is a 4x performance
difference between parallel and single-process mode.

Single process (3s):
explain (analyze, costs, buffers, timing) select * from user_tmp.negotiation_topic_archive_test this_ where employer_id = 5516123;
                                                                                                QUERY PLAN                                                                                                
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using negotiation_topic_archive_test_employer_id_key on negotiation_topic_archive_test this_  (cost=0.11..11171373.49 rows=6446264 width=8) (actual time=0.020..2823.467 rows=6213979 loops=1)
   Index Cond: (employer_id = 5516123)
   Buffers: shared hit=1783337
 Planning Time: 0.054 ms
 Execution Time: 3069.555 ms

Parallel (forced by adding an unrelated anti-join, so it does some extra work):
 explain (analyze, costs, buffers, timing) select * from user_tmp.negotiation_topic_archive_test this_ where employer_id = 5516123 AND NOT EXISTS (
        SELECT FROM applicant_black_list
        WHERE this_.employer_id=applicant_black_list.employer_id
        AND this_.applicant_id=applicant_black_list.applicant_id
);
                                                                                                         QUERY PLAN                                                                                                          
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=10.27..11123882.70 rows=6446241 width=8) (actual time=0.835..468.214 rows=6213979 loops=1)
   Workers Planned: 8
   Workers Launched: 8
   Buffers: shared hit=1787471
   ->  Parallel Hash Anti Join  (cost=10.27..11123882.70 rows=805780 width=8) (actual time=0.164..484.239 rows=690442 loops=9)
         Hash Cond: ((this_.employer_id = applicant_black_list.employer_id) AND (this_.applicant_id = applicant_black_list.applicant_id))
         Buffers: shared hit=1787471
         ->  Parallel Index Scan using negotiation_topic_archive_test_employer_id_key on negotiation_topic_archive_test this_  (cost=0.11..11114968.68 rows=805783 width=8) (actual time=0.031..352.918 rows=690442 loops=9)
               Index Cond: (employer_id = 5516123)
               Buffers: shared hit=1787076
         ->  Parallel Hash  (cost=10.13..10.13 rows=2 width=8) (actual time=0.022..0.023 rows=0 loops=9)
               Buckets: 1024  Batches: 1  Memory Usage: 8kB
               Buffers: shared hit=3
               ->  Parallel Index Only Scan using employer_to_user_index on applicant_black_list  (cost=0.08..10.13 rows=2 width=8) (actual time=0.009..0.009 rows=0 loops=1)
                     Index Cond: (employer_id = 5516123)
                     Heap Fetches: 0
                     Buffers: shared hit=3
 Planning:
   Buffers: shared hit=4
 Planning Time: 0.164 ms
 Execution Time: 711.817 ms


Once some I/O is involved, the speed difference is usually close to 8x, and so far
I haven't found a way to force a parallel plan except by adding a cheap unrelated anti-join.
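As a quick sanity check on those two plans (the timings and row counts below are copied from the EXPLAIN ANALYZE outputs above), the cached-case speedup works out to roughly 4.3x, and the parallel scan splits the rows almost evenly across the 9 processes (8 workers plus the leader):

```python
# Numbers copied verbatim from the two EXPLAIN ANALYZE outputs above.
serial_ms = 3069.555       # single-process Index Scan, Execution Time
parallel_ms = 711.817      # Gather with 8 workers, Execution Time
total_rows = 6213979       # rows returned by both plans
rows_per_process = 690442  # "rows=690442 loops=9" in the parallel plan

speedup = serial_ms / parallel_ms
print(f"cached-case speedup: {speedup:.1f}x")            # roughly 4.3x
print(f"9 processes x {rows_per_process} rows ~= {9 * rows_per_process}")
```

With cold data the per-worker I/O waits overlap instead of serializing, which is why the gap widens toward 8x.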



--
Maxim Boguk
Senior Postgresql DBA

Phone UA: +380 99 143 0000
Phone AU: +61  45 218 5678

David Rowley <dgrowleyml@gmail.com> writes:
> I played around with the attached script and set some breakpoints in
> cost_index(). I'm seeing the same thing as you with the parallel path
> being only slightly cheaper, but when looking at cost_index(), it's
> easy to see why.

> It's only the cpu_run_cost that's divided by the parallel_divisor.

D'oh.

What Maxim is observing is that he's getting I/O concurrency from
multiple workers; and that's an effect we're simply not accounting
for here.  cost_seqscan excuses the same omission with

         * It may be possible to amortize some of the I/O cost, but probably
         * not very much, because most operating systems already do aggressive
         * prefetching.  For now, we assume that the disk run cost can't be
         * amortized at all.

but I don't think this prefetch argument applies to index scans.

            regards, tom lane



On Fri, 23 Aug 2024 at 10:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What Maxim is observing is that he's getting I/O concurrency from
> multiple workers; and that's an effect we're simply not accounting
> for here.  cost_seqscan excuses the same omission with
>
>          * It may be possible to amortize some of the I/O cost, but probably
>          * not very much, because most operating systems already do aggressive
>          * prefetching.  For now, we assume that the disk run cost can't be
>          * amortized at all.
>
> but I don't think this prefetch argument applies to index scans.

I don't think it does either. The problem is worse for index scans too
as the I/O costs are likely to be higher due to a larger portion of
reads being costed with random_page_cost.

I had a look in src/backend/optimizer/README to see if we've defined
what parallel plan costs mean and I didn't see anything under the
"Parallel Query and Partial Paths" heading. I don't see anything in
the header comment to struct Path either. I was hoping we could fall
back on some guidance as to what they mean.  The way parallel query
works today means that we could have 10 workers do 9x more work and
the plan could come up 10% cheaper than the equivalent serial plan.
It's really just the sanity of the selected number of workers that
protects plans from becoming too resource inefficient.
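To make that concrete with made-up numbers (hypothetical costs, not taken from any real plan): if cost is read as elapsed time, a parallel plan can do 9x the total work of the serial plan and still win by 10%, because only the divided figure is compared:

```python
# Hypothetical illustration of cost-as-elapsed-time: the parallel plan
# burns 9x the total work but still looks 10% cheaper to the planner.
serial_cost = 100.0          # one process: total work == "elapsed" cost
parallel_total_work = 900.0  # 9x more work in total...
workers = 10
parallel_cost = parallel_total_work / workers  # ...spread over 10 workers

assert parallel_cost < serial_cost
print(serial_cost, parallel_cost)  # 100.0 vs 90.0
```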

Since we don't normally backpatch cost changes, I think Maxim might
need to try tweaking effective_cache_size. Maybe we can revisit which
costs we divide by the worker count in master.

David



David Rowley <dgrowleyml@gmail.com> writes:
> On Fri, 23 Aug 2024 at 10:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What Maxim is observing is that he's getting I/O concurrency from
>> multiple workers; and that's an effect we're simply not accounting
>> for here.

> I had a look in src/backend/optimizer/README to see if we've defined
> what parallel plan costs mean and I didn't see anything under the
> "Parallel Query and Partial Paths" heading. I don't see anything in
> the header comment to struct Path either.

Yeah, it'd be good to have some clarity about that.  The whole
notion of dividing cost by number of workers seems a little squishy,
because we certainly aren't doing less work by having more workers.
I guess parallel query has redefined cost as being some measure of
elapsed time, which isn't unreasonable but it ought to be written
down somewhere.  Also, if we're defining it like that, how do we
deal with overlap between I/O and computation?

            regards, tom lane




> .. I think Maxim might need to try tweaking effective_cache_size. ..

I'm currently doing some troubleshooting for Bitmap Index Scan vs. Index
Scan, and effective_cache_size caught my attention. I want to take this
chance to confirm my understanding of this parameter.

1. By design, effective_cache_size should not be less than shared_buffers,
since the file-system cache should be considered as well.
2. Mackert and Lohman's algorithm in index_pages_fetched doesn't seem to
account for concurrency. If we have N sessions accessing N different small
relations, and the combined size of the N relations is larger than
effective_cache_size, would index_pages_fetched come up with a much smaller
value than the actual number of pages fetched?
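For reference, the approximation that index_pages_fetched uses can be transcribed roughly as follows (a simplified Python sketch of the Mackert-Lohman formula as documented in costsize.c; the real function additionally derives the cache size in pages from effective_cache_size and scales it by the number of scans):

```python
def index_pages_fetched(tuples_fetched, pages, cache_pages):
    """Mackert-Lohman estimate of heap page fetches for an index scan.

    tuples_fetched: N*s, expected number of tuples retrieved
    pages:          T, number of pages in the table
    cache_pages:    b, cache size in pages (from effective_cache_size)
    """
    T = max(pages, 1.0)
    b = max(cache_pages, 1.0)
    Ns = tuples_fetched
    if T <= b:
        # Table fits in cache: each page is read at most once.
        return min(2.0 * T * Ns / (2.0 * T + Ns), T)
    lim = 2.0 * T * b / (2.0 * T - b)
    if Ns <= lim:
        return 2.0 * T * Ns / (2.0 * T + Ns)
    # Cache overflows: additional tuples may each need a fresh page read.
    return b + (Ns - lim) * (T - b) / T
```

Note that b here is a single global cache-size estimate; nothing in the formula models other sessions competing for that cache, which is exactly the concurrency gap the question above points at.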

-- 
Best Regards
Andy Fan




On Fri, Aug 23, 2024 at 1:44 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Fri, 23 Aug 2024 at 08:42, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Ah, I traced through it, and here's what's happening: at small enough
> > estimated rowcounts, the parallel and non-parallel plans have fuzzily
> > the same cost (parallel is a shade cheaper, but only a shade).
> > Their other properties such as pathkeys are the same too.  So we get
> > to the tie-breaking logic in add_path, and what breaks the tie is
> > the difference in parallel safety: the non-parallel plan is marked
> > parallel_safe and the parallel one (which by this point is a Gather)
> > is not.
>
> I played around with the attached script and set some breakpoints in
> cost_index(). I'm seeing the same thing as you with the parallel path
> being only slightly cheaper, but when looking at cost_index(), it's
> easy to see why.
>
> It's only the cpu_run_cost that's divided by the parallel_divisor.  In
> this case, cpu_run_cost is just 7852.89 for the parallel path and the
> parallel_divisor is 2.4. The run_cost is not divided and is much
> higher at 501792, so dividing the CPU cost does not save much. Just a
> few thousand in half a million, which is why the plans are fuzzily the
> same cost.
>
> If I make the cpu_tuple_cost 0.02 instead of 0.01, I get the parallel
> plan. Possibly increasing effective_cache_size would be the best way
> for Maxim to get the parallel plan. I wonder if that's just left at
> the default 4GB... Not many people tune that.
>
> David
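Plugging the figures David quotes into that logic shows just how small the discount is (a rough check; it assumes the 501792 run_cost is the I/O-dominated part that is charged in full to the parallel path):

```python
# Figures quoted from David's breakpoint in cost_index() above.
cpu_run_cost = 7852.89   # only this component is divided by the divisor
io_run_cost = 501792.0   # run_cost that is NOT divided (disk-dominated)
parallel_divisor = 2.4

serial_total = io_run_cost + cpu_run_cost
parallel_total = io_run_cost + cpu_run_cost / parallel_divisor
saving = serial_total - parallel_total

print(f"parallel saves {saving:.0f} out of {serial_total:.0f}")
print(f"that is {100 * saving / serial_total:.1f}% - fuzzily the same cost")
```

A saving of under 1% is well within add_path's cost fuzz factor, which is how the tie-break on parallel safety ends up deciding the plan.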

Thank you, David, for the hint about cpu_tuple_cost.
After some fiddling I found a sweet spot of cpu_tuple_cost for my
particular case, where the required query starts using parallel plans
with no visible slowdown of other queries; in my case it is
cpu_tuple_cost=0.05.

PS: After 25 years with PostgreSQL, correct tuning of the *_cost
parameters is still somewhere between black magic and a shamanic ritual
of talking with the spirits of ancestors for me, and in borderline cases
(like this one) it is done via trial and error.

PPS: effective_cache_size is set to 75% of RAM (= shared_buffers size
= 298GB), so increasing it further seems unlikely to help.

--
Maxim Boguk
Senior Postgresql DBA

Phone UA: +380 99 143 0000
Phone AU: +61  45 218 5678