Thread: parallel index creation: maintenance_work_mem not honored?

parallel index creation: maintenance_work_mem not honored?

From

Fabio Pardi

Date:

20 June 2022, 09:42:44

Hi all,

I noticed that after an upgrade from 9.6 to 14.x, Postgres started producing temp files.

I narrowed down the behavior to parallel operations.

I was able to reproduce the following on a test machine running latest Postgres 14.4 on Ubuntu 18.04

CREATE TABLE test_parallel (i int);
INSERT INTO test_parallel VALUES (generate_series(0,10000000));

table size: 346 MB
maintenance_work_mem = '5GB' (but the same would happen if we set much higher values)

CREATE INDEX test_parallel_i_idx ON test_parallel (i);

... LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp4611.1.sharedfileset/2.0", size 37879808
... STATEMENT: CREATE INDEX test_parallel_i_idx ON test_parallel (i);
... LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp4611.1.sharedfileset/0.0", size 59326464
... STATEMENT: CREATE INDEX test_parallel_i_idx ON test_parallel (i);
... LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp4611.1.sharedfileset/1.0", size 103194624
... STATEMENT: CREATE INDEX test_parallel_i_idx ON test_parallel (i);

After disabling parallelism with:

ALTER TABLE test_parallel SET (parallel_workers = 0);

The same 'CREATE INDEX' statement does not produce temp files.

Is this behavior expected? I cannot find mentions on the documentation nor on the literature available to me.

regards,

fabio pardi

Re: parallel index creation: maintenance_work_mem not honored?

From

Peter Geoghegan

Date:

20 June 2022, 14:47:21

On Mon, Jun 20, 2022 at 2:42 AM Fabio Pardi <f.pardi@portavita.eu> wrote:
> Is this behavior expected? I cannot find mentions on the documentation nor on the literature available to me.

Yes, this is expected. Parallel CREATE INDEX performs its parallel
sort by having workers generate sorted runs, which are written to temp
files, and then having the leader process merge the sorted runs
together as the index is built.

In the case where there is amble maintenance_work_mem, all writes to
and reads from temp files will be sequential.

-- 
Peter Geoghegan

Re: parallel index creation: maintenance_work_mem not honored?

From

Fabio Pardi

Date:

21 June 2022, 05:38:17

Thanks Peter for your answer,

On 20/06/2022 17:47, Peter Geoghegan wrote:

Yes, this is expected. Parallel CREATE INDEX performs its parallel
sort by having workers generate sorted runs, which are written to temp
files, and then having the leader process merge the sorted runs
together as the index is built.

Just to make sure we are on the same page: every parallel 'CREATE INDEX' writes to temp files, ok.

But why the temp files are always on disk and not in RAM as other operations do?

In the case where there is amble maintenance_work_mem, all writes to
and reads from temp files will be sequential.

Re: parallel index creation: maintenance_work_mem not honored?

From

Peter Geoghegan

Date:

21 June 2022, 05:46:52

On Mon, Jun 20, 2022 at 10:38 PM Fabio Pardi <f.pardi@portavita.eu> wrote:
> Just to make sure we are on the same page: every parallel 'CREATE INDEX' writes to temp files, ok.

Yes.

> But why the temp files are always on disk and not in RAM as other operations do?

Why not?

It is probably true that it would be better to use shared memory in
the case where there is sufficient memory, but it's not all that
significant compared to everything else that must happen at the same
time. It's rarely the bottleneck. External sorts are often faster than
internal sorts. That happens because they can be more CPU cache
friendly, for reasons that aren't fundamental.

> In the case where there is amble maintenance_work_mem, all writes to
> and reads from temp files will be sequential.

Yes.

-- 
Peter Geoghegan

Re: parallel index creation: maintenance_work_mem not honored?

From

Fabio Pardi

Date:

22 June 2022, 06:15:34

Thanks for the clarification Peter,

On 21/06/2022 08:46, Peter Geoghegan wrote:
>
>> But why the temp files are always on disk and not in RAM as other operations do?
> Why not?

Systems running on slow disks will probably suffer from workers writing temp files to disk.

> It is probably true that it would be better to use shared memory in
> the case where there is sufficient memory, but it's not all that
> significant compared to everything else that must happen at the same
> time. It's rarely the bottleneck. External sorts are often faster than
> internal sorts. That happens because they can be more CPU cache
> friendly, for reasons that aren't fundamental.
>


I would expect this behavior to be mentioned in the documentation.




regards,

fabio pardi

Re: parallel index creation: maintenance_work_mem not honored?

From

MichaelDBA

Date:

25 June 2022, 16:37:14

Hi all,

Not quite following this line of logic... If "External sorts are often faster than internal sorts", why bother with increasing work_mem to avoid sorting on disk?

Regards,
Michael Vitale

Fabio Pardi wrote on 6/22/2022 2:15 AM:

Thanks for the clarification Peter,

On 21/06/2022 08:46, Peter Geoghegan wrote:

But why the temp files are always on disk and not in RAM as other operations do?
Why not?

Systems running on slow disks will probably suffer from workers writing temp files to disk.

It is probably true that it would be better to use shared memory in
the case where there is sufficient memory, but it's not all that
significant compared to everything else that must happen at the same
time. It's rarely the bottleneck. External sorts are often faster than
internal sorts. That happens because they can be more CPU cache
friendly, for reasons that aren't fundamental.

I would expect this behavior to be mentioned in the documentation.

regards,

fabio pardi

Regards,

Michael Vitale

Michaeldba@sqlexec.com

703-600-9343

Attachment

pgadvanced3.jpg

Re: parallel index creation: maintenance_work_mem not honored?

From

Peter Geoghegan

Date:

25 June 2022, 17:17:03

On Sat, Jun 25, 2022 at 9:37 AM MichaelDBA <MichaelDBA@sqlexec.com> wrote:
> If "External sorts are often faster than internal sorts", why bother with increasing work_mem to avoid sorting on
disk?

They're faster despite spilling to disk -- not because of it. This
effect is quite visible at times, though usually only with large
sorts. It's noticeable, but not huge. And it depends on things like
the datatypes being sorted, to a degree. (My original point was that
external sorts shouldn't be assumed to be significantly slower or
significantly less efficient than equivalent large internal sorts,
where CPU costs tend to dominate either way.)

Only an external sort can do a final on-the-fly merge, where tuplesort
outputs tuples in their final sorted order before the entire input has
been fully sorted. It's not uncommon for that to more than make up for
any latency from writing out temp files. The added latency is usually
not noticeable because we also need to read from random locations in
memory to gather tuples to output to temp files. That can totally hide
any I/O latency in many important cases.

In principle you could get the same benefit with an internal sort,
without any I/O for temp files -- nothing makes it fundamentally
impossible to teach tuplesort how to do a similar final on-the-fly
strategy. But that's tricky for a variety of reasons, mostly related
to the fact that the final size of the input cannot easily be
predicted. That's why it hasn't been implemented.

--
Peter Geoghegan