Thread: [PATCH] pg_dump: lock tables in batches
Hi hackers,

A colleague of mine reported a slight inconvenience with pg_dump.

He is dumping the data from a remote server. There are several
thousands of tables in the database. Making a dump locally and/or
using pg_basebackup and/or logical replication is not an option. So
what pg_dump currently does is sending LOCK TABLE queries one after
another. Every query needs an extra round trip. So if we have let's
say 2000 tables and every round trip takes 100 ms then ~3.5 minutes
are spent in the not most useful way.

What he proposes is taking the locks in batches. I.e. instead of:

LOCK TABLE foo IN ACCESS SHARE MODE;
LOCK TABLE bar IN ACCESS SHARE MODE;

do:

LOCK TABLE foo, bar, ... IN ACCESS SHARE MODE;

The proposed patch makes this change. It's pretty straightforward and
as a side effect saves a bit of network traffic too.

Thoughts?

--
Best regards,
Aleksander Alekseev
Attachment
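For illustration only (this is not the patch itself; the helper names lock_tables_in_batches/flush_lock_batch and the 100K flush threshold are assumptions made for this sketch), the idea amounts to something like the following libpq-level C code. Table names are assumed to be already schema-qualified and quoted, and the caller is assumed to have an open transaction, as pg_dump does.

/*
 * Illustrative sketch only, not the committed patch: accumulate table
 * names into a single LOCK TABLE statement and flush it once the query
 * text grows past a threshold, instead of issuing one LOCK per table.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libpq-fe.h>

#define LOCK_BATCH_MAX 100000       /* flush threshold in bytes (assumption) */

static void
flush_lock_batch(PGconn *conn, char *buf)
{
    PGresult   *res;

    if (buf[0] == '\0')
        return;                     /* nothing accumulated yet */
    strcat(buf, " IN ACCESS SHARE MODE");
    res = PQexec(conn, buf);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "LOCK TABLE failed: %s", PQerrorMessage(conn));
        exit(1);
    }
    PQclear(res);
}

static void
lock_tables_in_batches(PGconn *conn, const char *const *tables, int ntables)
{
    size_t      cap = LOCK_BATCH_MAX + 1024;    /* headroom for one more name */
    char       *buf = malloc(cap);
    size_t      len = 0;

    buf[0] = '\0';
    for (int i = 0; i < ntables; i++)
    {
        /* start a new statement, or append another table to the current one */
        len += snprintf(buf + len, cap - len, "%s%s",
                        len == 0 ? "LOCK TABLE " : ", ",
                        tables[i]);             /* names assumed already quoted */

        if (len > LOCK_BATCH_MAX)
        {
            flush_lock_batch(conn, buf);
            len = 0;
            buf[0] = '\0';
        }
    }
    flush_lock_batch(conn, buf);                /* flush whatever is left */
    free(buf);
}

Tom's final message in the thread mentions capping the generated LOCK TABLE query at 100K; the flush threshold above loosely stands in for that kind of limit.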
On Wed, Dec 7, 2022 at 12:09 PM Aleksander Alekseev <aleksander@timescale.com> wrote:
>
> Hi hackers,
>
> A colleague of mine reported a slight inconvenience with pg_dump.
>
> He is dumping the data from a remote server. There are several
> thousands of tables in the database. Making a dump locally and/or
> using pg_basebackup and/or logical replication is not an option. So
> what pg_dump currently does is sending LOCK TABLE queries one after
> another. Every query needs an extra round trip. So if we have let's
> say 2000 tables and every round trip takes 100 ms then ~3.5 minutes
> are spent in the not most useful way.
>
> What he proposes is taking the locks in batches. I.e. instead of:
>
> LOCK TABLE foo IN ACCESS SHARE MODE;
> LOCK TABLE bar IN ACCESS SHARE MODE;
>
> do:
>
> LOCK TABLE foo, bar, ... IN ACCESS SHARE MODE;
>
> The proposed patch makes this change. It's pretty straightforward and
> as a side effect saves a bit of network traffic too.
>
+1 for that change. It will improve the dump for databases with thousands of relations.
The code LGTM; it passes all tests and compiles without any warnings.
Regards,
Fabrízio de Royes Mello
Aleksander Alekseev <aleksander@timescale.com> writes:
> What he proposes is taking the locks in batches.

I have a strong sense of deja vu here. I'm pretty sure I experimented
with this idea last year and gave up on it. I don't recall exactly
why, but either it didn't show any meaningful performance improvement
for me or there was some actual downside (that I'm not remembering
right now).

This would've been in the leadup to 989596152 and adjacent commits.
I took a quick look through the threads cited in those commit messages
and didn't find anything about it, but I think the discussion had
gotten scattered across more threads. Some digging in the archives
could be useful.

        regards, tom lane
Hi,

On 2022-12-07 10:44:33 -0500, Tom Lane wrote:
> Aleksander Alekseev <aleksander@timescale.com> writes:
> > What he proposes is taking the locks in batches.
>
> I have a strong sense of deja vu here. I'm pretty sure I experimented
> with this idea last year and gave up on it. I don't recall exactly
> why, but either it didn't show any meaningful performance improvement
> for me or there was some actual downside (that I'm not remembering
> right now).
>
> This would've been in the leadup to 989596152 and adjacent commits.
> I took a quick look through the threads cited in those commit messages
> and didn't find anything about it, but I think the discussion had
> gotten scattered across more threads. Some digging in the archives
> could be useful.

IIRC the case we were looking at around 989596152 were CPU bound
workloads, rather than latency bound workloads. It'd not be surprising
to have cases where batching LOCKs helps latency, but not CPU bound.

I wonder if "manual" batching is the best answer. Alexander, have you
considered using libpq level pipelining?

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2022-12-07 10:44:33 -0500, Tom Lane wrote:
>> I have a strong sense of deja vu here. I'm pretty sure I experimented
>> with this idea last year and gave up on it. I don't recall exactly
>> why, but either it didn't show any meaningful performance improvement
>> for me or there was some actual downside (that I'm not remembering
>> right now).

> IIRC the case we were looking at around 989596152 were CPU bound workloads,
> rather than latency bound workloads. It'd not be surprising to have cases
> where batching LOCKs helps latency, but not CPU bound.

Yeah, perhaps. Anyway my main point is that I don't want to just assume
this is a win; I want to see some actual performance tests.

> I wonder if "manual" batching is the best answer. Alexander, have you
> considered using libpq level pipelining?

I'd be a bit nervous about how well that works with older servers.

        regards, tom lane
Hi,

On 2022-12-07 12:28:03 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2022-12-07 10:44:33 -0500, Tom Lane wrote:
> >> I have a strong sense of deja vu here. I'm pretty sure I experimented
> >> with this idea last year and gave up on it. I don't recall exactly
> >> why, but either it didn't show any meaningful performance improvement
> >> for me or there was some actual downside (that I'm not remembering
> >> right now).
>
> > IIRC the case we were looking at around 989596152 were CPU bound workloads,
> > rather than latency bound workloads. It'd not be surprising to have cases
> > where batching LOCKs helps latency, but not CPU bound.
>
> Yeah, perhaps. Anyway my main point is that I don't want to just assume
> this is a win; I want to see some actual performance tests.

FWIW, one can simulate network latency with 'netem' on linux. Works even
for 'lo'.

ping -c 3 -n localhost
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from ::1: icmp_seq=2 ttl=64 time=0.049 ms
64 bytes from ::1: icmp_seq=3 ttl=64 time=0.043 ms

tc qdisc add dev lo root netem delay 10ms
64 bytes from ::1: icmp_seq=1 ttl=64 time=20.1 ms
64 bytes from ::1: icmp_seq=2 ttl=64 time=20.2 ms
64 bytes from ::1: icmp_seq=3 ttl=64 time=20.2 ms

tc qdisc delete dev lo root netem
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.036 ms
64 bytes from ::1: icmp_seq=2 ttl=64 time=0.047 ms
64 bytes from ::1: icmp_seq=3 ttl=64 time=0.050 ms

> > I wonder if "manual" batching is the best answer. Alexander, have you
> > considered using libpq level pipelining?
>
> I'd be a bit nervous about how well that works with older servers.

I don't think there should be any problem - E.g. pgjdbc has been using
pipelining for ages.

Not sure if it's the right answer, just to be clear. I suspect that
eventually we're going to need to have a special "acquire pg_dump locks"
function that is cheaper than retail lock acquisition and perhaps deals
more gracefully with exceeding max_locks_per_transaction. Which would
presumably not be pipelined.

Greetings,

Andres Freund
On Wed, Dec 7, 2022 at 2:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andres Freund <andres@anarazel.de> writes:
> > On 2022-12-07 10:44:33 -0500, Tom Lane wrote:
> >> I have a strong sense of deja vu here. I'm pretty sure I experimented
> >> with this idea last year and gave up on it. I don't recall exactly
> >> why, but either it didn't show any meaningful performance improvement
> >> for me or there was some actual downside (that I'm not remembering
> >> right now).
>
> > IIRC the case we were looking at around 989596152 were CPU bound workloads,
> > rather than latency bound workloads. It'd not be surprising to have cases
> > where batching LOCKs helps latency, but not CPU bound.
>
> Yeah, perhaps. Anyway my main point is that I don't want to just assume
> this is a win; I want to see some actual performance tests.
>
Here we have some numbers about the Aleksander's patch:
1) Setup script
CREATE DATABASE t1000;
CREATE DATABASE t10000;
CREATE DATABASE t100000;
\c t1000
SELECT format('CREATE TABLE t%s(c1 INTEGER PRIMARY KEY, c2 TEXT, c3 TIMESTAMPTZ);', i) FROM generate_series(1, 1000) AS i \gexec
\c t10000
SELECT format('CREATE TABLE t%s(c1 INTEGER PRIMARY KEY, c2 TEXT, c3 TIMESTAMPTZ);', i) FROM generate_series(1, 10000) AS i \gexec
\c t100000
SELECT format('CREATE TABLE t%s(c1 INTEGER PRIMARY KEY, c2 TEXT, c3 TIMESTAMPTZ);', i) FROM generate_series(1, 100000) AS i \gexec
2) Execution script
time pg_dump -s t1000 > /dev/null
time pg_dump -s t10000 > /dev/null
time pg_dump -s t100000 > /dev/null
3) HEAD execution
$ time pg_dump -s t1000 > /dev/null
0.02user 0.01system 0:00.36elapsed 8%CPU (0avgtext+0avgdata 11680maxresident)k
0inputs+0outputs (0major+1883minor)pagefaults 0swaps
$ time pg_dump -s t10000 > /dev/null
0.30user 0.10system 0:05.04elapsed 8%CPU (0avgtext+0avgdata 57772maxresident)k
0inputs+0outputs (0major+14042minor)pagefaults 0swaps
$ time pg_dump -s t100000 > /dev/null
3.42user 2.13system 7:50.09elapsed 1%CPU (0avgtext+0avgdata 517900maxresident)k
0inputs+0outputs (0major+134636minor)pagefaults 0swaps
4) PATCH execution
$ time pg_dump -s t1000 > /dev/null
0.02user 0.00system 0:00.28elapsed 9%CPU (0avgtext+0avgdata 11700maxresident)k
0inputs+0outputs (0major+1886minor)pagefaults 0swaps
$ time pg_dump -s t10000 > /dev/null
0.18user 0.03system 0:02.17elapsed 10%CPU (0avgtext+0avgdata 57592maxresident)k
0inputs+0outputs (0major+14072minor)pagefaults 0swaps
$ time pg_dump -s t100000 > /dev/null
1.97user 0.32system 0:21.39elapsed 10%CPU (0avgtext+0avgdata 517932maxresident)k
0inputs+0outputs (0major+134892minor)pagefaults 0swaps
5) Summary
              HEAD      patch
1k tables     0:00.36   0:00.28
10k tables    0:05.04   0:02.17
100k tables   7:50.09   0:21.39
It seems we get a very good performance gain with Aleksander's patch. I used "-s" so as not to waste time issuing a COPY for each relation (they are all empty anyway) and to make the round-trip cost of LOCK TABLE visible. This patch will also improve pg_upgrade on databases with thousands of relations.
Regards,
--
Fabrízio de Royes Mello
Hi,

On 2022-12-07 18:14:01 -0300, Fabrízio de Royes Mello wrote:
> Here we have some numbers about the Aleksander's patch:

Hm. Were they taken in an assertion enabled build or such? Just testing
the t10000 case on HEAD, I get 0:01.23 elapsed for an unpatched pg_dump
in an optimized build. And that's on a machine with not all that fast
cores.

Greetings,

Andres Freund
Hi,

On 2022-12-07 13:32:42 -0800, Andres Freund wrote:
> On 2022-12-07 18:14:01 -0300, Fabrízio de Royes Mello wrote:
> > Here we have some numbers about the Aleksander's patch:
>
> Hm. Were they taken in an assertion enabled build or such? Just testing the
> t10000 case on HEAD, I get 0:01.23 elapsed for an unpatched pg_dump in an
> optimized build. And that's on a machine with not all that fast cores.

Comparing the overhead on the server side.

CREATE OR REPLACE FUNCTION exec(v_sql text) RETURNS text LANGUAGE plpgsql AS
$$BEGIN EXECUTE v_sql;RETURN v_sql;END;$$;

Acquire locks in separate statements, three times:
SELECT exec(string_agg(format('LOCK t%s;', i), '')) FROM generate_series(1, 10000) AS i;
1267.909 ms
1116.008 ms
1108.383 ms

Acquire all locks in a single statement, three times:
SELECT exec('LOCK '||string_agg(format('t%s', i), ', ')) FROM generate_series(1, 10000) AS i;
1210.732 ms
1101.390 ms
1105.331 ms

So there's some performance difference that's independent of the avoided
roundtrips - but it's pretty small.

With an artificial delay of 100ms, the perf difference between the
batching patch and not using the batching patch is huge. Huge enough
that I don't have the patience to wait for the non-batched case to
complete. With batching pg_dump -s -h localhost t10000 took 0:16.23
elapsed, without I cancelled after 603 tables had been locked, which
took 2:06.43.

This made me try out using pipeline mode. Seems to work nicely from what
I can tell. The timings are a tad slower than the "manual" batch mode -
I think that's largely due to the extended protocol just being
overcomplicated.

Greetings,

Andres Freund
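For comparison with the manual-batching sketch near the top of the thread, here is a rough sketch of what a pipeline-mode variant could look like, assuming a PostgreSQL 14+ libpq; the function name and the minimal error handling are illustrative assumptions, not the code that was actually benchmarked. Table names are again assumed to be pre-quoted and a transaction is assumed to be open.

/*
 * Sketch only: queue one LOCK TABLE per table using libpq pipeline mode
 * (PostgreSQL 14+), so the statements share round trips instead of each
 * waiting for its own reply.
 */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

static void
lock_tables_pipelined(PGconn *conn, const char *const *tables, int ntables)
{
    if (PQenterPipelineMode(conn) != 1)
    {
        fprintf(stderr, "could not enter pipeline mode: %s", PQerrorMessage(conn));
        exit(1);
    }

    /* queue all LOCK statements; nothing waits on the server yet */
    for (int i = 0; i < ntables; i++)
    {
        char        query[1024];

        snprintf(query, sizeof(query),
                 "LOCK TABLE %s IN ACCESS SHARE MODE", tables[i]);
        if (PQsendQueryParams(conn, query, 0, NULL, NULL, NULL, NULL, 0) != 1)
        {
            fprintf(stderr, "could not dispatch LOCK: %s", PQerrorMessage(conn));
            exit(1);
        }
    }
    if (PQpipelineSync(conn) != 1)
    {
        fprintf(stderr, "could not send sync: %s", PQerrorMessage(conn));
        exit(1);
    }

    /* drain results up to the sync point; each LOCK should report COMMAND_OK */
    for (;;)
    {
        PGresult   *res = PQgetResult(conn);

        if (res == NULL)
            continue;               /* separator between queries' results */
        if (PQresultStatus(res) == PGRES_PIPELINE_SYNC)
        {
            PQclear(res);
            break;                  /* all queued statements processed */
        }
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "LOCK TABLE failed: %s", PQerrorMessage(conn));
            exit(1);
        }
        PQclear(res);
    }
    PQexitPipelineMode(conn);
}

As noted in the message above, the pipeline-mode timings Andres saw were a tad slower than manual batching, which he attributed to extended-protocol overhead: pipelining keeps one statement per table while collapsing the round trips, whereas batching collapses both.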
Andres Freund <andres@anarazel.de> writes:
> With an artificial delay of 100ms, the perf difference between the batching
> patch and not using the batching patch is huge. Huge enough that I don't have
> the patience to wait for the non-batched case to complete.

Clearly, if you insert a sufficiently large artificial round-trip delay,
even squeezing a single command out of a pg_dump run will appear
worthwhile. What I'm unsure about is whether it's worthwhile at
realistic round-trip delays (where "realistic" means that the dump
performance would otherwise be acceptable). I think the reason I didn't
pursue this last year is that experimentation convinced me the answer
was "no".

> With batching pg_dump -s -h localhost t10000 took 0:16.23 elapsed, without I
> cancelled after 603 tables had been locked, which took 2:06.43.

Is "-s" mode actually a relevant criterion here? With per-table COPY
commands added into the mix you could not possibly get better than 2x
improvement, and likely a good deal less.

        regards, tom lane
On Wed, Dec 7, 2022 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Is "-s" mode actually a relevant criterion here? With per-table COPY
> commands added into the mix you could not possibly get better than 2x
> improvement, and likely a good deal less.

Don't we hit this code path in pg_upgrade? You won't see huge round-trip
times, of course, but you also won't see COPY.

Absolute performance aside, isn't there another concern that, the longer
it takes for us to lock the tables, the bigger the window there is for
someone to interfere with them between our catalog query and the lock
itself?

--Jacob
Hi,

On 2022-12-07 17:53:05 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > With an artificial delay of 100ms, the perf difference between the batching
> > patch and not using the batching patch is huge. Huge enough that I don't have
> > the patience to wait for the non-batched case to complete.
>
> Clearly, if you insert a sufficiently large artificial round-trip delay,
> even squeezing a single command out of a pg_dump run will appear
> worthwhile. What I'm unsure about is whether it's worthwhile at
> realistic round-trip delays (where "realistic" means that the dump
> performance would otherwise be acceptable). I think the reason I didn't
> pursue this last year is that experimentation convinced me the answer
> was "no".

It seems to be a win even without any artificial delay. Not a *huge* win,
but a noticeable win. And even just a few ms make it quite painful.

> > With batching pg_dump -s -h localhost t10000 took 0:16.23 elapsed, without I
> > cancelled after 603 tables had been locked, which took 2:06.43.
>
> Is "-s" mode actually a relevant criterion here? With per-table COPY
> commands added into the mix you could not possibly get better than 2x
> improvement, and likely a good deal less.

Well, -s isn't something used all that rarely, so it'd not be insane to
optimize it in isolation. But more importantly, I think the potential win
without -s is far bigger than 2x, because the COPYs can be done in
parallel, whereas the locking happens in the non-parallel stage.

With just a 5ms delay, very well within normal network latency range, I get:

pg_dump.master -h localhost -j10 -f /tmp/pg_dump_backup -Fd t10000
2m7.830s

pg_dump.pipeline -h localhost -j10 -f /tmp/pg_dump_backup -Fd t10000
0m24.183s

pg_dump.batch -h localhost -j10 -f /tmp/pg_dump_backup -Fd t10000
0m24.321s

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2022-12-07 17:53:05 -0500, Tom Lane wrote:
>> Is "-s" mode actually a relevant criterion here? With per-table COPY
>> commands added into the mix you could not possibly get better than 2x
>> improvement, and likely a good deal less.

> Well, -s isn't something used all that rarely, so it'd not be insane to
> optimize it in isolation. But more importantly, I think the potential win
> without -s is far bigger than 2x, because the COPYs can be done in parallel,
> whereas the locking happens in the non-parallel stage.

True, and there's the reduce-the-lock-window issue that Jacob mentioned.

> With just a 5ms delay, very well within normal network latency range, I get:
> [ a nice win ]

OK. I'm struggling to figure out why I rejected this idea last year.
I know that I thought about it and I'm fairly certain I actually
tested it. Maybe I only tried it with near-zero-latency local
loopback; but I doubt that, because the potential for network
latency was certainly a factor in that whole discussion.

One idea is that I might've tried it before getting rid of all the
other per-object queries, at which point it wouldn't have stood out
quite so much. But I'm just guessing. I have a nagging feeling
there was something else.

Oh well, I guess we can always revert it if we discover a problem later.

        regards, tom lane
On 08/12/2022 01:03, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2022-12-07 17:53:05 -0500, Tom Lane wrote:
>>> Is "-s" mode actually a relevant criterion here? With per-table COPY
>>> commands added into the mix you could not possibly get better than 2x
>>> improvement, and likely a good deal less.
>> Well, -s isn't something used all that rarely, so it'd not be insane to
>> optimize it in isolation. But more importantly, I think the potential win
>> without -s is far bigger than 2x, because the COPYs can be done in parallel,
>> whereas the locking happens in the non-parallel stage.
> True, and there's the reduce-the-lock-window issue that Jacob mentioned.
>
>> With just a 5ms delay, very well within normal network latency range, I get:
>> [ a nice win ]
> OK. I'm struggling to figure out why I rejected this idea last year.
> I know that I thought about it and I'm fairly certain I actually
> tested it. Maybe I only tried it with near-zero-latency local
> loopback; but I doubt that, because the potential for network
> latency was certainly a factor in that whole discussion.
>
> One idea is that I might've tried it before getting rid of all the
> other per-object queries, at which point it wouldn't have stood out
> quite so much. But I'm just guessing. I have a nagging feeling
> there was something else.
>
> Oh well, I guess we can always revert it if we discover a problem later.
>
>         regards, tom lane

Hi,

I have done a review of this patch; it applies well on current master
and compiles without problem. make check/installcheck and check-world
run without failure, and the pg_dump tests with TAP enabled work fine
too.

I have also given a try to the benchmark mentioned in the previous posts
and I see the same performance gains with the -s option.

As it seems to have a consensus to apply this patch I will change the
commitfest status to ready for committers.

Regards,

--
Gilles Darold
Gilles Darold <gilles@migops.com> writes:
> As it seems to have a consensus to apply this patch I will change the
> commitfest status to ready for committers.

Yeah. I still have a nagging worry about why I didn't do this already,
but without evidence I can't fairly block the patch. Hence, pushed.

I did cut the LOCK TABLE query length limit from 1MB to 100K, because
I doubt there is any measurable performance difference, and I'm a little
worried about overstressing smaller servers.

        regards, tom lane