Thread: PATCH: pg_dump to support "on conflict do update"

PATCH: pg_dump to support "on conflict do update"

From: Tanin Na Nakorn
Date: Sat, 2025-05-03 22:47 -0700

Hi hackers,

Here's the patch (against the latest master) that will make pg_dump support "on conflict do update".

I've used this patch on v16 for our company's CI (on GitHub Actions), and it works perfectly fine.

Users would be able to use it like this:

./src/bin/pg_dump/pg_dump $DATABASE_URL \
            --table=some_random_table \
            --data-only \
            --on-conflict-target-columns url,payload_checksum \
            --on-conflict-update-clause='last_used_at=EXCLUDED.last_used_at' \
            --inserts \
            --rows-per-insert=10 \
            --no-sync \
            --file=/tmp/test.dump
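
For illustration only: the exact text the patch emits is not shown in this message, but given the option names and the standard INSERT ... ON CONFLICT syntax, each batch of rows would presumably end up as a statement of roughly the following shape. The Python sketch below just builds such a statement; the helper names (sql_literal, build_upsert) and the sample row are hypothetical, and the literal quoting assumes standard_conforming_strings is on.

```python
def sql_literal(value):
    """Render a Python value as a simple SQL literal (NULL, integer, or quoted text)."""
    if value is None:
        return "NULL"
    if isinstance(value, int):
        return str(value)
    # Double any embedded single quotes, as SQL string literals require.
    return "'" + str(value).replace("'", "''") + "'"


def build_upsert(table, rows, target_columns, update_clause):
    """Build one multi-row INSERT that ends in ON CONFLICT ... DO UPDATE.

    target_columns and update_clause are passed through verbatim, mirroring
    the two proposed command-line options.
    """
    values = ",\n".join(
        "\t(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows
    )
    return (
        f"INSERT INTO {table} VALUES\n{values}\n"
        f"ON CONFLICT ({target_columns}) DO UPDATE SET {update_clause};"
    )


if __name__ == "__main__":
    print(build_upsert(
        "public.some_random_table",
        [("https://example.com/a", "abc123", "2025-05-03 22:47:00")],
        "url,payload_checksum",
        "last_used_at=EXCLUDED.last_used_at",
    ))
```

With --rows-per-insert=10, up to ten rows would share one such statement, and the conflict clause would be repeated once per statement.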

There are 3 caveats:

1. The "on conflict do update" would apply to every table. In my opinion, this is fine. It's the user's choice whether to apply it to one table or to all of them. We could make the options more powerful (for example, by supporting different settings for multiple tables), but that would add a lot of complexity.

2. --on-conflict-target-columns should accept a list of strings rather than a single comma-separated string. I'm working on it, but I'd like an early review of the overall patch first.

3. I can't figure out how to add a test for pg_dump. Any pointers would be appreciated here.

Please help me review this patch as it's my first time submitting a patch to Postgres.

Thank you!
Tanin

Re: PATCH: pg_dump to support "on conflict do update"

From: Laurenz Albe

On Sat, 2025-05-03 at 22:47 -0700, Tanin Na Nakorn wrote:
> Here's the patch (against the latest master) that will make pg_dump support "on conflict do update".
>
> I've used this patch on v16 for our company's CI (on GitHub Actions), and it works perfectly fine.
>
> Users would be able to use it like this:
>
> ./src/bin/pg_dump/pg_dump $DATABASE_URL \
>             --table=some_random_table \
>             --data-only \
>             --on-conflict-target-columns url,payload_checksum \
>             --on-conflict-update-clause='last_used_at=EXCLUDED.last_used_at' \
>             --inserts \
>             --rows-per-insert=10 \
>             --no-sync \
>             --file=/tmp/test.dump
>
> There are 3 caveats:
>
> 1. The "on conflict do update" would apply to every table. In my opinion, this is fine.

I don't think that is fine.  I think it would make the feature unusable for most cases.
At the very least, there would have to be a way to specify which tables are affected.

Yours,
Laurenz Albe



Re: PATCH: pg_dump to support "on conflict do update"

From: Tom Lane

Laurenz Albe <laurenz.albe@cybertec.at> writes:
> On Sat, 2025-05-03 at 22:47 -0700, Tanin Na Nakorn wrote:
>> Here's the patch (against the latest master) that will make pg_dump support "on conflict do update".
>>
>> There are 3 caveats:
>>
>> 1. The "on conflict do update" would apply to every table. In my opinion, this is fine.

> I don't think that is fine.  I think it would make the feature unusable for most cases.
> At the very least, there would have to be a way to specify which tables are affected.

Yeah.  I kind of feel that this entire idea is misguided.  pg_dump is
not an ETL tool, and bolting ETL-ish features onto it one at a time
seems destined to end in a mess.  But it's particularly awful that
the proposed switch design would apply to all tables.  That pretty
much makes it useless except in a dump that selects only one table.
It's also useless except in a --data-only dump, since if we create
the target table then we know perfectly well that it's empty to
start with.  So at this point you barely need pg_dump at all,
as opposed to some other tool that does a light syntactic
transformation on the result of COPY.
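
As one possible illustration of such a light transformation (a sketch under stated assumptions, not something proposed in this thread): a small filter can rewrite each COPY block of a --data-only dump so that the rows are loaded into a temporary staging table and then merged with INSERT ... SELECT ... ON CONFLICT DO UPDATE. The staging-table approach, the function name upsert_via_staging, and the upsert_staging_N table names are all hypothetical choices made here. The sketch assumes the usual "COPY table (columns) FROM stdin;" header line followed by data lines and a terminating "\." line, and it assumes table names without embedded spaces.

```python
import re

# Matches the header line that precedes table data in a plain-text dump, e.g.
#   COPY public.t (a, b) FROM stdin;
# Assumption: the table name contains no whitespace.
COPY_HEADER = re.compile(r"^COPY (?P<table>\S+) \((?P<cols>.*)\) FROM stdin;$")


def upsert_via_staging(dump_lines, conflict_target, update_clause):
    """Yield a rewritten dump, line by line.

    Each COPY block is redirected into a temp table, which is then merged
    into the real table with INSERT ... ON CONFLICT ... DO UPDATE.
    All other lines pass through unchanged.
    """
    staging = None  # name of the temp table while inside a COPY data block
    count = 0
    for line in dump_lines:
        line = line.rstrip("\n")
        if staging is None:
            m = COPY_HEADER.match(line)
            if m is None:
                yield line
                continue
            count += 1
            table, cols = m["table"], m["cols"]
            staging = f"upsert_staging_{count}"
            yield f"CREATE TEMP TABLE {staging} (LIKE {table});"
            yield f"COPY {staging} ({cols}) FROM stdin;"
        elif line == "\\.":
            # End-of-data marker: close the COPY, then merge and clean up.
            yield line
            yield (
                f"INSERT INTO {table} ({cols}) SELECT {cols} FROM {staging} "
                f"ON CONFLICT ({conflict_target}) DO UPDATE SET {update_clause};"
            )
            yield f"DROP TABLE {staging};"
            staging = None
        else:
            yield line  # a data row; passed through verbatim
```

Because the data rows are never parsed, COPY's escaping rules are left untouched; the same conflict target and update clause are still applied to every table in the input, so the filter is only meaningful for a dump of a single table or of tables that share those columns.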

I think it could be interesting to try to build something that
*is* an ETL tool and is meant for cases like partial data loads.
But pg_dump is serving more than enough masters already.  Let's
not add this to its plate.

            regards, tom lane