Thread: PATCH: pg_dump to support "on conflict do update"
Hi hackers,
Here's the patch (against the latest master) that will make pg_dump support "on conflict do update".
I've used this patch on v16 for our company's CI (on GitHub Actions), and it works perfectly fine.
Users would be able to use it like this:
./src/bin/pg_dump/pg_dump $DATABASE_URL \
--table=some_random_table \
--data-only \
--on-conflict-target-columns url,payload_checksum \
--on-conflict-update-clause='last_used_at=EXCLUDED.last_used_at' \
--inserts \
--rows-per-insert=10 \
--no-sync \
--file=/tmp/test.dump
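To make the two new options concrete, here is a minimal Python sketch of how their values could combine into an ON CONFLICT clause appended to each INSERT. This illustrates the idea only and is not the patch's code: the helper names `quote_ident` and `on_conflict_clause` are hypothetical, and the exact text the patch emits (for example, whether identifiers are quoted) is an assumption.

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def on_conflict_clause(target_columns: str, update_clause: str) -> str:
    """Combine the two option values into one clause (hypothetical helper).

    target_columns: comma-separated names, as passed to
                    --on-conflict-target-columns in the example above.
    update_clause:  the SET list, as passed to --on-conflict-update-clause;
                    it is used verbatim, with no validation.
    """
    columns = [c.strip() for c in target_columns.split(",") if c.strip()]
    if not columns:
        raise ValueError("at least one conflict target column is required")
    targets = ", ".join(quote_ident(c) for c in columns)
    return f"ON CONFLICT ({targets}) DO UPDATE SET {update_clause}"


print(on_conflict_clause("url,payload_checksum",
                         "last_used_at=EXCLUDED.last_used_at"))
# prints: ON CONFLICT ("url", "payload_checksum") DO UPDATE SET last_used_at=EXCLUDED.last_used_at
```

Splitting on commas breaks for column names that themselves contain a comma, which is one reason a real list-valued option would be more robust.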
There are 3 caveats:
1. The "on conflict do update" would apply to every table. In my opinion, this is fine. It's the user's choice if they want to apply it to one or all tables. We could make the options more powerful (i.e. support multiple tables) but it would add a lot of complexity.
2. --on-conflict-target-columns should accept a list of strings instead of a single string. I'm working on it but I'd like an early review of the overall patch first.
3. I can't figure out how to add a test for pg_dump. Any pointers would be appreciated here.
Please help me review this patch as it's my first time submitting a patch to Postgres.
Thank you!
Tanin
On Sat, 2025-05-03 at 22:47 -0700, Tanin Na Nakorn wrote:
> Here's the patch (against the latest master) that will make pg_dump support "on conflict do update" .
>
> I've used this patch on v16 for our company's CI (on Github Actions), and it works perfectly fine.
>
> Users would be able to use it like this:
>
> ./src/bin/pg_dump/pg_dump $DATABASE_URL \
> --table=some_random_table \
> --data-only \
> --on-conflict-target-columns url,payload_checksum \
> --on-conflict-update-clause='last_used_at=EXCLUDED.last_used_at' \
> --inserts \
> --rows-per-insert=10 \
> --no-sync \
> --file=/tmp/test.dump
>
> There are 3 caveats:
>
> 1. The "on conflict do update" would apply to every table. In my opinion, this is fine.

I don't think that is fine. I think it would make the feature unusable for most cases.
At the very least, there would have to be a way to specify which tables are affected.

Yours,
Laurenz Albe
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> On Sat, 2025-05-03 at 22:47 -0700, Tanin Na Nakorn wrote:
>> Here's the patch (against the latest master) that will make pg_dump support "on conflict do update" .
>>
>> There are 3 caveats:
>>
>> 1. The "on conflict do update" would apply to every table. In my opinion, this is fine.
> I don't think that is fine. I think it would make the feature unusable for most cases.
> At the very least, there would have to be a way to specify which tables are affected.

Yeah. I kind of feel that this entire idea is misguided. pg_dump is not an ETL tool, and bolting ETL-ish features onto it one at a time seems destined to end in a mess.

But it's particularly awful that the proposed switch design would apply to all tables. That pretty much makes it useless except in a dump that selects only one table. It's also useless except in a --data-only dump, since if we create the target table then we know perfectly well that it's empty to start with. So at this point you barely need pg_dump at all, as opposed to some other tool that does a light syntactic transformation on the result of COPY.

I think it could be interesting to try to build something that *is* an ETL tool and is meant for cases like partial data loads. But pg_dump is serving more than enough masters already. Let's not add this to its plate.

regards, tom lane
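As a rough illustration of the "light syntactic transformation on the result of COPY" mentioned above, the following Python sketch turns COPY text-format lines into INSERT ... ON CONFLICT statements. Nothing in it comes from the thread: the function names are hypothetical, and it assumes tab-delimited COPY text output with `\N` for NULL, the default `standard_conforming_strings = on`, and table and column names that the caller has already quoted as needed. Octal and hex backslash escapes are not decoded, on the understanding that COPY TO does not emit them.

```python
from typing import Iterable, Iterator, List, Optional

# Single-letter backslash escapes used by the COPY text format.
_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t", "v": "\v"}


def unescape_copy_field(field: str) -> Optional[str]:
    """Decode one COPY text-format field; the marker \\N means NULL."""
    if field == r"\N":
        return None
    out: List[str] = []
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field):
            nxt = field[i + 1]
            # Unknown escapes (including \\) stand for the character itself.
            out.append(_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def sql_literal(value: Optional[str]) -> str:
    """Render a value as an untyped SQL string literal, or NULL."""
    if value is None:
        return "NULL"
    return "'" + value.replace("'", "''") + "'"


def copy_to_upserts(lines: Iterable[str], table: str, columns: List[str],
                    conflict_clause: str) -> Iterator[str]:
    """Yield one INSERT per COPY data line, with conflict_clause appended.

    table, columns, and conflict_clause are inserted verbatim.
    """
    column_list = ", ".join(columns)
    for line in lines:
        line = line.rstrip("\n")
        if line == r"\.":  # end-of-data marker
            break
        # Tabs inside data are escaped as \t, so splitting on raw tabs is safe.
        fields = [unescape_copy_field(f) for f in line.split("\t")]
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
        values = ", ".join(sql_literal(f) for f in fields)
        yield (f"INSERT INTO {table} ({column_list}) "
               f"VALUES ({values}) {conflict_clause};")


if __name__ == "__main__":
    sample = [
        "https://example.com/a\tabc123\t2025-05-03 22:47:00\n",
        "https://example.com/it's\tdef456\t\\N\n",
    ]
    clause = ("ON CONFLICT (url, payload_checksum) "
              "DO UPDATE SET last_used_at=EXCLUDED.last_used_at")
    for stmt in copy_to_upserts(sample, "some_random_table",
                                ["url", "payload_checksum", "last_used_at"],
                                clause):
        print(stmt)
```

Emitting every value as a quoted literal relies on PostgreSQL coercing untyped literals to the target column types in INSERT ... VALUES; a real tool would also want to batch rows and stream input rather than build one statement per line.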