Re: [HACKERS] Replication/backup defaults - Mailing list pgsql-hackers
From:           Tomas Vondra
Subject:        Re: [HACKERS] Replication/backup defaults
Msg-id:         c2d7db7e-7e05-51cf-206d-07a8cafe1c98@2ndquadrant.com
In response to: [HACKERS] Replication/backup defaults (Magnus Hagander <magnus@hagander.net>)
Responses:      Re: [HACKERS] Replication/backup defaults
List:           pgsql-hackers
On 01/05/2017 05:37 PM, Stephen Frost wrote:
> Tomas,
>
> * Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
>> On 01/05/2017 02:23 PM, Magnus Hagander wrote:
>>> It's easy enough to construct a benchmark specifically to show the
>>> difference, but not for any actual "normal workload". Typically the
>>> optimization applies to things like bulk loading, which is typically
>>> never done alone and does not lend itself to that type of
>>> benchmarking very easily.
>>
>> Not sure if I understand correctly what you're saying. You're saying
>> that although it'd be easy to construct a benchmark showing a
>> significant performance impact, it won't represent a common
>> workload. Correct?
>
> I think he's saying that it's not very easy to construct a good example
> of typical bulk-loading workloads using just pgbench. Bulk loading
> certainly happens with PG, and I don't think we'll make very many
> friends if we break the optimizations available when wal_level is set
> to minimal, like those you get using:
>
> BEGIN;
> CREATE TABLE x (c1 int);
> COPY x FROM STDIN;
> COMMIT;
>
> or:
>
> BEGIN;
> TRUNCATE x;
> COPY x FROM STDIN;
> COMMIT;
>
> Changing the wal_level from 'minimal' to 'replica' or 'logical' with
> such a benchmark is going to make the WAL go from next-to-nothing to
> size-of-database.

Sure, I do know how to construct such workloads - it's trivial even
with pgbench custom scripts (see the PS for a sketch). The question is
whether such workloads are common or not. Most importantly, no one is
proposing to break the optimizations, only to change the defaults -
users relying on the optimizations are free to switch back to
wal_level=minimal if needed.

> One doesn't typically *just* do bulk loads, however;
> often it's a bulk load into a table and then the contents of that
> table are merged with another table or perhaps joined to it to produce
> some report or something along those lines. In many of those cases,
> our more-recently added capability to have UNLOGGED tables will work,
> but not all (in particular, it can be very handy to load everything in
> using the above technique and then switch the wal_level to replica,
> which avoids having to have the bulk of the data sent through WAL,
> something you can't avoid if you want to turn an unlogged table into a
> logged one).

Ultimately, the question is whether the number of people running into
"Hey, I can't take pg_basebackup or set up a standby with the default
config!" is higher or lower than the number of people running into
"Hey, CREATE TABLE + COPY is slower now!"

I haven't seen many systems relying on such load optimizations, for a
number of reasons:

1) The important/critical systems usually have replicas, so they are
inherently incompatible with wal_level=minimal.

2) The batch jobs usually don't truncate the main table, but load the
increment into a temporary/unlogged table first, then merge it into the
main one (sketched in the PS).

That is not to say there are no other cases benefiting from those
optimizations, but we're talking about the default value - we're not
removing wal_level=minimal.

regards

--
Tomas Vondra                      http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
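PS: A pgbench custom script for such a workload could look roughly like
this (a sketch only - the table name, row count, and the use of CREATE
TABLE AS are illustrative choices on my part; CTAS takes the same
WAL-skipping path as COPY into a freshly created table under
wal_level=minimal, and unlike COPY FROM STDIN it can run from a pgbench
script file):

    -- bulkload.sql, run e.g. as: pgbench -n -c 1 -t 10 -f bulkload.sql
    \set rows 1000000
    BEGIN;
    DROP TABLE IF EXISTS bulk_target;
    -- under wal_level=minimal the rows below bypass WAL, because the
    -- table is created in the same transaction; under wal_level=replica
    -- the whole table has to be written to WAL
    CREATE TABLE bulk_target AS
        SELECT i AS id, md5(i::text) AS payload
        FROM generate_series(1, :rows) AS i;
    COMMIT;

Comparing pg_current_xlog_location() before and after a run under each
setting shows the difference in WAL volume.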
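To illustrate point 2), the usual shape of such a batch job is roughly
this (a sketch - the table and column names and the merge condition are
made up):

    BEGIN;
    -- temporary tables are never WAL-logged, regardless of wal_level
    CREATE TEMP TABLE staging (LIKE main_table INCLUDING DEFAULTS);
    COPY staging FROM '/path/to/increment.csv' (FORMAT csv);
    -- merge the increment, skipping rows already present
    INSERT INTO main_table
    SELECT s.* FROM staging s
    WHERE NOT EXISTS (SELECT 1 FROM main_table m WHERE m.id = s.id);
    COMMIT;

The COPY writes no WAL whatever the wal_level, and the final INSERT is
WAL-logged either way, so this pattern is insensitive to the proposed
default change.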
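And for completeness, the configuration change behind the "can't take
pg_basebackup" complaint - with the current defaults one has to run
something like

    ALTER SYSTEM SET wal_level = 'replica';
    ALTER SYSTEM SET max_wal_senders = 10;

restart the server, and allow replication connections in pg_hba.conf
before a base backup or standby is possible (the value 10 is just an
example). The proposal is about making this work out of the box.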