Re: WAL insert delay settings - Mailing list pgsql-hackers
From | Stephen Frost |
---|---|
Subject | Re: WAL insert delay settings |
Date | |
Msg-id | 20190220234609.GC6197@tamriel.snowman.net |
In response to | Re: WAL insert delay settings (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses | Re: WAL insert delay settings |
List | pgsql-hackers |
Greetings,

* Tomas Vondra (tomas.vondra@2ndquadrant.com) wrote:
> On 2/20/19 10:43 PM, Stephen Frost wrote:
> > Just to share a few additional thoughts after pondering this for a
> > while, but the comment Andres made up-thread really struck a chord- we
> > don't necessarily want to throttle anything, what we'd really rather do
> > is *prioritize* things, whereby foreground work (regular queries and
> > such) has a higher priority than background/bulk work (VACUUM, REINDEX,
> > etc) but otherwise we use the system to its full capacity. We don't
> > actually want to throttle a VACUUM run any more than a CREATE INDEX, we
> > just don't want those to hurt the performance of regular queries that
> > are happening.
>
> I think you're forgetting the motivation of this very patch was to
> prevent replication lag caused by a command generating large amounts of
> WAL (like CREATE INDEX / ALTER TABLE etc.). That has almost nothing to
> do with prioritization or foreground/background split.
>
> I'm not arguing against ability to prioritize stuff, but I disagree it
> somehow replaces throttling.

Why is replication lag an issue though? I would contend it's an issue
because with sync replication, it makes foreground processes wait, and
with async replication, it makes the actions of foreground processes
show up late on the replicas.

If the actual WAL records for the foreground processes got priority and
were pushed out earlier than the background ones, that would eliminate
both of those issues with replication lag. Perhaps there are other
issues that replication lag causes which aren't solved by prioritizing
the actual WAL records that you care about getting to the replicas
faster, but if so, I'd like to hear what those are.

> > The other thought I had was that doing things on a per-table basis, in
> > particular, isn't really addressing the resource question appropriately.
> > WAL is relatively straightforward and independent as a resource from
> > the IO for the heap/indexes, so getting an idea from the admin of how
> > much capacity they have for WAL makes sense. When it comes to the
> > capacity for the heap/indexes, in terms of IO, that really goes to the
> > underlying storage system/channel, which would actually be a tablespace
> > in properly set up environments (imv anyway).
> >
> > Wrapping this up- it seems unlikely that we're going to get a
> > priority-based system in place any time particularly soon but I do think
> > it's worthy of some serious consideration and discussion about how we
> > might be able to get there. On the other hand, if we can provide a way
> > for the admin to say "these are my IO channels (per-tablespace values,
> > plus a value for WAL), here's what their capacity is, and here's how
> > much buffer for foreground work I want to have (again, per IO channel),
> > so, PG, please arrange to not use more than 'capacity-buffer' amount of
> > resources for background/bulk tasks (per IO channel)" then we can at
> > least help them address the issue that foreground tasks are being
> > stalled or delayed due to background/bulk work. This approach means
> > that they won't be utilizing the system to its full capacity, but
> > they'll know that and they'll know that it's because, for them, it's
> > more important that they have that low latency for foreground tasks.
>
> I think it's mostly orthogonal feature to throttling.

I'm... not sure that what I was getting at above really got across.

What I was saying above, in a nutshell, is that if we're going to
provide throttling then we should give users a way to configure the
throttling on a per-IO-channel basis, which means at the tablespace
level, plus an independent configuration option for WAL since we allow
that to be placed elsewhere too.
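
As a rough illustration of that per-IO-channel idea, here is a minimal Python sketch of the 'capacity-buffer' arithmetic. Nothing here exists in PostgreSQL: the `IOChannel` class, its `capacity` and `foreground_buffer` fields, the tablespace name `archive_ts`, and all of the numbers are assumptions made up for the example, and the unit is deliberately left abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IOChannel:
    """One independently limited IO channel (hypothetical, not a PostgreSQL structure)."""

    name: str                 # a tablespace name, or "wal" for the WAL location
    capacity: float           # what the admin says the channel can sustain
    foreground_buffer: float  # headroom the admin reserves for regular queries

    def background_budget(self) -> float:
        """Return 'capacity - buffer': the most that bulk work may consume."""
        # Clamp at zero so an oversized buffer never yields a negative budget.
        return max(self.capacity - self.foreground_buffer, 0.0)


# Illustrative values only, in arbitrary units: one entry per tablespace,
# plus a separate entry for WAL because it can live on its own storage.
channels = [
    IOChannel("pg_default", capacity=1000.0, foreground_buffer=300.0),
    IOChannel("archive_ts", capacity=400.0, foreground_buffer=50.0),
    IOChannel("wal", capacity=200.0, foreground_buffer=120.0),
]

# Per-channel limit that background/bulk tasks would be held to.
budgets = {ch.name: ch.background_budget() for ch in channels}
```

The point of the sketch is only that the limit is derived per channel from two admin-supplied values, rather than being set per table or per command.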

Ideally, the configuration parameter would be in the same units as the
actual resource is too- which would probably be IOPS+bandwidth, really.
Just doing it in terms of bandwidth ends up being a bit of a mismatch as
compared to reality, and would mean that users would have to tune it
down farther than they might otherwise and therefore give up that much
more in terms of system capability.

Thanks!

Stephen