Thread: Tracking replication slot "blockings"
I'm thinking it could be interesting to know how many times (or in some other useful unit than "times" - how often) a specific replication slot has "blocked" xlog rotation. Since this AFAIK only happens during checkpoints, it seems it should be "reasonably cheap" to track? It would serve as an indicator of which slave(s) are having enough trouble keeping up to potentially cause issues.
Not having looked at that code at all yet, would this be something that's simple to add?
Or is it a silly idea? :)
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
Hi,

On 2014-04-16 18:51:41 +0200, Magnus Hagander wrote:
> I'm thinking it could be interesting to know how many times (or in some
> other useful unit than "times" - how often) a specific replication slot has
> "blocked" xlog rotation. Since this AFAIK only happens during checkpoints,
> it seems it should be "reasonably cheap" to track? It would serve as an
> indicator of which slave(s) are having enough trouble keeping up to
> potentially cause issues.

The xlog removal code just checks the "global minimum" required LSN - it doesn't check the individual slots. So you'd need to add a bit more code to that location. But it'd be easy.

But I think I'd just monitor/graph the byte difference for all slots using pg_replication_slots...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
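To make the "byte difference" suggestion concrete, here is a minimal sketch of how a monitoring script might turn LSN values into per-slot lag. It assumes the LSNs arrive as text in the usual two-part hexadecimal form (high 32 bits, a slash, low 32 bits), for example the restart_lsn column of pg_replication_slots plus the server's current WAL position. The function names and the sample values are made up for illustration, and fetching the values from the server is left out.

```python
from typing import Dict


def parse_lsn(text: str) -> int:
    """Convert an LSN in 'hi/lo' hexadecimal text form to a byte position."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def slot_lag_bytes(current_lsn: str, slots: Dict[str, str]) -> Dict[str, int]:
    """Return, per slot, how many bytes of WAL lie between the slot's
    restart position and the current WAL position."""
    now = parse_lsn(current_lsn)
    return {name: now - parse_lsn(restart) for name, restart in slots.items()}


# Made-up sample values; a real script would query the server instead.
sample_slots = {"standby_a": "0/3000000", "standby_b": "0/1000000"}
lags = slot_lag_bytes("0/5000000", sample_slots)

# List the slots that hold back the most WAL first.
for name, lag in sorted(lags.items(), key=lambda item: -item[1]):
    print(f"{name}: {lag} bytes behind")
```

Graphing these numbers over time shows which slot is falling behind, but only from the moment the monitoring was set up.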
On Wed, Apr 16, 2014 at 6:56 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hi,
>
> On 2014-04-16 18:51:41 +0200, Magnus Hagander wrote:
> > I'm thinking it could be interesting to know how many times (or in some
> > other useful unit than "times" - how often) a specific replication slot has
> > "blocked" xlog rotation. Since this AFAIK only happens during checkpoints,
> > it seems it should be "reasonably cheap" to track? It would serve as an
> > indicator of which slave(s) are having enough trouble keeping up to
> > potentially cause issues.
>
> The xlog removal code just checks the "global minimum" required LSN - it
> doesn't check the individual slots. So you'd need to add a bit more code
> to that location. But it'd be easy.

Do we have statistics there somewhere - how often that global minimum blocks something? That on its own might be a start :)

> But I think I'd just monitor/graph the byte difference for all slots
> using pg_replication_slots...

Yeah, that would work when monitored continuously. I was more looking for the view of "hey, could this be what happened" into a system that did not previously have any monitoring installed and therefore no such history.
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On 2014-04-16 19:09:09 +0200, Magnus Hagander wrote:
> On Wed, Apr 16, 2014 at 6:56 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > The xlog removal code just checks the "global minimum" required LSN - it
> > doesn't check the individual slots. So you'd need to add a bit more code
> > to that location. But it'd be easy.
>
> Do we have statistics there somewhere - how often that global minimum
> blocks something? That on its own might be a start :)

Nope. Check xlog.c:KeepLogSeg(), it's pretty simple stuff ;). It's the same place where wal_keep_segments is enforced...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
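As a rough illustration of the idea discussed in this thread, the sketch below models the kind of decision described for KeepLogSeg(): take the newest segment, apply the wal_keep_segments limit, apply the global minimum LSN required by the slots, and count how often the slot minimum is what actually holds back removal. This is a simplified Python model, not the PostgreSQL source. The names KeepStats, keep_log_seg and held_by_slots are hypothetical, a 16 MB segment size is assumed, and the counter is global rather than per slot, since the thread notes that only the global minimum is checked at this point.

```python
from dataclasses import dataclass
from typing import Optional

SEGMENT_SIZE = 16 * 1024 * 1024  # assumed WAL segment size in bytes


@dataclass
class KeepStats:
    checks: int = 0         # how many times the limit was computed
    held_by_slots: int = 0  # how many times the slot minimum blocked removal


def keep_log_seg(recptr: int, remove_up_to: int, wal_keep_segments: int,
                 slot_min_lsn: Optional[int], stats: KeepStats) -> int:
    """Return the segment number at which WAL removal has to stop.

    recptr is the current WAL byte position, remove_up_to the segment the
    caller would otherwise remove up to, and slot_min_lsn the oldest byte
    position still required by any slot (None if no slot needs WAL).
    """
    stats.checks += 1
    segno = recptr // SEGMENT_SIZE

    # Limit from wal_keep_segments, never going below segment 1.
    if wal_keep_segments > 0:
        segno = max(1, segno - wal_keep_segments)

    # Limit from the global minimum LSN required by the slots.
    if slot_min_lsn is not None:
        slot_segno = max(1, slot_min_lsn // SEGMENT_SIZE)
        if slot_segno < segno:
            segno = slot_segno
            # Hypothetical statistic: the slots, not wal_keep_segments,
            # are the reason fewer segments can be removed.
            if segno < remove_up_to:
                stats.held_by_slots += 1

    # Never allow removal beyond the computed segment.
    return min(segno, remove_up_to)


stats = KeepStats()
# A slot still needs segment 50, so removal up to segment 90 is held back.
limit = keep_log_seg(100 * SEGMENT_SIZE, 90, 0, 50 * SEGMENT_SIZE, stats)
print(limit, stats.held_by_slots)
```

A counter like this would survive without external monitoring, which is the "could this be what happened" view asked for earlier in the thread. Attributing the block to an individual slot would need the extra per-slot check mentioned above.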