Re: RFC: replace pg_stat_activity.waiting with something more descriptive - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: RFC: replace pg_stat_activity.waiting with something more descriptive
Msg-id: CA+TgmoZ-8ZpoUM9BGtBUP1u4dUQhC-9EpEDLzyK0dG4pKMDUwQ@mail.gmail.com
In response to: Re: RFC: replace pg_stat_activity.waiting with something more descriptive (Alexander Korotkov <aekorotkov@gmail.com>)
List: pgsql-hackers
On Mon, Sep 14, 2015 at 5:32 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> In order to build the consensus we need the roadmap for waits monitoring.
> Would single byte in PgBackendStatus be the only way for tracking wait
> events? Could we have pluggable infrastructure in waits monitoring: for
> instance, hooks for wait event begin and end?

No, it's not the only way of doing it. I proposed doing it that way because it's simple and cheap, but I'm not hell-bent on it.

My basic concern here is about the cost of this. I think that the most data we can report without some kind of synchronization protocol is one 4-byte integer. If we want to report anything more than that, we're going to need something like the st_changecount protocol, or a lock, and that's going to add very significantly - and in my view unacceptably - to the cost. I care very much about having this facility be something that we can use in lots of places, even extremely frequent operations like buffer reads and contended lwlock acquisition.

I think that there may be some *kinds of waits* for which it's practical to report additional detail. For example, suppose that when a heavyweight lock wait first happens, we just report the lock type (relation, tuple, etc.), but then when the deadlock detector expires, if we're still waiting, we report the entire lock tag. Well, that's going to happen infrequently enough, and is expensive enough anyway, that the cost doesn't matter. But if, every time we read a disk block, we take a lock (or bump a changecount and do a write barrier), dump the whole block tag in there, and then release the lock (or do another write barrier and bump the changecount again), that sounds kind of expensive to me. Maybe we can prove that it doesn't matter on any workload, but I doubt it. We're fighting for every cycle in some of these code paths, and there's good evidence that we're burning too many of them compared to competing products already.
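To make the cheapness argument concrete, here is a minimal C11 sketch of the single-word approach. All names and the event encoding are hypothetical, not PostgreSQL's actual identifiers; the point is only that an aligned 4-byte atomic can be published with a plain store, with no changecount protocol or lock on the hot path.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical encoding: high byte = wait class, remainder = specific
 * event.  These names are illustrative only. */
#define WAIT_CLASS_LWLOCK  0x01000000u
#define WAIT_EVENT_NONE    0u

/* One slot per backend.  Because it is a single aligned 4-byte atomic,
 * a concurrent reader can never observe a torn value, so no
 * st_changecount-style retry loop is needed. */
static _Atomic uint32_t my_wait_event = WAIT_EVENT_NONE;

static void
report_wait_start(uint32_t event)
{
    /* Plain relaxed store: a statistics reader may see a slightly stale
     * value, but never a corrupt one. */
    atomic_store_explicit(&my_wait_event, event, memory_order_relaxed);
}

static void
report_wait_end(void)
{
    atomic_store_explicit(&my_wait_event, WAIT_EVENT_NONE,
                          memory_order_relaxed);
}
```

The cost per wait is one store on entry and one on exit, which is why this scales to very frequent operations like buffer reads.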
I am not a big fan of hooks as a way of resolving disagreements about the design. We may find that there are places where it's useful to have hooks so that different extensions can do different things, and that is fine. But we shouldn't use that as a way of punting the difficult questions. There isn't enough common understanding here of what we're all trying to get done, and why we're trying to do it in particular ways rather than in other ways, to jump to the conclusion that a hook is the right answer. I'd prefer to have a nice, built-in system that everyone agrees represents a good set of trade-offs than an extensible system.

I think it's reasonable to consider reporting this data in the PGPROC using a 4-byte integer rather than reporting it through a single byte in the backend status structure. I believe that addresses the concerns about reporting from auxiliary processes, and it also allows a little more data to be reported. For anything in excess of that, I think we should think rather harder. Most likely, such additional detail should be reported only for certain types of wait events, or on a delay, or something like that, so that the core mechanism remains really, really fast.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
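For contrast, here is a sketch of what reporting anything wider than one word would require: a seqlock-style changecount protocol like st_changecount, with two counter bumps plus barriers per update and a retry loop on the read side. This is a simplified illustration (real code would need carefully placed memory fences, and the structure name and layout are invented), but it shows the per-event overhead being objected to.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical multi-word wait detail, e.g. a whole lock tag. */
typedef struct
{
    _Atomic uint32_t changecount;   /* odd while an update is in progress */
    uint32_t         detail[4];     /* payload too wide for one atomic */
} WaitDetail;

static void
wait_detail_write(WaitDetail *wd, const uint32_t *src)
{
    /* First bump makes the count odd: readers know the payload is in flux. */
    atomic_fetch_add_explicit(&wd->changecount, 1, memory_order_release);
    memcpy(wd->detail, src, sizeof wd->detail);
    /* Second bump makes it even again, publishing a consistent snapshot. */
    atomic_fetch_add_explicit(&wd->changecount, 1, memory_order_release);
}

static void
wait_detail_read(WaitDetail *wd, uint32_t *dst)
{
    uint32_t before, after;

    do
    {
        before = atomic_load_explicit(&wd->changecount, memory_order_acquire);
        memcpy(dst, wd->detail, sizeof wd->detail);
        after = atomic_load_explicit(&wd->changecount, memory_order_acquire);
    } while (before != after || (before & 1));  /* retry on a torn read */
}
```

Two atomic read-modify-writes per reported wait, on every buffer read, is the cost the mail argues should be reserved for rare, already-expensive paths like the post-deadlock-timeout case.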