Re: RFC: replace pg_stat_activity.waiting with something more descriptive - Mailing list pgsql-hackers
From | Ilya Kosmodemiansky |
---|---|
Subject | Re: RFC: replace pg_stat_activity.waiting with something more descriptive |
Date | |
Msg-id | CAG95seVQGu+UwD=VTx28U==meLrCkhmFMOsEZ14FoZ+yc3YCeA@mail.gmail.com Whole thread Raw |
In response to | Re: RFC: replace pg_stat_activity.waiting with something more descriptive (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: RFC: replace pg_stat_activity.waiting with something
more descriptive
|
List | pgsql-hackers |
On Thu, Jun 25, 2015 at 1:49 PM, Amit Kapila <amit.kapila16@gmail.com> wrote: >> Personally I think, that tracking waits is a not a good idea for >> pg_stat_activity (at least in that straight-forward manner). > > As mentioned in the initial mail by Robert, that sometimes system becomes > slow (either due to contention on various kinds of locks or due to I/O or > due > to some other such reasons) that such kind of handy information via some > view is quite useful. Recently while working on one of the > performance/scalability > projects, I need to use gdb to attach to different processes to see what > they > are doing (of course one can use perf or some other utilities as well) and I > found most of them were trying to wait on some LW locks, now having such > an information available via view could be really useful, because sometimes > at customer sites, we can't use gdb or perf to see what's going on. Yes, I understand such a use-case. But I hardly see if suggested design can help for such cases. Basically, a DBA has two reasons to take a look on waits: 1. Long response time for particular query (or some type of queries). In that case it is good to know how much time we spend on waiting for particular resources we need to get query results 2. Overall bad performance of a database. We know, that something goes wrong and consumes resources, we need to identify which backend, which query causes the most of waits. In both cases we need a) some historical data rather than simple snapshot b) some approach how to aggregate it because the will be certainly a lot of events So my point is, we need separate interface for waits, instead of integrating in pg_stat_activity. And it should be several interfaces: one for approximate top of waiting sessions (like active_sessions_history in oracle), one for detailed tracing of a session, one for waits per resource statistics etc. >> One >> process can wait for lots of things between 2 sampling of >> pg_stat_activity and that sampling can be pretty useless. >> > > Yeah, that's right and I am not sure if we should bother about such > scenario's > as the system is generally fine in such situations, however there are other > cases where we can find most of the backends are waiting on one or other > thing. I think approach with top of waiting sessions covers both scenarios (well, with only one exception: if we have billions of very short waits and high contention is the problem) However, it maybe a good idea, to identify the resource we are waiting for from pg_stat_activity if we are waiting for a long time. > > I think this is some what different kind of utility which can give us > aggregated information and I think this will address different kind of > usecase and will have somewhat more complex design and it doesn't > look impossible to use part of what will be developed as part of this > proposal. > I think it is more than possible to mix both approaches. My proof of concept now is only about LWLocks - yours and Robert's is more general, and certainly some wait event classification will be needed for both approaches and its much better to implement one rather than two different. And at least, I will be interesting in reviewing your approach. > > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com -- Ilya Kosmodemiansky, PostgreSQL-Consulting.com tel. +14084142500 cell. +4915144336040 ik@postgresql-consulting.com
pgsql-hackers by date: