Rethinking stats communication mechanisms - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Rethinking stats communication mechanisms |
Date | |
Msg-id | 23144.1150578742@sss.pgh.pa.us |
Responses |
Re: Rethinking stats communication mechanisms
Re: Rethinking stats communication mechanisms |
List | pgsql-hackers |
In view of my oprofile results
http://archives.postgresql.org/pgsql-hackers/2006-06/msg00859.php
I'm thinking we need some major surgery on the way that the stats collection mechanism works.

It strikes me that we are using a single communication mechanism to handle what are really two distinct kinds of data:

* Current-state information, eg, what backends are alive and what commands they are currently working on. Ideally we'd like this type of info to be 100% up-to-date. But once a particular bit of information (eg a command string) is obsolete, it's not of interest anymore.

* Event counts. These accumulate and so past information is still important. On the other hand, it's not so critical that the info be completely up-to-date --- the central counters can lag behind a bit, so long as events eventually get counted.

I believe the stats code was designed with the second case in mind, but we've abused it to handle the first case, and that's why we've now got performance problems.

If we are willing to assume that the current-state information is of fixed maximum size, we could store it in shared memory. (This suggestion already came up in the recent thread about ps_status, and I think it's been mentioned before too --- but my point here is that we have to separate this case from the event-counting case.) The only real restriction we'd be making is that we can only show the first N characters of current command string, but we're already accepting that limitation in the existing stats code. (And we could make N whatever we wanted, without worrying about UDP datagram limits.)

I'm envisioning either adding fields to the PGPROC array, or perhaps better using a separate array with an entry for each backend ID. Backends would write their status info into this array and any interested backend could read it out again. The stats collector process needn't be involved at all AFAICS. This eliminates any process-dispatch overhead to report command start or command termination.
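A minimal sketch of the separate per-backend array idea, to make the fixed-size assumption concrete. Everything named here (BackendStatus, status_array, MAX_BACKENDS, CMD_STRING_MAX, and the three functions) is hypothetical and not taken from the PostgreSQL source; the array is an ordinary static array standing in for shared memory, and synchronization between writer and readers is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_BACKENDS   8    /* hypothetical; would follow the backend limit */
#define CMD_STRING_MAX 64   /* the "N" above; value chosen arbitrarily */

/* One fixed-size slot per backend ID (hypothetical layout). */
typedef struct BackendStatus
{
    bool in_use;                        /* has a backend reported in this slot? */
    char current_cmd[CMD_STRING_MAX];   /* always NUL-terminated */
} BackendStatus;

/* Stand-in for an array that would really live in shared memory. */
static BackendStatus status_array[MAX_BACKENDS];

/* Backend side: record the command now starting, truncated to fit the slot. */
void
report_command_start(int backend_id, const char *cmd)
{
    BackendStatus *slot;

    assert(backend_id >= 0 && backend_id < MAX_BACKENDS);
    slot = &status_array[backend_id];
    slot->in_use = true;
    strncpy(slot->current_cmd, cmd, CMD_STRING_MAX - 1);
    slot->current_cmd[CMD_STRING_MAX - 1] = '\0';
}

/* Backend side: the command string is obsolete once the command ends. */
void
report_command_end(int backend_id)
{
    assert(backend_id >= 0 && backend_id < MAX_BACKENDS);
    status_array[backend_id].current_cmd[0] = '\0';
}

/*
 * Reader side: copy one slot's command string into dst (truncating to
 * dstlen - 1 bytes) and return whether the slot is in use.
 */
bool
read_status(int backend_id, char *dst, size_t dstlen)
{
    const BackendStatus *slot;

    assert(backend_id >= 0 && backend_id < MAX_BACKENDS);
    assert(dstlen > 0);
    slot = &status_array[backend_id];
    strncpy(dst, slot->current_cmd, dstlen - 1);
    dst[dstlen - 1] = '\0';
    return slot->in_use;
}
```

Reporting a command is just a bounded copy into the backend's own slot, with no message sent to another process. A real version would still have to protect each slot against concurrent readers.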
Instead we'd have some locking overhead, but contention ought to be low enough that that's not a serious problem. I'm assuming a separate lock for each array entry so that backends don't contend with each other to update their entries; contention occurs only when someone is actively reading the information. We should probably use LWLocks not spinlocks because the time taken to copy a long command string into the shared area would be longer than we ought to hold a spinlock (but this seems a bit debatable given the expected low contention ... any thoughts?)

The existing stats collection mechanism seems OK for event counts, although I'd propose two changes: one, get rid of the separate buffer process, and two, find a way to emit event reports in a time-driven way rather than once per transaction commit. I'm a bit vague about how to do the latter at the moment.

Comments?

regards, tom lane