Re: Access statistics - Mailing list pgsql-hackers
From | Jan Wieck |
---|---|
Subject | Re: Access statistics |
Date | |
Msg-id | 200106011331.f51DV5P02078@jupiter.us.greatbridge.com Whole thread Raw |
In response to | Re: Access statistics (Peter Eisentraut <peter_e@gmx.net>) |
Responses |
Re: Access statistics
|
List | pgsql-hackers |
Peter Eisentraut wrote: > Jan Wieck writes: > > > In the next couple of hours (at least tomorrow) I would be > > ready to commit the backend changes for table-/index-access > > statistics and current backend activity views. > > > > Should I apply the patches or provide a separate patch for > > review first? > > Maybe you could describe what it's going to do and how it's going to work. Real programmers don't comment - if it was hard to write it should be hard to read :-) So outing myself not beeing a *real programmer*, this is what I have so far: * On startup the postmaster creates an INET domain UDP socket and bind(2)'s it to localhost:0, meaning the kernel will assign a yet unused, unprivileged port that could be seen with getsockaddr(2). It then starts two background processes of which one is simply a wraparound buffer doing recvfrom(2) on thesocket, checking that the source address of the received packets is the sockets own address (!) and forwardingapproved ones over a pipe to the second one, discribed later. * Backends call some collector functions at various places now (these will finally be macros), that count uptable scans, tuples returned by scans, buffer fetches/hits and the like. At the beginning of a statement thebackends send a message telling the first couple of hundred bytes of the querystring and after the statementis done (just before getting ready for the next command) they send the collected access numbers. Tables, indexes etc. in these statistics are identified by OID, so the data doesn't tell much so far. * The second background process fired by the postmaster collects these numbers into hashtables and as longas it receives messages, it'll write out a summary file every 500 or so milliseconds, telling a snapshot ofcurrent stats. Using select(2) with timeouts ensures that a complete idle DB instance not to waste a single CPUcycle or IO to write these snapshots. On startup it tries to read the last snapshot file in, so the collected statistics survive a postmaster restart. Vacuum reads the file too and sends bulk delete messages for objects that are gone. So the stats don't grow infinitely. * A bunch of new builtin functions gain access to the snapshot file. At first call of one of these functions during a transaction, the backend will read the current file and return the numbers from in memory then. Based on these functions a couple of views can tell these collected stats. Information from the databasessystem catalog is of course required to identify the objects in the stats, but I think those informationsshould only be visible to someone who identified herself as a valid DB user anyway. The visibility of querystrings (this info is available cross DB) is restricted to DB superusers. There has been discussion already about using an INET vs. UNIX UDP socket for the communication. At least forLinux I found INET to be the most effective way of communication. And for security concerns: If someone else thanroot can really send packets to that socket that show up with a source address of 127.0.0.1:n, where n is a portnumber actually occupied by your own socket, be sure you'll have more severe problems than modified accessstatistics. The views should be considered examples. The final naming and layout is subject for discussion. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com # _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com
pgsql-hackers by date: