Thread: Tracking wait event for latches
Hi all,

As I mentioned in $subject a couple of months back after looking at the wait event facility here:
http://www.postgresql.org/message-id/CAB7nPqTJpgAvOK4qSC96Fpm5W+aCtJ9D=3Vn9AfiEYsur=-juw@mail.gmail.com
I have actually taken some time to implement this idea.

The particular case that I had in mind was to be able to track in pg_stat_activity processes that are waiting on a latch for synchronous replication, and here is what this patch gives in this case:

=# alter system set synchronous_standby_names = 'foo';
ALTER SYSTEM
=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)
=# -- Do something
[...]

And from another session:

=# select wait_event_type, wait_event from pg_stat_activity where pid = 83316;
 wait_event_type | wait_event
-----------------+------------
 Latch           | SyncRep
(1 row)

This is a boring patch, and it relies on the wait event facility that was added recently in 9.6. Note a couple of things though:
1) There are roughly 30 code paths calling WaitLatch in the backend code; all those events are classified and documented similarly to LWLock events.
2) After discussing this stuff while at PGCon, it does not seem worthwhile to have any kind of API for registering custom latch names in shared memory that extensions could load through _PG_init(). In its place, there is a latch type flag called "Extension" that can be used for this purpose.

Comments are welcome; I am adding this to the 9.7 queue.

Regards,
--
Michael
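[To make the mechanism concrete, here is a minimal sketch of what an instrumented call site could look like. The extra argument and the EVENT_SYNC_REP identifier follow later versions of the patch discussed below; this is not a quote of the patch's exact API.]

    /* Hypothetical instrumented wait: while the backend sleeps here,
     * pg_stat_activity would report wait_event_type = Latch and
     * wait_event = SyncRep for this process. */
    rc = WaitLatch(MyLatch,
                   WL_LATCH_SET | WL_POSTMASTER_DEATH,
                   -1L,             /* ignored, WL_TIMEOUT not set */
                   EVENT_SYNC_REP); /* new: the wait point identifier */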
On Thu, May 19, 2016 at 4:14 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> Comments are welcome; I am adding this to the 9.7 queue.

Take that as 10.0, the way things are going.
--
Michael
On Fri, May 20, 2016 at 8:14 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> Hi all,
>
> As I mentioned in $subject a couple of months back after looking at the
> wait event facility here:
> http://www.postgresql.org/message-id/CAB7nPqTJpgAvOK4qSC96Fpm5W+aCtJ9D=3Vn9AfiEYsur=-juw@mail.gmail.com
> I have actually taken some time to implement this idea.
>
> The particular case that I had in mind was to be able to track in
> pg_stat_activity processes that are waiting on a latch for synchronous
> replication, and here is what this patch gives in this case:
> =# alter system set synchronous_standby_names = 'foo';
> ALTER SYSTEM
> =# select pg_reload_conf();
>  pg_reload_conf
> ----------------
>  t
> (1 row)
> =# -- Do something
> [...]
>
> And from another session:
> =# select wait_event_type, wait_event from pg_stat_activity where pid = 83316;
>  wait_event_type | wait_event
> -----------------+------------
>  Latch           | SyncRep
> (1 row)
>
> This is a boring patch, and it relies on the wait event facility that
> was added recently in 9.6. Note a couple of things though:
> 1) There are roughly 30 code paths calling WaitLatch in the backend
> code; all those events are classified and documented similarly to
> LWLock events.
> 2) After discussing this stuff while at PGCon, it does not seem
> worthwhile to have any kind of API for registering custom latch names
> in shared memory that extensions could load through _PG_init(). In its
> place, there is a latch type flag called "Extension" that can be used
> for this purpose.
>
> Comments are welcome; I am adding this to the 9.7 queue.

This is very cool, and I can't wait to have this feature! It'll be useful for all kinds of developers and users. I wanted this today to help debug something I am working on, and remembered this patch. I have signed up as a reviewer for the September commitfest. But here is some initial feedback based on a quick glance at it:

This patch allows identifiers to be specified by the WaitLatch and WaitLatchOrSocket calls, but not for WaitEventSetWait, which is the more general waiting primitive. I think it should be done by WaitEventSetWait, and merely passed down by those wrapper functions. (A sketch of this follows below.)

These are actually names for *wait points*, not names for latches. Some of the language in this patch makes it sound like they are latch names/latch identifiers, which is inaccurate (the latches themselves wouldn't be very interesting, e.g. MyLatch). In some cases the main thing of interest is actually a socket or timer anyway, not a latch, so maybe it should appear as wait_event_type = WaitEventSet?

Is there a reason you chose names like 'WALWriterMain'? That particular wait point is in the function WalWriterMain (note the different case). It seems odd to use the containing function names to guide naming, but not use them exactly. Since this namespace needs to be able to name wait points anywhere in the postgres source tree (and maybe outside it too, for extensions), maybe it should be made hierarchical, like 'walwriter.WalWriterMain' (or some other organisational scheme).

I personally think it would be very useful for extensions to be able to name their wait points. For example, I'd rather see 'postgres_fdw.pgfdw_get_result' or similar than a vague 'Extension' string which appears not only for all wait points in an extension but also for all extensions. I hope we can figure out a good way to do that. Clearly that would involve some shmem registry machinery to make the names visible across backends (a similar problem exists with lock tranche names).
--
Thomas Munro
http://www.enterprisedb.com
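[A sketch of the pass-down suggestion above, assuming a waitId parameter is added; the signatures are illustrative and the 9.6 wrappers contain more logic than shown.]

    /* The wrappers merely forward the wait point identifier; only the
     * general primitive reports it while actually blocking. */
    int
    WaitLatch(volatile Latch *latch, int wakeEvents, long timeout,
              WaitEventIdentifier waitId)
    {
        return WaitLatchOrSocket(latch, wakeEvents, PGINVALID_SOCKET,
                                 timeout, waitId);
    }

    /* ...and inside WaitEventSetWait(), roughly (the class constant is
     * hypothetical; pgstat_report_wait_start() does exist in 9.6): */
    pgstat_report_wait_start(WAIT_EVENT_SET, waitId);
    rc = WaitEventSetWaitBlock(set, cur_timeout, occurred_events, nevents);
    pgstat_report_wait_end();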
On Thu, Jun 2, 2016 at 12:25 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> This patch allows identifiers to be specified by the WaitLatch and
> WaitLatchOrSocket calls, but not for WaitEventSetWait, which is the
> more general waiting primitive. I think it should be done by
> WaitEventSetWait, and merely passed down by those wrapper functions.

The advantage of having WaitEventSetWait do the tracking is that we could track the events of secure_read and secure_write. One potential problem with doing so is if those routines are called during authentication. I don't recall that being the case, but this needs a double-check.

> These are actually names for *wait points*, not names for latches.

OK.

> Some of the language in this patch makes it sound like they are latch
> names/latch identifiers, which is inaccurate (the latches themselves
> wouldn't be very interesting, e.g. MyLatch). In some cases the main
> thing of interest is actually a socket or timer anyway, not a latch,
> so maybe it should appear as wait_event_type = WaitEventSet?

Hm. A single wait can be for multiple types of things at once, so don't you think we'd need to add more detail to what is reported in pg_stat_activity? There are two fields now, and in the context of this patch:
- wait_event_type, which I'd like to think is associated with a latch, so I named it so.
- wait_event, which is the code path that we are waiting at, like SyncRep, the WAL writer main routine, etc.
Now if you would like to get a list of all the things that are being waited for, shouldn't we add a third column to the set that has text[] as return type? Let's name it wait_event_details; for a latch wait it could list the following:
- WL_LATCH_SET
- WL_POSTMASTER_DEATH
- WL_SOCKET_READABLE
- WL_SOCKET_WRITEABLE
Compared to all the other existing wait types that's a bit new, and it's exclusive to latches because we need a higher level of detail. Don't you think so? But actually I don't think it's necessary to go into this level of detail. We already know what a latch is waiting for by looking at the code... (A mock-up of what such a column could look like follows below.)

> Is there a reason you chose names like 'WALWriterMain'? That
> particular wait point is in the function WalWriterMain (note the
> different case). It seems odd to use the containing function names to
> guide naming, but not use them exactly. Since this namespace needs to
> be able to name wait points anywhere in the postgres source tree (and
> maybe outside it too, for extensions), maybe it should be made
> hierarchical, like 'walwriter.WalWriterMain' (or some other
> organisational scheme).

Yeah, possibly this could be improved. I have put some thought into having clear names for each one of them, so I'd rather keep them simple.

> I personally think it would be very useful for extensions to be able
> to name their wait points. For example, I'd rather see
> 'postgres_fdw.pgfdw_get_result' or similar than a vague 'Extension'
> string which appears not only for all wait points in an extension but
> also for all extensions. I hope we can figure out a good way to do
> that. Clearly that would involve some shmem registry machinery to
> make the names visible across backends (a similar problem exists with
> lock tranche names).

This patch is shaped this way intentionally based on the feedback I received at PGCon (Robert and others). We could provide a routine that extensions call in _PG_init to register a new latch event name in shared memory, but I didn't see much use in doing so; take for example the case of a background worker, where it is possible to register a custom string for pg_stat_activity via pgstat_report_activity. One could take advantage of having custom latch wait names in shared memory if an extension has wait points with latches, though... But I am still not sure if that's worth the complexity.
--
Michael
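[As a mock-up of the idea floated above; the column name and array contents are purely illustrative, and nothing like this exists in the patch.]

    =# select wait_event_type, wait_event, wait_event_details
       from pg_stat_activity where pid = 83316;
     wait_event_type | wait_event | wait_event_details
    -----------------+------------+------------------------------
     Latch           | SyncRep    | {latch_set,postmaster_death}
    (1 row)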
On Thu, Jun 2, 2016 at 1:34 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> This patch is shaped this way intentionally based on the feedback I
> received at PGCon (Robert and others). We could provide a routine that
> extensions call in _PG_init to register a new latch event name in
> shared memory, but I didn't see much use in doing so; take for example
> the case of a background worker, where it is possible to register a
> custom string for pg_stat_activity via pgstat_report_activity. One
> could take advantage of having custom latch wait names in shared
> memory if an extension has wait points with latches, though... But I
> am still not sure if that's worth the complexity.

I can't see how you could ever guarantee that it wouldn't just fail. We allocate a certain amount of "slop" in the main shared memory segment, but it's not infinite and can certainly be exhausted. It seems like it would suck if you tried to load your extension and it failed because there was no room left for more wait-point names. Maybe it would suck less than not having wait-point names, but I'm not really sure. I think we'd do better to get something that handles the core stuff well and then consider extensions later or not at all. It only matters if you're running multiple extensions that all use LWLock tranches and you need to distinguish between waits on their various LWLocks. But since LWLock contention is something we hope will be infrequent, I'm just not sure that case is common enough to justify building a lot of mechanism.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Jun 4, 2016 at 2:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jun 2, 2016 at 1:34 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> This patch is shaped this way intentionally based on the feedback I
>> received at PGCon (Robert and others). We could provide a routine that
>> extensions call in _PG_init to register a new latch event name in
>> shared memory, but I didn't see much use in doing so; take for example
>> the case of a background worker, where it is possible to register a
>> custom string for pg_stat_activity via pgstat_report_activity. One
>> could take advantage of having custom latch wait names in shared
>> memory if an extension has wait points with latches, though... But I
>> am still not sure if that's worth the complexity.
>
> I can't see how you could ever guarantee that it wouldn't just fail.
> We allocate a certain amount of "slop" in the main shared memory
> segment, but it's not infinite and can certainly be exhausted. It
> seems like it would suck if you tried to load your extension and it
> failed because there was no room left for more wait-point names.
> Maybe it would suck less than not having wait-point names, but I'm not
> really sure. I think we'd do better to get something that handles the
> core stuff well and then consider extensions later or not at all.

Yeah, that's been my line of thought on the matter since the beginning: keep it simple and be done with it. What follows is purely hand-waving and I have no way to prove it, but my instinctive guess is that more than 90% of the real use cases where we need to track latch waits in pgstat would be covered without the need for this extra shared memory infrastructure for extensions.
--
Michael
On Wed, Jun 8, 2016 at 10:11 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> Yeah, that's been my line of thought on the matter since the
> beginning: keep it simple and be done with it. What follows is purely
> hand-waving and I have no way to prove it, but my instinctive guess is
> that more than 90% of the real use cases where we need to track latch
> waits in pgstat would be covered without the need for this extra
> shared memory infrastructure for extensions.

So, I have done some extra tests with my patch, moving the event reporting down to WaitEventSetWait to see the effect on secure_read and secure_write. The conclusion is that I am seeing no difference, so I changed the patch to the way suggested by Thomas. In this test, what I did was run the following pgbench script with -C via an SSL connection:
\set id random(1,1000)
As the script file is not empty, a connection to the server is opened, so this goes through secure_read at minimal cost on the backend. (A possible invocation is sketched below.)

Also, I have changed the event ID notation as follows to be consistent with the routine names:
WAL -> Wal
PG -> Pg
I also found that LATCH_RECOVERY_WAL_ALL was reporting "XLogAfterAvailable" as its name, which was incorrect.

Finally, I have changed the patch so that it does not track "Latch" events, but "EventSet" events instead, per the suggestion of Thomas. "WaitEventSet" is too close to wait_event for my taste, so I shortened the suggestion. There were also some conflicts caused by the recent commit 887feefe, which are fixed.

Attached is an updated patch.
--
Michael
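[For reference, a run matching that description might look like this; the duration and connection string are assumptions, and only the -C flag and the one-line script come from the message above.]

    $ cat conn_test.sql
    \set id random(1,1000)
    $ pgbench -C -T 60 -f conn_test.sql "dbname=postgres sslmode=require"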
On Tue, Aug 2, 2016 at 10:31 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> Attached is an updated patch.

Updated version for 2 minor issues:
1) s/stram/stream/
2) Docs used an incorrect number
--
Michael
Hi, Michael!
On Thu, Aug 4, 2016 at 8:26 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Tue, Aug 2, 2016 at 10:31 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> Attached is an updated patch.
>
> Updated version for 2 minor issues:
> 1) s/stram/stream/
> 2) Docs used an incorrect number
I took a look at your patch. Couple of notes from me.
const char *
GetEventIdentifier(uint16 eventId)
{
    const char *res;

    switch (eventId)
    {
        case EVENT_ARCHIVER_MAIN:
            res = "ArchiverMain";
            break;
        ... long long list of events ...
        case EVENT_WAL_SENDER_WRITE_DATA:
            res = "WalSenderWriteData";
            break;
        default:
            res = "???";
    }
    return res;
}
Would it be better to use an array here?
typedef enum EventIdentifier
{
EventIdentifier seems like too general a name to me, doesn't it? Could we name it WaitEventIdentifier? Or WaitEventId for short?
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Mon, Aug 22, 2016 at 12:09 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
> I took a look at your patch. Couple of notes from me.
>
> const char *
> GetEventIdentifier(uint16 eventId)
> {
>     const char *res;
>
>     switch (eventId)
>     {
>         case EVENT_ARCHIVER_MAIN:
>             res = "ArchiverMain";
>             break;
>         ... long long list of events ...
>         case EVENT_WAL_SENDER_WRITE_DATA:
>             res = "WalSenderWriteData";
>             break;
>         default:
>             res = "???";
>     }
>     return res;
> }
>
> Would it be better to use an array here?
>
> typedef enum EventIdentifier
> {
>
> EventIdentifier seems like too general a name to me, doesn't it? Could we name it WaitEventIdentifier? Or WaitEventId for short?
I'm also not sure about the handling of the secure_read() and secure_write() functions.
In the current patch we're tracking the latch event wait inside them. But we set up the latch only for blocking sockets, and can do it multiple times in a loop.
Would it be better to make separate wait events NETWORK_READ and NETWORK_WRITE and set them up for the whole time spent in secure_read() and secure_write()?
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Mon, Aug 22, 2016 at 6:09 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
> Hi, Michael!
>
> On Thu, Aug 4, 2016 at 8:26 AM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>
> I took a look at your patch. Couple of notes from me.

Thanks!

>> const char *
>> GetEventIdentifier(uint16 eventId)
>> {
>>     const char *res;
>>
>>     switch (eventId)
>>     {
>>         case EVENT_ARCHIVER_MAIN:
>>             res = "ArchiverMain";
>>             break;
>>         ... long long list of events ...
>>         case EVENT_WAL_SENDER_WRITE_DATA:
>>             res = "WalSenderWriteData";
>>             break;
>>         default:
>>             res = "???";
>>     }
>>     return res;
>> }
>
> Would it be better to use an array here?

The reason why I chose this way is that there are a lot of them. It is painful to maintain the order of the array elements in perfect mapping with the list of IDs...

>> typedef enum EventIdentifier
>> {
>
> EventIdentifier seems like too general a name to me, doesn't it? Could we
> name it WaitEventIdentifier? Or WaitEventId for short?

OK. So WaitEventIdentifier? The reason to include Identifier is for consistency with the lwlock structure notation.
--
Michael
On Mon, Aug 22, 2016 at 9:49 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> The reason why I chose this way is that there are a lot of them. It is
> painful to maintain the order of the array elements in perfect mapping
> with the list of IDs...

You can use stupid macro tricks to help with that problem...
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
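[One such trick, for the record: an X-macro list that generates both the enum and the name array from a single definition, so the two cannot drift apart. This is a sketch only, with the list truncated to three entries; it is not what the patch does.]

    #define WAIT_EVENT_LIST(X) \
        X(EVENT_ARCHIVER_MAIN, "ArchiverMain") \
        X(EVENT_SYNC_REP, "SyncRep") \
        X(EVENT_WALWRITER_MAIN, "WalWriterMain")

    /* Expand once to build the enum... */
    #define WE_ENUM(id, name) id,
    typedef enum WaitEventIdentifier
    {
        WAIT_EVENT_LIST(WE_ENUM)
        WE_NUMBER_OF_IDENTIFIERS    /* count, usable in a static assertion */
    } WaitEventIdentifier;

    /* ...and once more to build the matching name array. */
    #define WE_NAME(id, name) name,
    static const char *const WaitEventNames[] =
    {
        WAIT_EVENT_LIST(WE_NAME)
    };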
On Tue, Aug 23, 2016 at 12:44 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Aug 22, 2016 at 9:49 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> The reason why I chose this way is that there are a lot of them. It is
>> painful to maintain the order of the array elements in perfect mapping
>> with the list of IDs...
>
> You can use stupid macro tricks to help with that problem...

Yeah; still, after thinking about it I would just go with an array, like the lock type names, and be done with it. With a comment mentioning that the order must be respected, that would be enough...
--
Michael
On Mon, Aug 22, 2016 at 6:46 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
> On Mon, Aug 22, 2016 at 12:09 PM, Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
>>
>> I took a look at your patch. Couple of notes from me.
>>
>>> const char *
>>> GetEventIdentifier(uint16 eventId)
>>> {
>>>     ...
>>> }
>>
>> Would it be better to use an array here?

Okay, I have switched to an array....

>>> typedef enum EventIdentifier
>>> {
>>
>> EventIdentifier seems like too general a name to me, doesn't it? Could we
>> name it WaitEventIdentifier? Or WaitEventId for short?

... And WaitEventIdentifier.

> I'm also not sure about the handling of the secure_read() and secure_write()
> functions.
> In the current patch we're tracking the latch event wait inside them. But we
> set up the latch only for blocking sockets, and can do it multiple times in
> a loop.
> Would it be better to make separate wait events NETWORK_READ and
> NETWORK_WRITE and set them up for the whole time spent in secure_read() and
> secure_write()?

The whole point is to track a waiting event only when we are sure that it is going to wait, which is why the patch depends on WaitEventSetWait. If we set those flags at the beginning of secure_read and secure_write and reset them at the end, we might actually track an event that is not blocking.
--
Michael
+ <para>
+  <literal>EventSet</>: The server process is waiting on a socket
+  or a timer. This wait happens in a latch, an inter-process facility
+  using boolean variables letting a process sleep until it is set.
+  Latches support several type of operations, like postmaster death
+  handling, timeout and socket activity lookup, and are a useful
+  replacement for <function>sleep</> where signal handling matters.
+ </para>

This paragraph seems a bit confused. I suggest something more like this: "The server process is waiting for one or more sockets, a timer or an interprocess latch. The wait happens in a WaitEventSet, <productname>PostgreSQL</>'s portable IO multiplexing abstraction."

On Tue, Aug 23, 2016 at 7:01 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
>> On Mon, Aug 22, 2016 at 12:09 PM, Alexander Korotkov
>> <a.korotkov@postgrespro.ru> wrote:
>>> Would it be better to use an array here?
>
> Okay, I have switched to an array....

I looked at various macro tricks[1] but they're all pretty unpleasant! +1 for the simple array with carefully managed order. About that order...

+const char *const WaitEventNames[] = {
+	"ArchiverMain",
+	"AutoVacuumMain",
+	"BaseBackupThrottle",
+	"BgWorkerStartup",
+	"BgWorkerShutdown",
+	"BgWriterMain",
+	"BgWriterHibernate",
+	"CheckpointerMain",
+	"ExecuteGather",
+	"Extension",
+	"MessageQueuePutMessage",
+	"MessageQueueSend",
+	"MessageQueueReceive",
+	"MessageQueueInternal",
+	"ParallelFinish",
+	"PgStatMain",
+	"ProcSleep",
+	"ProcSignal",
+	"PgSleep",
+	"SecureRead",
+	"SecureWrite",
+	"SSLOpenServer",
+	"SyncRep",
+	"SysLoggerMain",
+	"RecoveryApplyDelay",
+	"RecoveryWalAll",
+	"RecoveryWalStream",
+	"WalReceiverWaitStart",
+	"WalReceiverMain",
+	"WalSenderMain",
+	"WalSenderWaitForWAL",
+	"WalSenderWriteData",
+	"WalWriterMain",
+};

It looks like this array wants to be in alphabetical order, but it isn't quite. Also, perhaps a compile time assertion about the size of the array matching EVENT_LAST_TYPE could be useful?

+typedef enum WaitEventIdentifier
+{
+	EVENT_ARCHIVER_MAIN,
+	EVENT_AUTOVACUUM_MAIN,
+	EVENT_BASEBACKUP_THROTTLE,
+	EVENT_BGWORKER_STARTUP,
+	EVENT_BGWORKER_SHUTDOWN,
+	EVENT_BGWRITER_MAIN,
+	EVENT_BGWRITER_HIBERNATE,
+	EVENT_CHECKPOINTER_MAIN,
+	EVENT_EXECUTE_GATHER,
+	EVENT_EXTENSION,
+	EVENT_MQ_PUT_MESSAGE,
+	EVENT_MQ_SEND_BYTES,
+	EVENT_MQ_RECEIVE_BYTES,
+	EVENT_MQ_WAIT_INTERNAL,
+	EVENT_PARALLEL_WAIT_FINISH,
+	EVENT_PGSTAT_MAIN,
+	EVENT_PROC_SLEEP,
+	EVENT_PROC_SIGNAL,
+	EVENT_PG_SLEEP,
+	EVENT_SECURE_READ,
+	EVENT_SECURE_WRITE,
+	EVENT_SSL_OPEN_SERVER,
+	EVENT_SYNC_REP,
+	EVENT_SYSLOGGER_MAIN,
+	EVENT_RECOVERY_APPLY_DELAY,
+	EVENT_RECOVERY_WAL_ALL,
+	EVENT_RECOVERY_WAL_STREAM,
+	EVENT_WAL_RECEIVER_WAIT_START,
+	EVENT_WAL_RECEIVER_MAIN,
+	EVENT_WAL_SENDER_WRITE_DATA,
+	EVENT_WAL_SENDER_MAIN,
+	EVENT_WAL_SENDER_WAIT_WAL,
+	EVENT_WALWRITER_MAIN
+} WaitEventIdentifier;

This is also nearly but not exactly in alphabetical order (EVENT_PROC_SIGNAL comes after EVENT_PROC_SLEEP, for example). But it's not currently possible to have the strings and the enumerators both in alphabetical order because they're not the same, even accounting for camel-case to upper-case transliteration. I think at least one of them should be sorted. Shouldn't they match fully and then *both* be sorted alphabetically? For example "MessageQueueInternal" doesn't match EVENT_MQ_WAIT_INTERNAL. Then there are some cases where I'd expect underscores for consistency with other enumerators and with the corresponding camel-case strings: you have EVENT_WAL_SENDER_MAIN, but EVENT_WALWRITER_MAIN.

The documentation is in a slightly different order again, but also not exactly alphabetical: for example ProcSleep is listed before ProcSignal.

Sorry if this all sounds terribly picky, but I think we should try to be strictly systematic here.

>>>> EventIdentifier seems like too general a name to me, doesn't it? Could we
>>>> name it WaitEventIdentifier? Or WaitEventId for short?
>>
>> ... And WaitEventIdentifier.

+1 from me too for avoiding the overly general term 'event'. It does seem a little odd to leave the enumerator names as EVENT_... though; shouldn't these be WAIT_EVENT_... or WE_...? Or perhaps you could consider WaitPointIdentifier and WP_SECURE_READ, or WaitEventPointIdentifier and WEP_SECURE_READ, if you buy my earlier argument that what we are really naming here is the point in the code where we wait, not the events we're waiting for. Contrast with LWLocks, where we report the lock that you're waiting for, not the place in the code where you're waiting for that lock.

On Wed, Aug 3, 2016 at 1:31 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> Finally, I have changed the patch so that it does not track "Latch"
> events, but "EventSet" events instead, per the suggestion of Thomas.
> "WaitEventSet" is too close to wait_event for my taste, so I shortened
> the suggestion.

This is good, because showing "Latch" when we were really waiting for a socket was misleading. On the other hand, if we could *accurately* report whether it's a "Latch", "Socket" or "Latch|Socket" that we're waiting for, it might be cool to do that instead. One way to do that would be to use up several class IDs: WAIT_EVENT_LATCH, WAIT_EVENT_LATCH_OR_SOCKET, WAIT_EVENT_SOCKET (or perhaps WAIT_EVENT_LATCH | WAIT_EVENT_SOCKET, reserving 2 or 3 upper bits of the 8-bit class ID for this). Then we could figure out the right class ID by looking at set->latch and set->nevents (perhaps an extra boolean would be needed to record whether postmaster death is in there, so we could deduce whether there are any sockets). It would be interesting specifically for the case of FDWs, where it would be nice to be able to see clearly that it's waiting for a remote server ("Socket"). It may also be interesting to know if there is a timeout. Postmaster death doesn't seem newsworthy; we're nearly always also waiting for that exceptional event, so it'd just be clutter to report it.

[1] http://stackoverflow.com/questions/147267/easy-way-to-use-variables-of-enum-types-as-string-in-c
--
Thomas Munro
http://www.enterprisedb.com
On Wed, Sep 21, 2016 at 8:13 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> + <para>
> +  <literal>EventSet</>: The server process is waiting on a socket
> +  or a timer. This wait happens in a latch, an inter-process facility
> +  using boolean variables letting a process sleep until it is set.
> +  Latches support several type of operations, like postmaster death
> +  handling, timeout and socket activity lookup, and are a useful
> +  replacement for <function>sleep</> where signal handling matters.
> + </para>
>
> This paragraph seems a bit confused. I suggest something more like
> this: "The server process is waiting for one or more sockets, a timer
> or an interprocess latch. The wait happens in a WaitEventSet,
> <productname>PostgreSQL</>'s portable IO multiplexing abstraction."

OK, I have tweaked the paragraph as follows using your suggestion:

+ <listitem>
+  <para>
+   <literal>EventSet</>: The server process is waiting on one or more
+   sockets, a timer or an inter-process latch. The wait happens in a
+   <function>WaitEventSet</>, <productname>PostgreSQL</>'s portable
+   I/O multiplexing abstraction using boolean variables letting a
+   process sleep until it is set. It supports several types of
+   operations, like postmaster death handling, timeout and socket
+   activity lookup, and is a useful replacement for <function>sleep</>
+   where signal handling matters.
+  </para>
+ </listitem>

> On Tue, Aug 23, 2016 at 7:01 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>>> On Mon, Aug 22, 2016 at 12:09 PM, Alexander Korotkov
>>> <a.korotkov@postgrespro.ru> wrote:
>>>> Would it be better to use an array here?
>>
>> Okay, I have switched to an array....
>
> I looked at various macro tricks[1] but they're all pretty unpleasant!
> +1 for the simple array with carefully managed order. About that
> order...
> [1] http://stackoverflow.com/questions/147267/easy-way-to-use-variables-of-enum-types-as-string-in-c

Yes, I recall bumping into this one, or something really similar to it...

> +const char *const WaitEventNames[] = {
> [...]
> +};
>
> It looks like this array wants to be in alphabetical order, but it
> isn't quite. Also, perhaps a compile time assertion about the size of
> the array matching EVENT_LAST_TYPE could be useful?

In GetWaitEventIdentifier()? I'd think that just returning ??? would have been fine if there is a non-matching call.

> +typedef enum WaitEventIdentifier
> +{
> [...]
> +} WaitEventIdentifier;
>
> This is also nearly but not exactly in alphabetical order
> (EVENT_PROC_SIGNAL comes after EVENT_PROC_SLEEP, for example). But
> it's not currently possible to have the strings and the enumerators
> both in alphabetical order because they're not the same, even
> accounting for camel-case to upper-case transliteration. I think at
> least one of them should be sorted. Shouldn't they match fully and
> then *both* be sorted alphabetically? For example
> "MessageQueueInternal" doesn't match EVENT_MQ_WAIT_INTERNAL. Then
> there are some cases where I'd expect underscores for consistency with
> other enumerators and with the corresponding camel-case strings: you
> have EVENT_WAL_SENDER_MAIN, but EVENT_WALWRITER_MAIN.

Not wrong..

> The documentation is in a slightly different order again, but also not
> exactly alphabetical: for example ProcSleep is listed before
> ProcSignal.

Right.

> Sorry if this all sounds terribly picky, but I think we should try to
> be strictly systematic here.

No worries about that, it matters a lot for this patch. The user-facing documentation is what should drive the decision-making, I think. So let's order the names, and adapt the enum depending on that. I have done so after double-checking both lists, and added a comment for anybody updating them in the future.

>>>> EventIdentifier seems like too general a name to me, doesn't it? Could we
>>>> name it WaitEventIdentifier? Or WaitEventId for short?
>>
>> ... And WaitEventIdentifier.
>
> +1 from me too for avoiding the overly general term 'event'. It does
> seem a little odd to leave the enumerator names as EVENT_... though;
> shouldn't these be WAIT_EVENT_... or WE_...? Or perhaps you could
> consider WaitPointIdentifier and WP_SECURE_READ, or
> WaitEventPointIdentifier and WEP_SECURE_READ, if you buy my earlier
> argument that what we are really naming here is the point in the code
> where we wait, not the events we're waiting for. Contrast with
> LWLocks, where we report the lock that you're waiting for, not the
> place in the code where you're waiting for that lock.

Well, WE_ it is if I need to make a choice for something other than EVENT_.

> On the other hand, if we could *accurately* report whether it's a
> "Latch", "Socket" or "Latch|Socket" that we're waiting for, it might
> be cool to do that instead. One way to do that would be to use up
> several class IDs: WAIT_EVENT_LATCH, WAIT_EVENT_LATCH_OR_SOCKET,
> WAIT_EVENT_SOCKET (or perhaps WAIT_EVENT_LATCH | WAIT_EVENT_SOCKET,
> reserving 2 or 3 upper bits of the 8-bit class ID for this). Then
> we could figure out the right class ID by looking at set->latch and
> set->nevents (perhaps an extra boolean would be needed to record
> whether postmaster death is in there, so we could deduce whether there
> are any sockets). It would be interesting specifically for the case
> of FDWs, where it would be nice to be able to see clearly that it's
> waiting for a remote server ("Socket"). It may also be interesting to
> know if there is a timeout. Postmaster death doesn't seem newsworthy;
> we're nearly always also waiting for that exceptional event, so it'd
> just be clutter to report it.

That's actually pretty close to what I mentioned upthread here:
https://www.postgresql.org/message-id/CAB7nPqQx4OEym9cf22CY%3D5eWqqiAMjij6EBCoNReezt9-NvGkw%40mail.gmail.com
In order to support that, adding a column wait_event_details with text[] makes the most sense, I guess. Still, I think that's another discussion; this patch already does a lot.

So I have adjusted the patch in many ways, tweaked the order of the items, and adjusted some of their names as suggested by Thomas. Updated version attached.
--
Michael
On Wed, Sep 21, 2016 at 3:40 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Wed, Sep 21, 2016 at 8:13 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> It looks like this array wants to be in alphabetical order, but it
>> isn't quite. Also, perhaps a compile time assertion about the size of
>> the array matching EVENT_LAST_TYPE could be useful?
>
> In GetWaitEventIdentifier()? I'd think that just returning ??? would
> have been fine if there is a non-matching call.

Yeah, but that's at run time. I meant you could help developers discover ASAP if they add a new item to one place but not the other with a compile time assertion:

const char *
GetWaitEventIdentifier(uint16 eventId)
{
    StaticAssertStmt(lengthof(WaitEventNames) == WE_LAST_TYPE + 1,
                     "WaitEventNames must match WaitEventIdentifiers");
    if (eventId > WE_LAST_TYPE)
        return "???";
    return WaitEventNames[eventId];
}

>> +1 from me too for avoiding the overly general term 'event'. It does
>> seem a little odd to leave the enumerator names as EVENT_... though;
>> shouldn't these be WAIT_EVENT_... or WE_...?
>
> Well, WE_ it is if I need to make a choice for something other than EVENT_.

You missed a couple that are hiding inside #ifdef WIN32:

From pgstat.c:
+ EVENT_PGSTAT_MAIN);

From syslogger.c:
+ EVENT_SYSLOGGER_MAIN);

--
Thomas Munro
http://www.enterprisedb.com
On Wed, Sep 21, 2016 at 1:03 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> Yeah, but that's at run time. I meant you could help developers
> discover ASAP if they add a new item to one place but not the other
> with a compile time assertion:
>
> const char *
> GetWaitEventIdentifier(uint16 eventId)
> {
>     StaticAssertStmt(lengthof(WaitEventNames) == WE_LAST_TYPE + 1,
>                      "WaitEventNames must match WaitEventIdentifiers");
>     if (eventId > WE_LAST_TYPE)
>         return "???";
>     return WaitEventNames[eventId];
> }

Ah, OK, good idea. I had AssertStmt in mind, not StaticAssertStmt.

> You missed a couple that are hiding inside #ifdef WIN32:
>
> From pgstat.c:
> + EVENT_PGSTAT_MAIN);
>
> From syslogger.c:
> + EVENT_SYSLOGGER_MAIN);

Oops. Fixed those ones and checked the builds on WIN32.
--
Michael
On Tue, Sep 20, 2016 at 7:13 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> This paragraph seems a bit confused. I suggest something more like
> this: "The server process is waiting for one or more sockets, a timer
> or an interprocess latch. The wait happens in a WaitEventSet,
> <productname>PostgreSQL</>'s portable IO multiplexing abstraction."

I'm worried we're exposing an awful lot of internal detail here. Moreover, it's pretty confusing that we have this general concept of wait events in pg_stat_activity, and then here the specific type of wait event we're waiting for is the ... wait event kind. Uh, what?

I have to admit that I like the individual event names quite a bit, and I think the detail will be useful to users. But I wonder if there's a better way to describe the class of events that we're talking about that's not so dependent on internal data structures. Maybe we could divide these waits into a couple of categories - e.g. "Socket", "Timeout", "Process" - and then divide these detailed wait events among those classes.

The "SecureRead" and "SecureWrite" wait events are going to be confusing, because the connection isn't necessarily secure. I think those should be called "ClientRead" and "ClientWrite". Comprehensibility is more important than absolute consistency with the C function names.

Another thing to think about is that there's no way to actually see wait event information for a number of the processes which this patch instruments, because they don't appear in pg_stat_activity.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Sep 21, 2016 at 10:23 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I have to admit that I like the individual event names quite a bit,
> and I think the detail will be useful to users. But I wonder if
> there's a better way to describe the class of events that we're
> talking about that's not so dependent on internal data structures.
> Maybe we could divide these waits into a couple of categories - e.g.
> "Socket", "Timeout", "Process" - and then divide these detailed wait
> events among those classes.

pgstat.h mentions that there is 1 byte still free. I did not notice that until a couple of minutes ago. There are 2 bytes used for the event ID and 1 byte for the class ID, but 4 bytes are available. Perhaps we could use this extra byte to store this extra status information, then use it in WaitEventSet to build up a string that gets stored in the classId field? For example, if a process is waiting on a socket and a timeout, we'd report "Socket,Timeout" as a text field. (A sketch of the byte layout follows below.)

> The "SecureRead" and "SecureWrite" wait events are going to be
> confusing, because the connection isn't necessarily secure. I think
> those should be called "ClientRead" and "ClientWrite".
> Comprehensibility is more important than absolute consistency with the
> C function names.

Noted.

> Another thing to think about is that there's no way to actually see
> wait event information for a number of the processes which this patch
> instruments, because they don't appear in pg_stat_activity.

We could create a new system view to track the activity of system-related processes, for example pg_stat_system_activity, or pg_system_activity, and list all the processes that are not counted in max_connections...
--
Michael
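[For what it's worth, the packing being described would look roughly like this; the exact layout is an assumption based only on the byte counts mentioned above, and portable stdint types stand in for the backend's own.]

    #include <stdint.h>
    #include <stdio.h>

    /* Sketch: 1-byte class ID + 1 spare byte + 2-byte event ID packed
     * into the 4 bytes discussed above. */
    static uint32_t
    pack_wait_event(uint8_t classId, uint8_t spare, uint16_t eventId)
    {
        return ((uint32_t) classId << 24) |
               ((uint32_t) spare << 16) |
               (uint32_t) eventId;
    }

    int
    main(void)
    {
        uint32_t info = pack_wait_event(3, 0, 42);

        /* Unpack the pieces again to show the layout round-trips. */
        printf("class=%u spare=%u event=%u\n",
               (unsigned) (info >> 24),
               (unsigned) ((info >> 16) & 0xFF),
               (unsigned) (info & 0xFFFF));
        return 0;
    }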
On Wed, Sep 21, 2016 at 10:02 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Wed, Sep 21, 2016 at 10:23 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I have to admit that I like the individual event names quite a bit,
>> and I think the detail will be useful to users. But I wonder if
>> there's a better way to describe the class of events that we're
>> talking about that's not so dependent on internal data structures.
>> Maybe we could divide these waits into a couple of categories - e.g.
>> "Socket", "Timeout", "Process" - and then divide these detailed wait
>> events among those classes.
>
> pgstat.h mentions that there is 1 byte still free. I did not notice
> that until a couple of minutes ago. There are 2 bytes used for the
> event ID and 1 byte for the class ID, but 4 bytes are available.
> Perhaps we could use this extra byte to store this extra status
> information, then use it in WaitEventSet to build up a string that
> gets stored in the classId field? For example, if a process is waiting
> on a socket and a timeout, we'd report "Socket,Timeout" as a text
> field.

No, that's not what I want to do. I think we should categorize the events administratively by their main purpose, rather than technologically by what we're waiting for.

>> Another thing to think about is that there's no way to actually see
>> wait event information for a number of the processes which this patch
>> instruments, because they don't appear in pg_stat_activity.
>
> We could create a new system view to track the activity of
> system-related processes, for example pg_stat_system_activity, or
> pg_system_activity, and list all the processes that are not counted in
> max_connections...

Yes. Or we could decide to include everything in pg_stat_activity. I think those are the two reasonable options.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Sep 21, 2016 at 11:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> No, that's not what I want to do. I think we should categorize the
> events administratively by their main purpose, rather than
> technologically by what we're waiting for.

So we'd just have three class IDs instead of one? Well, why not.
--
Michael
On Wed, Sep 21, 2016 at 10:23 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Wed, Sep 21, 2016 at 11:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> No, that's not what I want to do. I think we should categorize the
>> events administratively by their main purpose, rather than
>> technologically by what we're waiting for.
>
> So we'd just have three class IDs instead of one? Well, why not.

Yeah, or, I mean, it doesn't have to be three precisely, but I'd like to try to avoid exposing the users to the fact that we have an internal data structure called a WaitEventSet, and instead classify by function.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Sep 22, 2016 at 1:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Sep 20, 2016 at 7:13 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> This paragraph seems a bit confused. I suggest something more like
>> this: "The server process is waiting for one or more sockets, a timer
>> or an interprocess latch. The wait happens in a WaitEventSet,
>> <productname>PostgreSQL</>'s portable IO multiplexing abstraction."
>
> I'm worried we're exposing an awful lot of internal detail here.

OK, maybe it's not a good idea to mention WaitEventSet in the manual. I was trying to keep the same basic structure of text as Michael proposed, but fix the bit that implied that waiting happens "in a latch", even when no latch is involved. Perhaps only the first sentence is necessary. This will all become much clearer if there is a follow-up patch that shows strings like "Socket", "Socket|Latch", "Latch" instead of a general catch-all wait_event_type of "EventSet", as discussed.

> Moreover, it's pretty confusing that we have this general concept of
> wait events in pg_stat_activity, and then here the specific type of
> wait event we're waiting for is the ... wait event kind. Uh, what?

Yeah, that is confusing. It comes about because of the coincidence that pg_stat_activity finished up with a wait_event column, and our IO multiplexing abstraction finished up with the name WaitEventSet. <stuck-record-mode>We could instead call these new things "wait points", because, well, erm, they name points in the code at which we wait. They don't name events (they just happen to use the WaitEventSet mechanism, which itself does involve events: but those events are things like "hey, this socket is now ready for writing").</>

> I have to admit that I like the individual event names quite a bit,
> and I think the detail will be useful to users. But I wonder if
> there's a better way to describe the class of events that we're
> talking about that's not so dependent on internal data structures.
> Maybe we could divide these waits into a couple of categories - e.g.
> "Socket", "Timeout", "Process" - and then divide these detailed wait
> events among those classes.

Well, we already discussed a couple of different ways to get "Socket", "Latch", "Socket|Latch", ... or something like that into the wait_event_type column or new columns. Wouldn't that be better, and automatically fall out of the code rather than needing human-curated categories? Although Michael suggested that that should be done as a separate patch on top of the basic feature.

> The "SecureRead" and "SecureWrite" wait events are going to be
> confusing, because the connection isn't necessarily secure. I think
> those should be called "ClientRead" and "ClientWrite".
> Comprehensibility is more important than absolute consistency with the
> C function names.

Devil's advocate mode: Then why not improve those function names? Keeping the wait point names systematically in sync with the actual code makes things super simple, and avoids a whole decision process and discussion to create new user-friendly obfuscation every time anyone introduces a new wait point. This is fundamentally an introspection mechanism allowing expert users to shine a light on the engine and see what's going on inside it, so I don't see what's wrong with being straight up about what is actually going on and using the names for parts of our code that we already have. If that leads us to improve some function names, I'm not sure why that's a bad thing.

Obviously this gets complicated by the existence of static functions whose names are ambiguous and lack context, and multiple wait points in a single function. Previously I've suggested a hierarchical arrangement for these names, which might help with that. Imagine names like: executor.Hash.<function> (reported by a background worker executing a hypothetical parallel hash join), executor.Hash.<function>.<something> to disambiguate multiple wait points in one function, walsender.<function>, etc. That way we could have a tidy curated meaningful naming scheme based on modules, but a no-brainer systematic way to name the most numerous leaf bits of that hierarchy. Just an idea...

> Another thing to think about is that there's no way to actually see
> wait event information for a number of the processes which this patch
> instruments, because they don't appear in pg_stat_activity.

Good point. Perhaps there could be another extended view, or system process view, or some other mechanism. That could be material for a separate patch?
--
Thomas Munro
http://www.enterprisedb.com
On Thu, Sep 22, 2016 at 6:49 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> On Thu, Sep 22, 2016 at 1:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Moreover, it's pretty confusing that we have this general concept of
>> wait events in pg_stat_activity, and then here the specific type of
>> wait event we're waiting for is the ... wait event kind. Uh, what?
>
> Yeah, that is confusing. It comes about because of the coincidence
> that pg_stat_activity finished up with a wait_event column, and our IO
> multiplexing abstraction finished up with the name WaitEventSet.
> <stuck-record-mode>We could instead call these new things "wait
> points", because, well, erm, they name points in the code at which we
> wait. They don't name events (they just happen to use the
> WaitEventSet mechanism, which itself does involve events: but those
> events are things like "hey, this socket is now ready for
> writing").</>

What about trying first to come up with a class name generic enough to cover the set of events we are trying to track? One idea is to simply call the class "Process" ("Point", "Poll"), and mention in the docs that this can wait for a socket, a timeout, postmaster death or another process to tell it to move on. The idea is to come up with a name generic enough. In the first patch we wouldn't detail much what a process is waiting for at a given point, unlike what has been submitted. This would still be helpful for the user, because it would be possible to look at the code and guess where a process is waiting.

Then come the interesting bits: I don't think that it is a good idea to associate only one event with a waiting point in the system views. Say that at a waiting point WaitLatch is called with WL_POSTMASTER_DEATH and WL_TIMEOUT; I'd rather let the user know that both of them have been set up, and not choose just one of them using some hierarchy. Instead, I think that we should list all of them depending on the WL_* flags set up by the caller. Here comes a second patch: let's use the last byte of the wait events and add a new text[] column in pg_stat_activity, say wait_details. So for a waiting point it would be possible to tell the user that it is waiting on '{socket,timeout,postmaster_death}', for example.

Then comes a third patch: the addition of a system view to list all the activity happening for processes that do not depend on max_connections. pg_stat_activity has the advantage, as Tom mentioned a couple of days ago, that one can just run count(*) on it to estimate the number of connections left. I think this is quite helpful. Or we could just add a column in pg_stat_activity to mark a process type: is it a system process or not? This deserves its own discussion for sure.

Patches 2 and 3 are independent things. Patch 1 is a requirement for 2 and 3.

>> Another thing to think about is that there's no way to actually see
>> wait event information for a number of the processes which this patch
>> instruments, because they don't appear in pg_stat_activity.
>
> Good point. Perhaps there could be another extended view, or system
> process view, or some other mechanism. That could be material for a
> separate patch?

Definitely a separate patch..
--
Michael
On Wed, Sep 21, 2016 at 5:49 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>> Moreover, it's pretty confusing that we have this general concept of
>> wait events in pg_stat_activity, and then here the specific type of
>> wait event we're waiting for is the ... wait event kind. Uh, what?
>
> Yeah, that is confusing. It comes about because of the coincidence
> that pg_stat_activity finished up with a wait_event column, and our IO
> multiplexing abstraction finished up with the name WaitEventSet.
> <stuck-record-mode>We could instead call these new things "wait
> points", because, well, erm, they name points in the code at which we
> wait. They don't name events (they just happen to use the
> WaitEventSet mechanism, which itself does involve events: but those
> events are things like "hey, this socket is now ready for
> writing").</>

Sure, we could do that, but it means breaking backward compatibility for pg_stat_activity *again*. I'd be willing to do it, but I don't think I'd win any popularity contests.

>> I have to admit that I like the individual event names quite a bit,
>> and I think the detail will be useful to users. But I wonder if
>> there's a better way to describe the class of events that we're
>> talking about that's not so dependent on internal data structures.
>> Maybe we could divide these waits into a couple of categories - e.g.
>> "Socket", "Timeout", "Process" - and then divide these detailed wait
>> events among those classes.
>
> Well, we already discussed a couple of different ways to get "Socket",
> "Latch", "Socket|Latch", ... or something like that into the
> wait_event_type column or new columns. Wouldn't that be better, and
> automatically fall out of the code rather than needing human-curated
> categories? Although Michael suggested that that should be done as a
> separate patch on top of the basic feature.

I think making that a separate patch is just punting the decision down the field to a day that may never come. Let's try to agree on something that we can all live with and implement it now. In terms of avoiding human-curated categories, I basically see two options:

1. Find a name that is sufficiently generic that it covers everything that might reach WaitEventSetWait either now or in the future when it might wait for even more kinds of things than it does now. For example, we could call it "Stuff" or "Thing". Less tongue-in-cheek suggestions are welcome, but it's hard to come up with something that is sufficiently generic without being tautological. "Event" is an example of a name which is general enough to encompass everything but also stupid: the column is called "wait_event", so everything that appears in it is an event by definition.

2. Classify the events that fall into this category by some rigid principle based on the types of things being awaited. For example, we could decide that if any sockets are awaited, the event class will be "Client" if it is connected to a user and "IPC" for auxiliary processes.

For myself, I don't see any real problem with using humans to classify things; that's pretty much the one thing humans are much better at than computers, so we might as well take advantage of it. I think that it's useful to try to group together types of waits which the user will see as logically related to each other, even if that involves applying some human judgement that might someday lead to some discussion about what the best categorization for some new thing is. PostgreSQL is intended to be used by humans, and avoiding discussions (or even arguments) on pgsql-hackers shouldn't outrank usability on the list of concerns.

So, I tried to classify these. Here's what I came up with:

Activity: ArchiverMain, AutoVacuumMain, BgWriterMain, BgWriterHibernate, CheckpointerMain, PgStatMain, RecoveryWalAll, RecoveryWalStream, SysLoggerMain, WalReceiverMain, WalWriterMain

Client: SecureRead, SecureWrite, SSLOpenServer, WalSenderMain, WalSenderWaitForWAL, WalSenderWriteData, WalReceiverWaitStart

Timeout: BaseBackupThrottle, PgSleep, RecoveryApplyDelay

IPC: BgWorkerShutdown, BgWorkerStartup, ExecuteGather, MessageQueueInternal, MessageQueuePutMessage, MessageQueueReceive, MessageQueueSend, ParallelFinish, ProcSignal, ProcSleep, SyncRep

Extension: Extension

I classified all of the main loop waits as waiting for activity; all of those are background processes that are waiting for something to happen and are more or less happy to sleep forever until it does. I also included the RecoveryWalAll and RecoveryWalStream events in there; those don't have the sort of "main loop" flavor of the others, but they are happy to wait more or less indefinitely for something to occur. Likewise, it was pretty easy to find all of the events that were waiting for client I/O, and I grouped those all under "Client". A few of the remaining wait events seemed like they were clearly waiting for a particular timeout to expire, so I gave those their own "Timeout" category.

I believe these categorizations are actually useful for users. For example, you might want to see all of the waits in the system but exclude the "Client", "Activity", and "Timeout" categories, because those are things that aren't signs of a problem. A "Timeout" wait is one that you explicitly requested, a "Client" wait isn't the server's fault, and an "Activity" wait just means nothing is happening. In contrast, a "Lock" or "LWLock" or "IPC" wait shows that something is actually delaying work that we'd ideally prefer to have get done sooner. (A query along these lines is sketched after this message.)

I grouped the rest of this stuff as "IPC" because all of these events are cases where one server process is waiting for another server process. That could be further subdivided, of course: most of those events are only going to occur in relation to parallel query, but I didn't want to group it that way explicitly because both background workers and shm_mq have other potential uses. ProcSignal and ProcSleep are related to heavyweight locks and SyncRep is of course related to synchronous replication. But they're all related in that one server process is waiting for another server process to tell it that a certain state has been reached, so IPC seems like a good categorization.

Finally, extensions got their own category in this taxonomy, though I wonder if it would be better to instead have Activity/ExtensionActivity, Client/ExtensionClient, Timeout/ExtensionTimeout, and IPC/ExtensionIPC instead of making it a separate toplevel category.

To me, this seems like a pretty solid toplevel categorization and a lot more useful than just throwing all of these in one bucket and saying "good luck". I'm not super-attached to the details but, again, I think it's worth trying to bin things in a way that will be useful.

>> The "SecureRead" and "SecureWrite" wait events are going to be
>> confusing, because the connection isn't necessarily secure. I think
>> those should be called "ClientRead" and "ClientWrite".
>> Comprehensibility is more important than absolute consistency with the
>> C function names.
>
> Devil's advocate mode: Then why not improve those function names?

That'd be fine.

> Obviously this gets complicated by the existence of static functions
> whose names are ambiguous and lack context, and multiple wait points
> in a single function. Previously I've suggested a hierarchical
> arrangement for these names, which might help with that. Imagine names
> like: executor.Hash.<function> (reported by a background worker
> executing a hypothetical parallel hash join),
> executor.Hash.<function>.<something> to disambiguate multiple wait
> points in one function, walsender.<function>, etc. That way we could
> have a tidy curated meaningful naming scheme based on modules, but a
> no-brainer systematic way to name the most numerous leaf bits of that
> hierarchy. Just an idea...

Considering we only have a few dozen of those, this feels like it might be overkill to me, and I suspect we'll end up finding that it's a bit harder to make it consistent and useful than one might hope. I am basically happy with the way Michael named them, but that's not to say I could never be happy with anything else.

>> Another thing to think about is that there's no way to actually see
>> wait event information for a number of the processes which this patch
>> instruments, because they don't appear in pg_stat_activity.
>
> Good point. Perhaps there could be another extended view, or system
> process view, or some other mechanism. That could be material for a
> separate patch?

I agree.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
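[Under this taxonomy, the kind of filtering being suggested might look like the following query; this is hypothetical, and assumes the proposed category names actually land as wait_event_type values.]

    -- Show only waits that may indicate a real problem, excluding the
    -- categories that are expected and benign (client I/O, idle main
    -- loops, explicitly requested timeouts).
    SELECT pid, wait_event_type, wait_event
    FROM pg_stat_activity
    WHERE wait_event IS NOT NULL
      AND wait_event_type NOT IN ('Client', 'Activity', 'Timeout');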
On Thu, Sep 22, 2016 at 1:52 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > Then comes the interesting bits: I don't think that it is a good idea > to associate only one event for a waiting point in the system views. > Say that at a waiting point WaitLatch is called with > WL_POSTMASTER_DEATH and WL_TIMEOUT, I'd rather let the user know that > both of them have been set up and not choose just one of them using > some hierarchy. But I rather think that we should list all of them > depending on the WL_* flags set up by the caller. Here comes a second > patch: let's use the last byte of the wait events and add this new > text[] column in pg_stat_activity, say wait_details. So for a waiting > point it would be possible to tell to the user that it is waiting on > '{socket,timeout,postmaster_death}' for example. I think this is focusing on the wrong thing. What we need here is not infinite detail on exactly what is going on under the hood but useful classification rules. For example, as I said in my email to Thomas, being able to exclude all cases of waiting for client I/O is useful because those aren't signs of a problem in the way that LWLock or Lock waits are. It's better for us to provide that classification using the existing system than for users to have to work out exactly which things ought to be excluded on the basis of specific event names. On the other hand, I submit that knowing which waits will be interrupted by the death of the postmaster and which will not doesn't add much. In fact, I think that's almost an anti-feature, because it will encourage users to care about details that are very rarely relevant. > Then comes a third patch: addition of a system view to list all the > activity happening for processes that are not dependent on > max_connections. pg_stat_activity has the advantage, as Tom mentioned > a couple of days ago, that one can just run count(*) on it to estimate > the number of connections left. I think this is quite helpful. Or we > could just add a column in pg_stat_activity to mark a process type: is > it a system process or not? This deserves its own discussion for sure. Sure. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Sep 23, 2016 at 1:49 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Sep 21, 2016 at 5:49 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >>> Moreover, it's pretty confusing that we have this general concept of >>> wait events in pg_stat_activity, and then here the specific type of >>> wait event we're waiting for is the ... wait event kind. Uh, what? >> >> Yeah, that is confusing. It comes about because of the coincidence >> that pg_stat_activity finished up with a wait_event column, and our IO >> multiplexing abstraction finished up with the name WaitEventSet. >> <stuck-record-mode>We could instead call these new things "wait >> points", because, well, erm, they name points in the code at which we >> wait. They don't name events (they just happen to use the >> WaitEventSet mechanism, which itself does involve events: but those >> events are things like "hey, this socket is now ready for >> writing").</> > > Sure, we could do that, but it means breaking backward compatibility > for pg_stat_activity *again*. I'd be willing to do it but I don't > think I'd win any popularity contests. I didn't mean changing the column headings in pg_stat_activity. I meant that the enum called WaitEventIdentifier in Michael's v5 patch should be called WaitPointIdentifier, and if we go with a single name to appear in the wait_event_type column then it could be "WaitPoint". But I would also prefer to see something more informative in that column, as discussed below (and upthread). >> Well we already discussed a couple of different ways to get "Socket", >> "Latch", "Socket|Latch", ... or something like that into the >> wait_event_type column or new columns. Wouldn't that be better, and >> automatically fall out of the code rather than needing human curated >> categories? Although Michael suggested that that should be done as a >> separate patch on top of the basic feature. > > I think making that a separate patch is just punting the decision down > the field to a day that may never come. Let's try to agree on > something that we can all live with and implement it now. In terms of > avoiding human-curated categories, I basically see two options: > > 1. Find a name that is sufficiently generic that it covers everything > that might reach WaitEventSetWait either now or in the future when it > might wait for even more kinds of things than it does now. For > example, we could call it "Stuff" or "Thing". Less tongue-in-cheek > suggestions are welcome, but it's hard to come up with something that > is sufficiently-generic without being tautological. "Event" is an > example of a name which is general enough to encompass everything but > also stupid: the column is called "wait_event" so everything that > appears in it is an event by definition. > > 2. Classify the events that fall into this category by some rigid > principle based on the types of things being awaited. For example, we > could decide that if any sockets are awaited, the event class will be > "Client" if it is connected to a user and "IPC" for auxiliary > processes. > > [...] > occur. Likewise, it was pretty easy to find all of the events that > were waiting for client I/O, and I grouped those all under "Client". > A few of the remaining wait events seemed like they were clearly > waiting for a particular timeout to expire, so I gave those their own > "Timeout" category. Interesting. 
OK, I agree that it'd be useful to show that we're waiting because there's nothing happening, or waiting because the user asked us to sleep, or waiting on IO, or waiting for an IPC response because something is happening, and that higher level information is difficult/impossible to extract automatically from the WaitEventSet.

I understand that "Activity" is the category of wait points that are waiting for activity, but I wonder if it might be clearer to users if that were called "Idle", because it's the category of idle waits caused by non-activity.

Why is WalSenderMain not in that category alongside WalReceiverMain? Hmm, actually it's kind of a tricky one: whether it's really idle or waiting for IO depends. It's always ready to wait for clients to send messages, but I'd say that's part of its "idle" behaviour. But it's sometimes waiting for the socket to be writable: if (pq_is_send_pending()) wakeEvents |= WL_SOCKET_WRITABLE, and that's when it's definitely not idle, it's actively trying to feed WAL down the pipe. Do we want to get into dynamic categories depending on conditions like that?

I was thinking about suggesting a category "Replication" to cover the waits for client IO relating to replication, as opposed to client IO waits relating to regular user connections. Then you could put sync rep into that category instead of IPC, even though technically it is waiting for IPC from walsender process(es), on the basis that it's more newsworthy to a DBA that it's really waiting for a remote replica to respond. But it's probably pretty clear what's going on from the wait point names, so maybe it's not worth a category. Thoughts?

WalSenderWaitForWAL is waiting for both IPC and Client, but if you have to pick one, isn't IPC the main thing it's waiting for? It'll stop waiting when it gets IPC telling it about the flushed location reaching some point. It's multiplexing client IO processing sort of "incidentally" while it does so. WalReceiverWaitStart is waiting for IPC, not Client.

>> Obviously this gets complicated by the existence of static functions >> whose names are ambiguous and lack context, and multiple wait points >> in a single function. Previously I've suggested a hierarchical >> arrangement for these names which might help with that. Imagine names >> like: executor.Hash.<function> (reported by a background worker >> executing a hypothetical parallel hash join), >> executor.Hash.<function>.<something> to disambiguate multiple wait >> points in one function, walsender.<function> etc. That way we could >> have a tidy curated meaningful naming scheme based on modules, but a >> no-brainer systematic way to name the most numerous leaf bits of that >> hierarchy. Just an idea... > > Considering we only have a few dozen of those, this feels like it > might be overkill to me, and I suspect we'll end up finding that it's > a bit harder to make it consistent and useful than one might hope. I > am basically happy with the way Michael named them, but that's not to > say I could never be happy with anything else.

Yeah, it's fine as it is. I do suspect that the set of wait points will grow quite a bit as we develop more parallel stuff though. For example, I have been working on a patch that adds several more wait points, indirectly via condition variables (using your patch). Actually in my case it's BarrierWait -> ConditionVariableWait -> WaitEventSetWait. I propose that these higher level wait primitives should support passing a wait point identifier through to WaitEventSetWait. 
Some of my patch's wait points are in the same function and some are in static functions, so I'll finish up making up various affixes to disambiguate, and then I might be tempted to include some separator characters... I am also aware of three other hackers who are working on patches that also make use of condition variables, so (if you agree that condition variable waits are wait points) they'll be adding more too. That said, ad-hoc name construction is fine and there is no reason we couldn't revise things if names start to look messy when concrete patches that add more names emerge. -- Thomas Munro http://www.enterprisedb.com
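A rough sketch of the pass-through being proposed here, assuming a hypothetical ConditionVariableSleep signature and a cv_wait_event_set that is set up elsewhere (neither is a committed API):

void
ConditionVariableSleep(ConditionVariable *cv, uint32 wait_event_info)
{
	WaitEvent	event;

	/* cv bookkeeping elided; the point is the identifier pass-through */
	(void) WaitEventSetWait(cv_wait_event_set, -1L /* no timeout */,
							&event, 1, wait_event_info);
}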
On Thu, Sep 22, 2016 at 7:10 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > Interesting. OK, I agree that it'd be useful to show that we're > waiting because there's nothing happening, or waiting because the user > asked us to sleep, or waiting on IO, or waiting for an IPC response > because something is happening, and that higher level information is > difficult/impossible to extract automatically from the WaitEventSet.

Cool. :-)

> I understand that "Activity" is the category of wait points that are > waiting for activity, but I wonder if it might be clearer to users if > that were called "Idle", because it's the category of idle waits > caused by non-activity.

I thought about that but figured it would be better to consistently state the thing *for which* we were waiting. We wait FOR a client or a timeout or activity. We do not wait FOR idle; we wait to be NOT idle.

> Why is WalSenderMain not in that category alongside WalReceiverMain? > Hmm, actually it's kind of a tricky one: whether it's really idle or > waiting for IO depends. It's always ready to wait for clients to send > messages, but I'd say that's part of its "idle" behaviour. But it's > sometimes waiting for the socket to be writable: if > (pq_is_send_pending()) wakeEvents |= WL_SOCKET_WRITABLE, and that's > when it's definitely not idle, it's actively trying to feed WAL down > the pipe. Do we want to get into dynamic categories depending on > conditions like that?

I suspect that's overkill. I don't want wait-point-naming to make programming the system noticeably more difficult, so I think it's fine to pick a categorization of what we think the typical case will be and call it good. If we try that and people find it's a nuisance, we can fix it then. In the case of WAL sender, I assume it will normally be waiting for more WAL to be generated; whereas in the case of WAL receiver, I assume it will normally be waiting for more WAL to be received from the remote side. The reverse cases are possible: the sender could be waiting for the socket buffer to drain so it can push more WAL onto the wire, and the receiver could likewise be waiting for buffer space to push out feedback messages. But probably mostly not. At least for a first cut, I'd be inclined to handle this fuzziness by putting weasel-words in the documentation rather than by trying to make the reporting 100% perfectly accurate.

> I was thinking about suggesting a category "Replication" to cover the > waits for client IO relating to replication, as opposed to client IO > waits relating to regular user connections. Then you could put sync > rep into that category instead of IPC, even though technically it is > waiting for IPC from walsender process(es), on the basis that it's > more newsworthy to a DBA that it's really waiting for a remote replica > to respond. But it's probably pretty clear what's going on from the > wait point names, so maybe it's not worth a category. Thoughts?

I thought about a replication category but either it will only have SyncRep in it, which is odd, or it will pull in other things that otherwise fit nicely into the Activity category, and then the boundaries of all the categories become mushy: is it the subsystem that causes the wait that we are trying to document, or the kind of thing for which we are waiting?

> I do suspect that the set of wait points will grow quite a bit as we > develop more parallel stuff though. 
> For example, I have been working on a patch that adds several more wait points, indirectly via condition variables (using your patch). Actually in my case it's BarrierWait -> ConditionVariableWait -> WaitEventSetWait. I propose that these higher level wait primitives should support passing a wait point identifier through to WaitEventSetWait.

+1.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Sep 22, 2016 at 7:19 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Sep 21, 2016 at 5:49 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: > > So, I tried to classify these. Here's what I came up with. > > Activity: ArchiverMain, AutoVacuumMain, BgWriterMain, > BgWriterHibernate, CheckpointerMain, PgStatMain, RecoveryWalAll, > RecoveryWalStream, SysLoggerMain, WalReceiverMain, WalWriterMain > > Client: SecureRead, SecureWrite, SSLOpenServer, WalSenderMain, > WalSenderWaitForWAL, WalSenderWriteData, WalReceiverWaitStart > > Timeout: BaseBackupThrottle, PgSleep, RecoveryApplyDelay > > IPC: BgWorkerShutdown, BgWorkerStartup, ExecuteGather, > MessageQueueInternal, MessageQueuePutMessage, MessageQueueReceive, > MessageQueueSend, ParallelFinish, ProcSignal, ProcSleep, SyncRep > > Extension: Extension

We already report LWLock waits from an extension as "extension", so I think just naming this one Extension might create some confusion.

> I classified all of the main loop waits as waiting for activity; all > of those are background processes that are waiting for something to > happen and are more or less happy to sleep forever until it does. I > also included the RecoveryWalAll and RecoveryWalStream events in > there; those don't have the sort of "main loop" flavor of the others > but they are happy to wait more or less indefinitely for something to > occur. Likewise, it was pretty easy to find all of the events that > were waiting for client I/O, and I grouped those all under "Client". > A few of the remaining wait events seemed like they were clearly > waiting for a particular timeout to expire, so I gave those their own > "Timeout" category. > > I believe these categorizations are actually useful for users. For > example, you might want to see all of the waits in the system but > exclude the "Client", "Activity", and "Timeout" categories because > those are things that aren't signs of a problem. A "Timeout" wait is > one that you explicitly requested, a "Client" wait isn't the server's > fault, and an "Activity" wait just means nothing is happening. In > contrast, a "Lock" or "LWLock" or "IPC" wait shows that something is > actually delaying work that we'd ideally prefer to have get done > sooner. > > I grouped the rest of this stuff as "IPC" because all of these events > are cases where one server process is waiting for another server > process. That could be further subdivided, of course: most of > those events are only going to occur in relation to parallel query, > but I didn't want to group it that way explicitly because both > background workers and shm_mq have other potential uses. ProcSignal > and ProcSleep are related to heavyweight locks and SyncRep is of > course related to synchronous replication. But they're all related > in that one server process is waiting for another server process to > tell it that a certain state has been reached, so IPC seems like a > good categorization. > > Finally, extensions got their own category in this taxonomy, though I > wonder if it would be better to instead have > Activity/ExtensionActivity, Client/ExtensionClient, > Timeout/ExtensionTimeout, and IPC/ExtensionIPC instead of making it a > separate toplevel category.

+1. It can avoid confusion.

> To me, this seems like a pretty solid toplevel categorization and a > lot more useful than just throwing all of these in one bucket and > saying "good luck".

Agreed. This categorisation is very good and can help the patch author proceed.
-- With Regards, Amit Kapila. 
EnterpriseDB: http://www.enterprisedb.com
On Fri, Sep 23, 2016 at 7:02 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Sep 22, 2016 at 7:10 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: > >> I was thinking about suggesting a category "Replication" to cover the >> waits for client IO relating to replication, as opposed to client IO >> waits relating to regular user connections. Then you could put sync >> rep into that category instead of IPC, even though technically it is >> waiting for IPC from walsender process(es), on the basis that it's >> more newsworthy to a DBA that it's really waiting for a remote replica >> to respond. But it's probably pretty clear what's going on from the >> wait point names, so maybe it's not worth a category. Thoughts? > > I thought about a replication category but either it will only have > SyncRep in it, which is odd, or it will pull in other things that > otherwise fit nicely into the Activity category, and then the > boundaries of all the categories become mushy: is it the subsystem that > causes the wait that we are trying to document, or the kind of thing > for which we are waiting?

I also think that it can add some confusion in defining boundaries, and it might not be consistent with the current way we have defined the waits; however, there is some value in using the subsystem, which is that users can identify bottlenecks with ease. I think this applies to both Replication and Parallel Query.
-- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Sep 23, 2016 at 10:32 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Sep 22, 2016 at 7:10 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> I was thinking about suggesting a category "Replication" to cover the >> waits for client IO relating to replication, as opposed to client IO >> waits relating to regular user connections. Then you could put sync >> rep into that category instead of IPC, even though technically it is >> waiting for IPC from walsender process(es), on the basis that it's >> more newsworthy to a DBA that it's really waiting for a remote replica >> to respond. But it's probably pretty clear what's going on from the >> wait point names, so maybe it's not worth a category. Thoughts? > > I thought about a replication category but either it will only have > SyncRep in it, which is odd, or it will pull in other things that > otherwise fit nicely into the Activity category, and then the > boundaries of all the categories become mushy: is it the subsystem that > causes the wait that we are trying to document, or the kind of thing > for which we are waiting?

Using category IPC for SyncRep looks fine to me.

>> I do suspect that the set of wait points will grow quite a bit as we >> develop more parallel stuff though. For example, I have been working >> on a patch that adds several more wait points, indirectly via >> condition variables (using your patch). Actually in my case it's >> BarrierWait -> ConditionVariableWait -> WaitEventSetWait. I propose >> that these higher level wait primitives should support passing a wait >> point identifier through to WaitEventSetWait. > > +1.

I also suspect that inclusion of pgstat.h will become more and more usual, to allow more code paths to access a given WaitClass. After digesting all the comments given, I have produced the patch attached that uses the following categories:

+const char *const WaitEventNames[] = {
+	/* activity */
+	"ArchiverMain",
+	"AutoVacuumMain",
+	"BgWriterHibernate",
+	"BgWriterMain",
+	"CheckpointerMain",
+	"PgStatMain",
+	"RecoveryWalAll",
+	"RecoveryWalStream",
+	"SysLoggerMain",
+	"WalReceiverMain",
+	"WalSenderMain",
+	"WalWriterMain",
+	/* client */
+	"SecureRead",
+	"SecureWrite",
+	"SSLOpenServer",
+	"WalReceiverWaitStart",
+	"WalSenderWaitForWAL",
+	"WalSenderWriteData",
+	/* Extension */
+	"Extension",
+	/* IPC */
+	"BgWorkerShutdown",
+	"BgWorkerStartup",
+	"ExecuteGather",
+	"MessageQueueInternal",
+	"MessageQueuePutMessage",
+	"MessageQueueReceive",
+	"MessageQueueSend",
+	"ParallelFinish",
+	"ProcSignal",
+	"ProcSleep",
+	"SyncRep",
+	/* timeout */
+	"BaseBackupThrottle",
+	"PgSleep",
+	"RecoveryApplyDelay",
+};

I have moved WalSenderMain as it tracks a main loop, so it was strange to not have it in Activity. I also kept SecureRead and SecureWrite because this is referring to the functions of the same name. For the rest, I got convinced by what has been discussed and it makes things clearer. My head is not spinning anymore when I try to keep in sync the list of names and its associated enum, which is good. I have as well rewritten the docs to follow that.
-- Michael
Attachment
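For context on the "keeping in sync" point: the names array pairs entry for entry with an enum of wait points (called WaitEventIdentifier in earlier versions of the patch). A minimal sketch, with illustrative member names:

typedef enum WaitEventIdentifier
{
	WE_ARCHIVER_MAIN,		/* must stay in the same order... */
	WE_AUTOVACUUM_MAIN,
	/* ... */
} WaitEventIdentifier;

const char *const WaitEventNames[] = {
	"ArchiverMain",			/* ...as the strings displayed here */
	"AutoVacuumMain",
	/* ... */
};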
On Thu, Sep 22, 2016 at 10:49 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Finally, extensions got their own category in this taxonomy, though I > wonder if it would be better to instead have > Activity/ExtensionActivity, Client/ExtensionClient, > Timeout/ExtensionTimeout, and IPC/ExtensionIPC instead of making it a > separate toplevel category. I don't think that it is necessary to go up to that level. If you look at the latest patch, WaitLatch & friends have been extended with two arguments: classId and eventId, so extensions could just use WAIT_ACTIVITY as classId and WE_EXTENSION as eventId to do the equivalent of ExtensionActivity. -- Michael
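As an illustration from the extension side, here is a sketch of a hypothetical background worker's main loop; the latch flags, timeout, and loop shape are assumptions, and only the WAIT_ACTIVITY class and WE_EXTENSION event come from the patch:

for (;;)
{
	int		rc;

	/* shows up in pg_stat_activity as an Activity-class extension wait */
	rc = WaitLatch(MyLatch,
				   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
				   10 * 1000L,		/* wake at least every 10s */
				   WAIT_ACTIVITY,	/* classId */
				   WE_EXTENSION);	/* eventId */

	if (rc & WL_POSTMASTER_DEATH)
		proc_exit(1);

	ResetLatch(MyLatch);
	/* ... do the extension's periodic work ... */
}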
On Sat, Sep 24, 2016 at 1:44 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > After digesting all the comments given, I have produced the patch > attached that uses the following categories: > +const char *const WaitEventNames[] = { > + /* activity */ > + "ArchiverMain", > + "AutoVacuumMain", > + "BgWriterHibernate", > + "BgWriterMain", > + "CheckpointerMain", > + "PgStatMain", > + "RecoveryWalAll", > + "RecoveryWalStream", > + "SysLoggerMain", > + "WalReceiverMain", > + "WalSenderMain", > + "WalWriterMain", > + /* client */ > + "SecureRead", > + "SecureWrite", > + "SSLOpenServer", > + "WalReceiverWaitStart", > + "WalSenderWaitForWAL", > + "WalSenderWriteData", > + /* Extension */ > + "Extension", > + /* IPC */ > + "BgWorkerShutdown", > + "BgWorkerStartup", > + "ExecuteGather", > + "MessageQueueInternal", > + "MessageQueuePutMessage", > + "MessageQueueReceive", > + "MessageQueueSend", > + "ParallelFinish", > + "ProcSignal", > + "ProcSleep", > + "SyncRep", > + /* timeout */ > + "BaseBackupThrottle", > + "PgSleep", > + "RecoveryApplyDelay", > +}; > I have moved WalSenderMain as it tracks a main loop, so it was strange > to not have it in Activity. I also kept SecureRead and SecureWrite > because this is referring to the functions of the same name. For the > rest, I got convinced by what has been discussed and it makes things > clearer. My head is not spinning anymore when I try to keep in sync > the list of names and its associated enum, which is good. I have as > well rewritten the docs to follow that.

-extern int WaitEventSetWait(WaitEventSet *set, long timeout, WaitEvent *occurred_events, int nevents);
-extern int WaitLatch(volatile Latch *latch, int wakeEvents, long timeout);
+extern int WaitEventSetWait(WaitEventSet *set, long timeout,
+			WaitEvent *occurred_events, int nevents,
+			uint8 classId, uint16 eventId);
+extern int WaitLatch(volatile Latch *latch, int wakeEvents, long timeout,
+			uint8 classId, uint16 eventId);
 extern int WaitLatchOrSocket(volatile Latch *latch, int wakeEvents,
-			pgsocket sock, long timeout);
+			pgsocket sock, long timeout, uint8 classId,
+			uint16 eventId);

+	WaitLatch(MyLatch, WL_LATCH_SET, 0, WAIT_IPC, WE_MQ_RECEIVE);

If the class really is strictly implied by the WaitEventIdentifier, then do we really need to supply it everywhere when calling the various wait functions? That's going to be quite a few functions: WaitLatch, WaitLatchOrSocket, WaitEventSetWait for now, and if some more patches out there have legs then also ConditionVariableWait, BarrierWait, and possibly further kinds of wait points. And then all their call sites will have to supply the wait class ID, even though it is implied by the other ID. Perhaps that array that currently holds the names should instead hold { classId, displayName } so we could just look it up?
-- Thomas Munro http://www.enterprisedb.com
On Mon, Sep 26, 2016 at 1:46 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > If the class really is strictly implied by the WaitEventIdentifier, > then do we really need to supply it everywhere when calling the > various wait functions? That's going to be quite a few functions: > WaitLatch, WaitLatchOrSocket, WaitEventSetWait for now, and if some > more patches out there have legs then also ConditionVariableWait, > BarrierWait, and possibly further kinds of wait points. And then all > their call sites will have to supply the wait class ID, even though > it is implied by the other ID. Perhaps that array that > currently holds the names should instead hold { classId, displayName } > so we could just look it up?

I considered this reverse-engineering, but arrived at the conclusion that giving module developers a flexible API mattered more. In short this avoids having extra class IDs like ActivityExtension, TimeoutExtension, etc. But perhaps that's just a matter of taste...
-- Michael
On Mon, Sep 26, 2016 at 7:07 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Mon, Sep 26, 2016 at 1:46 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> If the class really is strictly implied by the WaitEventIdentifier, >> then do we really need to supply it everywhere when calling the >> various wait functions? That's going to be quite a few functions: >> WaitLatch, WaitLatchOrSocket, WaitEventSetWait for now, and if some >> more patches out there have legs then also ConditionVariableWait, >> BarrierWait, and possibly further kinds of wait points. And then all >> their call sites will have to supply the wait class ID, even though >> it is implied by the other ID. Perhaps that array that >> currently holds the names should instead hold { classId, displayName } >> so we could just look it up? > > I considered this reverse-engineering, but arrived at the conclusion > that giving module developers a flexible API mattered more. In short > this avoids having extra class IDs like ActivityExtension, > TimeoutExtension, etc. But perhaps that's just a matter of taste...

Ok, if they really are independent then shouldn't we take advantage of that at call sites where we might be idle but we might also be waiting for the network? For example we could do this:

	/* Sleep until something happens or we time out */
	WaitLatchOrSocket(MyLatch, wakeEvents,
					  MyProcPort->sock, sleeptime,
					  pq_is_send_pending() ? WAIT_CLIENT : WAIT_ACTIVITY,
					  WE_WAL_SENDER_MAIN);

Isn't WE_WAL_SENDER_WAIT_WAL primarily WAIT_IPC, not WAIT_CLIENT? Or perhaps "pq_is_send_pending() ? WAIT_CLIENT : WAIT_IPC".

Actually, I'm still not sold on "Activity" and "Client". I think "Idle" and "Network" would be better. Everybody knows intuitively what "Idle" means. "Network" is better than "Client" because it avoids confusion about user applications vs replication connections or clients vs servers.

+     <listitem>
+      <para>
+       <literal>Activity</>: The server process is waiting for some
+       activity to happen on a socket. This is mainly used system processes
+       in their main processing loop. <literal>wait_event</> will identify
+       the type of activity waited for.
+      </para>
+     </listitem>

"The server process is waiting for some activity to happen on a socket." Not true: could be a latch, or both.

I think what this category is really getting at is that the server process is idle. Everything in it could otherwise go in the IPC or Client categories on the basis that it's mainly waiting for a socket or a latch, but we're deciding to separate the wait points representing "idleness" and put them here.

How about: "The server process is idle. This is used by system processes waiting for activity in their main processing loop. <literal>wait_event</> will identify the specific wait point."

+     <listitem>
+      <para>
+       <literal>Client</>: The server process is waiting for some activity
+       on a socket from a client process, and that the server expects
+       something to happen that is independent from its internal processes.
+       <literal>wait_event</> will identify the type of client activity
+       waited for.
+      </para>
+     </listitem>

Is it worth spelling out that "client process" here doesn't just mean user applications, it's also remote PostgreSQL servers doing replication? "wait_event" doesn't really identify the type of client activity waited for, it identifies the code that is waiting.

+     <listitem>
+      <para>
+       <literal>Extension</>: The server process is waiting for activity
+       in a plugin or an extension. This category is useful for plugin
+       and module developers to track custom waiting points.
+      </para>
+     </listitem>

Plugin, extension, module? How about just "extension"? Why developers?

+     <listitem>
+      <para>
+       <literal>IPC</>: The server process is waiting for some activity
+       from an auxilliary process. <literal>wait_event</> will identify
+       the type of activity waited for.
+      </para>
+     </listitem>

s/auxilliary/auxiliary/, but wouldn't it be better to say something more general like "from another process in the cluster"? Background workers are not generally called auxiliary processes, and some of these wait points are waiting for those.
-- Thomas Munro http://www.enterprisedb.com
On Wed, Sep 28, 2016 at 9:39 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > Ok, if they really are independent then shouldn't we take advantage of > that at call sites where we might be idle but we might also be waiting > for the network? For example we could do this: > > /* Sleep until something happens or we time out */ > WaitLatchOrSocket(MyLatch, wakeEvents, > MyProcPort->sock, sleeptime, > pq_is_send_pending() ? WAIT_CLIENT : WAIT_ACTIVITY, > WE_WAL_SENDER_MAIN);

Yes, we can do fancy things with this API. Extensions could do the same, the important thing being to have generic enough event categories (that is, classes).

> Isn't WE_WAL_SENDER_WAIT_WAL primarily WAIT_IPC, not WAIT_CLIENT? Or > perhaps "pq_is_send_pending() ? WAIT_CLIENT : WAIT_IPC". > > Actually, I'm still not sold on "Activity" and "Client". I think > "Idle" and "Network" would be better. Everybody knows intuitively > what "Idle" means. "Network" is better than "Client" because it > avoids confusion about user applications vs replication connections or > clients vs servers.

Personally, I am buying Robert's argument instead. The class of an event means what it is waiting for, not the state it is in while waiting for something. "Idle" means that the process *is* idle, not that it is waiting for something.

> + <listitem> > + <para> > + <literal>Activity</>: The server process is waiting for some > + activity to happen on a socket. This is mainly used system processes > + in their main processing loop. <literal>wait_event</> will identify > + the type of activity waited for. > + </para> > + </listitem> > > "The server process is waiting for some activity to happen on a > socket." Not true: could be a latch, or both. > > I think what this category is really getting at is that the server > process is idle. Everything in it could otherwise go in the IPC or > Client categories on the basis that it's mainly waiting for a socket > or a latch, but we're deciding to separate the wait points > representing "idleness" and put them here. > > How about: "The server process is idle. This is used by system > processes waiting for activity in their main processing loop. > <literal>wait_event</> will identify the specific wait point."

You have a better way to word things than I do.

> + <listitem> > + <para> > + <literal>Client</>: The server process is waiting for some activity > + on a socket from a client process, and that the server expects > + something to happen that is independent from its internal processes. > + <literal>wait_event</> will identify the type of client activity > + waited for. > + </para> > + </listitem> > > Is it worth spelling out that "client process" here doesn't just mean > user applications, it's also remote PostgreSQL servers doing > replication? "wait_event" doesn't really identify the type of client > activity waited for, it identifies the code that is waiting.

Okay, switched to "user applications", and clarified that this is a wait point that wait_event tracks.

> + <listitem> > + <para> > + <literal>Extension</>: The server process is waiting for activity > + in a plugin or an extension. This category is useful for plugin > + and module developers to track custom waiting points. > + </para> > + </listitem> > > Plugin, extension, module? How about just "extension"? Why developers?

Let's say "extension module"; this is used in a couple of places in the docs.

> + <listitem> > + <para> > + <literal>IPC</>: The server process is waiting for some activity > + from an auxilliary process. <literal>wait_event</> will identify > + the type of activity waited for. > + </para> > + </listitem> > > s/auxilliary/auxiliary/, but wouldn't it be better to say something > more general like "from another process in the cluster"? Background > workers are not generally called auxiliary processes, and some of > these wait points are waiting for those.

There was the same typo in latch.h; still, "from another process" looks better.
-- Michael
Attachment
On Wed, Sep 28, 2016 at 6:25 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > wait-event-set-v8.patch

Ok, I'm just about ready to mark this as 'Ready for Committer'. Just a couple of things:

+	pgstat_report_wait_start((uint8) classId, (uint16) eventId);

Unnecessary casts.

+     <row>
+      <entry morerows="5"><literal>Client</></entry>
+      <entry><literal>SecureRead</></entry>
+      <entry>Waiting to read data from a secure connection.</entry>
+     </row>
+     <row>
+      <entry><literal>SecureWrite</></entry>
+      <entry>Waiting to write data to a secure connection.</entry>
+     </row>

I think we want to drop the word 'secure' from the description lines in this patch. Then I think we plan to make a separate patch that will rename the functions themselves and the corresponding wait points to something more generic?

I'm assuming that my suggestions for making WE_WAL_SENDER_WAIT_WAL and WE_WAL_SENDER_MAIN have a dynamically chosen class ID would also be material for another patch, but it doesn't matter much because those processes won't show up yet anyway.
-- Thomas Munro http://www.enterprisedb.com
On Wed, Sep 28, 2016 at 3:40 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Wed, Sep 28, 2016 at 6:25 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> wait-event-set-v8.patch > > Ok, I'm just about ready to mark this as 'Ready for Committer'.

Thanks.

> Just a couple of things: > + pgstat_report_wait_start((uint8) classId, (uint16) eventId); > Unnecessary casts.

Right....

> + <row> > + <entry morerows="5"><literal>Client</></entry> > + <entry><literal>SecureRead</></entry> > + <entry>Waiting to read data from a secure connection.</entry> > + </row> > + <row> > + <entry><literal>SecureWrite</></entry> > + <entry>Waiting to write data to a secure connection.</entry> > + </row> > > I think we want to drop the word 'secure' from the description lines > in this patch. Then I think we plan to make a separate patch that > will rename the functions themselves and the corresponding wait points > to something more generic?

Robert mentioned ClientRead/ClientWrite upthread. I still think that SecureRead/SecureWrite is better as it respects the routine name where the wait point is, and that's consistent with the rest.

> I'm assuming that my suggestions for making WE_WAL_SENDER_WAIT_WAL and > WE_WAL_SENDER_MAIN have a dynamically chosen class ID would also be > material for another patch, but it doesn't matter much because those > processes won't show up yet anyway.

WAL senders do show up since 8299471 because they are counted in max_connections. That's pretty cool combined with this patch. I am sending a new patch to save the committer 30s when looking at this patch.
-- Michael
Attachment
On Tue, Sep 27, 2016 at 8:39 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > Ok, if they really are independent then shouldn't we take advantage of > that at call sites where we might be idle but we might also be waiting > for the network?

I certainly didn't intend for them to be independent, and I don't think they should be. I think it should be a hierarchy - as it is currently. I think it's a bad idea to introduce the notational overhead of having to pass through two integers rather than one everywhere, and a worse idea to encourage people to think that wait_event_type and wait_event are related in any way other than hierarchically.

> Actually, I'm still not sold on "Activity" and "Client". I think > "Idle" and "Network" would be better. Everybody knows intuitively > what "Idle" means. "Network" is better than "Client" because it > avoids confusion about user applications vs replication connections or > clients vs servers.

Hmm, I could live with that, if other people like it.

> s/auxilliary/auxiliary/, but wouldn't it be better to say something > more general like "from another process in the cluster"? Background > workers are not generally called auxiliary processes, and some of > these wait points are waiting for those.

Agreed; or perhaps it could even be waiting for another regular backend.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Sep 28, 2016 at 9:35 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Sep 27, 2016 at 8:39 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> Ok, if they really are independent then shouldn't we take advantage of >> that at call sites where we might be idle but we might also be waiting >> for the network? > > I certainly didn't intend for them to be independent, and I don't > think they should be. I think it should be a hierarchy - as it is > currently. I think it's a bad idea to introduce the notational > overhead of having to pass through two integers rather than one > everywhere, and a worse idea to encourage people to think that > wait_event_type and wait_event are related in any way other than > hierarchically.

So should I change back the patch to have only one argument for the eventId, and guess classId from it?
-- Michael
On Wed, Sep 28, 2016 at 8:38 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Wed, Sep 28, 2016 at 9:35 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Tue, Sep 27, 2016 at 8:39 PM, Thomas Munro >> <thomas.munro@enterprisedb.com> wrote: >>> Ok, if they really are independent then shouldn't we take advantage of >>> that at call sites where we might be idle but we might also be waiting >>> for the network? >> >> I certainly didn't intend for them to be independent, and I don't >> think they should be. I think it should be a hierarchy - as it is >> currently. I think it's a bad idea to introduce the notational >> overhead of having to pass through two integers rather than one >> everywhere, and a worse idea to encourage people to think that >> wait_event_type and wait_event are related in any way other than >> hierarchically. > > So should I change back the patch to have only one argument for the > eventId, and guess classId from it?

Why would you need to guess? But, yes, I think one argument is much preferable.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
PostgreSQL Friends:
Trying to build 9.6RC1, with Python3.4, on OpenIndiana (Illumos). It seems the detection of shared library status of the .so has changed. This appears to be related to a different(?) elucidation of python configuration.
A 'hardwired' change to the configure script to trap platform 'solaris' will work, but this seems an inappropriate approach.
Would be happy to work through this here - I'd like to make it a small 'contribution'.
Clipped from configure script:
    { $as_echo "$as_me:${as_lineno-$LINENO}: checking how to link an embedded Python application" >&5
    $as_echo_n "checking how to link an embedded Python application... " >&6; }
    python_libdir=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('LIBDIR'))))"`
    python_ldlibrary=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('LDLIBRARY'))))"`
    python_so=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('SO'))))"`
    echo "-----"
    echo "----- LOU MOD: python_so: $python_so"
    echo "-----"

configure finds '.so' on Python2.7; configure finds '.cpython-34m.so' on Python3.4.

LATER in the config script, the following 'hardwired' change will 'fix' this, but is likely not the right approach (our Python _is_ built as a shared lib, so this is wonky, but it will work):

    # We need libpython as a shared library.  With Python >=2.5, we
    # check the Py_ENABLE_SHARED setting.  On Debian, the setting is not
    # correct before the jessie release (http://bugs.debian.org/695979).
    # We also want to support older Python versions.  So as a fallback
    # we see if there is a file that is named like a shared library.
    if test "$python_enable_shared" != 1; then
      if test "$PORTNAME" = darwin; then
        # macOS does supply a .dylib even though Py_ENABLE_SHARED does
        # not get set.  The file detection logic below doesn't succeed
        # on older macOS versions, so make it explicit.
        python_enable_shared=1
      elif test "$PORTNAME" = win32; then
        # Windows also needs an explicit override.
        python_enable_shared=1
      # ----- MOD BY LOU: ----------------------------------------
      elif test "$PORTNAME" = solaris; then
        # Solaris explicit override.
        python_enable_shared=1
      else
        # We don't know the platform shared library extension here yet,
        # so we try some candidates.
        for dlsuffix in .so .sl; do
          if ls "$python_libdir"/libpython*${dlsuffix}* >/dev/null 2>&1; then
            python_enable_shared=1
            break
          fi
        done
      fi
    fi
Lou Picciano <loupicciano@comcast.net> writes: > Trying to build 9.6RC1, with Python3.4, on OpenIndiana (Illumos). It > seems the detection of shared library status of the .so has changed. Changed from what? I don't recall that we've touched that code in quite some time. What was the last version that worked for you? What exactly is failing? I'm having a hard time following your not-really-a-patch, but it looks like you're proposing forcing python_enable_shared=1 on Solaris, which sounds like a pretty bad idea. AFAIK the shared-ness of libpython is up to whoever built it. regards, tom lane
> From: "Tom Lane" <tgl@sss.pgh.pa.us>
> To: "Lou Picciano" <loupicciano@comcast.net>
> Cc: pgsql-hackers@postgresql.org
> Sent: Wednesday, September 28, 2016 9:33:06 AM
> Subject: Re: [HACKERS] Python3.4 detection on 9.6 configuration
> Lou Picciano <loupicciano@comcast.net> writes:
> > Trying to build 9.6RC1, with Python3.4, on OpenIndiana (Illumos). It
> > seems the detection of shared library status of the .so has changed.
> To: "Lou Picciano" <loupicciano@comcast.net>
> Cc: pgsql-hackers@postgresql.org
> Sent: Wednesday, September 28, 2016 9:33:06 AM
> Subject: Re: [HACKERS] Python3.4 detection on 9.6 configuration
> > Trying to build 9.6RC1, with Python3.4, on OpenIndiana (Illumos). It
> > seems the detection of shared library status of the .so has changed.
Specifically, configure is not finding the .so. It's not that the .so isn't built 'shared'; it is.
> Changed from what? I don't recall that we've touched that code in quite
> some time. What was the last version that worked for you? What
> exactly is failing?
Core bit seems to be the python3.4-config behavior:
/usr/bin/python3.4-config --extension-suffix
.cpython-34m.so
I don't know if this is intended behavior for Python3.4, or if it's a bug there; I'm working this with the Python gang as well.
Of course, this option doesn't exist under Python2.7.
> like you're proposing forcing python_enable_shared=1 on Solaris,
Certainly not! I was simply offering this as evidence that PostgreSQL will build just fine, against Python3.4, using this hack. (It's useful in getting us a working build in situ to continue other testing - even before the more elegant fix - whatever that turns out to be!)
> which sounds like a pretty bad idea. AFAIK the shared-ness of libpython is
> up to whoever built it.
Indeed. As I mentioned, our Python3.4 is built shared.
> regards, tom lane
On Wed, Sep 28, 2016 at 9:45 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Sep 28, 2016 at 8:38 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> So should I change back the patch to have only one argument for the >> eventId, and guess classId from it? > > Why would you need to guess?

Incorrect wording from me perhaps? I just meant that the reporting code needs to know which classId goes with a given eventId.

> But, yes, I think one argument is much preferable.

OK. Here is the wanted patch. I have reduced the routines of WaitLatch & friends to use only one argument, and recorded the classId associated with each eventId in an array of multiple fields, giving something like this:

+ const struct wait_event_entry WaitEventEntries[] = {
+	/* Activity */
+	{WAIT_ACTIVITY, "ArchiverMain"},
[...]

I have cleaned up as well the inclusions of pgstat.h that I added previously. Patch is attached.
-- Michael
Attachment
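Spelled out, the arrangement is roughly the following; the struct layout follows the excerpt above, and the lookup helper mirrors the GetWaitEventClass lookup mentioned in the review below, though the exact code here is a sketch:

typedef struct wait_event_entry
{
	uint8		classId;		/* e.g. WAIT_ACTIVITY */
	const char *eventName;		/* string displayed in pg_stat_activity */
} wait_event_entry;

static const wait_event_entry WaitEventEntries[] = {
	/* Activity */
	{WAIT_ACTIVITY, "ArchiverMain"},
	{WAIT_ACTIVITY, "AutoVacuumMain"},
	/* ... one entry per eventId, in enum order ... */
};

/* The class is implied by the event, so callers pass only an eventId. */
static uint8
GetWaitEventClass(uint16 eventId)
{
	return WaitEventEntries[eventId].classId;
}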
On Wed, Sep 28, 2016 at 9:40 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Wed, Sep 28, 2016 at 9:45 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Wed, Sep 28, 2016 at 8:38 AM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> So should I change back the patch to have only one argument for the >>> eventId, and guess classId from it? >> >> Why would you need to guess? > > Incorrect wording from me perhaps? i just meant that processing needs > to know what is the classId coming for a specific eventId. > >> But, yes, I think one argument is much preferable. > > OK. Here is the wanted patch. I have reduced the routines of WaitLatch > & friends to use only one argument, and added what is the classId > associated with a given eventId in an array of multiple fields, giving > something like that: > + const struct wait_event_entry WaitEventEntries[] = { > + /* Activity */ > + {WAIT_ACTIVITY, "ArchiverMain"}, > [...] > > I have cleaned up as well the inclusions of pgstat.h that I added > previously. Patch is attached. It seems to me that you haven't tested this patch very carefully, because as far as I can see it breaks wait event reporting for both heavyweight locks and buffer pins - or in other words two out of the three existing cases covered by the mechanism you are patching. The heavyweight lock portion is broken because WaitOnLock() reports a Lock wait before calling ProcSleep(), which now clobbers it. Instead of reporting that we're waiting for Lock/relation, a LOCK TABLE statement that hangs now reports IPC/ProcSleep. Similarly, a conflict over a tuple lock is now also reported as IPC/ProcSleep, and ditto for any other kind of lock, which were all reported separately before. Obviously, that's no good. I think it would be just fine to push down setting the wait event into ProcSleep(), doing it when we actually WaitLatch() or ResolveRecoveryConflictWithLock(), but we ought to report that we're waiting for the heavyweight lock, not ProcSleep(). As for buffer pins, note that LockBufferForCleanup() calls ProcWaitForSignal(), which now overwrites the wait event set in by its caller. I think we could just make ProcWaitForSignal() take a wait event as an argument; then LockBufferForCleanup() could pass an appropriate constant and other callers of ProcWaitForSignal() could pass their own wait events. The way that we're constructing the wait event ID that ultimately gets advertised in pg_stat_activity is a bit silly in this patch. We start with an event ID (which is an integer) and we then do an array lookup (via GetWaitEventClass) to get a second integer which is then XOR'd back into the first integer (via pgstat_report_wait_start) before storing it in the PGPROC. New idea: let's change pgstat_report_wait_start() to take a single integer argument which it stores without interpretation. Let's separate the construction of the wait event into a separate macro, say make_wait_event(). Then existing callers of pgstat_report_wait_start(a, b) will instead do pgstat_report_wait_start(make_wait_event(a, b)), but callers that want to construct the wait event and push it through a few levels of the call stack before advertising it only need to pass one integer. If done correctly, this should be cleaner and faster than what you've got right now. I am not a huge fan of the "WE_" prefix you've used. It's not terrible, but "we" doesn't obviously stand for "wait event" and it's also a word itself. I think a little more verbosity would add clarity. 
Maybe we could go with WAIT_EVENT_ instead. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
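A minimal sketch of the single-integer scheme suggested above; the 8/24 bit split is an assumption for illustration, and the call site reuses class/event identifiers seen earlier in the thread:

/* pack class and event into one value that travels down the call stack */
#define make_wait_event(classId, eventId) \
	(((uint32) (classId) << 24) | ((uint32) (eventId) & 0x00FFFFFF))

/* readers such as pg_stat_activity decode the class back out */
#define get_wait_event_class(wait_event_info) \
	((uint8) ((wait_event_info) >> 24))

/* an existing two-argument call site then becomes: */
pgstat_report_wait_start(make_wait_event(WAIT_IPC, WE_MQ_RECEIVE));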
On Fri, Sep 30, 2016 at 1:48 AM, Robert Haas <robertmhaas@gmail.com> wrote: > It seems to me that you haven't tested this patch very carefully, > because as far as I can see it breaks wait event reporting for both > heavyweight locks and buffer pins - or in other words two out of the > three existing cases covered by the mechanism you are patching.

Oops. The WaitLatch calls overlap the other things if another event is reported.

> The heavyweight lock portion is broken because WaitOnLock() reports a > Lock wait before calling ProcSleep(), which now clobbers it. Instead > of reporting that we're waiting for Lock/relation, a LOCK TABLE > statement that hangs now reports IPC/ProcSleep. Similarly, a conflict > over a tuple lock is now also reported as IPC/ProcSleep, and ditto for > any other kind of lock, which were all reported separately before. > Obviously, that's no good. I think it would be just fine to push down > setting the wait event into ProcSleep(), doing it when we actually > WaitLatch() or ResolveRecoveryConflictWithLock(), but we ought to > report that we're waiting for the heavyweight lock, not ProcSleep().

There is a somewhat similar problem with ResolveRecoveryConflictWithBufferPin(); per this reasoning, what we should wait for there is a buffer pin and not IPC/WaitForSignal.

> As for buffer pins, note that LockBufferForCleanup() calls > ProcWaitForSignal(), which now overwrites the wait event set in by its > caller. I think we could just make ProcWaitForSignal() take a wait > event as an argument; then LockBufferForCleanup() could pass an > appropriate constant and other callers of ProcWaitForSignal() could > pass their own wait events.

Agreed. So I changed the patch this way.

> The way that we're constructing the wait event ID that ultimately gets > advertised in pg_stat_activity is a bit silly in this patch. We start > with an event ID (which is an integer) and we then do an array lookup > (via GetWaitEventClass) to get a second integer which is then XOR'd > back into the first integer (via pgstat_report_wait_start) before > storing it in the PGPROC. New idea: let's change > pgstat_report_wait_start() to take a single integer argument which it > stores without interpretation. Let's separate the construction of the > wait event into a separate macro, say make_wait_event(). Then > existing callers of pgstat_report_wait_start(a, b) will instead do > pgstat_report_wait_start(make_wait_event(a, b)), but callers that want > to construct the wait event and push it through a few levels of the > call stack before advertising it only need to pass one integer. If > done correctly, this should be cleaner and faster than what you've got > right now.

Hm, OK.... So ProcSleep() and ProcWaitForSignal() need an additional argument in the shape of a uint32 wait_event. So we need to change the shape of WaitLatch & friends to take just a uint32 as an extra argument as well. As a result, all the callers of WaitLatch are forced to use the new routine pgstat_make_wait_event() (I renamed it), so pgstat.h needs to be included where WaitLatch() is called. And as a consequence, keeping the classId in WaitEventEntries becomes completely useless: including it there would have the advantage of reducing the places where pgstat.h is included, but make_wait_event is needed to build the wait_event value anyway...

> I am not a huge fan of the "WE_" prefix you've used. It's not > terrible, but "we" doesn't obviously stand for "wait event" and it's > also a word itself. I think a little more verbosity would add > clarity. 
> Maybe we could go with WAIT_EVENT_ instead.

OK. Switched that back. Hopefully you find the result cleaner.
-- Michael
Attachment
On Fri, Sep 30, 2016 at 5:47 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Fri, Sep 30, 2016 at 1:48 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> The way that we're constructing the wait event ID that ultimately gets >> advertised in pg_stat_activity is a bit silly in this patch. We start >> with an event ID (which is an integer) and we then do an array lookup >> (via GetWaitEventClass) to get a second integer which is then XOR'd >> back into the first integer (via pgstat_report_wait_start) before >> storing it in the PGPROC. New idea: let's change >> pgstat_report_wait_start() to take a single integer argument which it >> stores without interpretation. Let's separate the construction of the >> wait event into a separate macro, say make_wait_event(). Then >> existing callers of pgstat_report_wait_start(a, b) will instead do >> pgstat_report_wait_start(make_wait_event(a, b)), but callers that want >> to construct the wait event and push it through a few levels of the >> call stack before advertising it only need to pass one integer. If >> done correctly, this should be cleaner and faster than what you've got >> right now. > > Hm, OK.... So ProcSleep() and ProcWaitForSignal() need an additional > argument in the shape of a uint32 wait_event. So we need to change the > shape of WaitLatch & friends to take just a uint32 as an extra > argument. As a result, all the callers of WaitLatch are forced to > use the new routine pgstat_make_wait_event() (I renamed it), so pgstat.h > needs to be included where WaitLatch() is called.

Hmm. I like the use of pgstat in that name. It helps with the confusion created by the overloading of the term 'wait event' in the pg_stat_activity view and the WaitEventSet API, by putting it into a different pseudo-namespace.

+	uint32		wait_event;

How about a typedef for that instead of using raw uint32 everywhere? Something like pgstat_wait_descriptor? Then a variable name like pgstat_desc, since this is most definitely not just a wait_event anymore.

+	/* Define event to wait for */

It's not defining the event to wait for at all, it's building a description for pgstat.

+	wait_event = pgstat_make_wait_event(WAIT_EXTENSION,
+										WAIT_EVENT_EXTENSION);

It's not making a wait event, it's combining a class and an event. How about something like this:

pgstat_desc = pgstat_make_wait_descriptor(WAIT_CLASS_EXTENSION, WAIT_EVENT_EXTENSION)?

		/* Sleep until there's something to do */
		wc = WaitLatchOrSocket(MyLatch,
							   WL_LATCH_SET | WL_SOCKET_READABLE,
							   PQsocket(conn),
-							   -1L);
+							   -1L,
+							   wait_event);

... then use 'pgstat_desc' here.
-- Thomas Munro http://www.enterprisedb.com
On Mon, Oct 3, 2016 at 12:35 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> Hmm. I like the use of pgstat in that name. It helps with the
> confusion created by the overloading of the term 'wait event' in the
> pg_stat_activity view and the WaitEventSet API, by putting it into a
> different pseudo-namespace.
>
> +       uint32      wait_event;
>
> How about a typedef for that instead of using raw uint32 everywhere?
> Something like pgstat_wait_descriptor? Then a variable name like
> pgstat_desc, since this is most definitely not just a wait_event
> anymore.

We cannot do that because of latch.h, which now includes only
<signal.h>; it would be a bad idea to make it depend on more
PG-specific headers.

> +       /* Define event to wait for */
>
> It's not defining the event to wait for at all, it's building a
> description for pgstat.
>
> +       wait_event = pgstat_make_wait_event(WAIT_EXTENSION,
> +                                           WAIT_EVENT_EXTENSION);
>
> It's not making a wait event, it's combining a class and an event.
> How about something like this:
>
>     pgstat_desc = pgstat_make_wait_descriptor(WAIT_CLASS_EXTENSION,
>                                               WAIT_EVENT_EXTENSION)?

Maybe I sound stupid here, but I am trying to keep the name of this
macro short, so I'd go for pgstat_make_wait_desc().

>         /* Sleep until there's something to do */
>         wc = WaitLatchOrSocket(MyLatch,
>                                WL_LATCH_SET | WL_SOCKET_READABLE,
>                                PQsocket(conn),
> -                              -1L);
> +                              -1L,
> +                              wait_event);
>
> ... then use 'pgstat_desc' here.

For this one I agree, your naming is better: it is good to let the
callers of WaitLatch know where this value is actually used. I have
also added comments on top of WaitLatch & friends to explain what
pgstat_desc does; that's useful for the user.
--
Michael
Attachment
On Mon, Oct 3, 2016 at 3:30 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > [ new patch ] I think this is unnecessarily awkward for callers; the attached version takes a different approach which I think will be more convenient. The attached version also (1) moves a lot more of the logic from latch.c/h to pgstat.c/h, which I think is more appropriate; (2) more thoroughly separates the wait events by class; (3) renames SecureRead/SecureWrite to ClientRead/ClientWrite (whether to also rename the C functions is an interesting question, but not the most pressing one IMHO), (4) creates a real wait event for GetSafeSnapshot and removes the unnecessary and overly generic ProcSleep and ProcSignal wait events, and (5) incorporates a bit of copy editing. I've tested that this seems to work in basic cases, but more testing is surely welcome. If there are no major objections, I will commit this version. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
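As a rough illustration of what separating the wait events by class can
look like in a header, here is a sketch; the identifiers follow the
ones mentioned in this thread, but the values and enum members are
assumptions for illustration, not an excerpt of the committed patch:

    /* One class ID per wait class, kept in the high byte... */
    #define WAIT_ACTIVITY 0x05000000U
    #define WAIT_CLIENT   0x06000000U
    #define WAIT_IPC      0x08000000U

    /*
     * ...and one enum per class, with the first member anchored to the
     * class ID so that an event value alone identifies both the class
     * and the event.
     */
    typedef enum
    {
        WAIT_EVENT_CLIENT_READ = WAIT_CLIENT,
        WAIT_EVENT_CLIENT_WRITE
    } WaitEventClient;

    typedef enum
    {
        WAIT_EVENT_BGWORKER_SHUTDOWN = WAIT_IPC,
        WAIT_EVENT_SAFE_SNAPSHOT
    } WaitEventIPC;

Under a layout like this, most call sites need no make_wait_event()
call at all: passing WAIT_EVENT_CLIENT_READ alone already carries its
class, which is what makes it less awkward for callers.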
On Tue, Oct 4, 2016 at 1:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Oct 3, 2016 at 3:30 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> [ new patch ]
>
> I think this is unnecessarily awkward for callers; the attached
> version takes a different approach which I think will be more
> convenient. The attached version also (1) moves a lot more of the
> logic from latch.c/h to pgstat.c/h, which I think is more appropriate;
> (2) more thoroughly separates the wait events by class; (3) renames
> SecureRead/SecureWrite to ClientRead/ClientWrite (whether to also
> rename the C functions is an interesting question, but not the most
> pressing one IMHO), (4) creates a real wait event for GetSafeSnapshot
> and removes the unnecessary and overly generic ProcSleep and
> ProcSignal wait events, and (5) incorporates a bit of copy editing.

OK with that.

> I've tested that this seems to work in basic cases, but more testing
> is surely welcome. If there are no major objections, I will commit
> this version.

In pgstat_get_wait_event_type you are forgetting WAIT_IPC.

+       <row>
+        <entry morerows="10"><literal>IPC</></entry>
+        <entry><literal>BgWorkerShutdown</></entry>
+        <entry>Waiting for background worker to shut down.</entry>
+       </row>

Here this should be morerows="9": you removed two entries and added one
for SafeSnapshot, so the IPC class now spans 10 rows, i.e. 9 rows
beyond the first.

The rest looks good to me. Thanks for the feedback and the time!
--
Michael
Attachment
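For context on the first point: pgstat_get_wait_event_type() is
essentially a switch over the class byte of the stored value, so a
class missing from the switch silently falls through to the default.
Schematically, using the illustrative class IDs from the earlier sketch
rather than the literal function:

    const char *
    pgstat_get_wait_event_type(uint32 wait_event_info)
    {
        /* The class lives in the high byte of the encoded value. */
        uint32      classId = wait_event_info & 0xFF000000U;

        switch (classId)
        {
            case WAIT_CLIENT:
                return "Client";
            case WAIT_IPC:      /* the class that was overlooked */
                return "IPC";
            /* ... one case per wait class ... */
            default:
                return "???";
        }
    }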
On Mon, Oct 3, 2016 at 8:43 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> The rest looks good to me. Thanks for the feedback and the time!

Thanks for the fixes. I committed this with an additional compile
fix, but the buildfarm turned up a few more problems that my 'make
check-world' didn't find. Hopefully those are fixed now, but we'll
see.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Oct 5, 2016 at 4:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Oct 3, 2016 at 8:43 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> The rest looks good to me. Thanks for the feedback and the time!
>
> Thanks for the fixes. I committed this with an additional compile
> fix, but the buildfarm turned up a few more problems that my 'make
> check-world' didn't find. Hopefully those are fixed now, but we'll
> see.

Nitpicking: the includes in bgworker.c weren't sorted properly, and
then this patch added "pgstat.h" in the wrong position. See attached
suggestion.
--
Thomas Munro
http://www.enterprisedb.com
Attachment
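For reference, the convention being applied here is that "postgres.h"
always comes first, followed by any system headers, and then the
backend headers in alphabetical order, which dictates where "pgstat.h"
must land. Roughly like this, where the surrounding include list is
abridged and illustrative rather than the actual contents of
bgworker.c:

    #include "postgres.h"

    #include "miscadmin.h"
    #include "pgstat.h"
    #include "postmaster/bgworker_internals.h"
    #include "storage/ipc.h"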
On Wed, Oct 5, 2016 at 4:28 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> On Wed, Oct 5, 2016 at 4:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Oct 3, 2016 at 8:43 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> The rest looks good to me. Thanks for the feedback and the time!
>>
>> Thanks for the fixes. I committed this...

Yeh!

>> ... with an additional compile
>> fix, but the buildfarm turned up a few more problems that my 'make
>> check-world' didn't find. Hopefully those are fixed now, but we'll
>> see.

I saw that after waking up... As usual the buildfarm is catching many
of the things I missed.

> Nitpicking: the includes in bgworker.c weren't sorted properly, and
> then this patch added "pgstat.h" in the wrong position. See attached
> suggestion.

Yes, that should be fixed.

And for the rest, sorry for the delay. Timezones... More seriously, the
Windows animals have been complaining about pg_sleep() crashing the
server:
SELECT pg_sleep(0.1);
! server closed the connection unexpectedly
! This probably means the server terminated abnormally
! before or while processing the request.
! connection to server was lost

I think that the cause of this crash is WaitForSingleObject(): the
Windows headers already define a macro named WAIT_TIMEOUT for its
return value, so it overlaps with the new declarations in pgstat.h:
https://msdn.microsoft.com/en-us/library/aa450988.aspx
This is also generating a bunch of build warnings now that I compile
HEAD on Windows. Regression tests are not crashing here, but I am
getting a failure in stats.sql, and pg_sleep is broken. I swear I
tested that at some point and did not see a crash or those warnings...
But well, what's done is done.

It seems to me that a correct answer is to rename this class ID.
Rather than renaming only that one, I'd suggest prepending the PG_
prefix to all the wait class IDs, as in the attached, which passes
make check, contrib-check and modules-check and builds without
warnings on Windows. A simpler fix would be just to rename
WAIT_TIMEOUT to something else, but prepending PG_ looks better in
the long term.
--
Michael
Attachment
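To make the conflict concrete: the Windows SDK defines WAIT_TIMEOUT as
one of the return codes of WaitForSingleObject(), so a class ID of the
same name in pgstat.h collides with the system macro. Schematically
(the SDK value is taken from Microsoft's documentation; the PostgreSQL
side is illustrative, not the literal committed line):

    /* Windows SDK, in effect: */
    #define WAIT_TIMEOUT 258L   /* WaitForSingleObject() timed out */

    /*
     * pgstat.h then introduced its own WAIT_TIMEOUT class ID; whichever
     * definition the preprocessor sees last wins, producing redefinition
     * warnings and breaking code that expected the other value.  A
     * PostgreSQL-specific prefix sidesteps the clash:
     */
    #define PG_WAIT_TIMEOUT 0x09000000U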
On Tue, Oct 4, 2016 at 3:28 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> Nitpicking: the includes in bgworker.c weren't sorted properly, and
> then this patch added "pgstat.h" in the wrong position. See attached
> suggestion.

Committed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Oct 4, 2016 at 5:07 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> More seriously, the Windows animals have been complaining about
> pg_sleep() crashing the server:
> SELECT pg_sleep(0.1);
> ! server closed the connection unexpectedly
> ! This probably means the server terminated abnormally
> ! before or while processing the request.
> ! connection to server was lost
>
> I think that the cause of this crash is WaitForSingleObject(): the
> Windows headers already define a macro named WAIT_TIMEOUT for its
> return value, so it overlaps with the new declarations in pgstat.h:
> https://msdn.microsoft.com/en-us/library/aa450988.aspx
> This is also generating a bunch of build warnings now that I compile
> HEAD on Windows. Regression tests are not crashing here, but I am
> getting a failure in stats.sql, and pg_sleep is broken. I swear I
> tested that at some point and did not see a crash or those warnings...
> But well, what's done is done.
>
> It seems to me that a correct answer is to rename this class ID.
> Rather than renaming only that one, I'd suggest prepending the PG_
> prefix to all the wait class IDs, as in the attached, which passes
> make check, contrib-check and modules-check and builds without
> warnings on Windows. A simpler fix would be just to rename
> WAIT_TIMEOUT to something else, but prepending PG_ looks better in
> the long term.

Committed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Oct 5, 2016 at 9:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Oct 4, 2016 at 5:07 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> More seriously, the Windows animals have been complaining about
>> pg_sleep() crashing the server:
>> SELECT pg_sleep(0.1);
>> ! server closed the connection unexpectedly
>> ! This probably means the server terminated abnormally
>> ! before or while processing the request.
>> ! connection to server was lost
>>
>> I think that the cause of this crash is WaitForSingleObject(): the
>> Windows headers already define a macro named WAIT_TIMEOUT for its
>> return value, so it overlaps with the new declarations in pgstat.h:
>> https://msdn.microsoft.com/en-us/library/aa450988.aspx
>> This is also generating a bunch of build warnings now that I compile
>> HEAD on Windows. Regression tests are not crashing here, but I am
>> getting a failure in stats.sql, and pg_sleep is broken. I swear I
>> tested that at some point and did not see a crash or those
>> warnings... But well, what's done is done.
>>
>> It seems to me that a correct answer is to rename this class ID.
>> Rather than renaming only that one, I'd suggest prepending the PG_
>> prefix to all the wait class IDs, as in the attached, which passes
>> make check, contrib-check and modules-check and builds without
>> warnings on Windows. A simpler fix would be just to rename
>> WAIT_TIMEOUT to something else, but prepending PG_ looks better in
>> the long term.
>
> Committed.

Thanks.
--
Michael