Thread: pg_stat_lwlocks view - lwlocks statistics
Hi all,

I've been working on a new system view, pg_stat_lwlocks, to observe LWLocks, and I've just completed 'proof-of-concept' code that works with version 9.1. Now, I'd like to know whether this feature is a possibility for a future release.

With this patch, a DBA can easily identify bottlenecks around lwlocks.
--------------------------------------------------
postgres=# SELECT * FROM pg_stat_lwlocks ORDER BY time_ms DESC LIMIT 10;
 lwlockid | calls  | waits | time_ms
----------+--------+-------+---------
       49 | 193326 | 32096 |   23688
        8 |   3305 |   133 |    1335
        2 |     21 |     0 |       0
        4 | 135188 |     0 |       0
        5 |  57935 |     0 |       0
        6 |    141 |     0 |       0
        7 |  24580 |     1 |       0
        3 |   3282 |     0 |       0
        1 |     41 |     0 |       0
        9 |      3 |     0 |       0
(10 rows)

postgres=#
--------------------------------------------------

In this view,
'lwlockid' is the LWLockId used in the backends,
'calls' is how many times LWLockAcquire() was called,
'waits' is how many times LWLockAcquire() had to wait before acquiring the lock, and
'time_ms' is the total time LWLockAcquire() spent waiting on the lwlock.

Lwlocks that use an LWLockId range, such as BufMappingLock or LockMgrLock, are grouped and summed up into a single record. For example, lwlockid 49 in the view above represents the LockMgrLock statistics.

Now, I know there are some considerations.

(1) Performance

I've measured LWLock performance both with and without the patch, and confirmed that this patch does not affect LWLock performance at all.

pgbench scores with the patch:
 tps = 900.906658 (excluding connections establishing)
 tps = 908.528422 (excluding connections establishing)
 tps = 903.900977 (excluding connections establishing)
 tps = 910.470595 (excluding connections establishing)
 tps = 909.685396 (excluding connections establishing)

pgbench scores without the patch:
 tps = 909.096785 (excluding connections establishing)
 tps = 894.868712 (excluding connections establishing)
 tps = 910.074669 (excluding connections establishing)
 tps = 904.022770 (excluding connections establishing)
 tps = 895.673830 (excluding connections establishing)

Of course, this experiment was not I/O bound, and the cache hit ratio was >99.9%.

(2) Memory space

In this patch, I added three new members, as uint64, to the LWLock structure to collect statistics. This means those members must be held in shared memory, but I'm not sure whether that's appropriate. Another possible option is holding those statistics in local (backend) process memory and sending them through the stats collector process (like other statistics values).

(3) LWLock names (or labels)

Right now, the pg_stat_lwlocks view shows the LWLockId itself, but an LWLockId does not make it easy for a DBA to determine the actual lock type. So I want to show LWLock names (or labels), like 'WALWriteLock' or 'LockMgrLock', but how should I implement that?

Any comments?

Regards,

--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
Attachment
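For readers who want to picture the change described in (2), here is a minimal sketch: three uint64 counters added to the LWLock struct, which lives in shared memory. Field names are illustrative; the actual patch may use different ones.

typedef struct LWLock
{
	slock_t		mutex;		/* existing field: protects the others */
	/* ... existing fields: exclusive holder, shared count, wait queue ... */
	uint64		calls;		/* times LWLockAcquire() was entered */
	uint64		waits;		/* times the caller actually had to block */
	uint64		time_ms;	/* total time spent blocked, in milliseconds */
} LWLock;

Because the counters sit next to the lock itself, they can be bumped while the lock's own spinlock (mutex) is held, so no additional locking is needed on the update path.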
On 6/25/12 1:29 PM, Satoshi Nagayasu wrote:
> (1) Performance
>
> I've measured LWLock performance both with and without the patch,
> and confirmed that this patch does not affect LWLock performance
> at all.

This would be my main concern with this patch; it's hard for me to imagine that it has no performance impact *at all*, since trace_lwlocks has quite a noticeable one in my experience. However, the answer to that is to submit the patch and let people test.

I will remark that it would be far more useful to me if we could also track lwlocks per session. Overall counts are somewhat useful, but more granular counts are even more useful.

What period of time does the table cover? Since last reset?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
2012/06/26 6:44, Josh Berkus wrote:
> On 6/25/12 1:29 PM, Satoshi Nagayasu wrote:
>> (1) Performance
>>
>> I've measured LWLock performance both with and without the patch,
>> and confirmed that this patch does not affect LWLock performance
>> at all.
>
> This would be my main concern with this patch; it's hard for me to
> imagine that it has no performance impact *at all*, since trace_lwlocks
> has quite a noticeable one in my experience. However, the answer to that
> is to submit the patch and let people test.

Thanks. I will submit the patch to the CommitFest page, with some fixes so that it works with the latest PostgreSQL on Git.

> I will remark that it would be far more useful to me if we could also
> track lwlocks per session. Overall counts are somewhat useful, but more
> granular counts are even more useful. What period of time does the
> table cover? Since last reset?

Yes. It has not been implemented yet, since this code is just a proof of concept, but it is another design issue that needs to be discussed.

To implement it, a new array can be added in local process memory to hold lwlock statistics, updating counters in both shared memory and local process memory at once. Then, a session can retrieve 'per-session' statistics from its local process memory via some dedicated function.

Does that make sense? Any comments?

Regards,

--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
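A sketch of the per-session idea just described, with illustrative names throughout: each backend keeps a private array sized for the fixed lwlocks plus the grouped ranges, and updates it at the same points where the shared counters are updated.

typedef struct LWLockLocalStat
{
	uint64		calls;
	uint64		waits;
	uint64		time_ms;
} LWLockLocalStat;

/* One slot per (grouped) LWLockId; lives in ordinary backend memory.
 * The array size here is illustrative. */
static LWLockLocalStat local_lwlock_stats[64];

static void
count_lwlock_acquire(int lockid, bool waited, uint64 waited_ms)
{
	LWLockLocalStat *s = &local_lwlock_stats[lockid];

	s->calls++;				/* private copy: no locking needed */
	if (waited)
	{
		s->waits++;
		s->time_ms += waited_ms;
	}
	/* ...the shared-memory counters are updated at the same point... */
}

A dedicated SQL-callable function could then read this array directly, since it belongs to the calling backend.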
2012/06/26 7:04, Kevin Grittner wrote:
> Josh Berkus <josh@agliodbs.com> wrote:
>> On 6/25/12 1:29 PM, Satoshi Nagayasu wrote:
>>> (1) Performance
>>>
>>> I've measured LWLock performance both with and without the
>>> patch, and confirmed that this patch does not affect LWLock
>>> performance at all.
>>
>> This would be my main concern with this patch; it's hard for me to
>> imagine that it has no performance impact *at all*, since
>> trace_lwlocks has quite a noticeable one in my experience.
>> However, the answer to that is to submit the patch and let people
>> test.
>
> I think overhead is going to depend quite a bit on the
> gettimeofday() implementation, since that is called twice per lock
> wait.

Yes. That's one of my concerns, and what I actually want hackers to test.

> It looked to me like there was nothing to prevent concurrent updates
> of the counts while gathering the accumulated values for display.
> Won't this be a problem on 32-bit builds?

Actually, I'd like to know how I can improve my code for 32-bit platforms. Unfortunately, I don't have any 32-bit (physical) box now, so I'd like someone to test it if it needs testing.

> Please add this to the open CommitFest for a proper review:
>
> https://commitfest.postgresql.org/action/commitfest_view/open

Will submit soon. Thanks.

> -Kevin

Regards,

--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
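For reference, the pattern Kevin describes looks roughly like this inside LWLockAcquire(), using the instr_time.h macros (which boil down to gettimeofday() on most Unix platforms); variable names are illustrative:

instr_time	wait_start, wait_end;

if (mustwait)
{
	INSTR_TIME_SET_CURRENT(wait_start);	/* first gettimeofday() */
	/* ... sleep on the process semaphore until woken ... */
	INSTR_TIME_SET_CURRENT(wait_end);	/* second gettimeofday() */
	INSTR_TIME_SUBTRACT(wait_end, wait_start);
	lock->waits++;
	lock->time_ms += (uint64) INSTR_TIME_GET_MILLISEC(wait_end);
}

The 32-bit concern is that a plain load of a uint64 is not atomic there, so a reader summing the counters for display can observe a torn value unless it reads them under the lock's spinlock (or an equivalent interlock).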
Hi all,

I've modified the pg_stat_lwlocks patch to be able to work with the latest PostgreSQL Git code.

This patch provides:
 pg_stat_lwlocks          New system view to show lwlock statistics.
 pg_stat_get_lwlocks()    New function to retrieve lwlock statistics.
 pg_stat_reset_lwlocks()  New function to reset lwlock statistics.

Please try it out.

Regards,

2012/06/26 5:29, Satoshi Nagayasu wrote:
> Hi all,
>
> I've been working on a new system view, pg_stat_lwlocks, to observe
> LWLocks, and I've just completed 'proof-of-concept' code that works
> with version 9.1.
>
> (rest of the original proposal snipped)

--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
Attachment
> To implement it, a new array can be added in local process memory
> to hold lwlock statistics, updating counters in both shared memory
> and local process memory at once. Then, a session can retrieve
> 'per-session' statistics from its local process memory via some
> dedicated function.

That would be way cool from a diagnostics perspective. Not sure what impact it has, though.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Hi all,

I have fixed my previous patch for the pg_stat_lwlocks view and, as Josh suggested, it now supports local and global (shared) statistics in the same system view.

Local statistics means the counters are only effective within the same session; shared means the counters are shared across the entire cluster. The global statistics are collected via the pgstat collector process, as other statistics are.

Also, the global statistics struct has been split into two parts for their different uses: bgwriter stats and lwlock stats. Therefore, calling pg_stat_reset_shared('bgwriter') or pg_stat_reset_shared('lwlocks') resets only the dedicated struct, not the entire PgStat_GlobalStats.

Comments and review are always welcome.

Regards,

------------------------------------------------------------------------------
postgres=# SELECT * FROM pg_stat_lwlocks;
 lwlockid | local_calls | local_waits | local_time_ms | shared_calls | shared_waits | shared_time_ms
----------+-------------+-------------+---------------+--------------+--------------+----------------
        0 |           0 |           0 |             0 |         4268 |            0 |              0
        1 |          43 |           0 |             0 |          387 |            0 |              0
        2 |           0 |           0 |             0 |           19 |            0 |              0
        3 |           0 |           0 |             0 |           28 |            0 |              0
        4 |           3 |           0 |             0 |          315 |            0 |              0
        5 |           0 |           0 |             0 |           24 |            0 |              0
        6 |           1 |           0 |             0 |           76 |            0 |              0
        7 |           0 |           0 |             0 |        16919 |            0 |              0
        8 |           0 |           0 |             0 |            0 |            0 |              0
        9 |           0 |           0 |             0 |            0 |            0 |              0
       10 |           0 |           0 |             0 |            0 |            0 |              0
       11 |           0 |           0 |             0 |           75 |            0 |              0
       12 |           0 |           0 |             0 |            0 |            0 |              0
       13 |           0 |           0 |             0 |            0 |            0 |              0
       14 |           0 |           0 |             0 |            0 |            0 |              0
       15 |           0 |           0 |             0 |            0 |            0 |              0
       16 |           0 |           0 |             0 |            0 |            0 |              0
       17 |           0 |           0 |             0 |        61451 |            6 |              0
       18 |           0 |           0 |             0 |            0 |            0 |              0
       19 |           0 |           0 |             0 |            0 |            0 |              0
       20 |           0 |           0 |             0 |            0 |            0 |              0
       21 |           1 |           0 |             0 |            9 |            0 |              0
       22 |           0 |           0 |             0 |            0 |            0 |              0
       23 |           0 |           0 |             0 |            0 |            0 |              0
       24 |           0 |           0 |             0 |            1 |            0 |              0
       25 |           0 |           0 |             0 |            0 |            0 |              0
       26 |           2 |           0 |             0 |           18 |            0 |              0
       27 |           0 |           0 |             0 |            0 |            0 |              0
       28 |           0 |           0 |             0 |            0 |            0 |              0
       29 |           0 |           0 |             0 |            0 |            0 |              0
       30 |           0 |           0 |             0 |            0 |            0 |              0
       31 |           0 |           0 |             0 |            0 |            0 |              0
       32 |           0 |           0 |             0 |            0 |            0 |              0
       33 |           4 |           0 |             0 |       207953 |            0 |              0
       50 |           8 |           0 |             0 |        33388 |            0 |              0
       67 |           0 |           0 |             0 |            0 |            0 |              0
(36 rows)

postgres=#
------------------------------------------------------------------------------

2012/06/26 21:11, Satoshi Nagayasu wrote:
> Hi all,
>
> I've modified the pg_stat_lwlocks patch to be able to work with
> the latest PostgreSQL Git code.
>
> (rest of the previous message, including the original proposal, snipped)

--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
Attachment
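The reset split described above might look something like this in pg_stat_reset_shared()'s dispatch code. RESET_BGWRITER and PgStat_MsgResetsharedcounter exist in pgstat.h today; RESET_LWLOCKS is an assumption about what the patch adds:

	PgStat_MsgResetsharedcounter msg;

	if (strcmp(target, "bgwriter") == 0)
		msg.m_resettarget = RESET_BGWRITER;
	else if (strcmp(target, "lwlocks") == 0)
		msg.m_resettarget = RESET_LWLOCKS;	/* hypothetical new target */
	else
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 errmsg("unrecognized reset target: \"%s\"", target)));

With separate structs on the collector side, each target can be zeroed independently, which is what makes pg_stat_reset_shared('lwlocks') leave the bgwriter counters alone.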
Hi,

2012/10/13 23:05, Satoshi Nagayasu wrote:
> Hi all,
>
> I have fixed my previous patch for the pg_stat_lwlocks view and,
> as Josh suggested, it now supports local and global (shared)
> statistics in the same system view.

Sorry, I found some mistakes. A new, fixed one is attached to this mail.

Regards,

> (rest of the previous message, including the sample output, snipped)

--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
Attachment
On Sat, Oct 13, 2012 at 11:34 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
> Hi,
>
> 2012/10/13 23:05, Satoshi Nagayasu wrote:
>> Hi all,
>>
>> I have fixed my previous patch for the pg_stat_lwlocks view and,
>> as Josh suggested, it now supports local and global (shared)
>> statistics in the same system view.
>
> Sorry, I found some mistakes. A new, fixed one is attached to this mail.

Thanks for revising the patch. Here are my comments:

The documentation needs to be updated.

The patch caused the following compile warnings on my machine:

pgstat.c:1357: warning: no previous prototype for 'pgstat_report_lwlockstat'
postgres.c:3922: warning: implicit declaration of function 'pgstat_report_lwlockstat'
pgstatfuncs.c:1854: warning: no previous prototype for 'pg_stat_reset_lwlocks'

In my test, this patch caused measurable performance overhead. I created the test database with pgbench -s10 and ran pgbench -c8 -j8 -T60 -S. Results are:

[HEAD]
number of transactions actually processed: 1401369
tps = 23351.375811 (including connections establishing)
tps = 23355.900043 (excluding connections establishing)

[PATCH]
number of transactions actually processed: 1401369
tps = 23351.375811 (including connections establishing)
tps = 23355.900043 (excluding connections establishing)

So I think that tracking lwlock usage should be enabled only when trace_lwlocks is enabled, so that a user who is not interested in lwlock usage can avoid the performance overhead.

As far as I can tell from reading the patch, only lwlock usage by backends is collected. Why isn't lwlock usage by the autovacuum workers and auxiliary processes collected as well?

Regards,

--
Fujii Masao
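The first and third warnings are missing extern declarations, and the second follows from the first. Declarations along these lines, with names taken from the warning messages themselves, placed in the appropriate headers (pgstat.h, and builtins.h for the SQL-callable function in that era), would presumably silence them:

extern void pgstat_report_lwlockstat(void);		/* for pgstat.c and postgres.c */
extern Datum pg_stat_reset_lwlocks(PG_FUNCTION_ARGS);	/* for pgstatfuncs.c */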
On Sun, Oct 14, 2012 at 3:34 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> (review comments snipped)
>
> In my test, this patch caused measurable performance overhead.
> I created the test database with pgbench -s10 and ran pgbench -c8 -j8 -T60 -S.
> Results are:
>
> [HEAD]
> number of transactions actually processed: 1401369
> tps = 23351.375811 (including connections establishing)
> tps = 23355.900043 (excluding connections establishing)
>
> [PATCH]
> number of transactions actually processed: 1401369
> tps = 23351.375811 (including connections establishing)
> tps = 23355.900043 (excluding connections establishing)

Oops! Obviously I copied and pasted the test result wrongly...
Here is the right result.

[HEAD]
number of transactions actually processed: 1401369
tps = 23351.375811 (including connections establishing)
tps = 23355.900043 (excluding connections establishing)

[PATCH]
number of transactions actually processed: 1092400
tps = 18179.498013 (including connections establishing)
tps = 18182.450824 (excluding connections establishing)

Another comment: are local_calls/waits/time_ms really required?
I'm not sure how that info would help with performance debugging.

Regards,

--
Fujii Masao
On Sun, Oct 14, 2012 at 6:00 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Sun, Oct 14, 2012 at 3:34 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> (review comments and the mis-pasted results snipped)
>
> Oops! Obviously I copied and pasted the test result wrongly...
> Here is the right result.
>
> [HEAD]
> number of transactions actually processed: 1401369
> tps = 23351.375811 (including connections establishing)
> tps = 23355.900043 (excluding connections establishing)
>
> [PATCH]
> number of transactions actually processed: 1092400
> tps = 18179.498013 (including connections establishing)
> tps = 18182.450824 (excluding connections establishing)

Is the performance difference due only to the mutex lock being taken?

> Another comment: are local_calls/waits/time_ms really required?
> I'm not sure how that info would help with performance debugging.
>
> Regards,
>
> --
> Fujii Masao

--
Michael Paquier
http://michael.otacoo.com
Thanks for the review.

2012/10/14 8:55, Michael Paquier wrote:
> On Sun, Oct 14, 2012 at 6:00 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Sun, Oct 14, 2012 at 3:34 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>> The patch caused the following compile warnings on my machine:
>>>
>>> pgstat.c:1357: warning: no previous prototype for 'pgstat_report_lwlockstat'
>>> postgres.c:3922: warning: implicit declaration of function 'pgstat_report_lwlockstat'
>>> pgstatfuncs.c:1854: warning: no previous prototype for 'pg_stat_reset_lwlocks'

Oops. I just fixed them. Thanks.

>>> In my test, this patch caused measurable performance overhead.
>>> I created the test database with pgbench -s10 and ran pgbench -c8 -j8 -T60 -S.
>>
>> Oops! Obviously I copied and pasted the test result wrongly...
>> Here is the right result.
>>
>> [HEAD]
>> number of transactions actually processed: 1401369
>> tps = 23351.375811 (including connections establishing)
>> tps = 23355.900043 (excluding connections establishing)
>>
>> [PATCH]
>> number of transactions actually processed: 1092400
>> tps = 18179.498013 (including connections establishing)
>> tps = 18182.450824 (excluding connections establishing)
>
> Is the performance difference due only to the mutex lock being taken?

I think it comes from the high-frequency reporting through the pgstat collector process, i.e., calling pgstat_report_lwlockstat() in PostgresMain().

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 585db1a..5ca2c6f 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3919,6 +3919,8 @@ PostgresMain(int argc, char *argv[], const char *username)
 				pgstat_report_activity(STATE_IDLE, NULL);
 			}
 
+			pgstat_report_lwlockstat();
+
 			ReadyForQuery(whereToSendOutput);
 			send_ready_for_query = false;
 		}

When I reduced the reporting (or just disabled it), the performance was not affected by this patch. Here are some additional results from the same performance test Fujii-san ran.

HEAD
====
number of transactions actually processed: 3439971
tps = 57331.891602 (including connections establishing)
tps = 57340.932324 (excluding connections establishing)

pg_stat_lwlocks patch (reporting enabled)
=========================================
number of transactions actually processed: 2665550
tps = 44425.038125 (including connections establishing)
tps = 44430.565651 (excluding connections establishing)

pg_stat_lwlocks patch (reporting disabled)
==========================================
number of transactions actually processed: 3429370
tps = 57155.286475 (including connections establishing)
tps = 57163.996943 (excluding connections establishing)

pg_stat_lwlocks patch (reporting reduced to 1/100)
==================================================
number of transactions actually processed: 3421879
tps = 57030.660814 (including connections establishing)
tps = 57038.950498 (excluding connections establishing)

So, I think some additional hack to reduce the reporting is needed. Would that be acceptable in terms of performance?

>> Another comment: are local_calls/waits/time_ms really required?
>> I'm not sure how that info would help with performance debugging.

I think there is a need to observe/determine how your test query is affected by workload from other sessions, so splitting local and shared statistics would be nice to have. Just my thought, though.

Regards,

--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
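The "reporting reduced to 1/100" row presumably corresponds to a throttle along these lines (a hypothetical sketch; the actual change was not posted):

/* In the PostgresMain() idle path: report lwlock stats only on every
 * 100th pass, to amortize the cost of messaging the stats collector. */
static int	lwstat_countdown = 100;

if (--lwstat_countdown <= 0)
{
	pgstat_report_lwlockstat();
	lwstat_countdown = 100;
}

Alternatively, the existing stats machinery rate-limits with a timestamp check (see PGSTAT_STAT_INTERVAL in pgstat.c), which would make the reporting frequency time-based rather than call-based.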
On Sun, Oct 14, 2012 at 10:46 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
> HEAD
> ====
> number of transactions actually processed: 3439971
> tps = 57331.891602 (including connections establishing)
> tps = 57340.932324 (excluding connections establishing)
<snip>
> pg_stat_lwlocks patch (reporting disabled)
> ==========================================
> number of transactions actually processed: 3429370
> tps = 57155.286475 (including connections establishing)
> tps = 57163.996943 (excluding connections establishing)
>
> So, I think some additional hack to reduce the reporting is needed.
> Would that be acceptable in terms of performance?

Tracing lwlock usage seems to still cause a small performance overhead even when reporting is disabled. I believe some users would prefer to avoid such overhead even if pg_stat_lwlocks is not available. It should be up to the user to decide whether to trace lwlock usage, e.g., via a trace_lwlocks parameter, I think.

>> Another comment: are local_calls/waits/time_ms really required?
>> I'm not sure how that info would help with performance debugging.
>
> I think there is a need to observe/determine how your test query is
> affected by workload from other sessions, so splitting local and
> shared statistics would be nice to have. Just my thought, though.

What I don't like is that a session can see only the local stats of its own session. That makes local stats hard to monitor. Imagine the case where you'd like to monitor the local stats of each pgbench session: to monitor such stats, you would need to modify pgbench so that each of its sessions monitors its own local stats. Even if you run monitoring software, it cannot collect those stats, because they don't belong to the session it uses.

Regards,

--
Fujii Masao
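The gating Fujii suggests would follow the pattern of existing trace_* settings. PostgreSQL already has a Trace_lwlocks flag available in LOCK_DEBUG builds; a sketch of a runtime GUC along the same lines, with the variable name being an assumption rather than existing code:

/* Hypothetical boolean GUC, default off; registered in guc.c. */
extern bool pgstat_track_lwlocks;

...
	/* In the LWLockAcquire() hot path: */
	if (pgstat_track_lwlocks)
	{
		lock->calls++;
		/* wait counting and timing as in the patch */
	}

The check itself is a single, highly predictable branch, so the residual cost with the GUC off should be far smaller than that of the timing calls.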
(2012/10/14 13:26), Fujii Masao wrote:
> On Sun, Oct 14, 2012 at 10:46 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
>> HEAD
>> ====
>> number of transactions actually processed: 3439971
>> tps = 57331.891602 (including connections establishing)
>> tps = 57340.932324 (excluding connections establishing)
> <snip>
>> pg_stat_lwlocks patch (reporting disabled)
>> ==========================================
>> number of transactions actually processed: 3429370
>> tps = 57155.286475 (including connections establishing)
>> tps = 57163.996943 (excluding connections establishing)
>>
>> So, I think some additional hack to reduce the reporting is needed.
>> Would that be acceptable in terms of performance?
>
> Tracing lwlock usage seems to still cause a small performance
> overhead even when reporting is disabled. I believe some users would
> prefer to avoid such overhead even if pg_stat_lwlocks is not available.
> It should be up to the user to decide whether to trace lwlock usage,
> e.g., via a trace_lwlocks parameter, I think.

Frankly speaking, I do not agree with disabling a performance instrument to improve performance. A DBA must *always* monitor the performance metrics when running such a heavy workload.

But it's OK to add a parameter to enable/disable it. Any other comments?

>>> Another comment: are local_calls/waits/time_ms really required?
>>> I'm not sure how that info would help with performance debugging.
>>
>> I think there is a need to observe/determine how your test query is
>> affected by workload from other sessions, so splitting local and
>> shared statistics would be nice to have. Just my thought, though.
>
> What I don't like is that a session can see only the local stats of its
> own session. That makes local stats hard to monitor. Imagine the case
> where you'd like to monitor the local stats of each pgbench session:
> to monitor such stats, you would need to modify pgbench so that each of
> its sessions monitors its own local stats. Even if you run monitoring
> software, it cannot collect those stats, because they don't belong to
> the session it uses.

OK. I'm waiting for more comments from others. Dropping it is easy for me, but any other comments? Josh?

Regards,

--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
On Sun, Oct 14, 2012 at 7:52 PM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
> (2012/10/14 13:26), Fujii Masao wrote:
>> (benchmark results snipped)
>>
>> Tracing lwlock usage seems to still cause a small performance
>> overhead even when reporting is disabled. I believe some users would
>> prefer to avoid such overhead even if pg_stat_lwlocks is not available.
>> It should be up to the user to decide whether to trace lwlock usage,
>> e.g., via a trace_lwlocks parameter, I think.
>
> Frankly speaking, I do not agree with disabling a performance
> instrument to improve performance. A DBA must *always* monitor
> the performance metrics when running such a heavy workload.
>
> But it's OK to add a parameter to enable/disable it.
> Any other comments?

I thought that pg_stat_lwlocks would mostly be used for debugging purposes; IOW, most users would disable it in a production system. But if you think it should be enabled even in production, and that most users would usually use that view to improve performance, then I think you need to reconsider the lwlockid in the pg_stat_lwlocks view. It's hard to understand what kind of lwlock each lwlockid indicates, which means it's hard for most users to investigate anything from the current pg_stat_lwlocks. You might need to expose the lwlock name and a short description....

>> What I don't like is that a session can see only the local stats of its
>> own session. (snip)
>
> OK. I'm waiting for more comments from others.
> Dropping it is easy for me, but any other comments? Josh?

What I was thinking would be useful is to make the stats collector collect local stats as well, and report them per backend ID or per distinct query (e.g., adding such stats into pg_stat_statements, though this might not be a good idea). That would enable every session to see any local stats.

Regards,

--
Fujii Masao
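Exposing names for the fixed locks could be as simple as a lookup table indexed by LWLockId. The first entries below follow the LWLockId enum order in src/include/storage/lwlock.h of that era; treat the list and the helper as an illustrative sketch, since the grouped dynamic ranges (buffer mapping, lock manager, etc.) would still need their own labels:

static const char *const LWLockNames[] =
{
	"BufFreelistLock",		/* 0 */
	"ShmemIndexLock",		/* 1 */
	"OidGenLock",			/* 2 */
	"XidGenLock",			/* 3 */
	"ProcArrayLock",		/* 4 */
	/* ... remaining fixed locks ... */
};

const char *
GetLWLockName(LWLockId lockid)
{
	if ((size_t) lockid < lengthof(LWLockNames))
		return LWLockNames[lockid];
	return "DynamicLock";	/* grouped ranges need a separate mapping */
}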
Satoshi Nagayasu <snaga@uptime.jp> writes:
> (2012/10/14 13:26), Fujii Masao wrote:
>> Tracing lwlock usage seems to still cause a small performance
>> overhead even when reporting is disabled. I believe some users would
>> prefer to avoid such overhead even if pg_stat_lwlocks is not available.
>> It should be up to the user to decide whether to trace lwlock usage,
>> e.g., via a trace_lwlocks parameter, I think.

> Frankly speaking, I do not agree with disabling a performance
> instrument to improve performance. A DBA must *always* monitor
> the performance metrics when running such a heavy workload.

This brings up a question that I don't think has been honestly considered, which is exactly whom a feature like this is targeted at. TBH I think it's of about zero use to DBAs (making the above argument bogus). It is potentially of use to developers, but a DBA is unlikely to be able to do anything about lwlock-level contention even if he has the knowledge to interpret the data.

So I feel it isn't something that should be turned on in production builds. I'd vote for enabling it by a non-default configure option, and making sure that it doesn't introduce any overhead when the option is off.

			regards, tom lane
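For what it's worth, lwlock.c already contains a compile-time LWLOCK_STATS define that counts acquisitions per backend and prints the totals when the backend exits; a configure-option approach could reuse that pattern so the counting code compiles away entirely when disabled. A minimal sketch, with the macro names being illustrative:

#ifdef LWLOCK_COUNT_STATS			/* would be set by the configure option */
#define LWSTAT_INC(lock, fld)	((lock)->fld++)
#else
#define LWSTAT_INC(lock, fld)	((void) 0)	/* compiles to nothing */
#endif

With the option off, the macro expands to a no-op, and the shared-memory counters need not even be allocated.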
2012/10/15 1:43, Tom Lane wrote:
> Satoshi Nagayasu <snaga@uptime.jp> writes:
>> Frankly speaking, I do not agree with disabling a performance
>> instrument to improve performance. A DBA must *always* monitor
>> the performance metrics when running such a heavy workload.
>
> This brings up a question that I don't think has been honestly
> considered, which is exactly whom a feature like this is targeted at.
> TBH I think it's of about zero use to DBAs (making the above argument
> bogus). It is potentially of use to developers, but a DBA is unlikely
> to be able to do anything about lwlock-level contention even if he has
> the knowledge to interpret the data.

Actually, I'm not a developer. I'm just a DBA, and I needed such an instrument when I was asked to investigate strange WAL behavior that produced unexpected, random commit delays under a heavy workload. And another patch I have submitted, the WAL dirty flush statistics patch, comes from the same experience.

https://commitfest.postgresql.org/action/patch_view?id=893

Unfortunately, since I didn't have such an instrument at the time, I used SystemTap to investigate the WAL behavior, including call counts and time waited, but using SystemTap was really awful, and I concluded that PostgreSQL needs some "built-in" instrument for DBAs. I needed to determine the bottleneck around WAL, such as lock contention and/or write performance of the device, but I couldn't find anything without an instrument.

I accept that I'm focusing only on WAL-related lwlocks, and that this is not enough for ordinary DBAs, but I still need it to understand PostgreSQL behavior without deep knowledge of, and experience with, the WAL design and implementation.

> So I feel it isn't something that should be turned on in production
> builds. I'd vote for enabling it by a non-default configure option,
> and making sure that it doesn't introduce any overhead when the option
> is off.

There is another option for eliminating the performance overhead here. As I tried in my first patch, instead of reporting through the pgstat collector process, each backend could directly increment lwlock counters allocated in shared memory.

http://archives.postgresql.org/message-id/4FE9A6F5.2080405@uptime.jp

Here are more benchmark results, including my first patch:

[HEAD]
number of transactions actually processed: 3439971
tps = 57331.891602 (including connections establishing)
tps = 57340.932324 (excluding connections establishing)

[My first patch]
number of transactions actually processed: 3453745
tps = 57562.196971 (including connections establishing)
tps = 57569.197838 (excluding connections establishing)

Actually, I'm not sure why my patch makes PostgreSQL faster, :D but the performance seems better than with my second patch. I think it still needs some trick to keep the counters in "pgstat.stat" across restarts, but it would be more acceptable in terms of performance overhead.

Regards,

--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
On Sun, Oct 14, 2012 at 9:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Satoshi Nagayasu <snaga@uptime.jp> writes:
>> Frankly speaking, I do not agree with disabling a performance
>> instrument to improve performance. A DBA must *always* monitor
>> the performance metrics when running such a heavy workload.
>
> This brings up a question that I don't think has been honestly
> considered, which is exactly whom a feature like this is targeted at.
> TBH I think it's of about zero use to DBAs (making the above argument
> bogus). It is potentially of use to developers, but a DBA is unlikely
> to be able to do anything about lwlock-level contention even if he has
> the knowledge to interpret the data.

Waiting on BufFreelistLock suggests increasing shared_buffers.

Waiting on ProcArrayLock perhaps suggests use of a connection pooler (or does it?)

WALWriteLock suggests doing something about IO, either moving logs to different disks, or getting a BBU, or something.

WALInsertLock suggests trying to adapt your data-loading process so it can take advantage of bulk operations, or maybe increasing wal_buffers.

And a lot of waiting on any of the locks gives a piece of information the DBA can use when asking the mailing lists for help, even if it doesn't allow him to take unilateral action.

> So I feel it isn't something that should be turned on in production
> builds. I'd vote for enabling it by a non-default configure option,
> and making sure that it doesn't introduce any overhead when the option
> is off.

I think hackers would benefit from getting reports from DBAs in the field with concrete data on bottlenecks.

If the only way to get this is to do a non-standard compile and deploy it to production, or to create a "benchmarking" copy of the production database system, including a realistic workload driver, and run the non-standard compile there, either of those is going to dramatically cut down on participation.

Cheers,

Jeff
2012/10/16 2:40, Jeff Janes wrote:
> (lock-by-lock suggestions snipped)
>
> I think hackers would benefit from getting reports from DBAs in the
> field with concrete data on bottlenecks.
>
> If the only way to get this is to do a non-standard compile and
> deploy it to production, or to create a "benchmarking" copy of the
> production database system, including a realistic workload driver,
> and run the non-standard compile there, either of those is going to
> dramatically cut down on participation.

Agreed.

The hardest thing in investigating a performance issue is reproducing the situation in an environment different from production. I often see people struggling to reproduce a situation with different hardware and a (similar but) different workload. It is very time consuming, and it often fails.

So, we need to collect every piece of information that would help us understand what's going on within the production PostgreSQL, without any changes to binaries or configuration in the production environment.

That's the reason why I stick to a "built-in" instrument, and why I disagree with disabling such an instrument even if it has minor performance overhead. A flight recorder must not be disabled. Collecting performance data must be a top priority for a DBA.

Regards,

--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
On Tue, Oct 16, 2012 at 11:31 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
> A flight recorder must not be disabled. Collecting
> performance data must be a top priority for a DBA.

This analogy is inapposite, though, because a flight recorder rarely crashes the aircraft. If it did, people might have second thoughts about the "never disable the flight recorder" rule.

I have had a couple of different excuses to look into the overhead of timing lately, and it does indeed seem that on many modern Linux boxes even extremely frequent gettimeofday calls produce only very modest amounts of overhead. Sadly, the situation on Windows doesn't look so good. I don't remember the exact numbers, but I think it was something like 40 or 60 or 80 times slower on the Windows box one of my colleagues tested than it is on Linux. And it turns out that that overhead really is measurable and does matter if you do it in a code path that gets run frequently. Of course I am enough of a Linux geek that I don't use Windows myself and curse my fate when I do have to use it, but the reality is that we have a huge base of users who use PostgreSQL at all only because it runs on Windows, and we can't just throw those people under the bus. I think that older platforms like HP-UX likely have problems in this area as well, although I confess to not having tested.

That having been said, if we're going to do this, this is probably the right approach, because it only calls gettimeofday() in the case where the lock acquisition is contended, and that is a lot cheaper than calling it in all cases. Maybe it's worth finding a platform where pg_test_timing reports that timing is very slow and then measuring how much impact this has on something like a pgbench or pgbench -S workload. We might find that it is in fact negligible. I'm pretty certain that it will be almost if not entirely negligible on Linux, but that's not really the case we need to worry about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Oct 18, 2012 at 10:36 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Sadly, the situation on Windows doesn't look so good. I
> don't remember the exact numbers, but I think it was something like 40
> or 60 or 80 times slower on the Windows box one of my colleagues
> tested than it is on Linux.

Do you happen to know the hardware and Windows version? The Windows QueryPerformanceCounter call that instr_time.h uses should use RDTSC-based timing when the hardware can support it, just like Linux. I don't know whether Windows can avoid the syscall overhead, though.

> Maybe it's worth finding a platform where
> pg_test_timing reports that timing is very slow and then measuring how
> much impact this has on something like a pgbench or pgbench -S
> workload.

This can easily be tested on Linux by switching to the hpet or acpi_pm clocksource. There are probably still platforms that do worse than this, but probably not by orders of magnitude.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de
On Wed, Oct 17, 2012 at 12:31 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote: > 2012/10/16 2:40, Jeff Janes wrote: >> >> On Sun, Oct 14, 2012 at 9:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> >>> Satoshi Nagayasu <snaga@uptime.jp> writes: >>>> >>>> (2012/10/14 13:26), Fujii Masao wrote: >>>>> >>>>> The tracing lwlock usage seems to still cause a small performance >>>>> overhead even if reporting is disabled. I believe some users would >>>>> prefer to avoid such overhead even if pg_stat_lwlocks is not available. >>>>> It should be up to a user to decide whether to trace lwlock usage, >>>>> e.g., >>>>> by using trace_lwlock parameter, I think. >>> >>> >>>> Frankly speaking, I do not agree with disabling performance >>>> instrument to improve performance. DBA must *always* monitor >>>> the performance metrix when having such heavy workload. >>> >>> >>> This brings up a question that I don't think has been honestly >>> considered, which is exactly whom a feature like this is targeted at. >>> TBH I think it's of about zero use to DBAs (making the above argument >>> bogus). It is potentially of use to developers, but a DBA is unlikely >>> to be able to do anything about lwlock-level contention even if he has >>> the knowledge to interpret the data. >> >> >> Waiting on BufFreelistLock suggests increasing shared_buffers. >> >> Waiting on ProcArrayLock perhaps suggests use of a connection pooler >> (or does it?) >> >> WALWriteLock suggests doing something about IO, either moving logs to >> different disks, or getting BBU, or something. >> >> WALInsertLock suggests trying to adapt your data loading process so it >> can take advantage of the bulk, or maybe increasing wal_buffers. >> >> And a lot of waiting on any of the locks gives a piece of information >> the DBA can use when asking the mailing lists for help, even if it >> doesn't allow him to take unilateral action. >> >>> So I feel it isn't something that should be turned on in production >>> builds. I'd vote for enabling it by a non-default configure option, >>> and making sure that it doesn't introduce any overhead when the option >>> is off. >> >> >> I think hackers would benefit from getting reports from DBAs in the >> field with concrete data on bottlenecks. >> >> If the only way to get this is to do some non-standard compile and >> deploy it to production, or to create a "benchmarking" copy of the >> production database system including a realistic work-load driver and >> run the non-standard compile there; either of those is going to >> dramatically cut down on the participation. > > > Agreed. > > The hardest thing to investigate performance issue is > reproducing a situation in the different environment > from the production environment. > > I often see people struggling to reproduce a situation > with different hardware and (similar but) different > workload. It is very time consuming, and also it often > fails. > > So, we need to collect any piece of information, which > would help us to understand what's going on within > the production PostgreSQL, without any changes of > binaries and configurations in the production environment. > > That's the reason why I stick to a "built-in" instrument, > and I disagree to disable such instrument even if it has > minor performance overhead. > > A flight-recorder must not be disabled. Collecting > performance data must be top priority for DBA. pg_stat_lwlocks seems not adequate 'flight-recorder'. It collects only narrow performance data concerning lwlock. 
What we should have as a 'flight recorder' is something like Oracle's
wait events, I think. Not only lwlocks but all wait events should be
collected so that the DBA can investigate performance bottlenecks. This
idea was proposed by Itagaki-san before. Though he implemented a
sampling-profiler patch, it failed to be committed; I'm not sure why.
Anyway, I think that would be the more promising approach to providing a
'flight recorder' for the DBA.

Regards,

--
Fujii Masao
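For concreteness, a per-backend accounting structure along the lines
Fujii sketches might look roughly like this. All names and the event
list below are invented for illustration; nothing here comes from a
posted patch:
--------------------------------------------------
/* uint64 stands in for PostgreSQL's own typedef from c.h */
typedef unsigned long long uint64;

/* Invented wait-event classes, for illustration only */
typedef enum WaitEventClass
{
    WAIT_EVENT_NONE = 0,
    WAIT_EVENT_LWLOCK,          /* lightweight lock acquisition */
    WAIT_EVENT_LOCKMGR,         /* heavyweight (lock manager) lock */
    WAIT_EVENT_BUFFER_IO,       /* data file read/write */
    WAIT_EVENT_WAL_WRITE,       /* WAL write/flush */
    WAIT_EVENT_CLIENT_READ,     /* idle, waiting for the client */
    NUM_WAIT_EVENT_CLASSES
} WaitEventClass;

typedef struct WaitEventCounter
{
    uint64      waits;          /* number of waits observed */
    uint64      wait_time_us;   /* total wait time, microseconds */
} WaitEventCounter;

/*
 * One array per backend, updated locally and periodically sent to
 * the stats collector, like other backend statistics.
 */
static WaitEventCounter wait_events[NUM_WAIT_EVENT_CLASSES];
--------------------------------------------------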
2012/10/19 4:36, Robert Haas wrote:
> On Tue, Oct 16, 2012 at 11:31 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
>> A flight recorder must not be disabled. Collecting performance data
>> must be a top priority for the DBA.
>
> This analogy is inapposite, though, because a flight recorder rarely
> crashes the aircraft. If it did, people might have second thoughts
> about the "never disable the flight recorder" rule. I have had a
> couple of different excuses to look into the overhead of timing
> lately, and it does indeed seem that on many modern Linux boxes even
> extremely frequent gettimeofday calls produce only very modest
> amounts of overhead.

I agree that such a performance instrument needs to be improved if it
has a critical performance impact on production environments. So I'm
still looking for a better implementation that decreases the
performance impact.

However, the most important question here is: "How can we understand
PostgreSQL's behavior without reading tons of source code and having
hacking skill?"

Recently I've been having lots of conversations with database
specialists (sales guys and DBAs) trying to use PostgreSQL instead of a
commercial database, and they are always struggling to understand
PostgreSQL's behavior, because no one can observe and/or explain it.

> Sadly, the situation on Windows doesn't look so good. I don't
> remember the exact numbers but I think it was something like 40 or 60
> or 80 times slower on the Windows box one of my colleagues tested
> than it is on Linux. And it turns out that that overhead really is
> measurable and does matter if you do it in a code path that gets run
> frequently. Of course I am enough of a Linux geek that I don't use
> Windows myself and curse my fate when I do have to use it, but the
> reality is that we have a huge base of users who only use PostgreSQL
> at all because it runs on Windows, and we can't just throw those
> people under the bus. I think that older platforms like HP/UX likely
> have problems in this area as well, although I confess to not having
> tested.

Do you mean my stats patch needs more performance testing on other
platforms? Yes, I agree with that.

> That having been said, if we're going to do this, this is probably
> the right approach, because it only calls gettimeofday() in the case
> where the lock acquisition is contended, and that is a lot cheaper
> than calling it in all cases. Maybe it's worth finding a platform
> where pg_test_timing reports that timing is very slow and then
> measuring how much impact this has on something like a pgbench or
> pgbench -S workload. We might find that it is in fact negligible. I'm
> pretty certain that it will be almost if not entirely negligible on
> Linux but that's not really the case we need to worry about.

Thanks for the suggestion for a better implementation. As I mentioned
above, I'm always looking for a better idea and solution to meet our
purpose.

Here, I want to share my concern with you again. PostgreSQL is getting
more complicated in order to improve performance and stability, and I
think that is good news. But it is also getting more difficult to
understand without deep knowledge nowadays, and that is rather bad
news. From my point of view, that's a huge hurdle for educating DBAs
and expanding the PostgreSQL user base.

Regards,
--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
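A bare-bones illustration of the contended-path-only timing Robert
describes above, written against a 9.2-era tree. This is a sketch only:
the actual patch does its accounting inside LWLockAcquire() itself, and
the LWLockCounter struct and wrapper below are invented names for the
example.
--------------------------------------------------
#include "postgres.h"
#include "portability/instr_time.h"
#include "storage/lwlock.h"

/* Invented for this example; the real patch keeps its counters
 * in shared memory alongside the LWLock array. */
typedef struct LWLockCounter
{
    uint64      calls;          /* LWLockAcquire() invocations */
    uint64      waits;          /* acquisitions that had to block */
    uint64      wait_time_us;   /* total blocked time, microseconds */
} LWLockCounter;

static void
lwlock_acquire_counted(LWLockId lockid, LWLockMode mode,
                       LWLockCounter *counter)
{
    counter->calls++;

    /* Fast path: an uncontended acquisition performs no clock reads. */
    if (LWLockConditionalAcquire(lockid, mode))
        return;

    /*
     * Slow path: only contended acquisitions pay for two clock reads.
     * (This can occasionally time an acquisition that would no longer
     * block, slightly overstating waits, which is fine for stats.)
     */
    {
        instr_time  start_time,
                    end_time;

        INSTR_TIME_SET_CURRENT(start_time);
        LWLockAcquire(lockid, mode);
        INSTR_TIME_SET_CURRENT(end_time);
        INSTR_TIME_SUBTRACT(end_time, start_time);

        counter->waits++;
        counter->wait_time_us += INSTR_TIME_GET_MICROSEC(end_time);
    }
}
--------------------------------------------------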
2012/10/19 23:48, Fujii Masao wrote:
> On Wed, Oct 17, 2012 at 12:31 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
> [...]
>> That's the reason why I stick to a "built-in" instrument, and why I
>> disagree with disabling such an instrument even if it has a minor
>> performance overhead.
>>
>> A flight recorder must not be disabled. Collecting performance data
>> must be a top priority for the DBA.
> pg_stat_lwlocks does not seem an adequate 'flight recorder'; it
> collects only narrow performance data concerning lwlocks. What we
> should have as a 'flight recorder' is something like Oracle's wait
> events, I think. Not only lwlocks but all wait events should be
> collected so that the DBA can investigate performance bottlenecks.

That's the reason why I said "I accept that it's not enough for the
DBA", and I'm going to work on other lock statistics as well.

> This idea was proposed by Itagaki-san before. Though he implemented a
> sampling-profiler patch, it failed to be committed; I'm not sure why.

Yeah, I know the previous patch posted by Itagaki-san. So I'm
questioning why (again) this time. I think this is a very important
question, because it would be critical for attracting new DBAs to
PostgreSQL.

> Anyway, I think that would be the more promising approach to providing
> a 'flight recorder' for the DBA.

Ok, I guess we have reached a consensus to have "some flight recorder".
Right? Actually, that seems like great progress from my point of view. :)

Regards,
--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
Satoshi Nagayasu <snaga@uptime.jp> writes:
> Ok, I guess we have reached a consensus to have "some flight
> recorder". Right?

I remain unconvinced. I think the argument that this will promote
novice understanding is complete hogwash: making any sense of
lwlock-level stats will require deep PG understanding, not create it.
And despite Jeff's optimistic views upthread, I don't think very many
people have or will acquire such understanding. So I'm really resistant
to accepting even minimal overhead in pursuit of this goal --- and I do
not believe the overhead will be minimal, either.

			regards, tom lane
On Fri, Oct 19, 2012 at 11:52 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
> I agree that such a performance instrument needs to be improved if it
> has a critical performance impact on production environments. So I'm
> still looking for a better implementation that decreases the
> performance impact.

Frankly, I think the approach of simply providing an "off" switch is
probably a good one. I admit that it would be nice if we could run with
instrumentation like this all the time - but until very fast
time-lookups become ubiquitous, we can't.

Another architectural problem here is that I believe this will increase
the size of the stats file, which I think is going to cause pain for
some people. I suspect that's going to be an issue even if we have an
"off" switch. I think somebody's going to have to figure out a way to
split that file up somehow.

> However, the most important question here is: "How can we understand
> PostgreSQL's behavior without reading tons of source code and having
> hacking skill?"
>
> Recently I've been having lots of conversations with database
> specialists (sales guys and DBAs) trying to use PostgreSQL instead of
> a commercial database, and they are always struggling to understand
> PostgreSQL's behavior, because no one can observe and/or explain it.

Agreed.

>> Sadly, the situation on Windows doesn't look so good. I don't
>> remember the exact numbers but I think it was something like 40 or 60
>> or 80 times slower on the Windows box one of my colleagues tested
>> than it is on Linux. [...]
>
> Do you mean my stats patch needs more performance testing on other
> platforms? Yes, I agree with that.

Yes.

>> That having been said, if we're going to do this, this is probably
>> the right approach, because it only calls gettimeofday() in the case
>> where the lock acquisition is contended, and that is a lot cheaper
>> than calling it in all cases. [...]
>
> Thanks for the suggestion for a better implementation. As I mentioned
> above, I'm always looking for a better idea and solution to meet our
> purpose.

Actually, I meant that your existing implementation seemed to be making
some good decisions.

> Here, I want to share my concern with you again. PostgreSQL is
> getting more complicated in order to improve performance and
> stability, and I think that is good news. But it is also getting more
> difficult to understand without deep knowledge nowadays, and that is
> rather bad news.
>
> From my point of view, that's a huge hurdle for educating DBAs and
> expanding the PostgreSQL user base.

Yes, there is definitely more work to be done here.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
2012/10/20 2:45, Tom Lane wrote:
> Satoshi Nagayasu <snaga@uptime.jp> writes:
>> Ok, I guess we have reached a consensus to have "some flight
>> recorder". Right?
>
> I remain unconvinced. I think the argument that this will promote
> novice understanding is complete hogwash: making any sense of
> lwlock-level stats will require deep PG understanding, not create it.
> And despite Jeff's optimistic views upthread, I don't think very many
> people have or will acquire such understanding. So I'm really
> resistant to accepting even minimal overhead in pursuit of this goal
> --- and I do not believe the overhead will be minimal, either.

As I wrote "some", I'm actually not pushing the "lwlock stats" feature
itself in this context, though I still believe it would be useful. (I
don't think it is hogwash, because I needed it.)

I'm just thinking about how to help the DBA understand PostgreSQL's
behavior, without diving deep into the code, when he/she faces some
kind of performance issue - and also how to make it easier for PG
experts, including tech support people, to help newbies.

As I posted before, I once investigated WAL performance when I was
faced with random commit delays, and I found a problem on the storage
device by observing WALInsertLock and WALWriteLock statistics with
SystemTap. How could I have done that without SystemTap on the
production database? SystemTap would kill performance (more than my
patch), and sometimes the process, too.

Do you really think that we do not need any kind of lock statistics at
all?

Regards,
--
Satoshi Nagayasu <snaga@uptime.jp>
Uptime Technologies, LLC. http://www.uptime.jp
On Sat, Oct 20, 2012 at 1:03 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
> [...]
> Ok, I guess we have reached a consensus to have "some flight
> recorder". Right?

Yes, at least I agree with adding that. But whatever is added, if it
may have any impact on performance, an on-off switch must be provided.
Also, the default value of the switch should be off until we have
implemented a low-overhead instrument and agreed to enable it by
default.

Regards,

--
Fujii Masao
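One possible shape for such a switch, reusing the hypothetical
LWLockCounter struct from the sketch above. The trace_lwlock name
follows the parameter suggested upthread, but GUC registration and the
surrounding LWLockAcquire() wait loop are omitted; this is an
illustration, not the posted patch. When the switch is off, the only
added cost is a single well-predicted branch and no clock reads:
--------------------------------------------------
#include "postgres.h"
#include "portability/instr_time.h"
#include "storage/proc.h"

bool        trace_lwlock = false;   /* GUC, off by default */

/*
 * Hypothetical helper for the contended path of LWLockAcquire():
 * block on the process semaphore, timing the wait only if enabled.
 */
static void
lwlock_wait_instrumented(LWLockCounter *counter)
{
    instr_time  start_time,
                end_time;

    if (!trace_lwlock)
    {
        PGSemaphoreLock(&MyProc->sem, false);   /* just block */
        return;
    }

    INSTR_TIME_SET_CURRENT(start_time);
    PGSemaphoreLock(&MyProc->sem, false);
    INSTR_TIME_SET_CURRENT(end_time);
    INSTR_TIME_SUBTRACT(end_time, start_time);

    counter->waits++;
    counter->wait_time_us += INSTR_TIME_GET_MICROSEC(end_time);
}
--------------------------------------------------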
Robert Haas escribió:
> On Fri, Oct 19, 2012 at 11:52 AM, Satoshi Nagayasu <snaga@uptime.jp> wrote:
>> I agree that such a performance instrument needs to be improved if it
>> has a critical performance impact on production environments. So I'm
>> still looking for a better implementation that decreases the
>> performance impact.
>
> Frankly, I think the approach of simply providing an "off" switch is
> probably a good one. I admit that it would be nice if we could run
> with instrumentation like this all the time - but until very fast
> time-lookups become ubiquitous, we can't.
>
> Another architectural problem here is that I believe this will
> increase the size of the stats file, which I think is going to cause
> pain for some people. I suspect that's going to be an issue even if
> we have an "off" switch. I think somebody's going to have to figure
> out a way to split that file up somehow.

I am closing this patch as Returned with Feedback for now. Hopefully a
version can be submitted for the next commitfest with the "off" switch,
but I am not sure about burdening the submitter with the responsibility
of splitting the stats file.

Regarding Tom's objection to the fundamental issue of providing lwlock
data, I agree that maybe it's the wrong layer to be measuring to
provide data to DBAs, but providing no data is worse, because then even
PG developers cannot know what the real bottlenecks are; and it's hard
to see what other layer we would need to be measuring. Maybe this can
serve as a foundation for discovering useful things to provide in the
future.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Regarding Tom's objection to the fundamental issue of providing lwlock
> data, I agree that maybe it's the wrong layer to be measuring to
> provide data to DBAs, but providing no data is worse, because then
> even PG developers cannot know what the real bottlenecks are; and it's
> hard to see what other layer we would need to be measuring. Maybe this
> can serve as a foundation for discovering useful things to provide in
> the future.

FWIW, I am not objecting to having the *ability* to collect such data.
I am questioning the usefulness/wisdom of having it turned on by
default, and I am also concerned about whether there is residual
overhead even when it's not turned on.

			regards, tom lane