Thread: CPU usage 100% on slave. s_lock problem.
Hello.
There are two identical servers: Dell PowerEdge™ R720, dual Intel® Xeon® E5-2620 hexa-core CPUs, 256 GB RAM, RAID-10 of 8 x 600 GB SAS 6 Gb/s 15,000 rpm disks.
$ cat /etc/fedora-release
Fedora release 19
$ postgres --version
postgres (PostgreSQL) 9.2.4
Data is ~220 GB and indexes are ~140 GB.
iowait is ~0.2-0.5. Disk activity is writes only, at ~0-2 MB/sec.
PgBouncer is installed on each server; the pool size is 24.
On the master at peak load: ~1200 requests/sec, ~30 ms/request average, all 24 CPUs at ~95%; this is not a problem.
$ perf top
21,14% [kernel] [k] isolate_freepages_block
12,27% [unknown] [.] 0x00007fc1bb303be0
5,93% postgres [.] hash_search_with_hash_value
4,85% libbz2.so.1.0.6 [.] 0x000000000000a6e0
2,70% postgres [.] PinBuffer
2,34% postgres [.] slot_deform_tuple
1,92% libbz2.so.1.0.6 [.] BZ2_compressBlock
1,85% postgres [.] LWLockAcquire
1,69% postgres [.] heap_page_prune_opt
1,48% postgres [.] _bt_checkkeys
1,40% [kernel] [k] page_fault
1,36% postgres [.] _bt_compare
1,23% postgres [.] heap_hot_search_buffer
1,19% [kernel] [k] get_pageblock_flags_group
1,18% postgres [.] AllocSetAlloc
On the slave at most ~400-500 requests/sec, ~200 ms/request average and up, all 24 CPUs at ~95%; this is the problem.
$ perf top
30,10% postgres [.] s_lock
22,90% [unknown] [.] 0x0000000000729cfe
4,98% postgres [.] RecoveryInProgress.part.9
4,89% postgres [.] LWLockAcquire
4,57% postgres [.] hash_search_with_hash_value
3,50% postgres [.] PinBuffer
2,31% postgres [.] heap_page_prune_opt
2,27% postgres [.] LWLockRelease
1,18% postgres [.] heap_hot_search_buffer
1,03% postgres [.] AllocSetAlloc
...
At a much lower load than the master, the slave gets stuck in the s_lock function. What can be done about it?
On the slave at ~300 requests/sec, ~5-8 ms/request average, CPU usage ~20%; it works normally under this light load.
$ perf top
10,74% postgres [.] hash_search_with_hash_value
4,94% postgres [.] PinBuffer
4,61% postgres [.] AllocSetAlloc
3,57% postgres [.] heap_page_prune_opt
3,24% postgres [.] LWLockAcquire
2,47% postgres [.] heap_hot_search_buffer
2,11% postgres [.] SearchCatCache
1,90% postgres [.] LWLockRelease
1,87% postgres [.] _bt_compare
1,68% postgres [.] FunctionCall2Coll
1,46% postgres [.] _bt_checkkeys
1,24% postgres [.] copyObject
1,15% postgres [.] RecoveryInProgress.part.9
1,05% postgres [.] slot_deform_tuple
...
Master configuration (postgresql.conf):
listen_addresses = '*'
max_connections = 100
shared_buffers = 200GB
work_mem = 20MB
maintenance_work_mem = 2GB
effective_io_concurrency = 4
wal_level = hot_standby
fsync = on
synchronous_commit = off
full_page_writes = on
wal_writer_delay = 200ms
checkpoint_segments = 100
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
archive_mode = on
archive_command = 'pbzip2 -f -c %p > /opt/pg/wals/wals/%f.bz2'
max_wal_senders = 3
random_page_cost = 0.5
cpu_tuple_cost = 0.02
cpu_index_tuple_cost = 0.01
cpu_operator_cost = 0.005
effective_cache_size = 40GB
default_statistics_target = 300
logging_collector = on
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_min_duration_statement = 1000
log_checkpoints = on
log_line_prefix = '%t %p %c-%l %x %q(%u, %d, %r, %a) '
log_lock_waits = on
track_io_timing = on
track_activity_query_size = 4096
autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_freeze_max_age = 1500000000
datestyle = 'iso, dmy'
timezone = 'Europe/Moscow'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'ru_RU.UTF-8'
lc_numeric = 'ru_RU.UTF-8'
lc_time = 'ru_RU.UTF-8'
default_text_search_config = 'pg_catalog.russian'
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 1000
pg_stat_statements.track = all
max_locks_per_transaction = 264
Slave configuration (postgresql.conf):
listen_addresses = '*'
max_connections = 100
shared_buffers = 200GB
work_mem = 20MB
maintenance_work_mem = 2GB
effective_io_concurrency = 4
wal_level = hot_standby
fsync = on
synchronous_commit = off
full_page_writes = on
wal_writer_delay = 200ms
commit_delay = 1000
commit_siblings = 2
checkpoint_segments = 100
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
archive_mode = on
archive_command = 'pbzip2 -f -c %p > /opt/pg/wals/wals/%f.bz2'
max_wal_senders = 4
hot_standby = on
max_standby_archive_delay = 30s
max_standby_streaming_delay = 30s
hot_standby_feedback = on
random_page_cost = 0.5
cpu_tuple_cost = 0.02
cpu_index_tuple_cost = 0.01
cpu_operator_cost = 0.005
effective_cache_size = 40GB
default_statistics_target = 300
logging_collector = on
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_min_duration_statement = 1000
log_checkpoints = on
log_line_prefix = '%t %p %c-%l %x %q(%u, %d, %r, %a) '
log_lock_waits = on
track_functions = none
track_io_timing = on
track_activity_query_size = 4096
autovacuum = on
datestyle = 'iso, dmy'
timezone = 'Europe/Moscow'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'ru_RU.UTF-8'
lc_numeric = 'ru_RU.UTF-8'
lc_time = 'ru_RU.UTF-8'
default_text_search_config = 'pg_catalog.russian'
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 1000
pg_stat_statements.track = all
max_locks_per_transaction = 264
Thanks.
On Tue, Aug 27, 2013 at 2:57 AM, Дмитрий Дегтярёв <degtyaryov@gmail.com> wrote:
> [...]
> $ perf top
>  30,10%  postgres   [.] s_lock
>  22,90%  [unknown]  [.] 0x0000000000729cfe
>   4,98%  postgres   [.] RecoveryInProgress.part.9
>   4,89%  postgres   [.] LWLockAcquire
>   4,57%  postgres   [.] hash_search_with_hash_value
>   3,50%  postgres   [.] PinBuffer
>   2,31%  postgres   [.] heap_page_prune_opt

It looks like you're hitting spinlock contention inside
heap_page_prune_opt(), which is commented:

 * Note: this is called quite often. It's important that it fall out quickly
 * if there's not any use in pruning.

This in turn calls RecoveryInProgress(), which spinlocks in order to
get a guaranteed result. At that call site, we are told:

    /*
     * We can't write WAL in recovery mode, so there's no point trying to
     * clean the page. The master will likely issue a cleaning WAL record soon
     * anyway, so this is no particular loss.
     */

So ISTM it's not necessary to pedantically check RecoveryInProgress() on
each and every call of this routine (or at least, we should be able to
reduce the number of spinlocks).

Hm, what if we exposed LocalRecoveryInProgress() through a function
which would approximately satisfy the condition
"MightRecoveryInProgress()", on the basis that the condition only moves in
one direction? That could lead to an optimization around the spinlock in
hot-path cases like this, where getting 'TRUE' incorrectly is mostly
harmless...

merlin
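To make the idea concrete, here is a standalone sketch (plain C11, not PostgreSQL code and not the patch from this thread) of the pattern being proposed: a flag that only ever moves from true to false can be consulted with an unlocked read, because a stale "still true" answer only errs on the harmless side.

    /* Standalone illustration of the one-direction-flag optimization. */
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    static atomic_bool shared_recovery_in_progress = true;  /* shared state */
    static bool local_recovery_in_progress = true;          /* per-backend cache */

    static bool
    recovery_might_be_in_progress(void)
    {
        /* Once the locked, authoritative path has seen "false" and cached it,
         * no further shared-memory read is needed: recovery cannot restart. */
        if (!local_recovery_in_progress)
            return false;

        /* Unlocked read: may lag the authoritative check, never races ahead. */
        return atomic_load_explicit(&shared_recovery_in_progress,
                                    memory_order_relaxed);
    }

    int
    main(void)
    {
        printf("%d\n", recovery_might_be_in_progress());   /* 1: maybe */
        atomic_store(&shared_recovery_in_progress, false); /* recovery ends */
        printf("%d\n", recovery_might_be_in_progress());   /* 0 */
        return 0;
    }

Note that the demo deliberately never writes local_recovery_in_progress inside the fast-path function; in the real code that caching is left to the normal, locked routine.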
On Tue, Aug 27, 2013 at 8:23 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> [...]
> Hm, what if we exposed LocalRecoveryInProgress() through a function
> which would approximately satisfy the condition
> "MightRecoveryInProgress()", on the basis that the condition only moves in
> one direction? That could lead to an optimization around the spinlock in
> hot-path cases like this, where getting 'TRUE' incorrectly is mostly
> harmless...

More specifically, this hypothetical routine would query
xlogctl->SharedRecoveryInProgress without taking a lock and would not
issue InitXLOGAccess(). RecoveryInProgress() seems to be called
everywhere (in particular: StartTransaction()), so I don't think
there's a lot of risk in terms of losing access to the xlog.

merlin
On Tue, Aug 27, 2013 at 8:38 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> [...]
> More specifically, this hypothetical routine would query
> xlogctl->SharedRecoveryInProgress without taking a lock and would not
> issue InitXLOGAccess(). RecoveryInProgress() seems to be called
> everywhere (in particular: StartTransaction()), so I don't think
> there's a lot of risk in terms of losing access to the xlog.

Something like the attached. Note, this patch is for research
purposes only and should *not* be applied to your production
environment.

merlin
Attachment
On Tue, Aug 27, 2013 at 9:12 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> Something like the attached. Note, this patch is for research
> purposes only and should *not* be applied to your production
> environment.

Here is a revised version without the spurious whitespace edit.

merlin
Attachment
On 2013-08-27 09:57:38 -0500, Merlin Moncure wrote:
> + bool
> + RecoveryMightBeInProgress(void)
> + {
> +	/*
> +	 * We check shared state each time only until we leave recovery mode. We
> +	 * can't re-enter recovery, so there's no need to keep checking after the
> +	 * shared variable has once been seen false.
> +	 */
> +	if (!LocalRecoveryInProgress)
> +		return false;
> +	else
> +	{
> +		/* use volatile pointer to prevent code rearrangement */
> +		volatile XLogCtlData *xlogctl = XLogCtl;
> +
> +		/* Intentionally query xlogctl without spinlocking! */
> +		LocalRecoveryInProgress = xlogctl->SharedRecoveryInProgress;
> +
> +		return LocalRecoveryInProgress;
> +	}
> + }

I don't think it's acceptable to *set* LocalRecoveryInProgress here.
That should only be done in the normal routine.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Aug 27, 2013 at 10:55 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-08-27 09:57:38 -0500, Merlin Moncure wrote:
> [...]
> I don't think it's acceptable to *set* LocalRecoveryInProgress here.
> That should only be done in the normal routine.

quite right -- that was a major error -- you could bypass the
initialization call to the xlog with some bad luck.

merlin
Attachment
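Putting the quoted hunk together with Andres's objection, the revised attachment presumably keeps the unlocked read but no longer assigns the local flag. A sketch of how that routine might then look (this is a guess at the attachment's contents, relying on xlog.c internals, not a verbatim copy of it):

    bool
    RecoveryMightBeInProgress(void)
    {
    	/*
    	 * Recovery can end but never restart, so once the local flag has been
    	 * seen false we can answer without touching shared memory.
    	 */
    	if (!LocalRecoveryInProgress)
    		return false;
    	else
    	{
    		/* use volatile pointer to prevent code rearrangement */
    		volatile XLogCtlData *xlogctl = XLogCtl;

    		/*
    		 * Unlocked read of the shared flag.  Do not cache the result in
    		 * LocalRecoveryInProgress here; that is left to RecoveryInProgress(),
    		 * which also performs InitXLOGAccess() when recovery actually ends.
    		 */
    		return xlogctl->SharedRecoveryInProgress;
    	}
    }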
Hello.

We have not been able to reproduce this problem on test servers, and we do not dare apply this patch to the production servers.
While investigating the problem, we found that many queries run several times slower on the slave. On the master, heap_hot_search_buffer executes 100 loop iterations for a given query; on the slave, the same query with the same plan executes 2000 iterations of heap_hot_search_buffer.
We were also able to reproduce the problem on the master and saw s_lock there as well during the slow queries.
We have solved this problem. A large number of queries used 4 frequently changing indexes, and 99% of the tuples in these indexes were dead. Autovacuum, and even VACUUM FULL, could not remove these tuples because of autovacuum_freeze_max_age.
We've added a cron job that, 2-3 times a day, runs CREATE INDEX CONCURRENTLY idx_name_new; DROP INDEX CONCURRENTLY idx_name; ALTER INDEX idx_name_new RENAME TO idx_name; for these 4 indexes.
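As an illustration only: the table, column, and index names below are placeholders (the thread does not show the real definitions), and the full index definition has to be repeated in the CREATE INDEX CONCURRENTLY step.

    -- Rebuild one bloated index without blocking writers (placeholder names).
    -- Note: CONCURRENTLY cannot run inside a transaction block.
    CREATE INDEX CONCURRENTLY idx_name_new ON my_table (my_column);
    DROP INDEX CONCURRENTLY idx_name;
    ALTER INDEX idx_name_new RENAME TO idx_name;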
As a result, s_lock no longer appears in the perf top output.

2013/8/29 Merlin Moncure <mmoncure@gmail.com>:
> so -- are you in a position where you might be able to test this patch?
>
> merlin
Hi,

On 2013-09-17 17:55:01 +0600, Дмитрий Дегтярёв wrote:
> [...]
> We were also able to reproduce the problem on the master and saw s_lock
> there as well during the slow queries.

What you describe is normally an indication that you have too many
long-running transactions around, preventing HOT pruning from working.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
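For reference, one rough way to look for such transactions on 9.2 (pg_stat_activity there already exposes pid, state, and xact_start) might be a query along these lines, run on the busy server:

    -- Oldest open transactions; long-lived ones hold back HOT pruning and vacuum.
    SELECT pid, usename, state,
           now() - xact_start AS xact_age,
           query
    FROM pg_stat_activity
    WHERE xact_start IS NOT NULL
    ORDER BY xact_start
    LIMIT 20;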
On Tue, Sep 17, 2013 at 6:59 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> [...]
> What you describe is normally an indication that you have too many
> long-running transactions around, preventing HOT pruning from working.

Do you think it's worth submitting the lock avoidance patch for formal review?

merlin
On 2013-09-17 08:18:54 -0500, Merlin Moncure wrote:
> Do you think it's worth submitting the lock avoidance patch for formal review?

You mean the bufmgr.c thing? Generally I think that that code needs a
good deal of scalability work - there's a whole thread about it
somewhere. But TBH the theories you've voiced about the issues you've
seen haven't convinced me so far.

If you can manage to prove it has a benefit in some case that's
reproducible - why not go ahead?

Quick question: Do you happen to have pg_locks output from back then
around? We've recently found servers going into somewhat similar
slowdowns because they exhausted the fastpath locks, which made lwlocks
far more expensive and made s_lock go up very high in the profile.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
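For anyone wanting to check this on their own system, a rough query of this shape (run while the server is under load; pg_locks has a fastpath column since 9.2) shows how many locks have spilled out of the per-backend fast-path slots:

    -- fastpath = false means the lock went through the shared lock table,
    -- which is the more expensive path being described here.
    SELECT fastpath, mode, count(*)
    FROM pg_locks
    GROUP BY fastpath, mode
    ORDER BY count(*) DESC;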
On Tue, Sep 17, 2013 at 8:24 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-09-17 08:18:54 -0500, Merlin Moncure wrote:
>> Do you think it's worth submitting the lock avoidance patch for formal review?
>
> You mean the bufmgr.c thing? Generally I think that that code needs a
> good deal of scalability work - there's a whole thread about it
> somewhere. But TBH the theories you've voiced about the issues you've
> seen haven't convinced me so far.

er, no (but I share your skepticism -- my challenge right now is to
demonstrate measurable benefit, which so far I've been unable to do).
I was talking about the patch on *this* thread, which bypasses the
s_lock in RecoveryInProgress() :-).

> Quick question: Do you happen to have pg_locks output from back then
> around? We've recently found servers going into somewhat similar
> slowdowns because they exhausted the fastpath locks, which made lwlocks
> far more expensive and made s_lock go up very high in the profile.

I do. Unfortunately I don't have profile info. Not sure how useful
it is -- I'll send it off-list.

merlin
On 2013-09-17 08:32:30 -0500, Merlin Moncure wrote:
> [...]
> er, no (but I share your skepticism -- my challenge right now is to
> demonstrate measurable benefit, which so far I've been unable to do).
> I was talking about the patch on *this* thread, which bypasses the
> s_lock in RecoveryInProgress() :-).

Ah, yes. Sorry, confused issues ;). Yes, I think that'd make sense.

> I do. Unfortunately I don't have profile info. Not sure how useful
> it is -- I'll send it off-list.

Great.

The primary thing I'd like to know is whether there are lots of
non-fastpath locks...

If you ever get into the situation I mistakenly referred to again, I'd
strongly suggest recompiling postgres with -fno-omit-frame-pointer. That
makes hierarchical profiles actually useful, which can help tremendously
with diagnosing issues like this...

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
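A minimal sketch of that workflow, with the install prefix, job count, target PID, and sampling window as assumptions rather than anything from the thread:

    # Build PostgreSQL with frame pointers kept so perf can walk the call stack.
    ./configure --prefix=/usr/local/pgsql CFLAGS="-O2 -fno-omit-frame-pointer"
    make -j12 && make install

    # While the slowdown is happening, capture a call-graph profile of one busy
    # backend instead of the flat system-wide view that "perf top" gives.
    perf record -g -p <backend_pid> -- sleep 30
    perf report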
On Tue, Sep 17, 2013 at 8:35 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> [...]
> If you ever get into the situation I mistakenly referred to again, I'd
> strongly suggest recompiling postgres with -fno-omit-frame-pointer. That
> makes hierarchical profiles actually useful, which can help tremendously
> with diagnosing issues like this...

We may get an opportunity to do that. I'm curious enough about the
THP compaction issues that Kevin mentioned to maybe consider
cranking buffers again. If I do that, it will be with strict
instructions to the site operators to catch a profile before taking
further action.

merlin
On 2013-09-17 08:40:23 -0500, Merlin Moncure wrote:
> We may get an opportunity to do that. I'm curious enough about the
> THP compaction issues that Kevin mentioned to maybe consider
> cranking buffers again. If I do that, it will be with strict
> instructions to the site operators to catch a profile before taking
> further action.

The THP issues should be very clearly diagnosable because a good part
of the time will be spent in the kernel. Lots of spinlocking there, but
the function names are easily discernible from pg's code.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-08-27 12:17:55 -0500, Merlin Moncure wrote:
> On Tue, Aug 27, 2013 at 10:55 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > I don't think it's acceptable to *set* LocalRecoveryInProgress here.
> > That should only be done in the normal routine.
>
> quite right -- that was a major error -- you could bypass the
> initialization call to the xlog with some bad luck.

I've seen this in profiles since, so I'd appreciate pushing this
forward.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Sep 26, 2013 at 6:08 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> [...]
> I've seen this in profiles since, so I'd appreciate pushing this
> forward.

roger that -- will push ahead when I get into the office...

merlin
On Thu, Sep 26, 2013 at 10:14 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> [...]
> roger that -- will push ahead when I get into the office...

attached is new version fixing some comment typos.

merlin