Thread: CPU usage 100% on slave. s_lock problem.
Hello.
There are two identical servers: Dell PowerEdge™ R720, dual Intel® Xeon® E5-2620 hexa-core CPUs, 256 GB RAM, RAID-10 of 8 x 600 GB SAS 6 Gb/s 15,000 rpm disks.
$ cat /etc/fedora-release
Fedora release 19
$ postgres --version
postgres (PostgreSQL) 9.2.4
Data is ~220 GB and indexes are ~140 GB.
iowait is ~0.2-0.5. Disk activity is writes only, at ~0-2 MB/sec.
PgBouncer is installed on each server; the pool size is 24.
On the master at peak load: ~1200 requests/sec, ~30 ms/request average, all 24 CPUs at ~95%; this is not a problem.
$ perf top
21,14% [kernel] [k] isolate_freepages_block
12,27% [unknown] [.] 0x00007fc1bb303be0
5,93% postgres [.] hash_search_with_hash_value
4,85% libbz2.so.1.0.6 [.] 0x000000000000a6e0
2,70% postgres [.] PinBuffer
2,34% postgres [.] slot_deform_tuple
1,92% libbz2.so.1.0.6 [.] BZ2_compressBlock
1,85% postgres [.] LWLockAcquire
1,69% postgres [.] heap_page_prune_opt
1,48% postgres [.] _bt_checkkeys
1,40% [kernel] [k] page_fault
1,36% postgres [.] _bt_compare
1,23% postgres [.] heap_hot_search_buffer
1,19% [kernel] [k] get_pageblock_flags_group
1,18% postgres [.] AllocSetAlloc
On the slave at most ~400-500 requests/sec, ~200 ms/request average and up, all 24 CPUs at ~95%; this is the problem.
$ perf top
30,10% postgres [.] s_lock
22,90% [unknown] [.] 0x0000000000729cfe
4,98% postgres [.] RecoveryInProgress.part.9
4,89% postgres [.] LWLockAcquire
4,57% postgres [.] hash_search_with_hash_value
3,50% postgres [.] PinBuffer
2,31% postgres [.] heap_page_prune_opt
2,27% postgres [.] LWLockRelease
1,18% postgres [.] heap_hot_search_buffer
1,03% postgres [.] AllocSetAlloc
...
At a much lower load than the master, the slave gets stuck in the s_lock function. What can be done about it?
On the slave at ~300 requests/sec, ~5-8 ms/request average, CPU usage ~20%; it works normally under this light load.
$ perf top
10,74% postgres [.] hash_search_with_hash_value
4,94% postgres [.] PinBuffer
4,61% postgres [.] AllocSetAlloc
3,57% postgres [.] heap_page_prune_opt
3,24% postgres [.] LWLockAcquire
2,47% postgres [.] heap_hot_search_buffer
2,11% postgres [.] SearchCatCache
1,90% postgres [.] LWLockRelease
1,87% postgres [.] _bt_compare
1,68% postgres [.] FunctionCall2Coll
1,46% postgres [.] _bt_checkkeys
1,24% postgres [.] copyObject
1,15% postgres [.] RecoveryInProgress.part.9
1,05% postgres [.] slot_deform_tuple
...
Master configuration (postgresql.conf):
listen_addresses = '*'
max_connections = 100
shared_buffers = 200GB
work_mem = 20MB
maintenance_work_mem = 2GB
effective_io_concurrency = 4
wal_level = hot_standby
fsync = on
synchronous_commit = off
full_page_writes = on
wal_writer_delay = 200ms
checkpoint_segments = 100
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
archive_mode = on
archive_command = 'pbzip2 -f -c %p > /opt/pg/wals/wals/%f.bz2'
max_wal_senders = 3
random_page_cost = 0.5
cpu_tuple_cost = 0.02
cpu_index_tuple_cost = 0.01
cpu_operator_cost = 0.005
effective_cache_size = 40GB
default_statistics_target = 300
logging_collector = on
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_min_duration_statement = 1000
log_checkpoints = on
log_line_prefix = '%t %p %c-%l %x %q(%u, %d, %r, %a) '
log_lock_waits = on
track_io_timing = on
track_activity_query_size = 4096
autovacuum = on
log_autovacuum_min_duration = 0
autovacuum_freeze_max_age = 1500000000
datestyle = 'iso, dmy'
timezone = 'Europe/Moscow'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'ru_RU.UTF-8'
lc_numeric = 'ru_RU.UTF-8'
lc_time = 'ru_RU.UTF-8'
default_text_search_config = 'pg_catalog.russian'
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 1000
pg_stat_statements.track = all
max_locks_per_transaction = 264
Slave configuration (postgresql.conf):
listen_addresses = '*'
max_connections = 100
shared_buffers = 200GB
work_mem = 20MB
maintenance_work_mem = 2GB
effective_io_concurrency = 4
wal_level = hot_standby
fsync = on
synchronous_commit = off
full_page_writes = on
wal_writer_delay = 200ms
commit_delay = 1000
commit_siblings = 2
checkpoint_segments = 100
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
archive_mode = on
archive_command = 'pbzip2 -f -c %p > /opt/pg/wals/wals/%f.bz2'
max_wal_senders = 4
hot_standby = on
max_standby_archive_delay = 30s
max_standby_streaming_delay = 30s
hot_standby_feedback = on
random_page_cost = 0.5
cpu_tuple_cost = 0.02
cpu_index_tuple_cost = 0.01
cpu_operator_cost = 0.005
effective_cache_size = 40GB
default_statistics_target = 300
logging_collector = on
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 0
log_min_duration_statement = 1000
log_checkpoints = on
log_line_prefix = '%t %p %c-%l %x %q(%u, %d, %r, %a) '
log_lock_waits = on
track_functions = none
track_io_timing = on
track_activity_query_size = 4096
autovacuum = on
datestyle = 'iso, dmy'
timezone = 'Europe/Moscow'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'ru_RU.UTF-8'
lc_numeric = 'ru_RU.UTF-8'
lc_time = 'ru_RU.UTF-8'
default_text_search_config = 'pg_catalog.russian'
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 1000
pg_stat_statements.track = all
max_locks_per_transaction = 264
Thanks.
On Tue, Aug 27, 2013 at 2:57 AM, Дмитрий Дегтярёв <degtyaryov@gmail.com> wrote:
> [...]
> $ perf top
>  30,10%  postgres   [.] s_lock
>  22,90%  [unknown]  [.] 0x0000000000729cfe
>   4,98%  postgres   [.] RecoveryInProgress.part.9
>   4,89%  postgres   [.] LWLockAcquire
>   4,57%  postgres   [.] hash_search_with_hash_value
>   3,50%  postgres   [.] PinBuffer
>   2,31%  postgres   [.] heap_page_prune_opt

It looks like you're hitting spinlock contention inside
heap_page_prune_opt(), which is commented:

 * Note: this is called quite often. It's important that it fall out quickly
 * if there's not any use in pruning.

This in turn calls RecoveryInProgress(), which spinlocks in order to
get a guaranteed result. At that call site, we are told:

    /*
     * We can't write WAL in recovery mode, so there's no point trying to
     * clean the page. The master will likely issue a cleaning WAL record soon
     * anyway, so this is no particular loss.
     */

So ISTM it's not necessary to pedantically check RecoveryInProgress() on
each and every call of this routine (or at least, we should be able to
reduce the number of spinlocks).

Hm, what if we exposed LocalRecoveryInProgress() through a function
which would approximately satisfy the condition
"MightRecoveryInProgress()", on the basis that the condition only moves in
one direction? That could lead to an optimization around the spinlock in
hot-path cases like this, where getting 'TRUE' incorrectly is mostly
harmless...

merlin
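To make the idea concrete, here is a standalone sketch (plain C11, not PostgreSQL code and not the patch from this thread) of the pattern being proposed: a flag that only ever moves from true to false can be consulted with an unlocked read, because a stale "still true" answer only errs on the harmless side.

    /* Standalone illustration of the one-direction-flag optimization. */
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    static atomic_bool shared_recovery_in_progress = true;  /* shared state */
    static bool local_recovery_in_progress = true;          /* per-backend cache */

    static bool
    recovery_might_be_in_progress(void)
    {
        /* Once the locked, authoritative path has seen "false" and cached it,
         * no further shared-memory read is needed: recovery cannot restart. */
        if (!local_recovery_in_progress)
            return false;

        /* Unlocked read: may lag the authoritative check, never races ahead. */
        return atomic_load_explicit(&shared_recovery_in_progress,
                                    memory_order_relaxed);
    }

    int
    main(void)
    {
        printf("%d\n", recovery_might_be_in_progress());   /* 1: maybe */
        atomic_store(&shared_recovery_in_progress, false); /* recovery ends */
        printf("%d\n", recovery_might_be_in_progress());   /* 0 */
        return 0;
    }

Note that the demo deliberately never writes local_recovery_in_progress inside the fast-path function; in the real code that caching is left to the normal, locked routine.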
On Tue, Aug 27, 2013 at 8:23 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> [...]
> Hm, what if we exposed LocalRecoveryInProgress() through a function
> which would approximately satisfy the condition
> "MightRecoveryInProgress()", on the basis that the condition only moves in
> one direction? That could lead to an optimization around the spinlock in
> hot-path cases like this, where getting 'TRUE' incorrectly is mostly
> harmless...

More specifically, this hypothetical routine would query
xlogctl->SharedRecoveryInProgress without taking a lock and would not
issue InitXLOGAccess(). RecoveryInProgress() seems to be called
everywhere (in particular: StartTransaction()), so I don't think
there's a lot of risk in terms of losing access to the xlog.

merlin
On Tue, Aug 27, 2013 at 8:38 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> [...]
> More specifically, this hypothetical routine would query
> xlogctl->SharedRecoveryInProgress without taking a lock and would not
> issue InitXLOGAccess(). RecoveryInProgress() seems to be called
> everywhere (in particular: StartTransaction()), so I don't think
> there's a lot of risk in terms of losing access to the xlog.

Something like the attached. Note, this patch is for research
purposes only and should *not* be applied to your production
environment.

merlin
Attachment
On Tue, Aug 27, 2013 at 9:12 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> Something like the attached. Note, this patch is for research
> purposes only and should *not* be applied to your production
> environment.

Here is a revised version without the spurious whitespace edit.

merlin
Attachment
On 2013-08-27 09:57:38 -0500, Merlin Moncure wrote:
> + bool
> + RecoveryMightBeInProgress(void)
> + {
> +	/*
> +	 * We check shared state each time only until we leave recovery mode. We
> +	 * can't re-enter recovery, so there's no need to keep checking after the
> +	 * shared variable has once been seen false.
> +	 */
> +	if (!LocalRecoveryInProgress)
> +		return false;
> +	else
> +	{
> +		/* use volatile pointer to prevent code rearrangement */
> +		volatile XLogCtlData *xlogctl = XLogCtl;
> +
> +		/* Intentionally query xlogctl without spinlocking! */
> +		LocalRecoveryInProgress = xlogctl->SharedRecoveryInProgress;
> +
> +		return LocalRecoveryInProgress;
> +	}
> + }

I don't think it's acceptable to *set* LocalRecoveryInProgress here.
That should only be done in the normal routine.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Aug 27, 2013 at 10:55 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-08-27 09:57:38 -0500, Merlin Moncure wrote:
> [...]
> I don't think it's acceptable to *set* LocalRecoveryInProgress here.
> That should only be done in the normal routine.

quite right -- that was a major error -- you could bypass the
initialization call to the xlog with some bad luck.

merlin
Attachment
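Putting the quoted hunk together with Andres's objection, the revised attachment presumably keeps the unlocked read but no longer assigns the local flag. A sketch of how that routine might then look (this is a guess at the attachment's contents, relying on xlog.c internals, not a verbatim copy of it):

    bool
    RecoveryMightBeInProgress(void)
    {
    	/*
    	 * Recovery can end but never restart, so once the local flag has been
    	 * seen false we can answer without touching shared memory.
    	 */
    	if (!LocalRecoveryInProgress)
    		return false;
    	else
    	{
    		/* use volatile pointer to prevent code rearrangement */
    		volatile XLogCtlData *xlogctl = XLogCtl;

    		/*
    		 * Unlocked read of the shared flag.  Do not cache the result in
    		 * LocalRecoveryInProgress here; that is left to RecoveryInProgress(),
    		 * which also performs InitXLOGAccess() when recovery actually ends.
    		 */
    		return xlogctl->SharedRecoveryInProgress;
    	}
    }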
Hello.

We have not been able to reproduce this problem on test servers, and we do not dare apply this patch to the production servers.
While investigating the problem, we found that many queries run several times slower on the slave. On the master, heap_hot_search_buffer executes 100 loop iterations for a given query; on the slave, the same query with the same plan executes 2000 iterations of heap_hot_search_buffer.
We were also able to reproduce the problem on the master and saw s_lock there as well during the slow queries.
We have solved this problem. A large number of queries used 4 frequently changing indexes, and 99% of the tuples in these indexes were dead. Autovacuum, and even VACUUM FULL, could not remove these tuples because of autovacuum_freeze_max_age.
We've added a cron job that, 2-3 times a day, runs CREATE INDEX CONCURRENTLY idx_name_new; DROP INDEX CONCURRENTLY idx_name; ALTER INDEX idx_name_new RENAME TO idx_name; for these 4 indexes.
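As an illustration only: the table, column, and index names below are placeholders (the thread does not show the real definitions), and the full index definition has to be repeated in the CREATE INDEX CONCURRENTLY step.

    -- Rebuild one bloated index without blocking writers (placeholder names).
    -- Note: CONCURRENTLY cannot run inside a transaction block.
    CREATE INDEX CONCURRENTLY idx_name_new ON my_table (my_column);
    DROP INDEX CONCURRENTLY idx_name;
    ALTER INDEX idx_name_new RENAME TO idx_name;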
As a result, s_lock no longer appears in the perf top output.

2013/8/29 Merlin Moncure <mmoncure@gmail.com>:
> so -- are you in a position where you might be able to test this patch?
>
> merlin
Hi,

On 2013-09-17 17:55:01 +0600, Дмитрий Дегтярёв wrote:
> [...]
> We were also able to reproduce the problem on the master and saw s_lock
> there as well during the slow queries.

What you describe is normally an indication that you have too many
long-running transactions around, preventing HOT pruning from working.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
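For reference, one rough way to look for such transactions on 9.2 (pg_stat_activity there already exposes pid, state, and xact_start) might be a query along these lines, run on the busy server:

    -- Oldest open transactions; long-lived ones hold back HOT pruning and vacuum.
    SELECT pid, usename, state,
           now() - xact_start AS xact_age,
           query
    FROM pg_stat_activity
    WHERE xact_start IS NOT NULL
    ORDER BY xact_start
    LIMIT 20;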
On Tue, Sep 17, 2013 at 6:59 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> [...]
> What you describe is normally an indication that you have too many
> long-running transactions around, preventing HOT pruning from working.

Do you think it's worth submitting the lock avoidance patch for formal review?

merlin
On 2013-09-17 08:18:54 -0500, Merlin Moncure wrote:
> Do you think it's worth submitting the lock avoidance patch for formal review?

You mean the bufmgr.c thing? Generally I think that that code needs a
good deal of scalability work - there's a whole thread about it
somewhere. But TBH the theories you've voiced about the issues you've
seen haven't convinced me so far.

If you can manage to prove it has a benefit in some case that's
reproducible - why not go ahead?

Quick question: Do you happen to have pg_locks output from back then
around? We've recently found servers going into somewhat similar
slowdowns because they exhausted the fastpath locks, which made lwlocks
far more expensive and made s_lock go up very high in the profile.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
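For anyone wanting to check this on their own system, a rough query of this shape (run while the server is under load; pg_locks has a fastpath column since 9.2) shows how many locks have spilled out of the per-backend fast-path slots:

    -- fastpath = false means the lock went through the shared lock table,
    -- which is the more expensive path being described here.
    SELECT fastpath, mode, count(*)
    FROM pg_locks
    GROUP BY fastpath, mode
    ORDER BY count(*) DESC;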
On Tue, Sep 17, 2013 at 8:24 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-09-17 08:18:54 -0500, Merlin Moncure wrote:
>> Do you think it's worth submitting the lock avoidance patch for formal review?
>
> You mean the bufmgr.c thing? Generally I think that that code needs a
> good deal of scalability work - there's a whole thread about it
> somewhere. But TBH the theories you've voiced about the issues you've
> seen haven't convinced me so far.

er, no (but I share your skepticism -- my challenge right now is to
demonstrate measurable benefit, which so far I've been unable to do).
I was talking about the patch on *this* thread, which bypasses the
s_lock in RecoveryInProgress() :-).

> Quick question: Do you happen to have pg_locks output from back then
> around? We've recently found servers going into somewhat similar
> slowdowns because they exhausted the fastpath locks, which made lwlocks
> far more expensive and made s_lock go up very high in the profile.

I do. Unfortunately I don't have profile info. Not sure how useful
it is -- I'll send it off-list.

merlin
On 2013-09-17 08:32:30 -0500, Merlin Moncure wrote:
> [...]
> er, no (but I share your skepticism -- my challenge right now is to
> demonstrate measurable benefit, which so far I've been unable to do).
> I was talking about the patch on *this* thread, which bypasses the
> s_lock in RecoveryInProgress() :-).

Ah, yes. Sorry, confused issues ;). Yes, I think that'd make sense.

> I do. Unfortunately I don't have profile info. Not sure how useful
> it is -- I'll send it off-list.

Great.

The primary thing I'd like to know is whether there are lots of
non-fastpath locks...

If you ever get into the situation I mistakenly referred to again, I'd
strongly suggest recompiling postgres with -fno-omit-frame-pointer. That
makes hierarchical profiles actually useful, which can help tremendously
with diagnosing issues like this...

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
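A minimal sketch of that workflow, with the install prefix, job count, target PID, and sampling window as assumptions rather than anything from the thread:

    # Build PostgreSQL with frame pointers kept so perf can walk the call stack.
    ./configure --prefix=/usr/local/pgsql CFLAGS="-O2 -fno-omit-frame-pointer"
    make -j12 && make install

    # While the slowdown is happening, capture a call-graph profile of one busy
    # backend instead of the flat system-wide view that "perf top" gives.
    perf record -g -p <backend_pid> -- sleep 30
    perf report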
On Tue, Sep 17, 2013 at 8:35 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> [...]
> If you ever get into the situation I mistakenly referred to again, I'd
> strongly suggest recompiling postgres with -fno-omit-frame-pointer. That
> makes hierarchical profiles actually useful, which can help tremendously
> with diagnosing issues like this...

We may get an opportunity to do that. I'm curious enough about the
THP compaction issues that Kevin mentioned to maybe consider
cranking buffers again. If I do that, it will be with strict
instructions to the site operators to catch a profile before taking
further action.

merlin
On 2013-09-17 08:40:23 -0500, Merlin Moncure wrote:
> We may get an opportunity to do that. I'm curious enough about the
> THP compaction issues that Kevin mentioned to maybe consider
> cranking buffers again. If I do that, it will be with strict
> instructions to the site operators to catch a profile before taking
> further action.

The THP issues should be very clearly diagnosable because a good part
of the time will be spent in the kernel. Lots of spinlocking there, but
the function names are easily discernible from pg's code.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-08-27 12:17:55 -0500, Merlin Moncure wrote:
> On Tue, Aug 27, 2013 at 10:55 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > I don't think it's acceptable to *set* LocalRecoveryInProgress here.
> > That should only be done in the normal routine.
>
> quite right -- that was a major error -- you could bypass the
> initialization call to the xlog with some bad luck.

I've seen this in profiles since, so I'd appreciate pushing this
forward.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Sep 26, 2013 at 6:08 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> [...]
> I've seen this in profiles since, so I'd appreciate pushing this
> forward.

roger that -- will push ahead when I get into the office...

merlin
On Thu, Sep 26, 2013 at 10:14 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> [...]
> roger that -- will push ahead when I get into the office...

attached is new version fixing some comment typos.

merlin