Home > mailing lists

Re: Write Ahead Logging for Hash Indexes - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: Write Ahead Logging for Hash Indexes
Date	September 7, 2016 09:58:49
Msg-id	CAA4eK1JS+SiRSQBzEFpnsSmxZKingrRH7WNyWULJeEJSj1-=0w@mail.gmail.com Whole thread
In response to	Re: Write Ahead Logging for Hash Indexes (Jeff Janes <jeff.janes@gmail.com>)
Responses	Re: Write Ahead Logging for Hash Indexes Re: Write Ahead Logging for Hash Indexes Re: Write Ahead Logging for Hash Indexes Re: Write Ahead Logging for Hash Indexes
List	pgsql-hackers

Tree view

On Wed, Aug 24, 2016 at 10:32 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Tue, Aug 23, 2016 at 10:05 PM, Amit Kapila <amit.kapila16@gmail.com>
> wrote:
>>
>> On Wed, Aug 24, 2016 at 2:37 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>
>> >
>> > After an intentionally created crash, I get an Assert triggering:
>> >
>> > TRAP: FailedAssertion("!(((freep)[(bitmapbit)/32] &
>> > (1<<((bitmapbit)%32))))", File: "hashovfl.c", Line: 553)
>> >
>> > freep[0] is zero and bitmapbit is 16.
>> >
>>
>> Here what is happening is that when it tries to clear the bitmapbit,
>> it expects it to be set.  Now, I think the reason for why it didn't
>> find the bit as set could be that after the new overflow page is added
>> and the bit corresponding to it is set, you might have crashed the
>> system and the replay would not have set the bit.  Then while freeing
>> the overflow page it can hit the Assert as mentioned by you.  I think
>> the problem here could be that I am using REGBUF_STANDARD to log the
>> bitmap page updates which seems to be causing the issue.  As bitmap
>> page doesn't follow the standard page layout, it would have omitted
>> the actual contents while taking full page image and then during
>> replay, it would not have set the bit, because page doesn't need REDO.
>> I think here the fix is to use REGBUF_NO_IMAGE as we use for vm
>> buffers.
>>
>> If you can send me the detailed steps for how you have produced the
>> problem, then I can verify after fixing whether you are seeing the
>> same problem or something else.
>
>
>
> The test is rather awkward, it might be easier to just have me test it.
>

Okay, I have fixed this issue as explained above.  Apart from that, I
have fixed another issue reported by Mark Kirkwood upthread and few
other issues found during internal testing by Ashutosh Sharma.

The locking issue reported by Mark and Ashutosh is that the patch
didn't maintain the locking order while adding overflow page as it
maintains in other write operations (lock the bucket pages first and
then metapage to perform the write operation).  I have added the
comments in _hash_addovflpage() to explain the locking order used in
modified patch.

During stress testing with pgbench using master-standby setup, we
found an issue which indicates that WAL replay machinery doesn't
expect completely zeroed pages (See explanation of RBM_NORMAL mode
atop XLogReadBufferExtended).  Previously before freeing the overflow
page we were zeroing it, now I have changed it to just initialize the
page such that the page will be empty.

Apart from above, I have added support for old snapshot threshold in
the hash index code.

Thanks to Ashutosh Sharma for doing the testing of the patch and
helping me in analyzing some of the above issues.

I forgot to mention in my initial mail that Robert and I had some
off-list discussions about the design of this patch, many thanks to
him for providing inputs.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

wal_hash_index_v2.patch

pgsql-hackers by date:

From: Petr Jelinek
Date: 07 September 2016, 09:36:47
Subject: Re: Logical Replication WIP

From: Amit Kapila
Date: 07 September 2016, 10:08:32
Subject: Re: Write Ahead Logging for Hash Indexes

Re: Write Ahead Logging for Hash Indexes - Mailing list pgsql-hackers

Attachment

Previous

Next