Re: Vacuum: allow usage of more than 1GB of work mem - Mailing list pgsql-hackers
From: Pavan Deolasee
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Msg-id: CABOikdN1O7PEUK+TFAAHxQGG3HTuGmXY5B7mcW=G8MgEHrojpw@mail.gmail.com
In response to: Re: Vacuum: allow usage of more than 1GB of work mem (Claudio Freire <klaussfreire@gmail.com>)
Responses: Re: Vacuum: allow usage of more than 1GB of work mem
           Re: Vacuum: allow usage of more than 1GB of work mem
List: pgsql-hackers
On Fri, Sep 16, 2016 at 12:24 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Thu, Sep 15, 2016 at 3:48 PM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> For example, we always allocate the TID array as large as we can fit into
>> m_w_m, but maybe we don't need to wait with switching to the bitmap until
>> filling the whole array - we could wait as long as the bitmap fits into the
>> remaining part of the array, build it there and then copy it to the
>> beginning (and use the bitmap from that point).
>
> The bitmap can be created like that, but grow from the end of the
> segment backwards.
>
> So the scan can proceed until the bitmap fills the whole segment
> (filling backwards), no copy required.
But I actually wonder if we are over-engineering things and overestimating the cost of memmove etc. How about this simpler approach:
1. Divide the table into fixed-size chunks, as Robert suggested. Say 1GB each.
2. Allocate a pointer array with one entry per chunk. For a 1TB table, that's about 8192 bytes (1024 chunks x 8-byte pointers).
3. Allocate a bitmap that can hold MaxHeapTuplesPerPage bits for every page in the chunk. For an 8KB block size and a 1GB chunk, that's about 4.6MB. Note: I'm suggesting a bitmap here because a fixed-size TID array provisioned for the worst case would cost us 200MB+, whereas the bitmap is just 4.6MB.
4. Collect dead tuples for that 1GB chunk in the bitmap. Also collect stats so that we know the most compact representation.
5. At the end of the 1GB scan, if no dead tuples were found, set the chunk pointer to NULL, move to the next chunk and restart from step 4. If dead tuples were found, then check:
 a. Whether the bitmap can be compressed further by using fewer bits per page. If so, allocate a new, smaller bitmap and compress into it.
 b. Whether a TID array would be a more compact representation. If so, allocate a TID array of the right size and convert the bitmap into an array.
 c. Set the chunk pointer to whichever representation we choose (of course, add headers etc. so we can interpret the representation later).
6. Continue until we consume all of m_w_m or reach the end of the table. If we consume all of m_w_m, do a round of index cleanup and restart the heap scan. (A rough sketch of the chunk bookkeeping follows below.)
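
To make the bookkeeping above concrete, here is a rough, standalone sketch of the chunk directory. All names here (ChunkRepr, DeadTuplesChunk, alloc_chunk_directory) are made up for illustration, not from any existing patch, and the sizes assume 8KB heap pages:

#include <stdint.h>
#include <stdlib.h>

#define CHUNK_SIZE          ((uint64_t) 1 << 30)        /* 1GB chunk */
#define BLOCK_SIZE          8192                        /* assumed 8KB heap pages */
#define BLOCKS_PER_CHUNK    (CHUNK_SIZE / BLOCK_SIZE)   /* 131072 blocks = 2^17 */

/* Which representation a scanned chunk ended up with (step 5 above). */
typedef enum ChunkRepr
{
    CHUNK_BITMAP,       /* ~291 bits per page (MaxHeapTuplesPerPage), ~4.6MB per 1GB chunk */
    CHUNK_TID_ARRAY     /* packed TID array, chosen when it is more compact */
} ChunkRepr;

/* Header stored in front of whichever representation we keep for a chunk. */
typedef struct DeadTuplesChunk
{
    ChunkRepr   repr;       /* how to interpret the bytes that follow */
    uint32_t    ntuples;    /* dead tuples recorded in this chunk */
    /* representation-specific data follows the header */
} DeadTuplesChunk;

/*
 * One pointer per 1GB chunk, left NULL if the chunk had no dead tuples.
 * A 1TB table has 1024 chunks, i.e. ~8192 bytes of pointers.
 */
static DeadTuplesChunk **
alloc_chunk_directory(uint64_t table_size)
{
    uint64_t    nchunks = (table_size + CHUNK_SIZE - 1) / CHUNK_SIZE;

    return calloc(nchunks, sizeof(DeadTuplesChunk *));
}
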
I also realised that we can compact the TID array in step (b) above: we only need 17 bits for the block number (we already know which 1GB chunk it belongs to), and since usable offsets fit in 13 bits, the TID array needs only 4 bytes per TID instead of 6.
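
A minimal sketch of that 4-byte packing (the names pack_tid etc. are hypothetical; it assumes 2^17 blocks of 8KB per 1GB chunk, leaving 15 bits for the offset, more than enough for the 13 usable bits):

#include <stdint.h>

#define CHUNK_BLOCK_BITS    17      /* 2^17 8KB blocks per 1GB chunk */
#define TID_OFFSET_BITS     (32 - CHUNK_BLOCK_BITS)
#define TID_OFFSET_MASK     ((1u << TID_OFFSET_BITS) - 1)

typedef uint32_t PackedTid;         /* 4 bytes instead of the usual 6 */

static inline PackedTid
pack_tid(uint32_t block_in_chunk, uint32_t offnum)
{
    return (block_in_chunk << TID_OFFSET_BITS) | (offnum & TID_OFFSET_MASK);
}

static inline uint32_t
packed_tid_block(PackedTid tid)
{
    return tid >> TID_OFFSET_BITS;
}

static inline uint32_t
packed_tid_offset(PackedTid tid)
{
    return tid & TID_OFFSET_MASK;
}

Since the packed values preserve ordering within a chunk, a sorted array of them can still be binary-searched during index cleanup once the chunk is identified from the block number.
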
Many people are working on implementing different ideas, and I can volunteer to write a patch along these lines unless someone else wants to do that.
Thanks,
Pavan
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services