From: Masahiko Sawada
Subject: Making the initial and maximum DSA segment sizes configurable
Msg-id: CAD21AoAYGGC1ePjVX0H+pp9rH=9vuPK19nNOiu12NprdV5TVJA@mail.gmail.com
List: pgsql-hackers
Hi all,

I started this new thread from another thread[1] where we're discussing a new storage for TIDs, TidStore, because we ran into a difficulty with the memory usage limit for TidStores on DSA.

TidStore is a new data structure to efficiently store TIDs, backed by a radix tree. In the patch series proposed on that thread, in addition to the radix tree and TidStore, there is another patch for lazy (parallel) vacuum that replaces the array of dead tuple TIDs with a TidStore. To support parallel vacuum, the radix tree (and TidStore) can be created in local memory as well as on DSA. It also has memory usage limit functionality: we can pass a memory limit (e.g., maintenance_work_mem) to TidStoreCreate(), and once the total DSA segment size (area->control->total_segment_size) exceeds the limit, TidStoreIsFull() returns true. Lazy vacuum can continue scanning heap blocks to collect dead tuple TIDs until TidStoreIsFull() returns true. Currently lazy vacuum is the sole user of TidStore, but it could be used by other code such as tidbitmap.c, where memory would be limited by work_mem.

During development, we found that DSA memory growth is unpredictable, which makes memory limitation inefficient. DSA is built on top of DSM segments: it manages a set of DSM segments, adding new segments as required and detaching them when they are no longer needed. The first DSA segment is 1MB, and each new segment is at least big enough to follow a geometric series that approximately doubles the total storage each time a new segment is created. Because of this, simply comparing the memory limit to the total segment size is not efficient. For example, if maintenance_work_mem is 512MB, the total segment size grows like this:

2 * (1 + 2 + 4 + 8 + 16 + 32 + 64 + 128) = 510MB -> less than the limit, continue heap scan.
2 * (1 + 2 + 4 + 8 + 16 + 32 + 64 + 128) + 256 = 766MB -> stop (exceeds the limit by 254MB).

One might think we could use dsa_set_size_limit(), but we cannot: lazy vacuum would end up with an error once the limit is reached, and if we set DSA_ALLOC_NO_OOM, we might end up stopping an insertion halfway.

Besides excessive memory allocation, since the initial DSM segment size is fixed at 1MB, the memory usage of a shared TidStore starts from 1MB+. This is higher than the minimum values of both work_mem and maintenance_work_mem, 64kB and 1MB respectively. Increasing the minimum m_w_m to 2MB might be acceptable, but not for work_mem.

Researching possible solutions, we found that aset.c has a similar characteristic: it allocates an 8kB block (by default) upon the first allocation in a context and doubles that size for each successive block request, but callers can specify both the initial block size and the maximum block size. This led to the idea of making both values configurable for DSA as well, with both values calculated from m_w_m. I've attached a patch for this idea. The changes to dsa.c are straightforward, since dsa.c already uses the macros DSA_INITIAL_SEGMENT_SIZE and DSA_MAX_SEGMENT_SIZE; I just made these values configurable.

FYI, with this patch we can create a DSA in parallel_vacuum_init() with initial and maximum segment sizes as follows:

initial segment size = min(m_w_m / 4, 1MB)
maximum segment size = max(m_w_m / 8, 8MB)

In most cases, we can start with a 1MB initial segment, the same as before. For larger memory limits, the heap scan stops after DSA has allocated about 1.25 times m_w_m at most.
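To illustrate, here is a minimal sketch of how a caller like parallel_vacuum_init() could derive the two sizes from the limit. The helper name is hypothetical; Min/Max are the usual macros from c.h, and how the two values get passed to DSA creation is up to the patch:

    /*
     * Hypothetical helper: derive the initial and maximum DSA segment
     * sizes from a work_mem-style limit given in kilobytes (e.g. m_w_m).
     */
    static void
    choose_dsa_segment_sizes(int limit_kb, size_t *init_size, size_t *max_size)
    {
        size_t      limit = (size_t) limit_kb * 1024;

        /* initial segment size = min(m_w_m / 4, 1MB) */
        *init_size = Min(limit / 4, (size_t) 1024 * 1024);

        /* maximum segment size = max(m_w_m / 8, 8MB) */
        *max_size = Max(limit / 8, (size_t) 8 * 1024 * 1024);
    }

For m_w_m = 512MB this yields a 1MB initial and 64MB maximum segment; for m_w_m = 1MB, 256kB and 8MB.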
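And as a cross-check of the arithmetic in the examples below, here is a small standalone program (plain C, not dsa.c code) that replays the series the examples assume, i.e. two segments of each size, with the segment size capped at the maximum:

    #include <stdio.h>

    int
    main(void)
    {
        const long long MB = 1024 * 1024;
        long long   limit = 512 * MB;   /* m_w_m */
        long long   init_size = 1 * MB; /* min(m_w_m / 4, 1MB) */
        long long   max_size = 64 * MB; /* max(m_w_m / 8, 8MB) */
        long long   total = 0;
        int         i;

        for (i = 0; total <= limit; i++)
        {
            /* two segments of each size double the total; cap at max_size */
            long long   seg = init_size << (i / 2);

            if (seg > max_size)
                seg = max_size;
            total += seg;
            printf("segment %2d: %4lld MB, total %4lld MB\n",
                   i + 1, seg / MB, total / MB);
        }
        printf("stopped %lld MB over the limit\n", (total - limit) / MB);
        return 0;
    }

This reproduces the numbers below: 19 segments, stopping at 574MB, i.e. 62MB over the 512MB limit.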
For example, if m_w_m = 512MB, the initial and maximum segment sizes are 1MB and 64MB respectively, and DSA allocates segments as follows until heap scanning stops:

2 * (1 + 2 + 4 + 8 + 16 + 32 + 64) + (64 * 4) = 510MB -> less than the limit, continue heap scan.
2 * (1 + 2 + 4 + 8 + 16 + 32 + 64) + (64 * 5) = 574MB -> stop (62MB additionally allocated).

It also works with smaller memory limits: if the limit is 1MB, we start with a 256kB initial segment and heap scanning stops after DSA has allocated 1.5MB (= 256kB + 256kB + 512kB + 512kB).

There is room to consider better formulas for the initial and maximum segment sizes, but making both values configurable seems like a promising idea, and the analogy to aset.c's behavior could be good for readability and maintainability. There is another test result where I used this idea on top of a radix tree[2].

We need to keep an eye on the total number of allocated DSA segments, since the total number of DSM segments available on the system is fixed[3]. But it doesn't seem problematic even with this patch, since we allocate only a few additional segments (17 segments vs. 19 segments in the examples above). There was also no big difference in performance[2].

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoDBmD5q%3DeO%2BK%3DgyuVt53XvwpJ2dgxPwrtZ-eVOjVmtJjg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAD21AoDKr%3D4YHphy6cRojE5eyT6E2ao8xb44E309eTrUEOC6xw%40mail.gmail.com
[3] From dsm.c, the total number of DSM segments available on the system is calculated as:

    #define PG_DYNSHMEM_FIXED_SLOTS        64
    #define PG_DYNSHMEM_SLOTS_PER_BACKEND  5

    maxitems = PG_DYNSHMEM_FIXED_SLOTS
        + PG_DYNSHMEM_SLOTS_PER_BACKEND * MaxBackends;

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com