Re: 10.1: hash index size exploding on vacuum full analyze - Mailing list pgsql-bugs
From:            Amit Kapila
Subject:         Re: 10.1: hash index size exploding on vacuum full analyze
Date:
Msg-id:          CAA4eK1+PYgxOKr-wJiOF37BWFdPjH6O-OyNN1V+58T=gtyY6tg@mail.gmail.com
In response to:  Re: 10.1: hash index size exploding on vacuum full analyze  (AP <pgsql@inml.weebeastie.net>)
Responses:       Re: 10.1: hash index size exploding on vacuum full analyze
                 Re: 10.1: hash index size exploding on vacuum full analyze
List:            pgsql-bugs
On Mon, Nov 20, 2017 at 5:01 AM, AP <pgsql@inml.weebeastie.net> wrote:
> On Thu, Nov 16, 2017 at 09:48:13AM +0530, Amit Kapila wrote:
>> On Thu, Nov 16, 2017 at 4:59 AM, AP <pgsql@inml.weebeastie.net> wrote:
>> > I've some tables that'll never grow so I decided to replace a big index
>> > with one with a fillfactor of 100. That went well. The index shrunk to
>> > 280GB. I then did a vacuum full analyze on the table to get rid of any
>> > cruft (as the table will be static for a long time and then only deletes
>> > will happen) and the index exploded to 701GB. When it was created with
>> > fillfactor 90 (organically by filling the table) the index was 309GB.
>
> FYI: Nuking the above and doing a create index run gave me a 280GB index again.
>
>> Sounds quite strange.  I think during vacuum it leads to more number
>> of splits than when the original data was loaded.  By any chance do
>> you have a copy of both the indexes (before vacuum full and after
>> vacuum full)?  Can you once check and share the output of
>> pgstattuple-->pgstathashindex() and pageinspect->hash_metapage_info()?
>> I wanted to confirm if the bloat is due to additional splits.
>
> Before VACUUM FULL:
>
>     Schema   |                        Name                        | Type  |   Owner   |                 Table                 |  Size  | Description
> -------------+----------------------------------------------------+-------+-----------+---------------------------------------+--------+-------------
>  bundle_link | be5be0ac8_3ba5_407d_8183_2d05b9387e81_datum_id_idx | index | mdkingpin | be5be0ac8_3ba5_407d_8183_2d05b9387e81 | 273 GB |
>
> mdstash=# select * from hash_metapage_info(get_raw_page('bundle_link.be5be0ac8_3ba5_407d_8183_2d05b9387e81_datum_id_idx',0))\gx
> -[ RECORD 1 ]---...---
> magic     | 105121344
> version   | 4
> ntuples   | 9123458028
> ffactor   | 409
> bsize     | 8152
> bmsize    | 4096
> bmshift   | 15
> maxbucket | 25165823
> highmask  | 33554431
> lowmask   | 16777215
> ovflpoint | 71
> firstfree | 10623106
> nmaps     | 325
> procid    | 456
> ...
> Time: 0.613 ms
>
> mdstash=# select * from pgstathashindex('bundle_link.be5be0ac8_3ba5_407d_8183_2d05b9387e81_datum_id_idx');
>  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
> ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
>        4 |     25165824 |       10622781 |          325 |            0 | 9123458028 |          0 | 37.4567373970968
> (1 row)
>
> Time: 2002419.406 ms (33:22.419)
>
> After VACUUM FULL:
>
>     Schema   |                        Name                        | Type  |   Owner   |                 Table                 |  Size  | Description
> -------------+----------------------------------------------------+-------+-----------+---------------------------------------+--------+-------------
>  bundle_link | be5be0ac8_3ba5_407d_8183_2d05b9387e81_datum_id_idx | index | mdkingpin | be5be0ac8_3ba5_407d_8183_2d05b9387e81 | 701 GB |
>
> -[ RECORD 1 ]---...---
> magic     | 105121344
> version   | 4
> ntuples   | 9123458028
> ffactor   | 409
> bsize     | 8152
> bmsize    | 4096
> bmshift   | 15
> maxbucket | 83886079
> highmask  | 134217727
> lowmask   | 67108863
> ovflpoint | 78
> firstfree | 7996259
> nmaps     | 245
> procid    | 456
>
> Time: 69.237 ms
>
> mdstash=# select * from pgstathashindex('bundle_link.be5be0ac8_3ba5_407d_8183_2d05b9387e81_datum_id_idx');
>  version | bucket_pages | overflow_pages | bitmap_pages | unused_pages | live_items | dead_items |   free_percent
> ---------+--------------+----------------+--------------+--------------+------------+------------+------------------
>        4 |     83886080 |        7996014 |          245 |            0 | 9123458028 |          0 | 75.6390388675015
> (1 row)
>
> Time: 2474290.172 ms (41:14.290)
>

Based on the above data, we can easily see that after vacuum full
there is a huge increase in free_percent, which is mostly due to the
additional bucket_pages created during vacuum full.  See the below
calculation:

Index size difference = 701 - 273 = 428GB
Bucket page size difference = 83886080 - 25165824 = 58720256 pages = 448GB
Overflow page size difference = 7996014 - 10622781 = -2626767 pages = -20GB

So, if we just add the difference in bucket page size and overflow
page size, it accounts for the size difference you are seeing after
vacuum full.  This clearly supports the theory I was speculating on
above: somehow the estimated number of tuples (based on which the
number of buckets is computed) is different when we do a vacuum full.

On further looking into it, I found that the relcache entry for the
relation doesn't have the correct values for relpages and reltuples
during vacuum full, due to which estimate_rel_size can return an
estimate that is quite different, and then hashbuild can create
buckets which it might never need to populate with tuples.  I am not
sure if it is expected to have uninitialized (0) values for these
attributes during vacuum full, but I see that in swap_relation_files,
when we swap the statistics, we are assuming that the new rel has
freshly-updated stats, which I think is not true.  This needs some
further investigation.

Another angle to look at it is that even if the values of relpages
and reltuples are not updated, why does estimate_rel_size give such a
wrong estimate?  I think to some extent it depends on the schema of
the table, so is it possible for you to share the schema of the
table?

> Tell me if you need me to keep the index around.
>

I don't think so, but till we solve the problem there is no harm in
keeping it if possible, because one might want some information at a
later stage to debug this problem.  OTOH, if you have a space crunch
then feel free to delete it.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
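
[Editor's note: the page-count arithmetic in the message above can be re-checked directly in SQL. The sketch below assumes the default 8 kB block size and simply plugs in the bucket_pages and overflow_pages values from the quoted pgstathashindex output.]

    -- bucket/overflow page counts copied from the pgstathashindex output above;
    -- multiply the page-count difference by the 8192-byte block size
    select pg_size_pretty((83886080 - 25165824) * 8192::bigint) as bucket_page_growth,
           pg_size_pretty((10622781 - 7996014) * 8192::bigint) as overflow_page_shrink;

    -- bucket_page_growth   | 448 GB
    -- overflow_page_shrink | 20 GB

The 448 GB of extra bucket pages minus the 20 GB of reclaimed overflow pages matches the ~428 GB net growth of the index after VACUUM FULL.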
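[Editor's note: for the relpages/reltuples question raised above, a query along these lines (a sketch only; substitute the actual table) shows the planner-visible statistics for the heap and can be run before and after the VACUUM FULL:]

    -- statistics used for relation size estimation
    select relname, relpages, reltuples
    from pg_class
    where relname = 'be5be0ac8_3ba5_407d_8183_2d05b9387e81';

If relpages and reltuples come back as 0 for the freshly rewritten heap, that would line up with the theory above that estimate_rel_size has to guess the tuple density instead of using real statistics.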