Re: Extreme bloating of intarray GiST indexes - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Extreme bloating of intarray GiST indexes |
Date | |
Msg-id | 15188.1304533765@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Extreme bloating of intarray GiST indexes (Josh Berkus <josh@agliodbs.com>) |
Responses |
Re: Extreme bloating of intarray GiST indexes
|
List | pgsql-hackers |
Josh Berkus <josh@agliodbs.com> writes: >> Can you provide the data in the column that's indexed? > Attached. This is for the index which is 90% free space. I tried loading this data in fresh, and then creating a gist__intbig_ops index on it. I got these pgstattuple numbers (in 8.4.8): table_len | 8806400 tuple_count | 15005 tuple_len | 4081360 tuple_percent | 46.35 dead_tuple_count | 0 dead_tuple_len | 0 dead_tuple_percent | 0 free_space | 4088100 free_percent | 46.42 On the other hand, loading the data with a pre-existing empty index gave table_len | 7798784 tuple_count | 15005 tuple_len | 4081360 tuple_percent | 52.33 dead_tuple_count | 0 dead_tuple_len | 0 dead_tuple_percent | 0 free_space | 3183672 free_percent | 40.82 Neither of those numbers are great, and it's a bit surprising that CREATE INDEX produces a result notably worse than incremental loading; but still a darn sight better than 90% free space. So I think probably the update pattern has a lot to do with this. > * If you didn't notice earlier, it's a partial index. The two columns > which determine the partial index change more often than the intarray > column. Yeah, with only about half of the table actually indexed, since you showed only 7786 index entries in your results. But unless there's reason to think the indexed and unindexed entries are substantially different in the intarray column, this is probably not very relevant. > * We've also determined some other unusual patterns from watching the > application: > (a) the "listings" table is a very wide table, with about 60 columns > (b) whenever the table gets updated, the application code updates these > 60 columns in 4 sections. So there's 4 updates to the same row, in a > single transaction. Hmm. That is going to lead to four dead index entries for every live one (unless some of the updates are HOT, which won't happen if you're changing any indexed columns). VACUUM will get back the space eventually, but not before you've caused some index bloat. I tried doing something similar to my test table: contrib_regression=# alter table listings add column junk int; ALTER TABLE contrib_regression=# create index li on listings(junk); CREATE INDEX contrib_regression=# begin; BEGIN contrib_regression=# update listings set junk=1; UPDATE 15005 contrib_regression=# update listings set junk=2; UPDATE 15005 contrib_regression=# update listings set junk=3; UPDATE 15005 contrib_regression=# update listings set junk=4; UPDATE 15005 contrib_regression=# commit; COMMIT contrib_regression=# vacuum listings; VACUUM and then got these pgstattuple numbers: table_len | 39460864 tuple_count | 15005 tuple_len | 4081360 tuple_percent | 10.34 dead_tuple_count | 0 dead_tuple_len | 0 dead_tuple_percent | 0 free_space | 32923872 free_percent | 83.43 which is up in the same ballpark with your problem. Now probably your client's app is not updating all rows at once, but still this is a pretty wasteful update pattern. Is there a reason not to update all the columns in a single update? If you can't change the app, I'd suggest more aggressive autovacuuming as the least painful fix. regards, tom lane
pgsql-hackers by date: