Re: Deleting older versions in unique indexes to avoid page splits - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Deleting older versions in unique indexes to avoid page splits |
Date | |
Msg-id | CAH2-WzmQcpHEATvgR4bb0aO8bWDggPJVabw488arYe9L+72V5A@mail.gmail.com |
In response to | Re: Deleting older versions in unique indexes to avoid page splits (Victor Yegorov <vyegorov@gmail.com>) |
Responses | Re: Deleting older versions in unique indexes to avoid page splits |
List | pgsql-hackers |
On Tue, Nov 17, 2020 at 7:24 AM Victor Yegorov <vyegorov@gmail.com> wrote:
> I've looked through the code and it looks very good from my end:
> - plenty comments, good description of what's going on
> - I found no loose ends in terms of AM integration
> - magic constants replaced with defines
> Code looks good. Still, it'd be good if somebody with more experience could look into this patch.

Great, thank you.

> Question: why in the comments you're using double spaces after dots?
> Is this a convention of the project?

Not really. It's based on my habit of trying to be as consistent as possible with existing code.

There seems to be a weak consensus among English speakers on this question, which is: the two space convention is antiquated, and only ever made sense in the era of mechanical typewriters. I don't really care either way, and I doubt that any other committer pays much attention to these things. You may have noticed that I use only one space in my e-mails.

Actually, I probably shouldn't care about it myself. It's just what I decided to do at some point. I find it useful to decide that this or that practice is now a best practice, and then stick to it without thinking about it very much (this frees up space in my head to think about more important things). But this particular habit of mine around spaces is definitely not something I'd insist on from other contributors. It's just that: a habit.

> I am thinking of two more scenarios that require testing:
> - queue in the table, with a high rate of INSERTs+DELETEs and a long transaction.

I see your point. This is going to be hard to make work outside of unique indexes, though. Unique indexes are already not dependent on the executor hint -- they can just use the "uniquedup" hint. The code for unique indexes is prepared to notice duplicates in _bt_check_unique() in passing, and apply the optimization for that reason.
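To make the hint logic in the preceding paragraph concrete, here is a rough toy model in Python. It is not the patch's code: `InsertContext`, `should_try_deletion`, and the `ignore_hints` switch are hypothetical names invented for illustration, and the real decision is made in C inside nbtree. It only shows why a queue-like table with a non-unique index gets no deletion attempt under the hint-based approach.

```python
from dataclasses import dataclass


@dataclass
class InsertContext:
    """Hypothetical summary of what is known when a leaf page is full."""
    is_unique_index: bool
    executor_hint: bool  # stand-in for the executor hint discussed above
    uniquedup: bool      # stand-in for duplicates noticed during the unique check


def should_try_deletion(ctx: InsertContext, ignore_hints: bool = False) -> bool:
    """Decide whether to look for deletable tuples before splitting a page."""
    if ignore_hints:
        # Alternative design: always try at the point a page is about to split.
        return True
    if ctx.is_unique_index and ctx.uniquedup:
        # Unique indexes do not depend on the executor hint.
        return True
    return ctx.executor_hint


# A queue-like table: non-unique index, INSERTs+DELETEs, no hint available.
queue = InsertContext(is_unique_index=False, executor_hint=False, uniquedup=False)
# A unique index where the uniqueness check noticed duplicates in passing.
unique = InsertContext(is_unique_index=True, executor_hint=False, uniquedup=True)

print(should_try_deletion(queue))                     # False
print(should_try_deletion(unique))                    # True
print(should_try_deletion(queue, ignore_hints=True))  # True
```

In this model the queue case only gets a deletion attempt when the hints are ignored altogether.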

Maybe there is some argument to forgetting about the hint entirely, and always assuming that we should try to find tuples to delete at the point that a page is about to be split. I think that that argument is a lot harder to make, though. And it can be revisited in the future. It would be nice to do better with INSERTs+DELETEs, but that's surely not the big problem for us right now.

I realize that this unique indexes/_bt_check_unique() thing is not even really a partial fix to the problem you describe. The indexes that have real problems with such an INSERTs+DELETEs workload will naturally not be unique indexes -- _bt_check_unique() already does a fairly good job of controlling bloat without bottom-up deletion.

> - upgraded cluster with !heapkeyspace indexes.

I do have a patch that makes that easy to test, that I used for the Postgres 13 deduplication work -- I can rebase it and post it if you like. You will be able to apply the patch, and run the regression tests with a !heapkeyspace index. This works with only one or two tweaks to the tests (IIRC the amcheck tests need to be tweaked in one place for this to work).

I don't anticipate that !heapkeyspace indexes will be a problem, because they won't use any of the new stuff anyway, and because nothing about the on-disk format is changed by bottom-up index deletion.

--
Peter Geoghegan