Re: WIP: Covering + unique indexes. - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: WIP: Covering + unique indexes. |
Date | |
Msg-id | CAH2-Wzk=xU1dDP7B_gq0S052OHeezEtibZ-BMF=YT-W1L+LWBw@mail.gmail.com Whole thread Raw |
In response to | Re: WIP: Covering + unique indexes. (Alexander Korotkov <a.korotkov@postgrespro.ru>) |
Responses |
Re: WIP: Covering + unique indexes.
|
List | pgsql-hackers |
On Fri, Apr 6, 2018 at 11:08 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > OK, incorporated into v15. I've also added sentence about pg_upgrade > to the commit message. I will summarize my feelings on this patch. I endorse committing the patch, because I think that the benefits of committing it now noticeably outweigh the costs. I have various caveats about pushing the patch, but these are manageable. Costs ===== First, there is the question of risks, or costs. I think that this patch has a negligible chance of being problematic in a way that will become memorable. That seems improbable because the patch only really changes the representation of what we're calling "pivot keys" (high keys and internal page downlinks), which is something that VACUUM doesn't care about. I see this patch as a special case of suffix truncation, a technique that has been around since the 1970s. Although you have to look carefully to see it, the amount of extra complexity is pretty small, and the only place where a critical change is made is during leaf page splits. As long as we get that right, everything else should fall into place. There are no risks that I can see that are related to concurrency, or that crop up when doing an anti-wraparound VACUUM. There may be problems, but at least they won't be *pernicious* problems that unravel over a long period of time. The latest amcheck enhancement, and Alexander's recent changes to the patch to make the on-disk representation explicit (not implicit) should change things. We now have the tools to detect any corruption problem that I can think of. For example, if there was some subtle reason why assessing HOT safety broke, then we'd have a way of mechanically detecting that without having to devise a custom test (like the test Pavan happened to be using when the bug fixed by 2aaec654 was originally discovered). The lessons that I applied to designing amcheck were in a few cases from actual experience with real world bugs, including that 2aaec654 bug. I hope that it goes without saying that I've also taken reasonable steps to address all of these risks directly, by auditing code. And, that this remains the first line of defense. Here are the other specific issues that I see with the patch: * It's possible that something was missed in the optimizer. I'm not sure. I share the intuition that very little code is actually needed there, but I'm far from the best person to judge whether or not some subtle detail was missed. * This seems out of date: > + * NOTE: It is not crucial for reliability in present, but maybe > + * it will be that in the future. Now the purpose is just to save > + * more space on inner pages of btree. * CheckIndexCompatible() didn't seem to get the memo about this patch. Maybe just a comment? * It's possible that there are some more bugs in places like relcache.c, or deparsing, or pg_dump, or indexcmds.c; perhaps simple omissions, like the one I just mentioned. If there are, I don't expect them to be particularly serious, or to make me reassess my basic position. But there could be. * I was wrong to suggest _bt_isequal() has an assertion against truncation. It is called for the highkey. Suggest you weaken the assertion, so it only applies when the offset isn't P_HIKEY on non-rightmost page. * Suggest adding a comment above BTStackData, about bts_btentry + offset. * Suggest breaking BTEntrySame() into 3 lines, not 2. * This comment needs to be updated: /* get high key from left page == lowest key on new right page */ Suggest "get high key from left page == lower bound for new right page". * This comment needs to be updated: 13th bit: unused Suggest "13th bit: AM-defined meaning" * Suggest adding a note that the use of P_HIKEY here is historical, since it isn't used to match downlinks: /* * Find the parent buffer and get the parent page. * * Oops - if we were moved right then we need to change stack item! We * want to find parent pointing to where we are, right ? - vadim * 05/27/97 */ ItemPointerSet(&(stack->bts_btentry.t_tid), bknum, P_HIKEY); pbuf = _bt_getstackbuf(rel, stack, BT_WRITE); * I'm slightly concerned that this patch subtly breaks an optimization within _bt_preprocess_keys(), or _bt_checkkeys(). I cannot find any evidence of that, though, and I consider it unlikely, based on the intuition that the simple Pathkey changes in the optimizer don't provide the executor with a truly new set of constraints for index scans. Also, if there was a problem here, it would be in the less serious category of problems -- those that can't really affect anyone not using the user-visible feature. * The docs need some more polishing. Didn't spend very much time on this at all. Benefits ======== There is also the matter of the benefits of this patch, that I think are considerable, and far greater than they appear. This feature is a great way to begin to add a broad variety of enhancements to nbtree that we really need. * The patch makes index-only scans a lot more compelling. There are a couple of reasons why it's better to create indexes that index perhaps as many as 4 or 7 columns to target index-only scans in other database systems. I think that fan-out may be the main one. The disadvantage that we have around HOT safety compared to other systems seem less likely to be the problem when that many columns are involved, and yet this is something that Oracle/SQL Server people do frequently, and Postgres people don't really do at all. This one thing that suffix truncation improves automatically, but INCLUDE indexes can make that general situation a lot better than truncation alone ever could. If you have an index where most columns are INCLUDE columns, and compare that to an index with the same attributes that are indexed in the conventional way, then I believe that you will have far fewer problems with index bloat in some important cases. Apart from everything else, this provides us with the opportunity to learn how to mitigate index bloat problems in real world conditions, even without INCLUDE indexes. We need to get smarter about problems with index bloat. * Suffix truncation works on the same principle, and is enabled by this work. It's prerequisite to making nbtree use the classic L&Y approach, that assumes that all items in the index are unique. We could just add heap TID to pivot tuples today, as an "extra" column, while sorting on TID at the leaf level. This would make TID a first class part of the key space -- a "unique-ifier", as L&Y intended. But doing so naively would add enormous overhead, which would simply be unacceptable. However, once we have suffix truncation, the overhead is eliminated in virtually all cases. We get to move to the classic L&Y invariant, simplifying the code, and we have a solid basis for adding "retail index tuple deletion", which I believe is almost essentially for zheap. There is a good chance that Postgres B-Trees are the only implementation in the world that doesn't have truly unique keys. The design of nbtree would become a whole lot more elegant if we could restore the classic "Ki < v <= Ki+1" invariant, as Vadim intended over 20 years ago. Somebody has to bite the bullet and start changing the representation of pivot tuples to get these benefits (and many more). This seems like an ideal place to start that process. I think that what we have here addresses concerns from Tom [1], in particular. The patch has been marked "Ready for Committer". While this patch is primarily the responsibility of the committer, presumably Teodor in this case, I will take some of the responsibility for the patch after commit. Certainly, because I see the patch as strategically important, I am willing to spend quite a lot of time after feature freeze, to make sure that it is in good shape. I have a general interest in making sure that amcheck gains acceptance as a way of validating a complicated patch like this one after commit. [1] https://www.postgresql.org/message-id/15195.1490988897%40sss.pgh.pa.us -- Peter Geoghegan
pgsql-hackers by date: