Re: WIP: Covering + unique indexes. - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: WIP: Covering + unique indexes. |
Date | |
Msg-id | CAM3SWZRheiMRFWmZ39X=G=w0WgZq5qqiyFAzCMg4jBpMPMsTJA@mail.gmail.com Whole thread Raw |
In response to | Re: WIP: Covering + unique indexes. (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>) |
Responses |
Re: WIP: Covering + unique indexes.
|
List | pgsql-hackers |
On Tue, Apr 5, 2016 at 7:56 AM, Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote: > I would say, this is changed, but it doesn't matter. Actually, I would now say that it hasn't really changed (see below), based on my new understanding. The *choice* to go on one page or the other still exists. > Performing any search in btree (including choosing suitable page for > insertion), we use only key attributes. > We assume that included columns are stored in index unordered. The patch assumes no ordering for the non-indexed columns in the index? While I knew that the patch was primarily motivated by enabling index-only scans, I didn't realize that at all. The patch is much much less like a general suffix truncation patch than I thought. I may have been confused in part by the high key issue that you only recently fixed, but you should have corrected me about suffix truncation earlier. Obviously, this was a significant misunderstanding; we have been "talking at cross purposes" this whole time. There seems to have been significant misunderstanding about this before now: http://www.postgresql.org/message-id/CAKJS1f9W0aB-g7H6yYgNBq7hJsOKF3UwHU7-Q5jobbaTyK9f4g@mail.gmail.com My new understanding: The extra "included" columns are stored in the index, but do not affect its sort order at all. They are no more part of the key than, say, the heap TID that the key points to. They are just "payload". > "just matching on constrained attributes" is the core idea of the whole > patch. Included columns just provide us possibility to use index-only scan. > Nothing more. We assume use case where index-only-scan is faster than > index-scan + heap fetch. For example, in queries like "select data from tbl > where id = 1;" we have no scan condition on data. Maybe you afraid of long > linear scan when we have enormous index bloat even on unique index. It will > happen anyway, whether we have index-only scan on covering index or > index-scan on unique index + heap fetch. The only difference is that the > covering index is faster. My concern about performance when that happens is very much secondary. I really only mentioned it to help explain my primary concern. > At the very beginning of the proposal discussion, I suggested to implement > third kind of columns, which are not constrained, but used in scankey. > They must have op class to do it, and they are not truncated. But it was > decided to abandon this feature. I must have missed that. Obviously, I wasn't paying enough attention to earlier discussion. Earlier versions of the patch did fail to recognize that the sort order was not the entire indexed order, but that isn't the case with V8. That that was ever possible was only a bug, it turns out. >> The more I think about it, the more I doubt that it's okay to not >> ensure downlinks are always distinct with their siblings, by sometimes >> including non-constrained (truncatable) attributes within internal >> pages, as needed to *distinguish* downlinks (also, we must >> occasionally have *all* attributes including truncatable attributes in >> internal pages -- we must truncate nothing to keep the key space sane >> in the parent). > Frankly, I still do not understand what you're worried about. > If high key is greater than the scan key, we definitely cannot find any more > tuples, because key attributes are ordered. > If high key is equal to the scan key, we will continue searching and read > next page. I thought, because of the emphasis on unique indexes, that this patch was mostly to offer a way of getting an index with uniqueness only enforced on certain columns, but otherwise just the same as having a non-unique index on those same columns. Plus, some suffix truncation, because point-lookups involving later attributes are unlikely to be useful when this is scoped to just unique indexes (which were emphasized by you), because truncating key columns is not helpful unless bloat is terrible. I now understand that it was quite wrong to link this to suffix truncation at all. The two are really not the same. That does make the patch seem significantly simpler, at least as far as nbtree goes; a tool like amcheck is not likely to detect problems in this patch that a human tester could not catch. That was the kind of problem that I feared. > The code is not changed here, it is the same as processing of duplicates > spreading over several pages. If you do not trust postgresql btree changes > to the L&Y to make duplicates work, I don't know what to say, but it's > definitely not related to my patch. My point about the postgres btree changes to L&Y to make duplicates work is that I think it makes the patch work, but perhaps not absolutely reliably. I don't have any specific misgivings about it on its own. Again, my earlier remarks were based on a misguided understanding of the patch, so it doesn't matter now. Communication is hard. There may be a lesson here for both of us about that. > Of course I do not mind if someone will do more testing. > I did some tests and didn't find anything special. Besides, don't we have > special alpha and beta release stages to find tricky bugs? Our history of committing performance improvements to the B-Tree code is limited, particularly in the last 5 years. That's definitely a problem, and one that I have tried to make smaller, but it is the reality. BTW, I can see why you used index_reform_tuple(), rather than trying to modify an existing tuple in place. NULL bitmaps have a storage overhead in IndexTuples (presumably an alternative approach would make truncated IndexTuples have NULL attributes to represent truncation), whereas the cost of index_reform_tuple() only has to be paid when there is a leaf page split. It's important that truncation is 100% guaranteed to produce a tuple smaller than the inserted tuple, otherwise the user could get a non-recoverable "1/3 of page size exceeded" when they were not the one to insert the big IndexTuple. I should try to see if this could be possible due to some index_reform_tuple() edge-case. -- Peter Geoghegan
pgsql-hackers by date: