Re: Index Skip Scan - Mailing list pgsql-hackers
From | Jesper Pedersen |
---|---|
Subject | Re: Index Skip Scan |
Date | |
Msg-id | 802e46fa-dc2c-4c32-e55b-884e04aa7f12@redhat.com Whole thread Raw |
In response to | RE: Index Skip Scan (Floris Van Nee <florisvannee@Optiver.com>) |
Responses |
Re: Index Skip Scan
|
List | pgsql-hackers |
Hi Floris, On 1/15/20 8:33 AM, Floris Van Nee wrote: > I reviewed the latest version of the patch. Overall some good improvements I think. Please find my feedback below. > Thanks for your review ! > - I think I mentioned this before - it's not that big of a deal, but it just looks weird and inconsistent to me: > create table t2 as (select a, b, c, 10 d from generate_series(1,5) a, generate_series(1,100) b, generate_series(1,10000)c); create index on t2 (a,b,c desc); > > postgres=# explain select distinct on (a,b) a,b,c from t2 where a=2 and b>=5 and b<=5 order by a,b,c desc; > QUERY PLAN > --------------------------------------------------------------------------------- > Index Only Scan using t2_a_b_c_idx on t2 (cost=0.43..216.25 rows=500 width=12) > Skip scan: true > Index Cond: ((a = 2) AND (b >= 5) AND (b <= 5)) > (3 rows) > > postgres=# explain select distinct on (a,b) a,b,c from t2 where a=2 and b=5 order by a,b,c desc; > QUERY PLAN > ----------------------------------------------------------------------------------------- > Unique (cost=0.43..8361.56 rows=500 width=12) > -> Index Only Scan using t2_a_b_c_idx on t2 (cost=0.43..8361.56 rows=9807 width=12) > Index Cond: ((a = 2) AND (b = 5)) > (3 rows) > > When doing a distinct on (params) and having equality conditions for all params, it falls back to the regular index scaneven though there's no reason not to use the skip scan here. It's much faster to write b between 5 and 5 now rather thanwriting b=5. I understand this was a limitation of the unique-keys patch at the moment which could be addressed in thefuture. I think for the sake of consistency it would make sense if this eventually gets addressed. > Agreed, that it is an improvement that should be made. I would like David's view on this since it relates to the UniqueKey patch. > - nodeIndexScan.c, line 126 > This sets xs_want_itup to true in all cases (even for non skip-scans). I don't think this is acceptable given the side-effectsthis has (page will never be unpinned in between returned tuples in _bt_drop_lock_and_maybe_pin) > Correct - fixed. > - nbsearch.c, _bt_skip, line 1440 > _bt_update_skip_scankeys(scan, indexRel); This function is called twice now - once in the else {} and immediately afterthat outside of the else. The second call can be removed I think. > Yes, removed the "else" call site. > - nbtsearch.c _bt_skip line 1597 > LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK); > scan->xs_itup = (IndexTuple) PageGetItem(page, itemid); > > This is an UNLOCK followed by a read of the unlocked page. That looks incorrect? > Yes, that needed to be changed. > - nbtsearch.c _bt_skip line 1440 > if (BTScanPosIsValid(so->currPos) && > _bt_scankey_within_page(scan, so->skipScanKey, so->currPos.buf, dir)) > > Is it allowed to look at the high key / low key of the page without have a read lock on it? > In case of a split the page will still contain a high key and a low key, so this should be ok. > - nbtsearch.c line 1634 > if (_bt_readpage(scan, indexdir, offnum)) ... > else > error() > > Is it really guaranteed that a match can be found on the page itself? Isn't it possible that an extra index condition,not part of the scan key, makes none of the keys on the page match? > The logic for this has been changed. > - nbtsearch.c in general > Most of the code seems to rely quite heavily on the fact that xs_want_itup forces _bt_drop_lock_and_maybe_pin to neverrelease the buffer pin. Have you considered that compacting of a page may still happen even if you hold the pin? [1]I've been trying to come up with cases in which this may break the patch, but I haven't able to produce such a scenario- so it may be fine. But it would be good to consider again. One thing I was thinking of was a scenario where pagesplits and/or compacting would happen in between returning tuples. Could this break the _bt_scankey_within_page checksuch that it thinks the scan key is within the current page, while it actually isn't? Mainly for backward and/or cursorscans. Forward scans shouldn't be a problem I think. Apologies for being a bit vague as I don't have a clear exampleready when it would go wrong. It may well be fine, but it was one of the things on my mind. > There is a BT_READ lock in place when finding the correct leaf page, or searching within the leaf page itself. _bt_vacuum_one_page deletes only LP_DEAD tuples, but those are already ignored in _bt_readpage. Peter, do you have some feedback for this ? Please, find the updated patches attached that Dmitry and I made. Thanks again ! Best regards, Jesper
Attachment
pgsql-hackers by date: