Re: Tid scan improvements - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Tid scan improvements |
Date | |
Msg-id | CAKJS1f8Ap6cFKeBbTnHXpOcgq5iXV1UhwA7ZiP4fxxx1LArbwQ@mail.gmail.com Whole thread Raw |
In response to | Re: Tid scan improvements (Edmund Horner <ejrh00@gmail.com>) |
Responses |
Re: Tid scan improvements
|
List | pgsql-hackers |
On 28 September 2018 at 18:13, Edmund Horner <ejrh00@gmail.com> wrote: > On Fri, 28 Sep 2018 at 17:02, Edmund Horner <ejrh00@gmail.com> wrote: >> I did run pgindent over it though. :) > > But I didn't check if it still applied to master. Sigh. Here's one that does. I know commit fest is over, but I made a pass of this to hopefully provide a bit of guidance so that it's closer for the November 'fest. I've only done light testing on the patch and it does seem to work, but there are a few things that I think should be changed. Most importantly #11 below I think needs to be done. That might overwrite some of the items that come before it in the list as you likely will have to pull some of code which I mention out out due to changing #11. I've kept them around anyway just in case some of it remains. 1. Could wrap for tables > 16TB. Please use double. See index_pages_fetched() int nrandompages; int nseqpages; 2. Should multiply by nseqpages, not add. run_cost += spc_random_page_cost * nrandompages + spc_seq_page_cost + nseqpages; 3. Should be double: BlockNumber pages = selectivity * baserel->pages; 4. Comment needs updated to mention what the new code does in heapgettup() and heapgettup_pagemode() + /* start from last page of the scan */ - if (scan->rs_startblock > 0) - page = scan->rs_startblock - 1; + if (scan->rs_numblocks == InvalidBlockNumber) + { + if (scan->rs_startblock > 0) + page = scan->rs_startblock - 1; + else + page = scan->rs_nblocks - 1; + } else - page = scan->rs_nblocks - 1; + page = scan->rs_startblock + scan->rs_numblocks - 1; + 5. Variables should be called "inclusive". We use "strict" to indicate an operator comparison cannot match NULL values. + bool strict; /* Indicates < rather than <=, or > rather */ + bool strict2; /* than >= */ Don't break the comment like that. If you need more space don't end the comment and use a new line and tab the next line out to match the * of the first line. 6. Why not pass the TidExpr into MakeTidOpExprState() and have it set the type instead of repeating code 7. It's not very obvious why the following Assert() can't fail. + bool invert; + bool invert2; + + Assert(list_length(((BoolExpr *) expr)->args) == 2); I had to hunt around quite a bit to see that TidQualFromBaseRestrictinfo could only ever make the list have 2 elements, and we'd not form a BoolExpr with just 1. (but see #11) 8. Many instances of the word "strict" are used to mean "inclusive". Can you please change all of them. 9. Confusing comment: + * If the expression was non-strict (<=) and the offset is 0, then just + * pretend it was strict, because offset 0 doesn't exist and we may as + * well exclude that block. Shouldn't this be, "If the operator is non-inclusive, then since TID offsets are 1-based, for simplicity, we can just class the expression as inclusive.", or something along those lines. 10. Comment talks about LHS, but the first OpExpr in a list of two OpExprs has nothing to do with left hand side. You could use LHS if you were talking about the first arg in an OpExpr, but this is not the case here. /* If the LHS is not the lower bound, swap them. */ You could probably just ensure that the >=, > ops is the first in the list inside TidQualFromBaseRestrictinfo(), but you'd need to clearly comment that this is the case in both locations. Perhaps use lcons() for the lower end and lappend() for the upper end, but see #11. 11. I think the qual matching code needs an overhaul. Really you should attempt to find the smallest and largest ctid for your implicitly ANDed ranges. This would require you getting rid of the BETWEEN type claused you're trying to build in TidQualFromBaseRestrictinfo and instead just include all quals, don't ignore other quals when you've already found your complete range bounds. The problem with doing it the way that you're doing it now is in cases like: create table t1(a int); insert into t1 select generate_Series(1,10000000); create index on t1 (a); select ctid,a from t1 order by a desc limit 1; -- find the max ctid. ctid | a -------------+---------- (44247,178) | 10000000 (1 row) set max_parallel_workers_per_gather=0; explain analyze select ctid,* from t1 where ctid > '(0,0)' and ctid <= '(44247,178)' and ctid <= '(0,1)'; QUERY PLAN ----------------------------------------------------------------------------------------------------- Tid Scan on t1 (cost=0.01..169248.78 rows=1 width=10) (actual time=0.042..2123.432 rows=1 loops=1) TID Cond: ((ctid > '(0,0)'::tid) AND (ctid <= '(44247,178)'::tid)) Filter: (ctid <= '(0,1)'::tid) Rows Removed by Filter: 9999999 Planning Time: 4.049 ms Execution Time: 2123.464 ms (6 rows) Due to how you've coded TidQualFromBaseRestrictinfo(), the ctid <= '(0,1)' qual does not make it into the range. It's left as a filter in the Tid Scan. I think I'm going to stop here as changing this going to cause quite a bit of churn. but one more... 12. I think the changes to selfuncs.c to get the selectivity estimate is a fairly credible idea, but I think it also needs to account for offsets. You should be able to work out the average number of items per page with rel->tuples / rel->pages and factor that in to get a better estimate for cases like: postgres=# explain analyze select ctid,* from t1 where ctid <= '(0,200)'; QUERY PLAN ----------------------------------------------------------------------------------------------- Tid Scan on t1 (cost=0.00..5.00 rows=1 width=10) (actual time=0.025..0.065 rows=200 loops=1) TID Cond: (ctid <= '(0,200)'::tid) Planning Time: 0.081 ms Execution Time: 0.088 ms (4 rows) You can likely add on "(offset / avg_tuples_per_page) / rel->pages" to the selectivity and get a fairly accurate estimate... at least when there are no dead tuples in the heap -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: