Re: bitmaps and correlation - Mailing list pgsql-hackers
From | Justin Pryzby |
---|---|
Subject | Re: bitmaps and correlation |
Date | |
Msg-id | 20200107052606.GP12066@telsasoft.com Whole thread Raw |
In response to | Re: bitmaps and correlation (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: bitmaps and correlation
|
List | pgsql-hackers |
On Tue, Jan 07, 2020 at 09:21:03AM +0530, Dilip Kumar wrote: > On Tue, Jan 7, 2020 at 1:29 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > Find attached cleaned up patch. > > For now, I updated the regress/expected/, but I think the test maybe has to be > > updated to do what it was written to do. > > I have noticed that in "cost_index" we have used the indexCorrelation > for computing the run_cost, not the number of pages whereas in your > patch you have used it for computing the number of pages. Any reason > for the same? As Jeff has pointed out, high correlation has two effects in cost_index(): 1) the number of pages read will be less; 2) the pages will be read more sequentially; cost_index reuses the pages_fetched variable, so (1) isn't particularly clear, but does essentially: /* max_IO_cost is for the perfectly uncorrelated case (csquared=0) */ pages_fetched(MIN) = index_pages_fetched(tuples_fetched, baserel->pages, (double) index->pages, root); max_IO_cost = pages_fetchedMIN * spc_random_page_cost; /* min_IO_cost is for the perfectly correlated case (csquared=1) */ pages_fetched(MAX) = ceil(indexSelectivity * (double) baserel->pages); min_IO_cost = (pages_fetchedMAX - 1) * spc_seq_page_cost; My patch 1) changes compute_bitmap_pages() to interpolate pages_fetched using the correlation; pages_fetchedMIN is new: > Patch > - pages_fetched = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched); > + pages_fetchedMAX = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched); > + > + /* pages_fetchedMIN is for the perfectly correlated case (csquared=1) */ > + pages_fetchedMIN = ceil(indexSelectivity * (double) baserel->pages); > + > + pages_fetched = pages_fetchedMAX + indexCorrelation*indexCorrelation*(pages_fetchedMIN - pages_fetchedMAX); And, 2) also computes cost_per_page by interpolation between seq_page and random_page cost: + cost_per_page_corr = spc_random_page_cost - + (spc_random_page_cost - spc_seq_page_cost) + * (1-correlation*correlation); Thanks for looking. I'll update the name of pages_fetchedMIN/MAX in my patch for consistency with cost_index. Justin
pgsql-hackers by date: