Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans - Mailing list pgsql-hackers
From | Alexander Kuzmenkov |
---|---|
Subject | Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans |
Date | |
Msg-id | ca192322-29fd-9e60-3766-fe5e69d1f9af@postgrespro.ru Whole thread Raw |
In response to | Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans
Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans |
List | pgsql-hackers |
On 12.04.2017 15:04, Tom Lane wrote: > Andrew Gierth <andrew@tao11.riddles.org.uk> writes: >> "Alexander" == Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes: >> Alexander> Structurally, the patch consists of two major parts: a >> Alexander> specialized executor node >> Why? >> It strikes me that the significant fact here is not that we're doing >> count(*), but that we don't need any columns from the bitmap heap scan >> result. Rather than creating a whole new node, can't the existing >> bitmap heapscan be taught to skip fetching the actual table page in >> cases where it's all-visible, not lossy, and no columns are needed? > +1 ... while I hadn't actually looked at the code, it seemed to me that > anything like the optimization-as-described would be impossibly klugy > from the planner's standpoint. Your formulation sounds lots nicer. > > Detecting that no columns are needed in the executor might be a bit tricky > because of the planner's habit of inserting a "physical tlist" to avoid a > projection step. (See also nearby discussion about custom scan planning.) > But we could fix that. I think one rule that would make sense is to > just disable the physical-tlist substitution if the relation's targetlist > is empty --- it wouldn't be buying much in such a case anyhow. Then the > runtime tlist for the scan node would also be empty, and away you go. When making an early prototype of this optimization, I did what you are describing with the bitmap heap scan executor node. In an internal review, it was said that the bitmap heap scan node is already complicated enough and shouldn't have more logic added to it, so I rewrote it a separate node. To me, your approach looks good too, so if the community is generally in favor of this, I could rewrite the executor as such. With planner, the changes are more complex. Two things must be done there. First, when the tlist is empty, we must use a different cost function for bitmap heap scan, because the heap access pattern is different. Second, choose_bitmap_and() must use a different algorithm for choosing the right combination of paths. A standard algorithm chooses the combination based on cost. For count(*) purposes, the decisive factor is that the path has to check all the restrictions, or else we'll need heap access to recheck some of them, which defeats the purpose of having this optimization. The planner code that builds and costs the index path is fairly complex, and integrating this additional behavior into it didn't look good to me. Instead, I created a specialized path node and isolated the logic that handles it. The resulting implementation adds several functions, but it is mostly self-contained and has a single entry point in grouping_planner(). It handles the specific case of bitmap count plans and doesn't complicate the existing code any further. The planner part is to some extent independent of whether we use a separate executor node or not. If we choose not to, the same count(*) optimization code I proposed could create plain bitmap heap scan paths. -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
pgsql-hackers by date: