Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers
From | Finnerty, Jim
---|---
Subject | Re: POC, WIP: OR-clause support for indexes
Date |
Msg-id | 4002251A-F36D-4148-A258-548C982397E2@amazon.com
In response to | Re: POC, WIP: OR-clause support for indexes (Peter Geoghegan <pg@bowt.ie>)
Responses | Re: POC, WIP: OR-clause support for indexes
List | pgsql-hackers
Peter, I'm very glad to hear that you're researching this!

Will this include skip-scan optimizations for OR or IN predicates, or when the number of distinct values in a leading non-constant index column(s) is sufficiently small? e.g. suppose there is an ORDER BY b, and WHERE clause predicates (a = 1 AND b = 5) OR (c > 12 AND b BETWEEN 100 AND 200). Then a single index scan on an index with leading column b could visit b = 5, and then the range of b from 100 to 200, and deliver the rows in the order requested. Or if the predicate is (a = 1 AND b = 5) OR (c LIKE 'XYZ' AND b < 12), then you can scan just b < 12. Or if the index is defined on (a, b) and you know that b = 100, and that there are only 4 distinct values of column a, then you could skip to each distinct value of a where b = 100, and so on.

If you have an ORDER BY clause and a lower and upper bound on the first column of the ORDER BY list, you have the potential to reduce search effort versus a full index scan, even when that upper and lower bound needs to be derived from a complex predicate.

Of course, if you have an IN list you can either skip to the distinct values listed or scan the entire index, depending on estimated cost.

/Jim F

On 8/1/23, 3:43 PM, "Peter Geoghegan" <pg@bowt.ie> wrote:

On Mon, Jul 31, 2023 at 9:38 AM Alena Rybakina <lena.ribackina@yandex.ru> wrote:
> I noticed only one thing there: when we have unsorted array values in
> SOAP, the query takes longer than
> when it has a sorted array. I'll double-check it just in case and write
> about the results later.

I would expect the B-Tree preprocessing by _bt_preprocess_array_keys() to be very slightly faster when the query is written with presorted, duplicate-free constants. Sorting is faster when you don't really have to sort. However, I would not expect the effect to be significant enough to matter, except perhaps in very extreme cases. Although...some of the cases you care about are very extreme cases.

> I am also testing some experience with multi-column indexes using SAOPs.

Have you thought about a similar transformation for when the row constructor syntax happens to have been used?

Consider a query like the following, against a table with a composite index on (a, b):

select * from multi_test where ( a, b ) in (( 1, 1 ), ( 2, 1 ));

This query will get a BitmapOr based plan that's similar to the plans that OR-based queries affected by your transformation patch get today, on HEAD. However, this equivalent spelling has the potential to be significantly faster:

select * from multi_test where a = any('{1,2}') and b = 1;

(Of course, this is more likely to be true with my nbtree SAOP patch in place.)

Note that we currently won't use RowCompareExpr in many simple cases where the row constructor syntax has been used. For example, a query like this:

select * from multi_test where ( a, b ) = (( 2, 1 ));

This case already involves a transformation that is roughly comparable to the one you're working on now. We'll remove the RowCompareExpr during parsing. It'll be as if my example row constructor equality query was written this way instead:

select * from multi_test where a = 2 and b = 1;

This can be surprisingly important, when combined with other things, in more realistic examples.
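For concreteness, here is a minimal sketch of a setup the examples above could run against. The actual definition of multi_test is not shown in the thread, so the table, index, and sample data below are assumptions made only for illustration:

```sql
-- Assumed schema and data for the multi_test examples; the real definition
-- is not given in the thread, so treat this as illustrative only.
create table multi_test (a int, b int);
create index multi_test_a_b_idx on multi_test (a, b);
insert into multi_test
select i % 50, i % 20 from generate_series(1, 100000) i;
analyze multi_test;

-- Row constructor spelling; per the discussion above, this gets a
-- BitmapOr-based plan on HEAD.
explain (costs off)
select * from multi_test where ( a, b ) in (( 1, 1 ), ( 2, 1 ));

-- Equivalent SAOP spelling, which has the potential to use a single scan
-- of the composite index.
explain (costs off)
select * from multi_test where a = any('{1,2}') and b = 1;
```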
The nbtree code has special knowledge of RowCompareExpr that makes the rules for comparing index tuples different to those from other kinds of index scans. However, due to the RowCompareExpr transformation process I just described, we don't need to rely on that specialized nbtree code when the row constructor syntax is used with a simple equality clause -- which is what makes the normalization process have real value. If the nbtree code cannot see RowCompareExpr index quals then it cannot have this problem in the first place.

In general it is useful to "normalize to conjunctive normal form" when it might allow scan key preprocessing in the nbtree code to come up with a much faster approach to executing the scan. It's easier to understand what I mean by showing a simple example. The nbtree preprocessing code is smart enough to recognize that the following query doesn't really need to do any work, due to having quals that it recognizes as contradictory (it can set so->qual_okay to false for unsatisfiable quals):

select * from multi_test where ( a, b ) = (( 2, 1 )) and a = -1;

However, it is not smart enough to perform the same trick if we change one small detail with the query:

select * from multi_test where ( a, b ) >= (( 2, 1 )) and a = -1;

Ideally, the optimizer would canonicalize/normalize everything in a way that made all of the nbtree preprocessing optimizations work just as well, without introducing any new special cases. Obviously, there is no reason why we can't perform the same trick with the second variant. (Note also that the nbtree preprocessing code can be smart about redundant quals, not just contradictory quals, so it matters more than it may appear from this simple, unrealistic example of mine.)

While these similar RowCompareExpr transformations are at least somewhat important, that's not really why I bring them up now. I am pointing them out now because I think that it might help you to develop a more complete mental model of these transformations. Ideally, your initial approach will generalize to other situations later on. So it's worth considering the relationship between this existing RowCompareExpr transformation, and the one that you're working on currently. Plus other, future transformations.

This example might also give you some appreciation of why my SAOP patch is confused about when we need to do normalization/safety checks. Some things seem necessary when generating index paths in the optimizer. Other things seem necessary during preprocessing, in the nbtree code, at the start of the index scan. Unfortunately, it's not obvious to me where the right place is to deal with each aspect of setting up multi-column SAOP index quals. My mental model is very incomplete.

--
Peter Geoghegan
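The contradictory-qual difference described in the message above can be observed directly. A minimal sketch, reusing the assumed multi_test definition from the earlier sketch:

```sql
-- Equality row constructor plus a contradictory qual: the RowCompareExpr is
-- removed during parsing, so nbtree preprocessing can recognize that a = 2
-- and a = -1 cannot both hold, and gives up without descending the index.
explain (analyze, buffers, costs off)
select * from multi_test where ( a, b ) = (( 2, 1 )) and a = -1;

-- Inequality row constructor plus the same contradictory qual: per the
-- message above, preprocessing does not detect the contradiction here, so
-- the index is actually scanned. Comparing the Buffers lines of the two
-- plans illustrates the difference.
explain (analyze, buffers, costs off)
select * from multi_test where ( a, b ) >= (( 2, 1 )) and a = -1;
```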