extended operator classes vs. type interfaces - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | extended operator classes vs. type interfaces |
Date | |
Msg-id | r2t603c8f071004081929v81198fferbff67720c815af6f@mail.gmail.com Whole thread Raw |
Responses |
Re: extended operator classes vs. type interfaces
Re: extended operator classes vs. type interfaces Re: extended operator classes vs. type interfaces Re: extended operator classes vs. type interfaces |
List | pgsql-hackers |
There are a couple of features that were considered but not committed for 9.0 which require additional information about the properties of various data types. At PGeast I had a chance to talk with Jeff Davis about this briefly, and wanted to write up some of what we talked about plus some further thoughts of my own before they went out of my head. The features I'm thinking about are: 1. knngist wants to use index scans to speed up queries of the form SELECT ... ORDER BY <column> <op> <constant> (as opposed to the existing machinery which only knows how to use an index for SELECT ... ORDER BY <column>). 2. Window functions want to define windows over a range of values defined by the underlying data type. To do this, we need to define what addition and subtraction mean for a particular data type. 3. Jeff Davis is interested in implementing range types. When the underlying base type is discrete, e.g. integers, you can say that [1,3] = [1,4), but only if you know that 3 and 4 are consecutive (in that order). All of these problems seem loosely related: we're teaching the database more about how certain types behave. But there are some important differences. In case #1, we're trying to teach the planner that if it sees a certain operator, it can map that operator onto an AM strategy - so the knowledge is fundamentally AM-specific. In the remaining cases, the information needed is a property of the underlying data type with no natural or logical relationship to an access method. For that reason, I don't think there's going to be any clean way to create a single mechanism that will encompass all of these needs, though maybe someone has a clever idea I'm not thinking of. What we discussed doing before for #1 (and it make sense to me) is to add a column pg_amop.amopcategory. The existing operator class functions will constitute one category - map a boolean operator onto an index strategy number that can be used to treat a filter condition as an index qual. The functions knngist cares about will be a second category - map an operator that returns some arbitrary type that is legal in the context of an ORDER BY clause onto an index strategy number that can regurgitate the tuples in the order defined by the return type. While it might be possible to shoehorn the remaining cases into the operator class machinery, it seems likely that it will be nothing but ugly. The whole charter of the operator class machinery at least AIUI is to map operators onto AM-specific index strategy numbers, and there is neither an applicable AM nor a strategy number for it in any of these cases. So I think it's time to create a separate concept of type interfaces (Jeff Davis proposed this name, and I like it). What might this look like? Given a type T, I think we'd like to be able to define a type U as "the natural type to be added to or subtracted from T". As Jeff pointed out to me, this is not necessarily the same as the underlying type. For example, if T is a timestamp, U is an interval; if T is a numeric, U is also a numeric; if T is a cidr, U is an integer. Then we'd like to define a canonical addition operator and a canonical subtraction operator. I think that would be sufficient for the needs of RANGE BETWEEN ... PRECEDING AND ... FOLLOWING. It would also be nearly sufficient for range types, but in that case you also need to specify the unit increment within U - i.e. a "1" value for the datatype. It may or may not be worth building the concept of a unit increment into the type interface machinery, though: one could imagine two different range types built over the same base type with different unit increments - e.g. one timestamp range with unit increment = 1s, and one with unit increment = 1m. Under the first type [4pm,5pm) = [4pm,4:59:59pm], while under the second [4pm,5pm) = [4pm,4:59pm]. In a previous thread on this topic, in response to a question about other possible needs for operator knowledge, Hitoshi Harada mentioned range partitioning as another possible case. If we have a partition where x ranges from 1 to 100, that's basically saying that x >= 1 and x <= 100, for suitable values of >= and <=. I think we're already habituated to doing this kind of thing by looking at the default btree opclass for the data type and looking for the operator that implements, e.g. BTLessEqualsStrategyNumber. It might be cleaner to get all this information from type interfaces, but I'm not sure whether it's reasonable (either for reasons of complexity or of performance) to think about untangling all the places where we've already made this assumption and redoing them. Thoughts? ...Robert
pgsql-hackers by date: