Thread: Support for RANGE ... PRECEDING windows in OVER
Hi all Since 8.4, PostgreSQL has had extremely useful window function support - but support for "RANGE PRECEDING / FOLLOWING" windows was dropped late in 8.4's development in order to get the rest of the feature in, per http://archives.postgresql.org/pgsql-hackers/2010-02/msg00540.php. It looks like there was discussion of requiring a new opclass to be declared for types or otherwise extending opclasses to provide the information required for RANGE ... PRECEDING / FOLLOWING ( http://www.postgresql.org/message-id/20100211201444.GA28270@svana.org ) . I can't find any sign that it went anywhere beyond some broad discussion: http://www.postgresql.org/message-id/13993.1265920013@sss.pgh.pa.us at the time. I've missed this feature more than once, and am curious about whether any more recent changes may have made it cleaner to tackle this, or whether consensus can be formed on adding the new entries to btree's opclass to avoid the undesirable explicit lookups of the '+' and '-' oprators. Some question seems to remain open about how ranges over timestamps/intervals should work, but this wasn't elaborated on. There's been interest in this, eg: http://pgsql.hackers.free-usenet.eu/[HACKERS]-range-intervals-in-window-function-frames_T66085695_S1 http://grokbase.com/t/postgresql/pgsql-general/105a89gm2n/postgresql-9-0-support-for-range-value-preceding-window-functions -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Ian Link
Hi all
Since 8.4, PostgreSQL has had extremely useful window function support -
but support for "RANGE PRECEDING / FOLLOWING" windows was dropped late
in 8.4's development in order to get the rest of the feature in, per
http://archives.postgresql.org/pgsql-hackers/2010-02/msg00540.php.
It looks like there was discussion of requiring a new opclass to be
declared for types or otherwise extending opclasses to provide the
information required for RANGE ... PRECEDING / FOLLOWING (
http://www.postgresql.org/message-id/20100211201444.GA28270@svana.org )
. I can't find any sign that it went anywhere beyond some broad
discussion:
http://www.postgresql.org/message-id/13993.1265920013@sss.pgh.pa.us at
the time.
I've missed this feature more than once, and am curious about whether
any more recent changes may have made it cleaner to tackle this, or
whether consensus can be formed on adding the new entries to btree's
opclass to avoid the undesirable explicit lookups of the '+' and '-'
oprators.
Some question seems to remain open about how ranges over
timestamps/intervals should work, but this wasn't elaborated on.
There's been interest in this, eg:
http://pgsql.hackers.free-usenet.eu/[HACKERS]-range-intervals-in-window-function-frames_T66085695_S1
http://grokbase.com/t/postgresql/pgsql-general/105a89gm2n/postgresql-9-0-support-for-range-value-preceding-window-functions
Attachment
On 06/21/2013 10:31 AM, Ian Link wrote: > I am currently looking into this feature. However, as I am quite new to > Postgres, I think it might take me a while to get up to speed. Anyways, > I would also appreciate another round of discussion on the future of the > windowing functions. Good to know, and welcome. I hope the links to the archived discussions on the matter were useful to you. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
I think a good starting point will be to refactor/imrpove the WinGetFuncArgInPartition and WinGetFuncArgInFrame functions.
Tom Lane wrote this about them before comitting the patch:
I'm not terribly happy with the changes you made in WinGetFuncArgInPartition
and WinGetFuncArgInFrame to force the window function mark to not go
past frame start in some modes. Not only is that pretty ugly, but I
think it can mask bugs in window functions: it's an error for a window
function to fetch a row before what it has set its mark to be, but in
some cases that wouldn't be detected because of this change. I think
it would be better to revert those changes and find another method of
protecting fetches needed to determine the frame head. One idea is
to create a separate read pointer that tracks the frame head whenever
actual fetches of the frame head might be needed by update_frameheadpos.
I committed it without changing that, but I think this should be
revisited before trying to add the RANGE value PRECEDING/FOLLOWING
options, because those will substantially expand the number of caseswhere that hack affects the behavior.
I am honestly not 100% certain why these functions have issues, but this seems a good place to start investigating.
Ian Link
Good to know, and welcome.
I hope the links to the archived discussions on the matter were useful
to you.Hi all
Since 8.4, PostgreSQL has had extremely useful window function support -
but support for "RANGE PRECEDING / FOLLOWING" windows was dropped late
in 8.4's development in order to get the rest of the feature in, per
http://archives.postgresql.org/pgsql-hackers/2010-02/msg00540.php.
It looks like there was discussion of requiring a new opclass to be
declared for types or otherwise extending opclasses to provide the
information required for RANGE ... PRECEDING / FOLLOWING (
http://www.postgresql.org/message-id/20100211201444.GA28270@svana.org )
. I can't find any sign that it went anywhere beyond some broad
discussion:
http://www.postgresql.org/message-id/13993.1265920013@sss.pgh.pa.us at
the time.
I've missed this feature more than once, and am curious about whether
any more recent changes may have made it cleaner to tackle this, or
whether consensus can be formed on adding the new entries to btree's
opclass to avoid the undesirable explicit lookups of the '+' and '-'
oprators.
Some question seems to remain open about how ranges over
timestamps/intervals should work, but this wasn't elaborated on.
There's been interest in this, eg:
http://pgsql.hackers.free-usenet.eu/[HACKERS]-range-intervals-in-window-function-frames_T66085695_S1
http://grokbase.com/t/postgresql/pgsql-general/105a89gm2n/postgresql-9-0-support-for-range-value-preceding-window-functions
Attachment
I've missed this feature more than once, and am curious about whether
any more recent changes may have made it cleaner to tackle this, or
whether consensus can be formed on adding the new entries to btree's
opclass to avoid the undesirable explicit lookups of the '+' and '-'
oprators.
Hitoshi Harada
On 06/21/2013 05:32 PM, Hitoshi Harada wrote: > I also later found that we are missing not only notion of '+' or '-', > but also notion of 'zero value' in our catalog. Per spec, RANGE BETWEEN > needs to detect ERROR if the offset value is negative, but it is not > always easy if you think about interval, numeric types as opposed to > int64 used in ROWS BETWEEN. Zero can be tested for with `val = (@ val)` ie `val = abs(val)`. That should make sense for any type in which the concept of zero makes sense. Thanks for the warning on that issue. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 06/21/2013 05:32 PM, Hitoshi Harada wrote:Zero can be tested for with `val = (@ val)` ie `val = abs(val)`. That
> I also later found that we are missing not only notion of '+' or '-',
> but also notion of 'zero value' in our catalog. Per spec, RANGE BETWEEN
> needs to detect ERROR if the offset value is negative, but it is not
> always easy if you think about interval, numeric types as opposed to
> int64 used in ROWS BETWEEN.
should make sense for any type in which the concept of zero makes sense.
Thanks,
Hitoshi Harada
--On Fri, Jun 21, 2013 at 3:20 AM, Craig Ringer <craig@2ndquadrant.com> wrote:On 06/21/2013 05:32 PM, Hitoshi Harada wrote:Zero can be tested for with `val = (@ val)` ie `val = abs(val)`. That
> I also later found that we are missing not only notion of '+' or '-',
> but also notion of 'zero value' in our catalog. Per spec, RANGE BETWEEN
> needs to detect ERROR if the offset value is negative, but it is not
> always easy if you think about interval, numeric types as opposed to
> int64 used in ROWS BETWEEN.
should make sense for any type in which the concept of zero makes sense.Yeah, I mean, it needs to know if offset is negative or not by testing with zero. So we need "zero value" or "is_negative function" for each type.
Thanks,
Hitoshi Harada
On 06/22/2013 03:30 AM, ian link wrote: > Forgive my ignorance, but I don't entirely understand the problem. What > does '+' and '-' refer to exactly? Consider "RANGE 4.5 PRECEDING'. You need to be able to test whether, for the current row 'b', any given row 'a' is within the range (b - 4.5) < a <= b . Not 100% sure about the < vs <= boundaries, but that's irrelevant for the example. To test that, you have to be able to do two things: you have to be able to test whether one value is greater than another, and you have to be able to add or subtract a constant from one of the values. Right now, the b-tree access method provides information on the ordering operators < <= = > >= <> , which provides half the answer. But these don't give any concept of *distance* - you can test ordinality but not cardinality. To implement the "different by 4.5" part, you have to be able to add 4.5 to one value or subtract it from the other. The obvious way to do that is to look up the function that implements the '+' or '-' operator, and do: ((OPERATOR(+))(a, 4.5)) > b AND (a <= b) or ((OPERATOR(-))(b, 4.5)) < a AND (a <= b); The problem outlined by Tom in prior discussion about this is that PostgreSQL tries really hard not to assume that particular operator names mean particular things. Rather than "knowing" that "+" is always "an operator that adds two values together; is transitive, symmetric and reflexive", PostgreSQL requires that you define an *operator class* that names the operator that has those properties. Or at least, it does for less-than, less-than-or-equals, equals, greater-than-or-equals, greater-than, and not-equals as part of the b-tree operator class, which *usually* defines these operators as < <= = >= > <>, but you could use any operator names you wanted if you really liked. Right now (as far as I know) there's no operator class that lets you identify operators for addition and subtraction in a similar way. So it's necessary to either add such an operator class (in which case support has to be added for it for every type), extend the existing b-tree operator class to provide the info, or blindly assume that "+" and "-" are always addition and subtraction. For an example of why such assumptions are a bad idea, consider matrix multiplication. Normally, "a * b" = "b * a", but this isn't true for multiplication of matrices. Similarly, if someone defined a "+" operator as an alias for string concatenation (||), we'd be totally wrong to assume we could use that for doing range-offset windowing. So. Yeah. Operator classes required, unless we're going to change the rules and make certain operator names "special" in PostgreSQL, so that if you implement them they *must* have certain properties. This seems like a pretty poor reason to add such a big change. I hope this explanation (a) is actually correct and (b) is helpful. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Thanks Craig! That definitely does help. I probably still have some questions but I think I will read through the rest of the code before asking. Thanks again!
Ian
> Craig Ringer
> Friday, June 21, 2013 8:41 PM
>
> On 06/22/2013 03:30 AM, ian link wrote:
>>
>> Forgive my ignorance, but I don't entirely understand the problem. What
>> does '+' and '-' refer to exactly?
>
> Consider "RANGE 4.5 PRECEDING'.
>
> You need to be able to test whether, for the current row 'b', any given
> row 'a' is within the range (b - 4.5) < a <= b . Not 100% sure about the
> < vs <= boundaries, but that's irrelevant for the example.
>
> To test that, you have to be able to do two things: you have to be able
> to test whether one value is greater than another, and you have to be
> able to add or subtract a constant from one of the values.
>
> Right now, the b-tree access method provides information on the ordering
> operators < <= = > >= <> , which provides half the answer. But these
> don't give any concept of *distance* - you can test ordinality but not
> cardinality.
>
> To implement the "different by 4.5" part, you have to be able to add 4.5
> to one value or subtract it from the other.
>
> The obvious way to do that is to look up the function that implements
> the '+' or '-' operator, and do:
>
> ((OPERATOR(+))(a, 4.5)) > b AND (a <= b)
>
> or
>
> ((OPERATOR(-))(b, 4.5)) < a AND (a <= b);
>
> The problem outlined by Tom in prior discussion about this is that
> PostgreSQL tries really hard not to assume that particular operator
> names mean particular things. Rather than "knowing" that "+" is always
> "an operator that adds two values together; is transitive, symmetric and
> reflexive", PostgreSQL requires that you define an *operator class* that
> names the operator that has those properties.
>
> Or at least, it does for less-than, less-than-or-equals, equals,
> greater-than-or-equals, greater-than, and not-equals as part of the
> b-tree operator class, which *usually* defines these operators as < <= =
>>
>> = > <>, but you could use any operator names you wanted if you really
>
> liked.
>
> Right now (as far as I know) there's no operator class that lets you
> identify operators for addition and subtraction in a similar way. So
> it's necessary to either add such an operator class (in which case
> support has to be added for it for every type), extend the existing
> b-tree operator class to provide the info, or blindly assume that "+"
> and "-" are always addition and subtraction.
>
> For an example of why such assumptions are a bad idea, consider matrix
> multiplication. Normally, "a * b" = "b * a", but this isn't true for
> multiplication of matrices. Similarly, if someone defined a "+" operator
> as an alias for string concatenation (||), we'd be totally wrong to
> assume we could use that for doing range-offset windowing.
>
> So. Yeah. Operator classes required, unless we're going to change the
> rules and make certain operator names "special" in PostgreSQL, so that
> if you implement them they *must* have certain properties. This seems
> like a pretty poor reason to add such a big change.
>
> I hope this explanation (a) is actually correct and (b) is helpful.
>
> ian link
> Friday, June 21, 2013 12:30 PM
> Forgive my ignorance, but I don't entirely understand the problem. What does '+' and '-' refer to exactly?
> Thanks!
>
>
>
> Hitoshi Harada
> Friday, June 21, 2013 4:35 AM
>
>
>
> Forgive my ignorance, but I don't entirely understand the problem. What
> does '+' and '-' refer to exactly?
Consider "RANGE 4.5 PRECEDING'.
You need to be able to test whether, for the current row 'b', any given
row 'a' is within the range (b - 4.5) < a <= b . Not 100% sure about the
< vs <= boundaries, but that's irrelevant for the example.
To test that, you have to be able to do two things: you have to be able
to test whether one value is greater than another, and you have to be
able to add or subtract a constant from one of the values.
Right now, the b-tree access method provides information on the ordering
operators < <= = > >= <> , which provides half the answer. But these
don't give any concept of *distance* - you can test ordinality but not
cardinality.
To implement the "different by 4.5" part, you have to be able to add 4.5
to one value or subtract it from the other.
The obvious way to do that is to look up the function that implements
the '+' or '-' operator, and do:
((OPERATOR(+))(a, 4.5)) > b AND (a <= b)
or
((OPERATOR(-))(b, 4.5)) < a AND (a <= b);
The problem outlined by Tom in prior discussion about this is that
PostgreSQL tries really hard not to assume that particular operator
names mean particular things. Rather than "knowing" that "+" is always
"an operator that adds two values together; is transitive, symmetric and
reflexive", PostgreSQL requires that you define an *operator class* that
names the operator that has those properties.
Or at least, it does for less-than, less-than-or-equals, equals,
greater-than-or-equals, greater-than, and not-equals as part of the
b-tree operator class, which *usually* defines these operators as < <= =
>= > <>, but you could use any operator names you wanted if you really
liked.
Right now (as far as I know) there's no operator class that lets you
identify operators for addition and subtraction in a similar way. So
it's necessary to either add such an operator class (in which case
support has to be added for it for every type), extend the existing
b-tree operator class to provide the info, or blindly assume that "+"
and "-" are always addition and subtraction.
For an example of why such assumptions are a bad idea, consider matrix
multiplication. Normally, "a * b" = "b * a", but this isn't true for
multiplication of matrices. Similarly, if someone defined a "+" operator
as an alias for string concatenation (||), we'd be totally wrong to
assume we could use that for doing range-offset windowing.
So. Yeah. Operator classes required, unless we're going to change the
rules and make certain operator names "special" in PostgreSQL, so that
if you implement them they *must* have certain properties. This seems
like a pretty poor reason to add such a big change.
I hope this explanation (a) is actually correct and (b) is helpful.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Thanks Craig! That definitely does help. I probably still have some questions but I think I will read through the rest of the code before asking. Thanks again!
Ian
> Craig Ringer
> Friday, June 21, 2013 8:41 PM> ian link
>
> On 06/22/2013 03:30 AM, ian link wrote:
>>
>> Forgive my ignorance, but I don't entirely understand the problem. What
>> does '+' and '-' refer to exactly?
>
> Consider "RANGE 4.5 PRECEDING'.
>
> You need to be able to test whether, for the current row 'b', any given
> row 'a' is within the range (b - 4.5) < a <= b . Not 100% sure about the
> < vs <= boundaries, but that's irrelevant for the example.
>
> To test that, you have to be able to do two things: you have to be able
> to test whether one value is greater than another, and you have to be
> able to add or subtract a constant from one of the values.
>
> Right now, the b-tree access method provides information on the ordering
> operators < <= = > >= <> , which provides half the answer. But these
> don't give any concept of *distance* - you can test ordinality but not
> cardinality.
>
> To implement the "different by 4.5" part, you have to be able to add 4.5
> to one value or subtract it from the other.
>
> The obvious way to do that is to look up the function that implements
> the '+' or '-' operator, and do:
>
> ((OPERATOR(+))(a, 4.5)) > b AND (a <= b)
>
> or
>
> ((OPERATOR(-))(b, 4.5)) < a AND (a <= b);
>
> The problem outlined by Tom in prior discussion about this is that
> PostgreSQL tries really hard not to assume that particular operator
> names mean particular things. Rather than "knowing" that "+" is always
> "an operator that adds two values together; is transitive, symmetric and
> reflexive", PostgreSQL requires that you define an *operator class* that
> names the operator that has those properties.
>
> Or at least, it does for less-than, less-than-or-equals, equals,
> greater-than-or-equals, greater-than, and not-equals as part of the
> b-tree operator class, which *usually* defines these operators as < <= =
>>
>> = > <>, but you could use any operator names you wanted if you really
>
> liked.
>
> Right now (as far as I know) there's no operator class that lets you
> identify operators for addition and subtraction in a similar way. So
> it's necessary to either add such an operator class (in which case
> support has to be added for it for every type), extend the existing
> b-tree operator class to provide the info, or blindly assume that "+"
> and "-" are always addition and subtraction.
>
> For an example of why such assumptions are a bad idea, consider matrix
> multiplication. Normally, "a * b" = "b * a", but this isn't true for
> multiplication of matrices. Similarly, if someone defined a "+" operator
> as an alias for string concatenation (||), we'd be totally wrong to
> assume we could use that for doing range-offset windowing.
>
> So. Yeah. Operator classes required, unless we're going to change the
> rules and make certain operator names "special" in PostgreSQL, so that
> if you implement them they *must* have certain properties. This seems
> like a pretty poor reason to add such a big change.
>
> I hope this explanation (a) is actually correct and (b) is helpful.
>
> Friday, June 21, 2013 12:30 PM> Thanks!
> Forgive my ignorance, but I don't entirely understand the problem. What does '+' and '-' refer to exactly?
>
>
>
> Hitoshi Harada
> Friday, June 21, 2013 4:35 AM
>
>
>On 06/22/2013 03:30 AM, ian link wrote:
> Forgive my ignorance, but I don't entirely understand the problem. What
> does '+' and '-' refer to exactly?
Consider "RANGE 4.5 PRECEDING'.
You need to be able to test whether, for the current row 'b', any given
row 'a' is within the range (b - 4.5) < a <= b . Not 100% sure about the
< vs <= boundaries, but that's irrelevant for the example.
To test that, you have to be able to do two things: you have to be able
to test whether one value is greater than another, and you have to be
able to add or subtract a constant from one of the values.
Right now, the b-tree access method provides information on the ordering
operators < <= = > >= <> , which provides half the answer. But these
don't give any concept of *distance* - you can test ordinality but not
cardinality.
To implement the "different by 4.5" part, you have to be able to add 4.5
to one value or subtract it from the other.
The obvious way to do that is to look up the function that implements
the '+' or '-' operator, and do:
((OPERATOR(+))(a, 4.5)) > b AND (a <= b)
or
((OPERATOR(-))(b, 4.5)) < a AND (a <= b);
The problem outlined by Tom in prior discussion about this is that
PostgreSQL tries really hard not to assume that particular operator
names mean particular things. Rather than "knowing" that "+" is always
"an operator that adds two values together; is transitive, symmetric and
reflexive", PostgreSQL requires that you define an *operator class* that
names the operator that has those properties.
Or at least, it does for less-than, less-than-or-equals, equals,
greater-than-or-equals, greater-than, and not-equals as part of the
b-tree operator class, which *usually* defines these operators as < <= =
>= > <>, but you could use any operator names you wanted if you really
liked.
Right now (as far as I know) there's no operator class that lets you
identify operators for addition and subtraction in a similar way. So
it's necessary to either add such an operator class (in which case
support has to be added for it for every type), extend the existing
b-tree operator class to provide the info, or blindly assume that "+"
and "-" are always addition and subtraction.
For an example of why such assumptions are a bad idea, consider matrix
multiplication. Normally, "a * b" = "b * a", but this isn't true for
multiplication of matrices. Similarly, if someone defined a "+" operator
as an alias for string concatenation (||), we'd be totally wrong to
assume we could use that for doing range-offset windowing.
So. Yeah. Operator classes required, unless we're going to change the
rules and make certain operator names "special" in PostgreSQL, so that
if you implement them they *must* have certain properties. This seems
like a pretty poor reason to add such a big change.
I hope this explanation (a) is actually correct and (b) is helpful.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 06/30/2013 08:54 PM, ian link wrote: > I found some time and I think I am up to speed now. I finally figured out > how to add new operator strategies and made a little test operator for > myself. > > It seems pretty clear that assuming '+' and '-' are addition and > subtraction is a bad idea. I don't think it would be too tricky to add > support for new operator strategies. Andrew Gierth suggested calling these > new strategies "offset -" and "offset +", which I think describes it pretty > well. I assigned the operator itself to be "@+" and "@-" but that can > obviously be changed. If this sounds like a good path to you guys, I will > go ahead and implement the operators for the appropriate types. Please let > me know if I am misunderstanding something - I am still figuring stuff out > :) > > Aside from the opclass stuff, there were some other important issues > mentioned with the original RANGE support. I think I will address those > after the opclass stuff is done. Are these things you plan to get done this week, or for next CommitFest? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
<div dir="ltr">Definitely not this week. Hopefully for next commit fest.</div><div class="gmail_extra"><br /><br /><div class="gmail_quote">OnSun, Jun 30, 2013 at 9:56 PM, Josh Berkus <span dir="ltr"><<a href="mailto:josh@agliodbs.com" target="_blank">josh@agliodbs.com</a>></span>wrote:<br /><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px#ccc solid;padding-left:1ex"><div class="im">On 06/30/2013 08:54 PM, ian link wrote:<br /> > I foundsome time and I think I am up to speed now. I finally figured out<br /> > how to add new operator strategies andmade a little test operator for<br /> > myself.<br /> ><br /> > It seems pretty clear that assuming '+' and '-'are addition and<br /> > subtraction is a bad idea. I don't think it would be too tricky to add<br /> > supportfor new operator strategies. Andrew Gierth suggested calling these<br /> > new strategies "offset -" and "offset+", which I think describes it pretty<br /> > well. I assigned the operator itself to be "@+" and "@-" but thatcan<br /> > obviously be changed. If this sounds like a good path to you guys, I will<br /> > go ahead and implementthe operators for the appropriate types. Please let<br /> > me know if I am misunderstanding something - I amstill figuring stuff out<br /> > :)<br /> ><br /> > Aside from the opclass stuff, there were some other importantissues<br /> > mentioned with the original RANGE support. I think I will address those<br /> > after the opclassstuff is done.<br /><br /></div>Are these things you plan to get done this week, or for next CommitFest?<br /><spanclass="HOEnZb"><font color="#888888"><br /> --<br /> Josh Berkus<br /> PostgreSQL Experts Inc.<br /><a href="http://pgexperts.com"target="_blank">http://pgexperts.com</a><br /></font></span></blockquote></div><br /></div>
On Sun, Jun 30, 2013 at 11:54 PM, ian link <ian@ilink.io> wrote: > I found some time and I think I am up to speed now. I finally figured out > how to add new operator strategies and made a little test operator for > myself. > > It seems pretty clear that assuming '+' and '-' are addition and subtraction > is a bad idea. I don't think it would be too tricky to add support for new > operator strategies. Andrew Gierth suggested calling these new strategies > "offset -" and "offset +", which I think describes it pretty well. I > assigned the operator itself to be "@+" and "@-" but that can obviously be > changed. If this sounds like a good path to you guys, I will go ahead and > implement the operators for the appropriate types. Please let me know if I > am misunderstanding something - I am still figuring stuff out :) I don't think I understand the design you have in mind. I'm actually not clear that it would be all that bad to assume fixed operator names, as we apparently do in a few places despite the existence of operator classes. But if that is bad, then I don't know how using @+ and @- instead helps anything. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas escribió: > On Sun, Jun 30, 2013 at 11:54 PM, ian link <ian@ilink.io> wrote: > > It seems pretty clear that assuming '+' and '-' are addition and subtraction > > is a bad idea. I don't think it would be too tricky to add support for new > > operator strategies. Andrew Gierth suggested calling these new strategies > > "offset -" and "offset +", which I think describes it pretty well. I > > assigned the operator itself to be "@+" and "@-" but that can obviously be > > changed. If this sounds like a good path to you guys, I will go ahead and > > implement the operators for the appropriate types. Please let me know if I > > am misunderstanding something - I am still figuring stuff out :) > > I don't think I understand the design you have in mind. I'm actually > not clear that it would be all that bad to assume fixed operator > names, as we apparently do in a few places despite the existence of > operator classes. But if that is bad, then I don't know how using @+ > and @- instead helps anything. Yeah. Currently, all operator classes are tied to access methods. Since nobody seems to have any great idea about creating an access method that requires addition and subtraction, would it make sense to have operator classes that exist solely to support keeping track of such operators for the various datatypes? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jul 1, 2013 at 3:28 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Robert Haas escribió: >> On Sun, Jun 30, 2013 at 11:54 PM, ian link <ian@ilink.io> wrote: > >> > It seems pretty clear that assuming '+' and '-' are addition and subtraction >> > is a bad idea. I don't think it would be too tricky to add support for new >> > operator strategies. Andrew Gierth suggested calling these new strategies >> > "offset -" and "offset +", which I think describes it pretty well. I >> > assigned the operator itself to be "@+" and "@-" but that can obviously be >> > changed. If this sounds like a good path to you guys, I will go ahead and >> > implement the operators for the appropriate types. Please let me know if I >> > am misunderstanding something - I am still figuring stuff out :) >> >> I don't think I understand the design you have in mind. I'm actually >> not clear that it would be all that bad to assume fixed operator >> names, as we apparently do in a few places despite the existence of >> operator classes. But if that is bad, then I don't know how using @+ >> and @- instead helps anything. > > Yeah. > > Currently, all operator classes are tied to access methods. Since > nobody seems to have any great idea about creating an access method that > requires addition and subtraction, would it make sense to have operator > classes that exist solely to support keeping track of such operators for > the various datatypes? I suppose if we really wanted to do this, it would make more sense to have a new kind of object, maybe CREATE TYPE INTERFACE, rather than shoehorning it into the operator class machinery. It seems like a fairly heavyweight solution, however. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 07/01/2013 12:05 AM, ian link wrote: > Definitely not this week. Hopefully for next commit fest. > OK, marked "Returned with Feedback". It'll be up to you to add it to the next commitfest if you think it's ready by then. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Jul 1, 2013 at 3:28 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >> Currently, all operator classes are tied to access methods. Since >> nobody seems to have any great idea about creating an access method that >> requires addition and subtraction, would it make sense to have operator >> classes that exist solely to support keeping track of such operators for >> the various datatypes? We certainly could envision adding "+" and "-" items to btree opfamilies, with the proviso that they'd have to be optional so as not to break existing extensions that create btree opfamilies. If we went that way, I'd suggest that what we actually add to the definition is *not* operators, but functions --- that is, specify that function slots 2 and 3 can contain addition (resp subtraction) functions that are compatible with the family's ordering behavior. Otherwise you'd have the planner trying to match WHERE clauses to the extra operators, which is somewhere between wrong and dangerous. But a function that isn't actually called by the index AM is not going to pose a hazard of being misapplied. Besides, it'd likely be easier not harder for the window-function machinery to work with a function than an operator. > I suppose if we really wanted to do this, it would make more sense to > have a new kind of object, maybe CREATE TYPE INTERFACE, rather than > shoehorning it into the operator class machinery. It seems like a > fairly heavyweight solution, however. Yeah, there's something to be said for not wedging this into the index opclass infrastructure. I'd be happier about building such new infrastructure if we could unify this requirement with Peter's "transforms" feature, which also seems to need more datatype-related knowledge than we have in the catalogs now. regards, tom lane
On 07/02/2013 02:39 AM, Robert Haas wrote: > I'm actually > not clear that it would be all that bad to assume fixed operator > names, as we apparently do in a few places despite the existence of > operator classes. But if that is bad, then I don't know how using @+ > and @- instead helps anything. Personally I'm not clear why it's bad to reserve certain fundamental operators like '+' and '-', requiring that they have particular semantics. Want to use "+" as an alias for || because your Java programmers are used to writing + for string concatenation? Um, don't do that. Existing code would be unaffected since RANGE couldn't ever be used in existing code. At worst, weird user-defined implementations of "+" and "-" would result in bizarre window function behaviour if the operators were unsuitable. Exceeding available memory could certainly be an issue in cases like "+" as concatenation. The main advantage I see of adding opclass entries for this is that it makes it explicit when the operators have semantics suitable for use in range windows. I don't have a strong opinion on whether we should just use "+" and "-" or whether we really need an opclass. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer <craig@2ndquadrant.com> writes: > On 07/02/2013 02:39 AM, Robert Haas wrote: >> I'm actually >> not clear that it would be all that bad to assume fixed operator >> names, as we apparently do in a few places despite the existence of >> operator classes. But if that is bad, then I don't know how using @+ >> and @- instead helps anything. > Personally I'm not clear why it's bad to reserve certain fundamental > operators like '+' and '-', requiring that they have particular semantics. It is bad. It's against project policy, not least because we have assorted *existing* datatypes for which "obvious" operator names like "=" do not have all the properties you might expect. If you need a more concrete example of why that sort of thinking is bad, you might consider the difference between < and ~<~ for type text. If we hard-wired knowledge about operator behavior to operator names, it would be impossible for the system to understand that both of those operators represent sorting-related behaviors. Or to be even more concrete: if we allow RANGE to suppose that there's only one possible definition of "+" for a datatype, we're effectively supposing that there's only one possible sort ordering for that type. Which is already a wrong assumption, and has been since Postgres was still at Berkeley. If you go this way, you won't be able to support both WINDOW ... ORDER BY foo USING < RANGE ... and WINDOW ... ORDER BY foo USING ~<~ RANGE ... because you won't know which addition operator to apply. (And yeah, I'm aware that the SQL standard only expects RANGE to support sort keys that are of numeric, datetime, or interval type. I would hope that we have higher expectations than that. Even if we don't, it's not exactly hard to credit that people might have multiple ideas about how to sort interval values.) There are indeed still some places where we rely on operator names to mean something, but we need to get away from that idea not add more. Ideally, any property the system understands about an operator or function should be explicitly declared through opclass membership or some similar representation. We've made substantial progress in that direction in the last fifteen years. I don't want to reverse that progress in the name of minor expediencies, especially not ones that fail to support flexibility that has been in the system for a couple or three decades already. regards, tom lane
Craig Ringer <craig@2ndquadrant.com> writes:It is bad. It's against project policy, not least because we have
> On 07/02/2013 02:39 AM, Robert Haas wrote:
>> I'm actually
>> not clear that it would be all that bad to assume fixed operator
>> names, as we apparently do in a few places despite the existence of
>> operator classes. But if that is bad, then I don't know how using @+
>> and @- instead helps anything.
> Personally I'm not clear why it's bad to reserve certain fundamental
> operators like '+' and '-', requiring that they have particular semantics.
assorted *existing* datatypes for which "obvious" operator names like
"=" do not have all the properties you might expect.
If you need a more concrete example of why that sort of thinking is
bad, you might consider the difference between < and ~<~ for type text.
If we hard-wired knowledge about operator behavior to operator names,
it would be impossible for the system to understand that both of those
operators represent sorting-related behaviors.
Or to be even more concrete: if we allow RANGE to suppose that there's
only one possible definition of "+" for a datatype, we're effectively
supposing that there's only one possible sort ordering for that type.
Which is already a wrong assumption, and has been since Postgres was
still at Berkeley. If you go this way, you won't be able to support
both WINDOW ... ORDER BY foo USING < RANGE ... and WINDOW ... ORDER BY
foo USING ~<~ RANGE ... because you won't know which addition operator
to apply.
(And yeah, I'm aware that the SQL standard only expects RANGE to support
sort keys that are of numeric, datetime, or interval type. I would hope
that we have higher expectations than that. Even if we don't, it's not
exactly hard to credit that people might have multiple ideas about how
to sort interval values.)
There are indeed still some places where we rely on operator names to
mean something, but we need to get away from that idea not add more.
Ideally, any property the system understands about an operator or
function should be explicitly declared through opclass membership or
some similar representation. We've made substantial progress in that
direction in the last fifteen years. I don't want to reverse that
progress in the name of minor expediencies, especially not ones that
fail to support flexibility that has been in the system for a couple
or three decades already.
regards, tom lane