Re: Parallelized polymorphic aggs, and aggtype vs aggoutputtype - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Parallelized polymorphic aggs, and aggtype vs aggoutputtype |
Date | |
Msg-id | 27247.1466185504@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Parallelized polymorphic aggs, and aggtype vs aggoutputtype (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Parallelized polymorphic aggs, and aggtype vs aggoutputtype
Re: Parallelized polymorphic aggs, and aggtype vs aggoutputtype |
List | pgsql-hackers |
I wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I don't mind if you feel the need to refactor this. > I'm not sure yet. What I think we need to work out first is exactly > how polymorphic parallel aggregates ought to identify which concrete > data types they are using. There were well-defined rules before, > but how do we adapt those to two levels of aggregate evaluation? After reviewing things a bit, it seems like the first design-level issue there is what about polymorphic combine functions. So far as the type system is concerned, there's no issue: the declared signature is always going to be "combine(foo, foo) returns foo" and it doesn't matter whether foo is an ordinary type, a polymorphic one, or INTERNAL. The other question is whether the function itself knows what it's operating on in the current invocation. For regular polymorphic types this seems no different from any other usage. If the transtype is INTERNAL, then the type system provides no help; but we could reasonably assume that the internal representation itself contains as much information as the combine func needs. We could add extra arguments like the "finalfunc_extra" option does for the finalfunc, but I don't really see a need --- that hack is mainly to satisfy the type system that the finalfunc's signature is sane, not to tell the finalfunc something it has no other way to find out. So I think we're probably OK as far as the combine function is concerned. The situation is much direr as far as serialize/deserialize are concerned. These basically don't work at all for polymorphic transtypes: if you try to declare "deserialize(bytea) returns anyarray", the type system won't let you. Perhaps that's not an issue because you shouldn't really need serialize/deserialize for anything except INTERNAL transtype. However, there's a far bigger problem which is that "deserialize(bytea) returns internal" is a blatant security hole, which I see you ignored the defense against in opr_sanity.sql. The claim in the comment there that it's okay if we check for aggregate context is a joke --- or haven't you heard that we allow users to create aggregates? That's not nearly good enough to prevent unsafe usage of such functions. Not to mention that CREATE FUNCTION won't allow creation of such functions, so extensions are locked out of using this feature. This has to be redesigned or else reverted entirely. I'm not going to take no for an answer. A possible solution is to give deserialize an extra dummy argument, along the lines of "deserialize(bytea, internal) returns internal", thereby ensuring it can't be called in any non-system-originated contexts. This is still rather dangerous if the other argument is variable, as somebody might be able to abuse an internal-taking function by naming it as the deserialize function for a maliciously-designed aggregate. What I'm inclined to do to lock it down further is to drop the "serialtype" argument to CREATE AGGREGATE, which seems rather pointless (what else would you ever use besides bytea?). Instead, insist that serialize/deserialize apply *only* when the transtype is INTERNAL, and their signatures are exactly "serialize(internal) returns bytea" and "deserialize(bytea, internal) returns internal", never anything else. A different way of locking things down, which might be cleaner in the long run, is to invent a new pseudo-type for the sole purpose of being the serialization type, that is we'd insist on the signatures being "serialize(internal) returns serialized_internal" and "deserialize(serialized_internal) returns internal", where serialized_internal has the representation properties of bytea but is usable for no other purpose than this. Not sure it's worth the trouble though. Anyway, setting aside the security angle, it doesn't seem like there is anything wrong with the high-level design for polymorphic cases. I'm now thinking the problem I saw is just a garden variety implementation bug (or bugs). I'm still not very happy with the confusion around aggtype though, not least because I suspect it contributed directly to this bug. I also feel that both the code comments and the user-facing documentation for this feature are well below project standards, eg where's the discussion in section 35.10? regards, tom lane
pgsql-hackers by date: