Thread: Inline non-SQL SRFs using SupportRequestSimplify
Hi Hackers, Here is a proof-of-concept patch to inline set-returning functions (SRFs) besides those written in SQL. We already try to inline SQL-language functions,[1] but that means you must have a static SQL query. There is no way to get an inline-able query by dynamically building the sql in, say, plpgsql. We also have a SupportRequestSimplify request type for functions that use SUPPORT to declare a support function, and it can replace the FuncExpr with an arbitrary nodetree.[2] I think this was intended for constant-substitution, but we can also use it to let functions generate dynamic SQL and then inline it. In this patch, if a SRF replaces itself with a Query node, then inline_set_returning_function will use that. So far there are no tests or docs; I'm hoping to hear feedback on the idea before going further. Here is my concrete use-case: I wrote a function to do a temporal semijoin,[3] and I want it to be inlined. There is a support function that builds the same SQL and lets Postgres parse it into a Query.[4] (In practice I would rewrite the main function in C too, so it could share the SQL-building code there, but this is just a POC.) If you build and install that extension on its `inlined` branch,[5] then you can do this: ``` \i bench.sql explain select * from temporal_semijoin('employees', 'id', 'valid_at', 'positions', 'employee_id', 'valid_at') j(id bigint, valid_at daterange); explain select * from temporal_semijoin('employees', 'id', 'valid_at', 'positions', 'employee_id', 'valid_at') j(id bigint, valid_at daterange) where j.id = 10::bigint; ``` Without this patch, you get `ERROR: unrecognized node type: 58`. But with this patch you get these plans: ``` postgres=# explain select * from temporal_semijoin('employees', 'id', 'valid_at', 'positions', 'employee_id', 'valid_at') j(id bigint, valid_at daterange); QUERY PLAN ---------------------------------------------------------------------------------------------- ProjectSet (cost=4918.47..6177.06 rows=22300 width=40) -> Hash Join (cost=4918.47..6062.77 rows=223 width=53) Hash Cond: (employees.id = j.employee_id) Join Filter: (employees.valid_at && j.valid_at) -> Seq Scan on employees (cost=0.00..1027.39 rows=44539 width=21) -> Hash (cost=4799.15..4799.15 rows=9545 width=40) -> Subquery Scan on j (cost=4067.61..4799.15 rows=9545 width=40) -> HashAggregate (cost=4067.61..4703.70 rows=9545 width=40) Group Key: positions.employee_id Planned Partitions: 16 -> Seq Scan on positions (cost=0.00..897.99 rows=44099 width=21) (11 rows) postgres=# explain select * from temporal_semijoin('employees', 'id', 'valid_at', 'positions', 'employee_id', 'valid_at') j(id bigint, valid_at daterange) where j.id = 10::bigint; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- ProjectSet (cost=0.56..9.22 rows=100 width=40) -> Nested Loop (cost=0.56..8.71 rows=1 width=53) -> GroupAggregate (cost=0.28..4.39 rows=1 width=40) -> Index Only Scan using idx_positions_on_employee_id on positions (cost=0.28..4.36 rows=5 width=21) Index Cond: (employee_id = '10'::bigint) -> Index Only Scan using employees_pkey on employees (cost=0.28..4.30 rows=1 width=21) Index Cond: ((id = '10'::bigint) AND (valid_at && (range_agg(positions.valid_at)))) (7 rows) ``` In particular I'm excited to see in the second plan that the predicate gets pushed into the subquery. If it seems good to let people use SupportRequestSimplify to make their SRFs be inlineable, I'm happy to add tests and docs. We should really document the idea of inlined functions in general, so I'll do that too. Another approach I considered is using a separate support request, e.g. SupportRequestInlineSRF, and just calling it from inline_set_returning_function. I didn't like having two support requests that did almost exactly the same thing. OTOH my current approach means you'll get an error if you do this: ``` postgres=# select temporal_semijoin('employees', 'id', 'valid_at', 'positions', 'employee_id', 'valid_at'); ERROR: unrecognized node type: 66 ``` I'll look into ways to fix that. I think SupportRequestSimplify is a really cool feature. It is nearly like having macros. I'm dreaming about other ways I can (ab)use it. Just making inline-able SRFs has many applications. From my own client work, I could use this for a big permissions query or a query with complicated pricing logic. The sad part though is that SUPPORT functions must be written in C. That means few people will use them, especially these days when so many are in the cloud. Since they take a Node and return a Node, maybe there is no other way. But I would love to have a different mechanism that receives the function's arguments (evaluated) and returns a string, which we parse as a SQL query and then inline. The arguments would have to be const-reducible to strings, of course. You could specify that function with a new INLINE keyword when you create your target function. That feature would be less powerful, but with broader reach. I'd be glad to hear your thoughts! [1] https://wiki.postgresql.org/wiki/Inlining_of_SQL_functions (I couldn't find any mention in our docs though, so we should add that.) [2] https://www.postgresql.org/docs/current/xfunc-optimization.html [3] https://github.com/pjungwir/temporal_ops/blob/master/temporal_ops--1.0.0.sql [4] https://github.com/pjungwir/temporal_ops/blob/inlined/temporal_ops.c [5] https://github.com/pjungwir/temporal_ops/tree/inlined Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
On 28/06/2024 01:01, Paul Jungwirth wrote: > If it seems good to let people use SupportRequestSimplify to make their SRFs be inlineable, I'm > happy to add tests and docs. We should really document the idea of inlined functions in general, so > I'll do that too. > > Another approach I considered is using a separate support request, e.g. SupportRequestInlineSRF, and > just calling it from inline_set_returning_function. I didn't like having two support requests that > did almost exactly the same thing. OTOH my current approach means you'll get an error if you do this: > > ``` > postgres=# select temporal_semijoin('employees', 'id', 'valid_at', 'positions', 'employee_id', > 'valid_at'); > ERROR: unrecognized node type: 66 > ``` > > I'll look into ways to fix that. If the support function returns a Query, we end up having a FuncExpr with a Query in the tree. A Query isnt an Expr, which is why you get that error, and it seems like a recipe for confusion in general. Perhaps returning a SubLink would be better. I think we should actually add an assertion after the call to the SupportRequestSimplify support function, to check that it returned an Expr node. +1 to the general feature of letting SRFs be simplified by the support function. > I think SupportRequestSimplify is a really cool feature. It is nearly like having macros. > I'm dreaming about other ways I can (ab)use it. :-D -- Heikki Linnakangas Neon (https://neon.tech)
Heikki Linnakangas <hlinnaka@iki.fi> writes: > On 28/06/2024 01:01, Paul Jungwirth wrote: >> Another approach I considered is using a separate support request, e.g. SupportRequestInlineSRF, and >> just calling it from inline_set_returning_function. I didn't like having two support requests that >> did almost exactly the same thing. OTOH my current approach means you'll get an error if you do this: >> >> ``` >> postgres=# select temporal_semijoin('employees', 'id', 'valid_at', 'positions', 'employee_id', >> 'valid_at'); >> ERROR: unrecognized node type: 66 >> ``` >> >> I'll look into ways to fix that. I like this idea, but I like exactly nothing about this implementation. The right thing is to have a separate SupportRequestInlineSRF request that is called directly by inline_set_returning_function. It might be "almost the same thing" as SupportRequestSimplify, but "almost" only counts in horseshoes and hand grenades. In particular, returning a Query node is simply broken for SupportRequestSimplify (as your example demonstrates), whereas it's the only correct result for SupportRequestInlineSRF. You could imagine keeping it to one support request by adding a boolean field to the request struct to show which behavior is wanted, but I think the principal result of that would be to break extensions that weren't expecting such calls. The defined mechanism for extending the SupportRequest protocol is to add new support request codes, not to whack around the APIs of existing ones. > I think we should actually add an assertion after the call to the > SupportRequestSimplify support function, to check that it returned an > Expr node. Um ... IsA(node, Expr) isn't going to work, and I'm not sure that it'd be useful to try to enumerate the set of Expr subtypes that should be allowed there. But possibly it'd be worth asserting that it's not a Query, just in case anyone gets confused about the difference between SupportRequestSimplify and SupportRequestInlineSRF. It would be good to have an in-core test case for this request type, but I don't really see any built-in SRFs for which expansion as a sub-SELECT would be an improvement. regards, tom lane
Paul Jungwirth <pj@illuminatedcomputing.com> writes: > Here are new patches using a new SupportRequestInlineSRF request type. They include patches and > documentation. I took a look through this. I feel like we're still some way away from having something committable. I've got two main complaint areas: 1. It doesn't seem like integrating this into inline_set_returning_function was the right thing after all, or maybe just the way you did it isn't right. That function is pretty opinionated about what it is doing, and a lot of what it is doing doesn't seem appropriate for a support-function-driven substitution. As an example, it rejects WITH ORDINALITY, but who's to say that a support function couldn't handle that? More generally, I'm not sure if it's appropriate to make any tests on the function's properties, rather than assuming the support function knows what it's doing. I see you already hacked up the test on prolang, but the others in the same if-clause seem equally dubious from here. I'm also unsure whether it's our business to reject volatile functions or subplans in the function arguments. (Maybe it is, but not sure.) There is also stuff towards the bottom of the function, particularly check_sql_fn_retval and parameter substitution, that I do not think makes sense to apply to a non-SQL-language function; but if I'm reading this right you run all that code on the support function's result. It does make sense to require there to be just one RangeTblFunction in the RTE, since it's not at all clear how we could combine the results if there's more than one. But I wonder if we should just pass the RTE node to the support function, and let it make its own decision about rte->funcordinality. Or if that seems like a bad idea, pass the RangeTblFunction node. I think it's essential to do one of those things rather than fake up a FuncExpr, because a support function for a function returning RECORD would likely need access to the column definition list to figure out what to do. I notice that in the case of non-SRF function inlining, we handle support-function calling in a totally separate function (simplify_function) rather than try to integrate it into the code that does SQL function inlining (inline_function). Maybe a similar approach should be adopted here. We could have a wrapper function that implements the parts worth sharing, such as looking up the target function's pg_proc entry and doing the permissions check. Or perhaps put that stuff into the sole caller, preprocess_function_rtes. If we do keep this in inline_set_returning_function, we need to pay more than zero attention to updating that function's header comment. 2. The documentation needs to be a great deal more explicit about what the function is supposed to return. It needs to be a SELECT Query node that has been through parse analysis and rewriting. I don't think pointing to a regression test function is adequate, or even appropriate. The test function is a pretty bad example as-is, too. It aggressively disregards the API recommendation in supportnodes.h: * Support functions must return a NULL pointer, not fail, if they do not * recognize the request node type or cannot handle the given case; this * allows for future extensions of the set of request cases. As a more minor nit, I think SupportRequestInlineSRF should include "struct PlannerInfo *root", for the same reasons that SupportRequestSimplify does. > I split things up into three patch files because I couldn't get git to gracefully handle shifting a > large block of code into an if statement. The first two patches have no changes except that > indentation (and initializing one variable to NULL). They aren't meant to be committed separately. A hack I've used in the past is to have the main patch just add + if (...) + { ... + } around the to-be-reindented code, and then apply pgindent as a separate patch step. (We used to just leave it to the committer to run pgindent, but I think nowadays the cfbot will whine at you if you submit not-pgindented code.) I think that's easier to review since the reviewer can mechanically verify the pgindent patch. This problem may be moot for this patch once we detangle the support function call from SQL-function inlining, though. regards, tom lane