Re: About Custom Aggregates, C Extensions and Memory - Mailing list pgsql-hackers
From | Marthin Laubscher |
---|---|
Subject | Re: About Custom Aggregates, C Extensions and Memory |
Date | |
Msg-id | F51BFCE0-A3B3-4469-A464-D6C8E11CE168@lobeshare.co.za Whole thread Raw |
In response to | Re: About Custom Aggregates, C Extensions and Memory (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Tom Lane tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us> wrote: > Hm. We do not have in-memory tables, although in some cases a temporary table is close enough. Yay, I didn't somehow overlook them. > But there is one other pre-existing mechanism that might help you: "expanded objects". The idea there is that you havesome "flat" representation of your data type that can go into a table at need, but you also have an in-memory representationthat is better suited to computation, and most of your complicated operations prefer to work on the expandedrepresentation. The name of the game then becomes how to minimize the number of times a value gets flattened andexpanded as you push it around in a computation. > > As of v18, we have a pretty decent story for that when it comes to values you are manipulating within pl/pgsql functions,although less so if you need to use other languages. (The expanded-object functionality exists before v18, butit's hard for user-defined types to avoid excess flattening in earlier versions.) > > If that sounds promising, see... I don't yet command the internal variety to engage with the complexities of Expanded Objects, which I thought formed partof the TOAST domain I was trying not to get too caught up in just yet. I suspect that putting emphasis on the opacity of the aggregate value made it unclear that the I'd like for values of myUDT to look a lot like ordinary variable length binary data values for which the <, > and = operations apply with no decodingrequired. I only need the sticky decoding for more involved operations like aggregation, various set type operationsincluding the simplistic adding or removing a scalar value rather than a set of size one, and testing set membershipof scalars index values with whichever is more appropriate between IN or ANY. I implemented the first trivial/ naïve attempts in plpgsql but that environment really wasn't conducive to get the real functions written, but Ionce spoke C better than English (not a joke) so I'm pretty confident I can and should do it there. Another, who shall not be named, suggested I take a look at the different memory contexts so I had a look. If I understoodcorrectly, the User Defined Type itself does store something analogous to the aggcontext in an aggregate, but inthe input and output functions as well as other operators and functions on the UDT I could define an internal memory structure,switch to a memory context with a suitable lifespan, allocate and manage memory using the appropriate functionsalong the lines StringInfo does and remembering to switch back to the original memory context the each functiongets called with. There's even a suggestion that UDT specific custom memory contexts may be created in the TopMemoryContextusing AllocSetContextCreate with some name or identifier. I'd have to figure out what to use for such identifier.Maybe there UDT instance already has a database or tuple ID that can be adopted for that purpose. It looks likeI'd have to find a way to broker between "normal function memory contexts" and per tuple aggregate memory contexts butthere are language in the comments around that about what is and isn't kosher that I don't know how to interpret.. I don't particularly trust the source. I've paraphrased the above but not all the terms mentioned could be found in the GitHubcode base. So it could be utter bollocks, outdated information, never meant for public consumption, or entirely accurateand useful. Does hearing me mention such things cause you nightmares, either for the perils I'd be facing or the mess it is likely tocause in your beloved database, or could it be the start of a plan that just might work? Essentially the aggregate functions would still be front and centre as defined for the user defined type, and though theuser defined type itself would be largely unaware of it, all the individual functions that manipulate values of the UDTwould go through the same process of getting access to the value in decoded for if it already exist before calling thedecoding routines if it doesn't. If I choose the right memory context, would that simply age-out when the session, transaction,query or aggregate is done, or how what else would know we're done with the memory so we can let go of it? Asfor transitioning between per tuple aggregate context and normal function context a plan can be devised if the transitionpoints can be detected, to copy stuff across in memory. Should be possible e.g. to do that in a final function,even without disturbing the value on the aggregate side of things as advised before. You're forgiven for thinking I am crazy to consider manually manipulating memory contexts simpler than the high-level supportfunctions created to toast and detoast values at the plpgsql level of abstraction. I'm a big fan of failing earlyand hard when things (besides user input) isn't as expected, and also a big fan of explicit programming, i.e. as littlemagical side-effects as possible. The SuiteSparse:GraphBLASTS stuff that sparked the conversation you referred to is very likely to play a role in my projectsomewhere down the line. They don't know the first thing about what I'm doing and why, I might not always see eyeto eye with all the players from the domains where their work has found fertile grounds, and I'm far from ready for suchcomplications, but it seems inevitable that our paths will cross at some point. Thanks for the reference, however unintentional. Regards, Marthin Laubscher
pgsql-hackers by date: