Re: About Custom Aggregates, C Extensions and Memory - Mailing list pgsql-hackers

From	Marthin Laubscher
Subject	Re: About Custom Aggregates, C Extensions and Memory
Date	August 18 00:11:06
Msg-id	F51BFCE0-A3B3-4469-A464-D6C8E11CE168@lobeshare.co.za
In response to	Re: About Custom Aggregates, C Extensions and Memory (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Hm. We do not have in-memory tables, although in some cases a temporary table is close enough.

Yay, I didn't somehow overlook them.

> But there is one other pre-existing mechanism that might help you: "expanded objects". The idea there is that you
> have some "flat" representation of your data type that can go into a table at need, but you also have an in-memory
> representation that is better suited to computation, and most of your complicated operations prefer to work on the
> expanded representation. The name of the game then becomes how to minimize the number of times a value gets flattened
> and expanded as you push it around in a computation.
>
> As of v18, we have a pretty decent story for that when it comes to values you are manipulating within pl/pgsql
> functions, although less so if you need to use other languages. (The expanded-object functionality exists before v18,
> but it's hard for user-defined types to avoid excess flattening in earlier versions.)
>
> If that sounds promising, see...

I don't yet command the internal variety needed to engage with the complexities of Expanded Objects, which I thought
formed part of the TOAST domain I was trying not to get too caught up in just yet.

I suspect that putting emphasis on the opacity of the aggregate value made it unclear that I'd like values of my UDT
to look a lot like ordinary variable-length binary data values, for which the <, > and = operations apply with no
decoding required. I only need the sticky decoding for more involved operations like aggregation, various set-type
operations (including the simplistic adding or removing of a scalar value rather than a set of size one), and testing
set membership of scalar index values with whichever of IN or ANY is more appropriate. I implemented the first
trivial/naïve attempts in plpgsql, but that environment really wasn't conducive to getting the real functions written.
I once spoke C better than English (not a joke), so I'm pretty confident I can and should do it there.
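
To make "comparison with no decoding required" concrete, here is a minimal sketch in plain C, assuming the encoded
value is just a byte string plus a length. The names encoded_cmp, encoded_lt and encoded_eq are hypothetical and are
not PostgreSQL functions; a real UDT would wrap something like this in its operator support functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: order two encoded values by their raw bytes.
 * Returns <0, 0 or >0 in the manner of memcmp. When one value is a
 * prefix of the other, the shorter one sorts first.
 */
int
encoded_cmp(const unsigned char *a, size_t alen,
            const unsigned char *b, size_t blen)
{
    size_t n = (alen < blen) ? alen : blen;
    int    c;

    /* Fail early: a NULL pointer is only acceptable for an empty value. */
    assert(a != NULL || alen == 0);
    assert(b != NULL || blen == 0);

    c = (n > 0) ? memcmp(a, b, n) : 0;
    if (c != 0)
        return c;
    if (alen == blen)
        return 0;
    return (alen < blen) ? -1 : 1;
}

/* The <, = operators then reduce to one-liners over the raw bytes. */
int
encoded_lt(const unsigned char *a, size_t alen,
           const unsigned char *b, size_t blen)
{
    return encoded_cmp(a, alen, b, blen) < 0;
}

int
encoded_eq(const unsigned char *a, size_t alen,
           const unsigned char *b, size_t blen)
{
    return encoded_cmp(a, alen, b, blen) == 0;
}
```

This only gives a meaningful order if the encoding is designed so that byte order matches the intended value order,
which is an assumption about the UDT's format rather than something the comparison itself can guarantee.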

Another, who shall not be named, suggested I take a look at the different memory contexts, so I had a look. If I
understood correctly, the User Defined Type itself does store something analogous to the aggcontext in an aggregate, but
in the input and output functions, as well as other operators and functions on the UDT, I could define an internal
memory structure, switch to a memory context with a suitable lifespan, allocate and manage memory using the appropriate
functions along the lines StringInfo does, and remember to switch back to the original memory context that each
function gets called with. There's even a suggestion that UDT-specific custom memory contexts may be created in the
TopMemoryContext using AllocSetContextCreate with some name or identifier. I'd have to figure out what to use for such
an identifier. Maybe the UDT instance already has a database or tuple ID that can be adopted for that purpose. It looks
like I'd have to find a way to broker between "normal function memory contexts" and per-tuple aggregate memory
contexts, but there is language in the comments around that about what is and isn't kosher that I don't know how to
interpret.
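
The switch, allocate, switch-back pattern described above can be modelled in a few lines of plain C. This is a toy
sketch only: every toy_* name is hypothetical, nothing here is PostgreSQL's API, and real memory contexts are
hierarchical and do a good deal more. The one detail borrowed from the real thing is that the switch function returns
the previously current context, as MemoryContextSwitchTo does, so the caller can restore it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Chunk header; the union keeps the payload suitably aligned. */
typedef union ToyChunk {
    union ToyChunk *next;
    max_align_t     align;
} ToyChunk;

/* Toy context: owns a chain of chunks that are all released together. */
typedef struct ToyContext {
    const char *name;
    ToyChunk   *chunks;
    size_t      nchunks;
} ToyContext;

static ToyContext *toy_current = NULL;

ToyContext *
toy_context_create(const char *name)
{
    ToyContext *cx = malloc(sizeof *cx);

    assert(cx != NULL);
    cx->name = name;
    cx->chunks = NULL;
    cx->nchunks = 0;
    return cx;
}

/* Make cx current and hand back the previous context. */
ToyContext *
toy_switch_to(ToyContext *cx)
{
    ToyContext *old = toy_current;

    toy_current = cx;
    return old;
}

/* Allocate from whichever context is current. */
void *
toy_alloc(size_t size)
{
    ToyChunk *c;

    assert(toy_current != NULL);
    c = malloc(sizeof *c + size);
    assert(c != NULL);
    c->next = toy_current->chunks;
    toy_current->chunks = c;
    toy_current->nchunks++;
    return c + 1;
}

/* Release everything the context owns in one go. */
void
toy_context_reset(ToyContext *cx)
{
    while (cx->chunks != NULL)
    {
        ToyChunk *next = cx->chunks->next;

        free(cx->chunks);
        cx->chunks = next;
    }
    cx->nchunks = 0;
}

/* Hypothetical "decode": build state in a longer-lived context,
 * then restore the context the caller was running in. */
int *
toy_decode_into(ToyContext *longlived, int value)
{
    ToyContext *old = toy_switch_to(longlived);
    int        *decoded = toy_alloc(sizeof *decoded);

    *decoded = value;
    toy_switch_to(old);
    return decoded;
}
```

In this model, resetting the short-lived context leaves the decoded value untouched, and the value disappears only
when the longer-lived context is reset. Which real context has the right lifespan is exactly the open question.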

I don't particularly trust the source. I've paraphrased the above but not all the terms mentioned could be found in the
GitHub code base. So it could be utter bollocks, outdated information, never meant for public consumption, or entirely
accurate and useful.

Does hearing me mention such things cause you nightmares, either for the perils I'd be facing or the mess it is likely
to cause in your beloved database, or could it be the start of a plan that just might work?

Essentially the aggregate functions would still be front and centre as defined for the user defined type, and though
the user defined type itself would be largely unaware of it, all the individual functions that manipulate values of the
UDT would go through the same process: get access to the value in decoded form if it already exists, and call the
decoding routines only if it doesn't. If I choose the right memory context, would that simply age out when the session,
transaction, query or aggregate is done, or what else would know we're done with the memory so we can let go of it?
As for transitioning between the per-tuple aggregate context and the normal function context, a plan can be devised,
if the transition points can be detected, to copy stuff across in memory. It should be possible, e.g., to do that in a
final function, even without disturbing the value on the aggregate side of things, as advised before.

You're forgiven for thinking I am crazy to consider manually manipulating memory contexts simpler than the high-level
support functions created to toast and detoast values at the plpgsql level of abstraction. I'm a big fan of failing
early and hard when things (besides user input) aren't as expected, and also a big fan of explicit programming, i.e. as
few magical side-effects as possible.

The SuiteSparse:GraphBLASTS stuff that sparked the conversation you referred to is very likely to play a role in my
project somewhere down the line. They don't know the first thing about what I'm doing and why, I might not always see
eye to eye with all the players from the domains where their work has found fertile grounds, and I'm far from ready for
such complications, but it seems inevitable that our paths will cross at some point. Thanks for the reference, however
unintentional.

Regards,
Marthin Laubscher
