Thread: Re: [NOVICE] Functions in C with Ornate Data Structures

Re: [NOVICE] Functions in C with Ornate Data Structures

From
Tom Lane
Date:
"Stephen P. Berry" <spb@meshuggeneh.net> writes:
>> For example, assuming that you are willing to cheat to the extent of
>> assuming sizeof(pointer) = sizeof(integer), try something like this:

> I'd actually thought of doing something like this, but couldn't find
> an actual explicit argument type for pointers[0], and I can't make
> the assumption you describe for portability reasons (my three main
> test platforms are alpha, sparc64, and x86).

Fair enough.  I had actually thought better of that shortly after writing,
so here's how I'd really do it:

Still make the declaration of the state datatype be "integer" at the SQL
level, and say initcond = 0.  (If you don't do this, you have to fight
nodeAgg.c's ideas about what to do with a pass-by-reference datatype,
and it ain't worth the trouble.)  But in the C code, write acquisition
and return of the state value as

    datstruct *ptr = (datstruct *) PG_GETARG_POINTER(0);

    ...

    PG_RETURN_POINTER(ptr);

This relies on the fact that what you are *really* passing and returning
is not an int but a Datum, and Datum is by definition large enough for
pointers.  The only part of the above that's even slightly dubious is
the assumption that a Datum created from an int32 zero will read as a
pointer NULL --- but I am not aware of any platform where a zero bit
pattern doesn't read as a pointer NULL (and lots of pieces of Postgres
would break on such a platform).  You could get around that too by
making the initial state condition be a SQL NULL instead of a zero, but
I don't see the point.  Unless you need to treat NULL input values as
something other than "ignores", you really want to declare the sfunc as
strict, and that gets in the way of using a NULL initcond.
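
A minimal standalone sketch of that round trip, for illustration only.
It does not use the PostgreSQL headers: the Datum typedef, the two
helper functions, and the datstruct fields below are stand-ins assumed
for this example, with the helpers playing the role of
PG_GETARG_POINTER and PG_RETURN_POINTER.  A real transition function
would also allocate with the backend's allocator, in a memory context
that lives as long as the aggregate, rather than with calloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for PostgreSQL's Datum: an integer wide enough for a pointer. */
typedef uintptr_t Datum;

/* Hypothetical aggregate state; the fields are illustrative only. */
typedef struct datstruct {
    long count;
    long sum;
} datstruct;

/* Hypothetical helpers playing the role of the fmgr pointer macros. */
static void *datum_get_pointer(Datum d) { return (void *) d; }
static Datum pointer_get_datum(void *p) { return (Datum) p; }

/*
 * Transition step.  The state arrives as a Datum; the initial condition
 * of zero is read as a NULL pointer, meaning "not allocated yet".
 */
Datum sketch_sfunc(Datum state, long value)
{
    datstruct *ptr;

    /* The whole trick depends on Datum being able to hold a pointer. */
    assert(sizeof(Datum) >= sizeof(void *));

    ptr = (datstruct *) datum_get_pointer(state);
    if (ptr == NULL) {
        ptr = calloc(1, sizeof(*ptr));   /* first call: build the state */
        if (ptr == NULL)
            abort();
    }

    ptr->count += 1;
    ptr->sum += value;

    return pointer_get_datum(ptr);       /* hand the same pointer back */
}
```

Starting from a Datum of 0 and feeding the result of each call into the
next keeps one heap structure alive for the whole aggregation, while the
SQL level only ever sees an opaque integer-sized value.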

> Is there any way to keep `intermediate' data used by user-defined
> functions around indefinitely?  I.e., have some sort of crunch_init()
> function that creates a bunch of in-memory data structures, which
> can then be used by subsequent (and independent) queries?

You can if you can figure out how to find them again.  However, the
only obvious answer to that is to use static variables, which falls
down miserably if someone tries to run two independent instances of
your aggregate in one query.  I'd suggest hewing closely to the external
behavior of standard aggregates --- ie, each one is an independent
calculation.  You can use the above techniques to build an efficient
implementation.  If you instead build something that has an API
involving state that persists across queries, I'm pretty sure you'll
regret it in the long run.
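
To make the static-variable failure concrete, here is a small sketch
contrasting the two designs.  All names are hypothetical, and plain
function calls stand in for the executor invoking two aggregates over
the same rows.

```c
#include <assert.h>
#include <stddef.h>

/* Anti-pattern: one hidden accumulator shared by every caller. */
static long shared_total = 0;

long sum_step_static(long value)
{
    shared_total += value;
    return shared_total;
}

/* Per-instance state: each aggregate owns its own structure. */
typedef struct sum_state {
    long total;
} sum_state;

long sum_step_instance(sum_state *st, long value)
{
    assert(st != NULL);
    st->total += value;
    return st->total;
}
```

If a query computes the aggregate over column a and over column b, the
calls are interleaved row by row.  With rows (1, 10) and (2, 20) the
static version ends up with a single total of 33 that belongs to
neither column, while two sum_state instances finish at 3 and 30.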

> It seems like the general class of thing I'm trying to accomplish
> isn't that esoteric.  Imagine trying to write a function to compute
> the standard deviation of arbitrary precision numbers using the GMP
> library or some such.  Note that I'm not saying that that's what I'm
> trying to do...I'm just offering it as a simple sample problem in
> which one can't pass everything as an argument in an aggregate.  How
> does one set about doing such a thing in Postgres?

I blink not an eye to say that I'd do it exactly as described above.
Stick all the intermediate state into a data structure that's referenced
by a single master pointer, and pass the pointer as the "state value"
of the aggregate.
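
As a rough illustration of the single-master-pointer layout, the sketch
below accumulates a count, a sum, and a sum of squares.  It is an
assumption-laden stand-in: plain doubles replace GMP numbers, the names
are hypothetical, and the final step returns the sample variance (the
standard deviation is its square root) so the block needs no math
library.

```c
#include <assert.h>
#include <stdlib.h>

/* Everything the aggregate needs, reachable from one pointer. */
typedef struct var_state {
    long   n;        /* number of inputs seen */
    double sum;      /* running sum of inputs */
    double sum_sq;   /* running sum of squared inputs */
} var_state;

/* Transition step: a NULL state means this is the first input. */
var_state *var_step(var_state *st, double x)
{
    if (st == NULL) {
        st = calloc(1, sizeof(*st));
        if (st == NULL)
            abort();
    }
    st->n += 1;
    st->sum += x;
    st->sum_sq += x * x;
    return st;
}

/* Final step: compute the sample variance and release the state. */
double var_final(var_state *st)
{
    double n;
    double result;

    assert(st != NULL && st->n > 1);
    n = (double) st->n;
    result = (n * st->sum_sq - st->sum * st->sum) / (n * (n - 1.0));
    free(st);
    return result;
}
```

With arbitrary-precision numbers the structure would hold library
handles instead of doubles, but the shape stays the same: the pointer
is the only thing passed from one call to the next.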

BTW, mlw posted some contrib code on pghackers just a day or two back
that does something similar to this.  He did some details differently
than I would've, notably this INT32-vs-POINTER business; but it's a
working example.

            regards, tom lane

Re: [NOVICE] Functions in C with Ornate Data Structures

From
mlw
Date:
Tom Lane wrote:
> 
> "Stephen P. Berry" <spb@meshuggeneh.net> writes:
> > Is there any way to keep `intermediate' data used by user-defined
> > functions around indefinitely?  I.e., have some sort of crunch_init()
> > function that creates a bunch of in-memory data structures, which
> > can then be used by subsequent (and independent) queries?

I have had to deal with this problem. I implemented a small version of Oracle's
"contains()" API call. The API is something like this:

select score(1), score(2) from table where contains(cola, 'bla bla bal', 1) >0
and contains(colb, 'fubar', 2) > 1;

On the first call I parse the search string and store it in a hash table based
on the number passed to both contains() and score(). The number passed is an
arbitrary bookmark which separates the various result sets for a single query.

The hash table is static data allocated with malloc().  I have been
thinking I should use MemoryContextAlloc with the right context instead,
but malloc seems to work.

On subsequent queries, if the bookmark number is found but the string for
the contains function differs, then I delete the old entry, reparse, and
store the new one.
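
The lookup-or-replace logic described above might look roughly like the
following.  This is a simplified sketch, not the actual contrib code: a
small fixed array stands in for the hash table, a copied string stands
in for the parsed search expression, and every name and the slot limit
are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_BOOKMARKS 16   /* hypothetical limit for this sketch */

typedef struct search_entry {
    int   in_use;
    int   bookmark;      /* number passed to contains() and score() */
    char *query;         /* malloc'd copy; stands in for the parsed form */
    int   parse_count;   /* times this slot has been parsed or reparsed */
} search_entry;

/* Static storage, so entries survive from one call to the next. */
static search_entry bookmark_table[MAX_BOOKMARKS];

static char *copy_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        abort();
    strcpy(copy, s);
    return copy;
}

/* Return the entry for a bookmark, reparsing if the string changed. */
search_entry *lookup_or_parse(int bookmark, const char *query)
{
    search_entry *free_slot = NULL;
    int i;

    assert(query != NULL);
    for (i = 0; i < MAX_BOOKMARKS; i++) {
        search_entry *e = &bookmark_table[i];
        if (e->in_use && e->bookmark == bookmark) {
            if (strcmp(e->query, query) != 0) {
                free(e->query);               /* drop the stale entry */
                e->query = copy_string(query);
                e->parse_count += 1;
            }
            return e;
        }
        if (!e->in_use && free_slot == NULL)
            free_slot = e;
    }
    if (free_slot == NULL)
        return NULL;                          /* table is full */
    free_slot->in_use = 1;
    free_slot->bookmark = bookmark;
    free_slot->query = copy_string(query);
    free_slot->parse_count = 1;
    return free_slot;
}
```

Calling lookup_or_parse(1, "bla bla bal") twice reuses the stored
entry; calling it again with bookmark 1 and a different string replaces
the entry in place.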

> 
> > It seems like the general class of thing I'm trying to accomplish
> > isn't that esoteric.  Imagine trying to write a function to compute
> > the standard deviation of arbitrary precision numbers using the GMP
> > library or some such.  Note that I'm not saying that that's what I'm
> > trying to do...I'm just offering it as a simple sample problem in
> > which one can't pass everything as an argument in an aggregate.  How
> > does one set about doing such a thing in Postgres?
> 
> I blink not an eye to say that I'd do it exactly as described above.
> Stick all the intermediate state into a data structure that's referenced
> by a single master pointer, and pass the pointer as the "state value"
> of the aggregate.
> 
> BTW, mlw posted some contrib code on pghackers just a day or two back
> that does something similar to this.  He did some details differently
> than I would've, notably this INT32-vs-POINTER business; but it's a
> working example.

The sizeof(int32) == sizeof(void *) assumption is a problem, and I am not
happy with it, although I will look into your (Tom's) recommendations.