Thread: Postgres delays function returning large set of data
Hello,

First some background: for my application I need to collect and retrieve data at a very fast rate, faster than Postgres can manage, but I also need to store and access a large range of metadata about the large data sets I am storing. I am therefore attempting to use Postgres for the metadata, in combination with HDF5 files for the large data sets, so that I can store data fast while keeping all of the power of a relational database over the metadata.

Now onto the problem: to combine the two sets of data I have created a C function, dynamically loaded into Postgres, which takes a primary key index into the data and returns a large table (around 30000 rows) of the data (value and timestamp). When I access the data directly, just reading the data out takes around 30 ms over my test case. Since the Postgres C function is called many times and I must convert the raw data to tuples, the added overhead brings the test case to around 2000 ms, which is still acceptable. My problem is that when I perform an actual query over my test case, Postgres takes around 22000 ms to retrieve the sets of data. My concern is: what is Postgres doing in these extra 20 seconds, and how can I make this run faster?

In summary:
- I have a large test case.
- I am running a C function from Postgres which takes a primary key and returns a setof (int4, int4), with around 30000 tuples returned per call.
- My C function, when timed over the duration of the test case, takes 2 seconds to run.
- The Postgres query "select * from function(args);" takes around 11 times that long to execute.

My function is defined as:

create or replace function(IN args, IN..., OUT name1 int4, OUT name2 int4)
returns setof record as 'libname.so', 'fcnname' LANGUAGE C IMMUTABLE STRICT;

I'm sorry if this is the wrong forum; I was torn between -hackers and -performance. Thank you for any assistance.
-- View this message in context: http://www.nabble.com/Postgres-delays-function-returning-large-set-of-data-tp23853886p23853886.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Hello

Can you send the source code? There are two types of C SRF functions: one returns rows one by one, the other stores all the returned rows and returns them as a block. Which did you use?

Ignore the comments (they are in Czech) and look at the code:
http://www.postgres.cz/index.php/Iter%C3%A1tor_pole

regards
Pavel Stehule

2009/6/3 flippo00110001 <ddevec@umich.edu>:
> [snip: full text of the original post quoted]
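For readers following the thread: the "row by row" style Pavel mentions is the value-per-call SRF pattern from funcapi.h. A minimal illustrative sketch (not the poster's code; it only builds as part of a Postgres extension module, and the function name is made up):

```
#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"

PG_FUNCTION_INFO_V1(rowbyrow_example);

/*
 * Value-per-call SRF sketch: the executor calls this function once per
 * output row; per-call state lives in the FuncCallContext.
 */
Datum
rowbyrow_example(PG_FUNCTION_ARGS)
{
    FuncCallContext *funcctx;

    if (SRF_IS_FIRSTCALL())
    {
        MemoryContext oldcontext;

        funcctx = SRF_FIRSTCALL_INIT();
        /* allocate per-call state in a context that survives between calls */
        oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
        funcctx->max_calls = 10;    /* number of rows to return */
        MemoryContextSwitchTo(oldcontext);
    }

    funcctx = SRF_PERCALL_SETUP();

    if (funcctx->call_cntr < funcctx->max_calls)
        SRF_RETURN_NEXT(funcctx, Int32GetDatum((int32) funcctx->call_cntr));
    else
        SRF_RETURN_DONE(funcctx);
}
```

The block style instead builds a Tuplestorestate up front, sets `rsinfo->returnMode = SFRM_Materialize`, and hands the whole store back in one call, as in the code posted later in this thread.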
Pavel Stehule wrote:
>
> Hello
>
> can you send source code? There are two types of C SRF functions. One
> returns row by row, second store returned rows and returns block.
> What did you use?
>

I had originally used the style that returns row by row, but after reading the page I created a new function which returns a block. This version runs slightly faster (12.5 seconds to run my test case), but it is still far slower than expected.

A few notes on the code: the getdata function returns an array of length 2*size; the first size elements are one column, the next size elements are the other column. I have timed the call to getdata and determined it consumes on average around 30 ms of my test case's run time.

<code>
PG_FUNCTION_INFO_V1(getTableFastHDF5);

Datum
getTableFastHDF5(PG_FUNCTION_ARGS)
{
    ReturnSetInfo   *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
    FuncCallContext *fcc;
    TupleDesc        tupdesc;
    Tuplestorestate *tupstore;
    MemoryContext    per_query_ctx;
    MemoryContext    oldcontext;
    AttInMetadata   *attinmeta;
    int             *data;
    int              size;
    int              i;

    if (!connections)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("No connections open, use openHDF5 to open a file first")));

    data = getdata(textToStr(PG_GETARG_TEXT_P(0)),
                   PG_GETARG_INT32(1), PG_GETARG_INT32(2),
                   &size, TEMPORARY_CONTEXT);

    per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
    oldcontext = MemoryContextSwitchTo(per_query_ctx);

    tupdesc = rsinfo->expectedDesc;
    tupstore = tuplestore_begin_heap(true, false, SortMem);

    if (data == NULL)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("Could not find specified data in file")));

    for (i = 0; i < size; i++)
    {
        Datum     val[2];
        bool      nulls[2];
        HeapTuple tup;

        MemoryContextSwitchTo(oldcontext);
        MemSet(nulls, false, 2 * sizeof(bool));

        /* fill values to be turned into a tuple */
        val[0] = Int32GetDatum(data[i]);
        val[1] = Int32GetDatum(data[i + size]);

        /* make tuple */
        tup = heap_form_tuple(tupdesc, val, nulls);

        /* store the tuple so it can be returned */
        MemoryContextSwitchTo(per_query_ctx);
        tuplestore_puttuple(tupstore, tup);
    }

    /* finish storing and hand the tuplestore back */
    tuplestore_donestoring(tupstore);
    MemoryContextSwitchTo(oldcontext);

    rsinfo->returnMode = SFRM_Materialize;
    rsinfo->setResult = tupstore;
    rsinfo->setDesc = tupdesc;

    return (Datum) 0;
}
</code>
flippo00110001 <ddevec@umich.edu> writes:
> tupstore = tuplestore_begin_heap(true, false, SortMem);

SortMem? What are you testing, 7.4?

But anyway, the question I was going to ask is whether you made sure the data set you're trying to return fits in memory. If the tuplestore spills to disk then it's certainly going to be a lot slower than the function itself is.

I note that you seem to be leaking copies of all the data in the inner loop --- it'd probably be advisable to do heap_freetuple(tup) to avoid bloating executor memory with still a third copy of the dataset.

It might also be advisable to rethink your coding boundaries: if getdata were to return a prefilled tuplestore instead of an array that has to be converted into a tuplestore, you would at least save memory space and perhaps some time too.

			regards, tom lane
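Sketching Tom's first suggestion against the inner loop posted upthread (same variable names assumed; an untested illustration, not a verified patch):

```
for (i = 0; i < size; i++)
{
    Datum     val[2];
    bool      nulls[2] = {false, false};
    HeapTuple tup;

    val[0] = Int32GetDatum(data[i]);
    val[1] = Int32GetDatum(data[i + size]);

    tup = heap_form_tuple(tupdesc, val, nulls);
    tuplestore_puttuple(tupstore, tup);

    /* tuplestore_puttuple() copies the tuple into the tuplestore's own
     * memory, so free our working copy now instead of leaking it until
     * end of query */
    heap_freetuple(tup);
}
```

On releases after 7.4 the third argument to tuplestore_begin_heap is normally work_mem rather than SortMem; once the materialized result exceeds that limit the tuplestore spills to disk, which would account for the kind of slowdown described here.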