Thread: Postgres delays function returning large set of data
Hello,

First some background: for my application I need to collect and retrieve data at a very fast rate, faster than Postgres can manage, but I also need to store and access a large range of metadata about the large data sets I am storing. I am therefore attempting to use Postgres for the metadata, in combination with HDF5 files for the large data sets, so that I can store data fast while keeping all of the power of a relational database over the metadata.

Now onto the problem: to combine the two sets of data I have created a C function, dynamically loaded into Postgres, which takes a primary key index into the data and returns a large table (around 30000 rows) of the data (value and timestamp). When I access the data directly, just reading the data out takes around 30 ms over my test case. Since the Postgres C function is called many times and I must convert the raw data to tuples, the added overhead brings the test case to around 2000 ms, which is still acceptable. My problem is that when I perform an actual query over my test case, Postgres takes around 22000 ms to retrieve the sets of data. My concern is: what is Postgres doing in these extra 20 seconds, and how can I make this run faster?

In summary:
- I have a large test case.
- I am running a C function from Postgres which takes a primary key and returns a setof (int4, int4), with around 30000 tuples returned per call.
- My C function, when timed over the duration of the test case, takes 2 seconds to run.
- The Postgres query "select * from function(args);" takes around 11 times that long to execute.

My function is defined as:

create or replace function(IN args, IN..., OUT name1 int4, OUT name2 int4)
returns setof record as 'libname.so', 'fcnname' LANGUAGE C IMMUTABLE STRICT;

I'm sorry if this is the wrong forum; I was torn between -hackers and -performance. Thank you for any assistance.
-- View this message in context: http://www.nabble.com/Postgres-delays-function-returning-large-set-of-data-tp23853886p23853886.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Hello

Can you send the source code? There are two types of C SRF functions: one returns rows one by one, the other stores all the returned rows and returns them as a block. Which did you use?

Ignore the comments (they are in Czech) and look at the code:
http://www.postgres.cz/index.php/Iter%C3%A1tor_pole

regards
Pavel Stehule

2009/6/3 flippo00110001 <ddevec@umich.edu>:
> [snip: full text of the original post quoted]
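For readers following the thread: the "row by row" style Pavel mentions is the value-per-call SRF pattern from funcapi.h. A minimal illustrative sketch (not the poster's code; it only builds as part of a Postgres extension module, and the function name is made up):

```
#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"

PG_FUNCTION_INFO_V1(rowbyrow_example);

/*
 * Value-per-call SRF sketch: the executor calls this function once per
 * output row; per-call state lives in the FuncCallContext.
 */
Datum
rowbyrow_example(PG_FUNCTION_ARGS)
{
    FuncCallContext *funcctx;

    if (SRF_IS_FIRSTCALL())
    {
        MemoryContext oldcontext;

        funcctx = SRF_FIRSTCALL_INIT();
        /* allocate per-call state in a context that survives between calls */
        oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
        funcctx->max_calls = 10;    /* number of rows to return */
        MemoryContextSwitchTo(oldcontext);
    }

    funcctx = SRF_PERCALL_SETUP();

    if (funcctx->call_cntr < funcctx->max_calls)
        SRF_RETURN_NEXT(funcctx, Int32GetDatum((int32) funcctx->call_cntr));
    else
        SRF_RETURN_DONE(funcctx);
}
```

The block style instead builds a Tuplestorestate up front, sets `rsinfo->returnMode = SFRM_Materialize`, and hands the whole store back in one call, as in the code posted later in this thread.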
Pavel Stehule wrote:
>
> Hello
>
> can you send source code? There are two types of C SRF functions. One
> returns row by row, second store returned rows and returns block.
> What did you use?
>

I had originally used the style that returns row by row, but after reading the page I created a new function which returns a block. This version runs slightly faster (12.5 seconds to run my test case), but it is still far slower than expected.

A few notes on the code: the getdata function returns an array of length 2*size; the first size elements are one column, the next size elements are the other column. I have timed the call to getdata and determined it consumes on average around 30 ms of my test case's run time.

<code>
PG_FUNCTION_INFO_V1(getTableFastHDF5);

Datum
getTableFastHDF5(PG_FUNCTION_ARGS)
{
    ReturnSetInfo   *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
    FuncCallContext *fcc;
    TupleDesc        tupdesc;
    Tuplestorestate *tupstore;
    MemoryContext    per_query_ctx;
    MemoryContext    oldcontext;
    AttInMetadata   *attinmeta;
    int             *data;
    int              size;
    int              i;

    if (!connections)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("No connections open, use openHDF5 to open a file first")));

    data = getdata(textToStr(PG_GETARG_TEXT_P(0)),
                   PG_GETARG_INT32(1), PG_GETARG_INT32(2),
                   &size, TEMPORARY_CONTEXT);

    per_query_ctx = rsinfo->econtext->ecxt_per_query_memory;
    oldcontext = MemoryContextSwitchTo(per_query_ctx);

    tupdesc = rsinfo->expectedDesc;
    tupstore = tuplestore_begin_heap(true, false, SortMem);

    if (data == NULL)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("Could not find specified data in file")));

    for (i = 0; i < size; i++)
    {
        Datum     val[2];
        bool      nulls[2];
        HeapTuple tup;

        MemoryContextSwitchTo(oldcontext);
        MemSet(nulls, false, 2 * sizeof(bool));

        /* fill values to be turned into a tuple */
        val[0] = Int32GetDatum(data[i]);
        val[1] = Int32GetDatum(data[i + size]);

        /* make tuple */
        tup = heap_form_tuple(tupdesc, val, nulls);

        /* store the tuple so it can be returned */
        MemoryContextSwitchTo(per_query_ctx);
        tuplestore_puttuple(tupstore, tup);
    }

    /* finish storing and hand the tuplestore back */
    tuplestore_donestoring(tupstore);
    MemoryContextSwitchTo(oldcontext);

    rsinfo->returnMode = SFRM_Materialize;
    rsinfo->setResult = tupstore;
    rsinfo->setDesc = tupdesc;

    return (Datum) 0;
}
</code>
flippo00110001 <ddevec@umich.edu> writes:
> tupstore = tuplestore_begin_heap(true, false, SortMem);

SortMem? What are you testing, 7.4?

But anyway, the question I was going to ask is whether you made sure the data set you're trying to return fits in memory. If the tuplestore spills to disk then it's certainly going to be a lot slower than the function itself is.

I note that you seem to be leaking copies of all the data in the inner loop --- it'd probably be advisable to do heap_freetuple(tup) to avoid bloating executor memory with still a third copy of the dataset.

It might also be advisable to rethink your coding boundaries: if getdata were to return a prefilled tuplestore instead of an array that has to be converted into a tuplestore, you would at least save memory space and perhaps some time too.

			regards, tom lane
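Sketching Tom's first suggestion against the inner loop posted upthread (same variable names assumed; an untested illustration, not a verified patch):

```
for (i = 0; i < size; i++)
{
    Datum     val[2];
    bool      nulls[2] = {false, false};
    HeapTuple tup;

    val[0] = Int32GetDatum(data[i]);
    val[1] = Int32GetDatum(data[i + size]);

    tup = heap_form_tuple(tupdesc, val, nulls);
    tuplestore_puttuple(tupstore, tup);

    /* tuplestore_puttuple() copies the tuple into the tuplestore's own
     * memory, so free our working copy now instead of leaking it until
     * end of query */
    heap_freetuple(tup);
}
```

On releases after 7.4 the third argument to tuplestore_begin_heap is normally work_mem rather than SortMem; once the materialized result exceeds that limit the tuplestore spills to disk, which would account for the kind of slowdown described here.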