Thread: explicit cursor vs. FOR loop in PL/pgSQL
I need to process a large table a few "chunks" at a time, committing in between chunks so that another process can pick up and start processing the data.
I am using a PL/pgSQL procedure with a "FOR rec IN SELECT * FROM tab ORDER BY ..." statement. The chunksize is passed to the procedure, and in the FOR loop I iterate until I reach chunksize. The procedure then returns, and the calling code issues the commit, etc.
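Roughly, the shape of what I have now is something like this (the function, table, and column names here are just placeholders, not the real ones):

CREATE OR REPLACE FUNCTION process_chunk(chunksize integer) RETURNS integer AS $$
DECLARE
    rec  record;
    done integer := 0;
BEGIN
    FOR rec IN SELECT * FROM work_tab ORDER BY seq_id LOOP
        -- do the real per-row work here, then delete the processed row
        DELETE FROM work_tab WHERE seq_id = rec.seq_id;
        done := done + 1;
        -- in reality the chunk boundary isn't one row per unit; counting rows
        -- here just keeps the sketch simple
        EXIT WHEN done >= chunksize;
    END LOOP;
    RETURN done;   -- the caller commits after this returns
END;
$$ LANGUAGE plpgsql;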
I know from the documentation that the FOR implicitly opens a cursor, but I'm wondering if there would be any performance advantages to explicitly declaring a cursor and moving through it with FETCH commands?
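The explicit-cursor version I have in mind would look roughly like this (same placeholder names as above):

CREATE OR REPLACE FUNCTION process_chunk_cursor(chunksize integer) RETURNS integer AS $$
DECLARE
    cur  CURSOR FOR SELECT * FROM work_tab ORDER BY seq_id;
    rec  record;
    done integer := 0;
BEGIN
    OPEN cur;
    LOOP
        FETCH cur INTO rec;
        EXIT WHEN NOT FOUND;              -- no more rows
        -- same per-row work as in the FOR version
        DELETE FROM work_tab WHERE seq_id = rec.seq_id;
        done := done + 1;
        EXIT WHEN done >= chunksize;
    END LOOP;
    CLOSE cur;
    RETURN done;
END;
$$ LANGUAGE plpgsql;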
I have to use the ORDER BY, so I imagine I'm taking the hit of processing all the records in the table anyway, regardless of how many I ultimately fetch. The nature of the data is that chunksize doesn't necessarily match up one-for-one with rows, so I can't use it as a LIMIT value.
The table in question gets inserted into pretty heavily, and my procedure processes rows and then deletes the ones it has processed. My main concern is to keep the processing fairly smooth, i.e., not have it choke on the select when the table gets huge.
Any suggestions appreciated!
- DAP
----------------------------------------------------------------------------------
David Parker
Tazz Networks
(401) 709-5130
"David Parker" <dparker@tazznetworks.com> writes: > I know from the documentation that the FOR implicitly opens a cursor, > but I'm wondering if there would be any performance advantages to > explicitly declaring a cursor and moving through it with FETCH commands? AFAICS it'd be exactly the same. Might as well stick with the simpler notation. > I have to use the ORDER BY, so I imagine I'm taking the hit of > processing all the records in the table anyway, regardless of how many I > ultimately fetch. Not if the ORDER BY can be implemented using an index. Perhaps what you need is to make sure that an indexscan gets used. > The nature of the data is that chunksize doesn't necessarily match up > one-for-one with rows, so I can't use it as a LIMIT value. Can you set an upper bound on how many rows you need? If you can put a LIMIT into the select, it'll encourage the planner to use an indexscan, even if you break out of the loop before the limit is reached. regards, tom lane
Thanks for the info. I've got an index, so I guess it's as good as it
gets! The data is actually copied over from the Slony transaction log
table, and there's no way to know how many statements (= rows) there
might be for any given transaction, so assigning an arbitrary limit
seems too risky, and I don't want to take the hit of a select count.

Thanks again.

- DAP

>> The nature of the data is that chunksize doesn't necessarily match up
>> one-for-one with rows, so I can't use it as a LIMIT value.
>
> Can you set an upper bound on how many rows you need? If you can put a
> LIMIT into the select, it'll encourage the planner to use an indexscan,
> even if you break out of the loop before the limit is reached.
>
>			regards, tom lane