Thread: explicit cursor vs. FOR loop in PL/pgSQL
I need to process a large table a few "chunks" at a time, committing in between chunks so that another process can pick up and start processing the data.
I am using a PL/pgSQL procedure with a "FOR rec IN SELECT * FROM tab ORDER BY ..." statement. The chunksize is passed to the procedure, and in the FOR loop I iterate until I reach chunksize. The procedure then returns, and the calling code issues the commit, etc.
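Roughly, the shape of what I have now is something like this (the function, table, and column names here are just placeholders, not the real ones):

CREATE OR REPLACE FUNCTION process_chunk(chunksize integer) RETURNS integer AS $$
DECLARE
    rec  record;
    done integer := 0;
BEGIN
    FOR rec IN SELECT * FROM work_tab ORDER BY seq_id LOOP
        -- do the real per-row work here, then delete the processed row
        DELETE FROM work_tab WHERE seq_id = rec.seq_id;
        done := done + 1;
        -- in reality the chunk boundary isn't one row per unit; counting rows
        -- here just keeps the sketch simple
        EXIT WHEN done >= chunksize;
    END LOOP;
    RETURN done;   -- the caller commits after this returns
END;
$$ LANGUAGE plpgsql;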
I know from the documentation that the FOR implicitly opens a cursor, but I'm wondering if there would be any performance advantages to explicitly declaring a cursor and moving through it with FETCH commands?
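The explicit-cursor version I have in mind would look roughly like this (same placeholder names as above):

CREATE OR REPLACE FUNCTION process_chunk_cursor(chunksize integer) RETURNS integer AS $$
DECLARE
    cur  CURSOR FOR SELECT * FROM work_tab ORDER BY seq_id;
    rec  record;
    done integer := 0;
BEGIN
    OPEN cur;
    LOOP
        FETCH cur INTO rec;
        EXIT WHEN NOT FOUND;              -- no more rows
        -- same per-row work as in the FOR version
        DELETE FROM work_tab WHERE seq_id = rec.seq_id;
        done := done + 1;
        EXIT WHEN done >= chunksize;
    END LOOP;
    CLOSE cur;
    RETURN done;
END;
$$ LANGUAGE plpgsql;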
I have to use the ORDER BY, so I imagine I'm taking the hit of processing all the records in the table anyway, regardless of how many I ultimately fetch. The nature of the data is that chunksize doesn't necessarily match up one-for-one with rows, so I can't use it as a LIMIT value.
The table in question gets inserted into pretty heavily, and my procedure processes rows and then deletes the ones it has processed. My main concern is to keep the processing fairly smooth, i.e., not have it choke on the select when the table gets huge.
Any suggestions appreciated!
- DAP
----------------------------------------------------------------------------------
David Parker
Tazz Networks
(401) 709-5130
"David Parker" <dparker@tazznetworks.com> writes: > I know from the documentation that the FOR implicitly opens a cursor, > but I'm wondering if there would be any performance advantages to > explicitly declaring a cursor and moving through it with FETCH commands? AFAICS it'd be exactly the same. Might as well stick with the simpler notation. > I have to use the ORDER BY, so I imagine I'm taking the hit of > processing all the records in the table anyway, regardless of how many I > ultimately fetch. Not if the ORDER BY can be implemented using an index. Perhaps what you need is to make sure that an indexscan gets used. > The nature of the data is that chunksize doesn't necessarily match up > one-for-one with rows, so I can't use it as a LIMIT value. Can you set an upper bound on how many rows you need? If you can put a LIMIT into the select, it'll encourage the planner to use an indexscan, even if you break out of the loop before the limit is reached. regards, tom lane
Thanks for the info. I've got an index, so I guess it's as good as it
gets! The data is actually copied over from the Slony transaction log
table, and there's no way to know how many statements (= rows) there
might be for any given transaction, so assigning an arbitrary limit
seems too risky, and I don't want to take the hit of a select count.

Thanks again.

- DAP

>> The nature of the data is that chunksize doesn't necessarily match up
>> one-for-one with rows, so I can't use it as a LIMIT value.
>
> Can you set an upper bound on how many rows you need? If you can put a
> LIMIT into the select, it'll encourage the planner to use an indexscan,
> even if you break out of the loop before the limit is reached.
>
>			regards, tom lane