Thread: Does writing new records while massive update will generate lock ?
Hi everybody,
I added a datetime column to a table with 51.10^6 entries.
I'm updating this column (for more than 48 hours now) on a RAID5 server.
This transaction is still running and will end in several days. It only uses 1 core.

My question is : Can I add new records in the table or will it generate locks ?
I am using postgresql 8.4
Thanks for your help !
Regards,
Victor
I added a datetime column to a table with 51.10^6 entries.
ALTER TABLE MYBIGTABLE ADD COLUMN date timestamp without time zone;
I'm updating this column (for more than 48 hours now) on a RAID5 server.
UPDATE MYBIGTABLE SET date = (SELECT date FROM INDEXEDTABLE WHERE INDEXEDTABLE.email_id=MYBIGTABLE.email_id) WHERE date is null;
This transaction is still running and will end in several days. It only uses 1 core.

My question is : Can I add new records in the table or will it generate locks ?
I am using postgresql 8.4
Thanks for your help !
Regards,
Victor
Ce message et les éventuels documents joints peuvent contenir des informations confidentielles. Au cas où il ne vous serait pas destiné, nous vous remercions de bien vouloir le supprimer et en aviser immédiatement l'expéditeur. Toute utilisation de ce message non conforme à sa destination, toute diffusion ou publication, totale ou partielle et quel qu'en soit le moyen est formellement interdite.Les communications sur internet n'étant pas sécurisées, l'intégrité de ce message n'est pas assurée et la société émettrice ne peut être tenue pour responsable de son contenu. |
Attachment
On 08/21/2014 08:41 AM, Victor d'Agostino wrote: > UPDATE MYBIGTABLE SET date = (SELECT date FROM INDEXEDTABLE WHERE > INDEXEDTABLE.email_id=MYBIGTABLE.email_id) WHERE date is null; I may be wrong here, but wouldn't this style of query force a nested loop? Over several million rows, that would take an extremely long time. You might want to try this syntax instead: UPDATE MYBIGTABLE big SET date = idx.date FROM INDEXEDTABLE idx WHERE idx.email_id = big.email_id AND big.date IS NULL; > This transaction is still running and will end in several days. It only > uses 1 core. That's not your problem. I suspect if you checked your RAID IO, you'd see 100% IO utilization because instead of a sequence scan, it's performing a random seek for every update. > My question is : Can I add new records in the table or will it generate > locks ? Your update statement will only lock the rows being updated. You should be able to add new rows, but with the IO consuming your RAID, you'll probably see significant write delays that resemble lock waits. -- Shaun Thomas OptionsHouse, LLC | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 sthomas@optionshouse.com ______________________________________________ See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
On 21/08/2014 14:41, Victor d'Agostino wrote: > Hi everybody, > > I added a datetime column to a table with 51.10^6 entries. > > ALTER TABLE MYBIGTABLE ADD COLUMN date timestamp without time zone; > > I'm updating this column (for more than 48 hours now) on a RAID5 > server. > > UPDATE MYBIGTABLE SET date = (SELECT date FROM INDEXEDTABLE WHERE > INDEXEDTABLE.email_id=MYBIGTABLE.email_id) WHERE date is null; > > This transaction is still running and will end in several days. It > only uses 1 core. > > My question is : Can I add new records in the table or will it > generate locks ? > > I am using postgresql *8.4* Not that this helps your issue, but you may not be aware that 8.4 is now end-of-life and so is no longer supported: http://www.postgresql.org/support/versioning Ray. -- Raymond O'Donnell :: Galway :: Ireland rod@iol.ie
On 21 August 2014 15:41, Victor d'Agostino <victor.d.agostino@fiducial.net> wrote:
I'm updating this column (for more than 48 hours now) on a RAID5 server.
RAID5? That's probably the worst performing RAID configuration you can have and is usually advised against on this list.
You would be better off with RAID 10, RAID 1+0 or even RAID 1, but you would be using more disk space.
You would be better off with RAID 10, RAID 1+0 or even RAID 1, but you would be using more disk space.
That said, if your IO is not being saturated by that query, you could split the update across multiple CPU's by dividing up the email_id's over multiple queries that you run from a session each.
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.