Thread: Very slow UPDATE statement on 40 million rows
Hi!
I'm new to this mailing list and new to PostgreSQL as well. This concerns our own backup software (Java); we switched the database from MySQL to PostgreSQL and need some help.
The backup database records every file on the server. On my testing platform there are about 40 million rows, and the database has grown to about 70 GB (the same database was about 11 GB with MySQL, but that is another issue).
Before each backup job, a reset is performed to bring all files into a consistent state:
UPDATE BackupFiles
   SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0,
       cStatus='NEW'::StatusT, bOnSetBlue=false,
       bOnSetYellow=false, nLastBackupTS='0001-01-01 00:00:00'
 WHERE cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE';
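(For readers unfamiliar with StatusT: it is a custom enum type, hence the explicit cast in the SET clause. Its definition is not shown anywhere in this thread; a plausible sketch, assuming only the three status values that appear above, would be:

CREATE TYPE StatusT AS ENUM ('NEW', 'WRITING', 'ONTAPE');  -- assumed value list; the real type may have more values

)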
Explain analyze: http://explain.depesz.com/s/8y5
The statement takes 60-90 minutes. In MySQL the same statement takes about 2 minutes on the same HW.
I have tried to optimize the settings, but so far without success.
Can we optimize this update statement somehow? Do you have any other ideas?
Any help is appreciated! Thank you!
My current config:
shared_buffers = 2GB
work_mem = 16MB
wal_buffers = 16MB
checkpoint_segments = 45
random_page_cost = 2.0
effective_cache_size = 6GB
HW:
2x Intel E5405 @ 2.00GHz
8 GB RAM
3ware 9650SE-16ML RAID Controller, all caches enabled
DB is on a RAID6 with 14x 1TB (since the DB partition on the RAID1 was too small)
SW:
Debian Squeeze 6.0 with Kernel 3.5.4
Postgres 8.4.13 (standard Debian package)
Table:
+---------------+-----------------------------+---------------------------------------------------------------------+
| Column | Type | Modifiers |
+---------------+-----------------------------+---------------------------------------------------------------------+
| _rowid | bigint | not null default nextval('backupfiles__rowid_seq'::regclass) |
| cfilename | bytea | not null |
| nfilesize | bigint | not null default 0::bigint |
| nfilectimets | timestamp without time zone | not null default '1970-01-01 00:00:00'::timestamp without time zone |
| ntapenr | integer | not null default 0 |
| nafiocounter | bigint | not null default 0::bigint |
| nblockcounter | bigint | not null default 0::bigint |
| cstatus | statust | not null default 'NEW'::statust |
| bonsetblue | boolean | default false |
| bonsetyellow | boolean | default false |
| nlastbackupts | timestamp without time zone | not null default '1970-01-01 00:00:00'::timestamp without time zone |
+---------------+-----------------------------+---------------------------------------------------------------------+
Indexes:
"backupfiles_pkey" PRIMARY KEY, btree (_rowid)
"cfilename_index" btree (cfilename)
"cstatus_index" btree (cstatus)
"nfilectimets_index" btree (nfilectimets)
"ntapenr_index" btree (ntapenr)
Example row:
+--------+-----------------------------+-----------+---------------------+---------+--------------+---------------+---------+------------+--------------+---------------------+
| _rowid | cfilename | nfilesize | nfilectimets | ntapenr | nafiocounter | nblockcounter | cstatus | bonsetblue | bonsetyellow | nlastbackupts |
+--------+-----------------------------+-----------+---------------------+---------+--------------+---------------+---------+------------+--------------+---------------------+
| 1 | /dicom/log/datadir_init.log | 1790 | 2013-01-30 14:02:48 | 0 | 0 | 0 | NEW | f | f | 0001-01-01 00:00:00 |
+--------+-----------------------------+-----------+---------------------+---------+--------------+---------------+---------+------------+--------------+---------------------+
--
Best regards,
Florian Schröck
aycan Digitalsysteme GmbH, Würzburg
Florian Schröck <fschroeck@aycan.de> wrote:

> UPDATE BackupFiles
>    SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0,
>        cStatus='NEW'::StatusT, bOnSetBlue=false,
>        bOnSetYellow=false, nLastBackupTS='0001-01-01 00:00:00'
>  WHERE cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE';
>
> Explain analyze: http://explain.depesz.com/s/8y5
> The statement takes 60-90 minutes.

The EXPLAIN ANALYZE at that URL shows a runtime of 3 minutes and 41 seconds.

> I tried to optimize the settings but until now without success.
>
> Can we optimize this update statement somehow? Do you have any
> other ideas?

Are there any rows which would already have the values that you are setting? If so, it would be faster to skip those by using this query:

UPDATE BackupFiles
   SET nTapeNr=0, nAFIOCounter=0, nBlockCounter=0,
       cStatus='NEW'::StatusT, bOnSetBlue=false,
       bOnSetYellow=false, nLastBackupTS='0001-01-01 00:00:00'
 WHERE (cStatus='NEW' OR cStatus='WRITING' OR cStatus='ONTAPE')
   AND (nTapeNr <> 0 OR nAFIOCounter <> 0 OR nBlockCounter <> 0
        OR cStatus <> 'NEW'::StatusT
        OR bOnSetBlue IS DISTINCT FROM false
        OR bOnSetYellow IS DISTINCT FROM false
        OR nLastBackupTS <> '0001-01-01 00:00:00');

Another way to accomplish this is with the suppress_redundant_updates_trigger() trigger function:

http://www.postgresql.org/docs/9.2/interactive/functions-trigger.html

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
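[For reference, wiring up that trigger would look roughly like this, a minimal sketch following the linked documentation; the trigger name is arbitrary, and the docs suggest a name that sorts last so it fires after other per-row triggers:

CREATE TRIGGER z_min_update
BEFORE UPDATE ON BackupFiles
FOR EACH ROW
EXECUTE PROCEDURE suppress_redundant_updates_trigger();

With this in place, an UPDATE silently skips the heap write for any row whose new values are identical to the old ones, at the cost of invoking the trigger for every matched row.]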
Hello Kevin,

Not updating the rows that don't need the update solved the problem! Your query took only 1 minute. :)

Thank you so much for the fast response, have a great weekend!

PS: When you switch to "TEXT" on the explain URL you can see the final runtime, which was 66 minutes with the original statement.

Best regards,
Florian

On 02/15/2013 03:59 PM, Kevin Grittner wrote:
> [snip]
On Fri, Feb 15, 2013 at 9:32 AM, Florian Schröck <fschroeck@aycan.de> wrote:
> Hello Kevin,
> not updating the rows that don't need the update solved the problem!
> Your query took only 1 minute. :)
> [snip]

If the number of rows you actually update is not very large relative to the size of the table, just for fun, try this:

CREATE OR REPLACE FUNCTION BackupFilesCandidateReset(BackupFiles)
RETURNS BOOL AS
$$
  SELECT ($1).cStatus IN ('NEW', 'WRITING', 'ONTAPE')
    AND (($1).nTapeNr, ($1).nAFIOCounter, ($1).nBlockCounter,
         ($1).cStatus, ($1).bOnSetBlue, ($1).bOnSetYellow,
         ($1).nLastBackupTS)
        IS DISTINCT FROM /* simple != will suffice if values are never null */
        (0, 0, 0, 'NEW'::StatusT, false, false, '0001-01-01 00:00:00');
$$ LANGUAGE SQL IMMUTABLE;

CREATE INDEX ON BackupFiles(BackupFilesCandidateReset(BackupFiles))
  WHERE BackupFilesCandidateReset(BackupFiles);

SELECT * FROM BackupFiles WHERE BackupFilesCandidateReset(BackupFiles);

UPDATE BackupFiles SET ... WHERE BackupFilesCandidateReset(BackupFiles);

etc.

The idea here is to maintain a partial boolean index representing the candidate records to update. Plus, it's nifty: this is the basic mechanism that can be used as the foundation for very fast push/pull queues.

merlin
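[To see whether the planner actually picks up such a partial index, one could run something like the following, a sketch assuming the function and index above have been created and the table has been ANALYZEd:

EXPLAIN
SELECT count(*)
  FROM BackupFiles
 WHERE BackupFilesCandidateReset(BackupFiles);

An Index Scan or Bitmap Index Scan on the partial index, rather than a Seq Scan, confirms that the rewritten SELECT and UPDATE will touch only candidate rows.]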
Hello Merlin,
Thanks for the feedback. I forwarded this to my developer; it's an interesting approach.
On 02/19/2013 09:04 PM, Merlin Moncure wrote:
> [snip]
--
Best regards,
Florian Schröck