Thread: pg_wal fills up on big update query
Hi all,
I have a migration where I
· Add a new nullable column to a table
· Update almost every row in this big table (8 million rows) from another table, setting this new column
I also have a replication setup running.
The database is around 20 GB in size.
While the migration is running, the database more than doubles in size and fills up all available disk space.
Then the migration fails and is rolled back.
What is the best way of keeping this from happening?
My current idea is to lock both tables completely from access (the queried and the updated one) so that PostgreSQL does not have to ensure isolation for concurrent queries by keeping a copy of each row.
Is my thinking here correct?
Thanks in advance and Best Regards,
Daniel
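[For context, the WAL volume such an update generates can be measured while it runs, using the standard admin functions available since PostgreSQL 10; `pg_wal_lsn_diff` returns bytes, and the LSN literal below is only an illustrative value:]

```sql
-- Snapshot the current WAL insert position before starting the update:
SELECT pg_current_wal_lsn();   -- suppose it returns '2A/5F000000'

-- ...run the UPDATE in another session, then measure how much WAL was written:
SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '2A/5F000000') AS wal_bytes_written;
```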
Do the update in small chunks.
On Wed, Aug 7, 2019 at 3:34 PM Daniel Fink (PDF) <daniel.fink@pdf.com> wrote:
> My current idea is to lock both tables completely from access (the queried and the updated one) so that PostgreSQL does not have to ensure isolation for concurrent queries by keeping a copy of each row.

I'm not sure that locking will prevent the snapshotting and the WAL machinery, but someone more expert in the area could clarify this.
Since the column is nullable, I would add it outside of the transaction, and then do the update. If that still fails, I would try to split the update into small chunks (after all, it's an update, so it is something for which you can line up the data).

Luca
Hi Rob,
Thanks, I will try.
It’s a bit of a bummer though, because I just started to use flywaydb to manage migrations, and it wraps all migrations into a single transaction.
So I have to do this outside of the tool.
Best Regards,
Daniel
From: Rob Sargent [mailto:robjsargent@gmail.com]
Sent: Wednesday, August 7, 2019 4:22 PM
To: Daniel Fink (PDF) <daniel.fink@pdf.com>
Cc: pgsql-general@lists.postgresql.org
Subject: Re: pg_wal fills up on big update query
Do the update in small chunks
Hi Rob,

Thanks, I will try. It's a bit of a bummer though, because I just started to use flywaydb to manage migrations, and it wraps all migrations into a single transaction. So I have to do this outside of the tool.
I will still try to evaluate if locking has any effect, to have a better understanding of how postgres works under the hood.

Best Regards,
Daniel

-----Original Message-----
From: Luca Ferrari [mailto:fluca1978@gmail.com]
Sent: Friday, August 9, 2019 3:58 PM
To: Daniel Fink (PDF) <daniel.fink@pdf.com>
Cc: pgsql-general <pgsql-general@lists.postgresql.org>
Subject: Re: pg_wal fills up on big update query

> If that still fails, I would try to split the update into small chunks.
Flyway is an excellent tool; I use it too. Just make a series of migrations.
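[One hypothetical way to split the work using Flyway's versioned-migration naming convention (`V<version>__<description>.sql`), so each batch runs and commits as its own migration; all table, column, and file names here are made up:]

```sql
-- V2__add_new_col.sql  (schema change only, fast: nullable column, no rewrite)
ALTER TABLE big_table ADD COLUMN new_col bigint;

-- V3__backfill_new_col_part1.sql  (first half of the rows)
UPDATE big_table b
SET    new_col = s.val
FROM   source_table s
WHERE  s.id = b.id AND b.id < 4000000;

-- V4__backfill_new_col_part2.sql  (remaining rows)
UPDATE big_table b
SET    new_col = s.val
FROM   source_table s
WHERE  s.id = b.id AND b.id >= 4000000;
```

[Since Flyway commits each versioned migration separately, the WAL produced by one backfill step can be checkpointed away before the next one starts.]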