Thread: Vacuum full: alternatives?
From: Job <Job@colliniconsulting.it>
To: "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Monday, June 20, 2016 5:18 AM
Subject: [GENERAL] Vacuum full: alternatives?
Hello,
we have a table with heavy traffic of pg_bulkload loads and deletes of records.
In a single day the size goes, for example, from 1 GB to 4 GB and then back to 1 GB of live data.
We have serious problems with size, and the only way to regain free space is issuing a VACUUM FULL <table>.
But the operation is very slow, sometimes 2 to 4 hours, and the table is not available to services while it is locked.
We do not delete everything at once (in that case TRUNCATE would resolve the problem).
Autovacuum (the same goes for a normal VACUUM) is not able to free the space.
Are there any suggestions, or another way to manage this?
Thank you!
Francesco
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 20.06.2016 at 11:18, Job wrote:
> The autovacuum is not able (same for normal vacuum) to free the spaces.

autovacuum marks the space as free, but doesn't give it back to the OS. I would suggest running only autovacuum; with time you will see the table stop growing. There is no need for VACUUM FULL.

Andreas
Hi Andreas,

> I would suggest running only autovacuum; with time you will see the
> table stop growing. There is no need for VACUUM FULL.

So new records, when pg_bulkloaded, will reuse the "marked-free" space?

Thank you!
Francesco
On Mon, 2016-06-20 at 11:43 +0200, Job wrote:
> Hi Andreas,
>
> >I would suggest run only autovacuum, and with time you will see a not
> >more growing table. There is no need for vacuum full.
>
> So new record, when will be pg_bulkloaded, will replace "marked-free" location?
Yes, but you may have to make the autovacuum process more aggressive. Even
then it might not cope with the frequency of your bulk I/U/D.
There are many tools to use instead of VACUUM FULL though; have a look
at pg_reorg and pgcompact, for example. Do not be afraid to use an
awesome tool called Google as well ;-) (This is about the most basic
problem everyone asks about, so you'll find many more in-depth answers
and articles; the phrase you want to google for is "postgresql bloat".)
Cheers,
R.
CONFIDENTIALITY NOTICE
The information in this email is confidential and is intended solely for the addressee. Access to this email by anyone else is unauthorised. If you are not the intended recipient, you must not read, use or disseminate the information. Any views expressed in this message are those of the individual sender, except where the sender specifically states them to be the views of Net-A-Porter Group Limited.
The Net-A-Porter Group Limited is a company registered in England & Wales Number: 3820604 Registered Office: 1 The Village Offices, Westfield, Ariel Way, London, W12 7GF
On 20.06.2016 at 11:43, Job wrote:
> So new record, when will be pg_bulkloaded, will replace "marked-free" location?

Exactly, that's the task for vacuum.

Andreas
> We do not delete everything at once (in that case TRUNCATE would resolve the problem).

Partitioning the table (based on a date or key field) will allow you to VACUUM FULL only 1 partition at a time.
--
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
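The partitioning suggestion can be sketched as follows. This is a minimal, hypothetical example (table and column names invented; it uses pre-10 inheritance-style partitioning, which is what was available when this thread was written):

```sql
-- Parent table that queries hit; each child holds one day of data.
CREATE TABLE events (id bigint, loaded_on date, payload text);

CREATE TABLE events_2016_06_20 (
    CHECK (loaded_on = DATE '2016-06-20')
) INHERITS (events);

-- pg_bulkload / COPY targets the child table directly.
COPY events_2016_06_20 FROM '/tmp/batch.csv' WITH (FORMAT csv);

-- When the day's data is no longer needed, reclaim the space instantly
-- instead of VACUUM FULL on the whole table:
TRUNCATE events_2016_06_20;
-- or detach and drop the child:
-- ALTER TABLE events_2016_06_20 NO INHERIT events;
-- DROP TABLE events_2016_06_20;
```

TRUNCATE returns the space to the OS immediately and takes its exclusive lock only briefly and only on the one child table.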

But then autovacuum avoids this. Granted, it won't give free space back to the OS, but it won't let the table keep growing either (or am I missing something?).
From: Job <Job@colliniconsulting.it>
To: Rakesh Kumar <rakeshkumar464a3@gmail.com>; "pgsql-general@postgresql.org" <pgsql-general@postgresql.org>
Sent: Monday, June 20, 2016 5:39 AM
Subject: R: [GENERAL] Vacuum full: alternatives?

Hi Rakesh,

if I do not free disk space, after some days the disk can become full. Every day we have a lot of pg_bulkload runs and deletes.

Thank you!
Francesco

From: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] on behalf of Rakesh Kumar [rakeshkumar464a3@gmail.com]
Sent: Monday, June 20, 2016 11:34 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Vacuum full: alternatives?

Any reason why you need the space back? What is wrong with the space remaining constant at 4 GB?
> but it won't let it grow too (or am I missing something).

Yes, you are missing something. By partitioning and VACUUM FULL-ing only the partition whose data is no longer needed, the rest of the data remains available to the users
AND space is reclaimed by the O/S, so it's the best of both worlds.
--
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

---
alter table table_name set (
    autovacuum_enabled = true,
    autovacuum_vacuum_threshold = 5000,
    autovacuum_analyze_threshold = 5000,
    autovacuum_vacuum_scale_factor = 0.1,
    autovacuum_analyze_scale_factor = 0.2
);

link: https://www.postgresql.org/docs/8.3/static/sql-createtable.html
On 20/06/16 at 09:50, Melvin Davidson wrote:
> Yes, you are missing something. By partitioning and VACUUM FULL-ing only the
> partition whose data is no longer needed, the rest of the data remains
> available to the users AND space is reclaimed by the O/S, so it's the
> best of both worlds.

That's not entirely true. Think about a SELECT which has to scan all child tables.

You are also adding another layer of complexity to the system.

--
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 20/06/16 16:23, Martín Marqués wrote:
> That's not entirely true. Think about a SELECT which has to scan all
> child tables.

Or any SELECT on the parent at all. The planner needs to examine the CHECK constraints on the children and can't do it if the child is locked in ACCESS EXCLUSIVE mode.

--
Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr
PostgreSQL : Expertise, Formation et Support
On 20.06.2016 17:30, Vik Fearing wrote:
> Or any SELECT on the parent at all. The planner needs to examine the
> CHECK constraints on the children and can't do it if the child is locked
> in ACCESS EXCLUSIVE mode.

+1

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
2016-06-20 11:30 GMT-03:00 Vik Fearing <vik@2ndquadrant.fr>:
> Or any SELECT on the parent at all. The planner needs to examine the
> CHECK constraints on the children and can't do it if the child is locked
> in ACCESS EXCLUSIVE mode.

Yeah, that totally skipped my mind, so partitioning is actually a bad idea if that's all they are looking to solve.

Thanks Vik for pointing out the oversight.

--
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jun 20, 2016 at 3:13 AM, Andreas Kretschmer <andreas@a-kretschmer.de> wrote:
>> So new record, when will be pg_bulkloaded, will replace "marked-free" location?
>
> exactly, that's the task for vacuum

Are you sure that that is the case with pg_bulkload specifically? It bypasses the shared buffers, so it would not surprise me if it bypasses the free space map as well, and thus always appends its data to the end of the table.

Cheers,
Jeff
On Mon, Jun 20, 2016 at 6:13 AM, Andreas Kretschmer <andreas@a-kretschmer.de> wrote:
>> So new record, when will be pg_bulkloaded, will replace "marked-free" location?
>
> exactly, that's the task for vacuum

I believe that free space is only available to UPDATE, not INSERT.
Martin and Vik,
>...Think about a SELECT which has to scan all child tables.
You are really digging for a corner case.
If a scan has to scan all child tables, then
A. it negates the ability to make partitions which are not used
and
B. The SELECT query is poorly crafted.
--
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

On 20/06/16 at 12:06, Melvin Davidson wrote:
>
> Martin and Vik,
>
>>...Think about a SELECT which has to scan all child tables.
>
> You are really digging for a corner case.
> If a scan has to scan all child tables, then
> A. it negates the ability to make partitions which are not used
> and
> B. The SELECT query is poorly crafted.
And you haven't read Vik's reply. :)
--
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
> And you haven't read Vik's reply. :)

Yes I have. Vacuum will not lock all tables at once, only the ones it is currently working on, so the planner may have a slight delay, but it will not be gigantic.
I have proposed a reasonable solution to solve the problem in its entirety. Do you have a better one?
--
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

On 20/06/16 at 11:52, Jeff Janes wrote:
> Are you sure that that is the case with pg_bulkload specifically? It
> bypasses the shared buffers, so it would not surprise me if it
> bypasses the free space map as well, and thus always appends its data
> to the end of the table.

I didn't do a super intensive check of pg_bulkload, but AFAICS it does batches of COPY with PQputCopyData. If the relation has free space which was reclaimed by vacuum/autovacuum, it will try to use that space and not extend the relation (which is more expensive). This happens if the used space on those pages is lower than the fillfactor set for that table.

IMO, he should start by setting autovacuum more aggressively, or running aggressive vacuums, and see how that works.

Also, install pgstattuple and check free space on the relation to see how many dead tuples and how much free space there is.

Those are my 2 cents.

--
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
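The pgstattuple check mentioned above looks roughly like this (the table name is hypothetical; the extension ships with PostgreSQL's contrib modules):

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- How much of the table is dead tuples vs. reusable free space?
SELECT table_len,
       dead_tuple_count,
       dead_tuple_percent,
       free_space,
       free_percent
FROM pgstattuple('my_bulkloaded_table');
```

A high free_percent with a stable table size means vacuum is keeping up; a high dead_tuple_percent means it is not aggressive enough.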
On 6/20/2016 8:03 AM, Scott Mead wrote:
> I believe that free space is only available to UPDATE, not INSERT.

incorrect. in fact, an update is performed identically to an INSERT + DELETE(old)

-- john r pierce, recycling bits in santa cruz
On 06/20/2016 03:18 AM, Job wrote:
> The autovacuum is not able (same for normal vacuum) to free the spaces.
>
> Are there some suggestions or another way to manage this?

Hi Francesco,

We use pg_repack (http://reorg.github.io/pg_repack/) for a similar workload. It allows what amounts to an online vacuum full. The only caveat is that you need to have the available disk space to fully rebuild the table in parallel.

Hope that helps. Cheers!

- Chris
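For reference, a typical pg_repack invocation looks like the sketch below (the database and table names are hypothetical; the extension must be installed in the target database, and the server needs roughly the table's size in free disk space for the rebuild):

```shell
# one-time setup in the target database
psql -d mydb -c 'CREATE EXTENSION pg_repack;'

# rebuild a single bloated table online; exclusive locks are held only
# briefly at the start and end of the operation
pg_repack --table=my_bulkloaded_table mydb
```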
On Mon, Jun 20, 2016 at 7:23 AM, Martín Marqués <martin@2ndquadrant.com> wrote:
> That's not entirely true. Think about a SELECT which has to scan all
> child tables.

Yes, for the partitioning to be a good option, you would probably have to arrange it such that you can prove that all tuples in a given partition are eligible for deletion (or have already been deleted), and then either truncate or dis-inherit the partition. That still requires a stringent lock, but it is only held for a very short time.

> You are also adding another layer of complexity to the system.

I think that using pg_bulkload adds more complexity to the system than partitioning would. I wonder if they really need to use that, or if they just picked it over COPY because it sounded like a free lunch.

I've just tested pg_bulkload with the default settings, and it definitely isn't using the FSM to re-use freed space in the table. If they use WRITER = BUFFERED it would, though.

Cheers,
Jeff
> incorrect. in fact, an update is performed identically to an INSERT + DELETE(old)

Except for the heap-only-tuple optimization, right? We cannot build a HOT chain if the user requests a delete separately, since there is no longer an association to trace from the old record. I suspect this affects free space usage to some degree as well, but I agree and believe that the reclaimed space is not forbidden to be used (I wouldn't rely on my word, though, and I haven't tried to find the relevant documentation).

yeah, HOT only works on updates that don't modify any indexed fields, and only if there's adequate free space in the same block. If you have an update-intensive table that's a candidate for HOT, I've been recommending setting that table's fill factor to 50-70% prior to populating it, to leave free space in every block.

-- john r pierce, recycling bits in santa cruz
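The fillfactor advice translates to something like this (hypothetical table; a fillfactor of 60 leaves each heap page 40% empty at load time so HOT updates can land in the same block):

```sql
CREATE TABLE hot_candidate (
    id  bigint PRIMARY KEY,
    val text
) WITH (fillfactor = 60);

-- On an existing table the setting only affects pages written afterwards:
ALTER TABLE hot_candidate SET (fillfactor = 60);
```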
On 20/06/16 17:25, Melvin Davidson wrote:
>> And you haven't read Vik's reply. :)
>
> Yes I have. Vacuum will not lock all tables at once, only the ones it is
> currently working on, so the planner may have a slight delay,
> but it will not be gigantic.

I think you should try it.

> I have proposed a reasonable solution to solve the problem in its
> entirety. Do you have a better one?

You mean by partitioning? That doesn't really solve any problem, except
that vacuum full-ing a partition should be faster than doing the whole
enchilada.
Vik,
Your comments make no sense to me.
>Or any SELECT on the parent at all. The planner needs to examine the
>CHECK constraints on the children and can't do it if the child is locked
>in ACCESS EXCLUSIVE mode.
Nowhere in the documentation does it say that the planner needs to take locks
or is even concerned with them. Locks are transient, so they do not figure into
the query plan. If I am wrong, kindly point me to the documentation or URL that
shows the contrary.
>I think you should try it.
Why would I even attempt that? We do not know the PostgreSQL version or O/S as yet.
I do not have any info regarding table structure or any data. I have given a suggestion
that will probably help solve the problem. I am not here to do any actual work.
>That doesn't really solve any problem, except
>that vacfull-ing a partition should be faster than doing the whole
>enchilada.
That is exactly the point, because based on the original problem description, the
data is transient.
--
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

On Mon, Jun 20, 2016 at 3:18 AM, Job <Job@colliniconsulting.it> wrote:
> we have a table with an heavy traffic of pg_bulkload and delete of records.
> The size pass, in only one day, for example for 1Gb to 4Gb and then 1Gb back.
>
> The autovacuum is not able (same for normal vacuum) to free the spaces.
>
> Are there some suggestions or another way to manage this?

First off, from your comments I'm not sure you really get PostgreSQL's way of freeing space and reusing it via autovacuum. Basically, the autovacuum process marks space as free, and the backends write new data (inserts or updates) into that free space. You eventually reach an equilibrium of sorts, where vacuum is freeing up space as quickly as it's being consumed, or faster.

The problem occurs when vacuum can't keep up with your delete, write and update rate combined. If this is happening you need to:

A: Make sure your IO subsystem is fast enough to handle BOTH your update rate AND the vacuuming needed to keep up. You're better off with a machine that can do 15,000 transactions per second running a load of 1,000 than trying to handle it with a machine that can do 1,500 tps, etc. Sizing the hardware is a whole other conversation.

AND

B: Make your autovacuum aggressive enough to NOT fall behind.

It's important to remember that autovacuum was built and designed in a time when most databases lived on spinning media. It's designed not to overload spinning discs with too much random IO. A super fast RAID-10 array from that time period could do 200 to 1,000 transactions per second, and that only with a top-notch RAID controller etc.

Regular spinning discs have a maximum random write rate in the range of 100 ops per second. My laptop, with no pgsql optimizations, can do 850 tps on its SSD. A server with 10 SSDs in RAID-5 can do 15,000 tps. If you have a fast IO subsystem and wish to utilize it with pgsql, you're going to have to examine whether or not autovacuum with default settings is fast enough to keep up. Remember, autovacuum, by design, is slow and tries not to get in the way. It's fast enough for most mundane uses, but can't keep up with a fast machine running hard.

The default settings for autovacuum to look at here are first these two:

autovacuum_vacuum_cost_delay = 20ms
autovacuum_vacuum_cost_limit = 200

They govern how hard autovacuum works. By default autovacuum doesn't work hard. Making it work too hard for a given machine can cause system performance issues, i.e. it "gets in the way". Lowering cost_delay is usually enough. As you approach 1ms, autovacuum starts to use a lot more bandwidth. I find that even on pretty fast machines that are routinely doing 1,000 writes per second or more, 3ms is fast enough to keep up with a cost limit of 200. 5ms is a good compromise without getting too aggressive.

In contrast to autovacuum, REGULAR vacuum, by default, runs at full throttle. It hits your db server hard, performance-wise. It has zero cost delay, so it works very hard. If you run it midday on a hard-working server you will almost certainly see performance drop. The difference between regular vacuum with a delay time of 0 and autovacuum with a delay of 20ms is huge.

These settings become important if you have a LOT of tables or dbs; otherwise they're probably fine:

autovacuum_max_workers = 3   # Adjust this last, unless you have thousands of tables or dbs.
autovacuum_naptime = 1min    # How long to wait before checking the next db. The default is usually fine unless you have a lot of dbs.

The next group of settings tells autovacuum when to kick in.
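As a postgresql.conf sketch, the delay/limit tuning described above might look like this (the values are the illustrative ones from the post, not a recommendation for any particular machine):

```
# Make autovacuum work harder than the defaults
autovacuum_vacuum_cost_delay = 5ms   # default 20ms; lower = more aggressive
autovacuum_vacuum_cost_limit = 200   # default
autovacuum_max_workers = 3           # raise only with thousands of tables/dbs
autovacuum_naptime = 1min
```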
Keeping these low enough to keep autovacuum busy is a good idea too:

autovacuum_vacuum_scale_factor
autovacuum_vacuum_threshold
autovacuum_analyze_scale_factor
autovacuum_analyze_threshold

I tend to go for the thresholds, which are an absolute number of rows changed before autovacuum kicks off. Scale factor can be dangerous because what seems small at the beginning gets big fast. If it's 0.1 then that's 10%. 10% of 1,000 is 100 rows, but 10% of 1,000,000 is 100,000, which is a LOT of rows to ignore until you have more than that needing vacuuming. Setting the threshold to something like 100 or 1,000 will keep a big table from accumulating hundreds of thousands of dead tuples. Either way you need to make sure your autovacuum is aggressive enough to keep up with your db's throughput.

Checking for bloat: you can see what parts of your db are getting too big. First, go here: https://www.keithf4.com/checking-for-postgresql-bloat/

The script there will let you check all your tables AND indexes for bloat. This will let you know if you've got a simple space problem or a vacuuming problem.

Assuming you DO have bloating, one of the first things you can do is rebuild all the indexes except the PK ones, by creating new indexes and then dropping the old ones. On some tables this saves a LOT of space. Since you can "create index concurrently ...", this is a non-blocking operation. You can get a list of indexes for a table with:

select indexdef from pg_indexes where tablename='pgbench_accounts';

Then build "create index concurrently ..." statements for each one.

Assuming the table is bloated and you HAVE to recover space for normal operation (let's say you've got 80MB of data in a 200GB table etc.), I'm going to head in a different direction here: Slony. Slony can subscribe your table, or your whole db (up to you), to a new one, either on a different machine or on the same machine. It's actually pretty easy to set it up and subscribe one table, get it caught up, drop access to the db, swap tables, and bring access to the db back up. The downtime is measured in seconds. If one table one time is all you need, that's fine, but you can also use it to set up a replica of the bloated machine, sans bloat, and switch the whole db operation over to another machine / db.
Hello, very interesting comments and contributions, thank you.

> I've just tested pg_bulkload with the default settings, and it
> definitely isn't using the fsm to re-use freed space in the table. If
> they use WRITER = BUFFERED it would, though.

So with WRITER = BUFFERED it should be slower, but space marked as free would be reused again? Thank you! Francesco

________________________________________
From: Jeff Janes [jeff.janes@gmail.com]
Sent: Monday, June 20, 2016 5:51 PM
To: Martín Marqués
Cc: Melvin Davidson; Rakesh Kumar; Job; pgsql-general@postgresql.org
Subject: Re: R: [GENERAL] Vacuum full: alternatives?

On Mon, Jun 20, 2016 at 7:23 AM, Martín Marqués <martin@2ndquadrant.com> wrote:
> On 20/06/16 at 09:50, Melvin Davidson wrote:
>>
>>> but it won't let it grow too (or am I missing something).
>>
>> Yes, you are missing something. By partitioning and vacuuming full only
>> the table with data no longer needed, the rest of the data remains
>> available to the users AND space is reclaimed by the O/S, so it's the
>> best of both worlds.
>
> That's not entirely true. Think about a SELECT which has to scan all
> child tables.

Yes, for the partitioning to be a good option, you would probably have to arrange it such that you can prove that all tuples in a given partition are eligible for deletion (or have already been deleted), and then either truncate or dis-inherit the partition. That still requires a stringent lock, but it is only held for a very short time.

> You are also adding another layer of complexity to the system.

I think that using pg_bulkload adds more complexity to the system than partitioning would. I wonder if they really need to use it, or if they just picked it over COPY because it sounded like a free lunch.

Cheers, Jeff
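Jeff's truncate-or-dis-inherit idea can be sketched with inheritance-style partitioning (the thread predates declarative partitioning); the table and column names here are illustrative, assuming each bulk load lands in its own date-stamped child table:

```sql
-- One child table per load batch; queries go through the parent
-- and see all children, while pg_bulkload targets a child directly.
CREATE TABLE records (id bigint, loaded_on date, payload text);

CREATE TABLE records_2016_06_20 (
    CHECK (loaded_on = DATE '2016-06-20')
) INHERITS (records);

-- Once every row in the batch is deletable, detach and truncate the
-- child. TRUNCATE needs only a brief ACCESS EXCLUSIVE lock on the
-- child and returns the space to the OS immediately, unlike
-- DELETE followed by VACUUM.
ALTER TABLE records_2016_06_20 NO INHERIT records;
TRUNCATE records_2016_06_20;   -- or simply DROP TABLE records_2016_06_20;
```

The point is that the expensive lock is taken on an already-obsolete child for a fraction of a second, instead of on the live table for hours as with VACUUM FULL.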
Excellent, Scott! Thank you! Francesco

________________________________________
From: Scott Marlowe [scott.marlowe@gmail.com]
Sent: Tuesday, June 21, 2016 2:06 AM
To: Job
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Vacuum full: alternatives?

On Mon, Jun 20, 2016 at 3:18 AM, Job <Job@colliniconsulting.it> wrote:
>
> Hello,
>
> we have a table with heavy traffic of pg_bulkload and deletes of records.
> The size goes, in only one day, from for example 1 GB to 4 GB and then back to 1 GB.
>
> We have serious problems with size, and the only way to regain free space is issuing a VACUUM FULL <table>.
> But the operation is very slow, sometimes 2-4 hours, and the table is not available to services while it is locked.
>
> We do not delete everything at once (in that case TRUNCATE would solve the problem).
>
> Autovacuum is not able (same for a normal VACUUM) to free the space.
>
> Are there some suggestions or another way to manage this?

First off, from your comments I'm not sure you really get PostgreSQL's way of freeing space and reusing it via autovacuum. Basically, the autovacuum process marks space as free, and the backends write new data (inserts or updates) into that free space. You eventually reach an equilibrium of sorts, where vacuum frees up space as quickly as it is consumed, or faster.

The problem occurs when vacuum can't keep up with your combined delete, insert, and update rate. If this is happening you need to:

A: Make sure your IO subsystem is fast enough to handle BOTH your update rate AND the vacuuming needed to keep up. You're better off with a machine that can do 15,000 transactions per second running a load of 1,000 than trying to handle it with a machine that can do 1,500 tps. Sizing the hardware is a whole other conversation.

AND

B: Make your autovacuum aggressive enough to NOT fall behind.

It's important to remember that autovacuum was built and designed in a time when most databases lived on spinning media.
It's designed not to overload spinning discs with too much random IO. A super fast RAID-10 array from that time period could do 200 to 1,000 transactions per second, and that only with a top-notch RAID controller.

Regular spinning discs max out at random write rates in the range of 100 ops per second. My laptop, with no PostgreSQL optimizations, can do 850 tps on its SSD. A server with 10 SSDs in RAID-5 can do 15,000 tps. If you have a fast IO subsystem and wish to utilize it with PostgreSQL, you're going to have to examine whether autovacuum with default settings is fast enough to keep up. Remember, autovacuum, by design, is slow and tries not to get in the way. It's fast enough for most mundane uses, but it can't keep up with a fast machine running hard.

The default settings for autovacuum to look at here are first these two:

autovacuum_vacuum_cost_delay = 20ms
autovacuum_vacuum_cost_limit = 200

They govern how hard autovacuum works. By default it doesn't work hard. Making it work too hard for a given machine can cause system performance issues, i.e. it "gets in the way". Lowering cost_delay is usually enough. As you approach 1ms, autovacuum starts to use a lot more bandwidth. I find that even on pretty fast machines that are routinely doing 1,000 writes per second or more, 3ms is fast enough to keep up with a cost limit of 200; 5ms is a good compromise without getting too aggressive.

In contrast to autovacuum, REGULAR vacuum, by default, runs at full throttle. It has zero cost delay, so it works very hard and hits your db server hard, performance-wise. If you run it midday on a hard-working server you will almost certainly see performance drop. The difference between regular vacuum with a delay of 0 and autovacuum with a delay of 20ms is huge.

These next settings become important if you have a LOT of tables or dbs; otherwise the defaults are probably fine:

autovacuum_max_workers = 3   # Adjust this last, unless you have thousands of tables or dbs.
autovacuum_naptime = 1min    # How long to wait before checking the next db. The default is usually fine unless you have a lot of dbs.

These settings tell autovacuum when to kick in, and keeping them low enough to keep autovacuum busy is a good idea too:

autovacuum_vacuum_scale_factor
autovacuum_vacuum_threshold
autovacuum_analyze_scale_factor
autovacuum_analyze_threshold

I tend to go for threshold, which is an absolute number of rows changed before autovacuum kicks off. Scale factor can be dangerous, because what seems small at the beginning gets big fast. If it's 0.1, that's 10%: 10% of 1,000 is 100 rows, but 10% of 1,000,000 is 100,000, which is a LOT of rows to ignore before vacuuming starts. Setting the threshold to something like 100 or 1,000 will keep a big table from accumulating hundreds of thousands of dead tuples. Either way, you need to make sure your autovacuum is aggressive enough to keep up with your db's throughput.

Checking for bloat: you can see which parts of your db are getting too big. First, go here: https://www.keithf4.com/checking-for-postgresql-bloat/ The script there will let you check all your tables AND indexes for bloat. This will tell you whether you've got a simple space problem or a vacuuming problem.

Assuming you DO have bloat, one of the first things you can do is rebuild all the indexes except the PK ones: build new indexes, then drop the old ones. On some tables this saves a LOT of space, and since you can "create index concurrently ...", it is a non-blocking operation. You can get the list of indexes for a table with:

select indexdef from pg_indexes where tablename='pgbench_accounts';

Then build "create index concurrently ..." statements for each one.

Assuming the table itself is bloated and you HAVE to recover space for normal operation (let's say you've got 80MB of data in a 200GB table), I'm gonna head in a different direction here: Slony.
Slony can subscribe your table, or your whole db (up to you), to a new one, either on a different machine or on the same machine. It's actually pretty easy to set it up, subscribe one table, get it caught up, drop access to the db, swap the tables, and bring access back up. The downtime is measured in seconds. If one table one time is all you need, that's fine, but you can also use it to set up a replica of the bloated machine, sans bloat, and switch the whole db operation over to another machine / db.
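Scott's rebuild-then-drop index recipe might look like this in practice (the index name and the `bid` column are illustrative, borrowed from pgbench's schema):

```sql
-- 1. List the existing index definitions for the table:
SELECT indexdef FROM pg_indexes WHERE tablename = 'pgbench_accounts';

-- 2. For each non-PK index, build a fresh copy under a new name.
--    CONCURRENTLY avoids blocking reads and writes, at the cost of
--    a slower build.
CREATE INDEX CONCURRENTLY pgbench_accounts_bid_idx_new
    ON pgbench_accounts (bid);

-- 3. Drop the bloated original and take back its name:
DROP INDEX pgbench_accounts_bid_idx;
ALTER INDEX pgbench_accounts_bid_idx_new
    RENAME TO pgbench_accounts_bid_idx;
```

On releases newer than those in this thread, REINDEX CONCURRENTLY (PostgreSQL 12+) performs this whole dance in a single statement.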