Re: maintenance_work_mem used by Vacuum - Mailing list pgsql-hackers
From | Robert Haas
---|---
Subject | Re: maintenance_work_mem used by Vacuum
Date |
Msg-id | CA+TgmoZD1xBi8ra6k72qs=hvnYFgp_QQLagz6dYa6NPqUDwAKw@mail.gmail.com
In response to | maintenance_work_mem used by Vacuum (Amit Kapila <amit.kapila16@gmail.com>)
Responses | Re: maintenance_work_mem used by Vacuum; Re: maintenance_work_mem used by Vacuum
List | pgsql-hackers
On Sun, Oct 6, 2019 at 6:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:

> As per the docs [1] (see maintenance_work_mem), the maximum amount of memory used by the Vacuum command must be no more than maintenance_work_mem. However, during the review/discussion of the "parallel vacuum" patch [2], we observed that this is not true. Basically, if there is a GIN index defined on a table, then a vacuum on that table can consume up to 2 * maintenance_work_mem of memory: maintenance_work_mem to keep track of dead tuples, and another maintenance_work_mem to move tuples from pending pages into the regular GIN structure (see ginInsertCleanup). The behavior of GIN indexes consuming extra maintenance_work_mem was introduced by commit e2c79e14d998cd31f860854bc9210b37b457bb01. It is not clear to me whether this is acceptable behavior, and if so, shouldn't we document it?

I would say that sucks, because it makes it harder to set maintenance_work_mem correctly. Not sure how hard it would be to fix, though.

> We wanted to decide how a parallel vacuum should use memory: can each worker consume maintenance_work_mem to clean up the GIN index, or should all workers together use no more than maintenance_work_mem? We were leaning toward the latter, but before we decide what the right behavior for parallel vacuum is, I thought it would be better to first discuss whether the current memory usage model is right.

Well, I had the idea when we were developing parallel query that we should just ignore the problem of work_mem: every node can use X amount of work_mem, and if there are multiple copies of the node in multiple processes, then you probably end up using more memory. I have been informed by Thomas Munro -- in very polite terminology -- that this was a terrible decision which is causing all kinds of problems for users. I haven't actually encountered that situation myself, but I don't doubt that it's an issue.

I think it's a lot easier to do better when we're talking about maintenance commands rather than queries. Maintenance operations typically don't have the problem that queries do of an unknown number of nodes using memory; you typically know all of your memory needs up front, so it's easier to budget that out across workers or whatever. It's a little harder in this case, because you could have any number of GIN indexes (1 to infinity), and the amount of memory you can use depends not only on how many of them there are but, presumably, also on how many of them are going to be vacuumed at the same time. So you might have 8 indexes, 3 workers, and 2 of the indexes are GIN. In that case, you know that you can't have more than 2 GIN indexes being processed at the same time, but it's likely to be only one, and maybe with proper scheduling you could make sure it's only one. On the other hand, if you dole out the memory assuming it's only 1, what happens if you start that one, then process all 6 of the non-GIN indexes, and that one isn't done yet? I guess you could wait to start cleanup on the other GIN indexes until the previous index cleanup finishes, but that kinda sucks too. So I'm not really sure how to handle this particular case. I think the principle of dividing up the memory rather than just using more is probably a good one, but figuring out exactly how that should work seems tricky.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
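For illustration, here is a minimal C sketch of the "divide the memory rather than just use more" idea discussed above. It is not PostgreSQL code: compute_budget, vacuum_budget, and the 50/50 split between dead-tuple space and index-cleanup space are assumptions made up for this example, with the GIN count treated pessimistically as the number of pending-list cleanups that could run at the same time.

```c
/*
 * Hypothetical sketch only; not PostgreSQL source.  It caps total memory
 * at maintenance_work_mem by splitting the budget between the shared
 * dead-tuple space and the index cleanup done by parallel workers, while
 * pessimistically assuming each concurrently processed GIN index needs
 * its own ginInsertCleanup allowance.
 */
#include <stdio.h>

typedef struct
{
    long dead_tuple_kb;   /* budget for the shared dead-tuple array */
    long per_worker_kb;   /* budget for each worker's index cleanup */
} vacuum_budget;

static vacuum_budget
compute_budget(long maintenance_work_mem_kb, int nworkers, int ngin_concurrent)
{
    vacuum_budget b;
    int consumers = nworkers + ngin_concurrent;    /* worst-case memory users */

    if (consumers < 1)
        consumers = 1;

    /* Arbitrary 50/50 split, for illustration only. */
    b.dead_tuple_kb = maintenance_work_mem_kb / 2;
    b.per_worker_kb = (maintenance_work_mem_kb / 2) / consumers;
    return b;
}

int
main(void)
{
    /* The example from the mail: 3 workers, at most 2 GIN indexes at once. */
    vacuum_budget b = compute_budget(64 * 1024, 3, 2);

    printf("dead tuples: %ld kB, per worker: %ld kB\n",
           b.dead_tuple_kb, b.per_worker_kb);
    return 0;
}
```

Under a scheme like this the total stays within maintenance_work_mem no matter how the GIN cleanups end up being scheduled, at the cost of giving each cleanup a smaller slice, which is exactly the trade-off the discussion is wrestling with.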