Re: Vacuum: allow usage of more than 1GB of work mem - Mailing list pgsql-hackers
| From | Robert Haas |
| --- | --- |
| Subject | Re: Vacuum: allow usage of more than 1GB of work mem |
| Date | |
| Msg-id | CA+Tgmobq1gO2TWygYf9mLVw_5ZKy8OGuxnAxT6efG_tS0xn4Vg@mail.gmail.com |
| In response to | Re: Vacuum: allow usage of more than 1GB of work mem (Pavan Deolasee <pavan.deolasee@gmail.com>) |
| Responses | Re: Vacuum: allow usage of more than 1GB of work mem |
| List | pgsql-hackers |
On Wed, Sep 14, 2016 at 8:16 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:
> Ah, thanks. So MaxHeapTuplesPerPage sets the upper boundary for the per page
> bitmap size. That's about 36 bytes for an 8K page. IOW if on average there
> are 6 or more dead tuples per page, the bitmap will outperform the current
> representation, assuming max allocation for the bitmap. If we can use additional
> estimates to restrict the size to a somewhat more conservative value and then
> keep an overflow area, then probably the break-even happens even earlier than
> that. I hope this gives us a good starting point, but let me know if you
> think it's still a wrong approach to pursue.

Well, it's certainly a bigger change. I think the big concern is that the amount of memory now becomes fixed based on the table size. So one problem is that you have to figure out what you're going to do if the bitmap doesn't fit in maintenance_work_mem. A related problem is that it might fit but use more memory than before, which could cause problems for some people. Now on the other hand it could also use less memory for some people, and that would be good.

I am kind of doubtful about this whole line of investigation because we're basically trying pretty hard to fix something that I'm not sure is broken. I do agree that, all other things being equal, the TID lookups will probably be faster with a bitmap than with a binary search, but maybe not if the table is large and the number of dead TIDs is small, because cache efficiency is pretty important. But even if it's always faster, does TID lookup speed even really matter to overall VACUUM performance? Claudio's early results suggest that it might, but maybe that's just a question of some optimization that hasn't been done yet. I'm fairly sure that our number one priority should be to minimize the number of cases where we need to do multiple scans of the indexes to stay within maintenance_work_mem. If we're satisfied we've met that goal, then within that we should try to make VACUUM as fast as possible with as little memory usage as possible. I'm not 100% sure I know how to get there, or how much work it's worth expending. In theory we could even start with the list of TIDs and switch to the bitmap if the TID list becomes larger than the bitmap would have been, but I don't know if it's worth the effort.

/me thinks a bit.

Actually, I think that probably *is* worthwhile, specifically because it might let us avoid multiple index scans in cases where we currently require them. Right now, our default maintenance_work_mem value is 64MB, which is enough to hold a little over ten million tuples. It's also large enough to hold a bitmap for a 14GB table. So right now if you deleted, say, 100 tuples per page you would end up with an index vacuum cycle for every ~100,000 pages = 800MB, whereas switching to the bitmap representation for such cases would require only one index vacuum cycle for every 14GB, more than an order of magnitude improvement! On the other hand, if we switch to the bitmap as the ONLY possible representation, we will lose badly when there are scattered updates - e.g. 1 deleted tuple every 10 pages. So it seems like we probably want to have both options. One tricky part is figuring out how we switch between them when memory gets tight; we have to avoid bursting above our memory limit while making the switch. And even if our memory limit is very high, we want to avoid using memory gratuitously; I think we should try to grow memory usage incrementally with either representation.
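[Editor's note: for reference, a small standalone C program, not part of PostgreSQL, reproduces the arithmetic behind the numbers above. The constants are the rough values quoted in this thread: 6-byte TIDs (sizeof(ItemPointerData)), ~36 bytes of bitmap per 8K page (MaxHeapTuplesPerPage bits, rounded up), and the default 64MB maintenance_work_mem.]

```c
#include <stdio.h>

#define BLCKSZ                  8192    /* default heap page size */
#define TID_BYTES               6       /* sizeof(ItemPointerData) */
#define BITMAP_BYTES_PER_PAGE   36      /* ~MaxHeapTuplesPerPage bits, rounded up */

int
main(void)
{
    double  maintenance_work_mem = 64.0 * 1024 * 1024;     /* 64MB default */

    /* How many dead TIDs fit in the flat array representation? */
    double  max_tids = maintenance_work_mem / TID_BYTES;

    /* How large a heap can a full per-page bitmap cover? */
    double  max_bitmap_pages = maintenance_work_mem / BITMAP_BYTES_PER_PAGE;

    /* Break-even density: the bitmap wins once a page holds this many dead TIDs. */
    double  break_even = (double) BITMAP_BYTES_PER_PAGE / TID_BYTES;

    printf("TID array capacity: ~%.1f million tuples\n", max_tids / 1e6);
    printf("Bitmap covers:      ~%.1f GB of heap\n",
           max_bitmap_pages * BLCKSZ / (1024.0 * 1024 * 1024));
    printf("Break-even:         ~%.0f dead tuples per page\n", break_even);
    return 0;
}
```

Running it prints roughly 11.2 million tuples, ~14.2 GB of coverage, and a break-even of 6 dead tuples per page, matching the "a little over ten million tuples", "14GB table", and "6 or more dead tuples per page" figures discussed above.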
For instance, one idea to grow memory usage incrementally would be to store dead tuple information separately for each 1GB segment of the relation. So we have an array of dead-tuple-representation objects, one for every 1GB of the relation. If there are no dead tuples in a given 1GB segment, then this pointer can just be NULL. Otherwise, it can point to either the bitmap representation (which will take ~4.5MB) or it can point to an array of TIDs (which will take 6 bytes/TID). That could handle an awfully wide variety of usage patterns efficiently; it's basically never worse than what we're doing today, and when the dead tuple density is high for any portion of the relation it's a lot better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
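[Editor's note: as a rough illustration of the per-1GB-segment layout described in the last paragraph, here is a hypothetical C sketch. All type and field names are invented for illustration and are not from any actual PostgreSQL patch; the ~4.5MB bitmap figure falls out of 131072 pages per segment times 36 bytes per page.]

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEGMENT_PAGES           131072  /* 1GB of heap at 8KB per page */
#define BITMAP_BYTES_PER_PAGE   36      /* enough bits for MaxHeapTuplesPerPage */

/* 6-byte TID, mirroring the layout of ItemPointerData */
typedef struct SketchTid
{
    uint16_t    block_hi;       /* high half of the block number */
    uint16_t    block_lo;       /* low half of the block number */
    uint16_t    offset;         /* line pointer within the page */
} SketchTid;

/* Dead-tuple state for one 1GB segment of the relation */
typedef struct DeadTupleSegment
{
    bool        use_bitmap;     /* which branch of the union is active */
    uint32_t    num_tids;       /* used only by the TID-array form */
    union
    {
        SketchTid  *tids;       /* sorted array, 6 bytes per dead tuple */
        uint8_t    *bitmap;     /* SEGMENT_PAGES * BITMAP_BYTES_PER_PAGE bytes (~4.5MB) */
    }           d;
} DeadTupleSegment;

/* One entry per 1GB segment; a NULL entry means the segment has no dead tuples */
typedef struct DeadTupleStore
{
    uint32_t            nsegments;
    DeadTupleSegment  **segments;
} DeadTupleStore;

/*
 * Grow memory incrementally: each segment starts as a TID array and would be
 * converted to the fixed-size bitmap only once the array costs more than the
 * bitmap does, so per-segment memory use is never worse than the smaller of
 * the two representations.
 */
static bool
segment_should_switch_to_bitmap(const DeadTupleSegment *seg)
{
    size_t  array_bytes  = (size_t) seg->num_tids * sizeof(SketchTid);
    size_t  bitmap_bytes = (size_t) SEGMENT_PAGES * BITMAP_BYTES_PER_PAGE;

    return !seg->use_bitmap && array_bytes > bitmap_bytes;
}
```

Presumably, a TID lookup during the index scan would first index into the segment array by block number and then either test a bit in the bitmap or binary-search the per-segment TID array, so sparsely updated segments stay as cheap as today's flat list while densely updated ones get the bitmap's fixed cost.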