Visibility map thoughts - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Visibility map thoughts |
Date | |
Msg-id | 472EE7F6.4070005@enterprisedb.com Whole thread Raw |
Responses |
Re: Visibility map thoughts
Re: Visibility map thoughts Re: Visibility map thoughts Re: Visibility map thoughts |
List | pgsql-hackers |
Though 8.3 isn't out of the oven just yet, I've been thinking about the dead space map a lot, and decided I have to start writing down those thoughts. Reducing VACUUM time is important, but the real big promise is the ability to do index-only-scans. Because that's the main focus of this exercise, I'm calling it the the Visibility Map from now on, because it's not about tracking dead space, but tuple visibility in general. Don't worry, reduced VACUUM times on read-mostly tables with hot spots will still fall out of it. What to store in the Visibility Map? ------------------------------------ Each potential user of the visibility map needs slightly different information: a) VACUUM needs to know if a page has enough dead tuples that it's interesting to visit. b) VACUUM FREEZE needs to know if a page has any non-frozen tuples that can be frozen. c) Index-only-scans needs to know if all tuples on a page are visible to all transactions. HOT updates are OK, though. From this point on, this document concentrates on a map suited for index-only-scans. The basic structure is a bitmap with one bit per heap page. A set bit means "all tuples on heap page are visible to all transactions". A cleared bit means that we don't know if that's true or not. This kind of map is useful for VACUUM as well, though VACUUM could get away with less strict semantics, and we might not want to visit a page in VACUUM if there's only very few dead tuples on a page. What this implementation gives us is a conservative VACUUM that will always find all the same dead tuples as a regular VACUUM (except for HOT updated tuples which can be pruned without scanning indexes). VACUUM using the visibility map can't update stats, so we'd still want regular vacuums every now and then, or rely on ANALYZE to do that. It's not useful for VACUUM FREEZE, unless we're willing to freeze much more aggressively, and change the meaning of a set bit to "all tuples on heap page are frozen". Other kinds of visibility maps could be useful in some narrow scenarios. For example, if we had a map of pages that have no live tuples, we could skip those pages in sequential scans. We could also use more than one bit per page to store more information, but let's keep it simple for now. Where to store the visibility map? ---------------------------------- a) In a fixed size shared memory struct. Not acceptable. b) In the special area of every nth heap page. I played a bit with this back in February 2006, because it's simple and requires few changes. c) As a new relation kind, like toast tables. d) As an auxiliary smgr relation, attached to the heap. I'm leaning towards D at the moment. It requires a little bit of changes to the buffer manager API and elsewhere, but not too much. B is not good because we might want to have other kinds of auxiliary files in the future (FSM in particular). And it is a bit hacky anyway. Let's do something more generic. C doesn't seem like the right approach, because a visibility map doesn't really have much in common with "real" relations. In any case, the bitmap is divided into pages, and the pages live in shared_buffers, and are managed by the buffer manager like any other page. Updating the visibility map --------------------------- A single page in the visibility map covers a wide range of heap pages ((8192 - page header) * 8 = ~65000 heap pages). It can easily become locking bottleneck, if every update on any page in that range needs to lock the visibility map page. Setting a bit is just a hint. It's ok to lose it on crash. However, setting a bit mustn't hit the disk too early. What might otherwise happen is that the change that made all tuples on a page visible, like committing an inserting transaction, isn't replayed after crash, but the set bit is already on disk. In other words, setting a bit must set the LSN of the visibility map page. Clearing a bit is the opposite. Clearing a bit mustn't be lost, but it's ok if it hits the disk before the operation that cleared it. Therefore we don't need to set the LSN, but it must be redone at WAL replay of heap insert/update/delete. Because a concurrent update on the same word might be lost if we didn't lock the page at all, we need to lock the visibility map page to set or clear a bit. Since these are quick operations, especially clearing a bit, perhaps we could use just the buffer header spinlock. To further reduce contention, we can have a copy of the bit in the page header of the heap page itself. That way we'd only need to access the visibility map on the first update on a page that clears a bit. When to update the visibility map? ---------------------------------- - Clear bit on every update/insert/delete. - Set bit in vacuum or prune, if all tuples on page are now visible to all transactions. - If prune sees that all tuples are LIVE or INSERT_IN_PROGRESS, we can't set the bit yet, but as soon as the maximum xmin on the page < OldestXmin, we can. Call PageSetPrunable accordingly, to prune again after that happens. - We don't need to clear the bit on HOT updates, because by definition none of the indexed columns changed. Next we need to figure out how to actually do the index-only-scans, using the visibility map. And how to use it for VACUUMs; Itagaki's dsm patch addressed that, and assuming it does what we want, that part of the patch is still usable. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
pgsql-hackers by date: