Re: Hash Indexes - Mailing list pgsql-hackers
From | Robert Haas
Subject | Re: Hash Indexes
Date |
Msg-id | CA+TgmoZ64uTdvH8fe8LLo14qQUc8byruVALfgFfOc15S3wzE8A@mail.gmail.com
In response to | Re: Hash Indexes (Amit Kapila <amit.kapila16@gmail.com>)
Responses | Re: Hash Indexes, Re: Hash Indexes
List | pgsql-hackers
On Tue, Oct 4, 2016 at 12:36 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I think one way to avoid the risk of deadlock in above scenario is to
> take the cleanup lock conditionally, if we get the cleanup lock then
> we will delete the items as we are doing in patch now, else it will
> just mark the tuples as dead and ensure that it won't try to remove
> tuples that are moved-by-split. Now, I think the question is how will
> these dead tuples be removed. We anyway need a separate mechanism to
> clear dead tuples for hash indexes as during scans we are marking the
> tuples as dead if corresponding tuple in heap is dead which are not
> removed later. This is already taken care in btree code via
> kill_prior_tuple optimization. So I think clearing of dead tuples can
> be handled by a separate patch.

That seems like it could work. The hash scan code will need to be made smart enough to ignore any tuples marked dead, if it isn't already. More aggressive cleanup can be left for another patch.

> I have also thought about using page-scan-at-a-time idea which has
> been discussed upthread[1], but I think we can't completely eliminate
> the need to out-wait scans (cleanup lock requirement) for scans that
> are started when split-in-progress or for non-MVCC scans as described
> in that e-mail [1]. We might be able to find some way to solve the
> problem with this approach, but I think it will be slightly
> complicated and much more work is required as compare to previous
> approach.

There are several levels of aggressiveness here with different locking requirements:

1. Mark line items dead without reorganizing the page. Needs an exclusive content lock, no more. Even a shared content lock may be OK, as for other opportunistic bit-flipping.

2. Mark line items dead and compact the tuple data. If a pin is sufficient to look at tuple data, as it is for the heap, then a cleanup lock is required here. But if we always hold a shared content lock when looking at the tuple data, it might be possible to do this with just an exclusive content lock.

3. Remove dead line items completely, compacting the tuple data and the item-pointer array. Doing this with only an exclusive content lock certainly needs page-at-a-time mode, because otherwise a searcher that resumes a scan later might resume from the wrong place. It also needs the guarantee mentioned for point #2, namely that nobody will be examining the tuple data without a shared content lock.

4. Squeezing the bucket. This is probably always going to require a cleanup lock, because otherwise it's pretty unclear how a concurrent scan could be made safe. I suppose the scan could remember every TID it has seen, somehow detect that a squeeze had happened, and rescan the whole bucket ignoring TIDs already returned, but that seems to require the client to use potentially unbounded amounts of memory to remember already-returned TIDs, plus an as-yet-uninvented mechanism for detecting that a squeeze has happened. So this seems like a dead-end to me.

I think that it is very much worthwhile to reduce the required lock strength from cleanup-lock to exclusive-lock in as many cases as possible, but I don't think it will be possible to completely eliminate the need to take the cleanup lock in some cases. However, if we can always take the cleanup lock conditionally and never be in a situation where it's absolutely required, we should be OK - and even level (1) gives you that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
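[Editor's illustration] The "ignore any tuples marked dead" behaviour mentioned in the reply could, in rough terms, look like the sketch below. This is a minimal, hypothetical example and not code from the patch or the hash AM; only the page/item macros (PageGetMaxOffsetNumber, PageGetItemId, ItemIdIsDead) are real PostgreSQL APIs, everything else is illustrative.

```c
#include "postgres.h"

#include "storage/bufpage.h"

/*
 * Hypothetical scan helper: walk a bucket page and skip any line items
 * already marked LP_DEAD, so that killed tuples are never returned.
 */
static void
scan_live_items_sketch(Page page)
{
    OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
    OffsetNumber off;

    for (off = FirstOffsetNumber; off <= maxoff; off = OffsetNumberNext(off))
    {
        ItemId      itemid = PageGetItemId(page, off);

        if (ItemIdIsDead(itemid))
            continue;           /* item was killed earlier; ignore it */

        /* ... compare the index tuple at this offset against the scan keys ... */
    }
}
```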
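[Editor's illustration] The conditional-cleanup-lock fallback in the quoted proposal might be sketched as below, assuming the deletable offsets have already been collected. Names are hypothetical and WAL logging, moved-by-split handling, and error paths are omitted; ConditionalLockBufferForCleanup, LockBuffer, PageIndexMultiDelete, ItemIdMarkDead, MarkBufferDirty, and MarkBufferDirtyHint are the real buffer-manager/page calls.

```c
#include "postgres.h"

#include "storage/bufmgr.h"
#include "storage/bufpage.h"

/*
 * Hypothetical cleanup step: if the cleanup lock is available, physically
 * delete the dead items; otherwise fall back to merely marking them dead
 * under an exclusive content lock, avoiding any deadlock risk.
 */
static void
hash_page_cleanup_sketch(Buffer buf, OffsetNumber *deletable, int ndeletable)
{
    if (ConditionalLockBufferForCleanup(buf))
    {
        /* No other pins on the page: safe to remove the items outright. */
        Page        page = BufferGetPage(buf);

        PageIndexMultiDelete(page, deletable, ndeletable);
        MarkBufferDirty(buf);
        /* real code would emit a WAL record here */
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    }
    else
    {
        /* Someone else holds a pin: just set LP_DEAD for a later pass. */
        Page        page;
        int         i;

        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);
        page = BufferGetPage(buf);

        for (i = 0; i < ndeletable; i++)
            ItemIdMarkDead(PageGetItemId(page, deletable[i]));

        /* hint-style change: no WAL required, but dirty the buffer */
        MarkBufferDirtyHint(buf, true);
        LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    }
}
```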