Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date:
Msg-id: CA+TgmoY0pB4qxgOH=bD_VfvLktj8f2w58s2tYF_NEaj2QJdNxQ@mail.gmail.com
In response to: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) (Peter Geoghegan <pg@bowt.ie>)
Responses:
  Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
  Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
List: pgsql-hackers
On Tue, Mar 21, 2017 at 3:50 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Tue, Mar 21, 2017 at 12:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> From my point of view, the main point is that having two completely
>> separate mechanisms for managing temporary files that need to be
>> shared across cooperating workers is not a good decision. That's a
>> need that's going to come up over and over again, and it's not
>> reasonable for everybody who needs it to add a separate mechanism for
>> doing it. We need to have ONE mechanism for it.
>
> Obviously I understand that there is value in code reuse in general.
> The exact extent to which code reuse is possible here has been unclear
> throughout, because it's complicated for all kinds of reasons. That's
> why Thomas and I had 2 multi-hour Skype calls all about it.

I agree that the extent to which code reuse is possible here is somewhat unclear, but I am 100% confident that the answer is non-zero. You and Thomas both need BufFiles that can be shared across multiple backends associated with the same ParallelContext. I don't understand how you can argue that it's reasonable to have two different ways of sharing the same kind of object across the same set of processes. And if that's not reasonable, then somehow we need to come up with a single mechanism that can meet both your requirements and Thomas's requirements.

>> It's just not OK in my book for a worker to create something that it
>> initially owns and then later transfer it to the leader.
>
> Isn't that an essential part of having a refcount, in general? You
> were the one that suggested refcounting.

No, quite the opposite. My point in suggesting adding a refcount was to avoid needing to have a single owner. Instead, the process that decrements the reference count to zero becomes responsible for doing the cleanup. What you've done with the ref count is use it as some kind of medium for transferring responsibility from backend A to backend B; what I want is to allow backends A, B, C, D, E, and F to attach to the same shared resource, and whichever one of them happens to be the last one out of the room shuts off the lights.
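To make that concrete, here is a toy standalone sketch -- plain C with invented names, no PostgreSQL internals, error handling elided -- of the scheme I mean: every process takes a reference before it starts using the file, and whichever process decrements the count to zero does the cleanup.

/* shared_refcount_demo.c -- toy sketch, not PostgreSQL code */
#define _GNU_SOURCE
#include <stdatomic.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

typedef struct SharedTempFile
{
    atomic_int  refcount;       /* number of processes still attached */
    char        path[64];       /* the shared temporary file */
} SharedTempFile;

static void
detach_shared_file(SharedTempFile *stf)
{
    /* atomic_fetch_sub() returns the value *before* subtracting */
    if (atomic_fetch_sub(&stf->refcount, 1) == 1)
    {
        /* We were the last process out: clean up, exactly once. */
        unlink(stf->path);
        printf("pid %d shut off the lights\n", (int) getpid());
    }
}

int
main(void)
{
    SharedTempFile *stf = mmap(NULL, sizeof(SharedTempFile),
                               PROT_READ | PROT_WRITE,
                               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    FILE       *f;

    snprintf(stf->path, sizeof(stf->path), "/tmp/shared_sort.%d",
             (int) getpid());
    if ((f = fopen(stf->path, "w")) != NULL)
        fclose(f);              /* create the shared "temp" file */
    atomic_store(&stf->refcount, 1);    /* the creator's own reference */

    for (int i = 0; i < 3; i++)
    {
        /* attach *before* forking, so the count can't hit zero early */
        atomic_fetch_add(&stf->refcount, 1);
        if (fork() == 0)
        {
            /* ... a worker would read/write the shared file here ... */
            detach_shared_file(stf);
            _exit(0);
        }
    }

    detach_shared_file(stf);    /* the creator may or may not be last */
    while (wait(NULL) > 0)
        ;                       /* reap children */
    return 0;
}

Because each reference is taken before the corresponding child is forked, the count can never reach zero while some process still intends to attach, so the unlink() runs exactly once no matter which process finishes last.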
>> The cooperating backends should have joint ownership of the objects from
>> the beginning, and the last process to exit the set should clean up
>> those resources.
>
> That seems like a facile summary of the situation. There is a sense in
> which there is always joint ownership of files with my design. But
> there is also a sense in which there isn't, because it's impossible to
> do that while not completely reinventing resource management of temp
> files. I wanted to preserve resowner.c ownership of fd.c segments.

As I've said before, I think that's an anti-goal. This is a different problem, and trying to reuse the solution we chose for the non-parallel case doesn't really work. resowner.c could end up owning a shared reference count which it's responsible for decrementing -- and then decrementing it removes the file if the result is zero. But it can't own performing the actual unlink(), because then we can't support cases where the file may have multiple readers, since whoever owns the unlink() might try to zap the file out from under one of the others.

> You maintain that it's better to have the leader unlink() everything
> at the end, and suppress the errors when that doesn't work, so that
> that path always just plows through.

I don't want the leader to be responsible for anything. I want the last process to detach to be responsible for cleanup, regardless of which process that ends up being. I want that for lots of good reasons which I have articulated, including: (1) it's how all other resource management for parallel query already works, e.g. DSM, DSA, and group locking; (2) it avoids the need for one process to sit and wait until another process assumes ownership, which isn't a feature even if (as you contend, and I'm not convinced) it doesn't hurt much; and (3) it allows for use cases where multiple processes are reading from the same shared BufFile without the risk that some other process will try to unlink() the file while it's still in use. The point for me isn't so much whether unlink() ever ignores errors as whether cleanup (however defined) is an operation guaranteed to happen exactly once.
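Put another way, the contract I'm after might look something like the following -- again only a sketch, with made-up names rather than the actual resowner.c or fd.c interfaces.

#include <stdatomic.h>
#include <unistd.h>

typedef struct SharedTempFileSet
{
    atomic_int  refcount;       /* lives in shared memory, e.g. a DSM segment */
    char        path[64];       /* shared temp file */
} SharedTempFileSet;

/* Any backend that wants to use the file, leader included, attaches first. */
static void
shared_fileset_attach(SharedTempFileSet *sfs)
{
    atomic_fetch_add(&sfs->refcount, 1);
}

/*
 * Run from each backend's own cleanup path (on DSM detach, resource
 * owner release, or process exit).  The decrement is the only thing a
 * backend owns; while the count is above zero, nobody may unlink the
 * file out from under a concurrent reader.
 */
static void
shared_fileset_detach(SharedTempFileSet *sfs)
{
    if (atomic_fetch_sub(&sfs->refcount, 1) == 1)
        unlink(sfs->path);      /* last one out shuts off the lights */
}

Nobody waits for anybody, nothing is handed off, and the cleanup happens exactly once, in whichever backend happens to detach last.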
> I disagree with that. It is a
> trade-off, I suppose. I have now run out of time to work through it
> with you or Thomas, though.

Bummer.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company