Re: [HACKERS] on_dsm_detach() callback and parallel tuplesort BufFile resource management - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: [HACKERS] on_dsm_detach() callback and parallel tuplesort BufFile resource management |
Date | |
Msg-id | CAH2-Wz=W8ZCYMTbNKqikLf+YftQvFen45WACu=PLMM41toZAMg@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] on_dsm_detach() callback and parallel tuplesort BufFile resource management (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] on_dsm_detach() callback and parallel tuplesort BufFile resource management
Re: [HACKERS] on_dsm_detach() callback and parallel tuplesort BufFile resource management |
List | pgsql-hackers |
On Wed, Mar 8, 2017 at 5:23 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I think something like 0007-hj-shared-buf-file-v6.patch from > https://www.postgresql.org/message-id/CAEepm=3g=dMG+84083fkFzLvgMJ7HdhbGB=AeZABNukbZm3hpA@mail.gmail.com > is probably a good approach to this problem. In essence, it dodges > the problem of trying to transfer ownership by making ownership be > common from the beginning. That's what I've been recommending, or > trying to, anyway. I think that I'm already doing that, to at least the same extent that 0007-hj-shared-buf-file-v6.patch is. That patch seems to be solving the problem by completely taking over management of temp files from fd.c. That is, these temp files are not marked as temp files in the way ordinary temp BufFiles are, with explicit buy-in from resowner.c about their temp-ness. It adds "#include "catalog/pg_tablespace.h" to buffile.c, even though that kind of thing generally lives within fd.c. It does use PG_TEMP_FILE_PREFIX, but never manages to use temp_tablespaces if that's set by user. It also doesn't do anything about temp_file_limit enforcement. Quite a lot of thought seems to have gone into making the fd.c/resowner mechanism leak-proof in the face of errors. So, quite apart from what that approaches misses out on, I really don't want to change resource management too much. As I went into in my "recap", it seems useful to change as little as possible about temp files. Besides, because parallel hash join cannot know the size of the BufFiles from workers in advance, and thus cannot replicate fd.c fully by having state for each segment, it ends up actually *relying on* ENOENT-on-unlink() as a condition that terminates execution: > +bool > +BufFileDeleteShared(Oid tablespace, pid_t pid, int set, int partition, > + int participant) > +{ > + char tempdirpath[MAXPGPATH]; > + char tempfilepath[MAXPGPATH]; > + int segment = 0; > + bool found = false; > + > + /* > + * We don't know if the BufFile really exists, because SharedBufFile > + * tracks only the range of file numbers. If it does exists, we don't > + * khow many 1GB segments it has, so we'll delete until we hit ENOENT or > + * an IO error. > + */ > + for (;;) > + { > + make_shareable_path(tempdirpath, tempfilepath, > + tablespace, pid, set, partition, > + participant, segment); > + if (!PathNameDelete(tempfilepath)) > + break; > + found = true; > + ++segment; > + } > + > + return found; > +} However, the whole point of my fortifying the refcount mechanism for V9 of parallel tuplesort is to not have to ignore a ENOENT like this, on the grounds that that's ugly/weird (I pointed this out when I posted my V8, actually). Obviously I could very easily teach fd.c not to LOG a complaint about an ENOENT on unlink() for the relevant parallel case, on the assumption that it's only down to an error during the brief period of co-ownership of a Buffile at the start of leader unification. Is that acceptable? Anyway, the only advantage I immediately see with the approach 0007-hj-shared-buf-file-v6.patch takes (over what I've provisionally written as my V9) is that by putting everything in shared memory all along, there is no weirdness with tying local memory clean-up to a shared memory on_dsm_detach() callback. As I said, stashing a local pointer for the leader in shared memory is weird, even if it accomplishes the stated goals for V9. -- Peter Geoghegan
pgsql-hackers by date: