Re: hyrax vs. RelationBuildPartitionDesc - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: hyrax vs. RelationBuildPartitionDesc |
Date | |
Msg-id | 20190517195929.uey6hadbgazplm3h@alap3.anarazel.de Whole thread Raw |
In response to | Re: hyrax vs. RelationBuildPartitionDesc (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: hyrax vs. RelationBuildPartitionDesc
|
List | pgsql-hackers |
Hi, On 2019-05-01 13:09:07 -0400, Robert Haas wrote: > The original issue on this thread was that hyrax started running out > of memory when it hadn't been doing so previously. That happened > because, for complicated reasons, commit > 898e5e3290a72d288923260143930fb32036c00c resulted in the leak being > hit lots of times in CLOBBER_CACHE_ALWAYS builds instead of just once. > Commits 2455ab48844c90419714e27eafd235a85de23232 and > d3f48dfae42f9655425d1f58f396e495c7fb7812 fixed that problem. > > In the email at issue, Tom complains about two things. First, he > complains about the fact that I try to arrange things so that relcache > data remains valid for as long as required instead of just copying it. > Second, he complains about the fact repeated ATTACH and DETACH > PARTITION operations can produce a slow session-lifespan memory leak. > > I think it's reasonable to fix the second issue for v12. I am not > sure how important it is, because (1) the leak is small, (2) it seems > unlikely that anyone would execute enough ATTACH/DETACH PARTITION > operations in one backend for the leak to amount to anything > significant, and (3) if a relcache flush ever happens when you don't > have the relation open, all of the "leaked" memory will be un-leaked. > However, I believe that we could fix it as follows. First, invent > rd_pdoldcxt and put the first old context there; if that pointer is > already in use, then parent the new context under the old one. > Second, in RelationDecrementReferenceCount, if the refcount hits 0, > nuke rd_pdoldcxt and set the pointer back to NULL. With that change, > you would only keep the old PartitionDesc around until the ref count > hits 0, whereas at present, you keep the old PartitionDesc around > until an invalidation happens while the ref count is 0. That sounds roughly reasonably. Tom, any objections? I think it'd be good if we fixed this soon. > I think the first issue is not v12 material. Tom proposed to fix it > by copying all the relevant data out of the relcache, but his own > performance results show a pretty significant hit, Yea, the numbers didn't look very convincing. > and AFAICS he > hasn't pointed to anything that's actually broken by the current > approach. What I think should be done is actually generalize the > approach I took in this patch, so that instead of the partition > directory holding a reference count, the planner or executor holds > one. Then not only could people who want information about the > PartitionDesc avoid copying stuff from the relcache, but maybe other > things as well. I think that would be significantly better than > continuing to double-down on the current copy-everything approach, > which really only works well in a world where a table can't have all > that much metadata, which is clearly not true for PostgreSQL any more. > I'm not sure that Tom is actually opposed to this approach -- although > I may have misunderstood his position -- but where we disagree, I > think, is that I see what I did in this commit as a stepping-stone > toward a better world, and he sees it as something that should be > killed with fire until that better world has fully arrived. Tom, are you ok with deferring further work here for v13? Greetings, Andres Freund
pgsql-hackers by date: