Re: [BUG] orphaned function - Mailing list pgsql-hackers
From | Drouvot, Bertrand |
---|---|
Subject | Re: [BUG] orphaned function |
Date | |
Msg-id | 03a3a7f2-4c67-7bb2-cad2-8ec7b9800942@amazon.com Whole thread Raw |
In response to | Re: [BUG] orphaned function (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [BUG] orphaned function
|
List | pgsql-hackers |
Hi, + Jim and Nathan. On 12/18/20 12:26 AM, Tom Lane wrote: > Gilles Darold <gilles@darold.net> writes: >> The same problem applies if the returned type or the procedural language >> is dropped. I have tried to fix that in ProcedureCreate() by using an >> AccessSharedLock for the data types and languages involved in the >> function declaration. With this patch the race condition with parameters >> types, return types and PL languages are fixed. Only data types and >> procedural languages with Oid greater than FirstBootstrapObjectId will >> be locked locked. But this is probably more complex than this fix so it >> is proposed as a separate patch >> (v1-003-orphaned_function_types_language.patch) to not interfere with >> the applying of Bertran's patch. > Indeed. This points up one of the things that is standing in the way > of any serious consideration of applying this patch. To my mind there > are two fundamental, somewhat interrelated considerations that haven't > been meaningfully addressed: > > 1. What set of objects is it worth trying to remove this race condition > for, and with respect to what dependencies? Bertrand gave no > justification for worrying about function-to-namespace dependencies > specifically, I focused on this one because this is the most (if not the unique) one we have observed so far. Those orphaned functions are breaking stuff like pg_dump, pg_upgrade and that's how we discovered them. That being said, I agree that there is no reason to focus only on this ones and we should better try to "fix" them all. > and you've likewise given none for expanding the patch > to consider function-to-datatype dependencies. There are dozens more > cases that could be considered; but I sure don't want to be processing > another ad-hoc fix every time someone thinks they're worried about > another specific case. > > Taking a practical viewpoint, I'm far more worried about dependencies > of tables than those of any other kind of object. A messed-up function > definition doesn't cost you much: you can just drop and recreate the > function. A table full of data is way more trouble to recreate, and > indeed the data might be irreplaceable. So it seems pretty silly to > propose that we try to remove race conditions for functions' dependencies > on datatypes without removing the same race condition for tables' > dependencies on their column datatypes. > > But any of these options lead to the same question: why stop there? > An approach that would actually be defensible, perhaps, is to incorporate > this functionality into the dependency mechanism: any time we're about to > create a pg_depend or pg_shdepend entry, take out an AccessShareLock on > the referenced object, and then check to make sure it still exists. > This would improve the dependency mechanism so it prevents creation-time > race conditions, not just attempts to drop dependencies of > already-committed objects. It would mean that the one patch would be > the end of the problem, rather than looking forward to a steady drip of > new fixes. Agree that working with pg_depend and pg_shdepend is the way to go. Instead of using the locking machinery (and then make the one that would currently produce the orphaned object waiting), Jim proposed another approach: make use of special snapshot (like a dirty one depending of the case). That way instead of locking we could instead report an error, something like this: postgres=# drop schema tobeorph; ERROR: cannot drop schema tobeorph because other objects that are currently under creation depend on it DETAIL: function under creation tobeorph.bdttime() depends on schema tobeorph > > 2. How much are we willing to pay for this added protection? The fact > that we've gotten along fine without it for years suggests that the > answer needs to be "not much". But I'm not sure that that actually > is the answer, especially if we don't have a policy that says "we'll > protect against these race conditions but no other ones". I think > there are possibly-serious costs in three different areas: > > * Speed. How much do all these additional lock acquisitions slow > down a typical DDL command? > > * Number of locks taken per transaction. This'd be particularly an > issue for pg_restore runs using single-transaction mode: they'd take > not only locks on the objects they create, but also a bunch of > reference-protection locks. It's not very hard to believe that this'd > make a difference in whether restoring a large database is possible > without increasing max_locks_per_transaction. > > * Risk of deadlock. The reference locks themselves should be sharable, > so maybe there isn't much of a problem, but I want to see this question > seriously analyzed not just ignored. All of this should be mitigated with the "special snapshot" approach. > > Obviously, the magnitude of these costs depends a lot on how many > dependencies we want to remove the race condition for. But that's > exactly the reason why I don't want a piecemeal approach of fixing > some problems now and some other problems later. That's way too > much like the old recipe for boiling a frog: we could gradually get > into serious performance problems without anyone ever having stopped > to consider the issue. > > In short, I think we should either go for a 100% solution if careful > analysis shows we can afford it, or else have a reasoned policy > why we are going to close these specific race conditions and no others > (implying that we'll reject future patches in the same area). Agree. What about taking the 100% solution way with the "special snapshot" approach? If you think that could make sense then we (Jim, Nathan and I) can work on a patch with this approach and come back with a proposal. Bertrand
pgsql-hackers by date: