Re: group locking: incomplete patch, just for discussion - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: group locking: incomplete patch, just for discussion
Msg-id: CA+TgmoYhSq+xw_XeyEBZa_3zHxOUt3G6yg9PsPFMaKMLxZ8Yqg@mail.gmail.com
In response to: Re: group locking: incomplete patch, just for discussion (Jeff Davis <pgsql@j-davis.com>)
Responses: Re: group locking: incomplete patch, just for discussion
List: pgsql-hackers
On Wed, Nov 12, 2014 at 2:57 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> Trying to catch up on this thread, please excuse me if these questions
> are already covered.

Welcome to the party. The more, the merrier. :-)

> You mention the possibility of undetected deadlocks, which is surely
> unacceptable. But why not improve the deadlock detector? How bad are
> _detected_ deadlocks? A lot of the concern was around catalog accesses,
> but those lock requests would rarely wait anyway.

Detected deadlocks are fine. Improving the deadlock detector is the heart of what needs to be done here. As you say, the lock requests we're talking about will rarely wait, so deadlocks won't be frequent. The issue is making sure that, if they do happen, we get better behavior than "your parallel query hangs forever; good luck figuring out why".

> I also wonder if group locking is generally the wrong approach to
> parallelism. Parallel scan/sort should work by assigning workers to
> chunks of data, and then merging the results. In principle the workers
> don't need to touch the same data at all, so why are we trying so hard
> to get them to all take the same locks?
>
> The reason, I assume, is that a full table is the lockable object, but
> we are imagining the segment files as the chunks. But maybe there's a
> way to address this more directly with an extra field in the lock tag,
> and perhaps some catalog knowledge?

In the common case, we're only taking AccessShareLock on some relation to prevent it from being concurrently dropped or rewritten (sketched below). Locking only part of the relation wouldn't improve concurrency, because the full-relation lock doesn't conflict with anything that a partial-relation lock wouldn't also need to conflict with.

More generally, I think there's some misunderstanding about the overall goal of the parallelism infrastructure I'm trying to create. Many people have proposed interesting strategies for how we could do various things (locking, planning, execution) better. Some of those ideas are, doubtless, good ones. But my goal is in some ways the opposite: I'm trying to make it possible to run as much existing PostgreSQL backend code as possible inside a parallel worker without any modification. To do that, I'm trying to modify the subsystems that are widely used throughout the backend - such as locking - so that the people using that infrastructure need not put conditional logic into their code to make it do one thing in parallel mode and another thing otherwise. I'm also trying to avoid fundamentally changing the way major subsystems work today as a precondition of parallelism.

So I'm taking a very surgical approach to the patches I propose. I'm not proposing patches that do anything fundamentally new or different; instead, I'm proposing patches that let the stuff we already do continue to work. In a single backend executing a query, we have lots of code that assumes you can allocate memory, look data up in a syscache, throw an error, lock a buffer, examine the value of a GUC, test a tuple against the active snapshot, and so on. For parallel mode to be useful, those same things need to work in a parallel worker. Of course, not every crazy thing somebody may want to do in a parallel worker needs to work, nor will it. But anything that's done in thousands of places throughout the code had better work, or parallelism will be so restricted as to be useless.
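To sketch the common locking case mentioned above (a minimal, hypothetical illustration; the function name `scan_setup` is invented, but `LockRelationOid` and `AccessShareLock` are the real lock-manager interface): the full-relation lock exists only to keep the table from disappearing mid-scan, so a finer-grained lock tag would conflict with exactly the same operations.

```c
/*
 * Hypothetical illustration, not from the patch: the common case where
 * a scan takes AccessShareLock on the whole relation purely so that it
 * cannot be concurrently dropped or rewritten.  This lock does not block
 * other readers or ordinary writers, so locking only part of the
 * relation would buy no additional concurrency.
 */
#include "postgres.h"

#include "storage/lmgr.h"

static void
scan_setup(Oid relid)
{
    /* Blocks concurrent DROP TABLE / rewrites; nothing else. */
    LockRelationOid(relid, AccessShareLock);

    /* ... set up and run the scan ... */
}
```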
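For instance, here is a hedged sketch (the function `demo_type_name` is hypothetical, not from this patch) of utterly ordinary backend code that leans on several of those facilities at once: it consults a syscache, allocates memory, and may throw an error. Code like this appears in thousands of places, and none of it should need to know whether it is running in a parallel worker.

```c
/*
 * Hypothetical example: look up a type's name by OID.  This relies on
 * the syscache (SearchSysCache1), on palloc()-based allocation (via
 * cstring_to_text), and on ereport() - all facilities that must behave
 * identically in a leader and in a parallel worker.
 */
#include "postgres.h"

#include "access/htup_details.h"
#include "catalog/pg_type.h"
#include "fmgr.h"
#include "utils/builtins.h"
#include "utils/syscache.h"

PG_FUNCTION_INFO_V1(demo_type_name);

Datum
demo_type_name(PG_FUNCTION_ARGS)
{
    Oid         typid = PG_GETARG_OID(0);
    HeapTuple   tup;
    text       *result;

    /* Syscache lookup: a worker must see the same catalog state. */
    tup = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid));
    if (!HeapTupleIsValid(tup))
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_OBJECT),
                 errmsg("type with OID %u does not exist", typid)));

    /* cstring_to_text() palloc()s in the current memory context. */
    result = cstring_to_text(
        NameStr(((Form_pg_type) GETSTRUCT(tup))->typname));
    ReleaseSysCache(tup);

    PG_RETURN_TEXT_P(result);
}
```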
So commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, for example, is about letting "ereport" work in a parallel worker, and this patch is about making things like "SearchSysCache1" and "PG_GETARG_TEXT_PP" work *reliably* in a parallel worker.

Now, the time to do new and different things is coming, and is maybe even now not so very far away. I'm not an expert on parallel execution, and I'm sure there are a number of people far more skilled than I at figuring out how to do a parallel scan, join, sort, or whatever quickly. What I'm trying to do is get the infrastructure to a point where those people don't have to worry about whether the basic facilities that we rely on throughout the backend are gonna work.

By way of comparison, think about the periodic discussions about making the backend multi-threaded. Since the backend relies on global variables in an enormous number of places, it isn't thread-safe. If you spawn a thread inside a PostgreSQL backend, it can't safely ereport(), palloc(), LWLockAcquire(), or do just about anything else. That means that, although anybody can write code that calls pthread_create(), it's not useful to do so, because there's practically no existing backend code that could safely be called from the new thread. Using background worker processes eliminates a lot of those problems - palloc and lwlocks just work, for example - but not all of them.
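As a hedged illustration of that process-based alternative (the library name `demo_worker` and entry point `demo_worker_main` are invented; the registration calls are the real bgworker interface), launching such a worker looks roughly like this:

```c
/*
 * Hypothetical sketch: registering a dynamic background worker.  The
 * worker is a full process, not a thread, so palloc(), ereport(), and
 * LWLocks behave exactly as in a normal backend.
 */
#include "postgres.h"

#include "miscadmin.h"
#include "postmaster/bgworker.h"

static void
launch_demo_worker(void)
{
    BackgroundWorker        worker;
    BackgroundWorkerHandle *handle;

    memset(&worker, 0, sizeof(worker));
    snprintf(worker.bgw_name, BGW_MAXLEN, "demo parallel worker");
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
        BGWORKER_BACKEND_DATABASE_CONNECTION;
    worker.bgw_start_time = BgWorkerStart_ConsistentState;
    worker.bgw_restart_time = BGW_NEVER_RESTART;

    /* Entry point resolved from a shared library; names are made up. */
    snprintf(worker.bgw_library_name, BGW_MAXLEN, "demo_worker");
    snprintf(worker.bgw_function_name, BGW_MAXLEN, "demo_worker_main");
    worker.bgw_main_arg = (Datum) 0;
    worker.bgw_notify_pid = MyProcPid;  /* notify us on start/stop */

    if (!RegisterDynamicBackgroundWorker(&worker, &handle))
        ereport(ERROR,
                (errmsg("could not register background worker")));
}
```

Because each such worker is a separate process with its own copy of the backend's global state, the basic facilities just work; what remains, and what this thread is about, is making the pieces that coordinate across processes (like the lock manager) behave sensibly too.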
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company