Re: [HACKERS] An isolation test for SERIALIZABLE READ ONLY DEFERRABLE - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: [HACKERS] An isolation test for SERIALIZABLE READ ONLY DEFERRABLE |
Date | |
Msg-id | CAEepm=1MR4Ug9YsLtOS4Q9KAU9aku0pZS4RhBN=0LY3pJ49Ksg@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] An isolation test for SERIALIZABLE READ ONLY DEFERRABLE (Michael Paquier <michael.paquier@gmail.com>) |
Responses |
Re: [HACKERS] An isolation test for SERIALIZABLE READ ONLY DEFERRABLE
|
List | pgsql-hackers |
On Thu, Jan 12, 2017 at 2:21 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Thu, Jan 12, 2017 at 4:09 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Tue, Jan 10, 2017 at 11:41 PM, Michael Paquier >> <michael.paquier@gmail.com> wrote: >>> Do you think that expanding the wait query by default could be >>> intrusive for the other tests? I am wondering about such a white list >>> to generate false positives for the existing tests, including >>> out-of-core extensions, as all the tests now rely only on >>> pg_blocking_pids(). >> >> It won't affect anything unless running at transaction isolation level >> serializable with the "read only deferrable" option. > > Indeed as monitoring.sgml says. And that's now very likely close to > zero. It would be nice to add a comment in the patch to just mention > that. In short, I withdraw my concerns about this patch, the addition > of a test for repeatable read outweights the tweaks done in the > isolation tester. I am marking this as ready for committer, I have not > spotted issues with it. Thanks! Aside from exercising the code, which is surely always a good thing, I think these three tests are quite educational on their own for those of us trying to learn about transaction isolation. I also have longer term plans to show the first and third of them running with the read-only transaction moved to a standby server. Kevin Grittner gave me the idea of multi-server isolation tests when I mentioned my WIP SERIALIZABLE DEFERRABLE-on-standbys patch off-list, which prompted me to realise that our existing SSI tests aren't very interesting for that because they all rely on the behaviour of SERIALIZABLE, not SERIALIZABLE DEFERRABLE. So I figured we'd better have some tests that show show that working on a single node system first. Concerning the question of beauty, I thought of several ways for the isolation tester to detect this type of waiting: 1. Create a new function pg_waiting_for_safe_snapshot(pid) that would acquire SerializableXactHashLock, do a linear search of active SERIALIZABLEXACT structs (using FirstPredXact and NextPredXact) looking for sact->pid == pid and then test sact->flags & SXACT_FLAG_DEFERRABLE_WAITING. Then change the isolation tester's "waiting" query to add " OR pg_waiting_for_safe_snapshot($1)". 2. Modify the existing pg_blocking_pids() function to detect this type of waiting and figure out which other pids are responsible, adding them to its current results. That could work by first doing a linear search to find the SERIALIZABLEXACT struct as above, and if its SXACT_FLAG_DEFERRABLE_WAITING flag is set, then walking the list of possibleUnsafeConflicts to find the pids responsible. Then the isolation tester wouldn't need to change at all. 3. Create a new function pg_waiting_for_safe_snapshot_pids(), separate from pg_blocking_pids(), to return the list of pids as above, and modify the isolation tester to call this new function too. 4. The wait_event based approach as proposed, looking at pg_stat_activity. I couldn't think of any practical uses for the new facilities 1-3 outside this isolation test, which is why I didn't look into those more intrusive approaches. I admit that 4 is a bit clunky, but it's a simple non-intrusive change and I can't think of any reason it would give incorrect results. Even though wait_event is updated and read without memory synchronisation, as far as I know we can assume that select(2) and WaitLatch contain full memory barriers so the isolation tester will certainly see the SafeSnapshot value when it repeatedly polls that query, and the value can't change again until the isolation tester sees it, recognises it as a kind of waiting it knows about, and moves on to running the step in the test. Upon reflection, either 2 or 3 might be considered more beautiful than 4, depending on how others feel about extending the remit of the existing pg_blocking_pid function. I'd be happy to post a new patch using one of those approaches if others feel it'd be better that way. -- Thomas Munro http://www.enterprisedb.com
pgsql-hackers by date: