Thread: REVIEW: Track TRUNCATE via pgstat
https://commitfest.postgresql.org/action/patch_view?id=1661 (apologies for not replying to the thread; I can't find it inmy inbox) Patch applies and passes make check. Code formatting looks good. The regression test partially tests this. It does not cover 2PC, nor does it test rolling back a subtransaction that containsa truncate. The latter actually doesn't work correctly. The test also adds 2.5 seconds of forced pg_sleep. I think that's both bad and unnecessary. When I removed the sleeps I stillsaw times of less than 0.1 seconds. Also, wait_for_trunc_test_stats() should display something if it times out; otherwiseyou'll have a test failure and won't have any indication why. I've attached a patch that adds logging on timeout and contains a test case that demonstrates the rollback to savepoint bug. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Attachment
Jim Nasby <Jim.Nasby@BlueTreble.com> writes: > https://commitfest.postgresql.org/action/patch_view?id=1661 (apologies for not replying to the thread; I can't find itin my inbox) > > Patch applies and passes make check. Code formatting looks good. Jim, > The regression test partially tests this. It does not cover 2PC, nor > does it test rolling back a subtransaction that contains a > truncate. The latter actually doesn't work correctly. Thanks for pointing out the missing 2PC test, I've added one. The test you've added for rolling back a subxact actually works correctly, if you consider the fact that aborted (sub)xacts still account for insert/update/delete in pgstat. I've added this test with the corrected expected results. > The test also adds 2.5 seconds of forced pg_sleep. I think that's both > bad and unnecessary. When I removed the sleeps I still saw times of > less than 0.1 seconds. Well, I never liked that part, but the stats don't get updated if we don't put the session to sleep for at least PGSTAT_STAT_INTERVAL (which is 500ms). Removing these extra sleep calls would theoretically not make a difference as wait_for_trunc_test_stats() seems to have enough sleep calls itself, but due to the pgstat_report_stat() being called from the main loop only, there's no way short of placing the explicit pg_sleep at top level, if we want to be able to check the effects reproducibly. Another idea would be exposing pgstat_report_stat(true) at SQL level. That would eleminate the need for explicit pg_sleep(>=0.5), but we'll still need the wait_for_... call to make sure the collector has picked it up. > Also, wait_for_trunc_test_stats() should display something if it times > out; otherwise you'll have a test failure and won't have any > indication why. Oh, good catch. Since I've copied this function from stats.sql, we might want to update that one too in a separate commit. > I've attached a patch that adds logging on timeout and contains a test > case that demonstrates the rollback to savepoint bug. I'm attaching the updated patch version. Thank you for the review! -- Alex PS: re: your CF comment: I'm producing the patches using git format-patch --ext-diff where diff.external is set to '/bin/bash src/tools/git-external-diff'. Now that I try to apply it using git, looks like git doesn't like the copied context diff very much...
Attachment
Alex Shulgin wrote: > Jim Nasby <Jim.Nasby@BlueTreble.com> writes: > > The test also adds 2.5 seconds of forced pg_sleep. I think that's both > > bad and unnecessary. When I removed the sleeps I still saw times of > > less than 0.1 seconds. > > Well, I never liked that part, but the stats don't get updated if we > don't put the session to sleep for at least PGSTAT_STAT_INTERVAL (which > is 500ms). > > Removing these extra sleep calls would theoretically not make a > difference as wait_for_trunc_test_stats() seems to have enough sleep > calls itself, but due to the pgstat_report_stat() being called from the > main loop only, there's no way short of placing the explicit pg_sleep at > top level, if we want to be able to check the effects reproducibly. > > Another idea would be exposing pgstat_report_stat(true) at SQL level. > That would eleminate the need for explicit pg_sleep(>=0.5), but we'll > still need the wait_for_... call to make sure the collector has picked > it up. We already have a stats test that sleeps. Why not add this stuff there, to avoid making another test slow? I agree that tests that sleep are annoying. (Try running the "timeout" isolation test a few times and you'll see what I mean.) -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> >> Another idea would be exposing pgstat_report_stat(true) at SQL level. >> That would eleminate the need for explicit pg_sleep(>=0.5), but we'll >> still need the wait_for_... call to make sure the collector has picked >> it up. > > We already have a stats test that sleeps. Why not add this stuff there, > to avoid making another test slow? Well, if we could come up with a set of statements to test that would produce the end result unambigously, so that we can be certain the stats we check at the end cannot be a result of neat interaction of buggy behavior... I'm not sure this is at all possible, but I know for sure it will make debugging the possible fails harder. Even with the current approach of checking the stats after every isolated case it's sometimes takes quite a little more than a glance to verify correctness due to side-effects of rollback (ins/upd/del counters are still updated), and the way stats are affecting the dead tuples counter. I'll try to see if the checks can be converged though. -- Alex
Alex Shulgin wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: > >> > >> Another idea would be exposing pgstat_report_stat(true) at SQL level. > >> That would eleminate the need for explicit pg_sleep(>=0.5), but we'll > >> still need the wait_for_... call to make sure the collector has picked > >> it up. > > > > We already have a stats test that sleeps. Why not add this stuff there, > > to avoid making another test slow? > > Well, if we could come up with a set of statements to test that would > produce the end result unambigously, so that we can be certain the stats > we check at the end cannot be a result of neat interaction of buggy > behavior... It is always possible that things work just right because two bugs cancel each other. > I'm not sure this is at all possible, but I know for sure it will make > debugging the possible fails harder. Surely if some other patch introduces a failure, nobody will try to debug it by looking only at the failures of this test. > Even with the current approach of checking the stats after every > isolated case it's sometimes takes quite a little more than a glance > to verify correctness due to side-effects of rollback (ins/upd/del > counters are still updated), and the way stats are affecting the dead > tuples counter. Honestly I think pg_regress is not the right tool to test stat counter updates. It kind-of works today, but only because we don't stress it too much. If you want to create a new test framework for pgstats, and include some tests for truncate, be my guest. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Alex Shulgin wrote: > >> Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> >> >> >> Another idea would be exposing pgstat_report_stat(true) at SQL level. >> >> That would eleminate the need for explicit pg_sleep(>=0.5), but we'll >> >> still need the wait_for_... call to make sure the collector has picked >> >> it up. >> > >> > We already have a stats test that sleeps. Why not add this stuff there, >> > to avoid making another test slow? >> >> Well, if we could come up with a set of statements to test that would >> produce the end result unambigously, so that we can be certain the stats >> we check at the end cannot be a result of neat interaction of buggy >> behavior... > > It is always possible that things work just right because two bugs > cancel each other. > >> I'm not sure this is at all possible, but I know for sure it will make >> debugging the possible fails harder. > > Surely if some other patch introduces a failure, nobody will try to > debug it by looking only at the failures of this test. > >> Even with the current approach of checking the stats after every >> isolated case it's sometimes takes quite a little more than a glance >> to verify correctness due to side-effects of rollback (ins/upd/del >> counters are still updated), and the way stats are affecting the dead >> tuples counter. > > Honestly I think pg_regress is not the right tool to test stat counter > updates. It kind-of works today, but only because we don't stress it > too much. If you want to create a new test framework for pgstats, and > include some tests for truncate, be my guest. Yes, these are all good points. Actually, moving the test to stats.sql helped me realize the current patch behavior is not strictly correct: there's a corner case when you insert/update before truncate in transaction, which is later aborted. I need to take a closer look. -- Alex
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > >> Even with the current approach of checking the stats after every >> isolated case it's sometimes takes quite a little more than a glance >> to verify correctness due to side-effects of rollback (ins/upd/del >> counters are still updated), and the way stats are affecting the dead >> tuples counter. > > Honestly I think pg_regress is not the right tool to test stat counter > updates. It kind-of works today, but only because we don't stress it > too much. If you want to create a new test framework for pgstats, and > include some tests for truncate, be my guest. OK, I think I have now all bases covered, though the updated patch is not that "pretty". The problem is that we don't know in advance if the (sub)transaction is going to succeed or abort, and in case of aborted truncate we need to use the stats gathered prior to truncate. Thus the need to track insert/update/deletes that happened before first truncate separately. To the point of making a dedicated pgstat testing tool: let's have another TODO item? -- Alex
Attachment
Alex Shulgin wrote: > OK, I think I have now all bases covered, though the updated patch is > not that "pretty". > > The problem is that we don't know in advance if the (sub)transaction is > going to succeed or abort, and in case of aborted truncate we need to > use the stats gathered prior to truncate. Thus the need to track > insert/update/deletes that happened before first truncate separately. Ugh, this is messy indeed. I grant that TRUNCATE is a tricky case to handle correctly, so some complexity is expected. Can you please explain in detail how this works? > To the point of making a dedicated pgstat testing tool: let's have > another TODO item? Sure. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Alex Shulgin wrote: > >> OK, I think I have now all bases covered, though the updated patch is >> not that "pretty". >> >> The problem is that we don't know in advance if the (sub)transaction is >> going to succeed or abort, and in case of aborted truncate we need to >> use the stats gathered prior to truncate. Thus the need to track >> insert/update/deletes that happened before first truncate separately. > > Ugh, this is messy indeed. I grant that TRUNCATE is a tricky case to > handle correctly, so some complexity is expected. Can you please > explain in detail how this works? The main idea is that aborted transaction can leave dead tuples behind (that is every insert and update), but when TRUNCATE is issued we need to reset insert/update/delete counters to 0: otherwise we won't get accurate live and dead counts at commit time. If the transaction that issued TRUNCATE is instead aborted, the insert/update counters that we were incrementing *after* truncate are not relevant to accurate calculation of dead tuples in the original relfilenode we are now back to due to abort. We need the insert/updates counts that happened *before* the first TRUNCATE, hence the need for separate counters. >> To the point of making a dedicated pgstat testing tool: let's have >> another TODO item? > > Sure. Added one. -- Alex
Here's v0.5. (Why did you use that weird decimal versioning scheme? You could just say "v4" and save a couple of keystrokes). This patch makes perfect sense to me now. I was ready to commit, but I checked the regression test you added and noticed that you're only reading results for the last set of operations because they all use the same table and so each new set clobbers the values for the previous one. So I modified them to use one table for each set, and report the counters for all tables. In doing this I noticed that the one for trunc_stats_test3 is at odds with what your comment in the .sql file says; would you review it please? Thanks. (I didn't update the expected file.) BTW you forgot to update expected/prepared_xact_1.out, for the case when prep xacts are disabled. If some other committer decides to give this a go, please remember to bump catversion before pushing. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
<div dir="ltr">On Sat, Jan 24, 2015 at 5:58 AM, Alvaro Herrera <span dir="ltr"><<a href="mailto:alvherre@2ndquadrant.com"target="_blank">alvherre@2ndquadrant.com</a>></span> wrote:<br /><div class="gmail_extra"><divclass="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #cccsolid;padding-left:1ex">Here's v0.5. (Why did you use that weird decimal versioning scheme? You<br /> could just say"v4" and save a couple of keystrokes). This patch makes<br /> perfect sense to me now. I was ready to commit, but Ichecked the<br /> regression test you added and noticed that you're only reading results<br /> for the last set of operationsbecause they all use the same table and<br /> so each new set clobbers the values for the previous one. So I modified<br/> them to use one table for each set, and report the counters for all<br /> tables. In doing this I noticedthat the one for trunc_stats_test3 is<br /> at odds with what your comment in the .sql file says; would you review<br/> it please? Thanks.<br /><br /> (I didn't update the expected file.)<br /><br /> BTW you forgot to update expected/prepared_xact_1.out,for the case when<br /> prep xacts are disabled.<br /><br /> If some other committer decidesto give this a go, please remember to<br /> bump catversion before pushing.<br /></blockquote></div><br /></div><divclass="gmail_extra">Alex, this patch seems nicely backed. Could you review the changes of Alvaro? This threadis waiting for your input for 3 weeks.<br />-- <br /><div class="gmail_signature">Michael<br /></div></div></div>
Pushed, thanks. I reviewed the test results and concluded that the comments were wrong and the code was right, so I updated the comments to match reality. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services