Re: Query progress indication - an implementation - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Query progress indication - an implementation |
Date | |
Msg-id | 603c8f070906281038r2dccf5c3u3562d930cbd91f24@mail.gmail.com Whole thread Raw |
In response to | Query progress indication - an implementation (Scara Maccai <m_lists@yahoo.it>) |
Responses |
Re: Query progress indication - an implementation
|
List | pgsql-hackers |
On Fri, Jun 26, 2009 at 3:37 AM, Scara Maccai<m_lists@yahoo.it> wrote: > Hi all, > > following the link in > > http://wiki.postgresql.org/wiki/Query_progress_indication > > but mostly: > > http://www.postech.ac.kr/~swhwang/progress2.pdf [1] > > I'm trying to write an implementation of the "dne" method in postgresql. > > I added another column to the pg_stat_get_activity function to report the percentage of work done for the query (of course,any other method could be used... the way the percentage is reported to the user can be easily changed). > > I attached a first patch (just to see if anyone is interested, the work is by no means finished). > > I guess I did a lot of mistakes, since I don't know anything about postgresql code... > > 1) the progress indicator can be eliminated at runtime; this could be done with another runtime flag (at the moment isalways on) > > 2) I added a new structure (Progress) to PlanState to keep all the info about execution progress > > 3) I needed a pointer to the root of the PlanStates, to be able to calculate the total progress of the query tree (I betthis pointer was already available somewhere, but I couldn't find where...) > > 4) sub-plans are not included yet (well, just to be honest, I don't really know what postgresql means with those... :)) > > 5) the percentage is updated at most every second (can be easily changed) > > 6) the methods to adjust upper/lower bounds in [1] are not implemented yet (but that shouldn't be a problem) > > 7) the "spilled tuples" handling in [1] is not supported yet > > 8) only hash join, loop join, aggregate, sequence scans are implemented at the moment > > 9) I added another flag (EXEC_FLAG_DRIVER_BRANCH) in executor.h to signal to the sub-nodes if they are part of a branchthat will contain a driver node (for example, inner subtree of a Nested Loops join is not a driver branch). I guessthis could be done better at Plan level (instead of PlanState), but this way less code has to be changed > > 10) at the moment all driver nodes have the same "work_per_tuple=1", but this could be changed (for example, CPU-intensivedriver nodes could have a smaller work_per_tuple value) > > Well, some (very early) tests on a tpcd db showed it works as expected (well, I only did very few tests...) You might want to take a look at this: http://wiki.postgresql.org/wiki/Submitting_a_Patch The project style is not to use C++-style comments, and you should eliminate all of the unnecessary diff hunks from your patch (like files that have only comment or whitespace changes). Also, it is requested that patches be submitted in context diff format and added to the CommitFest wiki here: http://wiki.postgresql.org/wiki/CommitFest_2009-First As to the content of the patch, I think that what you are doing is comparing the actual number of "operations" with the expected number of operations. If that's correct, I'm not sure it's really all that useful, because it will only give you accurate percentage-of-completion information when the estimates are correct. But when the estimates are correct, you probably have a pretty good idea how long the query will take to run anyway. When the estimates are off, you'll find that the actual number of operations is more than the expected number of operations, but that won't really tell you how far you have to go. The only other use case I can think of for functionality of this type is some kind of dashboard view on a system with very long-running queries, where you want to see how far you have yet to go on each one (maybe to approximate when you can submit the next one) without having detailed knowledge of how expensive each individual query was project to be. But that's a pretty narrow use case, and I'm not sure it really justifies the overhead of instrumenting every query in this way. For a fraction of the run-time cost, you could include the estimated total cost of the query in the pg_stat_activity output, which would let the user do much the same thing presuming that they have some knowledge of the usual ratio between costs and execution times. Greg Stark was (is?) working on a way to get EXPLAIN-ANALYZE type statistics on running queries; you might want to take a look at some of that work and see what you think. http://wiki.postgresql.org/wiki/Greg_Stark%27s_Development_Projects#EXPLAIN_queries_in_progress ...Robert
pgsql-hackers by date: