Re: machine-readable explain output - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: machine-readable explain output |
Date | |
Msg-id | 603c8f070906160622i384839d8t1ebe8c7011f86c35@mail.gmail.com |
In response to | Re: machine-readable explain output (Andres Freund <andres@anarazel.de>) |
Responses |
Re: machine-readable explain output
Re: machine-readable explain output Re: machine-readable explain output |
List | pgsql-hackers |
On Tue, Jun 16, 2009 at 8:53 AM, Andres Freund<andres@anarazel.de> wrote:
> On 06/16/2009 02:14 PM, Greg Stark wrote:
>>
>> On Tue, Jun 16, 2009 at 12:19 PM, Andres Freund<andres@anarazel.de>
>>  wrote:
>>>
>>> <Startup-Cost>1710.98</Startup-Cost>
>>> <Total-Cost>1710.98</Total-Cost>
>>> <Plan-Rows>72398</Plan-Rows>
>>> <Plan-Width>4</Plan-Width>
>>> <Actual-Startup-Time>136.595</Actual-Startup-Time>
>>> <Actual-Total-Time>136.595</Actual-Total-Time>
>>> <Actual-Rows>72398</Actual-Rows>
>>> <Actual-Loops>1</Actual-Loops>
>>
>> XML's not really my thing currently but it sure seems strange to me to
>> have *everything* be a separate tag like this. Doesn't XML do
>> attributes too? I would have thought to use child tags like this only
>> for things that have some further structure.
>
>> I would have expected something like:
>>
>> <join
>>   <scan type=sequential source="foo.bar">
>>     <estimates cost-startup=nnn cost-total=nnn rows=nnn width=nnn></>
>>     <actual time-startup=nnn time-total=nnnn rows=nnn loops=nnn></>
>>   </scan>
>>   <scan type=function source="foo.bar($1)">
>>     <parameters>
>>       <parameter name="$1" expression="...."></>
>>     </parameters>
>>   </scan>
>> </join>
>>
>>
>> This would allow something like a graphical explain plan to still make
>> sense of a plan even if it finds a node it doesn't recognize. It would
>> still know generally what to do with a "scan" node or a "join" node
>> even if it is a new type of scan or join.

As long as you understand how the current code uses <Plan> and
<Plans>, you can do this just as well with the current implementation.
Each plan node gets a <Plan>. If there are any plans "under" it, it
gets a <Plans> child which contains those. Whether you put the
additional details into attributes or other tags is irrelevant.

As to why I chose to do it this way, I had a couple of reasons:

1. It didn't seem very wise to go with the approach of trying to do
EVERYTHING with attributes. If I did that, then I'd either get really
long lines that were not easily readable, or I'd have to write some
kind of complicated line wrapping code (which didn't seem to make a
lot of sense for a machine-readable format). The current format isn't
the most beautiful thing I've ever seen, but you don't need a parser
to make sense of it, just a bit of patience.

2. I wanted the JSON output and the XML output to be similar, and that
seemed much easier with this design.

3. We have existing precedent for this design pattern in, e.g.
table_to_xml

http://www.postgresql.org/docs/current/interactive/functions-xml.html

> While that also looks sensible the more structured variant makes it easier
> to integrate additional stats which may not easily be pressed in the
> 'attribute' format. As a fastly contrived example you could have io
> statistics over time like:
> <iostat>
>    <stat time="10" name=pagefault>...</stat>
>    <stat time="20" name=pagefault>...</stat>
>    <stat time="30" name=pagefault>...</stat>
> </iostat>
>
> Something like that would be harder with your variant.
>
> Structuring it in tags like suggested above:
> <Plan-Estimates>
>    <Startup-Cost>...</Startup-Cost>
>    ...
> </Plan-Estimates>
> <Execution-Cost>
>    <Startup-Cost>...</Startup-Cost>
>    ...
> </Execution-Cost>
>
> Enables displaying unknown 'scalar' values just like your variant and also
> allows more structured values.
>
> It would be interesting to get somebody having used the old explain in an
> automated fashion into this discussion...

Well, one problem with this is that the actual values are not costs,
but times, and the estimated values are not times, but costs. The
planner estimates the cost of operations on an arbitrary scale where
the cost of a sequential page fetch is 1.0. When we measure actual
times, they are in milliseconds. There is no point that I can see in
making it appear that those are the same thing.

Observe the current output:

explain analyze select 1;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.005..0.007 rows=1 loops=1)
 Total runtime: 0.243 ms
(2 rows)

...Robert
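
A rough illustration of the <Plan>/<Plans> point in the message above: a consumer can walk the tree and carry along any scalar child it does not recognize, without the details having to be attributes. This is only a sketch. The sample document is invented, the <Node-Type> and <Some-New-Stat> tags and all values are assumptions made for the example, and only the <Plan>/<Plans> nesting and the cost tags come from the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. Only the <Plan>/<Plans> nesting and the cost tags
# come from the thread; <Node-Type>, <Some-New-Stat> and all values are
# assumptions made up for this illustration.
SAMPLE = """
<Plan>
  <Node-Type>Hash Join</Node-Type>
  <Startup-Cost>1710.98</Startup-Cost>
  <Total-Cost>4000.00</Total-Cost>
  <Plans>
    <Plan>
      <Node-Type>Seq Scan</Node-Type>
      <Startup-Cost>0.00</Startup-Cost>
      <Total-Cost>1000.00</Total-Cost>
      <Some-New-Stat>42</Some-New-Stat>
    </Plan>
    <Plan>
      <Node-Type>Hash</Node-Type>
      <Startup-Cost>1710.98</Startup-Cost>
      <Total-Cost>1710.98</Total-Cost>
    </Plan>
  </Plans>
</Plan>
"""


def walk(plan, depth=0):
    """Yield (depth, {tag: text}) for a <Plan> element and its descendants."""
    # Every child other than <Plans> is treated as an opaque scalar, so a
    # tag the consumer has never seen is still carried through.
    scalars = {child.tag: (child.text or "").strip()
               for child in plan if child.tag != "Plans"}
    yield depth, scalars
    # Sub-plans, if any, live under a single <Plans> child.
    plans = plan.find("Plans")
    if plans is not None:
        for sub in plans.findall("Plan"):
            yield from walk(sub, depth + 1)


nodes = list(walk(ET.fromstring(SAMPLE)))
for depth, scalars in nodes:
    print("  " * depth + ", ".join(f"{k}={v}" for k, v in scalars.items()))
```

The walker never looks at the scalar tag names, so a display tool built this way would still show the made-up <Some-New-Stat> value under the node it belongs to.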