Thread: explain vs auto_explain

explain vs auto_explain

From
Vijaykumar Jain
Date:
Hi,

Why does auto_explain have extended flags to support nested statements and triggers whereas explain does not.
the objects of concern have the same ownership and access, so it does not even sound like a security thing.

or maybe it is just a format/display issue ? 
i tried to check the doc, there is no explicit mention of differences 



/*
postgres@ubuntu:/tmp$ cat db1/postgresql.auto.conf | grep -v '^#'
port=5432
session_preload_libraries = auto_explain
auto_explain.log_min_duration = 0
auto_explain.log_analyze = true
auto_explain.log_buffers = true
auto_explain.log_timing = true
auto_explain.log_verbose = on
auto_explain.log_triggers = on
auto_explain.log_nested_statements = on
postgres@ubuntu:/tmp$ psql -p 5432 demo
psql (16.4 (Ubuntu 16.4-0ubuntu0.24.04.1))
Type "help" for help.

demo=# create table t(col1 int primary key, col2 int);                                                                      CREATE TABLE
demo=# create table r(col3 int primary key, col1 int references t(col1) on update cascade on delete cascade); -- foreign table with col1 having no index
CREATE TABLE
demo=# insert into t select x, x from generate_series(1, 1000000) x;                                                        INSERT 0 1000000
demo=# insert into r select x, (x % 3) + 1 from generate_series(1, 100000) x; -- only 1,2,3 for col1
INSERT 0 100000
demo=# begin;
BEGIN
demo=*# explain (analyze,verbose,buffers) delete from t where col1 < 5; -- when we delete from t, it will show it touches r but will not show what plan was used to delete data from r. we need to check the actual plan to figure out why the delete is slow and how to fix it                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------- Delete on public.t  (cost=0.42..8.50 rows=0 width=0) (actual time=0.025..0.026 rows=0 loops=1)   Buffers: shared hit=13   ->  Index Scan using t_pkey on public.t  (cost=0.42..8.50 rows=4 width=6) (actual time=0.006..0.008 rows=4 loops=1)         Output: ctid         Index Cond: (t.col1 < 5)         Buffers: shared hit=4 Planning:   Buffers: shared hit=7 Planning Time: 0.083 ms Trigger RI_ConstraintTrigger_a_16475 for constraint r_col1_fkey: time=46.274 calls=4 Execution Time: 46.460 ms
(11 rows)

demo=*# -- note Trigger RI_ConstraintTrigger_a_16475 for constraint r_col1_fkey: time=46.274 calls=4 , it made 4 calls to r
demo=*# rollback;
ROLLBACK
demo=#
\q

-------------- why do we not see this via console EXPLAIN, it is still the same PID
postgres@ubuntu:/tmp$ less db1.log 
postgres@ubuntu:/tmp$ tail -20 db1.log
2024-10-05 18:40:26.429 UTC [2260] LOG:  duration: 1.087 ms  plan:        Query Text: DELETE FROM ONLY "public"."r" WHERE $1 OPERATOR(pg_catalog.=) "col1"        Query Parameters: $1 = '4'        Delete on public.r  (cost=0.00..1693.00 rows=0 width=0) (actual time=1.086..1.087 rows=0 loops=1)          Buffers: shared hit=443          ->  Seq Scan on public.r  (cost=0.00..1693.00 rows=1 width=6) (actual time=1.086..1.086 rows=0 loops=1)                Output: ctid                Filter: (4 = r.col1)                Buffers: shared hit=443
2024-10-05 18:40:26.429 UTC [2260] CONTEXT:  SQL statement "DELETE FROM ONLY "public"."r" WHERE $1 OPERATOR(pg_catalog.=) "col1""
2024-10-05 18:40:26.429 UTC [2260] LOG:  duration: 46.303 ms  plan:        Query Text: explain (analyze,verbose,buffers) delete from t where col1 < 5;        Delete on public.t  (cost=0.42..8.50 rows=0 width=0) (actual time=0.025..0.026 rows=0 loops=1)          Buffers: shared hit=13          ->  Index Scan using t_pkey on public.t  (cost=0.42..8.50 rows=4 width=6) (actual time=0.006..0.008 rows=4 loops=1)                Output: ctid                Index Cond: (t.col1 < 5)                Buffers: shared hit=4        Trigger RI_ConstraintTrigger_a_16475 for constraint r_col1_fkey: time=46.274 calls=4

*/

--
Thanks,
Vijay

Open to work
Resume - Vijaykumar Jain

Re: explain vs auto_explain

From
"David G. Johnston"
Date:
On Sat, Oct 19, 2024 at 10:05 AM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
Why does auto_explain have extended flags to support nested statements and triggers whereas explain does not.
the objects of concern have the same ownership and access, so it does not even sound like a security thing.

or maybe it is just a format/display issue ? 
i tried to check the doc, there is no explicit mention of differences 


With auto_explain you end up producing multiple individual explains, one for each top-level query being executed.  Each one produced by auto_explain which hooks into the point where a top-level query begins execution.  Running explain directly on a top-level query doesn't establish any of the needed hooks to produce these additional explains.  While those hooks could be installed the output for manual explain is the single query result sent to the client.  How multiple explains could be combined into that single output channel would need to be figured out as well.  While auto_explain uses the log file which is much simpler to use in this manner since each auto_explain can just be told to write its output to the log and not worry about any other considerations.

In short, there is no fundamental reason manual explain couldn't be improved along these lines.

David J.

Re: explain vs auto_explain

From
Vijaykumar Jain
Date:
On Sat, 19 Oct 2024 at 22:55, David G. Johnston <david.g.johnston@gmail.com> wrote:

With auto_explain you end up producing multiple individual explains, one for each top-level query being executed.  Each one produced by auto_explain which hooks into the point where a top-level query begins execution.  Running explain directly on a top-level query doesn't establish any of the needed hooks to produce these additional explains.  While those hooks could be installed the output for manual explain is the single query result sent to the client.  How multiple explains could be combined into that single output channel would need to be figured out as well.  While auto_explain uses the log file which is much simpler to use in this manner since each auto_explain can just be told to write its output to the log and not worry about any other considerations.

Thanks David.
The reason is, a normal db user who does not have access to logs or the db server, will rely on what the console throws back.
 as a result, they are not able to get the complete view of what is happening ... 

not sure how relevant is this context, but when i was playing with citus, they have a flag
set citus.explain_all_tasks = 1; 
which is session based and dumps the plan on all remote workers / shard nodes right to the user, which otherwise only dumps for one node .. yes its verbose, but it helps the user to see what was run on the shard.

Similarly, I was expecting to explain to give a full plan with nested plans if requested back to the user in their session directly. if they dont want the extended output, the default flags should be fine ? 
Or is it the case, where there can be million rows from remote tables, and auto_explain will log all scans for all matching rows, thereby killing the console session or something ?
i tried to check the code for auto_explain , there is nothing that helps understand why it was provided as a separate .


 
--
Thanks,
Vijay

Open to work
Resume - Vijaykumar Jain

Re: explain vs auto_explain

From
"David G. Johnston"
Date:
On Sat, Oct 19, 2024 at 10:43 AM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
i tried to check the code for auto_explain , there is nothing that helps understand why it was provided as a separate .

Probably because output to log was easier than reworking the internals to make output to client happen.

If you are interested in working on a patch to address your use case I say go for it.  Designing the inputs and outputs actually doesn't seem that challenging, if exceptionally vulnerable to bike-shedding.  But I imagine the internals are quite a different matter.

David J.

Re: explain vs auto_explain

From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Sat, Oct 19, 2024 at 10:43 AM Vijaykumar Jain <
> vijaykumarjain.github@gmail.com> wrote:
>> i tried to check the code for auto_explain , there is nothing that helps
>> understand why it was provided as a separate .

> Probably because output to log was easier than reworking the internals to
> make output to client happen.

The reason that auto_explain exists is to capture plans for queries
that are being issued by real applications --- which aren't programmed
to issue EXPLAIN for themselves, and likely don't have a good place to
put the data if they did.  Also, auto_explain can capture runtime
details for queries that are really being executed and delivering
results, whereas EXPLAIN ANALYZE doesn't deliver the query results and
thus can't be shoehorned into real applications.  So it's partly a
matter of not having a protocol spec that would allow the EXPLAIN data
to be delivered on a side channel, but mostly a recognition that
rewriting applications to capture such data would be painful.

            regards, tom lane



Re: explain vs auto_explain

From
Vijaykumar Jain
Date:


On Sat, 19 Oct 2024 at 23:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Sat, Oct 19, 2024 at 10:43 AM Vijaykumar Jain <
> vijaykumarjain.github@gmail.com> wrote:
>> i tried to check the code for auto_explain , there is nothing that helps
>> understand why it was provided as a separate .

> Probably because output to log was easier than reworking the internals to
> make output to client happen.

The reason that auto_explain exists is to capture plans for queries
that are being issued by real applications --- which aren't programmed
to issue EXPLAIN for themselves, and likely don't have a good place to
put the data if they did.  Also, auto_explain can capture runtime
details for queries that are really being executed and delivering
results, whereas EXPLAIN ANALYZE doesn't deliver the query results and
thus can't be shoehorned into real applications.  So it's partly a
matter of not having a protocol spec that would allow the EXPLAIN data
to be delivered on a side channel, but mostly a recognition that
rewriting applications to capture such data would be painful.

                        regards, tom lane

ok, it makes sense for the reason of having auto_explain. but maybe i did ask correctly, 
why do we not have the extended flags in auto_explain , in , explain wrt nested_statements, and triggers ...
a user who finds the console output complicated, could well use a pager or redirect the output to the file via \o which is client side.

as i mentioned the reason is, there are differences on what auto_explain captures and what explain does... and the dev user is not able to see the difference
without having access to logs.
for example , iirc 
refresh materialised view does not show the plan , although there was once a feature reported, which showed the difference in support for parallelism.
ex in this discussion

i dont expect this to be a feature request or something, it was just that i wanted to be aware why there are differences,
 because the cloud guys have strict control over logs as it has many other things, so they just wont give access at all.


--
Thanks,
Vijay

Open to work
Resume - Vijaykumar Jain

Re: explain vs auto_explain

From
Vijaykumar Jain
Date:


On Sat, 19 Oct 2024 at 23:48, Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:


ok, it makes sense for the reason of having auto_explain. but maybe i did ask correctly, 
why do we not have the extended flags in auto_explain , in , explain wrt nested_statements, and triggers ...
a user who finds the console output complicated, could well use a pager or redirect the output to the file via \o which is client side.

 
actually my bad. pls ignore.

i forgot we could leverage client_min_messages to get the output to console 

/*
postgres=# explain analyze create materialized view mv as select * from t;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.027..0.028 rows=0 loops=1)
 Planning Time: 0.549 ms
 Execution Time: 7.309 ms
(3 rows)

                                 ^
postgres=# explain analyze refresh materialized view mv;
                QUERY PLAN
-------------------------------------------
 Utility statements have no plan structure
(1 row)

postgres=# alter system set client_min_messages TO log;
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

postgres=# refresh materialized view
postgres-# mv ;
LOG:  duration: 0.016 ms  plan:
Query Text: refresh materialized view
mv ;
Seq Scan on public.t  (cost=0.00..35.50 rows=2550 width=4) (actual time=0.009..0.009 rows=0 loops=1)
  Output: col1
REFRESH MATERIALIZED VIEW
*/


I think i am good. thanks all. 
--
Thanks,
Vijay

Open to work
Resume - Vijaykumar Jain