Thread: [HACKERS] [PATCH] New command to monitor progression of long running queries

[HACKERS] [PATCH] New command to monitor progression of long running queries

From

Remi Colinet

Date:

17 April 2017, 15:09:01

Hello,

I've implemented a new command named PROGRESS to monitor progression of long running SQL queries in a backend process.

Use case
=======

A use case is shown in the below example based on a table named t_10m with 10 millions rows.

The table has been created with :

CREATE TABLE T_10M ( id integer, md5 text);
INSERT INTO T_10M SELECT generate_series(1,10000000) AS id, md5(random()::text) AS md5;

1/ Start a first psql session to run long SQL queries:

[pgadm@rco ~]$ psql -A -d test
psql (10devel)
Type "help" for help.
test=#

The option -A is used to allow rows to be output without formatting work.

Redirect output to a file in order to let the query run without terminal interaction:
test=# \o out

Start a long running query:
test=# select * from t_10M order by md5;

2/ In a second psql session, list the backend pid and their SQL query

[pgadm@rco ~]$ psql -d test
psql (10devel)
Type "help" for help.

test=# select pid, query from pg_stat_activity ;
pid | query
-------+-------------------------------------------
19081 |
19084 |
19339 | select pid, query from pg_stat_activity ;
19341 | select * from t_10m order by md5;
19727 | select * from t_10m order by md5;
19726 | select * from t_10m order by md5;
19079 |
19078 |
19080 |
(9 rows)

test=#

Chose the pid of the backend running the long SQL query to be monitored. Above example is a parallel SQL query. Lowest pid is the main backend of the query.

test=# PROGRESS 19341;
                                       PLAN PROGRESS
-------------------------------------------------------------------------------------------
Gather Merge
   -> Sort=> dumping tuples to tapes
         rows r/w merge 0/0 rows r/w effective 0/2722972 0%
         Sort Key: md5
         -> Parallel Seq Scan on t_10m => rows 2751606/3954135 69% blks 125938/161222 78%
(5 rows)

test=#

The query of the monitored backend is:
test=# select * from t_10M order by md5;

Because the table has 10 millions of rows, the sort is done on tapes.

Design of the command
=================

The design of the patch/command is:

- the user issue the "PROGRESS pid" command from a psql session. The pid is the one of the backend which runs the SQL query for which we want to get a progression report. It can be determined from the view pg_stat_activity.
- the monitoring backend, upon receiving the "PROGRESS pid" command from psql utility used in step above, sends a signal to the backend whose process pid is the one provided in the PROGRESS command.

- the monitored backend receives the signal and notes the request as for any interrupt. Then, it continues its execution of its SQL query until interrupts can be serviced.

- when the monitored process can service the interrupts, it deals with the progress request by collecting its execution tree with the execution progress of each long running node. At this time, the SQL query is no more running. The progression of each node is calculated during the execution of the SQL query which is at this moment stopped. The execution tree is dumped in shared memory pages allocated at the start of the server. Then, the monitored backend set a latch on which the monitoring process is waiting for. It then continues executing its SQL query.

- the monitoring backend collects the share memory data dumped by the monitored backed, and sends it to its psql session, as a list of rows.

The command PROGRESS does not incur any slowness on the running query because the execution progress is only computed upon receiving the progress request which is supposed to be seldom used.

The code heavily reuses the one of the explain command. In order to share as much code as possible with the EXPLAIN command, part of the EXPLAIN code which deals with reporting quals for instance, has been moved to a new report.c file in the src/backend/commands folder. This code in report.c is shared between explain.c source code and PROGRESS command source code which is in progress.c file.

The progression reported by PROGRESS command is given in terms of rows, blocks, bytes and percents. The values displayed depend on the node type in the execution plan.

The current patch implements all the possible nodes which could take a lot of time:

- Sequential scan nodes with rows and block progress (node type T_SeqScan, T_SampleScan, T_BitmapHeaepScan, T_SubqueryScan, T_FunctionScan, T_ValuesScan, T_CteScan, T_WorkTableScan)

- Tuple id scan node with rows and blocks progress (T_TidScan)

- Limit node with rows progress (T_Limit)

- Foreign and custom scan with rows and blocks progress (T_ForeignScan, T_CustomScan)

- Index scan, index only scan and bitmap index scan with rows and blocks progress

Patch
====

The diff stat of the patch is:

[root@rco pg]# git diff --stat master..
contrib/auto_explain/auto_explain.c       |    5 +-
contrib/postgres_fdw/postgres_fdw.c       |   13 +-
src/backend/access/heap/heapam.c          |    2 +
src/backend/commands/Makefile             |    3 +-
src/backend/commands/explain.c            | 2834 ++++++++++++++-----------------------------------------------------------------------------------------------
src/backend/commands/prepare.c            |    5 +-
src/backend/commands/progress.c           | 1314 +++++++++++++++++++++++++++++++++++++++++++++++++++
src/backend/commands/report.c             | 2120 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
src/backend/executor/execProcnode.c       |   31 ++
src/backend/executor/nodeBitmapHeapscan.c |   13 +-
src/backend/executor/nodeIndexonlyscan.c |   13 +-
src/backend/executor/nodeIndexscan.c      |   15 +-
src/backend/executor/nodeSamplescan.c     |   12 +-
src/backend/executor/nodeSeqscan.c        |   16 +-
src/backend/nodes/bitmapset.c             |   19 +
src/backend/nodes/outfuncs.c              | 245 ++++++++++
src/backend/parser/gram.y                 |   99 +++-
src/backend/postmaster/postmaster.c       |    1 +
src/backend/storage/file/buffile.c        |   47 ++
src/backend/storage/ipc/ipci.c            |    3 +
src/backend/storage/ipc/procarray.c       |   57 +++
src/backend/storage/ipc/procsignal.c      |    4 +
src/backend/storage/lmgr/lwlock.c         |    7 +-
src/backend/storage/lmgr/lwlocknames.txt |    1 +
src/backend/tcop/postgres.c               |   10 +
src/backend/tcop/pquery.c                 |   25 +
src/backend/tcop/utility.c                |   10 +
src/backend/utils/init/globals.c          |   12 +
src/backend/utils/sort/tuplesort.c        | 142 +++++-
src/backend/utils/sort/tuplestore.c       |   73 ++-
src/include/commands/explain.h            |   67 +--
src/include/commands/prepare.h            |    2 +-
src/include/commands/report.h             | 136 ++++++
src/include/executor/execdesc.h           |    2 +
src/include/executor/progress.h           |   52 ++
src/include/foreign/fdwapi.h              |   10 +-
src/include/nodes/bitmapset.h             |    1 +
src/include/nodes/execnodes.h             |    3 +
src/include/nodes/extensible.h            |    6 +-
src/include/nodes/nodes.h                 |    8 +
src/include/nodes/parsenodes.h            |   11 +
src/include/nodes/plannodes.h             |   11 +
src/include/parser/kwlist.h               |    4 +
src/include/pgstat.h                      |    3 +-
src/include/storage/buffile.h             |    8 +
src/include/storage/procarray.h           |    3 +
src/include/storage/procsignal.h          |    3 +
src/include/utils/tuplesort.h             |   71 ++-
src/include/utils/tuplestore.h            |   33 ++
49 files changed, 4979 insertions(+), 2606 deletions(-)
[root@rco pg]#

The progress command can be used with the watch command of psql making it more handy to monitor a long running query.

The default format of the PROGRESS command is text. It can be easily expanded to json and xml as for EXPLAIN command.

The patch is based on commit 85a0781334a204c15c9c6ea9d3e6c75334c2beb6 (Date: Fri Apr 14 17:51:25 2017 -0400)

Use cases
========

Some further examples of use are shown below in the test_v1.txt file.

What do you make of this idea/patch?

Does it make sense?

Any suggestion is welcome.

The current patch is still work in progress. It is meanwhile stable. It can be used with regular queries. Utilities commands are not supported for the moment.

Documentation is not yet written.

Regards

Remi

Attachment

Re: [HACKERS] [PATCH] New command to monitor progression of longrunning queries

From

Maksim Milyutin

Date:

18 April 2017, 16:00:16

Hi!

On 17.04.2017 15:09, Remi Colinet wrote:
> Hello,
>
> I've implemented a new command named PROGRESS to monitor progression of
> long running SQL queries in a backend process.
>
>
> Use case
> =======
>
> A use case is shown in the below example based on a table named t_10m
> with 10 millions rows.
>
> The table has been created with :
>
> CREATE TABLE T_10M ( id integer, md5 text);
> INSERT INTO T_10M SELECT generate_series(1,10000000) AS id,
> md5(random()::text) AS md5;
>
> 1/ Start a first psql session to run long SQL queries:
>
> [pgadm@rco ~]$ psql -A -d test
> psql (10devel)
> Type "help" for help.
> test=#
>
> The option -A is used to allow rows to be output without formatting work.
>
> Redirect output to a file in order to let the query run without terminal
> interaction:
> test=# \o out
>
> Start a long running query:
> test=# select * from t_10M order by md5;
>
> 2/ In a second psql session, list the backend pid and their SQL query
>
> [pgadm@rco ~]$ psql -d test
> psql (10devel)
> Type "help" for help.
>
> test=# select pid, query from pg_stat_activity ;
>   pid  |                   query
> -------+-------------------------------------------
>  19081 |
>  19084 |
>  19339 | select pid, query from pg_stat_activity ;
>  19341 | select * from t_10m order by md5;
>  19727 | select * from t_10m order by md5;
>  19726 | select * from t_10m order by md5;
>  19079 |
>  19078 |
>  19080 |
> (9 rows)
>
> test=#
>
> Chose the pid of the backend running the long SQL query to be monitored.
> Above example is a parallel SQL query. Lowest pid is the main backend of
> the query.
>
> test=# PROGRESS 19341;
>                                        PLAN
> PROGRESS
> -------------------------------------------------------------------------------------------
>  Gather Merge
>    ->  Sort=> dumping tuples to tapes
>          rows r/w merge 0/0 rows r/w effective 0/2722972 0%
>          Sort Key: md5
>          ->  Parallel Seq Scan on t_10m => rows 2751606/3954135 69% blks
> 125938/161222 78%
> (5 rows)
>
> test=#
>
> The query of the monitored backend is:
> test=# select * from t_10M order by md5;
>
> Because the table has 10 millions of rows, the sort is done on tapes.
>
>
> Design of the command
> =================
>
> The design of the patch/command is:
> - the user issue the "PROGRESS pid" command from a psql session. The pid
> is the one of the backend which runs the SQL query for which we want to
> get a progression report. It can be determined from the view
> pg_stat_activity.
> - the monitoring backend, upon receiving the "PROGRESS pid" command from
> psql utility used in step above, sends a signal to the backend whose
> process pid is the one provided in the PROGRESS command.
> - the monitored backend receives the signal and notes the request as for
> any interrupt. Then, it continues its execution of its SQL query until
> interrupts can be serviced.
> - when the monitored process can service the interrupts, it deals with
> the progress request by collecting its execution tree with the execution
> progress of each long running node. At this time, the SQL query is no
> more running. The progression of each node is calculated during the
> execution of the SQL query which is at this moment stopped. The
> execution tree is dumped in shared memory pages allocated at the start
> of the server. Then, the monitored backend set a latch on which the
> monitoring process is waiting for. It then continues executing its SQL
> query.
> - the monitoring backend collects the share memory data dumped by the
> monitored backed, and sends it to its psql session, as a list of rows.
>
> The command PROGRESS does not incur any slowness on the running query
> because the execution progress is only computed upon receiving the
> progress request which is supposed to be seldom used.
>
> The code heavily reuses the one of the explain command. In order to
> share as much code as possible with the EXPLAIN command, part of the
> EXPLAIN code which deals with reporting quals for instance, has been
> moved to a new report.c file in the src/backend/commands folder. This
> code in report.c is shared between explain.c source code and PROGRESS
> command source code which is in progress.c file.
>
> The progression reported by PROGRESS command is given in terms of rows,
> blocks, bytes and percents. The values displayed depend on the node type
> in the execution plan.
>
> The current patch implements all the possible nodes which could take a
> lot of time:
> - Sequential scan nodes with rows and block progress (node type
> T_SeqScan, T_SampleScan, T_BitmapHeaepScan, T_SubqueryScan,
> T_FunctionScan, T_ValuesScan, T_CteScan, T_WorkTableScan)
> - Tuple id scan node with rows and blocks progress (T_TidScan)
> - Limit node with rows progress (T_Limit)
> - Foreign and custom scan with rows and blocks progress (T_ForeignScan,
> T_CustomScan)
> - Index scan, index only scan and bitmap index scan with rows and blocks
> progress
>
>
> Use cases
> ========
>
> Some further examples of use are shown below in the test_v1.txt file.
>
>
> What do you make of this idea/patch?
>
> Does it make sense?
>
> Any suggestion is welcome.
>
> The current patch is still work in progress. It is meanwhile stable. It
> can be used with regular queries. Utilities commands are not supported
> for the moment.
> Documentation is not yet written.
>
> Regards
> Remi
>

I had implemented analogous feature as extension *pg_query_state* [1] 
the idea of which I proposed in the thread [2]. Together with this 
extension I provided some patches to postgres core to enable to send 
custom signals to working backend (similar to your PROCSIG_PROGRESS) and 
to print the current query state through patches in 'ExplainNode' function.

I had implemented the same mechanics as you:
1) interrupt the working backend through ProcSignal;
2) handle progress request in the CHECK_FOR_INTERRUPTS entry;
3) transfer query state through shared memory to caller.
But criticism of my approach was that the structure 'QueryDesc' on basis 
of which query state is formed can be inconsistent in the places where 
CHECK_FOR_INTERRUPTS appears [3].

I plan to propose the custom_signal patch to community as soon as 
possible and as consequence release *pg_query_state* from dependency on 
patches to postgres core. In perspective, I want to resolve the problem 
related to binding to the CHECK_FOR_INTERRUPTS entries perhaps through 
patching the executor and implement the robust PROGRESS command.


1. https://github.com/postgrespro/pg_query_state
2. 
https://www.postgresql.org/message-id/dbfb1a42-ee58-88fd-8d77-550498f52bc5%40postgrespro.ru
3. https://www.postgresql.org/message-id/24182.1472745492%40sss.pgh.pa.us


-- 
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Re: [HACKERS] [PATCH] New command to monitor progression of longrunning queries

From

Remi Colinet

Date:

18 April 2017, 17:39:15

Hello Maksim,

The core implementation I suggested for the new PROGRESS command uses different functions from the one used by EXPLAIN for its core implementation.
Some source code is shared with EXPLAIN command. But this shared code is only related to quals, properties, children, subPlans and few other nodes.

All other code for PROGRESS is new code.

I don't believe explain.c code can be fully shared with the one of the new PROGRESS command. These 2 commands have different purposes.
The core code used for the new PROGRESS command is very different from the core code used for EXPLAIN.

I only extracted some common code from explain.c and put it in report.c which is used by progress.c.This code is valid for Plan and PlanState.

The code shared is:

ReportPreScanNode() renamed from ExplainPreScanNode()
ReportBeginOutput() renamed from ExplainBeginOutput()
ReportEndOutput() renamed from ExplainEndOutput()
ReportOpenGroup() ...
ReportProperties() ...
ReportPropertyText() ...
ReportHasChildren() ...
ReportSubPlans() ...
ReportMemberNodes() ...
ReportCustomChildren() ...
ReportCloseGroup() ...

ExplainState has been renamed ReportState.

Regarding the queryDesc state of SQL query upon receiving a request to report its execution progress, it does not bring any issue. The request is noted when the signal is received by the monitored backend. Then, the backend continues its execution code path. When interrupts are checked in the executor code, the request will be dealt.

When the request is being dealt, the monitored backend will stop its execution and report the progress of the SQL query. Whatever is the status of the SQL query, progress.c code checks the status and report either that the SQL query does not have a valid status, or otherwise the current execution state of the SQL query.

SQL query status checking is about:

- idle transaction

- out of transaction status

- null planned statement

- utility command

- self monitoring

Other tests can be added if needed to exclude some SQL query state. Such checking is done in void HandleProgressRequest(void).

I do not see why a SQL query progression would not be possible in this context. Even when the queryDescc is NULL, we can just report a <idle transaction> output. This is currently the case with the patch suggested.

So far, I've found this new command very handy. It allows to evaluate the time needed to complete a SQL query.

A further improvement would be to report the memory, disk and time resources used by the monitored backend. An overuse of memory, disk and time resources can prevent the SQL query to complete.

Best regards

Remi

2017-04-18 15:00 GMT+02:00 Maksim Milyutin <m.milyutin@postgrespro.ru>:

Hi!

On 17.04.2017 15:09, Remi Colinet wrote:
Hello,

I've implemented a new command named PROGRESS to monitor progression of
long running SQL queries in a backend process.

Use case
=======

A use case is shown in the below example based on a table named t_10m
with 10 millions rows.

The table has been created with :

CREATE TABLE T_10M ( id integer, md5 text);
INSERT INTO T_10M SELECT generate_series(1,10000000) AS id,
md5(random()::text) AS md5;

1/ Start a first psql session to run long SQL queries:

[pgadm@rco ~]$ psql -A -d test
psql (10devel)
Type "help" for help.
test=#

The option -A is used to allow rows to be output without formatting work.

Redirect output to a file in order to let the query run without terminal
interaction:
test=# \o out

Start a long running query:
test=# select * from t_10M order by md5;

2/ In a second psql session, list the backend pid and their SQL query

[pgadm@rco ~]$ psql -d test
psql (10devel)
Type "help" for help.

test=# select pid, query from pg_stat_activity ;
pid | query
-------+-------------------------------------------
19081 |
19084 |
19339 | select pid, query from pg_stat_activity ;
19341 | select * from t_10m order by md5;
19727 | select * from t_10m order by md5;
19726 | select * from t_10m order by md5;
19079 |
19078 |
19080 |
(9 rows)

test=#

Chose the pid of the backend running the long SQL query to be monitored.
Above example is a parallel SQL query. Lowest pid is the main backend of
the query.

test=# PROGRESS 19341;
PLAN
PROGRESS
-------------------------------------------------------------------------------------------
Gather Merge
-> Sort=> dumping tuples to tapes
rows r/w merge 0/0 rows r/w effective 0/2722972 0%
Sort Key: md5
-> Parallel Seq Scan on t_10m => rows 2751606/3954135 69% blks
125938/161222 78%
(5 rows)

test=#

The query of the monitored backend is:
test=# select * from t_10M order by md5;

Because the table has 10 millions of rows, the sort is done on tapes.

Design of the command
=================

The design of the patch/command is:
- the user issue the "PROGRESS pid" command from a psql session. The pid
is the one of the backend which runs the SQL query for which we want to
get a progression report. It can be determined from the view
pg_stat_activity.
- the monitoring backend, upon receiving the "PROGRESS pid" command from
psql utility used in step above, sends a signal to the backend whose
process pid is the one provided in the PROGRESS command.
- the monitored backend receives the signal and notes the request as for
any interrupt. Then, it continues its execution of its SQL query until
interrupts can be serviced.
- when the monitored process can service the interrupts, it deals with
the progress request by collecting its execution tree with the execution
progress of each long running node. At this time, the SQL query is no
more running. The progression of each node is calculated during the
execution of the SQL query which is at this moment stopped. The
execution tree is dumped in shared memory pages allocated at the start
of the server. Then, the monitored backend set a latch on which the
monitoring process is waiting for. It then continues executing its SQL
query.
- the monitoring backend collects the share memory data dumped by the
monitored backed, and sends it to its psql session, as a list of rows.

The command PROGRESS does not incur any slowness on the running query
because the execution progress is only computed upon receiving the
progress request which is supposed to be seldom used.

The code heavily reuses the one of the explain command. In order to
share as much code as possible with the EXPLAIN command, part of the
EXPLAIN code which deals with reporting quals for instance, has been
moved to a new report.c file in the src/backend/commands folder. This
code in report.c is shared between explain.c source code and PROGRESS
command source code which is in progress.c file.

The progression reported by PROGRESS command is given in terms of rows,
blocks, bytes and percents. The values displayed depend on the node type
in the execution plan.

The current patch implements all the possible nodes which could take a
lot of time:
- Sequential scan nodes with rows and block progress (node type
T_SeqScan, T_SampleScan, T_BitmapHeaepScan, T_SubqueryScan,
T_FunctionScan, T_ValuesScan, T_CteScan, T_WorkTableScan)
- Tuple id scan node with rows and blocks progress (T_TidScan)
- Limit node with rows progress (T_Limit)
- Foreign and custom scan with rows and blocks progress (T_ForeignScan,
T_CustomScan)
- Index scan, index only scan and bitmap index scan with rows and blocks
progress

Use cases
========

Some further examples of use are shown below in the test_v1.txt file.

What do you make of this idea/patch?

Does it make sense?

Any suggestion is welcome.

The current patch is still work in progress. It is meanwhile stable. It
can be used with regular queries. Utilities commands are not supported
for the moment.
Documentation is not yet written.

Regards
Remi

I had implemented analogous feature as extension *pg_query_state* [1] the idea of which I proposed in the thread [2]. Together with this extension I provided some patches to postgres core to enable to send custom signals to working backend (similar to your PROCSIG_PROGRESS) and to print the current query state through patches in 'ExplainNode' function.

I had implemented the same mechanics as you:
1) interrupt the working backend through ProcSignal;
2) handle progress request in the CHECK_FOR_INTERRUPTS entry;
3) transfer query state through shared memory to caller.
But criticism of my approach was that the structure 'QueryDesc' on basis of which query state is formed can be inconsistent in the places where CHECK_FOR_INTERRUPTS appears [3].

I plan to propose the custom_signal patch to community as soon as possible and as consequence release *pg_query_state* from dependency on patches to postgres core. In perspective, I want to resolve the problem related to binding to the CHECK_FOR_INTERRUPTS entries perhaps through patching the executor and implement the robust PROGRESS command.

1. https://github.com/postgrespro/pg_query_state
2. https://www.postgresql.org/message-id/dbfb1a42-ee58-88fd-8d77-550498f52bc5%40postgrespro.ru
3. https://www.postgresql.org/message-id/24182.1472745492%40sss.pgh.pa.us

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Re: [HACKERS] [PATCH] New command to monitor progression of longrunning queries

From

Maksim Milyutin

Date:

18 April 2017, 21:31:03

On 18.04.2017 17:39, Remi Colinet wrote:
> Hello Maksim,
>
> The core implementation I suggested for the new PROGRESS command uses
> different functions from the one used by EXPLAIN for its core
> implementation.
> Some source code is shared with EXPLAIN command. But this shared code is
> only related to quals, properties, children, subPlans and few other nodes.
>
> All other code for PROGRESS is new code.
>
> I don't believe explain.c code can be fully shared with the one of the
> new PROGRESS command. These 2 commands have different purposes.
> The core code used for the new PROGRESS command is very different from
> the core code used for EXPLAIN.
>

Perhaps you will be forced to duplicate significant snippets of 
functionality from explain.c into your progress.c.

>
> Regarding the queryDesc state of SQL query upon receiving a request to
> report its execution progress, it does not bring any issue. The request
> is noted when the signal is received by the monitored backend. Then, the
> backend continues its execution code path. When interrupts are checked
> in the executor code, the request will be dealt.
>

Yes, interrupts are checked in the CHECK_FOR_INTERRUPTS entries.

> When the request is being dealt, the monitored backend will stop its
> execution and report the progress of the SQL query. Whatever is the
> status of the SQL query, progress.c code checks the status and report
> either that the SQL query does not have a valid status, or otherwise the
> current execution state of the SQL query.
>
> SQL query status checking is about:
> - idle transaction
> - out of transaction status
> - null planned statement
> - utility command
> - self monitoring
>
> Other tests can be added if needed to exclude some SQL query state. Such
> checking is done in void HandleProgressRequest(void).
> I do not see why a SQL query progression would not be possible in this
> context. Even when the queryDescc is NULL, we can just report a <idle
> transaction> output. This is currently the case with the patch suggested.
>

It's interesting question - how much the active query is in a usable 
state on the stage of execution. Tom Lane noticed that 
CHECK_FOR_INTERRUPTS doesn't give us 100% guarantee about full 
consistency [1].

> So far, I've found this new command very handy. It allows to evaluate
> the time needed to complete a SQL query.
>

Could you explain how you get the percent of execution for nodes of plan 
tree and overall for query?

> A further improvement would be to report the memory, disk and time
> resources used by the monitored backend. An overuse of memory, disk and
> time resources can prevent the SQL query to complete.
>

This functionality is somehow implemented in explain.c. You can see my 
patch to this file [2]. I only manipulate runtime statistics (data in 
the structure 'Instrumentation') to achieve the printing state of 
running query.


1. https://www.postgresql.org/message-id/24182.1472745492%40sss.pgh.pa.us
2. 
https://github.com/postgrespro/pg_query_state/blob/master/runtime_explain.patch

-- 
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Re: [HACKERS] [PATCH] New command to monitor progression of longrunning queries

From

Remi Colinet

Date:

19 April 2017, 17:13:56

Maksim,

2017-04-18 20:31 GMT+02:00 Maksim Milyutin <m.milyutin@postgrespro.ru>:

On 18.04.2017 17:39, Remi Colinet wrote:
Hello Maksim,

The core implementation I suggested for the new PROGRESS command uses
different functions from the one used by EXPLAIN for its core
implementation.
Some source code is shared with EXPLAIN command. But this shared code is
only related to quals, properties, children, subPlans and few other nodes.

All other code for PROGRESS is new code.

I don't believe explain.c code can be fully shared with the one of the
new PROGRESS command. These 2 commands have different purposes.
The core code used for the new PROGRESS command is very different from
the core code used for EXPLAIN.

Perhaps you will be forced to duplicate significant snippets of functionality from explain.c into your progress.c.

Currently, few code is duplicated between EXPLAIN and PROGRESS commands. The duplicated code could be moved to file src/backend/commands/report.c which is used to gather shared code between the 2 commands. I will try to complete this code sharing as much as possible.

The main point is that PROGRESS uses the same design pattern as EXPLAIN by parsing the query tree. The work horse of the PROGRESS command is ProgressNode() which calls recursively sub nodes until we reach leaf nodes such as SeqScan, IndexScan, TupleStore, Sort, Material, ... . EXPLAIN command uses a similar work horse with function ExplainNode() which eventually calls different leaf nodes.

Some of the leaf nodes which are common to the 2 commands have been put in the file src/backend/commands/report.c. May be some further code sharing is also possible for the work horse by using a template function which would call EXPLAIN specific leaf node functions or PROGRESS specific leaf node functions.

Regarding the queryDesc state of SQL query upon receiving a request to
report its execution progress, it does not bring any issue. The request
is noted when the signal is received by the monitored backend. Then, the
backend continues its execution code path. When interrupts are checked
in the executor code, the request will be dealt.

Yes, interrupts are checked in the CHECK_FOR_INTERRUPTS entries.

When the request is being dealt, the monitored backend will stop its
execution and report the progress of the SQL query. Whatever is the
status of the SQL query, progress.c code checks the status and report
either that the SQL query does not have a valid status, or otherwise the
current execution state of the SQL query.

SQL query status checking is about:
- idle transaction
- out of transaction status
- null planned statement
- utility command
- self monitoring

Other tests can be added if needed to exclude some SQL query state. Such
checking is done in void HandleProgressRequest(void).
I do not see why a SQL query progression would not be possible in this
context. Even when the queryDescc is NULL, we can just report a <idle
transaction> output. This is currently the case with the patch suggested.

It's interesting question - how much the active query is in a usable state on the stage of execution. Tom Lane noticed that CHECK_FOR_INTERRUPTS doesn't give us 100% guarantee about full consistency [1].

I wonder what you mean about usable state.

Currently, the code suggested tests the queryDesc pointer and all the sub nodes pointers in order to detect NULL pointers. When the progress report is collected by the backend, this backend does the collect and consequently does not run the query. So the query tree is not being modified. At this moment, whatever is the query state, we can manage to deal with its static state. It is only a tree which could also be just a NULL pointer in the most extreme case. Such case is dealt in the current code.

So far, I've found this new command very handy. It allows to evaluate
the time needed to complete a SQL query.

Could you explain how you get the percent of execution for nodes of plan tree and overall for query?

The progress of execution of the query is computed as follows at 2 different places for each leaf node type (Scan, IndexScan, Sort, Material, TupleStore, ...):

- one place in the executor code, or in access methods code, or in sort utilities code, used during the execution of the SQL query in which following values are counted for instance: rows R/W, blocks, R/W, tapes R/W used for sort, tuple store R/W, ... . Some of these values are already computed in the current Postgresql official source code. Some other values had to be added and collected.

- one place in the leaf function of each node type (ProgressScan(), ProgressSort(), ...) in which percents are computed and are then dumped together with raw values collected during execution, in the report. The dump details can be selected with the VERBOSE option of the PROGRESS command (For instance # PROGRESS VERBOSE $pid)

For instance:

1/ read/write rows are collected when running the executor in the file src/backend/executor/execProcnode.c
==============================================================================

This is already implemented in the current official source tree. Nothing mode needs to be implemented to collect values (total rows number and current rows number already fetched are collected).
The report is implemented in leaf function ProgressScanRows().

2/ read/write blocks are collected in the file src/backend/access/heap/heapam.c
==========================================================

This is already implemented in the current official source tree. Nothing more needs to be implemented to collect values during execution.
The report is implemented in a leaf function ProgressScanBlks().

3/ the sort progressions are collected in the file src/backend/utils/sort/tuplesort.c
==========================================================

[root@rco pg]# git diff --stat master.. src/backend/utils/sort/tuplesort.c
src/backend/utils/sort/tuplesort.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 127 insertions(+), 15 deletions(-)
[root@rco pg]#

New fields have been added to compute the different I/O (read/write/merge/...) per tapes for instance, during a sort on tape.

The report of Sort node is implemented in leaf function ProgressSort()

4/ the tuple store progressions are computed in the file src/backend/utils/sort/tuplestore.c
=================================================================

[root@rco pg]# git diff --stat master.. src/backend/utils/sort/tuplestore.c
src/backend/utils/sort/tuplestore.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 56 insertions(+), 17 deletions(-)
[root@rco pg]#

New fields have been added to collect the I/O needed for such tuple store.

The report of TupleStore node is implemented in leaf function ProgressTupleStore().

Other node types have been implemented: TidScan, IndexScan, LimitScan, CustomScan, Hash, ModifyTable.

Such node may require some new fields to collect values during the SQL query execution.

Overall, the overhead caused by new values collected during the SQL query execution, is very low.

A few values need to be collected.

A further improvement would be to report the memory, disk and time
resources used by the monitored backend. An overuse of memory, disk and
time resources can prevent the SQL query to complete.

This functionality is somehow implemented in explain.c. You can see my patch to this file [2]. I only manipulate runtime statistics (data in the structure 'Instrumentation') to achieve the printing state of running query.

I will check your patch and try to add such feature to the current patch.
It provides a valuable hint to estimate whether a SQL query has a chance to complete and will not reached the resource limits.
.

1. https://www.postgresql.org/message-id/24182.1472745492%40sss.pgh.pa.us
2. https://github.com/postgrespro/pg_query_state/blob/master/runtime_explain.patch

Thanks for your suggestions and comments

Regards

Remi

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Re: [HACKERS] [PATCH] New command to monitor progression of longrunning queries

From

Remi Colinet

Date:

19 April 2017, 17:46:16

Following on previous email....

I have added below some use cases which I find very relevant when we need to know the progress of a SQL query.

The command can be used by any SQL query (select, update, delete, insert).

The tables used have been created with :

CREATE TABLE T_1M (id integer, md5 text);
INSERT INTO T_1M SELECT generate_series(1,1000000) AS id, md5(random()::text) AS md5;

CREATE TABLE T_10M ( id integer, md5 text);
INSERT INTO T_10M SELECT generate_series(1,10000000) AS id, md5(random()::text) AS md5;

All the different leaf node types are implemented.

1/ Parallel select with sort (no index)
===========================

=> Terminal running the long SQL query:
test=# select * from t_10M order by md5;

=> Terminal monitoring SQL query progression:
test=# select pid,query from pg_stat_activity ;
pid |                  query
-------+------------------------------------------
8062 |
8065 |
19605 | select pid,query from pg_stat_activity ;
20830 | select * from t_10M order by md5;
20832 | select * from t_10M order by md5;
20833 | select * from t_10M order by md5;
8060 |
8059 |
8061 |
(9 rows)

test=# PROGRESS 20830
test-# ;
                             PLAN PROGRESS
------------------------------------------------------------------------
Gather Merge
   -> Sort=> dumping tuples to tapes / merging tapes
         rows r/w merge 2167923/2167908 rows r/w effective 0/3514320 0%
         Sort Key: md5
         -> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=#
test=#
test=# PROGRESS 20830;
                               PLAN PROGRESS
----------------------------------------------------------------------------
Gather Merge
   -> Sort=> final merge sort on tapes
         rows r/w merge 4707198/4691167 rows r/w effective 16016/3514320 0%
         Sort Key: md5
         -> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
                                PLAN PROGRESS
-----------------------------------------------------------------------------
Gather Merge
   -> Sort=> final merge sort on tapes
         rows r/w merge 4809857/4691167 rows r/w effective 118675/3514320 3%
         Sort Key: md5
         -> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
                                PLAN PROGRESS
-----------------------------------------------------------------------------
Gather Merge
   -> Sort=> final merge sort on tapes
         rows r/w merge 4883715/4691167 rows r/w effective 192533/3514320 5%
         Sort Key: md5
         -> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
                                PLAN PROGRESS
-----------------------------------------------------------------------------
Gather Merge
   -> Sort=> final merge sort on tapes
         rows r/w merge 4948381/4691167 rows r/w effective 257199/3514320 7%
         Sort Key: md5
         -> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
                                PLAN PROGRESS
-----------------------------------------------------------------------------
Gather Merge
   -> Sort=> final merge sort on tapes
         rows r/w merge 5022137/4691167 rows r/w effective 330955/3514320 9%
         Sort Key: md5
         -> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
                                PLAN PROGRESS
------------------------------------------------------------------------------
Gather Merge
   -> Sort=> final merge sort on tapes
         rows r/w merge 5079083/4691167 rows r/w effective 387901/3514320 11%
         Sort Key: md5
         -> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
                                PLAN PROGRESS
------------------------------------------------------------------------------
Gather Merge
   -> Sort=> final merge sort on tapes
         rows r/w merge 5144499/4691167 rows r/w effective 453317/3514320 12%
         Sort Key: md5
         -> Parallel Seq Scan on t_10m => rows 3514321/4166700 84%
(5 rows)

test=# PROGRESS 20830;
    PLAN PROGRESS
----------------------
<out of transaction>
(1 row)

test=#

SQL query was interrupted before completion

2/ Insert into table
=============

=> Terminal running the long SQL query:
test=# INSERT INTO T_10M SELECT generate_series(10000001, 12000000) AS id, md5(random()::text) AS md5;

=> Terminal monitoring SQL query progression:
test=# PROGRESS 20830;
    PLAN PROGRESS
----------------------
<out of transaction>
(1 row)

test=#
test=# PROGRESS 20830;
     PLAN PROGRESS
-----------------------
Insert => rows 718161
   -> ProjectSet
         -> Result
(3 rows)

test=# PROGRESS 20830;
     PLAN PROGRESS
------------------------
Insert => rows 1370255
   -> ProjectSet
         -> Result
(3 rows)

test=# PROGRESS 20830;
     PLAN PROGRESS
------------------------
Insert => rows 1916731
   -> ProjectSet
         -> Result
(3 rows)

test=# PROGRESS 20830;
PLAN PROGRESS
----------------
<idle backend>
(1 row)

test=#

3/ Delete with like clause

===================

=> Terminal running the long SQL query:
test=# DELETE FROM T_10M WHERE md5 like '%cb%';

=> Terminal monitoring SQL query progression:
test=# PROGRESS 20830;
PLAN PROGRESS
----------------
<idle backend>
(1 row)

test=# PROGRESS 20830;
                            PLAN PROGRESS
----------------------------------------------------------------------
Delete => rows 91906
   -> Seq Scan on t_10m => rows 91906/848485 10% blks 6668/100000 6%
         Filter: (md5 ~~ '%cb%'::text)
(3 rows)

test=# PROGRESS 20830;
                              PLAN PROGRESS
-------------------------------------------------------------------------
Delete => rows 151900
   -> Seq Scan on t_10m => rows 151900/848485 17% blks 11019/100000 11%
         Filter: (md5 ~~ '%cb%'::text)
(3 rows)

test=# PROGRESS 20830;
                              PLAN PROGRESS
-------------------------------------------------------------------------
Delete => rows 309533
   -> Seq Scan on t_10m => rows 309533/848485 36% blks 22471/100000 22%
         Filter: (md5 ~~ '%cb%'::text)
(3 rows)

test=# PROGRESS 20830;
                              PLAN PROGRESS
-------------------------------------------------------------------------
Delete => rows 705968
   -> Seq Scan on t_10m => rows 705968/848485 83% blks 51274/100000 51%
         Filter: (md5 ~~ '%cb%'::text)
(3 rows)

test=# PROGRESS 20830;
                              PLAN PROGRESS
--------------------------------------------------------------------------
Delete => rows 913843
   -> Seq Scan on t_10m => rows 913843/848485 107% blks 66417/100000 66%
         Filter: (md5 ~~ '%cb%'::text)
(3 rows)

test=# PROGRESS 20830;
                               PLAN PROGRESS
---------------------------------------------------------------------------
Delete => rows 1113104
   -> Seq Scan on t_10m => rows 1113104/848485 131% blks 80881/100000 80%
         Filter: (md5 ~~ '%cb%'::text)
(3 rows)

test=# PROGRESS 20830;
PLAN PROGRESS
----------------
<idle backend>
(1 row)

test=#

Above monitoring report shows:
- The Seq Scan node with the number of rows scanned, the number of blocks scanned/read.

- The Delete node with the number of rows deleted.

4/ Select with offset and limit clause

===========================

=> Terminal running the long SQL query:
test=#select * from t_10M order by md5 offset 80 limit 10;

=> Terminal monitoring SQL query progression:
test=# \watch PROGRESS 20830;
Wed 19 Apr 2017 04:36:16 PM CEST (every 1s)

PLAN PROGRESS
----------------
<idle backend>
(1 row)

Wed 19 Apr 2017 04:36:17 PM CEST (every 1s)

PLAN PROGRESS
----------------
<idle backend>
(1 row)

                 Wed 19 Apr 2017 04:36:18 PM CEST (every 1s)

                                PLAN PROGRESS
------------------------------------------------------------------------------
Limit   => offset 0% limit 0%
   -> Sort=> loading tuples in memory 90
         Sort Key: md5
         -> Seq Scan on t_10m => rows 174392/11586584 1% blks 1640/100000 1%
(4 rows)

...

                                  PLAN PROGRESS
----------------------------------------------------------------------------------
Limit   => offset 0% limit 0%
   -> Sort=> loading tuples in memory 90
         Sort Key: md5
         -> Seq Scan on t_10m => rows 1656828/11586584 14% blks 15600/100000 15%
(4 rows)

...

                                  PLAN PROGRESS
----------------------------------------------------------------------------------
Limit   => offset 0% limit 0%
   -> Sort=> loading tuples in memory 90
         Sort Key: md5
         -> Seq Scan on t_10m => rows 4954207/11586584 42% blks 46640/100000 46%
(4 rows)

                   Wed 19 Apr 2017 04:36:35 PM CEST (every 1s)

...

                                  PLAN PROGRESS
----------------------------------------------------------------------------------
Limit   => offset 0% limit 0%
   -> Sort=> loading tuples in memory 90
         Sort Key: md5
         -> Seq Scan on t_10m => rows 7837687/11586584 67% blks 73772/100000 73%
(4 rows)

                   Wed 19 Apr 2017 04:36:41 PM CEST (every 1s)

...

                                   PLAN PROGRESS
-----------------------------------------------------------------------------------
Limit   => offset 0% limit 0%
   -> Sort=> loading tuples in memory 90
         Sort Key: md5
         -> Seq Scan on t_10m => rows 10378786/11586584 89% blks 97690/100000 97%
(4 rows)

Wed 19 Apr 2017 04:36:49 PM CEST (every 1s)

PLAN PROGRESS
----------------
<idle backend>
(1 row)

5/ Sample scan

=============

=> Terminal running the long SQL query:
# select * from t_10m tablesample system(50);

=> Terminal monitoring SQL query progression:
PLAN PROGRESS
----------------
<idle backend>
(1 row)

              Wed 19 Apr 2017 04:44:12 PM CEST (every 1s)

                             PLAN PROGRESS
-----------------------------------------------------------------------
Sample Scan on t_10m => rows 783274/5793292 13% blks 14616/100000 14%
   Sampling: system ('50'::real)
(2 rows)

              Wed 19 Apr 2017 04:44:13 PM CEST (every 1s)

                             PLAN PROGRESS
------------------------------------------------------------------------
Sample Scan on t_10m => rows 2514675/5793292 43% blks 47076/100000 47%
   Sampling: system ('50'::real)
(2 rows)

              Wed 19 Apr 2017 04:44:14 PM CEST (every 1s)

                             PLAN PROGRESS
------------------------------------------------------------------------
Sample Scan on t_10m => rows 4031400/5793292 69% blks 75625/100000 75%
   Sampling: system ('50'::real)
(2 rows)

Wed 19 Apr 2017 04:44:15 PM CEST (every 1s)

PLAN PROGRESS
----------------
<idle backend>
(1 row)

2017-04-19 16:13 GMT+02:00 Remi Colinet <remi.colinet@gmail.com>:

Maksim,

2017-04-18 20:31 GMT+02:00 Maksim Milyutin <m.milyutin@postgrespro.ru>:
On 18.04.2017 17:39, Remi Colinet wrote:
Hello Maksim,

The core implementation I suggested for the new PROGRESS command uses
different functions from the one used by EXPLAIN for its core
implementation.
Some source code is shared with EXPLAIN command. But this shared code is
only related to quals, properties, children, subPlans and few other nodes.

All other code for PROGRESS is new code.

I don't believe explain.c code can be fully shared with the one of the
new PROGRESS command. These 2 commands have different purposes.
The core code used for the new PROGRESS command is very different from
the core code used for EXPLAIN.

Perhaps you will be forced to duplicate significant snippets of functionality from explain.c into your progress.c.

Currently, few code is duplicated between EXPLAIN and PROGRESS commands. The duplicated code could be moved to file src/backend/commands/report.c which is used to gather shared code between the 2 commands. I will try to complete this code sharing as much as possible.

The main point is that PROGRESS uses the same design pattern as EXPLAIN by parsing the query tree. The work horse of the PROGRESS command is ProgressNode() which calls recursively sub nodes until we reach leaf nodes such as SeqScan, IndexScan, TupleStore, Sort, Material, ... . EXPLAIN command uses a similar work horse with function ExplainNode() which eventually calls different leaf nodes.

Some of the leaf nodes which are common to the 2 commands have been put in the file src/backend/commands/report.c. May be some further code sharing is also possible for the work horse by using a template function which would call EXPLAIN specific leaf node functions or PROGRESS specific leaf node functions.

Regarding the queryDesc state of SQL query upon receiving a request to
report its execution progress, it does not bring any issue. The request
is noted when the signal is received by the monitored backend. Then, the
backend continues its execution code path. When interrupts are checked
in the executor code, the request will be dealt.

Yes, interrupts are checked in the CHECK_FOR_INTERRUPTS entries.

When the request is being dealt, the monitored backend will stop its
execution and report the progress of the SQL query. Whatever is the
status of the SQL query, progress.c code checks the status and report
either that the SQL query does not have a valid status, or otherwise the
current execution state of the SQL query.

SQL query status checking is about:
- idle transaction
- out of transaction status
- null planned statement
- utility command
- self monitoring

Other tests can be added if needed to exclude some SQL query state. Such
checking is done in void HandleProgressRequest(void).
I do not see why a SQL query progression would not be possible in this
context. Even when the queryDescc is NULL, we can just report a <idle
transaction> output. This is currently the case with the patch suggested.

It's interesting question - how much the active query is in a usable state on the stage of execution. Tom Lane noticed that CHECK_FOR_INTERRUPTS doesn't give us 100% guarantee about full consistency [1].

I wonder what you mean about usable state.

Currently, the code suggested tests the queryDesc pointer and all the sub nodes pointers in order to detect NULL pointers. When the progress report is collected by the backend, this backend does the collect and consequently does not run the query. So the query tree is not being modified. At this moment, whatever is the query state, we can manage to deal with its static state. It is only a tree which could also be just a NULL pointer in the most extreme case. Such case is dealt in the current code.

So far, I've found this new command very handy. It allows to evaluate
the time needed to complete a SQL query.

Could you explain how you get the percent of execution for nodes of plan tree and overall for query?

The progress of execution of the query is computed as follows at 2 different places for each leaf node type (Scan, IndexScan, Sort, Material, TupleStore, ...):

- one place in the executor code, or in access methods code, or in sort utilities code, used during the execution of the SQL query in which following values are counted for instance: rows R/W, blocks, R/W, tapes R/W used for sort, tuple store R/W, ... . Some of these values are already computed in the current Postgresql official source code. Some other values had to be added and collected.

- one place in the leaf function of each node type (ProgressScan(), ProgressSort(), ...) in which percents are computed and are then dumped together with raw values collected during execution, in the report. The dump details can be selected with the VERBOSE option of the PROGRESS command (For instance # PROGRESS VERBOSE $pid)

For instance:

1/ read/write rows are collected when running the executor in the file src/backend/executor/execProcnode.c
==============================================================================

This is already implemented in the current official source tree. Nothing mode needs to be implemented to collect values (total rows number and current rows number already fetched are collected).
The report is implemented in leaf function ProgressScanRows().

2/ read/write blocks are collected in the file src/backend/access/heap/heapam.c
==========================================================

This is already implemented in the current official source tree. Nothing more needs to be implemented to collect values during execution.
The report is implemented in a leaf function ProgressScanBlks().

3/ the sort progressions are collected in the file src/backend/utils/sort/tuplesort.c
==========================================================

[root@rco pg]# git diff --stat master.. src/backend/utils/sort/tuplesort.c
src/backend/utils/sort/tuplesort.c | 142 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 127 insertions(+), 15 deletions(-)
[root@rco pg]#

New fields have been added to compute the different I/O (read/write/merge/...) per tapes for instance, during a sort on tape.
The report of Sort node is implemented in leaf function ProgressSort()

4/ the tuple store progressions are computed in the file src/backend/utils/sort/tuplestore.c
=================================================================

[root@rco pg]# git diff --stat master.. src/backend/utils/sort/tuplestore.c
src/backend/utils/sort/tuplestore.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------
1 file changed, 56 insertions(+), 17 deletions(-)
[root@rco pg]#

New fields have been added to collect the I/O needed for such tuple store.
The report of TupleStore node is implemented in leaf function ProgressTupleStore().

Other node types have been implemented: TidScan, IndexScan, LimitScan, CustomScan, Hash, ModifyTable.
Such node may require some new fields to collect values during the SQL query execution.

Overall, the overhead caused by new values collected during the SQL query execution, is very low.
A few values need to be collected.

A further improvement would be to report the memory, disk and time
resources used by the monitored backend. An overuse of memory, disk and
time resources can prevent the SQL query to complete.

This functionality is somehow implemented in explain.c. You can see my patch to this file [2]. I only manipulate runtime statistics (data in the structure 'Instrumentation') to achieve the printing state of running query.

I will check your patch and try to add such feature to the current patch.
It provides a valuable hint to estimate whether a SQL query has a chance to complete and will not reached the resource limits.
.

1. https://www.postgresql.org/message-id/24182.1472745492%40sss.pgh.pa.us
2. https://github.com/postgrespro/pg_query_state/blob/master/runtime_explain.patch

Thanks for your suggestions and comments

Regards
Remi

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Re: [HACKERS] [PATCH] New command to monitor progression of longrunning queries

From

Maksim Milyutin

Date:

19 April 2017, 19:41:12

On 19.04.2017 17:13, Remi Colinet wrote:
> Maksim,
>
>
> 2017-04-18 20:31 GMT+02:00 Maksim Milyutin <m.milyutin@postgrespro.ru
> <mailto:m.milyutin@postgrespro.ru>>:
>
>     On 18.04.2017 17:39, Remi Colinet wrote:
>
>
>         Regarding the queryDesc state of SQL query upon receiving a
>         request to
>         report its execution progress, it does not bring any issue. The
>         request
>         is noted when the signal is received by the monitored backend.
>         Then, the
>         backend continues its execution code path. When interrupts are
>         checked
>         in the executor code, the request will be dealt.
>
>
>     Yes, interrupts are checked in the CHECK_FOR_INTERRUPTS entries.
>
>         When the request is being dealt, the monitored backend will stop its
>         execution and report the progress of the SQL query. Whatever is the
>         status of the SQL query, progress.c code checks the status and
>         report
>         either that the SQL query does not have a valid status, or
>         otherwise the
>         current execution state of the SQL query.
>
>         SQL query status checking is about:
>         - idle transaction
>         - out of transaction status
>         - null planned statement
>         - utility command
>         - self monitoring
>
>         Other tests can be added if needed to exclude some SQL query
>         state. Such
>         checking is done in void HandleProgressRequest(void).
>         I do not see why a SQL query progression would not be possible
>         in this
>         context. Even when the queryDescc is NULL, we can just report a
>         <idle
>         transaction> output. This is currently the case with the patch
>         suggested.
>
>
>     It's interesting question - how much the active query is in a usable
>     state on the stage of execution. Tom Lane noticed that
>     CHECK_FOR_INTERRUPTS doesn't give us 100% guarantee about full
>     consistency [1].
>
>
> I wonder what you mean about usable state.
>

A usable query state is suitable for analysis, IOW we have consistent 
QueryDesc object. This term was introduced by Tom Lane in [1]. I suppose 
he meant the case when a query fails with error and before transaction 
aborts we bump into *CHECK_FOR_INTERRUPTS* in the place where QueryDesc 
may be inconsistent and further reading from it will give us invalid result.


> Currently, the code suggested tests the queryDesc pointer and all the
> sub nodes pointers in order to detect NULL pointers. When the progress
> report is collected by the backend, this backend does the collect and
> consequently does not run the query. So the query tree is not being
> modified. At this moment, whatever is the query state, we can manage to
> deal with its static state. It is only a tree which could also be just a
> NULL pointer in the most extreme case. Such case is dealt in the current
> code.
>

Perhaps the deep checking of QueryDesc would allow us to consider it as 
consistent.


1. https://www.postgresql.org/message-id/24182.1472745492%40sss.pgh.pa.us

-- 
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Re: [HACKERS] [PATCH] New command to monitor progression of longrunning queries

From

Remi Colinet

Date:

20 April 2017, 19:13:45

2017-04-19 18:41 GMT+02:00 Maksim Milyutin <m.milyutin@postgrespro.ru>:

On 19.04.2017 17:13, Remi Colinet wrote:
Maksim,

2017-04-18 20:31 GMT+02:00 Maksim Milyutin <m.milyutin@postgrespro.ru
<mailto:m.milyutin@postgrespro.ru>>:

On 18.04.2017 17:39, Remi Colinet wrote:

Regarding the queryDesc state of SQL query upon receiving a
request to
report its execution progress, it does not bring any issue. The
request
is noted when the signal is received by the monitored backend.
Then, the
backend continues its execution code path. When interrupts are
checked
in the executor code, the request will be dealt.

Yes, interrupts are checked in the CHECK_FOR_INTERRUPTS entries.

When the request is being dealt, the monitored backend will stop its
execution and report the progress of the SQL query. Whatever is the
status of the SQL query, progress.c code checks the status and
report
either that the SQL query does not have a valid status, or
otherwise the
current execution state of the SQL query.

SQL query status checking is about:
- idle transaction
- out of transaction status
- null planned statement
- utility command
- self monitoring

Other tests can be added if needed to exclude some SQL query
state. Such
checking is done in void HandleProgressRequest(void).
I do not see why a SQL query progression would not be possible
in this
context. Even when the queryDescc is NULL, we can just report a
<idle
transaction> output. This is currently the case with the patch
suggested.

It's interesting question - how much the active query is in a usable
state on the stage of execution. Tom Lane noticed that
CHECK_FOR_INTERRUPTS doesn't give us 100% guarantee about full
consistency [1].

I wonder what you mean about usable state.

A usable query state is suitable for analysis, IOW we have consistent QueryDesc object. This term was introduced by Tom Lane in [1]. I suppose he meant the case when a query fails with error and before transaction aborts we bump into *CHECK_FOR_INTERRUPTS* in the place where QueryDesc may be inconsistent and further reading from it will give us invalid result.

I could indeed trigger a segmentation fault because the nodes of the tree may be under freeing. Some node may be partially filled for instance. But each node can be checked against null pointer once the monitored backend is no more executing its query and is dumping its progress state. So this is not a big deal in fact.

Currently, the code suggested tests the queryDesc pointer and all the
sub nodes pointers in order to detect NULL pointers. When the progress
report is collected by the backend, this backend does the collect and
consequently does not run the query. So the query tree is not being
modified. At this moment, whatever is the query state, we can manage to
deal with its static state. It is only a tree which could also be just a
NULL pointer in the most extreme case. Such case is dealt in the current
code.

Perhaps the deep checking of QueryDesc would allow us to consider it as consistent.

1. https://www.postgresql.org/message-id/24182.1472745492%40sss.pgh.pa.us

--
Maksim Milyutin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Re: [HACKERS] [PATCH] New command to monitor progression of longrunning queries

From

Vinayak Pokale

Date:

06 May 2017, 06:38:38

On Mon, Apr 17, 2017 at 9:09 PM, Remi Colinet <remi.colinet@gmail.com> wrote:

Hello,

I've implemented a new command named PROGRESS to monitor progression of long running SQL queries in a backend process.

Thank you for the patch.

I am testing your patch but after applying your patch other regression test failed.

$ make installcheck

13 of 178 tests failed.

Regards,

Vinayak

Re: [HACKERS] [PATCH] New command to monitor progression of longrunning queries

From

Jaime Casanova

Date:

06 May 2017, 06:57:03

On 5 May 2017 at 22:38, Vinayak Pokale <vinpokale@gmail.com> wrote:
>
> On Mon, Apr 17, 2017 at 9:09 PM, Remi Colinet <remi.colinet@gmail.com>
> wrote:
>>
>> Hello,
>>
>> I've implemented a new command named PROGRESS to monitor progression of
>> long running SQL queries in a backend process.
>>
> Thank you for the patch.
>

sorry if i'm bikeshedding to soon but... why a command instead of a function?
something like pg_progress_backend() will be in sync with
pg_cancel_backend()/pg_terminate_backend() and the result of such a
function could be usable by a tool to examine a slow query status

-- 
Jaime Casanova                      www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] [PATCH] New command to monitor progression of longrunning queries

From

Remi Colinet

Date:

11 May 2017, 16:25:19

Do you have more details about the failed tests?

Regards,

Remi

2017-05-06 5:38 GMT+02:00 Vinayak Pokale <vinpokale@gmail.com>:

On Mon, Apr 17, 2017 at 9:09 PM, Remi Colinet <remi.colinet@gmail.com> wrote:
Hello,

I've implemented a new command named PROGRESS to monitor progression of long running SQL queries in a backend process.

Thank you for the patch.

I am testing your patch but after applying your patch other regression test failed.

$ make installcheck
13 of 178 tests failed.

Regards,
Vinayak

Re: [HACKERS] [PATCH] New command to monitor progression of longrunning queries

From

Remi Colinet

Date:

11 May 2017, 16:39:56

That's a good point.

A command is more straightforward because it targets only one backend.
The user is supposed to know which backend pid is taking a long time to complete based on pg_stat_activity().

This is somehow the same approach as EXPLAIN command.
But the use is limited to psql utility. And this adds one more command.

I see 2 possible choices:

1 - either convert the command into a table.
This is the way it is done on Oracle database with v$session_longops view. Obviously, this requires probing the status of each backend. This inconvenient can be mitigated by using a threeshold of a few seconds before considering a backend progression. v$session_longops only reports long running queries after at least 6 seconds of execution.

This is less efficient that targeting directly a given pid or backend id. But this is far better for SQL.

2 - either convert the command into a function
The advantage of a function is that it can accept parameters. So parameters could be the pid of the backend, the verbosity level, the format (text, json, ....).
This would not reduce the options of the current command. And then a view could be created on top of the function.

May be a mix of both a function with parameters and a view created on the function is the solution.

Regards

Remi

2017-05-06 5:57 GMT+02:00 Jaime Casanova <jaime.casanova@2ndquadrant.com>:

On 5 May 2017 at 22:38, Vinayak Pokale <vinpokale@gmail.com> wrote:
>
> On Mon, Apr 17, 2017 at 9:09 PM, Remi Colinet <remi.colinet@gmail.com>
> wrote:
>>
>> Hello,
>>
>> I've implemented a new command named PROGRESS to monitor progression of
>> long running SQL queries in a backend process.
>>
> Thank you for the patch.
>

sorry if i'm bikeshedding to soon but... why a command instead of a function?
something like pg_progress_backend() will be in sync with
pg_cancel_backend()/pg_terminate_backend() and the result of such a
function could be usable by a tool to examine a slow query status

--
Jaime Casanova www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services