Thread: On-demand running query plans using auto_explain and signals
On 2015-08-29 17:33:22 +0200, Shulgin, Oleksandr wrote:
> Probably using SIGUSR2 would be more appropriate, but I'm not sure if there
> are other extensions out there that might be already using it for some
> other reason (well, I do not know that for SIGUSR1 either). Looking at the
> current state of affairs in procsignal_sigusr1_handler() makes me believe
> it should be pretty safe to catch the signal like I do. Or is that not the
> case?
You can catch signals, but you're not allowed to do a lot from
them. Anything allocating memory, acquiring locks, etc. is out - these
functions aren't reentrant. If you can guarantee that you're not
interrupting any relevant code you can bend those rules, but that's
obviously not the case here.
Check out the list of async-signal-safe functions at http://man7.org/linux/man-pages/man7/signal.7.html

Andres
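A minimal sketch of the flag-and-latch pattern being discussed, assuming roughly 9.5-era APIs; the handler name, the flag and the processing function are placeholders, not taken from any patch in this thread:

    #include "postgres.h"
    #include "miscadmin.h"
    #include "storage/latch.h"

    static volatile sig_atomic_t got_cmdstatus_request = false;

    static void
    cmdstatus_signal_handler(SIGNAL_ARGS)
    {
        int         save_errno = errno;

        got_cmdstatus_request = true;   /* async-signal-safe: just set a flag */
        SetLatch(MyLatch);              /* wake the process if it is sleeping */

        errno = save_errno;
    }

    /* Called later from non-signal context, e.g. around CHECK_FOR_INTERRUPTS() */
    static void
    maybe_handle_cmdstatus_request(void)
    {
        if (got_cmdstatus_request)
        {
            got_cmdstatus_request = false;
            /* now it is safe to palloc, take locks, build EXPLAIN output, etc. */
        }
    }

The handler does nothing beyond recording the request and poking the latch; the actual work happens at the next safe point, which is exactly the constraint described above.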
On Sat, Aug 29, 2015 at 5:44 PM, Andres Freund <andres@anarazel.de> wrote:
On 2015-08-29 17:33:22 +0200, Shulgin, Oleksandr wrote:
> Probably using SIGUSR2 would be more appropriate, but I'm not sure if there
> are other extensions out there that might be already using it for some
> other reason (well, I do not know that for SIGUSR1 either). Looking at the
> current state of affairs in procsignal_sigusr1_handler() makes me believe
> it should be pretty safe to catch the signal like I do. Or is that not the
> case?
You can catch signals, but you're not allowed to do a lot from
them. Anything allocating memory, acquiring locks, etc. is out - these
functions aren't reentrant. If you can guarantee that you're not
interrupting any relevant code you can bend those rules, but that's
obviously not the case here.
Check out the list of async-signal-safe functions at http://man7.org/linux/man-pages/man7/signal.7.html

Good point. There's still hope to set a flag and process it later on. Will have to check if it's possible to stay in the scope of a loaded module though.
On 2015-08-29 18:27:59 +0200, Pavel Stehule wrote:
> 2015-08-29 18:25 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>
> > Good point. There's still hope to set a flag and process it later on.
> > Will have to check if it's possible to stay in the scope of a loaded module
> > though.
> I had a workable prototype - and It was implemented very similar as
> handling CANCEL
Where did you put the handling of that kind of interrupt? Directly into
ProcessInterrupts()?
Andres
On Aug 29, 2015 7:31 PM, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
>
>
>
> 2015-08-29 18:36 GMT+02:00 Andres Freund <andres@anarazel.de>:
>>
>> On 2015-08-29 18:27:59 +0200, Pavel Stehule wrote:
>> > 2015-08-29 18:25 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>
>> > > Good point. There's still hope to set a flag and process it later on.
>> > > Will have to check if it's possible to stay in the scope of a loaded module
>> > > though.
>>
>> > I had a workable prototype - and It was implemented very similar as
>> > handling CANCEL
>>
>> Where did you put the handling of that kind of interrupt? Directly into
>> ProcessInterrupts()?
>
>
> Probably. I don't remember it well, but it need hack code - it cannot be used from extension.

Do you still have the code somewhere around? Did it see production use?
Thanks!
--
Alex
2015-08-30 10:30 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
On Aug 29, 2015 7:31 PM, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
>
>
>
> 2015-08-29 18:36 GMT+02:00 Andres Freund <andres@anarazel.de>:
>>
>> On 2015-08-29 18:27:59 +0200, Pavel Stehule wrote:
>> > 2015-08-29 18:25 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>
>> > > Good point. There's still hope to set a flag and process it later on.
>> > > Will have to check if it's possible to stay in the scope of a loaded module
>> > > though.
>>
>> > I had a workable prototype - and It was implemented very similar as
>> > handling CANCEL
>>
>> Where did you put the handling of that kind of interrupt? Directly into
>> ProcessInterrupts()?
>
>
> Probably. I don't remember it well, but it need hack code - it cannot be used from extension.

Do you still have the code somewhere around? Did it see production use?
http://www.postgresql.org/message-id/CAFj8pRAXcS9B8ABgiM-zauVgGqDhPZOaRz5YSp1_Nhv9HP8nKw@mail.gmail.com
I am not sure I am able to find it - I'll try. We didn't use it on production.

Thanks!
--
Alex
Do you still have the code somewhere around? Did it see production use?
I sent it to mailing list year ago
http://www.postgresql.org/message-id/CAFj8pRAXcS9B8ABgiM-zauVgGqDhPZOaRz5YSp1_Nhv9HP8nKw@mail.gmail.com
Do you still have the code somewhere around? Did it see production use?
I sent it to mailing list year ago
http://www.postgresql.org/message-id/CAFj8pRAXcS9B8ABgiM-zauVgGqDhPZOaRz5YSp1_Nhv9HP8nKw@mail.gmail.com

Ah, thanks! Somehow I've missed this mail. You didn't add the patch to a commitfest back then I think?
--Alex
Ah, thanks! Somehow I've missed this mail. You didn't add the patch to a commitfest back then I think?

I had no time to finish this patch - there is few issues in signal handling and returning back result - but still I want it :) - and what I know - almost all other SQL db has similar functionality.
On Mon, Aug 31, 2015 at 12:35 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Ah, thanks! Somehow I've missed this mail. You didn't add the patch to a commitfest back then I think?

I had no time to finish this patch - there is few issues in signal handling and returning back result - but still I want it :) - and what I know - almost all other SQL db has similar functionality.

I've updated the patch for the current master and also added some unexpected parameters handling, so attached is a v2.
I'd say we should hide the so-designed pg_cmdstatus() interface behind more friendly calls like pg_explain_backend() and pg_backend_progress() to give some naming examples, to remove the need for magic numbers in the second arg.
What I've found missing in this approach is the insight into nested executor runs, so that if you're running a "SELECT my_func()", you only see this outer query in the pg_cmdstatus() output. With the auto_explain approach, by hooking into executor I was able to capture the nested queries and their plans as well.
It's conceptually trivial to add some code to use the Executor hooks here, but I don't see any precedent for this except for contrib modules (auto_explain and pg_stat_statements), I'm just not sure if that would be OK-ish.

And when we solve that, there is another problem of having a sane interface to query the nested plans. For a psql user, probably the most interesting would be the topmost (level=1) and the innermost (e.g. level=-1) plans. We might also want to provide a full nesting of plans in a structured format like JSON or... *cough* XML, for programs to consume and display nicely with folding and stuff.

And the most interesting would be making instrumentation work with all of the above.
I'm adding this to the next CF.

--
Alex
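A minimal sketch of the executor-hook idea mentioned above, written extension-style for brevity (the real code would live in core, not an extension); it only maintains a stack of the QueryDescs of nested executor runs, which a later EXPLAIN-style request could walk. All names are illustrative and error-path cleanup is omitted:

    #include "postgres.h"
    #include "executor/executor.h"

    #define MAX_QUERY_DEPTH 64              /* arbitrary cap for this sketch */

    static ExecutorStart_hook_type prev_ExecutorStart = NULL;
    static ExecutorEnd_hook_type prev_ExecutorEnd = NULL;

    static QueryDesc *current_query_stack[MAX_QUERY_DEPTH];
    static int  query_stack_depth = 0;

    static void
    stack_ExecutorStart(QueryDesc *queryDesc, int eflags)
    {
        if (prev_ExecutorStart)
            prev_ExecutorStart(queryDesc, eflags);
        else
            standard_ExecutorStart(queryDesc, eflags);

        /* push: nested runs (e.g. queries inside PL functions) land on top */
        if (query_stack_depth < MAX_QUERY_DEPTH)
            current_query_stack[query_stack_depth] = queryDesc;
        query_stack_depth++;
    }

    static void
    stack_ExecutorEnd(QueryDesc *queryDesc)
    {
        /* pop; a real implementation also needs cleanup on error/abort */
        if (query_stack_depth > 0)
            query_stack_depth--;

        if (prev_ExecutorEnd)
            prev_ExecutorEnd(queryDesc);
        else
            standard_ExecutorEnd(queryDesc);
    }

    void
    _PG_init(void)
    {
        prev_ExecutorStart = ExecutorStart_hook;
        ExecutorStart_hook = stack_ExecutorStart;
        prev_ExecutorEnd = ExecutorEnd_hook;
        ExecutorEnd_hook = stack_ExecutorEnd;
    }

With such a stack, entry 0 would correspond to the topmost (level=1) query and entry query_stack_depth - 1 to the innermost one.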
I'd say we should hide the so-designed pg_cmdstatus() interface behind more friendly calls like pg_explain_backend() and pg_backend_progress() to give some naming examples, to remove the need for magic numbers in the second arg.

I had a similar idea - this is good enough for a start, but the target interface is based on integration with the EXPLAIN statement, something like EXPLAIN PROCESS or EXPLAIN PID or EXPLAIN VERBOSE PID ..
the important functionality is drawing the complete text of the query - it was my original motivation, because I had no way to get the complete query before it finished

Probably the communication between processes should be more complex :( - the SHM queue should be used there, because some plans can be terribly long. Using a shared write buffer (one for all) is probably too simple a solution - good for a prototype, but not good for core.

I have an idea about the communication:

1. caller prepares a buffer and a shm queue and signalizes the target process - the parameter is the pid of the caller
2. target process fills the write buffer and closes the queue
3. caller shows the data and closes the buffer, closes the queue

Now almost all code for the communication is in upstream - the missing part is injecting one end of the queue into any process dynamically.
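A rough sketch of that three-step exchange using the shm_mq facility, assuming 9.5-era signatures (e.g. shm_mq_detach() later changed to take a handle); PROCSIG_CMDSTATUS_INFO, the buffer size and the function names are placeholders, not existing symbols:

    #include "postgres.h"
    #include "miscadmin.h"
    #include "storage/proc.h"
    #include "storage/procsignal.h"
    #include "storage/shm_mq.h"

    #define CMDSTATUS_BUFFER_SIZE 8192      /* placeholder size */

    /* Step 1: caller side -- lay out the queue and poke the target backend. */
    static shm_mq *
    cmdstatus_request(void *shared_buffer, pid_t target_pid)
    {
        shm_mq     *mq = shm_mq_create(shared_buffer, CMDSTATUS_BUFFER_SIZE);

        shm_mq_set_receiver(mq, MyProc);    /* the caller will read the reply */
        /* PROCSIG_CMDSTATUS_INFO is hypothetical; a new reason would be needed */
        SendProcSignal(target_pid, PROCSIG_CMDSTATUS_INFO, InvalidBackendId);
        return mq;
    }

    /* Step 2: target side -- runs later, from a safe interrupt-processing point. */
    static void
    cmdstatus_answer(shm_mq *mq, const char *plan_text)
    {
        shm_mq_handle *outq;

        shm_mq_set_sender(mq, MyProc);
        outq = shm_mq_attach(mq, NULL, NULL);
        shm_mq_send(outq, strlen(plan_text) + 1, plan_text, false);
        shm_mq_detach(mq);                  /* tells the reader we are done */
    }

    /* Step 3: caller side again -- read until the sender detaches. */
    static void
    cmdstatus_collect(shm_mq *mq)
    {
        shm_mq_handle *inq = shm_mq_attach(mq, NULL, NULL);
        Size        len;
        void       *data;

        while (shm_mq_receive(inq, &len, &data, false) == SHM_MQ_SUCCESS)
            elog(NOTICE, "%s", (char *) data);      /* or accumulate the result */
    }

How the target process learns where the shared buffer lives (the "injection" problem mentioned above) is exactly the part this sketch glosses over.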
I'd say we should hide the so-designed pg_cmdstatus() interface behind more friendly calls like pg_explain_backend() and pg_backend_progress() to give some naming examples, to remove the need for magic numbers in the second arg.

I had a similar idea - this is good enough for a start, but the target interface is based on integration with the EXPLAIN statement, something like EXPLAIN PROCESS or EXPLAIN PID or EXPLAIN VERBOSE PID ..

Yes, that's another way to do it.

the important functionality is drawing the complete text of the query - it was my original motivation, because I had no way to get the complete query before it finished

Probably the communication between processes should be more complex :( - the SHM queue should be used there, because some plans can be terribly long. Using a shared write buffer (one for all) is probably too simple a solution - good for a prototype, but not good for core. I have an idea about the communication: 1. caller prepares a buffer and a shm queue and signalizes the target process - the parameter is the pid of the caller; 2. target process fills the write buffer and closes the queue; 3. caller shows the data and closes the buffer, closes the queue. Now almost all code for the communication is in upstream - the missing part is injecting one end of the queue into any process dynamically.

I'm not familiar with the shared memory handling, but could we not allocate just enough shared memory to fit the data we're going to write instead of the fixed 8k? It's not that we cannot know the length of the resulting plan text in advance.
I think we can remove buffer_is_free/buffer_holds_data and just use the buffer != NULL to check if there's some data to be read (and buffer == NULL to check if we can write).

--
Alex
I'm not familiar with the shared memory handling, but could we not allocate just enough shared memory to fit the data we're going to write instead of the fixed 8k? It's not that we cannot know the length of the resulting plan text in advance.

the shared memory cannot be reused - (released) :(, so allocating enough memory is not effective. More - in this moment it has not sense. Shared memory queue can do almost all work.
I'm not familiar with the shared memory handling, but could we not allocate just enough shared memory to fit the data we're going to write instead of the fixed 8k? It's not that we cannot know the length of the resulting plan text in advance.

the shared memory cannot be reused - (released) :(, so allocating enough memory is not effective. More - in this moment it has not sense. Shared memory queue can do almost all work.

A-ha, I've discovered the shared memory message queue facility and I see how we can use it.

But do we really need the slots mechanism? Would it not be OK to just let the LWLock do the sequencing of concurrent requests? Given that we only going to use one message queue per cluster, there's not much concurrency you can gain by introducing slots I believe.
--Alex
But do we really need the slots mechanism? Would it not be OK to just let the LWLock do the sequencing of concurrent requests? Given that we only going to use one message queue per cluster, there's not much concurrency you can gain by introducing slots I believe.

I afraid of problems on production. When you have a queue related to any process, then all problems should be off after end of processes. One message queue per cluster needs restart cluster when some pathological problems are - and you cannot restart cluster in production week, sometimes weeks. The slots are more robust.
On Tue, Sep 1, 2015 at 7:02 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

But do we really need the slots mechanism? Would it not be OK to just let the LWLock do the sequencing of concurrent requests? Given that we only going to use one message queue per cluster, there's not much concurrency you can gain by introducing slots I believe.

I afraid of problems on production. When you have a queue related to any process, then all problems should be off after end of processes. One message queue per cluster needs restart cluster when some pathological problems are - and you cannot restart cluster in production week, sometimes weeks. The slots are more robust.

Yes, but in your implementation the slots themselves don't have a queue/buffer. Did you intend to have a message queue per slot?
What sort of pathological problems are you concerned of? The communicating backends should just detach from the message queue properly and have some timeout configured to prevent deadlocks. Other than that, I don't see how having N slots really help the problem: in case of pathological problems you will just deplete them all sooner or later.
--Alex
2015-09-02 11:01 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
On Tue, Sep 1, 2015 at 7:02 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

But do we really need the slots mechanism? Would it not be OK to just let the LWLock do the sequencing of concurrent requests? Given that we only going to use one message queue per cluster, there's not much concurrency you can gain by introducing slots I believe.

I afraid of problems on production. When you have a queue related to any process, then all problems should be off after end of processes. One message queue per cluster needs restart cluster when some pathological problems are - and you cannot restart cluster in production week, sometimes weeks. The slots are more robust.

Yes, but in your implementation the slots themselves don't have a queue/buffer. Did you intend to have a message queue per slot?

The message queue cannot be reused, so I expect one slot per caller to be used passing parameters, - message queue will be created/released by demand by caller.
What sort of pathological problems are you concerned of? The communicating backends should just detach from the message queue properly and have some timeout configured to prevent deadlocks. Other than that, I don't see how having N slots really help the problem: in case of pathological problems you will just deplete them all sooner or later.

I afraid of unexpected problems :) - any part of signal handling or multiprocess communication is fragile. Slots are simple and simply attached to any process without necessity to alloc/free some memory.
On Wed, Sep 2, 2015 at 11:16 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2015-09-02 11:01 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
On Tue, Sep 1, 2015 at 7:02 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

But do we really need the slots mechanism? Would it not be OK to just let the LWLock do the sequencing of concurrent requests? Given that we only going to use one message queue per cluster, there's not much concurrency you can gain by introducing slots I believe.

I afraid of problems on production. When you have a queue related to any process, then all problems should be off after end of processes. One message queue per cluster needs restart cluster when some pathological problems are - and you cannot restart cluster in production week, sometimes weeks. The slots are more robust.

Yes, but in your implementation the slots themselves don't have a queue/buffer. Did you intend to have a message queue per slot?

The message queue cannot be reused, so I expect one slot per caller to be used passing parameters, - message queue will be created/released by demand by caller.

I don't believe a message queue cannot really be reused. What would stop us from calling shm_mq_create() on the queue struct again?
To give you an idea, in my current prototype I have only the following struct:

    typedef struct
    {
        LWLock     *lock;
        /* CmdStatusInfoSlot slots[CMDINFO_SLOTS]; */
        pid_t       target_pid;
        pid_t       sender_pid;
        int         request_type;
        int         result_code;
        shm_mq      buffer;
    } CmdStatusInfo;

An instance of this is allocated in shared memory once, using a BUFFER_SIZE of 8k.

In pg_cmdstatus() I take the LWLock to check if target_pid is 0; then it means nobody else is using this communication channel at the moment. If that's the case, I set the pids and request_type and initialize the mq buffer. Otherwise I just sleep and retry acquiring the lock (a timeout should be added here probably).

What sort of pathological problems are you concerned of? The communicating backends should just detach from the message queue properly and have some timeout configured to prevent deadlocks. Other than that, I don't see how having N slots really help the problem: in case of pathological problems you will just deplete them all sooner or later.

I afraid of unexpected problems :) - any part of signal handling or multiprocess communication is fragile. Slots are simple and simply attached to any process without necessity to alloc/free some memory.

Yes, but do slots solve the actual problem? If there is only one message queue, you still have the same problem regardless of the number of slots you decide to have.

--
Alex
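A sketch of the claim-the-channel step described just above, using the CmdStatusInfo struct from the quoted prototype; everything here is illustrative rather than the actual patch code:

    #include "postgres.h"
    #include "miscadmin.h"
    #include "storage/lwlock.h"

    static void
    claim_cmdstatus_channel(CmdStatusInfo *info, pid_t target_pid, int request_type)
    {
        for (;;)
        {
            bool        claimed = false;

            LWLockAcquire(info->lock, LW_EXCLUSIVE);
            if (info->target_pid == 0)      /* nobody is using the channel */
            {
                info->target_pid = target_pid;
                info->sender_pid = MyProcPid;
                info->request_type = request_type;
                /* (re)initialize the message queue over the shared buffer here */
                claimed = true;
            }
            LWLockRelease(info->lock);

            if (claimed)
                break;

            CHECK_FOR_INTERRUPTS();         /* let the user cancel the wait */
            pg_usleep(1000L);               /* channel busy: back off and retry */
        }
    }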
2015-09-02 12:36 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
On Wed, Sep 2, 2015 at 11:16 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2015-09-02 11:01 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
On Tue, Sep 1, 2015 at 7:02 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

But do we really need the slots mechanism? Would it not be OK to just let the LWLock do the sequencing of concurrent requests? Given that we only going to use one message queue per cluster, there's not much concurrency you can gain by introducing slots I believe.

I afraid of problems on production. When you have a queue related to any process, then all problems should be off after end of processes. One message queue per cluster needs restart cluster when some pathological problems are - and you cannot restart cluster in production week, sometimes weeks. The slots are more robust.

Yes, but in your implementation the slots themselves don't have a queue/buffer. Did you intend to have a message queue per slot?

The message queue cannot be reused, so I expect one slot per caller to be used passing parameters, - message queue will be created/released by demand by caller.

I don't believe a message queue cannot really be reused. What would stop us from calling shm_mq_create() on the queue struct again?

you cannot to change recipient later
On Wed, Sep 2, 2015 at 2:56 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2015-09-02 12:36 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
On Wed, Sep 2, 2015 at 11:16 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2015-09-02 11:01 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
On Tue, Sep 1, 2015 at 7:02 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

But do we really need the slots mechanism? Would it not be OK to just let the LWLock do the sequencing of concurrent requests? Given that we only going to use one message queue per cluster, there's not much concurrency you can gain by introducing slots I believe.

I afraid of problems on production. When you have a queue related to any process, then all problems should be off after end of processes. One message queue per cluster needs restart cluster when some pathological problems are - and you cannot restart cluster in production week, sometimes weeks. The slots are more robust.

Yes, but in your implementation the slots themselves don't have a queue/buffer. Did you intend to have a message queue per slot?

The message queue cannot be reused, so I expect one slot per caller to be used passing parameters, - message queue will be created/released by demand by caller.

I don't believe a message queue cannot really be reused. What would stop us from calling shm_mq_create() on the queue struct again?

you cannot to change recipient later

Well, maybe I'm missing something, but sh_mq_create() will just overwrite the contents of the struct, so it doesn't care about sender/receiver: only sh_mq_set_sender/receiver() do.
Well, maybe I'm missing something, but sh_mq_create() will just overwrite the contents of the struct, so it doesn't care about sender/receiver: only sh_mq_set_sender/receiver() do.

if you create sh_mq from scratch, then you can reuse structure.
On Wed, Sep 2, 2015 at 3:04 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Well, maybe I'm missing something, but sh_mq_create() will just overwrite the contents of the struct, so it doesn't care about sender/receiver: only sh_mq_set_sender/receiver() do.

if you create sh_mq from scratch, then you can reuse structure.
On Wed, Sep 2, 2015 at 3:07 PM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote:
On Wed, Sep 2, 2015 at 3:04 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Well, maybe I'm missing something, but sh_mq_create() will just overwrite the contents of the struct, so it doesn't care about sender/receiver: only sh_mq_set_sender/receiver() do.

if you create sh_mq from scratch, then you can reuse structure.

Please find attached a v3.

It uses a shared memory queue and also has the ability to capture plans nested deeply in the call stack. Not sure about using the executor hook, since this is not an extension...

The LWLock is used around initializing/cleaning the shared struct and the message queue, the IO synchronization is handled by the message queue itself. After some testing with concurrent pgbench and intentionally deep recursive plpgsql functions (up to 700 plpgsql stack frames) I think this approach can work. Unless there's some theoretical problem I'm just not aware of. :-)
Comments welcome!
--Alex
Hi

2015-09-03 18:30 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
On Wed, Sep 2, 2015 at 3:07 PM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote:
On Wed, Sep 2, 2015 at 3:04 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Well, maybe I'm missing something, but sh_mq_create() will just overwrite the contents of the struct, so it doesn't care about sender/receiver: only sh_mq_set_sender/receiver() do.

if you create sh_mq from scratch, then you can reuse structure.

Please find attached a v3.

It uses a shared memory queue and also has the ability to capture plans nested deeply in the call stack. Not sure about using the executor hook, since this is not an extension...

The LWLock is used around initializing/cleaning the shared struct and the message queue, the IO synchronization is handled by the message queue itself. After some testing with concurrent pgbench and intentionally deep recursive plpgsql functions (up to 700 plpgsql stack frames) I think this approach can work. Unless there's some theoretical problem I'm just not aware of. :-)

Comments welcome!

I am not pretty happy from this design. Only one EXPLAIN PID/GET STATUS in one time can be executed per server - I remember lot of queries that doesn't handle CANCEL well ~ doesn't handle interrupt well, and this can be unfriendly. Cannot to say if it is good enough for first iteration. This is functionality that can be used for diagnostic when you have overloaded server and this risk looks too high (for me). The idea of receive slot can to solve this risk well (and can be used elsewhere). The difference from this code should not be too big - although it is not trivial - needs work with PGPROC. The opinion of our multiprocess experts can be interesting. Maybe I am too careful.
Other smaller issues:

* probably sending line by line is useless - shm_mq_send can pass bigger data when nowait = false
* pg_usleep(1000L); - it is related to single point resource

Some ideas:

* this code share some important parts with auto_explain (query stack) - and because it should be in core (due handling signal if I remember well), it can be first step of integration auto_explain to core.
On Sep 3, 2015 10:14 PM, "Pavel Stehule" <pavel.stehule@gmail.com> wrote:
>>>
>>> Please find attached a v3.
>>>
>>> It uses a shared memory queue and also has the ability to capture plans nested deeply in the call stack. Not sure about using the executor hook, since this is not an extension...
>>>
>>> The LWLock is used around initializing/cleaning the shared struct and the message queue, the IO synchronization is handled by the message queue itself.
>>
>> I am not pretty happy from this design. Only one EXPLAIN PID/GET STATUS in one time can be executed per server - I remember lot of queries that doesn't handle CANCEL well ~ doesn't handle interrupt well, and this can be unfriendly. Cannot to say if it is good enough for first iteration. This is functionality that can be used for diagnostic when you have overloaded server and this risk looks too high (for me). The idea of receive slot can to solve this risk well (and can be used elsewhere). The difference from this code should not be too big - although it is not trivial - needs work with PGPROC. The opinion of our multiprocess experts can be interesting. Maybe I am too careful.

Sorry, but I still don't see how the slots help this issue - could you please elaborate?
>> Other smaller issues:
>>
>> * probably sending line by line is useless - shm_mq_send can pass bigger data when nowait = false

I'm not sending it like that because of the message size - I just find it more convenient. If you think it can be problematic, it's easy to do this as before, by splitting lines on the receiving side.
>> * pg_usleep(1000L); - it is related to single point resource
But not a highly concurrent one.
-
Alex
Sorry, but I still don't see how the slots help this issue - could you please elaborate?
with slot (or some similar) there is not global locked resource. If I'll have a time at weekend I'll try to write some prototype.
>> Other smaller issues:
>>
>> * probably sending line by line is useless - shm_mq_send can pass bigger data when nowait = false

I'm not sending it like that because of the message size - I just find it more convenient. If you think it can be problematic, it's easy to do this as before, by splitting lines on the receiving side.
Yes, shm queue sending data immediately - so slicing on sender generates more interprocess communication
>> * pg_usleep(1000L); - it is related to single point resource
But not a highly concurrent one.
I believe it is not necessary - waiting (sleeping) can be deeper in reading from queue - the code will be cleaner
On Fri, Sep 4, 2015 at 6:11 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Sorry, but I still don't see how the slots help this issue - could you please elaborate?
with slot (or some similar) there is not global locked resource. If I'll have a time at weekend I'll try to write some prototype.

But you will still lock on the slots list to find an unused one. How is that substantially different from what I'm doing?
>> Other smaller issues:
>>
>> * probably sending line by line is useless - shm_mq_send can pass bigger data when nowait = false

I'm not sending it like that because of the message size - I just find it more convenient. If you think it can be problematic, it's easy to do this as before, by splitting lines on the receiving side.
Yes, shm queue sending data immediately - so slicing on sender generates more interprocess communication

Well, we are talking about hundreds to thousands bytes per plan in total. And if my reading of shm_mq implementation is correct, if the message fits into the shared memory buffer, the receiver gets the direct pointer to the shared memory, no extra allocation/copy to process-local memory. So this can be actually a win.
>> * pg_usleep(1000L); - it is related to single point resource
But not a highly concurrent one.
I believe it is not necessary - waiting (sleeping) can be deeper in reading from queue - the code will be cleaner

The only way I expect this line to be reached is when a concurrent pg_cmdstatus() call is in progress: the receiving backend has set the target_pid and has created the queue, released the lock and now waits to read something from shm_mq. So the backend that's trying to also use this communication channel can obtain the lwlock, checks if the channel is not used at the time, fails and then it needs to check again, but that's going to put a load on the CPU, so there's a small sleep.

The real problem could be if the process that was signaled to connect to the message queue never handles the interrupt, and we keep waiting forever in shm_mq_receive(). We could add a timeout parameter or just let the user cancel the call: send a cancellation request, use pg_cancel_backend() or set statement_timeout before running this.
--Alex
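One way the timeout idea could look, sketched with the 9.5-era WaitLatch() signature (later releases add a wait-event argument); the function name is made up and error handling is kept minimal:

    #include "postgres.h"
    #include "miscadmin.h"
    #include "storage/latch.h"
    #include "storage/shm_mq.h"

    /* Read one message, giving up after timeout_ms instead of blocking forever.
     * (A real version would also track total elapsed time across wakeups.) */
    static bool
    receive_with_timeout(shm_mq_handle *mqh, long timeout_ms,
                         Size *nbytes, void **data)
    {
        for (;;)
        {
            shm_mq_result res;
            int         rc;

            res = shm_mq_receive(mqh, nbytes, data, true /* nowait */);
            if (res == SHM_MQ_SUCCESS)
                return true;
            if (res == SHM_MQ_DETACHED)
                return false;           /* sender went away */

            /* SHM_MQ_WOULD_BLOCK: sleep until woken or the timeout expires */
            rc = WaitLatch(MyLatch,
                           WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                           timeout_ms);
            ResetLatch(MyLatch);

            if (rc & WL_TIMEOUT)
                return false;           /* target never answered in time */

            CHECK_FOR_INTERRUPTS();     /* honour cancel/statement_timeout */
        }
    }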
>>
>> But you will still lock on the slots list to find an unused one. How is that substantially different from what I'm doing?
>
> It is not necessary - you can use similar technique to what it does PGPROC. I am sending "lock free" demo.
>
> I don't afraid about locks - short locks, when the range and time are limited. But there are lot of bugs, and fixes with the name "do interruptible some", and it is reason, why I prefer typical design for work with shared memory.
Thanks, this is really helpful! The key difference is that every backend has a dedicated slot, so there's no need to search for a free one, which would again incur locking.
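A sketch of that per-backend slot layout: the shared array is sized by MaxBackends and each backend simply addresses its own entry by MyBackendId, so no search or lock is needed to find a slot. Struct and function names are illustrative, and the usual shmem-request/startup-hook plumbing is omitted:

    #include "postgres.h"
    #include "miscadmin.h"
    #include "storage/backendid.h"
    #include "storage/shmem.h"

    typedef struct CmdStatusSlot
    {
        pid_t       css_pid;            /* owning backend, for sanity checks */
        int         request_type;
        bool        is_processed;
        char        buffer[8192];       /* or a shm_mq could live here instead */
    } CmdStatusSlot;

    static CmdStatusSlot *CmdStatusSlots;   /* shared array, MaxBackends entries */
    static CmdStatusSlot *MyCmdStatusSlot;

    /* Run once at shared-memory initialization time. */
    static void
    cmdstatus_shmem_startup(void)
    {
        bool        found;

        CmdStatusSlots = ShmemInitStruct("cmdstatus slots",
                                         mul_size(sizeof(CmdStatusSlot), MaxBackends),
                                         &found);
        if (!found)
            MemSet(CmdStatusSlots, 0, mul_size(sizeof(CmdStatusSlot), MaxBackends));
    }

    /* Run once per backend, after MyBackendId is assigned. */
    static void
    cmdstatus_backend_init(void)
    {
        /* each backend owns exactly the slot at its own BackendId: no search */
        MyCmdStatusSlot = &CmdStatusSlots[MyBackendId - 1];
        MyCmdStatusSlot->css_pid = MyProcPid;
    }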
>> Well, we are talking about hundreds to thousands bytes per plan in total. And if my reading of shm_mq implementation is correct, if the message fits into the shared memory buffer, the receiver gets the direct pointer to the shared memory, no extra allocation/copy to process-local memory. So this can be actually a win.
>
> you have to calculate with signals and interprocess communication. the cost of memory allocation is not all.
Sure. Anyway, we're talking about only kilobytes being sent in this case, so the whole performance discussion is rather moot.
>> The real problem could be if the process that was signaled to connect to the message queue never handles the interrupt, and we keep waiting forever in shm_mq_receive(). We could add a timeout parameter or just let the user cancel the call: send a cancellation request, use pg_cancel_backend() or set statement_timeout before running this.
>
> This is valid question - for begin we can use a statement_timeout and we don't need to design some special (if you don't hold some important lock).
> My example (the code has prototype quality) is little bit longer, but it work without global lock - the requester doesn't block any other

I'll update the commitfest patch to use this technique.
On Tue, Sep 8, 2015 at 11:49 AM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote:
>> The real problem could be if the process that was signaled to connect to the message queue never handles the interrupt, and we keep waiting forever in shm_mq_receive(). We could add a timeout parameter or just let the user cancel the call: send a cancellation request, use pg_cancel_backend() or set statement_timeout before running this.
>
> This is valid question - for begin we can use a statement_timeout and we don't need to design some special (if you don't hold some important lock).
> My example (the code has prototype quality) is little bit longer, but it work without global lock - the requester doesn't block any other

I'll update the commitfest patch to use this technique.

Please find attached v4.
--Alex
Please find attached v4.

It is better
Two notices:

1. The communication mechanism can be used more widely than only for this purpose. We can introduce a SendInfoHook - and it can be used for any custom probes - memory, cpu, ...
2. With your support for explain of nested queries we have all what we need for integration auto_explain to core.
Two notices:

1. The communication mechanism can be used more widely than only for this purpose. We can introduce a SendInfoHook - and it can be used for any custom probes - memory, cpu, ...
So there's certainly a space for generalization. Rename it as pg_backend_info()?
2. With your support for explain of nested queries we have all what we need for integration auto_explain to core.

Well, I don't quite see how that follows. What I'm really after is bringing instrumentation support to this, so that not only the plan could be extracted, but also the rows/loops/times accumulated thus far during the query run.
--Alex
On Wed, Sep 9, 2015 at 8:36 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

Please find attached v4.

It is better

One important thing to notice, and it probably deserves a code comment: any modification of the slot fields done by the info sender backend must be done before actually sending the message via the shared memory queue (or detaching it, if there's nothing to be sent). Otherwise it is possible that the is_processed flag will be set on a slot that has already been reused by the receiving backend.
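A tiny sketch of that ordering rule (names illustrative, reusing the slot fields sketched earlier): the sending backend finishes every slot update first, because the shm_mq send (or detach) is what lets the requester run again and possibly reuse the slot:

    static void
    send_info_and_finish(CmdStatusSlot *slot, shm_mq_handle *mqh,
                         const char *payload, Size len)
    {
        /* 1. update the slot fields first ... */
        slot->is_processed = true;

        /*
         * 2. ... and only then send (or detach, if there is nothing to send).
         * Once this returns, the receiving backend may reset and reuse the
         * slot at any moment, so no further slot writes are allowed here.
         */
        shm_mq_send(mqh, len, payload, false);
    }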
Hi,
I did a quick initial review of this patch today, so here are my comments so far:
- ipcs.c should include utils/cmdstatus.h (the compiler complains
about implicit declaration of two functions)
- Attempts to get plan for simple insert queries like this
INSERT INTO x SELECT * FROM x;
end with a segfault, because ActivePortal->queryDesc is 0x0 for this
query. Needs investigation.
- The lockless approach seems fine to me, although I think the fear
of performance issues is a bit moot (I don't think we expect large
number of processes calling pg_cmdstatus at the same time). But
it's not significantly more complex, so why not.
- The patch contains pretty much no documentation, both comments
at the code level and user docs. The lack of user docs is not that
a big deal at this point (although the patch seems to be mature
enough, although the user-level API will likely change).
The lack of code comments is more serious, as it makes the review
somewhat more difficult. For example it'd be very nice to document
the contract for the lock-less interface.
- I agree that pg_cmdstatus() is not the best API. Having something
like EXPLAIN PID would be nice, but it does not really work for
all the request types (just CMD_STATUS_REQUEST_EXPLAIN). Maybe
there's not a single API for all cases, i.e. we should use EXPLAIN
PID for one case and invent something different for the other?
- Is there a particular reason why we allocate slots for auxiliary
processes and not just for backends (NumBackends)? Do we expect those
auxiliary processes to ever use this API?
- CleanupCmdStatusSlot seems needlessly complicated. I don't quite see
the need for the second argument, or the internal slot variable. Why
not to simply use the MyCmdStatusSlot directly?
- I also don't quite understand why we need to track css_pid for the
slot? In what scenario will this actually matter?
- While being able to get EXPLAIN from the running process is nice,
I'm especially interested in getting EXPLAIN ANALYZE to get insight
into the progress of the execution. There are other ways to get the
EXPLAIN, e.g. by opening a different connection and actually running
it (sure, the plan might have changed since then), but currently
there's no way to get insight into the progress.
From the thread I get the impression that Oleksandr also finds this
useful - correct? What are the plans in this direction?
ISTM we need at least two things for that to work:
(a) Ability to enable instrumentation on all queries (effectively
what auto_explain allows), otherwise we can't get EXPLAIN ANALYZE
on the queries later. But auto_explain is an extension, so that
does not seem as a good match if this is supposed to be in core.
In that case a separate GUC seems appropriate.
(b) Being able to run the InstrEnd* methods repeatedly - the initial
message in this thread mentions issues with InstrEndLoop for
example. So perhaps this is non-trivial.
- And finally, I think we should really support all existing EXPLAIN
formats, not just text. We need to support the other formats (yaml,
json, xml) if we want to use the EXPLAIN PID approach, and it also
makes the plans easier to process by additional tools.
regards
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
I did a quick initial review of this patch today, so here are my comments so far:
- ipcs.c should include utils/cmdstatus.h (the compiler complains
about implicit declaration of two functions)
- Attempts to get plan for simple insert queries like this
INSERT INTO x SELECT * FROM x;
end with a segfault, because ActivePortal->queryDesc is 0x0 for this
query. Needs investigation.
- The lockless approach seems fine to me, although I think the fear
of performance issues is a bit moot (I don't think we expect large
number of processes calling pg_cmdstatus at the same time). But
it's not significantly more complex, so why not.
- The patch contains pretty much no documentation, both comments
at the code level and user docs. The lack of user docs is not that
a big deal at this point (although the patch seems to be mature
enough, although the user-level API will likely change).
The lack of code comments is more serious, as it makes the review
somewhat more difficult. For example it'd be very nice to document
the contract for the lock-less interface.
- I agree that pg_cmdstatus() is not the best API. Having something
like EXPLAIN PID would be nice, but it does not really work for
all the request types (just CMD_STATUS_REQUEST_EXPLAIN). Maybe
there's not a single API for all cases, i.e. we should use EXPLAIN
PID for one case and invent something different for the other?
- Is there a particular reason why we allocate slots for auxiliary
processes and not just for backends (NumBackends)? Do we expect those
auxiliary processes to ever use this API?
- CleanupCmdStatusSlot seems needlessly complicated. I don't quite see
the need for the second argument, or the internal slot variable. Why
not to simply use the MyCmdStatusSlot directly?
- I also don't quite understand why we need to track css_pid for the
slot? In what scenario will this actually matter?
- While being able to get EXPLAIN from the running process is nice,
I'm especially interested in getting EXPLAIN ANALYZE to get insight
into the progress of the execution. There are other ways to get the
EXPLAIN, e.g. by opening a different connection and actually running
it (sure, the plan might have changed since then), but currently
there's no way to get insight into the progress.
From the thread I get the impression that Oleksandr also finds this
useful - correct? What are the plans in this direction?
ISTM we need at least two things for that to work:
(a) Ability to enable instrumentation on all queries (effectively
what auto_explain allows), otherwise we can't get EXPLAIN ANALYZE
on the queries later. But auto_explain is an extension, so that
does not seem as a good match if this is supposed to be in core.
In that case a separate GUC seems appropriate.
(b) Being able to run the InstrEnd* methods repeatedly - the initial
message in this thread mentions issues with InstrEndLoop for
example. So perhaps this is non-trivial.
- And finally, I think we should really support all existing EXPLAIN
formats, not just text. We need to support the other formats (yaml,
json, xml) if we want to use the EXPLAIN PID approach, and it also
makes the plans easier to process by additional tools.
I can think of something like:

EXPLAIN [ ( option [, ...] ) ] PROCESS <PID>;

where option is extended with:

QUERY
PROGRESS
BACKTRACE

in addition to the usual ANALYZE, VERBOSE, FORMAT, etc.
On 09/14/2015 10:23 AM, Shulgin, Oleksandr wrote:
On Sat, Sep 12, 2015 at 11:50 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
...

- Attempts to get plan for simple insert queries like this
INSERT INTO x SELECT * FROM x;
end with a segfault, because ActivePortal->queryDesc is 0x0 for this
query. Needs investigation.
Yes, I've hit the same problem after submitting the latest version of
the patch. For now I've just added a check for queryDesc being not
NULL, but I guess the top of the current_query_stack might contains
something useful. Something I need to try.
Well, the thing is we're able to do EXPLAIN on those queries, and IIRC auto_explain can log them too. So perhaps look into the hooks where they take the queryDesc in those cases - it has to be available somewhere.
- The lockless approach seems fine to me, although I think the fear
of performance issues is a bit moot (I don't think we expect large
number of processes calling pg_cmdstatus at the same time). But
it's not significantly more complex, so why not.
I believe the main benefit of the less-locking approach is that if
something goes wrong when two backends tried to communicate it doesn't
prevent the rest of them from operating, because there is no shared (and
thus locked) communication channel.
Sure, but I think it really deserves a bit more detailed explanation of the protocol, and discussion of the expected behavior for some basic failure types.
For example - what happens when a backend requests info, but dies before receiving it, and the backend is reused for another connection? Doesn't this introduce a race condition? Perhaps not, I'm just asking.
- The patch contains pretty much no documentation, both comments
at the code level and user docs. The lack of user docs is not that
a big deal at this point (although the patch seems to be mature
enough, the user-level API will likely change).
The lack of code comments is more serious, as it makes the review
somewhat more difficult. For example it'd be very nice to document
the contract for the lock-less interface.
I will add the code comments. The user docs could wait before we decide
on the interface, I think.
Agreed, although I think having rudimentary user documentation would be useful for the reviewers - a summary of the goals that are a bit scattered over the e-mail thread.
- I agree that pg_cmdstatus() is not the best API. Having something
like EXPLAIN PID would be nice, but it does not really work for
all the request types (just CMD_STATUS_REQUEST_EXPLAIN). Maybe
there's not a single API for all cases, i.e. we should use EXPLAIN
PID for one case and invent something different for the other?
I can think of something like:
EXPLAIN [ ( option [, ...] ) ] PROCESS <PID>;
where option is extended with:
QUERY
PROGRESS
BACKTRACE
in addition to the usual ANALYZE, VERBOSE, FORMAT, etc.
Seems OK. I'm not quite sure what PROGRESS is - I assume it's the same thing as ANALYZE? Why not to use the keyword, then?
- Is there a particular reason why we allocate slots for auxiliary
processes and not just for backends (NumBackends)? Do we expect those
auxiliary processes to ever use this API?
If we extend the interface to a more general one, there still might be
some space for querying status of checkpointer of bgwriter.
I don't think we should mix this with monitoring of auxiliary processes. This interface is aimed at monitoring SQL queries running in other backends, effectively "remote" EXPLAIN. But those auxiliary processes are not processing SQL queries at all, they're not even using regular executor ...
OTOH the ability to request this info (e.g. auxiliary process looking at plans running in backends) seems useful, so I'm ok with tuple slots for auxiliary processes.
- CleanupCmdStatusSlot seems needlessly complicated. I don't quite see
the need for the second argument, or the internal slot variable. Why
not to simply use the MyCmdStatusSlot directly?
Good point.
- I also don't quite understand why we need to track css_pid for the
slot? In what scenario will this actually matter?
I think it's being only used for error reporting and could help in
debugging, but for now that's it.
I recommend getting rid of it, unless we have a clear idea of how to use it. Otherwise I envision we won't really keep it updated properly (because it's "just for debugging"), and then one day someone actually starts using it and gets bitten by it.
- While being able to get EXPLAIN from the running process is nice,
I'm especially interested in getting EXPLAIN ANALYZE to get insight
into the progress of the execution. There are other ways to get the
EXPLAIN, e.g. by opening a different connection and actually running
it (sure, the plan might have changed since then), but currently
there's no way to get insight into the progress.
From the thread I get the impression that Oleksandr also finds this
useful - correct? What are the plans in this direction?
ISTM we need at least two things for that to work:
(a) Ability to enable instrumentation on all queries (effectively
what auto_explain allows), otherwise we can't get EXPLAIN ANALYZE
on the queries later. But auto_explain is an extension, so that
does not seem like a good match if this is supposed to be in core.
In that case a separate GUC seems appropriate.
(b) Being able to run the InstrEnd* methods repeatedly - the initial
message in this thread mentions issues with InstrEndLoop for
example. So perhaps this is non-trivial.
I was able to make this work with a simple change to InstrEndLoop and
the callers. Basically, adding a bool parameter in_explain and passing
an appropriate value. I guess that's not the best approach, but it
appears to work.
Adding a GUC to enable instrumentation sounds reasonable.
Nice!
Do you believe it makes sense to add instrumentation support in this
same patch or better focus on making the simplest thing work first?
Let's make it a "patch series" with two patches.
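To illustrate the shape of the InstrEndLoop change being discussed above - a rough sketch only, not the actual patch; the accumulation body is elided, and whether skipping just the reset is really sufficient is part of what needs review:

void
InstrEndLoop(Instrumentation *instr, bool in_explain)
{
    /* Skip if nothing has happened, or already shut down */
    if (!instr->running)
        return;

    /* ... accumulate startup, total, ntuples, nloops as today ... */

    /*
     * When called from the in-progress EXPLAIN path, leave the per-loop
     * state alone: the executor is still running this node and would
     * otherwise lose its counters.
     */
    if (in_explain)
        return;

    /* Reset for next cycle (if any) */
    instr->running = false;
    INSTR_TIME_SET_ZERO(instr->starttime);
    INSTR_TIME_SET_ZERO(instr->counter);
    instr->firsttuple = 0;
    instr->tuplecount = 0;
}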
- And finally, I think we should really support all existing EXPLAIN
formats, not just text. We need to support the other formats (yaml,
json, xml) if we want to use the EXPLAIN PID approach, and it also
makes the plans easier to process by additional tools.
Sure, that was in my plans (and see above for possible syntax). What
would be really neat is retrieving the complete backtrace. Not sure
what the good interface would look like, but using JSON format for the
output sounds promising.
I don't quite see the reason to encode everything as JSON, because that makes it a bit difficult for clients that would prefer YAML, for example. Why not to just use the requested format? For example in YAML, we can do something like this:
QUERY PLAN
----------------------------------
- Plan[0]: +
Node Type: "Index Only Scan"+
Scan Direction: "Forward" +
... +
+
- Plan[1]: +
Node Type: "Index Only Scan"+
Scan Direction: "Forward" +
... +
+
- Plan[2]: +
Node Type: "Index Only Scan"+
Scan Direction: "Forward" +
... +
and similarly for other formats. We don't really use nesting.
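As a side note, the responding backend should be able to reuse the existing explain.c machinery to honour whatever format was requested - roughly like the sketch below, which is not part of the patch; requested_format is assumed to arrive with the request:

#include "commands/explain.h"

static char *
explain_in_requested_format(QueryDesc *queryDesc, ExplainFormat requested_format)
{
    ExplainState *es = NewExplainState();

    es->format = requested_format;  /* EXPLAIN_FORMAT_TEXT/XML/JSON/YAML */
    es->analyze = false;            /* plain plan, no instrumentation */

    ExplainBeginOutput(es);
    ExplainPrintPlan(es, queryDesc);
    ExplainEndOutput(es);

    return es->str->data;           /* the plan text to send back */
}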
On 09/14/2015 01:15 PM, Shulgin, Oleksandr wrote: > On Mon, Sep 14, 2015 at 11:53 AM, Tomas Vondra > <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote: > > > On 09/14/2015 10:23 AM, Shulgin, Oleksandr wrote: > > On Sat, Sep 12, 2015 at 11:50 AM, Tomas Vondra > <tomas.vondra@2ndquadrant.com > <mailto:tomas.vondra@2ndquadrant.com> > <mailto:tomas.vondra@2ndquadrant.com > <mailto:tomas.vondra@2ndquadrant.com>>> wrote: > > ... > > - Attempts to get plan for simple insert queries like this > > INSERT INTO x SELECT * FROM x; > > end with a segfault, because ActivePortal->queryDesc is > 0x0 for this > query. Needs investigation. > > > Yes, I've hit the same problem after submitting the latest > version of > the patch. For now I've just added a check for queryDesc being not > NULL, but I guess the top of the current_query_stack might contains > something useful. Something I need to try. > > > Well, the thing is we're able to do EXPLAIN on those queries, and > IIRC auto_explain can log them too. So perhaps look into the hooks > where they take the queryDesc in those cases - it has to be > available somewhere. > > > Yes, this seems to work. I was able to successfully query "create table > as" and even "explain analyze" for the plans. In both cases > ActivePortal doesn't have a valid queryDesc, but the executor hook > captures one. > > - The lockless approach seems fine to me, although I think > the fear > of performance issues is a bit moot (I don't think we > expect large > number of processes calling pg_cmdstatus at the same > time). But > it's not significantly more complex, so why not. > > > I believe the main benefit of the less-locking approach is that if > something goes wrong when two backends tried to communicate it > doesn't > prevent the rest of them from operating, because there is no > shared (and > thus locked) communication channel. > > > Sure, but I think it really deserves a bit more detailed explanation > of the protocol, and discussion of the expected behavior for some > basic failure types. > > For example - what happens when a backend requests info, but dies > before receiving it, and the backed is reused for another > connection? Doesn't this introduce a race condition? Perhaps not, > I'm just asking. > > > Now explained better in code comments. > > The worst thing that could happen in this case I believe is that the > requesting backend will not receive any response from the second use of > its slot due to the slot being marked as processed by the backend that > was asked first: > > A: > slot->is_processed = false; > slot->is_valid = true; > /* signals backend B */; > shm_mq_read(); /* blocks */ > > B: /* finds the slot and prepares the response */ > > A: /* decides to bail out */ > InvalidateCmdStatusSlot(); > shm_mq_detach(); > /* exits pg_cmdstatus() */ > > /* pg_cmdstatus() is called again */ > /* initializes the slot and signals some backend */ > shm_mq_read(); /* blocks */ > > B: /* completes preparing the response */ > slot->is_processed = true; > shm_mq_send() => SHM_MQ_DETACHED > slot->is_valid = false; > /* gives up with this slot */ > > Now the backend that has been signaled on the second call to > pg_cmdstatus (it can be either some other backend, or the backend B > again) will not find an unprocessed slot, thus it will not try to > attach/detach the queue and the backend A will block forever. > > This requires a really bad timing and the user should be able to > interrupt the querying backend A still. 
I think we can't rely on the low probability that this won't happen, and we should not rely on people interrupting the backend. Being able to detect the situation and fail gracefully should be possible. It may be possible to introduce some lock-less protocol preventing such situations, but it's not there at the moment. If you believe it's possible, you need to explain and "prove" that it's actually safe. Otherwise we may need to introduce some basic locking - for example we may introduce a LWLock for each slot, and lock it with dontWait=true (and skip it if we couldn't lock it). This should prevent most scenarios where one corrupted slot blocks many processes. > > In any case, the backends that are being asked to send the info will be > able to notice the problem (receiver detached early) and handle it > gracefully. Ummm, how? Maybe I missed something? > I don't think we should mix this with monitoring of auxiliary > processes. This interface is designed at monitoring SQL queries > running in other backends, effectively "remote" EXPLAIN. But those > auxiliary processes are not processing SQL queries at all, they're > not even using regular executor ... > > OTOH the ability to request this info (e.g. auxiliary process > looking at plans running in backends) seems useful, so I'm ok with > tuple slots for auxiliary processes. > > > Now that I think about it, reserving the slots for aux process doesn't > let us query their status, it's the other way round. If we don't > reserve them, then aux process would not be able to query any other > process for the status. Likely this is not a problem at all, so we can > remove these extra slots. I don't know. I can imagine using this from background workers, but I think those are counted as regular backends (not sure though). > > - I also don't quite understand why we need to track > css_pid for the > slot? In what scenario will this actually matter? > > > I think it's being only used for error reporting and could help in > debugging, but for now that's it. > > > I recommend getting rid of it, unless we have a clear idea of how to > use it. Otherwise I envision we won't really keep it updated > properly (because it's "just for debugging"), and then one day > someone actually starts using it and get bitten by it. > > > The design was copied over from procsignal.c, where the pss_pid is used > to "claim" the procsignal slot. There it's used for searching the > target process slot when backend id is not known. We don't use that in > cmd status slots because of the inverted slots logic, so we could really > remove this field. OK. > I don't quite see the reason to encode everything as JSON, because > that makes it a bit difficult for clients that would prefer YAML, > for example. Why not to just use the requested format? For example > in YAML, we can do something like this: > > QUERY PLAN > ---------------------------------- > - Plan[0]: + > Node Type: "Index Only Scan"+ > Scan Direction: "Forward" + > ... + > + > - Plan[1]: + > Node Type: "Index Only Scan"+ > Scan Direction: "Forward" + > ... + > + > - Plan[2]: + > Node Type: "Index Only Scan"+ > Scan Direction: "Forward" + > ... + > > and similarly for other formats. We don't really use nesting. > > > You're right, the nesting structure is too simple to require a certain > format. Do you have ideas about representing stack frames in TEXT > format? Something like what gdb does with #0, #1, etc. markers for > frame depth? Yes. The simpler the format, the better. 
regards -- Tomas Vondra http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 09/14/2015 01:15 PM, Shulgin, Oleksandr wrote:On Mon, Sep 14, 2015 at 11:53 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
On 09/14/2015 10:23 AM, Shulgin, Oleksandr wrote:
On Sat, Sep 12, 2015 at 11:50 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com
<mailto:tomas.vondra@2ndquadrant.com>
<mailto:tomas.vondra@2ndquadrant.com
<mailto:tomas.vondra@2ndquadrant.com>>> wrote:
...
- Attempts to get plan for simple insert queries like this
INSERT INTO x SELECT * FROM x;
end with a segfault, because ActivePortal->queryDesc is
0x0 for this
query. Needs investigation.
Yes, I've hit the same problem after submitting the latest
version of
the patch. For now I've just added a check for queryDesc being not
NULL, but I guess the top of the current_query_stack might contain
something useful. Something I need to try.
Well, the thing is we're able to do EXPLAIN on those queries, and
IIRC auto_explain can log them too. So perhaps look into the hooks
where they take the queryDesc in those cases - it has to be
available somewhere.
Yes, this seems to work. I was able to successfully query "create table
as" and even "explain analyze" for the plans. In both cases
ActivePortal doesn't have a valid queryDesc, but the executor hook
captures one.
- The lockless approach seems fine to me, although I think
the fear
of performance issues is a bit moot (I don't think we
expect large
number of processes calling pg_cmdstatus at the same
time). But
it's not significantly more complex, so why not.
I believe the main benefit of the less-locking approach is that if
something goes wrong when two backends tried to communicate it
doesn't
prevent the rest of them from operating, because there is no
shared (and
thus locked) communication channel.
Sure, but I think it really deserves a bit more detailed explanation
of the protocol, and discussion of the expected behavior for some
basic failure types.
For example - what happens when a backend requests info, but dies
before receiving it, and the backend is reused for another
connection? Doesn't this introduce a race condition? Perhaps not,
I'm just asking.
Now explained better in code comments.
The worst thing that could happen in this case I believe is that the
requesting backend will not receive any response from the second use of
its slot due to the slot being marked as processed by the backend that
was asked first:
A:
slot->is_processed = false;
slot->is_valid = true;
/* signals backend B */;
shm_mq_read(); /* blocks */
B: /* finds the slot and prepares the response */
A: /* decides to bail out */
InvalidateCmdStatusSlot();
shm_mq_detach();
/* exits pg_cmdstatus() */
/* pg_cmdstatus() is called again */
/* initializes the slot and signals some backend */
shm_mq_read(); /* blocks */
B: /* completes preparing the response */
slot->is_processed = true;
shm_mq_send() => SHM_MQ_DETACHED
slot->is_valid = false;
/* gives up with this slot */
Now the backend that has been signaled on the second call to
pg_cmdstatus (it can be either some other backend, or the backend B
again) will not find an unprocessed slot, thus it will not try to
attach/detach the queue and the backend A will block forever.
This requires a really bad timing and the user should be able to
interrupt the querying backend A still.
I think we can't rely on the low probability that this won't happen, and we should not rely on people interrupting the backend. Being able to detect the situation and fail gracefully should be possible.
It may be possible to introduce some lock-less protocol preventing such situations, but it's not there at the moment. If you believe it's possible, you need to explain and "prove" that it's actually safe.
Otherwise we may need to introduce some basic locking - for example we may introduce a LWLock for each slot, and lock it with dontWait=true (and skip it if we couldn't lock it). This should prevent most scenarios where one corrupted slot blocks many processes.
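(In core terms the dontWait idea would presumably map onto LWLockConditionalAcquire(); a sketch only, with the per-slot lock field being an invented addition to the slot struct:)

/* hypothetical: each slot carries its own LWLock */
if (!LWLockConditionalAcquire(slot->lock, LW_EXCLUSIVE))
{
    /*
     * Someone - possibly a stuck or crashed requester - holds this slot;
     * skip it rather than block, so one bad slot cannot stall everyone.
     */
    return;
}

/* ... read or fill in the slot ... */

LWLockRelease(slot->lock);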
In any case, the backends that are being asked to send the info will be
able to notice the problem (receiver detached early) and handle it
gracefully.
Ummm, how? Maybe I missed something?
I don't think we should mix this with monitoring of auxiliary
processes. This interface is aimed at monitoring SQL queries
running in other backends, effectively "remote" EXPLAIN. But those
auxiliary processes are not processing SQL queries at all, they're
not even using regular executor ...
OTOH the ability to request this info (e.g. auxiliary process
looking at plans running in backends) seems useful, so I'm ok with
tuple slots for auxiliary processes.
Now that I think about it, reserving the slots for aux process doesn't
let us query their status, it's the other way round. If we don't
reserve them, then aux process would not be able to query any other
process for the status. Likely this is not a problem at all, so we can
remove these extra slots.
I don't know. I can imagine using this from background workers, but I think those are counted as regular backends (not sure though).
Well, I didn't attach the updated patch (doing that now).
Attachment
On Mon, Sep 14, 2015 at 2:11 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Now the backend that has been signaled on the second call to
pg_cmdstatus (it can be either some other backend, or the backend B
again) will not find an unprocessed slot, thus it will not try to
attach/detach the queue and the backend A will block forever.
This requires a really bad timing and the user should be able to
interrupt the querying backend A still.
I think we can't rely on the low probability that this won't happen, and we should not rely on people interrupting the backend. Being able to detect the situation and fail gracefully should be possible.
It may be possible to introduce some lock-less protocol preventing such situations, but it's not there at the moment. If you believe it's possible, you need to explain and "prove" that it's actually safe.
Otherwise we may need to introduce some basic locking - for example we may introduce a LWLock for each slot, and lock it with dontWait=true (and skip it if we couldn't lock it). This should prevent most scenarios where one corrupted slot blocks many processes.OK, I will revisit this part then.
On Mon, Sep 14, 2015 at 3:09 PM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote:On Mon, Sep 14, 2015 at 2:11 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Now the backend that has been signaled on the second call to
pg_cmdstatus (it can be either some other backend, or the backend B
again) will not find an unprocessed slot, thus it will not try to
attach/detach the queue and the backend A will block forever.
This requires a really bad timing and the user should be able to
interrupt the querying backend A still.
I think we can't rely on the low probability that this won't happen, and we should not rely on people interrupting the backend. Being able to detect the situation and fail gracefully should be possible.
It may be possible to introduce some lock-less protocol preventing such situations, but it's not there at the moment. If you believe it's possible, you need to explain and "prove" that it's actually safe.
Otherwise we may need to introduce some basic locking - for example we may introduce a LWLock for each slot, and lock it with dontWait=true (and skip it if we couldn't lock it). This should prevent most scenarios where one corrupted slot blocks many processes.

OK, I will revisit this part then.

I have a radical proposal to remove the need for locking: make the CmdStatusSlot struct consist of a mere dsm_handle and move all the required metadata like sender_pid, request_type, etc. into the shared memory segment itself.

If we allow only the requesting process to update the slot (that is, the handle value itself) this removes the need for locking between sender and receiver.

The sender will walk through the slots looking for a non-zero dsm handle (according to the dsm_create() implementation 0 is considered an invalid handle), and if it finds a valid one, it will attach and look inside, to check if it's destined for this process ID. At first that might sound strange, but I would expect that 99% of the time the only valid slot would be for the process that has just been signaled.

The sender process will then calculate the response message, update the result_code in the shared memory segment and finally send the message through the queue. If the receiver has since detached we get a detached result code and bail out.

Clearing the slot after receiving the message should be the requesting process' responsibility. This way the receiver only writes to the slot and the sender only reads from it.

By the way, is it safe to assume atomic reads/writes of dsm_handle (uint32)? I would be surprised if not.
--Alex
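Spelled out in code, the proposal above looks roughly like the following sketch; all names (CmdStatusSlot, MyCmdStatusSlot, PROCSIG_CMD_STATUS_INFO and friends) are made up here, and the shm_mq layout inside the segment is elided:

typedef struct
{
    dsm_handle  handle;     /* 0 = slot not in use; never a valid handle */
} CmdStatusSlot;            /* one entry per backend, in shared memory */

/* Requesting backend A - the only writer of its own slot */
dsm_segment *seg = dsm_create(request_size, 0);

/* lay out sender_pid, request_type and the shm_mq inside the segment here */
MyCmdStatusSlot->handle = dsm_segment_handle(seg);
SendProcSignal(target_pid, PROCSIG_CMD_STATUS_INFO, target_backend_id);
/* ... shm_mq_receive() the reply ... */
MyCmdStatusSlot->handle = 0;
dsm_detach(seg);

/* Signalled backend B - only ever reads the other backends' slots */
int     i;

for (i = 0; i < NumCmdStatusSlots; i++)
{
    dsm_handle  h = CmdStatusSlots[i].handle;

    if (h == 0)
        continue;
    /* dsm_attach(h), check whether the request inside is addressed to
       MyProcPid, and if so write the answer into the segment's shm_mq */
}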
2015-09-14 18:46 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:I have a radical proposal to remove the need for locking: make the CmdStatusSlot struct consist of a mere dsm_handle and move all the required metadata like sender_pid, request_type, etc. into the shared memory segment itself.If we allow the only the requesting process to update the slot (that is the handle value itself) this removes the need for locking between sender and receiver.The sender will walk through the slots looking for a non-zero dsm handle (according to dsm_create() implementation 0 is considered an invalid handle), and if it finds a valid one, it will attach and look inside, to check if it's destined for this process ID. At first that might sound strange, but I would expect 99% of the time that the only valid slot would be for the process that has been just signaled.The sender process will then calculate the response message, update the result_code in the shared memory segment and finally send the message through the queue. If the receiver has since detached we get a detached result code and bail out.Clearing the slot after receiving the message should be the requesting process' responsibility. This way the receiver only writes to the slot and the sender only reads from it.By the way, is it safe to assume atomic read/writes of dsm_handle (uint32)? I would be surprised if not.I don't see any reason why it should not to work - only few processes will wait for data - so lost attach/detach shm operations will not be too much.
Attachment
On Mon, Sep 14, 2015 at 7:27 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:2015-09-14 18:46 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:... This way the receiver only writes to the slot and the sender only reads from it.By the way, is it safe to assume atomic read/writes of dsm_handle (uint32)? I would be surprised if not.I don't see any reason why it should not to work - only few processes will wait for data - so lost attach/detach shm operations will not be too much.Please see attached for implementation of this approach. The most surprising thing is that it actually works :)One problem still remains with the process requesting information when the target process exits before it can have a chance to handle the signal. The requesting process then waits forever, because nobody attaches/detaches the queue. We've discussed this before and looks like we need to introduce a timeout parameter to pg_cmdstatus()...
Attachment
On Tue, Sep 15, 2015 at 11:00 AM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote:On Mon, Sep 14, 2015 at 7:27 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:2015-09-14 18:46 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:... This way the receiver only writes to the slot and the sender only reads from it.By the way, is it safe to assume atomic read/writes of dsm_handle (uint32)? I would be surprised if not.I don't see any reason why it should not to work - only few processes will wait for data - so lost attach/detach shm operations will not be too much.Please see attached for implementation of this approach. The most surprising thing is that it actually works :)One problem still remains with the process requesting information when the target process exits before it can have a chance to handle the signal. The requesting process then waits forever, because nobody attaches/detaches the queue. We've discussed this before and looks like we need to introduce a timeout parameter to pg_cmdstatus()...I've added the timeout parameter to the pg_cmdstatus call, and more importantly to the sending side of the queue, so that one can limit the potential effect of handling the interrupt in case something goes really wrong.
I've tested a number of possible scenarios with artificial delays in reply and sending cancel request or SIGTERM to the receiving side, and this all seems to work nicely due to proper message queue detach event notification. Still I think it helps to know that there is a timeout in case the receiving side is really slow to read the message.

I've also decided we really ought to suppress any possible ERROR level messages generated during the course of processing the status request in order not to prevent the originally running transaction from completing. The errors so caught are just logged using LOG level and ignored in this new version of the patch.

I'm also submitting the instrumentation support as a separate patch on top of this. I'm not really fond of adding a bool parameter to InstrEndLoop, but for now I didn't find any better way.

What I'm now thinking about is that we can probably extend this backend communication mechanism in order to query GUC values effective in a different backend, or even set them. The obvious candidate to check when you see some query misbehaving would be work_mem, for example. Or we could provide a report of all settings that were overridden in the backend's session, to the effect of running something like this:

select * from pg_settings where context = 'user' and setting != reset_val;
The obvious candidates to be set externally are the cmdstatus_analyze/instrument_*: when you decided you want to turn them on, you'd rather do that carefully for a single backend than globally per-cluster. One can still modify the postgresql.conf and then send SIGHUP to just a single backend, but I think a more direct way to alter the settings would be better.
In this light should we rename the API to something like "backend control" instead of "command status"? The SHOW/SET syntax could be extended to support the remote setting retrieval/update.
--Alex
2015-09-16 16:31 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:I've added the timeout parameter to the pg_cmdstatus call, and more importantly to the sending side of the queue, so that one can limit the potential effect of handling the interrupt in case something goes really wrong.I don't think so introduction new user visible timer is good idea. This timer should be invisibleMy idea - send a signal and wait 1sec, then check if target process is living still. Stop if not. Wait next 5 sec, then check target process. Stop if this process doesn't live or raise warning "requested process doesn't communicate, waiting.." The final cancel should be done by statement_timeout.what do you think about it?
What I'm now thinking about is probably we can extend this backend communication mechanism in order to query GUC values effective in a different backend or even setting the values. The obvious candidate to check when you see some query misbehaving would be work_mem, for example. Or we could provide a report of all settings that were overridden in the backend's session, to the effect of running something like this:select * from pg_settings where context = 'user' and setting != reset_val;this is a good idea. More times I interested what is current setting of query. We cannot to use simple query - because it is not possible to push PID to function simply, but you can to write function pg_settings_pid() so usage can look likeselect * from pg_settings_pid(xxxx, possible other params) where ...
The obvious candidates to be set externally are the cmdstatus_analyze/instrument_*: when you decided you want to turn them on, you'd rather do that carefully for a single backend than globally per-cluster. One can still modify the postgresql.conf and then send SIGHUP to just a single backend, but I think a more direct way to alter the settings would be better.I am 100% for possibility to read the setting. But reconfiguration from other process is too hot coffee - it can be available via extension, but not via usually used tools.
In this light should we rename the API to something like "backend control" instead of "command status"? The SHOW/SET syntax could be extended to support the remote setting retrieval/update.prepare API, and this functionality can be part of referential implementation in contrib.This patch should not to have too range be finished in this release cycle.
On Wed, Sep 16, 2015 at 8:07 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:2015-09-16 16:31 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:I've added the timeout parameter to the pg_cmdstatus call, and more importantly to the sending side of the queue, so that one can limit the potential effect of handling the interrupt in case something goes really wrong.I don't think so introduction new user visible timer is good idea. This timer should be invisibleMy idea - send a signal and wait 1sec, then check if target process is living still. Stop if not. Wait next 5 sec, then check target process. Stop if this process doesn't live or raise warning "requested process doesn't communicate, waiting.." The final cancel should be done by statement_timeout.what do you think about it?That won't work really well with something like I use to do when testing this patch, namely:postgres=# select pid, array(select pg_cmdstatus(pid, 1, 10)) from pg_stat_activity where pid<>pg_backend_pid() \watch 1while also running pgbench with -C option (to create new connection for every transaction). When a targeted backend exits before it can handle the signal, the receiving process keeps waiting forever.
The statement_timeout in this case will stop the whole select, not just individual function call. Unless you wrap it to set statement_timeout and catch QUERY_CANCELED in plpgsql, but then you won't be able to ^C the whole select. The ability to set a (short) timeout for the function itself proves to be a really useful feature to me.
We can still communicate some warnings to the client when no timeout is specified (and make 0 the default value actually).What I'm now thinking about is probably we can extend this backend communication mechanism in order to query GUC values effective in a different backend or even setting the values. The obvious candidate to check when you see some query misbehaving would be work_mem, for example. Or we could provide a report of all settings that were overridden in the backend's session, to the effect of running something like this:select * from pg_settings where context = 'user' and setting != reset_val;this is a good idea. More times I interested what is current setting of query. We cannot to use simple query - because it is not possible to push PID to function simply, but you can to write function pg_settings_pid() so usage can look likeselect * from pg_settings_pid(xxxx, possible other params) where ...I would rather have a more general way to run *readonly* queries in the other backend than invent a new function for every occurrence.The obvious candidates to be set externally are the cmdstatus_analyze/instrument_*: when you decided you want to turn them on, you'd rather do that carefully for a single backend than globally per-cluster. One can still modify the postgresql.conf and then send SIGHUP to just a single backend, but I think a more direct way to alter the settings would be better.I am 100% for possibility to read the setting. But reconfiguration from other process is too hot coffee - it can be available via extension, but not via usually used tools.It can be reserved to superuser, and nobody is forcing one to use it anyway. :-)In this light should we rename the API to something like "backend control" instead of "command status"? The SHOW/SET syntax could be extended to support the remote setting retrieval/update.prepare API, and this functionality can be part of referential implementation in contrib.This patch should not to have too range be finished in this release cycle.These are just the thoughts on what could be achieved using this cross-backend communication mechanism and ideas for generalization of the API.
--Alex
On Tue, Sep 15, 2015 at 5:00 AM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote: > Please see attached for implementation of this approach. The most > surprising thing is that it actually works :) It's cool to see these interesting ideas for using some of the code I've been working on for the last couple of years. However, it seems to me that a better design would avoid the use of shm_mq. Instead of having the guy who asks the question create a DSM, have the guy who sends back the answer create it. Then, he can just make the DSM big enough to store the entire string. I think that's going to be a lot more robust than what you have here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Sep 16, 2015 at 10:31 AM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote: > I've also decided we really ought to suppress any possible ERROR level > messages generated during the course of processing the status request in > order not to prevent the originally running transaction to complete. The > errors so caught are just logged using LOG level and ignored in this new > version of the patch. This plan has almost zero chance of surviving committer-level scrutiny, and if it somehow does, some other committer will scream bloody murder as soon as they notice it. It's not safe to just catch an error and continue on executing. You have to abort the (sub)transaction once an error happens. Of course, this gets at a pretty crucial design question for this patch, which is whether it's OK for one process to interrogate another process's state, and what the consequences of that might be. What permissions are needed? Can you DOS the guy running a query by pinging him for status at a very high rate? The other question here is whether it's really safe to do this. ProcessInterrupts() gets called at lots of places in the code, and we may freely add more in the future. How are you going to prove that ProcessCmdStatusInfoRequest() is safe to call in all of those places? How do you know that some data structure you find by chasing down from ActivePortal or current_query_stack doesn't have a NULL pointer someplace that you don't expect, because the code that initializes that pointer hasn't run yet? I'd like to see this made to work and be safe, but I think proving that it's truly robust in all cases is going to be hard. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
That won't work really well with something like I use to do when testing this patch, namely:postgres=# select pid, array(select pg_cmdstatus(pid, 1, 10)) from pg_stat_activity where pid<>pg_backend_pid() \watch 1while also running pgbench with -C option (to create new connection for every transaction). When a targeted backend exits before it can handle the signal, the receiving process keeps waiting forever.no - every timeout you have to check, if targeted backend is living still, if not you will do cancel. It is 100% safe.
The statement_timeout in this case will stop the whole select, not just individual function call. Unless you wrap it to set statement_timeout and catch QUERY_CANCELED in plpgsql, but then you won't be able to ^C the whole select. The ability to set a (short) timeout for the function itself proves to be a really useful feature to me.you cannot to handle QUERY_CANCELED in plpgsql.
There is need some internal timeout - but this timeout should not be visible - any new GUC increase PostgreSQL complexity - and there is not a reason why do it.
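For illustration, the waiting side Pavel describes could look something like the sketch below - assuming a short internal timeout and some way to test whether the reply has arrived; response_is_ready() and target_pid are invented names, and the exact WaitLatch usage is only indicative:

for (;;)
{
    int     rc;

    rc = WaitLatch(MyLatch,
                   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                   100L);           /* short, invisible timeout in ms */

    if (rc & WL_POSTMASTER_DEATH)
        proc_exit(1);

    ResetLatch(MyLatch);
    CHECK_FOR_INTERRUPTS();         /* lets statement_timeout and ^C work */

    if (response_is_ready())        /* invented predicate */
        break;

    if (kill(target_pid, 0) != 0)   /* target backend has gone away */
        ereport(ERROR,
                (errmsg("backend with PID %d exited before answering",
                        (int) target_pid)));
}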
On Tue, Sep 15, 2015 at 5:00 AM, Shulgin, Oleksandr
<oleksandr.shulgin@zalando.de> wrote:
> Please see attached for implementation of this approach. The most
> surprising thing is that it actually works :)
It's cool to see these interesting ideas for using some of the code
I've been working on for the last couple of years. However, it seems
to me that a better design would avoid the use of shm_mq. Instead of
having the guy who asks the question create a DSM, have the guy who
sends back the answer create it. Then, he can just make the DSM big
enough to store the entire string. I think that's going to be a lot
more robust than what you have here.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Sep 17, 2015 at 12:06 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

That won't work really well with something like I use to do when testing this patch, namely:

postgres=# select pid, array(select pg_cmdstatus(pid, 1, 10)) from pg_stat_activity where pid<>pg_backend_pid() \watch 1

while also running pgbench with -C option (to create new connection for every transaction). When a targeted backend exits before it can handle the signal, the receiving process keeps waiting forever.

no - every timeout you have to check, if targeted backend is living still, if not you will do cancel. It is 100% safe.

But then you need to make this internal timeout rather short, not 1s as originally suggested.

The statement_timeout in this case will stop the whole select, not just the individual function call. Unless you wrap it to set statement_timeout and catch QUERY_CANCELED in plpgsql, but then you won't be able to ^C the whole select. The ability to set a (short) timeout for the function itself proves to be a really useful feature to me.

you cannot to handle QUERY_CANCELED in plpgsql.

Well, you can, but it's not that useful of course:

=# create or replace function test_query_cancel() returns void language plpgsql as $$ begin
perform pg_sleep(1);
exception when query_canceled then raise notice 'cancel';
end; $$;
CREATE FUNCTION
=# set statement_timeout to '100ms';
SET
=# select test_query_cancel();
NOTICE: cancel
 test_query_cancel
-------------------

(1 row)

=# select test_query_cancel() from generate_series(1, 100) x;
NOTICE: cancel
^CCancel request sent
NOTICE: cancel
^CCancel request sent

Now you cannot cancel this query unless you do pg_terminate_backend() or equivalent.

There is need some internal timeout - but this timeout should not be visible - any new GUC increase PostgreSQL complexity - and there is not a reason why do it.

But the GUC was added for the timeout on the sending side, not the receiving one. There is no "one value fits all" for this, but you would still want to make the effects of this as limited as possible.
--Alex
On Thu, Sep 17, 2015 at 10:28 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > Please, can you explain what is wrong on using of shm_mq? It works pretty > well now. > > There can be a contra argument, why don't use shm_mq, because it does data > exchange between processes and this patch introduce new pattern for same > thing. Sure, I can explain that. First, when you communication through a shm_mq, the writer has to wait when the queue is full, and the reader has to wait when the queue is empty. If the message is short enough to fit in the queue, then you can send it and be done without blocking. But if it is long, then you will have each process repeatedly waiting for the other one until the whole message has been sent. That is going to make sending the message potentially a lot slower than writing it all in one go, especially if the system is under heavy load. Also, if there are any bugs in the way the shm_mq is being used, they're likely to be quite rare and hard to find, because the vast majority of messages will probably be short enough to be sent in a single chunk, so whatever bugs may exist when the processes play ping-ping are unlikely to occur in practice except in unusual cases where the message being returned is very long. Second, using a shm_mq manipulates the state of the process latch. I don't think you can make the assumption that it's safe to reset the process latch at any and every place where we check for interrupts. For example, suppose the process is already using a shm_mq and the CHECK_FOR_INTERRUPTS() call inside that code then discovers that somebody has activated this mechanism and you now go try to send and receive from a new shm_mq. But even if that and every other CHECK_FOR_INTERRUPTS() in the code can tolerate a process latch reset today, it's a new coding rule that could easily trip people up in the future. Using a shm_mq is appropriate when the amount of data that needs to be transmitted might be very large - too large to just allocate a buffer for the whole thing - or when the amount of data can't be predicted before memory is allocated. But there is obviously no rule that a shm_mq should be used any time we have "data exchange between processes"; we have lots of shared-memory based IPC already and have for many years, and shm_mq is newer than the vast majority of that code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Sep 17, 2015 at 10:28 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Please, can you explain what is wrong on using of shm_mq? It works pretty
> well now.
>
> There can be a contra argument, why don't use shm_mq, because it does data
> exchange between processes and this patch introduce new pattern for same
> thing.
Sure, I can explain that.
First, when you communicate through a shm_mq, the writer has to wait
when the queue is full, and the reader has to wait when the queue is
empty. If the message is short enough to fit in the queue, then you
can send it and be done without blocking. But if it is long, then you
will have each process repeatedly waiting for the other one until the
whole message has been sent. That is going to make sending the
message potentially a lot slower than writing it all in one go,
especially if the system is under heavy load.
Also, if there are any bugs in the way the shm_mq is being used,
they're likely to be quite rare and hard to find, because the vast
majority of messages will probably be short enough to be sent in a
single chunk, so whatever bugs may exist when the processes play
ping-pong are unlikely to occur in practice except in unusual cases
where the message being returned is very long.
Second, using a shm_mq manipulates the state of the process latch. I
don't think you can make the assumption that it's safe to reset the
process latch at any and every place where we check for interrupts.
For example, suppose the process is already using a shm_mq and the
CHECK_FOR_INTERRUPTS() call inside that code then discovers that
somebody has activated this mechanism and you now go try to send and
receive from a new shm_mq. But even if that and every other
CHECK_FOR_INTERRUPTS() in the code can tolerate a process latch reset
today, it's a new coding rule that could easily trip people up in the
future.
Using a shm_mq is appropriate when the amount of data that needs to be
transmitted might be very large - too large to just allocate a buffer
for the whole thing - or when the amount of data can't be predicted
before memory is allocated. But there is obviously no rule that a
shm_mq should be used any time we have "data exchange between
processes"; we have lots of shared-memory based IPC already and have
for many years, and shm_mq is newer than the vast majority of that
code.
On Wed, Sep 16, 2015 at 10:31 AM, Shulgin, Oleksandr
<oleksandr.shulgin@zalando.de> wrote:
> I've also decided we really ought to suppress any possible ERROR level
> messages generated during the course of processing the status request in
> order not to prevent the originally running transaction to complete. The
> errors so caught are just logged using LOG level and ignored in this new
> version of the patch.
This plan has almost zero chance of surviving committer-level
scrutiny, and if it somehow does, some other committer will scream
bloody murder as soon as they notice it.
It's not safe to just catch an error and continue on executing. You
have to abort the (sub)transaction once an error happens.
Of course, this gets at a pretty crucial design question for this
patch, which is whether it's OK for one process to interrogate another
process's state, and what the consequences of that might be. What
permissions are needed?
Can you DOS the guy running a query by
pinging him for status at a very high rate?
The other question here is whether it's really safe to do this.
ProcessInterrupts() gets called at lots of places in the code, and we
may freely add more in the future. How are you going to prove that
ProcessCmdStatusInfoRequest() is safe to call in all of those places?
How do you know that some data structure you find by chasing down from
ActivePortal or current_query_stack doesn't have a NULL pointer
someplace that you don't expect, because the code that initializes
that pointer hasn't run yet?
I'd like to see this made to work and be safe, but I think proving
that it's truly robust in all cases is going to be hard.
2015-09-17 16:46 GMT+02:00 Robert Haas <robertmhaas@gmail.com>:Second, using a shm_mq manipulates the state of the process latch. I
don't think you can make the assumption that it's safe to reset the
process latch at any and every place where we check for interrupts.
For example, suppose the process is already using a shm_mq and the
CHECK_FOR_INTERRUPTS() call inside that code then discovers that
somebody has activated this mechanism and you now go try to send and
receive from a new shm_mq. But even if that and every other
CHECK_FOR_INTERRUPTS() in the code can tolerate a process latch reset
today, it's a new coding rule that could easily trip people up in the
future.It is valid, and probably most important. But if we introduce own mechanism, we will play with process latch too (although we can use LWlocks)
On Thu, Sep 17, 2015 at 11:16 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > Is there some risk if we take too much large DSM segment? And what is max > size of DSM segment? When we use shm_mq, we don't need to solve limits. I can't really think you are going to have a problem. How much data do you really intend to send back? Surely it's going to be <100kB. If you think it's not a problem to have a running query stop and send a gigabyte of data someplace anytime somebody asks, well, I don't think I agree. >> Also, if there are any bugs in the way the shm_mq is being used, >> they're likely to be quite rare and hard to find, because the vast >> majority of messages will probably be short enough to be sent in a >> single chunk, so whatever bugs may exist when the processes play >> ping-ping are unlikely to occur in practice except in unusual cases >> where the message being returned is very long. > > This is true for any functionality based on shm_mq - parallel seq scan, Parallel sequential scan is likely to put a lot more data through a shm_mq than you would for this. >> Second, using a shm_mq manipulates the state of the process latch. I >> don't think you can make the assumption that it's safe to reset the >> process latch at any and every place where we check for interrupts. >> For example, suppose the process is already using a shm_mq and the >> CHECK_FOR_INTERRUPTS() call inside that code then discovers that >> somebody has activated this mechanism and you now go try to send and >> receive from a new shm_mq. But even if that and every other >> CHECK_FOR_INTERRUPTS() in the code can tolerate a process latch reset >> today, it's a new coding rule that could easily trip people up in the >> future. > > It is valid, and probably most important. But if we introduce own mechanism, > we will play with process latch too (although we can use LWlocks) With the design I proposed, there is zero need to touch the process latch, which is good, because I'm pretty sure that is going to be a problem. I don't think there is any need to use LWLocks here either. When you get a request for data, you can just publish a DSM segment with the data and that's it. Why do you need anything more? You could set the requestor's latch if it's convenient; that wouldn't be a problem. But the process supplying the data can't end up in a different state than it was before supplying that data, or stuff WILL break. >> Using a shm_mq is appropriate when the amount of data that needs to be >> transmitted might be very large - too large to just allocate a buffer >> for the whole thing - or when the amount of data can't be predicted >> before memory is allocated. But there is obviously no rule that a >> shm_mq should be used any time we have "data exchange between >> processes"; we have lots of shared-memory based IPC already and have >> for many years, and shm_mq is newer than the vast majority of that >> code. > > I am little bit disappointed - I hoped so shm_mq can be used as generic > interprocess mechanism - that will ensure all corner cases for work with > shared memory. I understand to shm_mq is new, and nobody used it, but this > risk is better than invent wheels again and again. shm_mq is useful, but if you insist on using a complicated tool when a simple one is plenty sufficient, you may not get the results you're hoping for. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2015-09-17 16:06 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:On Thu, Sep 17, 2015 at 12:06 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:That won't work really well with something like I use to do when testing this patch, namely:postgres=# select pid, array(select pg_cmdstatus(pid, 1, 10)) from pg_stat_activity where pid<>pg_backend_pid() \watch 1while also running pgbench with -C option (to create new connection for every transaction). When a targeted backend exits before it can handle the signal, the receiving process keeps waiting forever.no - every timeout you have to check, if targeted backend is living still, if not you will do cancel. It is 100% safe.But then you need to make this internal timeout rather short, not 1s as originally suggested.can be - 1 sec is max, maybe 100ms is optimum.The statement_timeout in this case will stop the whole select, not just individual function call. Unless you wrap it to set statement_timeout and catch QUERY_CANCELED in plpgsql, but then you won't be able to ^C the whole select. The ability to set a (short) timeout for the function itself proves to be a really useful feature to me.you cannot to handle QUERY_CANCELED in plpgsql.Well, you can but its not that useful of course:hmm, some is wrong - I remember from some older plpgsql, so CANCEL message is not catchable. Maybe I have bad memory. I have to recheck it.
On Thu, Sep 17, 2015 at 11:16 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Is there some risk if we take too much large DSM segment? And what is max
> size of DSM segment? When we use shm_mq, we don't need to solve limits.
I can't really think you are going to have a problem. How much data
do you really intend to send back? Surely it's going to be <100kB.
If you think it's not a problem to have a running query stop and send
a gigabyte of data someplace anytime somebody asks, well, I don't
think I agree.
>> Also, if there are any bugs in the way the shm_mq is being used,
>> they're likely to be quite rare and hard to find, because the vast
>> majority of messages will probably be short enough to be sent in a
>> single chunk, so whatever bugs may exist when the processes play
>> ping-ping are unlikely to occur in practice except in unusual cases
>> where the message being returned is very long.
>
> This is true for any functionality based on shm_mq - parallel seq scan,
Parallel sequential scan is likely to put a lot more data through a
shm_mq than you would for this.
>> Second, using a shm_mq manipulates the state of the process latch. I
>> don't think you can make the assumption that it's safe to reset the
>> process latch at any and every place where we check for interrupts.
>> For example, suppose the process is already using a shm_mq and the
>> CHECK_FOR_INTERRUPTS() call inside that code then discovers that
>> somebody has activated this mechanism and you now go try to send and
>> receive from a new shm_mq. But even if that and every other
>> CHECK_FOR_INTERRUPTS() in the code can tolerate a process latch reset
>> today, it's a new coding rule that could easily trip people up in the
>> future.
>
> It is valid, and probably most important. But if we introduce own mechanism,
> we will play with process latch too (although we can use LWlocks)
With the design I proposed, there is zero need to touch the process
latch, which is good, because I'm pretty sure that is going to be a
problem. I don't think there is any need to use LWLocks here either.
When you get a request for data, you can just publish a DSM segment
with the data and that's it. Why do you need anything more? You
could set the requestor's latch if it's convenient; that wouldn't be a
problem. But the process supplying the data can't end up in a
different state than it was before supplying that data, or stuff WILL
break.
>> Using a shm_mq is appropriate when the amount of data that needs to be
>> transmitted might be very large - too large to just allocate a buffer
>> for the whole thing - or when the amount of data can't be predicted
>> before memory is allocated. But there is obviously no rule that a
>> shm_mq should be used any time we have "data exchange between
>> processes"; we have lots of shared-memory based IPC already and have
>> for many years, and shm_mq is newer than the vast majority of that
>> code.
>
> I am little bit disappointed - I hoped so shm_mq can be used as generic
> interprocess mechanism - that will ensure all corner cases for work with
> shared memory. I understand to shm_mq is new, and nobody used it, but this
> risk is better than invent wheels again and again.
shm_mq is useful, but if you insist on using a complicated tool when a
simple one is plenty sufficient, you may not get the results you're
hoping for.
On Thu, Sep 17, 2015 at 11:16 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Second, using a shm_mq manipulates the state of the process latch. I
>> don't think you can make the assumption that it's safe to reset the
>> process latch at any and every place where we check for interrupts.
>> For example, suppose the process is already using a shm_mq and the
>> CHECK_FOR_INTERRUPTS() call inside that code then discovers that
>> somebody has activated this mechanism and you now go try to send and
>> receive from a new shm_mq. But even if that and every other
>> CHECK_FOR_INTERRUPTS() in the code can tolerate a process latch reset
>> today, it's a new coding rule that could easily trip people up in the
>> future.
>
> It is valid, and probably most important. But if we introduce own mechanism,
> we will play with process latch too (although we can use LWlocks)
With the design I proposed, there is zero need to touch the process
latch, which is good, because I'm pretty sure that is going to be a
problem. I don't think there is any need to use LWLocks here either.
When you get a request for data, you can just publish a DSM segment
with the data and that's it. Why do you need anything more? You
could set the requestor's latch if it's convenient; that wouldn't be a
problem. But the process supplying the data can't end up in a
different state than it was before supplying that data, or stuff WILL
break.
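To make that shape concrete - a sketch of the responding side only, with CmdStatusResult, CMD_STATUS_OK, slot and requestor invented for illustration; the requester would attach via the published handle, copy the string out, and only then could the creator let go of its own mapping:

typedef struct
{
    int     result_code;
    char    data[FLEXIBLE_ARRAY_MEMBER];        /* the whole plan text */
} CmdStatusResult;

/* inside the signalled backend, once plan_text has been built */
Size             len = strlen(plan_text) + 1;
dsm_segment     *seg = dsm_create(offsetof(CmdStatusResult, data) + len, 0);
CmdStatusResult *res = dsm_segment_address(seg);

res->result_code = CMD_STATUS_OK;
memcpy(res->data, plan_text, len);

/* publish the handle where the requester can find it, and wake it up */
slot->handle = dsm_segment_handle(seg);
SetLatch(&requestor->procLatch);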
On Thu, Sep 17, 2015 at 10:13 PM, Robert Haas <robertmhaas@gmail.com> wrote:On Thu, Sep 17, 2015 at 11:16 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Second, using a shm_mq manipulates the state of the process latch. I
>> don't think you can make the assumption that it's safe to reset the
>> process latch at any and every place where we check for interrupts.
>> For example, suppose the process is already using a shm_mq and the
>> CHECK_FOR_INTERRUPTS() call inside that code then discovers that
>> somebody has activated this mechanism and you now go try to send and
>> receive from a new shm_mq. But even if that and every other
>> CHECK_FOR_INTERRUPTS() in the code can tolerate a process latch reset
>> today, it's a new coding rule that could easily trip people up in the
>> future.
>
> It is valid, and probably most important. But if we introduce own mechanism,
> we will play with process latch too (although we can use LWlocks)
With the design I proposed, there is zero need to touch the process
latch, which is good, because I'm pretty sure that is going to be a
problem. I don't think there is any need to use LWLocks here either.
When you get a request for data, you can just publish a DSM segment
with the data and that's it. Why do you need anything more? You
could set the requestor's latch if it's convenient; that wouldn't be a
problem. But the process supplying the data can't end up in a
different state than it was before supplying that data, or stuff WILL
break.

There is still the whole problem of where exactly the backend being queried for the status should publish that DSM segment and when to free it?

If it's a location shared between all backends, there should be locking around it. Probably this is not a big problem, if you don't expect all the backends start querying each other rapidly. That is how it was implemented in the first versions of this patch actually.

If we take the per-backend slot approach the locking seems unnecessary and there are principally two options:

1) The backend puts the DSM handle in its own slot and notifies the requester to read it.
2) The backend puts the DSM handle in the slot of the requester (and notifies it).

If we go with the first option, the backend that has created the DSM will not know when it's OK to free it, so this has to be responsibility of the requester. If the latter exits before reading and freeing the DSM, we have a leak. Even bigger is the problem that the sender backend can no longer send responses to a number of concurrent requestors: if its slot is occupied by a DSM handle, it can not send a reply to another backend until the slot is freed.

With the second option we have all the same problems with not knowing when to free the DSM and potentially leaking it, but we can handle concurrent requests.
The current approach, where the requester creates and frees the DSM, doesn't suffer from these problems, so if we pre-allocate a segment just big enough we can avoid the use of shm_mq. That will take another GUC for the segment size. Certainly no one expects a query plan to weigh a bloody megabyte, but that is apparently what happens in Pavel's case.
--Alex
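To make that "requester creates and frees" flow a bit more concrete, here is a minimal sketch of the requester side using the 9.5-era DSM and latch APIs. The slot/signalling call ask_backend_for_plan() and the 5-second timeout are invented for illustration; error handling and the timed-out case are omitted.

#include "postgres.h"
#include "miscadmin.h"
#include "storage/dsm.h"
#include "storage/latch.h"
#include "storage/proc.h"

/* hypothetical: publish the handle in the target's slot and signal it */
extern void ask_backend_for_plan(PGPROC *target, dsm_handle handle);

static char *
request_plan(PGPROC *target, Size bufsize)
{
	dsm_segment *seg = dsm_create(bufsize, 0);
	char	   *result;

	ask_backend_for_plan(target, dsm_segment_handle(seg));

	/* wait for the queried backend to fill the segment and wake us */
	WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH, 5000L);
	ResetLatch(MyLatch);

	/* the requester created the segment, so the requester frees it */
	result = pstrdup((char *) dsm_segment_address(seg));
	dsm_detach(seg);
	return result;
}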
2015-09-18 10:59 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:

If we take the per-backend slot approach the locking seems unnecessary and there are principally two options:
1) The backend puts the DSM handle in its own slot and notifies the requester to read it.
2) The backend puts the DSM handle in the slot of the requester (and notifies it).
If we go with the first option, the backend that has created the DSM will not know when it's OK to free it, so this has to be responsibility of the requester. If the latter exits before reading and freeing the DSM, we have a leak. Even bigger is the problem that the sender backend can no longer send responses to a number of concurrent requestors: if its slot is occupied by a DSM handle, it can not send a reply to another backend until the slot is freed.
With the second option we have all the same problems with not knowing when to free the DSM and potentially leaking it, but we can handle concurrent requests.

That should not be true: the data sender creates the DSM and fills it, then sets the caller's slot and sends a signal to the caller. The caller can free the DSM at any time, because the data sender will never touch it again.
On Fri, Sep 18, 2015 at 11:25 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2015-09-18 10:59 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
If we take the per-backend slot approach the locking seems unnecessary and there are principally two options:
1) The backend puts the DSM handle in its own slot and notifies the requester to read it.
2) The backend puts the DSM handle in the slot of the requester (and notifies it).
If we go with the first option, the backend that has created the DSM will not know when it's OK to free it, so this has to be responsibility of the requester. If the latter exits before reading and freeing the DSM, we have a leak. Even bigger is the problem that the sender backend can no longer send responses to a number of concurrent requestors: if its slot is occupied by a DSM handle, it can not send a reply to another backend until the slot is freed.
With the second option we have all the same problems with not knowing when to free the DSM and potentially leaking it, but we can handle concurrent requests.

That should not be true: the data sender creates the DSM and fills it, then sets the caller's slot and sends a signal to the caller. The caller can free the DSM at any time, because the data sender will never touch it again.

But the requester can timeout on waiting for reply and exit before it sees the reply DSM. Actually, I now don't even think a backend can free the DSM it has not created. First it will need to attach it, effectively increasing the refcount, and upon detach it will only decrease the refcount, but not actually release the segment...
So this has to be the responsibility of the reply sending backend in the end: to create and release the DSM *at some point*.--Alex
2015-09-18 12:05 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:

On Fri, Sep 18, 2015 at 11:25 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
That should not be true: the data sender creates the DSM and fills it, then sets the caller's slot and sends a signal to the caller. The caller can free the DSM at any time, because the data sender will never touch it again.

But the requester can timeout on waiting for reply and exit before it sees the reply DSM. Actually, I now don't even think a backend can free the DSM it has not created. First it will need to attach it, effectively increasing the refcount, and upon detach it will only decrease the refcount, but not actually release the segment...

I am afraid there is no simple and nice solution: if the data sender has to wait until the data is received, we end up with the same complexity as with shm_mq.

Wouldn't it be better to introduce a new background worker responsible for cleaning up orphaned DSM segments?
On Fri, Sep 18, 2015 at 12:59 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:

2015-09-18 12:05 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
On Fri, Sep 18, 2015 at 11:25 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
That should not be true: the data sender creates the DSM and fills it, then sets the caller's slot and sends a signal to the caller. The caller can free the DSM at any time, because the data sender will never touch it again.
But the requester can timeout on waiting for reply and exit before it sees the reply DSM. Actually, I now don't even think a backend can free the DSM it has not created. First it will need to attach it, effectively increasing the refcount, and upon detach it will only decrease the refcount, but not actually release the segment...
I am afraid there is no simple and nice solution: if the data sender has to wait until the data is received, we end up with the same complexity as with shm_mq.
Wouldn't it be better to introduce a new background worker responsible for cleaning up orphaned DSM segments?

I'm not thrilled by this idea.

We don't even need a worker to do that, as leaked segments are reported by the backend itself upon transaction commit (see ResourceOwnerReleaseInternal), e.g.:

WARNING: dynamic shared memory leak: segment 808539725 still referenced

Still, I believe relying on some magic cleanup mechanism and not caring about managing the shared memory properly is not the way to go.
--Alex
On Thu, Sep 17, 2015 at 12:47 PM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote: > But if we make the sender backend create the DSM with the response payload, > it suddenly becomes really unclear at which point and who should make the > final detach of that DSM. We're getting back to the problem of > synchronization between these processes all over again. Yes, some thought is needed here. There's not really a problem if somebody asks for information just once: you could just have the publishing process keep it attached until process exit, and nothing bad would happen. The real problem comes when multiple people ask for information in quick succession: if you detach the old segment too quickly after publishing it, the guy who requested that data might not have attached it yet, and then when he gets around to looking at it, it won't be there. This seems like a pretty solvable problem, though. For example, when somebody asks for information, they store in the main shared segment a pointer to their PGPROC and then they signal the target process, which responds by publishing a dsm_segment_id. When the requesting process has accessed it, or when it errors out or exits, it clears the PGPROC and the dsm_segment_id. The publisher avoids unmapping it until that happens. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
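One possible shape for the bookkeeping Robert describes, assuming a single outstanding request per queried backend. The struct and its field names are invented for illustration; PGPROC, dsm_handle and slock_t are real backend types.

#include "postgres.h"
#include "storage/dsm.h"
#include "storage/proc.h"
#include "storage/spin.h"

/*
 * One slot per backend in the main shared memory segment.  The requester
 * fills in "requestor" and signals the target; the target publishes
 * "response" and keeps its segment mapped; the requester clears both
 * fields once it has attached (or when it errors out or exits), which
 * tells the publisher it may unmap the segment.
 */
typedef struct PlanRequestSlot
{
	slock_t		mutex;			/* protects the fields below */
	PGPROC	   *requestor;		/* who is asking, NULL if nobody */
	dsm_handle	response;		/* published plan, 0 if not ready yet */
} PlanRequestSlot;

static PlanRequestSlot *request_slots;	/* MaxBackends entries */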
On Fri, Sep 18, 2015 at 6:59 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > I am afraid so it has not simple and nice solution - when data sender will > wait for to moment when data are received, then we have same complexity like > we use shm_mq. > > Isn't better to introduce new background worker with responsibility to clean > orphaned DSM? That won't work, or at least not easily. On Windows, the DSM is cleaned up by the operating system as soon as nobody has it mapped. Frankly, I think you guys are making this out to be way more complicated than it really is. Basically, I think the process being queried should publish a DSM via a slot it owns. The recipient is responsible for reading it and then notifying the sender. If a second process requests data before the first process reads its data, the second process can either (1) become an additional reader of the already-published data or (2) wait for the first process to finish, and then send its own inquiry. There are some problems to solve here, but they hardly seem impossible. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Frankly, I think you guys are making this out to be way more
complicated than it really is. Basically, I think the process being
queried should publish a DSM via a slot it owns. The recipient is
responsible for reading it and then notifying the sender. If a second
process requests data before the first process reads its data, the
second process can either (1) become an additional reader of the
already-published data or (2) wait for the first process to finish,
and then send its own inquiry.
There are some problems to solve here, but they hardly seem impossible.
The greatest part for me, is that this approach doesn't require handling of signals and is isolated in an external module, so it can be readily used with the current server versions, no need to wait for >= 9.6.
Attachment
Some problems:

There is a potential problem with the limited total number of DSM segments: it is reserved in a way to only allow 2 per backend (on average) and 64 additional per server, so if you run with the new option enabled at all times, you're down to only 1 additional DSM per backend (again, on average). Not sure how critical this can be, but no one is forced to run with this option enabled for all backends.

If you don't want to run it enabled at all times, then enabling the GUC per-backend can be problematic. It's still possible to update the conf file and send SIGHUP to a single backend, but it's harder to accomplish over psql, for example. I think here we might still have some luck with the signals: use another array of per-backend slots with flags, set the target backend's flag and send it SIGUSR1. The backend wakes on the signal and examines its slot, then toggles the GUC if needed. Sounds pretty safe, eh?

No documentation changes yet, waiting for your comments. :-)
Happy hacking!--Alex
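A rough sketch of that flag-plus-SIGUSR1 idea, deliberately doing nothing in the handler beyond setting a flag. How such a handler would coexist with the core use of SIGUSR1 is glossed over here, and the slot array, handler registration and function names are all hypothetical.

#include "postgres.h"
#include "miscadmin.h"
#include "utils/guc.h"

static volatile sig_atomic_t toggle_pending = false;
static bool *toggle_flags;		/* MaxBackends entries in shared memory */

/* async-signal-safe: only set a flag, examine it later */
static void
publish_plans_sigusr1_handler(SIGNAL_ARGS)
{
	toggle_pending = true;
}

/* called from a safe place, e.g. at the top of an executor hook */
static void
maybe_toggle_publishing(void)
{
	if (toggle_pending && toggle_flags[MyBackendId - 1])
	{
		toggle_pending = false;
		toggle_flags[MyBackendId - 1] = false;
		SetConfigOption("auto_explain.publish_plans", "on",
						PGC_SUSET, PGC_S_SESSION);
	}
}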
Preparing the content before execution is an interesting idea that could be used more widely. Most queries and plans are not too big, so when the content is small - less than 1MB, say - one DSM segment could be shared by all backends.
When the content is bigger than that limit, a DSM segment would be allocated specifically for it. The pointer to the DSM and the offset can be stored in the requested process's slot. Reading and writing the requested slot should be protected by a spinlock, but that would block only the two processes involved, and only for the short time needed to copy the memory.
Another possibility is to publish the size of the content in the requested process's slot, so the requester can preallocate a large enough piece of shared memory. But this doesn't solve the issues related to the requester waiting for the content. So the first variant is pretty simple and should be preferred. The disadvantage is clear - it may impose a significant slowdown on fast queries.
Alex
On Sun, Sep 27, 2015 at 8:05 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:the preparing of content before execution is interesting idea, that can be used more. The almost queries and plans are not too big, so when the size of content is not too big - less than 1MB, then can be used one DSM for all backends.When size of content is bigger than limit, then DSM will be allocated specially for this content. The pointer to DSM and offset can be stored in requested process slot. The reading and writing to requested slot should be protected by spinlock, but it should block only two related processes for short time (copy memory).Sorry, I don't think this will fly.The whole idea is that a backend publishes the plan of a query just before running it and it doesn't care which other backend(s) might be reading it, how many times and in which order. The only required locking (implicit) is contained in the code for dsm_attach/detach().
Other possibility is showing the size of content in requested process slot. Then the requester can preallocate enough size of shared memory. But this doesn't solve a issues related to waiting requester for content. So first variant is pretty simple, and should be preferred. The disadvantage is clear - it can enforce maybe significant slowdown of fast queries.Both of these approaches have just too many synchronization problems, IMO.--
Alex
2015-09-28 12:01 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:

On Sun, Sep 27, 2015 at 8:05 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
the preparing of content before execution is interesting idea, that can be used more. The almost queries and plans are not too big, so when the size of content is not too big - less than 1MB, then can be used one DSM for all backends.
When size of content is bigger than limit, then DSM will be allocated specially for this content. The pointer to DSM and offset can be stored in requested process slot. The reading and writing to requested slot should be protected by spinlock, but it should block only two related processes for short time (copy memory).

Sorry, I don't think this will fly.

The whole idea is that a backend publishes the plan of a query just before running it and it doesn't care which other backend(s) might be reading it, how many times and in which order. The only required locking (implicit) is contained in the code for dsm_attach/detach().

I didn't propose a very different solution. There is only one difference - sharing one DSM segment for the smaller data. It is similar to using plain shared memory.
Alex
Some implementation details:

For every backend that might be running (MaxBackends) we reserve a dsm_handle slot in the addins shared memory. When the new option is turned on, the ExecutorRun hook will produce a plan in whatever format is specified by the auto_explain.log_format, allocate a DSM segment, copy the plan into the segment and finally store the DSM handle into its own slot. No locking is required around this because every backend writes to its slot exclusively; no other backend can be writing into it concurrently. In the ExecutorFinish hook we invalidate the backend's slot by setting the handle to 0 and deallocate the DSM segment.
Attachment
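An abridged sketch of the flow described above, loosely following the 9.5-era executor-hook, EXPLAIN and DSM APIs. The shmem-startup boilerplate that sets up plan_slots, error handling and auto_explain's nesting-level bookkeeping are omitted, and the variable names are illustrative rather than the patch's actual ones.

#include "postgres.h"
#include "commands/explain.h"
#include "executor/executor.h"
#include "miscadmin.h"
#include "storage/dsm.h"

static dsm_handle *plan_slots;	/* MaxBackends entries, set up at shmem startup */
static dsm_segment *my_plan_seg = NULL;
static ExecutorRun_hook_type prev_ExecutorRun = NULL;
static ExecutorFinish_hook_type prev_ExecutorFinish = NULL;

static void
publish_ExecutorRun(QueryDesc *queryDesc, ScanDirection direction, long count)
{
	ExplainState *es = NewExplainState();

	/* render the plan much like auto_explain would */
	es->format = EXPLAIN_FORMAT_TEXT;
	ExplainBeginOutput(es);
	ExplainPrintPlan(es, queryDesc);
	ExplainEndOutput(es);

	/* publish it: one segment per backend, handle goes into our own slot */
	my_plan_seg = dsm_create(es->str->len + 1, 0);
	memcpy(dsm_segment_address(my_plan_seg), es->str->data, es->str->len + 1);
	plan_slots[MyBackendId - 1] = dsm_segment_handle(my_plan_seg);

	if (prev_ExecutorRun)
		prev_ExecutorRun(queryDesc, direction, count);
	else
		standard_ExecutorRun(queryDesc, direction, count);
}

static void
publish_ExecutorFinish(QueryDesc *queryDesc)
{
	/* invalidate our slot (0 = no plan published) and drop the segment */
	plan_slots[MyBackendId - 1] = 0;
	if (my_plan_seg != NULL)
	{
		dsm_detach(my_plan_seg);
		my_plan_seg = NULL;
	}

	if (prev_ExecutorFinish)
		prev_ExecutorFinish(queryDesc);
	else
		standard_ExecutorFinish(queryDesc);
}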
On Mon, Sep 28, 2015 at 12:05 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:2015-09-28 12:01 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:On Sun, Sep 27, 2015 at 8:05 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:the preparing of content before execution is interesting idea, that can be used more. The almost queries and plans are not too big, so when the size of content is not too big - less than 1MB, then can be used one DSM for all backends.When size of content is bigger than limit, then DSM will be allocated specially for this content. The pointer to DSM and offset can be stored in requested process slot. The reading and writing to requested slot should be protected by spinlock, but it should block only two related processes for short time (copy memory).Sorry, I don't think this will fly.The whole idea is that a backend publishes the plan of a query just before running it and it doesn't care which other backend(s) might be reading it, how many times and in which order. The only required locking (implicit) is contained in the code for dsm_attach/detach().I didn't propose too different solution. There is only one difference - sharing DSM for smaller data. It is similar to using usual shared memory.Does this mean implementing some sort of allocator on top of the shared memory segment? If so, how are you going to prevent fragmentation?
--
Alex
2015-09-28 12:37 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:

I didn't propose too different solution. There is only one difference - sharing DSM for smaller data. It is similar to using usual shared memory.

Does this mean implementing some sort of allocator on top of the shared memory segment? If so, how are you going to prevent fragmentation?

Yes, a simple memory allocator is necessary in this case. But it can be really simple: only fixed-size blocks - 10KB, 100KB and 1MB - are allocated, each from a separate buffer, so fragmentation is not possible.
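A minimal sketch of what such a fixed-size-block allocator could look like: one freelist per size class, carved out of the shared area at startup. All names here are made up, and the locking around the freelists is left out.

#include "postgres.h"

typedef struct FixedBlock
{
	struct FixedBlock *next;	/* freelist link while the block is unused */
	Size		used;			/* bytes of payload actually written */
	char		data[FLEXIBLE_ARRAY_MEMBER];
} FixedBlock;

/* one freelist per size class, populated once at shmem startup */
static FixedBlock *freelist_10k;
static FixedBlock *freelist_100k;
static FixedBlock *freelist_1m;

static FixedBlock *
alloc_block(Size len)
{
	FixedBlock **list;
	FixedBlock *blk;

	if (len <= 10 * 1024)
		list = &freelist_10k;
	else if (len <= 100 * 1024)
		list = &freelist_100k;
	else if (len <= 1024 * 1024)
		list = &freelist_1m;
	else
		return NULL;			/* too big: fall back to a dedicated DSM */

	if (*list == NULL)
		return NULL;			/* size class exhausted */

	blk = *list;
	*list = blk->next;
	return blk;
}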
On Mon, Sep 28, 2015 at 7:09 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:2015-09-28 12:37 GMT+02:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:I didn't propose too different solution. There is only one difference - sharing DSM for smaller data. It is similar to using usual shared memory.Does this mean implementing some sort of allocator on top of the shared memory segment? If so, how are you going to prevent fragmentation?yes, simple memory allocator is necessary in this case. But it should be really simple - you can allocate only fixed size blocks - 10KB, 100KB and 1MB from separate buffers. So the fragmentation is not possible.Maybe we're talking about completely different ideas, but I don't see how fixing the block helps to fight fragmentation, in particular. Can you sketch a simple example? E.g. 400 backends share the common segment and all of them want to publish a plan of ~1kb, for example. Now what?
--Alex
On 9/18/15 5:05 AM, Shulgin, Oleksandr wrote: > > It should not be true - the data sender create DSM and fills it. > Then set caller slot and send signal to caller. Caller can free DSM > any time, because data sender send newer touch it. > > > But the requester can timeout on waiting for reply and exit before it > sees the reply DSM. Actually, I now don't even think a backend can free > the DSM it has not created. First it will need to attach it, > effectively increasing the refcount, and upon detach it will only > decrease the refcount, but not actually release the segment... > > So this has to be the responsibility of the reply sending backend in the > end: to create and release the DSM *at some point*. What's wrong with just releasing it at the end of the statement? When the statement is done there's no point to reading it asynchronously anymore. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com
On 9/18/15 5:05 AM, Shulgin, Oleksandr wrote:
So this has to be the responsibility of the reply sending backend in the
end: to create and release the DSM *at some point*.
What's wrong with just releasing it at the end of the statement? When the statement is done there's no point to reading it asynchronously anymore.
On 2015-09-25 19:13:13 +0200, Shulgin, Oleksandr wrote: > the auto_explain contrib module. I now propose the most simple thing > possible in my opinion: a new GUC option for auto_explain. It will make > every backend, in which the module is loaded via *_preload_libraries > mechanism or using CREATE EXTENSION/LOAD commands, to actively publish the > plans of queries in dynamic shared memory as long as > auto_explain.publish_plans is turned on. Wait. You're proposing that every query does this unconditionally? For every single query? That sounds prohibitively expensive to me. Greetings, Andres Freund
On 2015-09-25 19:13:13 +0200, Shulgin, Oleksandr wrote:
> the auto_explain contrib module. I now propose the most simple thing
> possible in my opinion: a new GUC option for auto_explain. It will make
> every backend, in which the module is loaded via *_preload_libraries
> mechanism or using CREATE EXTENSION/LOAD commands, to actively publish the
> plans of queries in dynamic shared memory as long as
> auto_explain.publish_plans is turned on.
Wait. You're proposing that every query does this unconditionally? For
every single query? That sounds prohibitively expensive to me.
I now believe that the use of ProcessInterrupts() in the recently proposed design, as well as the manipulation of the proc latch due to the use of a shared memory queue, are major blockers.

To simplify things, I've taken a step back and looked again at the auto_explain contrib module. I now propose the simplest thing possible, in my opinion: a new GUC option for auto_explain. It will make every backend in which the module is loaded - via the *_preload_libraries mechanism or the CREATE EXTENSION/LOAD commands - actively publish the plans of its queries in dynamic shared memory, as long as auto_explain.publish_plans is turned on.
The greatest part for me, is that this approach doesn't require handling of signals and is isolated in an external module, so it can be readily used with the current server versions, no need to wait for >= 9.6.
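For reference, the new option itself boils down to a single custom boolean GUC; here is a sketch of how it might be registered from auto_explain's _PG_init() using the standard DefineCustomBoolVariable() API. The exact context, flags and description are guesses.

#include "postgres.h"
#include "utils/guc.h"

static bool publish_plans = false;

void
_PG_init(void)
{
	DefineCustomBoolVariable("auto_explain.publish_plans",
							 "Publish plans of currently running queries in dynamic shared memory.",
							 NULL,
							 &publish_plans,
							 false,
							 PGC_SUSET,
							 0,
							 NULL,
							 NULL,
							 NULL);
}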
On 25 September 2015 at 12:13, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote:I now believe that use of ProcessInterrupts() in the recently proposed design as well as manipulation of proc latch due to use of shared memory queue are major blockers.In order to simplify things up, I've taken a step back and looked again at the auto_explain contrib module. I now propose the most simple thing possible in my opinion: a new GUC option for auto_explain. It will make every backend, in which the module is loaded via *_preload_libraries mechanism or using CREATE EXTENSION/LOAD commands, to actively publish the plans of queries in dynamic shared memory as long as auto_explain.publish_plans is turned on.
The greatest part for me, is that this approach doesn't require handling of signals and is isolated in an external module, so it can be readily used with the current server versions, no need to wait for >= 9.6.

This is a major change of direction, so the thread title no longer applies at all.

The requirement is to be able to see the output of EXPLAIN ANALYZE for a running process. Ideally, we would dump the diagnostic output for any process that runs longer than a specific timeout, so we can decide whether to kill it, or just save that for later diagnostics.

I'm interested in helping the original goal happen. Dumping an EXPLAIN, without ANALYZE info, is not at all interesting since it has no value for immediate action or post-facto diagnosis, sorry to say - making it happen for every backend just makes it even less useful.
Hitting a process with a signal and hoping it will produce a meaningful response in all circumstances without disrupting its current task was way too naive.
On 29 September 2015 at 12:52, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote:Hitting a process with a signal and hoping it will produce a meaningful response in all circumstances without disrupting its current task was way too naive.Hmm, I would have to disagree, sorry. For me the problem was dynamically allocating everything at the time the signal is received and getting into problems when that caused errors.
* INIT - Allocate N areas of memory for use by queries, which can be expanded/contracted as needed. Keep a freelist of structures.
* OBSERVER - When requested, gain exclusive access to a diagnostic area, then allocate the designated process to that area, then send a signal
* QUERY - When signal received dump an EXPLAIN ANALYZE to the allocated diagnostic area, (set flag to show complete, set latch on observer)
* OBSERVER - process data in diagnostic area and then release area for use by next observation

If the EXPLAIN ANALYZE doesn't fit into the diagnostic chunk, LOG it as a problem and copy data only up to the size defined. Any other ERRORs that are caused by this process cause it to fail normally.
That allows the observer to be another backend, or it allows the query process to perform self-observation based upon a timeout (e.g. >1 hour) or a row limit (e.g. when an optimizer estimate is seen to be badly wrong).
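To make the shape of those diagnostic areas a bit more concrete, here is a rough sketch of the INIT/OBSERVER/QUERY flow above. The struct, the DIAG_AREA_SIZE constant and the protocol comments are invented for illustration; only slock_t, PGPROC and the latch machinery are real backend primitives.

#include "postgres.h"
#include "storage/proc.h"
#include "storage/spin.h"

#define DIAG_AREA_SIZE	(16 * 1024)		/* hypothetical fixed chunk size */

typedef struct DiagnosticArea
{
	slock_t		mutex;			/* guards assignment of the area */
	int			target_pid;		/* backend being observed, 0 = area is free */
	PGPROC	   *observer;		/* whose latch to set when the dump is done */
	bool		complete;		/* set by the query process when data is ready */
	Size		used;			/* bytes written, possibly truncated */
	char		data[DIAG_AREA_SIZE];
} DiagnosticArea;

/*
 * OBSERVER: claim a free area under the spinlock, record target_pid and
 *           its own PGPROC, then signal the target backend.
 * QUERY:    on seeing the request, dump EXPLAIN ANALYZE into data (or as
 *           much of it as fits), set complete and the observer's latch.
 * OBSERVER: copy the data out, then mark the area free for the next use.
 */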
On Tue, Sep 29, 2015 at 8:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 29 September 2015 at 12:52, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote:
Hitting a process with a signal and hoping it will produce a meaningful response in all circumstances without disrupting its current task was way too naive.

Hmm, I would have to disagree, sorry. For me the problem was dynamically allocating everything at the time the signal is received and getting into problems when that caused errors.

What I mean is that we need to move the actual EXPLAIN run out of ProcessInterrupts(). It can still be fine to trigger the communication with a signal.
* INIT - Allocate N areas of memory for use by queries, which can be expanded/contracted as needed. Keep a freelist of structures.
* OBSERVER - When requested, gain exclusive access to a diagnostic area, then allocate the designated process to that area, then send a signal
* QUERY - When signal received dump an EXPLAIN ANALYZE to the allocated diagnostic area, (set flag to show complete, set latch on observer)
* OBSERVER - process data in diagnostic area and then release area for use by next observation
If the EXPLAIN ANALYZE doesn't fit into the diagnostic chunk, LOG it as a problem and copy data only up to the size defined. Any other ERRORs that are caused by this process cause it to fail normally.

Do you envision problems if we do this with a newly allocated DSM every time instead of a pre-allocated area? This would reverse the workflow, because only the QUERY knows the required segment size:

OBSERVER - sends a signal and waits for its proc latch to be set
QUERY - when the signal is received, allocates a DSM just big enough to fit the EXPLAIN plan, then locates the OBSERVER(s) and sets its latch (or their latches)

The EXPLAIN plan should already be produced somewhere in the executor, to avoid calling into explain.c from ProcessInterrupts().

That allows the observer to be another backend, or it allows the query process to perform self-observation based upon a timeout (e.g. >1 hour) or a row limit (e.g. when an optimizer estimate is seen to be badly wrong).

Do you think there is one single best place in the executor code where such a check could be added? I have very little idea about that.
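A small sketch of the QUERY side of that variant: the segment is sized to the already-rendered plan text, the handle is published, and the observer's latch is set. publish_handle_for() stands in for whatever slot mechanism ends up being used and is not a real API; when the publishing backend may detach the segment is exactly the open question discussed in the surrounding mails.

#include "postgres.h"
#include "storage/dsm.h"
#include "storage/latch.h"
#include "storage/proc.h"

/* hypothetical: place the handle where the observer will look for it */
extern void publish_handle_for(PGPROC *observer, dsm_handle handle);

static void
answer_plan_request(const char *plan_text, PGPROC *observer)
{
	Size		len = strlen(plan_text) + 1;
	dsm_segment *seg = dsm_create(len, 0);

	memcpy(dsm_segment_address(seg), plan_text, len);
	publish_handle_for(observer, dsm_segment_handle(seg));
	SetLatch(&observer->procLatch);
}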
On Tue, Sep 29, 2015 at 1:52 PM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote: > This is not a change of the direction, but rather of the approach. Hitting > a process with a signal and hoping it will produce a meaningful response in > all circumstances without disrupting its current task was way too naive. I think that's exactly right. It's not safe to do much anything from a signal handler, and while ProcessInterrupts() is a very substantially safer, the set of things that can be safely done there is still awfully restricted. You have to cope with the fact that the function you just interrupted may be doing anything at all, and if you change anything before returning, you may knock over the apple cart. Just as bad, the state you inherit may not be very sane: we call ProcessInterrupts() from LOTS of places and there is absolutely no guarantee that every one of those places has the data structures that you need to run EXPLAIN ANALYZE in a sane state. I haven't looked at the new patch specifically so I don't have an opinion on that at this time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
This is not a change of the direction, but rather of the approach. Hitting a process with a signal and hoping it will produce a meaningful response in all circumstances without disrupting its current task was way too naive. I'd really want to make this work with ANALYZE, just not as the first step. I believe this approach can be extended to include instrumentation support (obviously we will not be able to contain this in the auto_explain module).
"Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de> writes: > I was thinking about this and what seems to be the biggest problem is when > to actually turn the feature on. It seems unlikely that someone will want > to enable it unconditionally. Enabling per-backend also doesn't seem to be > a good approach because you don't know if the next query you'd like to look > at is going to run in this exact backend. Check. > What might be actually usable is poking pg_stat_statements for queryid to > decide if we need to do explain (and possibly analyze). Hm, interesting thought. > Does this make sense to you? Does this make a good argument for merging > pg_stat_statements and auto_explain into core? I'd say more that it's a good argument for moving this feature out to one of those extensions, or perhaps building a third extension that depends on both of those. TBH, none of this stuff smells to me like something that ought to be in core. regards, tom lane
"Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de> writes:
> I was thinking about this and what seems to be the biggest problem is when
> to actually turn the feature on. It seems unlikely that someone will want
> to enable it unconditionally. Enabling per-backend also doesn't seem to be
> a good approach because you don't know if the next query you'd like to look
> at is going to run in this exact backend.
Check.
> What might be actually usable is poking pg_stat_statements for queryid to
> decide if we need to do explain (and possibly analyze).
Hm, interesting thought.
> Does this make sense to you? Does this make a good argument for merging
> pg_stat_statements and auto_explain into core?
I'd say more that it's a good argument for moving this feature out to
one of those extensions, or perhaps building a third extension that
depends on both of those. TBH, none of this stuff smells to me like
something that ought to be in core.
regards, tom lane
Hello, On 15/10/2015 16:04, Pavel Stehule wrote: > > > 2015-10-15 15:42 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us > <mailto:tgl@sss.pgh.pa.us>>: > > "Shulgin, Oleksandr" <oleksandr.shulgin@zalando.de > <mailto:oleksandr.shulgin@zalando.de>> writes: > > I was thinking about this and what seems to be the biggest problem is when > > to actually turn the feature on. It seems unlikely that someone will want > > to enable it unconditionally. Enabling per-backend also doesn't seem to be > > a good approach because you don't know if the next query you'd like to look > > at is going to run in this exact backend. > > Check. > > > What might be actually usable is poking pg_stat_statements for queryid to > > decide if we need to do explain (and possibly analyze). > > Hm, interesting thought. > > > Does this make sense to you? Does this make a good argument for merging > > pg_stat_statements and auto_explain into core? > > I'd say more that it's a good argument for moving this feature out to > one of those extensions, or perhaps building a third extension that > depends on both of those. TBH, none of this stuff smells to me like > something that ought to be in core. > > > There are few features, that I would to see in core: > > 1. possibility to get full SQL string > 2. possibility to get state string > > We can speak how to do it well. > I registered as reviewer on this, but after reading the whole thread for the second time, it's still not clear to me if the last two submitted patches (0001-Add-auto_explain.publish_plans.patch and 0002-Add-SHM-table-of-contents-to-the-explain-DSM.patch) are still needed reviews, since multiple refactoring ideas and objections have been raised since. Regards. -- Julien Rouhaud http://dalibo.com - http://dalibo.org
I registered as reviewer on this, but after reading the whole thread for
the second time, it's still not clear to me if the last two submitted
patches (0001-Add-auto_explain.publish_plans.patch and
0002-Add-SHM-table-of-contents-to-the-explain-DSM.patch) are still
needed reviews, since multiple refactoring ideas and objections have
been raised since.
On 30 November 2015 at 22:27, Julien Rouhaud <julien.rouhaud@dalibo.com> wrote:I registered as reviewer on this, but after reading the whole thread for
the second time, it's still not clear to me if the last two submitted
patches (0001-Add-auto_explain.publish_plans.patch and
0002-Add-SHM-table-of-contents-to-the-explain-DSM.patch) are still
needed reviews, since multiple refactoring ideas and objections have
been raised since.I looked into it more deeply and decided partial EXPLAINs aren't very useful yet.I've got some thoughts around that which I'll publish when I have more time. I suggest this is a good idea, just needs some serious committer-love, so we should bounce this for now.
Hi, On 12/01/2015 10:34 AM, Shulgin, Oleksandr wrote: > > I have the plans to make something from this on top of > pg_stat_statements and auto_explain, as I've mentioned last time. The > next iteration will be based on the two latest patches above, so it > still makes sense to review them. > > As for EXPLAIN ANALYZE support, it will require changes to core, but > this can be done separately. I'm re-reading the thread, and I have to admit I'm utterly confused what is the current plan, what features it's supposed to provide and whether it will solve the use case I'm most interested in. Oleksandr, could you please post a summary explaining that? My use case for this functionality is debugging of long-running queries, particularly getting EXPLAIN ANALYZE for them. In such cases I either can't run the EXPLAIN ANALYZE manually because it will either run forever (just like the query) and may not be the same (e.g. due to recent ANALYZE). So we need to extract the data from the process executing the query. I'm not essentially opposed to doing this in an extension (better an extension than nothing), but I don't quite see how you could to do that from pg_stat_statements or auto_explain. AFAIK both extensions merely use hooks before/after the executor, and therefore can't do anything in between (while the query is actually running). Perhaps you don't intend to solve this particular use case? Which use cases are you aiming to solve, then? Could you explain? Maybe all we need to do is add another hook somewhere in the executor, and push the explain analyze into pg_stat_statements once in a while, entirely eliminating the need for inter-process communication (signals, queues, ...). I'm pretty sure we don't need this for "short" queries, because in those cases we have other means to get the explain analyze (e.g. running the query again or auto_explain). So I can imagine having a rather high threshold (say, 60 seconds or even more), and we'd only push the explain analyze after crossing it. And then we'd only update it once in a while - say, again every 60 seconds. Of course, this might be configurable by two GUCs: pg_stat_statements.explain_analyze_threshold = 60 # -1 is "off" pg_stat_statements.explain_analyze_refresh = 60 FWIW I'd still prefer having "EXPLAIN ANALYZE" in core, but better this than nothing. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 12/01/2015 10:34 AM, Shulgin, Oleksandr wrote:
I have the plans to make something from this on top of
pg_stat_statements and auto_explain, as I've mentioned last time. The
next iteration will be based on the two latest patches above, so it
still makes sense to review them.
As for EXPLAIN ANALYZE support, it will require changes to core, but
this can be done separately.
I'm re-reading the thread, and I have to admit I'm utterly confused what is the current plan, what features it's supposed to provide and whether it will solve the use case I'm most interested in. Oleksandr, could you please post a summary explaining that?
My use case for this functionality is debugging of long-running queries, particularly getting EXPLAIN ANALYZE for them. In such cases I either can't run the EXPLAIN ANALYZE manually because it will either run forever (just like the query) and may not be the same (e.g. due to recent ANALYZE). So we need to extract the data from the process executing the query.
I'm not essentially opposed to doing this in an extension (better an extension than nothing), but I don't quite see how you could to do that from pg_stat_statements or auto_explain. AFAIK both extensions merely use hooks before/after the executor, and therefore can't do anything in between (while the query is actually running).
Perhaps you don't intend to solve this particular use case? Which use cases are you aiming to solve, then? Could you explain?
Maybe all we need to do is add another hook somewhere in the executor, and push the explain analyze into pg_stat_statements once in a while, entirely eliminating the need for inter-process communication (signals, queues, ...).
I'm pretty sure we don't need this for "short" queries, because in those cases we have other means to get the explain analyze (e.g. running the query again or auto_explain). So I can imagine having a rather high threshold (say, 60 seconds or even more), and we'd only push the explain analyze after crossing it. And then we'd only update it once in a while - say, again every 60 seconds.
Of course, this might be configurable by two GUCs:
pg_stat_statements.explain_analyze_threshold = 60 # -1 is "off"
pg_stat_statements.explain_analyze_refresh = 60
FWIW I'd still prefer having "EXPLAIN ANALYZE" in core, but better this than nothing.
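A sketch of what those two knobs and the periodic check driving them might look like. The option names follow the mail above; the timing variables, the hook placement and publish_current_plan() are invented for illustration.

#include "postgres.h"
#include <limits.h>
#include "portability/instr_time.h"
#include "utils/guc.h"

static int	explain_analyze_threshold = 60;		/* seconds, -1 disables */
static int	explain_analyze_refresh = 60;		/* seconds between updates */

static instr_time query_start;		/* set when execution begins */
static instr_time last_publish;		/* zeroed when execution begins */

/* hypothetical: render the partial EXPLAIN ANALYZE and store it somewhere */
extern void publish_current_plan(void);

/* called from some executor hook while the query is running */
static void
maybe_publish_progress(void)
{
	instr_time	now;
	double		elapsed,
				since_last;

	if (explain_analyze_threshold < 0)
		return;

	INSTR_TIME_SET_CURRENT(now);
	elapsed = INSTR_TIME_GET_DOUBLE(now) - INSTR_TIME_GET_DOUBLE(query_start);
	since_last = INSTR_TIME_GET_DOUBLE(now) - INSTR_TIME_GET_DOUBLE(last_publish);

	if (elapsed >= explain_analyze_threshold &&
		(INSTR_TIME_IS_ZERO(last_publish) || since_last >= explain_analyze_refresh))
	{
		publish_current_plan();
		last_publish = now;
	}
}

void
_PG_init(void)
{
	DefineCustomIntVariable("pg_stat_statements.explain_analyze_threshold",
							"Publish EXPLAIN ANALYZE for queries running longer than this many seconds (-1 disables).",
							NULL, &explain_analyze_threshold,
							60, -1, INT_MAX,
							PGC_SUSET, 0, NULL, NULL, NULL);
	DefineCustomIntVariable("pg_stat_statements.explain_analyze_refresh",
							"Refresh the published EXPLAIN ANALYZE every this many seconds.",
							NULL, &explain_analyze_refresh,
							60, 1, INT_MAX,
							PGC_SUSET, 0, NULL, NULL, NULL);
}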
On Thu, Dec 17, 2015 at 6:45 PM, Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> wrote: > On Wed, Dec 16, 2015 at 8:39 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> > wrote: >> On 12/01/2015 10:34 AM, Shulgin, Oleksandr wrote: >>> >>> >>> I have the plans to make something from this on top of >>> pg_stat_statements and auto_explain, as I've mentioned last time. The >>> next iteration will be based on the two latest patches above, so it >>> still makes sense to review them. >>> >>> As for EXPLAIN ANALYZE support, it will require changes to core, but >>> this can be done separately. >> >> I'm re-reading the thread, and I have to admit I'm utterly confused what >> is the current plan, what features it's supposed to provide and whether it >> will solve the use case I'm most interested in. Oleksandr, could you please >> post a summary explaining that? >> >> My use case for this functionality is debugging of long-running queries, >> particularly getting EXPLAIN ANALYZE for them. In such cases I either can't >> run the EXPLAIN ANALYZE manually because it will either run forever (just >> like the query) and may not be the same (e.g. due to recent ANALYZE). So we >> need to extract the data from the process executing the query. >> >> I'm not essentially opposed to doing this in an extension (better an >> extension than nothing), but I don't quite see how you could to do that from >> pg_stat_statements or auto_explain. AFAIK both extensions merely use hooks >> before/after the executor, and therefore can't do anything in between (while >> the query is actually running). >> >> Perhaps you don't intend to solve this particular use case? Which use >> cases are you aiming to solve, then? Could you explain? > > Thanks for your interest in this patch! > > My motivation is the same as your use case: having a long-running query, be > able to look inside this exact query run by this exact backend. > > I admit the evolution of ideas in this patch can be very confusing: we were > trying a number of different approaches, w/o thinking deeply on the > implications, just to have a proof of concept. It's great to see ideas of concepts and things to help address those issues, at least we are agreeing that there is a hole in the instrumentation and that it would be nice to fill it with something. Still, it is not completely clear which approach would be taken. Is it fair to mark the current patch as returned with feedback then? -- Michael
On Thu, Dec 17, 2015 at 6:45 PM, Shulgin, Oleksandr
<oleksandr.shulgin@zalando.de> wrote:
> On Wed, Dec 16, 2015 at 8:39 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
> wrote:> Thanks for your interest in this patch!>> On 12/01/2015 10:34 AM, Shulgin, Oleksandr wrote:
>>>
>>>
>>> I have the plans to make something from this on top of
>>> pg_stat_statements and auto_explain, as I've mentioned last time. The
>>> next iteration will be based on the two latest patches above, so it
>>> still makes sense to review them.
>>>
>>> As for EXPLAIN ANALYZE support, it will require changes to core, but
>>> this can be done separately.
>>
>> I'm re-reading the thread, and I have to admit I'm utterly confused what
>> is the current plan, what features it's supposed to provide and whether it
>> will solve the use case I'm most interested in. Oleksandr, could you please
>> post a summary explaining that?
>>
>> My use case for this functionality is debugging of long-running queries,
>> particularly getting EXPLAIN ANALYZE for them. In such cases I either can't
>> run the EXPLAIN ANALYZE manually because it will either run forever (just
>> like the query) and may not be the same (e.g. due to recent ANALYZE). So we
>> need to extract the data from the process executing the query.
>>
>> I'm not essentially opposed to doing this in an extension (better an
>> extension than nothing), but I don't quite see how you could to do that from
>> pg_stat_statements or auto_explain. AFAIK both extensions merely use hooks
>> before/after the executor, and therefore can't do anything in between (while
>> the query is actually running).
>>
>> Perhaps you don't intend to solve this particular use case? Which use
>> cases are you aiming to solve, then? Could you explain?
>
>
> My motivation is the same as your use case: having a long-running query, be
> able to look inside this exact query run by this exact backend.
>
> I admit the evolution of ideas in this patch can be very confusing: we were
> trying a number of different approaches, w/o thinking deeply on the
> implications, just to have a proof of concept.
It's great to see ideas of concepts and things to help address those
issues, at least we are agreeing that there is a hole in the
instrumentation and that it would be nice to fill it with something.
Still, it is not completely clear which approach would be taken. Is it
fair to mark the current patch as returned with feedback then?
+1
--
Michael
On Thu, Dec 24, 2015 at 1:57 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote: > > > 2015-12-24 3:23 GMT+01:00 Michael Paquier <michael.paquier@gmail.com>: >> >> On Thu, Dec 17, 2015 at 6:45 PM, Shulgin, Oleksandr >> <oleksandr.shulgin@zalando.de> wrote: >> > On Wed, Dec 16, 2015 at 8:39 PM, Tomas Vondra >> > <tomas.vondra@2ndquadrant.com> >> > wrote: >> >> On 12/01/2015 10:34 AM, Shulgin, Oleksandr wrote: >> >>> >> >>> >> >>> I have the plans to make something from this on top of >> >>> pg_stat_statements and auto_explain, as I've mentioned last time. The >> >>> next iteration will be based on the two latest patches above, so it >> >>> still makes sense to review them. >> >>> >> >>> As for EXPLAIN ANALYZE support, it will require changes to core, but >> >>> this can be done separately. >> >> >> >> I'm re-reading the thread, and I have to admit I'm utterly confused >> >> what >> >> is the current plan, what features it's supposed to provide and whether >> >> it >> >> will solve the use case I'm most interested in. Oleksandr, could you >> >> please >> >> post a summary explaining that? >> >> >> >> My use case for this functionality is debugging of long-running >> >> queries, >> >> particularly getting EXPLAIN ANALYZE for them. In such cases I either >> >> can't >> >> run the EXPLAIN ANALYZE manually because it will either run forever >> >> (just >> >> like the query) and may not be the same (e.g. due to recent ANALYZE). >> >> So we >> >> need to extract the data from the process executing the query. >> >> >> >> I'm not essentially opposed to doing this in an extension (better an >> >> extension than nothing), but I don't quite see how you could to do that >> >> from >> >> pg_stat_statements or auto_explain. AFAIK both extensions merely use >> >> hooks >> >> before/after the executor, and therefore can't do anything in between >> >> (while >> >> the query is actually running). >> >> >> >> Perhaps you don't intend to solve this particular use case? Which use >> >> cases are you aiming to solve, then? Could you explain? >> > >> > Thanks for your interest in this patch! >> > >> > My motivation is the same as your use case: having a long-running query, >> > be >> > able to look inside this exact query run by this exact backend. >> > >> > I admit the evolution of ideas in this patch can be very confusing: we >> > were >> > trying a number of different approaches, w/o thinking deeply on the >> > implications, just to have a proof of concept. >> >> It's great to see ideas of concepts and things to help address those >> issues, at least we are agreeing that there is a hole in the >> instrumentation and that it would be nice to fill it with something. >> Still, it is not completely clear which approach would be taken. Is it >> fair to mark the current patch as returned with feedback then? > > > +1 So, done this way. We could always move it to next CF if someone wishes to move on. But at this point that would be a different approach than what was proposed first.. -- Michael