Thread: emit recovery stats via a new file or a new hook
Hi,

It is sometimes super important to be able to answer customer questions like: what was the total time taken by the last recovery of the server? What was the time taken by each phase of recovery/redo processing in the startup process? Why did the recovery take so long? We've encountered these questions while dealing with Postgres customers. If these stats were available in an easily consumable form, it would be easier for us to understand, debug, and identify the root cause of "recovery taking a long time" problems, improve matters where possible, and answer the customers' questions. These recovery stats could also be read by an external analytics tool to show recovery patterns to customers directly. Although Postgres emits some of this information via the server logs thanks to the recent commit [3], it isn't easily consumable for the use cases mentioned above.

Here are a few thoughts on how we could go about doing this. I proposed them earlier in [1]:
1) capture and write recovery stats into a file
2) capture and emit recovery stats via a new hook
3) capture and write into a new system catalog table (assuming the database is in a consistent state at the end of recovery, but I'm not sure we ever update any catalog tables in/after the startup/recovery phase)

As Robert rightly suggested at [2], option (3) isn't easy to do, so we can set that idea aside; options (1) and (2) seem reasonable. Thoughts?

[1] - https://www.postgresql.org/message-id/CALj2ACUwb3x%2BJFHkXp4Lf603Q3qFgK0P6kSsJvZkH4QAvGv4ig%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CA%2BTgmoZ0b7JkNexaoGDXJ%3D8Zq%2B_NFZBek1oyyPU%2BDDsRi1dsCw%40mail.gmail.com
[3] - commit 9ce346eabf350a130bba46be3f8c50ba28506969
      Author: Robert Haas <rhaas@postgresql.org>
      Date:   Mon Oct 25 11:51:57 2021 -0400
      Report progress of startup operations that take a long time.

Regards,
Bharath Rupireddy.
Hi,

On 2021-10-31 19:06:07 +0530, Bharath Rupireddy wrote:
> It is sometimes super important to be able to answer customer
> questions like: What was the total time taken by the last recovery of
> the server? What was the time taken by each phase of recovery/redo
> processing of the startup process? Why did the recovery take so long?
> We've encountered these questions while dealing with the postgres
> customers. If these stats are available in an easily consumable
> fashion, it will be easier for us to understand, debug and identify
> root cause for "recovery taking a long time" problems, improve if
> possible and answer the customer questions. Also, these recovery stats
> can be read by an external analytical tool to show the recovery
> patterns to the customers directly. Although postgres emits some info
> via server logs thanks to the recent commit [3], it isn't easily
> consumable for the use cases that I mentioned.
>
> Here are a few thoughts on how we could go about doing this. I
> proposed them earlier in [1],
> 1) capture and write recovery stats into a file
> 2) capture and emit recovery stats via a new hook
> 3) capture and write into a new system catalog table (assuming at the
> end of the recovery the database is in a consistent state, but I'm not
> sure if we ever update any catalog tables in/after the
> startup/recovery phase)
>
> As Robert rightly suggested at [2], option (3) isn't an easy way to do
> that so we can park that idea aside, options (1) and (2) seem
> reasonable.

I don't think 1) is a good approach, because it just leads us down the path of having dozens of log files. 2) isn't useful either, because you'd need to load an extension library first, which users won't have done before hitting the problem. And 3) isn't really possible.

I'm not sure that the new log messages aren't sufficient. But if they aren't, it seems better to keep additional data in the stats system, and make them visible via views, rather than adding yet another place to keep stats.

Greetings,

Andres Freund
On Mon, Nov 1, 2021 at 12:30 AM Andres Freund <andres@anarazel.de> wrote:
>
> I'm not sure that the new log messages aren't sufficient. But if they
> aren't, it seems better to keep additional data in the stats system, and
> make them visible via views, rather than adding yet another place to
> keep stats.

+1. This is exactly what came to my mind after reading Bharath's email.

--
With Regards,
Amit Kapila.
On Mon, Nov 1, 2021 at 12:30 AM Andres Freund <andres@anarazel.de> wrote:
> > Here are a few thoughts on how we could go about doing this. I
> > proposed them earlier in [1],
> > 1) capture and write recovery stats into a file
> > 2) capture and emit recovery stats via a new hook
> > 3) capture and write into a new system catalog table (assuming at the
> > end of the recovery the database is in a consistent state, but I'm not
> > sure if we ever update any catalog tables in/after the
> > startup/recovery phase)
> >
> > As Robert rightly suggested at [2], option (3) isn't an easy way to do
> > that so we can park that idea aside, options (1) and (2) seem
> > reasonable.
>
> I don't think 1) is a good approach, because it just leads us down the
> path of having dozens of log files.

What I had in mind when I said "we write to a file" was a single file (similar to the backup_label file) that keeps only the last/latest recovery stats, not all the previous recovery stats.

> 2) isn't useful either, because
> you'd need to load an extension library first, which users won't
> have done before hitting the problem.

A database vendor can implement the hook with their customizations and load it as an extension in their Postgres offering. Postgres would collect the recovery stats and emit them via the hook, if one is defined. Actually, this would be the better approach IMO. Thoughts?

> I'm not sure that the new log messages aren't sufficient. But if they
> aren't, it seems better to keep additional data in the stats system, and
> make them visible via views, rather than adding yet another place to
> keep stats.

With that approach, the analytics tools/users that want the recovery stats have to make connections to the database, which might eat into the total connections. Instead, having Postgres emit the stats via a hook would be more promising. Thoughts?

Regards,
Bharath Rupireddy.
On Wed, Nov 10, 2021, at 2:51 PM, Bharath Rupireddy wrote:
> On Mon, Nov 1, 2021 at 12:30 AM Andres Freund <andres@anarazel.de> wrote:
> > I'm not sure that the new log messages aren't sufficient. But if they
> > aren't, it seems better to keep additional data in the stats system, and
> > make them visible via views, rather than adding yet another place to
> > keep stats.
>
> For this, the analytic tools/users whoever wants to know the recovery
> stats, will have to make connections to the database which might eat
> up the total connections. Instead, the postgres emitting the stats via
> a hook will be more promising. Thoughts?
A connection is the de facto interface for accessing Postgres statistics, no? Monitoring tools already have an interface to gather database metrics via SQL. You are proposing to create a new way that all tools would need to implement. It seems like more effort than it's worth. +1 for storing this data in the stats system.