Re: Flush some statistics within running transactions - Mailing list pgsql-hackers

From Sami Imseih
Subject Re: Flush some statistics within running transactions
Date
Msg-id CAA5RZ0t6j0VYuUpxZ8JLq-ERoUriZ0rK=+8PCUtjRirmSmCx7A@mail.gmail.com
In response to Re: Flush some statistics within running transactions  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Responses Re: Flush some statistics within running transactions
List pgsql-hackers
> > > The 1 second flush interval is currently hardcoded but we could imagine increase
> > > it or make it configurable.
> >
> > Someone may want to turn this off as well. I think a GUC will be needed.
>
> I gave this more thoughts and I wonder if this should be configurable at all.
> I mean, we don't do it for PGSTAT_MIN_INTERVAL, PGSTAT_MAX_INTERVAL and
> PGSTAT_IDLE_INTERVAL. We could imagine make it configurable if it produces
> noticeable performance impact but that's not what I observed.

Is there a reason we need a new constant (PGSTAT_ANYTIME_FLUSH_INTERVAL)
for anytime flushes and can't rely on the existing PGSTAT_MIN_INTERVAL?

Also, how did you benchmark? I am less concerned about long-running
transactions and background processes, and more about short,
high-concurrency transactions seeing additional overhead from the
extra flushing. Is the latter a concern?

> > > stats: numscans, tuples_returned, tuples_fetched, blocks_fetched,
> > > blocks_hit
> >
> > I’m concerned that fields being temporarily out of sync might impact monitoring
> > calculations, if the formula is dealing with fields that have
> > different flush strategies.
>
> That's a good point. Maybe we should document the fields flush strategy?

Yeah, we will need to document this.

> > That said, minor discrepancies are usually tolerable for monitoring
> > data analysis.
> >
> > For the numscans, should we not also update the scan timestamp?
>
> The problem is that we could not call GetCurrentTransactionStopTimestamp(), so
> we would need to call GetCurrentTimestamp() instead. I'm not sure that calling
> GetCurrentTimestamp() every second would be a real issue though, and if it is
> maybe we could increase this 1s value.
>
> That said I agree that having seq_scan being updated and not last_seq_scan is not
> that great.

With v3, I checked by running seq scans in a long-running transaction,
and I observed both of these values (seq_scan and last_seq_scan) being
updated at the same time. I think this is OK.

# pgstat_relation_flush_anytime_cb
```
tabentry->numscans += lstats->counts.numscans;
if (lstats->counts.numscans)
{
    TimestampTz t = GetCurrentTimestamp();

    if (t > tabentry->lastscan)
        tabentry->lastscan = t;
}
```
and

# pgstat_relation_flush_cb
```
if (lstats->counts.numscans)
{
    TimestampTz t = GetCurrentTransactionStopTimestamp();

    if (t > tabentry->lastscan)
        tabentry->lastscan = t;
}
```
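
To make the comparison concrete, here is a minimal standalone sketch of the
pattern the two callbacks share: fold the pending scan count into the shared
entry and advance lastscan only if the new timestamp is later. The types and
the function name (MockTabEntry, MockPending, mock_flush_scans) are
hypothetical stand-ins, not PostgreSQL's real structures, and the "now"
argument stands in for whichever timestamp source the caller uses. Resetting
the pending count after the flush is also an assumption of this sketch.

```c
#include <stdint.h>

/* Stand-in for PostgreSQL's TimestampTz; mocked for this sketch. */
typedef int64_t TimestampTz;

/* Hypothetical, trimmed-down shared stats entry. */
typedef struct MockTabEntry
{
    int64_t     numscans;
    TimestampTz lastscan;
} MockTabEntry;

/* Hypothetical, trimmed-down backend-local pending counters. */
typedef struct MockPending
{
    int64_t     numscans;
} MockPending;

/*
 * Fold pending scans into the shared entry and advance lastscan without
 * ever moving it backwards.  "now" plays the role of GetCurrentTimestamp()
 * on the anytime path, or GetCurrentTransactionStopTimestamp() at
 * transaction end.
 */
static void
mock_flush_scans(MockTabEntry *tabentry, MockPending *pending, TimestampTz now)
{
    /* Nothing pending: leave both the counter and the timestamp alone. */
    if (pending->numscans == 0)
        return;

    tabentry->numscans += pending->numscans;
    if (now > tabentry->lastscan)
        tabentry->lastscan = now;

    /* Assumption: pending counts are cleared once flushed. */
    pending->numscans = 0;
}
```

With this shape, an anytime flush followed by the end-of-transaction flush
cannot move lastscan backwards, and a flush with no pending scans touches
neither field, so the counter and the timestamp stay in step.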

--
Sami Imseih
Amazon Web Services (AWS)


