Re: pg_walinspect - a new extension to get raw WAL data and WAL stats - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: pg_walinspect - a new extension to get raw WAL data and WAL stats |
Date | |
Msg-id | CALj2ACUtqWX95uAj2VNJED0PnixEeQ=0MEzpouLi+zd_iTugRA@mail.gmail.com Whole thread Raw |
In response to | Re: pg_walinspect - a new extension to get raw WAL data and WAL stats (Ashutosh Sharma <ashu.coek88@gmail.com>) |
Responses |
Re: pg_walinspect - a new extension to get raw WAL data and WAL stats
Re: pg_walinspect - a new extension to get raw WAL data and WAL stats |
List | pgsql-hackers |
On Wed, Feb 16, 2022 at 9:04 AM Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > I don't think that's the use case of this patch. Unless there is some > other valid reason, I would suggest you remove it. Removed the function pg_verify_raw_wal_record. Robert and Greg also voted for removal upthread. > > > Should we add a function that returns the pointer to the first and > > > probably the last WAL record in the WAL segment? This would help users > > > to inspect the wal records in the entire wal segment if they wish to > > > do so. > > > > Good point. One can do this already with pg_get_wal_records_info and > > pg_walfile_name_offset. Usually, the LSN format itself can give an > > idea about the WAL file it is in. > > > > postgres=# select lsn, pg_walfile_name_offset(lsn) from > > pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn asc > > limit 1; > > lsn | pg_walfile_name_offset > > -----------+------------------------------- > > 0/5000038 | (000000010000000000000005,56) > > (1 row) > > > > postgres=# select lsn, pg_walfile_name_offset(lsn) from > > pg_get_wal_records_info('0/5000000', '0/5FFFFFF') order by lsn desc > > limit 1; > > lsn | pg_walfile_name_offset > > -----------+------------------------------------- > > 0/5FFFFC0 | (000000010000000000000005,16777152) > > (1 row) > > > > The workaround you are suggesting is not very user friendly and FYKI > pg_wal_records_info simply hangs at times when we specify the higher > and lower limit of lsn in a wal file. > > To make things easier for the end users I would suggest we add a > function that can return a valid first and last lsn in a walfile. The > output of this function can be used to inspect the wal records in the > entire wal file if they wish to do so and I am sure they will. So, it > should be something like this: > > select first_valid_lsn, last_valid_lsn from > pg_get_first_last_valid_wal_record('wal-segment-name'); > > And above function can directly be used with pg_get_wal_records_info() like > > select pg_get_wal_records_info(pg_get_first_last_valid_wal_record('wal-segment')); > > I think this is a pretty basic ASK that we expect to be present in the > module like this. Added a new function that returns the first and last valid WAL record LSN of a given WAL file. > > > +PG_FUNCTION_INFO_V1(pg_get_raw_wal_record); > > > +PG_FUNCTION_INFO_V1(pg_get_first_valid_wal_record_lsn); > > > +PG_FUNCTION_INFO_V1(pg_verify_raw_wal_record); > > > +PG_FUNCTION_INFO_V1(pg_get_wal_record_info); > > > +PG_FUNCTION_INFO_V1(pg_get_wal_records_info); > > > > > > I think we should allow all these functions to be executed in wait and > > > *nowait* mode. If a user specifies nowait mode, the function should > > > return if no WAL data is present, rather than waiting for new WAL data > > > to become available, default behaviour could be anything you like. > > > > Currently, pg_walinspect uses read_local_xlog_page which waits in the > > while(1) loop if a future LSN is specified. As read_local_xlog_page is > > an implementation of XLogPageReadCB, which doesn't have a wait/nowait > > parameter, if we really need a wait/nowait mode behaviour, we need to > > do extra things(either add a backend-level global wait variable, set > > before XLogReadRecord, if set, read_local_xlog_page can just exit > > without waiting and reset after the XLogReadRecord or add an extra > > bool wait variable to XLogReaderState and use it in > > read_local_xlog_page). > > > > I am not asking to do any changes in the backend code. Please check - > how pg_waldump does this when a user requests to stop once the endptr > has reached. If not for all functions at least for a few functions we > can do this if it is doable. I've added a new function read_local_xlog_page_2 (similar to read_local_xlog_page but works in wait and no wait mode) and the callers can specify whether to wait or not wait using private_data. Actually, I wanted to use the private_data structure of read_local_xlog_page but the logical decoding already has context as private_data, that is why I had to have a new function. I know it creates a bit of duplicate code, but its cleaner than using backend-local variables or additional flags in XLogReaderState or adding wait/no-wait boolean to page_read callback. Any other suggestions are welcome here. With this, I'm able to have wait/no wait versions for any functions. But for now, I'm having wait/no wait for two functions (pg_get_wal_records_info and pg_get_wal_stats) for which it makes more sense. > > > +Datum > > > +pg_get_wal_records_info(PG_FUNCTION_ARGS) > > > +{ > > > +#define PG_GET_WAL_RECORDS_INFO_COLS 10 > > > > > > > > > We could probably have another variant of this function that would > > > work even if the end pointer is not specified, in which case the > > > default end pointer would be the last WAL record in the WAL segment. > > > Currently it mandates the use of an end pointer which slightly reduces > > > flexibility. > > > > Last WAL record in the WAL segment may not be of much use(one can > > figure out the last valid WAL record in a wal file as mentioned > > above), but the WAL records info till the latest current flush LSN of > > the server would be a useful functionality. But that too, can be found > > using something like "select lsn, prev_lsn, resource_manager from > > pg_get_wal_records_info('0/8099568', pg_current_wal_lsn());" > > > > What if a user wants to inspect all the valid wal records from a > startptr (startlsn) and he doesn't know the endptr? Why should he/she > be mandated to get the endptr and supply it to this function? I don't > think we should force users to do that. I think this is again a very > basic ASK that can be done in this version itself. It is not at all > any advanced thing that we can think of doing in the future. Agreed. Added new functions that emits wal records info/stats till the end of the WAL at the moment. > > > + > > > +/* > > > + * Get the first valid raw WAL record lsn. > > > + */ > > > +Datum > > > +pg_get_first_valid_wal_record_lsn(PG_FUNCTION_ARGS) > > > > > > > > > I think this function should return a pointer to the nearest valid WAL > > > record which can be the previous WAL record to the LSN entered by the > > > user or the next WAL record. If a user unknowingly enters an lsn that > > > does not exist then in such cases we should probably return the lsn of > > > the previous WAL record instead of hanging or waiting for the new WAL > > > record to arrive. > > > > Is it useful? > > It is useful in the same way as returning the next valid wal pointer > is. Why should a user wait for the next valid wal pointer to be > available instead the function should identify the previous valid wal > record and return it and put an appropriate message to the user. > > If there's a strong reason, how about naming > > pg_get_next_valid_wal_record_lsn returning the next valid wal record > > LSN and pg_get_previous_valid_wal_record_lsn returning the previous > > valid wal record LSN ? If you think having two functions is too much, > > then, how about pg_get_first_valid_wal_record_lsn returning both the > > next valid wal record LSN and its previous wal record LSN? > > > > The latter one looks better. Modified. Attaching v5 patch set, please review it further. Regards, Bharath Rupireddy.
Attachment
pgsql-hackers by date: