Re: Improvements and additions to COPY progress reporting - Mailing list pgsql-hackers
From | Josef Šimánek |
---|---|
Subject | Re: Improvements and additions to COPY progress reporting |
Date | |
Msg-id | CAFp7Qwqa6cQoA29xvCzaTtyu7k=1deAVN62PFr_aKMfa-41E_A@mail.gmail.com |
In response to | Improvements and additions to COPY progress reporting (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
Responses | Re: Improvements and additions to COPY progress reporting |
List | pgsql-hackers |
On Mon, Feb 8, 2021 at 19:35, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>
> Hi,
>
> With [0] we got COPY progress reporting. Before the column names of
> this newly added view are effectively set in stone with the release of
> pg14, I propose the following set of relatively small patches. These
> are v2, because it is a patchset that is based on a set of patches
> that I previously posted in [0].
>
> 0001 Adds a column to pg_stat_progress_copy which details the amount
> of tuples that were excluded from insertion by the WHERE clause of the
> COPY FROM command.
>
> 0002 alters pg_stat_progress_copy to use 'tuple'-terminology instead
> of 'line'-terminology. 'Line' doesn't make sense in the binary copy
> case, and only for the 'text' copy format there can be a guarantee
> that the source / output file actually contains the reported amount of
> lines, whereas the amount of data tuples (which is also what it's
> called internally) is guaranteed to equal for all data types.
>
> There was some discussion about this in [0] where the author thought
> 'line' is more consistent with the CSV documentation, and where I
> argued that 'tuple' is both more consistent with the rest of the
> progress reporting tables and more consistent with the actual counted
> items: these are the tuples serialized / inserted (as noted in the CSV
> docs; "Thus the files are not strictly one line per table row like
> text-format files.").
>
> Patch 0003 adds backlinks to the progress reporting docs from the docs
> of the commands that have progress reporting (re/index, cluster,
> vacuum, etc.) such that progress reporting is better discoverable from
> the relevant commands, and removes the datname column from the
> progress_copy view (that column was never committed). This too should
> be fairly trivial and uncontroversial.
>
> 0004 adds the 'command' column to the progress_copy view; which
> distinguishes between COPY FROM and COPY TO. The two commands are (in
> my opinion) significantly different enough to warrant this column;
> similar to the difference between CREATE INDEX/REINDEX [CONCURRENTLY]
> which also report that information. I believe that this change is
> appropriate; as the semantics of the columns change depending on the
> command being executed.
>
> Lastly, 0005 adds 'io_target' to the reported information, that is,
> FILE, PROGRAM, STDIO or CALLBACK. Although this can relatively easily
> be determined based on the commands in pg_stat_activity, it is
> reasonably something that a user would want to query on, as the
> origin/target of COPY has security and performance implications,
> whereas other options (e.g. format) are less interesting for clients
> that are not executing that specific COPY command.

I took a slightly deeper look, and I'm not sure I understand the
difference between FILE and STDIO.

I have finally put together some initial regression tests of COPY
progress reporting using triggers. The initial patch, which applies on
top of your changes, is attached. As you can see, COPY FROM STDIN is
reported as FILE. That's probably expected, but it is a little
confusing to me, since STDIN and STDIO sound similar. What is the
purpose of STDIO? When is a COPY command reported with an io_target of
STDIO?

> Of special interest in 0005 is that it reports the io_target for the
> logical replications' initial tablesyncs' internal COPY. This would
> otherwise be measured, but no knowledge about the type of copy (or its
> origin) would be available on the worker's side. I'm not married to
> this patch 0005, but I believe it could be useful, and therefore
> included it in the patchset.
>
>
> With regards,
>
> Matthias van de Meent.
>
>
> [0] https://www.postgresql.org/message-id/flat/CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg%2BFEKet4XaTGSW%3DLitKQ%40mail.gmail.com
Attachment