Re: [HACKERS] ANALYZE command progress checker - Mailing list pgsql-hackers
From | Ashutosh Sharma |
---|---|
Subject | Re: [HACKERS] ANALYZE command progress checker |
Date | |
Msg-id | CAE9k0PmGhP4+Sa219AzwOghhQFerZirysDpp2p6Hem24Rcg+ZQ@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] ANALYZE command progress checker (vinayak <Pokale_Vinayak_q3@lab.ntt.co.jp>) |
Responses |
Re: [HACKERS] ANALYZE command progress checker
Re: [HACKERS] ANALYZE command progress checker |
List | pgsql-hackers |
Hi Vinayak, Here are couple of review comments that may need your attention. 1. Firstly, I am seeing some trailing whitespace errors when trying to apply your v3 patch using git apply command. [ashu@localhost postgresql]$ git apply pg_stat_progress_analyze_v3.patch pg_stat_progress_analyze_v3.patch:155: trailing whitespace. CREATE VIEW pg_stat_progress_analyze AS pg_stat_progress_analyze_v3.patch:156: trailing whitespace. SELECT pg_stat_progress_analyze_v3.patch:157: trailing whitespace. S.pid AS pid, S.datid AS datid, D.datname AS datname, pg_stat_progress_analyze_v3.patch:158: trailing whitespace. S.relid AS relid, pg_stat_progress_analyze_v3.patch:159: trailing whitespace. CASE S.param1 WHEN 0 THEN 'initializing' error: patch failed: doc/src/sgml/monitoring.sgml:521 2) The test-case 'rules' is failing. I think it's failing because in rules.sql 'ORDERBY viewname' is used which will put 'pg_stat_progress_analyze' ahead of 'pg_stat_progress_vacuum' in the list of catalog views. You may need to correct rules.out file accordingly. This is what i could see in rules.sql, SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schema' ORDER BY viewname; I am still reviewing your patch and doing some testing. Will update if i find any issues. Thanks. -- With Regards, Ashutosh Sharma EnterpriseDB:http://www.enterprisedb.com On Fri, Mar 17, 2017 at 3:16 PM, vinayak <Pokale_Vinayak_q3@lab.ntt.co.jp> wrote: > > On 2017/03/17 10:38, Robert Haas wrote: >> >> On Fri, Mar 10, 2017 at 2:46 AM, vinayak >> <Pokale_Vinayak_q3@lab.ntt.co.jp> wrote: >>> >>> Thank you for reviewing the patch. >>> >>> The attached patch incorporated Michael and Amit comments also. >> >> I reviewed this tonight. > > Thank you for reviewing the patch. >> >> + /* Report compute index stats phase */ >> + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, >> + >> PROGRESS_ANALYZE_PHASE_COMPUTE_INDEX_STATS); >> >> Hmm, is there really any point to this? And is it even accurate? It >> doesn't look to me like we are computing any index statistics here; >> we're only allocating a few in-memory data structures here, I think, >> which is not a statistics computation and probably doesn't take any >> significant time. >> >> + /* Report compute heap stats phase */ >> + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, >> + >> PROGRESS_ANALYZE_PHASE_COMPUTE_HEAP_STATS); >> >> The phase you label as computing heap statistics also includes the >> computation of index statistics. I wouldn't bother separating the >> computation of heap and index statistics; I'd just have a "compute >> statistics" phase that begins right after the comment that starts >> "Compute the statistics." > > Understood. Fixed in the attached patch. >> >> >> + /* Report that we are now doing index cleanup */ >> + pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE, >> + PROGRESS_ANALYZE_PHASE_INDEX_CLEANUP); >> >> OK, this doesn't make any sense either. We are not cleaning up the >> indexes here. We are just closing them and releasing resources. I >> don't see why you need this phase at all; it can't last more than some >> tiny fraction of a second. It seems like you're trying to copy the >> exact same phases that exist for vacuum instead of thinking about what >> makes sense for ANALYZE. > > Understood. I have removed this phase. >> >> >> + /* Report number of heap blocks scaned so far*/ >> + pgstat_progress_update_param(PROGRESS_ANALYZE_HEAP_BLKS_SCANNED, >> targblock); >> >> While I don't think this is necessarily a bad thing to report, I find >> it very surprising that you're not reporting what seems to me like the >> most useful thing here - namely, the number of blocks that will be >> sampled in total and the number of those that you've selected. >> Instead, you're just reporting the length of the relation and the >> last-sampled block, which is a less-accurate guide to total progress. >> There are enough counters to report both things, so maybe we should. >> >> + /* Report total number of sample rows*/ >> + pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_SAMPLE_ROWS, >> numrows); >> >> As an alternative to the previous paragraph, yet another thing you >> could report is number of rows sampled out of total rows to be >> sampled. But this isn't the way to do it: you are reporting the >> number of rows you sampled only once you've finished sampling. The >> point of progress reporting is to report progress -- that is, to >> report this value as it's updated, not just to report it when ANALYZE >> is practically done and the value has reached its maximum. > > Understood. > I have reported number of rows sampled out of total rows to be sampled. > I have not reported the number of blocks that will be sampled in total. > So currect pg_stat_progress_analyze view looks like: > > =# \d+ pg_stat_progress_analyze > View "pg_catalog.pg_stat_progress_analyze" > Column | Type | Collation | Nullable | Default | Storage > | Descripti > on > ------------------------+---------+-----------+----------+---------+----------+---------- > --- > pid | integer | | | | plain > | > datid | oid | | | | plain > | > datname | name | | | | plain > | > relid | oid | | | | plain > | > phase | text | | | | > extended | > num_target_sample_rows | bigint | | | | plain > | > num_rows_sampled | bigint | | | | plain > | > View definition: > SELECT s.pid, > s.datid, > d.datname, > s.relid, > CASE s.param1 > WHEN 0 THEN 'initializing'::text > WHEN 1 THEN 'collecting sample rows'::text > WHEN 2 THEN 'computing statistics'::text > ELSE NULL::text > END AS phase, > s.param2 AS num_target_sample_rows, > s.param3 AS num_rows_sampled > FROM pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, > param1, param2, p > aram3, param4, param5, param6, param7, param8, param9, param10) > LEFT JOIN pg_database d ON s.datid = d.oid; > > Comment? >> >> The documentation for this patch isn't very good, either. You forgot >> to update the part of the documentation that says that VACUUM is the >> only command that currently supports progress reporting, and the >> descriptions of the phases and progress counters are brief and not >> entirely accurate - e.g. heap_blks_scanned is not actually the number >> of heap blocks scanned, because we are sampling, not reading all the >> blocks. > > Understood. I have updated the documentation. > I will also try to improve documentation. >> >> When adding calls to the progress reporting functions, please consider >> whether a blank line should be added before or after the new code, or >> both, or neither. I'd say some blank lines are needed in a few places >> where you didn't add them. > > +1. I have added blank lines in a few places. > > Regards, > Vinayak Pokale > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
pgsql-hackers by date: