Re: progress report for ANALYZE - Mailing list pgsql-hackers
From | Tatsuro Yamada |
---|---|
Subject | Re: progress report for ANALYZE |
Date | |
Msg-id | a0f1615b-7b5d-6759-62be-453ef0cb3686@nttcom.co.jp_1 Whole thread Raw |
In response to | Re: progress report for ANALYZE (Amit Langote <amitlangote09@gmail.com>) |
Responses |
Re: progress report for ANALYZE
|
List | pgsql-hackers |
Hi Amit-san, > >>> 1) >>>>> For now, I'm not sure it should be set current_child_table_relid to zero >>>>> when the current phase is changed from "acquiring inherited sample rows" to >>>>> "computing stats". See <Test result> bellow. >>>> >>>> In the upthread discussion [1], Robert asked to *not* do such things, >>>> that is, resetting some values due to phase change. I'm not sure his >>>> point applies to this case too though. >> >> //Below "computing stats" is for the partitioned table hoge, >> //therefore the second column from the left side would be >> //better to set Zero to easy to understand. > > On second thought, maybe we should not reset, because that might be > considered loss of information. To avoid confusion, we should simply > document that current_child_table_relid is only valid during the > "acquiring inherited sample rows" phase. Okay, agreed. :) >>>>> 2) >>>>> There are many "finalizing analyze" phases based on relids in the case >>>>> of partitioning tables. Would it better to fix the document? or it >>>>> would be better to reduce it to one? >>>>> >> It would be better to modify the document of "finalizing analyze" phase. >> >> # Before modify >> The command is updating pg_class. When this phase is completed, >> <command>ANALYZE</command> will end. >> >> # Modified >> The command is updating pg_class. When this phase is completed, >> <command>ANALYZE</command> will end. In the case of partitioned table, >> it might be shown on each partitions. >> >> What do you think that? I'm going to fix it, if you agreed. :) > > *All* phases are repeated in this case, not not just "finalizing > analyze", because ANALYZE repeatedly runs for each partition after the > parent partitioned table's ANALYZE finishes. ANALYZE's documentation > mentions that analyzing a partitioned table also analyzes all of its > partitions, so users should expect to see the progress information for > each partition. So, I don't think we need to clarify that if only in > one phase's description. Maybe we can add a note after the phase > description table which mentions this implementation detail about > partitioned tables. Like this: > > <note> > <para> > Note that when <command>ANALYZE</command> is run on a partitioned table, > all of its partitions are also recursively analyzed as also mentioned on > <xref linkend="sql-analyze"/>. In that case, <command>ANALYZE</command> > progress is reported first for the parent table, whereby its inheritance > statistics are collected, followed by that for each partition. > </para> > </note> Ah.. you are right: All phases are repeated, it shouldn't be fixed the only one phase's description. > Some more comments on the documentation: > > + Number of computed extended stats. This counter only advances > when the phase > + is <literal>computing extended stats</literal>. > > Number of computed extended stats -> Number of extended stats computed Will fix. > + Number of analyzed child tables. This counter only advances > when the phase > + is <literal>computing extended stats</literal>. > > Regarding, "Number of analyzed child table", note that we don't > "analyze" child tables in this phase, only scan its blocks to collect > samples for parent's ANALYZE. Also, the 2nd sentence is wrong -- you > meant "when the phase is <literal>acquiring inherited sample > rows</literal>. I suggest to write this as follows: > > Number of child tables scanned. This counter only advances when the phase > is <literal>acquiring inherited sample rows</literal>. Oops, I will fix it. :) > + <entry>OID of the child table currently being scanned. > + It might be different from relid when analyzing tables that > have child tables. > > I suggest: > > OID of the child table currently being scanned. This field is only valid when > the phase is <literal>computing extended stats</literal>. Will fix. > + The command is currently scanning the > <structfield>current_relid</structfield> > + to obtain samples. > > I suggest: > > The command is currently scanning the the table given by > <structfield>current_relid</structfield> to obtain sample rows. Will fix. > + The command is currently scanning the > <structfield>current_child_table_relid</structfield> > + to obtain samples. > > I suggest (based on phase description pg_stat_progress_create_index > phase descriptions): > > The command is currently scanning child tables to obtain sample rows. Columns > <structfield>child_tables_total</structfield>, > <structfield>child_tables_done</structfield>, and > <structfield>current_child_table_relid</structfield> contain the progress > information for this phase. Will fix. > + <row> > + <entry><literal>computing stats</literal></entry> > > I think the phase name should really be "computing statistics", that > is, use the full word. Will fix. > + <entry> > + The command is computing stats from the samples obtained > during the table scan. > + </entry> > + </row> > > So I suggest: > > The command is computing statistics from the sample rows obtained during > the table scan Will fix. > + <entry><literal>computing extended stats</literal></entry> > + <entry> > + The command is computing extended stats from the samples > obtained in the previous phase. > + </entry> > > I suggest: > > The command is computing extended statistics from the sample rows obtained > during the table scan. Will fix. Thanks for your above useful suggestions. It helps me a lot. :) Cheers! Tatsuro Yamada
pgsql-hackers by date: