Re: [PATCH] Add sampling statistics to autoanalyze log output - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: [PATCH] Add sampling statistics to autoanalyze log output
Date
Msg-id CAHGQGwECf_qdVSqcXWC5jg=ON0vJGRYV7pRz2KmJFjtU12PQHQ@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Add sampling statistics to autoanalyze log output  (Tatsuya Kawata <kawatatatsuya0913@gmail.com>)
List pgsql-hackers
On Sun, Jan 11, 2026 at 3:25 AM Tatsuya Kawata
<kawatatatsuya0913@gmail.com> wrote:
> Thank you for the valuable feedback!
> I addressed these comments and  updated the patch.

Thanks for updating the patch!


> Thank you for pointing this out. I agree that the displayed values were inappropriate. I have removed the foreign
tablesampling estimation logic from this patch. To avoid misleading, when analyzing foreign tables, the log now omits
thepage and row details: 
>
> ```
> scanned 0 of 0 pages, containing 0 live rows and 0 dead rows
> ```

With this change, we would end up reporting nearly the same sampling
information twice, for example:

    INFO:  analyzing "public.ft"
    INFO:  "ft": table contains 10 rows, 10 rows in sample
    INFO:  finished analyzing table "postgres.public.ft"
    sampling: 10 rows in sample, 10 estimated total rows

Wouldn't it be less confusing to avoid reporting the second sampling line?


> > The other nit comment is, as you add 4 parameters to acquire_sample_rows(), they are all output parameters, I think
youneed to update the header comment to describe them. 
>
> I agree with consolidating the sampling statistics into a single struct, and I have implemented this change. I
consideredwhether to include the existing totalrows and totaldeadrows in the new struct, but decided against it for
now.Since totalrows in particular is used extensively throughout the codebase, including it would expand the scope
beyondthe original goal of improving log output. This can be addressed in a future patch if needed. 

I'm not sure it's acceptable to change the FDW API and require
FDW authors to update their extensions, especially since
the benefit on the FDW side seems limited at this point.

OTOH, *if* we decide to change the API, we should clearly define
how the SamplingStats returned by AcquireSampleRowsFunc is
processed in core, update fdwhandler.sgml, and consider
whether this API change is actually useful for FDWs.


> I ran pgindent, but the results differed from the existing coding style in the same files:
>  - Three tabs were inserted before the struct name at the end of typedef (existing style uses one space)
>  - A space was inserted between * and the variable name (existing style has no space)
> So, I have reverted the pgindent changes. If there is a specific configuration needed, please let me know.

You may need to update src/tools/pgindent/typedefs.list.


> In this patch, I have added "inheritance tree" to the autoanalyze log message as well, and the inheritance tree
statisticsnow show the aggregated totals from all child tables: 

Based on my testing, the aggregated values look incorrect.
In the example below, both t_0 and t_1 report 5,000,000 live rows,
but the aggregated result is 6,779,052. Is that the expected
aggregation behavior?

    INFO:  analyzing "public.t" inheritance tree
    sampling: scanned 30000 of 44253 pages, containing 6779052 live
rows and 0 dead rows; 30000 rows in sample, 9999780 estimated total
rows
    ...
    INFO:  analyzing "public.t_0"
    sampling: scanned 22126 of 22126 pages, containing 5000000 live
rows and 0 dead rows; 30000 rows in sample, 5000000 estimated total
rows
    ...
    INFO:  analyzing "public.t_1"
    sampling: scanned 22127 of 22127 pages, containing 5000000 live
rows and 0 dead rows; 30000 rows in sample, 5000000 estimated total
rows

    =# \d+ t
                                         Partitioned table "public.t"
    Column |  Type   | Collation | Nullable | Default | Storage |
Compression | Stats target | Description
    --------+---------+-----------+----------+---------+---------+-------------+--------------+-------------
    i      | integer |           |          |         | plain   |
       |              |
    j      | integer |           |          |         | plain   |
       |              |
    Partition key: RANGE (i)
    Partitions: t_0 FOR VALUES FROM (1) TO (5000001),
               t_1 FOR VALUES FROM (5000001) TO (10000001)

    =# select count(*) from t_0;
     count
    ---------
    5000000
    (1 row)

    =# select count(*) from t_1;
     count
    ---------
    5000000
    (1 row)

Regards,

--
Fujii Masao



pgsql-hackers by date:

Previous
From: Andrey Borodin
Date:
Subject: Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
Next
From: Álvaro Herrera
Date:
Subject: Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY