Re: Import Statistics in postgres_fdw before resorting to sampling. - Mailing list pgsql-hackers
| From | Etsuro Fujita |
|---|---|
| Subject | Re: Import Statistics in postgres_fdw before resorting to sampling. |
| Date | |
| Msg-id | CAPmGK14EkE-LaGqpu1aMfc_bOZ54zubk2Wbx1TEm6TFYiOu26Q@mail.gmail.com |
| In response to | Re: Import Statistics in postgres_fdw before resorting to sampling. (Corey Huinker <corey.huinker@gmail.com>) |
| List | pgsql-hackers |
On Sat, Nov 22, 2025 at 6:31 AM Corey Huinker <corey.huinker@gmail.com> wrote:

>> Other initial comments:
>>
>> The commit message says:
>>
>> This is managed via two new options, fetch_stats and remote_analyze,
>> both are available at the server level and table level. If fetch_stats
>> is true, then the ANALYZE command will first attempt to fetch statistics
>> from the remote table and import those statistics locally.
>>
>> If remote_analyze is true, and if the first attempt to fetch remote
>> statistics found no attribute statistics, then an attempt will be made
>> to ANALYZE the remote table before a second and final attempt to fetch
>> remote statistics.
>>
>> If no statistics are found, then ANALYZE will fall back to the normal
>> behavior of sampling and local analysis.
>>
>> I think the first step assumes that the remote stats are up-to-date;
>> if they aren't, it would cause a regression. (If the remote relation
>> is a plain table, they are likely to be up-to-date, but for example,
>> if it is a foreign table, it's possible that they are stale.) So how
>> about making it the user's responsibility to make them up-to-date? If
>> doing so, we wouldn't need to do the second and third steps anymore,
>> making the patch simpler.

> Obviously there is no way to know the quality/freshness of remote stats if they are found.
>
> The analyze option was born of feedback from other Postgres hackers while brainstorming on what this option might look like. I don't think we *need* this extra option for the feature to be a success, but its relative simplicity did make me want to put it out there to see who else liked it.

Actually, I have some concerns about the ANALYZE and fall-back options. As for the former, if the remote user doesn't have the MAINTAIN privilege on the remote table, remote ANALYZE would just be wasted effort. As for the latter, imagine a situation where a user ANALYZEs a foreign table whose remote table is significantly large.
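(For readers following along: the two options quoted from the commit message would presumably be set with the usual postgres_fdw OPTIONS syntax, roughly as below. This is only a sketch of the patch under discussion; the option names come from the quoted commit message, while the server and table names are invented.)

```sql
-- Sketch only: fetch_stats and remote_analyze are options proposed by the
-- patch under discussion, not part of stock postgres_fdw.
-- "remote_srv" and "remote_tbl" are invented names.
ALTER SERVER remote_srv OPTIONS (ADD fetch_stats 'true');
ALTER FOREIGN TABLE remote_tbl OPTIONS (ADD remote_analyze 'false');

-- With fetch_stats enabled, a plain ANALYZE would first try to import the
-- remote table's statistics instead of sampling:
ANALYZE remote_tbl;
```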
When the previous attempts fail, the user might want to retry importing remote stats after ANALYZEing the remote table on the remote side in some way, rather than have postgres_fdw automatically fall back to the normal, lengthy processing. I think just throwing an error if the first attempt fails would make the system not only simpler but more reliable, while still providing some flexibility to users.

>> On the other hand:
>>
>> This operation will only work on remote relations that can have stored
>> statistics: tables, partitioned tables, and materialized views. If the
>> remote relation is a view then remote fetching/analyzing is just wasted
>> effort and the user is better off setting fetch_stats to false for that
>> table.
>>
>> I'm not sure the wasted effort is acceptable; IMO, if the remote table
>> is a view, I think the system should detect that in some way and then
>> just do the normal ANALYZE processing.

> The stats fetch query is pretty light, but I can see fetching the relkind along with the relstats, and making decisions on whether to continue from there, only applying the relstats after attrstats have been successfully applied.

Good idea! I would vote for throwing an error if the relkind is "view", making the user set fetch_stats to false for that foreign table.

Best regards,
Etsuro Fujita