Re: WIP: Collecting statistics on CSV file data - Mailing list pgsql-hackers
From | Shigeru Hanada |
---|---|
Subject | Re: WIP: Collecting statistics on CSV file data |
Date | |
Msg-id | 4F351626.4030500@gmail.com Whole thread Raw |
In response to | Re: WIP: Collecting statistics on CSV file data (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>) |
Responses |
Re: WIP: Collecting statistics on CSV file data
|
List | pgsql-hackers |
(2011/12/15 11:30), Etsuro Fujita wrote: > (2011/12/14 15:34), Shigeru Hanada wrote: >> I think this patch could be marked as "Ready for committer" with some >> minor fixes. Please find attached a revised patch (v6.1). I've tried to make pgsql_fdw work with this feature, and found that few static functions to be needed to exported to implement ANALYZE handler in short-cut style. The "Short-cut style" means the way to generate statistics (pg_class and pg_statistic) for foreign tables without retrieving sample data from foreign server. Attached patch (export_funcs.patch) exports examine_attribute and update_attstats which are necessary to implement ANALYZE handler for pgsql_fdw. In addition to exporting, update_attstats is also renamed to vac_update_attstats to fit with already exported function vac_update_relstats. I also attached archive of WIP pgsql_fdw with ANALYZE support. This version has better estimation than original pgsql_fdw, because it can use selectivity of qualifiers evaluated on local side to estimate number of result rows. To show the effect of ANALYZE clearly, WHERE push-down feature is disabled. Please see pgsqlAnalyzeForeignTable and store_remote_stats in pgsql_fdw.c. I used pgbench_accounts tables with 30000 records, and got reasonable rows estimation for queries below. <on remote side> postgres=# UPDATE pgbench_accounts SET filler = NULL postgres-# WHERE aid % 3 = 0; postgres=# ANALYZE; <on local side> postgres=# ANALYZE pgbench_accounts; -- needs explicit table name postgres=# EXPLAIN SELECT * FROM pgbench_accounts WHERE filler IS NULL; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- Foreign Scan on pgbench_accounts (cost=100.00..40610.00 rows=100030 width=97) Filter: (filler IS NULL) Remote SQL: DECLARE pgsql_fdw_cursor_13 SCROLL CURSOR FOR SELECT aid, bid, abalance, filler FROM public.pgbench_accounts (3 rows) postgres=# EXPLAIN SELECT * FROM pgbench_accounts WHERE aid < 100; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- Foreign Scan on pgbench_accounts (cost=100.00..40610.00 rows=96 width=97) Filter: (aid < 100) Remote SQL: DECLARE pgsql_fdw_cursor_14 SCROLL CURSOR FOR SELECT aid, bid, abalance, filler FROM public.pgbench_accounts (3 rows) postgres=# EXPLAIN SELECT * FROM pgbench_accounts WHERE aid < 1000; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- Foreign Scan on pgbench_accounts (cost=100.00..40610.00 rows=1004 width=97) Filter: (aid < 1000) Remote SQL: DECLARE pgsql_fdw_cursor_15 SCROLL CURSOR FOR SELECT aid, bid, abalance, filler FROM public.pgbench_accounts (3 rows) In implementing ANALYZE handler, hardest part was copying anyarray values from remote to local. If we can make it common in core, it would help FDW authors who want to implement ANALYZE handler without retrieving sample rows from remote server. Regards, -- Shigeru Hanada
Attachment
pgsql-hackers by date: