Re: Getting sorted data from foreign server for merge join - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: Getting sorted data from foreign server for merge join |
Date | |
Msg-id | CAFjFpRc_AO4z+V3SfsMp2gmEMaTUSXh0EjEWktTSqcBtSC5f6w@mail.gmail.com Whole thread Raw |
In response to | Re: Getting sorted data from foreign server for merge join (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Getting sorted data from foreign server for merge join
|
List | pgsql-hackers |
On Wed, Dec 9, 2015 at 12:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Dec 2, 2015 at 6:45 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> Thanks Ashutosh.
>
> Re-reviewed and Re-verified the patch, pg_sort_all_pd_v5.patch
> looks good to me.
This patch needs a rebase.
Done.
It's not going to work to say this is a patch proposed for commit when
it's still got a TODO comment in it that obviously needs to be
changed. And the formatting of that long comment is pretty weird,
too, and not consistent with other functions in that same file (e.g.
get_remote_estimate, ec_member_matches_foreign, create_cursor).
The TODO was present in v4 but not in v5 and is not present in v6 attached here.. Formatted comment according estimate_path_cost_size(), convert_prep_stmt_params().
Aside from that, I think before we commit this, somebody should do
some testing that demonstrates that this is actually a good idea. Not
as part of the test case set for this patch, but just in general.
Merge joins are typically going to be relevant for large tables, but
the examples in the regression tests are necessarily tiny. I'd like
to see some sample data and some sample queries that get appreciably
faster with this code. If we can't find any, we don't need the code.
I tested the patch on my laptop with two types of queries, a join between two foreign tables on different foreign servers (pointing to the same self server) and a join between one foreign and one local table. The foreign tables and servers are created using sort_pd_setup.sql attached. Foreign tables pointed to table with index useful for join clause. Both the joining tables had 10M rows. The execution time of query was measured for 100 runs and average and standard deviation were calculated (using function query_execution_stats() in script sort_pd.sql) and are presented below.
1. Query between foreign tables
SELECT ft1.val, ft2.val FROM ft1 join ft2 on (ft1.val = ft2.val)
SELECT ft1.val, ft2.val FROM ft1 join ft2 on (ft1.val = ft2.val)
Plan and timings without patch
EXPLAIN (VERBOSE, ANALYSE) :query ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=508510.02..1129945.94 rows=999995 width=8) (actual time=33803.826..82416.342 rows=10000000 loops=1)
Output: ft1.val, ft2.val
Hash Cond: (ft1.val = ft2.val)
-> Foreign Scan on public.ft1 (cost=100.00..344347.31 rows=9999977 width=4) (actual time=0.624..28531.803 rows=10000000 loops=1)
Output: ft1.val
Remote SQL: SELECT val FROM public.lt
-> Hash (cost=344347.31..344347.31 rows=9999977 width=4) (actual time=33258.025..33258.025 rows=10000000 loops=1)
Output: ft2.val
Buckets: 131072 Batches: 256 Memory Usage: 2400kB
-> Foreign Scan on public.ft2 (cost=100.00..344347.31 rows=9999977 width=4) (actual time=22.171..28134.970 rows=10000000 loops=1)
Output: ft2.val
Remote SQL: SELECT val FROM public.lt
Planning time: 33.155 ms
Execution time: 82914.607 ms
(14 rows)
avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
78750.95487 | 2911.51825687913 | 74314.886 | 89358.464
EXPLAIN (VERBOSE, ANALYSE) :query ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=508510.02..1129945.94 rows=999995 width=8) (actual time=33803.826..82416.342 rows=10000000 loops=1)
Output: ft1.val, ft2.val
Hash Cond: (ft1.val = ft2.val)
-> Foreign Scan on public.ft1 (cost=100.00..344347.31 rows=9999977 width=4) (actual time=0.624..28531.803 rows=10000000 loops=1)
Output: ft1.val
Remote SQL: SELECT val FROM public.lt
-> Hash (cost=344347.31..344347.31 rows=9999977 width=4) (actual time=33258.025..33258.025 rows=10000000 loops=1)
Output: ft2.val
Buckets: 131072 Batches: 256 Memory Usage: 2400kB
-> Foreign Scan on public.ft2 (cost=100.00..344347.31 rows=9999977 width=4) (actual time=22.171..28134.970 rows=10000000 loops=1)
Output: ft2.val
Remote SQL: SELECT val FROM public.lt
Planning time: 33.155 ms
Execution time: 82914.607 ms
(14 rows)
avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
78750.95487 | 2911.51825687913 | 74314.886 | 89358.464
Plan and timing with patch
EXPLAIN (VERBOSE, ANALYSE) :query ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=200.86..1183070.86 rows=10000000 width=8) (actual time=1.776..73140.219 rows=10000000 loops=1)
Output: ft1.val, ft2.val
Merge Cond: (ft1.val = ft2.val)
-> Foreign Scan on public.ft1 (cost=100.43..504035.43 rows=10000000 width=4) (actual time=0.937..30422.457 rows=10000000 loops=1)
Output: ft1.val, ft1.val2
Remote SQL: SELECT val FROM public.lt ORDER BY val ASC
-> Materialize (cost=100.43..529035.43 rows=10000000 width=4) (actual time=0.826..33448.822 rows=10000000 loops=1)
Output: ft2.val, ft2.val2
-> Foreign Scan on public.ft2 (cost=100.43..504035.43 rows=10000000 width=4) (actual time=0.818..31035.362 rows=10000000 loops=1)
Output: ft2.val, ft2.val2
Remote SQL: SELECT val FROM public.lt ORDER BY val ASC
Planning time: 163.161 ms
Execution time: 73654.106 ms
(13 rows)
avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
71881.15916 | 819.091605498189 | 70197.312 | 74653.314
EXPLAIN (VERBOSE, ANALYSE) :query ;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=200.86..1183070.86 rows=10000000 width=8) (actual time=1.776..73140.219 rows=10000000 loops=1)
Output: ft1.val, ft2.val
Merge Cond: (ft1.val = ft2.val)
-> Foreign Scan on public.ft1 (cost=100.43..504035.43 rows=10000000 width=4) (actual time=0.937..30422.457 rows=10000000 loops=1)
Output: ft1.val, ft1.val2
Remote SQL: SELECT val FROM public.lt ORDER BY val ASC
-> Materialize (cost=100.43..529035.43 rows=10000000 width=4) (actual time=0.826..33448.822 rows=10000000 loops=1)
Output: ft2.val, ft2.val2
-> Foreign Scan on public.ft2 (cost=100.43..504035.43 rows=10000000 width=4) (actual time=0.818..31035.362 rows=10000000 loops=1)
Output: ft2.val, ft2.val2
Remote SQL: SELECT val FROM public.lt ORDER BY val ASC
Planning time: 163.161 ms
Execution time: 73654.106 ms
(13 rows)
avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
71881.15916 | 819.091605498189 | 70197.312 | 74653.314
It can be observed that the with the patch, merge join strategy is used instead of hash join and the execution time reduces by approx 9%. A desired effect is that the deviation in the execution time has reduced heavily (almost by 75%).
2. Join between local and foreign table
Without patch the plan and timings are
EXPLAIN (VERBOSE, ANALYSE) :query ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=308410.66..1019846.69 rows=9999970 width=8) (actual time=7674.681..47767.136 rows=10000000 loops=1)
Output: lt.val, ft1.val
Hash Cond: (ft1.val = lt.val)
-> Foreign Scan on public.ft1 (cost=100.00..344347.55 rows=9999985 width=4) (actual time=0.506..26679.980 rows=10000000 loops=1)
Output: ft1.val
Remote SQL: SELECT val FROM public.lt
-> Hash (cost=144247.85..144247.85 rows=9999985 width=4) (actual time=7667.598..7667.598 rows=10000000 loops=1)
Output: lt.val
Buckets: 131072 Batches: 256 Memory Usage: 2400kB
-> Seq Scan on public.lt (cost=0.00..144247.85 rows=9999985 width=4) (actual time=0.018..2959.111 rows=10000000 loops=1)
Output: lt.val
Planning time: 8.668 ms
Execution time: 48209.365 ms
(13 rows)
SELECT avg_exe_time, std_dev_exe_time, min_exe_time, max_exe_time
FROM query_execution_stats(:'query', :num_samples);
avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
47246.46956 | 2579.42041949119 | 43603.411 | 56096.759
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=308410.66..1019846.69 rows=9999970 width=8) (actual time=7674.681..47767.136 rows=10000000 loops=1)
Output: lt.val, ft1.val
Hash Cond: (ft1.val = lt.val)
-> Foreign Scan on public.ft1 (cost=100.00..344347.55 rows=9999985 width=4) (actual time=0.506..26679.980 rows=10000000 loops=1)
Output: ft1.val
Remote SQL: SELECT val FROM public.lt
-> Hash (cost=144247.85..144247.85 rows=9999985 width=4) (actual time=7667.598..7667.598 rows=10000000 loops=1)
Output: lt.val
Buckets: 131072 Batches: 256 Memory Usage: 2400kB
-> Seq Scan on public.lt (cost=0.00..144247.85 rows=9999985 width=4) (actual time=0.018..2959.111 rows=10000000 loops=1)
Output: lt.val
Planning time: 8.668 ms
Execution time: 48209.365 ms
(13 rows)
SELECT avg_exe_time, std_dev_exe_time, min_exe_time, max_exe_time
FROM query_execution_stats(:'query', :num_samples);
avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
47246.46956 | 2579.42041949119 | 43603.411 | 56096.759
With the patch the plan and timings are
EXPLAIN (VERBOSE, ANALYSE) :query ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=155.01..957924.85 rows=9999970 width=8) (actual time=0.592..45125.356 rows=10000000 loops=1)
Output: lt.val, ft1.val
Merge Cond: (ft1.val = lt.val)
-> Foreign Scan on public.ft1 (cost=100.43..504038.91 rows=9999985 width=4) (actual time=0.551..30526.048 rows=10000000 loops=1)
Output: ft1.val, ft1.val2
Remote SQL: SELECT val FROM public.lt ORDER BY val ASC
-> Index Only Scan using i_lt_val on public.lt (cost=0.43..303939.21 rows=9999985 width=4) (actual time=0.032..6192.406 rows=10000000 loops=1)
Output: lt.val
Heap Fetches: 10000000
Planning time: 9.043 ms
Execution time: 45666.023 ms
(11 rows)
avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
42803.36105 | 166.874491432755 | 42321.314 | 43316.902
EXPLAIN (VERBOSE, ANALYSE) :query ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=155.01..957924.85 rows=9999970 width=8) (actual time=0.592..45125.356 rows=10000000 loops=1)
Output: lt.val, ft1.val
Merge Cond: (ft1.val = lt.val)
-> Foreign Scan on public.ft1 (cost=100.43..504038.91 rows=9999985 width=4) (actual time=0.551..30526.048 rows=10000000 loops=1)
Output: ft1.val, ft1.val2
Remote SQL: SELECT val FROM public.lt ORDER BY val ASC
-> Index Only Scan using i_lt_val on public.lt (cost=0.43..303939.21 rows=9999985 width=4) (actual time=0.032..6192.406 rows=10000000 loops=1)
Output: lt.val
Heap Fetches: 10000000
Planning time: 9.043 ms
Execution time: 45666.023 ms
(11 rows)
avg_exe_time | std_dev_exe_time | min_exe_time | max_exe_time
--------------+------------------+--------------+--------------
42803.36105 | 166.874491432755 | 42321.314 | 43316.902
Again observe that with the patch, merge join is used instead of hash join and timing reduces by approx 9%. Again the deviation in execution reduces heavily (almost by 75%). There is increase in planning time with the patch owing to firing EXPLAIN on the foreign server.
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Attachment
pgsql-hackers by date: