Re: Getting sorted data from foreign server - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Getting sorted data from foreign server |
Date | |
Msg-id | CA+TgmoYbO2TZ3JQVdrsLUXjc1YWeZEGETw3Q1cvn4GnbY81Ymw@mail.gmail.com Whole thread Raw |
In response to | Re: Getting sorted data from foreign server (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Responses |
Re: Getting sorted data from foreign server
|
List | pgsql-hackers |
On Thu, Oct 15, 2015 at 6:28 AM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: > Attached is the patch which takes care of above comments. I spent some time on this patch today. But it's still not right. I've attached a new version which fixes a serious problem with your last version - having postgresGetForeignPaths do the costing of the sorted path itself instead of delegating that to estimate_path_cost_size is wrong. In your version, 10% increment gets applied to the network transmission costs as well as the cost of generating the tupes - but only when use_remote_estimate == false. I fixed this and did some cosmetic cleanup. But you'll notice if you try this some of postgres_fdw's regression tests fail. This is rather mysterious: *************** *** 697,715 **** Sort Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 Sort Key: t1.c1 ! -> Nested Loop Semi Join Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 ! Join Filter: (t1.c3 = t2.c3) -> Foreign Scan on public.ft1 t1 Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20)) ! -> Materialize Output: t2.c3 ! -> Foreign Scan on public.ft2 t2 Output: t2.c3 ! Filter: (date(t2.c4) = '01-17-1970'::date) ! Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10)) ! (15 rows) EXECUTE st2(10, 20); c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 --- 697,718 ---- Sort Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 Sort Key: t1.c1 ! -> Hash Join Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 ! Hash Cond: (t1.c3 = t2.c3) -> Foreign Scan on public.ft1 t1 Output: t1.c1, t1.c2, t1.c3, t1.c4, t1.c5, t1.c6, t1.c7, t1.c8 Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" WHERE (("C 1" < 20)) ! -> Hash Output: t2.c3 ! -> HashAggregate Output: t2.c3 ! Group Key: t2.c3 ! -> Foreign Scan on public.ft2 t2 ! Output: t2.c3 ! Filter: (date(t2.c4) = '01-17-1970'::date) ! Remote SQL: SELECT c3, c4 FROM "S 1"."T 1" WHERE (("C 1" > 10)) ! (18 rows) What I think is happening here is that the planner notices that instead of doing a parameterized nestloop, it could pull down the data already sorted from the remote side, cheaply unique-ify it by using the ordering provided by the remote side, and then do a standard hash join. That might well be a sensible approach, but the ORDER BY that would make it correct doesn't show up in the Remote SQL. I don't know why that's happening, but it's not good. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
pgsql-hackers by date: