Thread: exceptional result of postres_fdw external table joining local table
exceptional result of postres_fdw external table joining local table
Hi,
I found a problem when using postres_fdw external table.
The PGDB version is 11.9。 I created a postgres_fdw external table to use a table from another DB, and I added use_remote_estimate true option to optimize remote sql。
gap_new=> \des+
List of foreign servers
Name | Owner | Foreign-data wrapper | Access privileges | Type | Version | FDW options | Descri
ption
----------------+-------+----------------------+-------------------+------+---------+------------------------------------------------------------------------------------+---
server28wx_fdw | gap | postgres_fdw | qu=U/qu | | | (host '192.168.1.28', port '5432', dbname 'db1', use_remote_estimate 'true') |
(1 row)
But when I run some full table scans and found a problems. the results of the following sql were different.
Method 1: directly join external table and local table
select count(distinct user_id),count(distinct member_code)
from a_fdw t1
join b as t6 on t1.user_id=t6.tags ->> '10'::varchar(50)
where coalesce(user_id,'') <> '';
-- result: 50739 50845
Method 2: create a local temp table from external table and then join temp table and local table.
drop table if exists temp_a;
select *
into temp temp_a
from a_fdw;
select count(distinct user_id),count(distinct member_code)
from a_fdw t1
join b as t6 on t1.user_id=t6.tags ->> '10'::varchar(50)
where coalesce(user_id,'') <> '';
--result 1641737 1645368
The two methods produces different results. And apparently, the result of Method 1 lost some data.
Is there some problem with use_remote_estimate option configuration ?
"Qu, Mischa, Majorel China" <mischa.qu@majorel.cn> writes: > But when I run some full table scans and found a problems. the results of the following sql were different. > Method 1: directly join external table and local table > select count(distinct user_id),count(distinct member_code) > from a_fdw t1 > join b as t6 on t1.user_id=t6.tags ->> '10'::varchar(50) > where coalesce(user_id,'') <> ''; > -- result: 50739 50845 What sort of plan is that query using? If it's a merge join, I'd bet that the problem is related to local and remote servers not having the same default collation. regards, tom lane
答复: exceptional result of postres_fdw external table joining local table
Yes, the default collation is different, The collation of remote DB is en_US.UTF-8, and local is C. Remote DB postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 7981 kB | pg_default | default administrativeconnection database LOCAL DB postgres | postgres | UTF8 | C | C | | 8061 kB | pg_default | default administrativeconnection database and It's merge join。 Aggregate (cost=7058404551.15..7058404551.16 rows=1 width=16) (actual time=260773.462..260773.464 rows=1 loops=1) Output: count(DISTINCT t1.user_id), count(DISTINCT t6.member_code) -> Merge Join (cost=38935669.40..5266292528.36 rows=358422404557 width=131) (actual time=243519.160..260728.196 rows=65041loops=1) Output: t1.user_id, t6.member_code Merge Cond: ((t1.user_id)::text = ((t6.tags ->> '508'::text))) -> Foreign Scan on public.a_fdw t1 (cost=215387.41..493312.00 rows=1986961 width=28) (actual time=21277.241..27822.350rows=2392619 loops=1) Output: t1.id, t1.department_name, t1.leader_name, t1.leader_code, t1.store_code, t1.short_name, t1.staff_id,t1.staff_code, t1.staff_name, t1.staff_status, t1.ex ternal_id, t1.user_id, t1.external_name, t1.external_status, t1.bind_time, t1.unbind_time, t1.create_time, t1.update_time,t1.remark Filter: ((COALESCE(t1.user_id, ''::character varying))::text <> ''::text) Rows Removed by Filter: 3342 Remote SQL: SELECT user_id FROM public.a ORDER BY user_id ASC NULLS LAST -> Materialize (cost=38720281.99..38870604.70 rows=30064542 width=1101) (actual time=221529.199..231550.625 rows=10264430loops=1) Output: t6.member_code, t6.tags, ((t6.tags ->> '508'::text)) -> Sort (cost=38720281.99..38795443.34 rows=30064542 width=1101) (actual time=221529.195..230155.492 rows=10250260loops=1) Output: t6.member_code, t6.tags, ((t6.tags ->> '508'::text)) Sort Key: ((t6.tags ->> '508'::text)) Sort Method: external merge Disk: 33044192kB -> Seq Scan on public.b t6 (cost=0.00..6007802.42 rows=30064542 width=1101) (actual time=0.042..150962.505rows=30012748 loops=1) Output: t6.member_code, t6.tags, (t6.tags ->> '508'::text) Should I recreate foreign table using COLLATION en_US.UTF-8 -----邮件原件----- 发件人: Tom Lane <tgl@sss.pgh.pa.us> 发送时间: 2023年2月6日 23:34 收件人: Qu, Mischa, Majorel China <mischa.qu@majorel.cn> 抄送: pgsql-bugs@postgresql.org 主题: Re: exceptional result of postres_fdw external table joining local table "Qu, Mischa, Majorel China" <mischa.qu@majorel.cn> writes: > But when I run some full table scans and found a problems. the results of the following sql were different. > Method 1: directly join external table and local table select > count(distinct user_id),count(distinct member_code) from a_fdw t1 join > b as t6 on t1.user_id=t6.tags ->> '10'::varchar(50) where > coalesce(user_id,'') <> ''; > -- result: 50739 50845 What sort of plan is that query using? If it's a merge join, I'd bet that the problem is related to local and remote serversnot having the same default collation. regards, tom lane CAUTION: This e-mail and any attachments may contain confidential and/or privileged information. If you are not the intendedrecipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Anyunauthorized copying, disclosure or distribution of the material in this e-mail is forbidden. Please consider the environmentbefore printing this e-mail!.
答复: exceptional result of postres_fdw external table joining local table
Do only merge joins have this problem? -----邮件原件----- 发件人: Qu, Mischa, Majorel China 发送时间: 2023年2月7日 11:23 收件人: Tom Lane <tgl@sss.pgh.pa.us> 抄送: pgsql-bugs@postgresql.org 主题: 答复: exceptional result of postres_fdw external table joining local table Yes, the default collation is different, The collation of remote DB is en_US.UTF-8, and local is C. Remote DB postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 7981 kB | pg_default | default administrativeconnection database LOCAL DB postgres | postgres | UTF8 | C | C | | 8061 kB | pg_default | default administrativeconnection database and It's merge join。 Aggregate (cost=7058404551.15..7058404551.16 rows=1 width=16) (actual time=260773.462..260773.464 rows=1 loops=1) Output: count(DISTINCT t1.user_id), count(DISTINCT t6.member_code) -> Merge Join (cost=38935669.40..5266292528.36 rows=358422404557 width=131) (actual time=243519.160..260728.196 rows=65041loops=1) Output: t1.user_id, t6.member_code Merge Cond: ((t1.user_id)::text = ((t6.tags ->> '508'::text))) -> Foreign Scan on public.a_fdw t1 (cost=215387.41..493312.00 rows=1986961 width=28) (actual time=21277.241..27822.350rows=2392619 loops=1) Output: t1.id, t1.department_name, t1.leader_name, t1.leader_code, t1.store_code, t1.short_name, t1.staff_id,t1.staff_code, t1.staff_name, t1.staff_status, t1.ex ternal_id, t1.user_id, t1.external_name, t1.external_status,t1.bind_time, t1.unbind_time, t1.create_time, t1.update_time, t1.remark Filter: ((COALESCE(t1.user_id, ''::character varying))::text <> ''::text) Rows Removed by Filter: 3342 Remote SQL: SELECT user_id FROM public.a ORDER BY user_id ASC NULLS LAST -> Materialize (cost=38720281.99..38870604.70 rows=30064542 width=1101) (actual time=221529.199..231550.625 rows=10264430loops=1) Output: t6.member_code, t6.tags, ((t6.tags ->> '508'::text)) -> Sort (cost=38720281.99..38795443.34 rows=30064542 width=1101) (actual time=221529.195..230155.492 rows=10250260loops=1) Output: t6.member_code, t6.tags, ((t6.tags ->> '508'::text)) Sort Key: ((t6.tags ->> '508'::text)) Sort Method: external merge Disk: 33044192kB -> Seq Scan on public.b t6 (cost=0.00..6007802.42 rows=30064542 width=1101) (actual time=0.042..150962.505rows=30012748 loops=1) Output: t6.member_code, t6.tags, (t6.tags ->> '508'::text) Should I recreate foreign table using COLLATION en_US.UTF-8 -----邮件原件----- 发件人: Tom Lane <tgl@sss.pgh.pa.us> 发送时间: 2023年2月6日 23:34 收件人: Qu, Mischa, Majorel China <mischa.qu@majorel.cn> 抄送: pgsql-bugs@postgresql.org 主题: Re: exceptional result of postres_fdw external table joining local table "Qu, Mischa, Majorel China" <mischa.qu@majorel.cn> writes: > But when I run some full table scans and found a problems. the results of the following sql were different. > Method 1: directly join external table and local table select > count(distinct user_id),count(distinct member_code) from a_fdw t1 join > b as t6 on t1.user_id=t6.tags ->> '10'::varchar(50) where > coalesce(user_id,'') <> ''; > -- result: 50739 50845 What sort of plan is that query using? If it's a merge join, I'd bet that the problem is related to local and remote serversnot having the same default collation. regards, tom lane CAUTION: This e-mail and any attachments may contain confidential and/or privileged information. If you are not the intendedrecipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Anyunauthorized copying, disclosure or distribution of the material in this e-mail is forbidden. Please consider the environmentbefore printing this e-mail!.
答复: exceptional result of postres_fdw external table joining local table
After I added colllation "en_US.utf8", the execution of another sql slowed down, and the query plan shows the remote sqlis a full table scan. Remote SQL: SELECT unionid, follow_user FROM public.wxwork_external_info And using default collation the querying plan is Remote SQL: SELECT unionid, follow_user FROM public.wxwork_external_info WHERE (($1::character varying(50) = unionid)) -----邮件原件----- 发件人: Qu, Mischa, Majorel China 发送时间: 2023年2月7日 11:45 收件人: Tom Lane <tgl@sss.pgh.pa.us> 抄送: pgsql-bugs@postgresql.org 主题: 答复: exceptional result of postres_fdw external table joining local table Do only merge joins have this problem? -----邮件原件----- 发件人: Qu, Mischa, Majorel China 发送时间: 2023年2月7日 11:23 收件人: Tom Lane <tgl@sss.pgh.pa.us> 抄送: pgsql-bugs@postgresql.org 主题: 答复: exceptional result of postres_fdw external table joining local table Yes, the default collation is different, The collation of remote DB is en_US.UTF-8, and local is C. Remote DB postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | | 7981 kB | pg_default | default administrativeconnection database LOCAL DB postgres | postgres | UTF8 | C | C | | 8061 kB | pg_default | default administrativeconnection database and It's merge join。 Aggregate (cost=7058404551.15..7058404551.16 rows=1 width=16) (actual time=260773.462..260773.464 rows=1 loops=1) Output: count(DISTINCT t1.user_id), count(DISTINCT t6.member_code) -> Merge Join (cost=38935669.40..5266292528.36 rows=358422404557 width=131) (actual time=243519.160..260728.196 rows=65041loops=1) Output: t1.user_id, t6.member_code Merge Cond: ((t1.user_id)::text = ((t6.tags ->> '508'::text))) -> Foreign Scan on public.a_fdw t1 (cost=215387.41..493312.00 rows=1986961 width=28) (actual time=21277.241..27822.350rows=2392619 loops=1) Output: t1.id, t1.department_name, t1.leader_name, t1.leader_code, t1.store_code, t1.short_name, t1.staff_id,t1.staff_code, t1.staff_name, t1.staff_status, t1.ex ternal_id, t1.user_id, t1.external_name, t1.external_status,t1.bind_time, t1.unbind_time, t1.create_time, t1.update_time, t1.remark Filter: ((COALESCE(t1.user_id, ''::character varying))::text <> ''::text) Rows Removed by Filter: 3342 Remote SQL: SELECT user_id FROM public.a ORDER BY user_id ASC NULLS LAST -> Materialize (cost=38720281.99..38870604.70 rows=30064542 width=1101) (actual time=221529.199..231550.625 rows=10264430loops=1) Output: t6.member_code, t6.tags, ((t6.tags ->> '508'::text)) -> Sort (cost=38720281.99..38795443.34 rows=30064542 width=1101) (actual time=221529.195..230155.492 rows=10250260loops=1) Output: t6.member_code, t6.tags, ((t6.tags ->> '508'::text)) Sort Key: ((t6.tags ->> '508'::text)) Sort Method: external merge Disk: 33044192kB -> Seq Scan on public.b t6 (cost=0.00..6007802.42 rows=30064542 width=1101) (actual time=0.042..150962.505rows=30012748 loops=1) Output: t6.member_code, t6.tags, (t6.tags ->> '508'::text) Should I recreate foreign table using COLLATION en_US.UTF-8 -----邮件原件----- 发件人: Tom Lane <tgl@sss.pgh.pa.us> 发送时间: 2023年2月6日 23:34 收件人: Qu, Mischa, Majorel China <mischa.qu@majorel.cn> 抄送: pgsql-bugs@postgresql.org 主题: Re: exceptional result of postres_fdw external table joining local table "Qu, Mischa, Majorel China" <mischa.qu@majorel.cn> writes: > But when I run some full table scans and found a problems. the results of the following sql were different. > Method 1: directly join external table and local table select > count(distinct user_id),count(distinct member_code) from a_fdw t1 join > b as t6 on t1.user_id=t6.tags ->> '10'::varchar(50) where > coalesce(user_id,'') <> ''; > -- result: 50739 50845 What sort of plan is that query using? If it's a merge join, I'd bet that the problem is related to local and remote serversnot having the same default collation. regards, tom lane CAUTION: This e-mail and any attachments may contain confidential and/or privileged information. If you are not the intendedrecipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Anyunauthorized copying, disclosure or distribution of the material in this e-mail is forbidden. Please consider the environmentbefore printing this e-mail!.