Thread: Do parallel queries work with only dblink not with fdw?
In order to improve cpu and disk utilization, I am testing parallel queries.
The approach with dblink_send_query() and dblink_get_result() works in my proof-of-concept. Runtime of my reference query was reduced from 55 seconds to ~20seconds using 4 parallel connections. Not what I had hoped but certainly a significant improvement.
The approach with dblink_send_query() and dblink_get_result() works in my proof-of-concept. Runtime of my reference query was reduced from 55 seconds to ~20seconds using 4 parallel connections. Not what I had hoped but certainly a significant improvement.
My approach is fairly easy:
db_link_send_query('conn1', 'statement based on partitioning field');
db_link_send_query('conn2', 'statement based on partitioning field');
...
db_link_send_query('conn2', 'statement based on partitioning field');
...
SELECT
dblink_get_result('conn1')
dblink_get_result('conn1')
UNION ALL
dblink_get_result('conn2')
...
dblink_get_result('conn2')
...
SELECT * FROM myFdwTable WHERE (clause based on partitioning field)
...
On 05/04/2016 01:28 AM, Klaus P. wrote: > In order to improve cpu and disk utilization, I am testing parallel > queries. > > The approach with dblink_send_query() and dblink_get_result() works in > my proof-of-concept. Runtime of my reference query was reduced from 55 > seconds to ~20seconds using 4 parallel connections. Not what I had hoped > but certainly a significant improvement. > > My approach is fairly easy: > > db_link_send_query('conn1', 'statement based on partitioning field'); > db_link_send_query('conn2', 'statement based on partitioning field'); > > ... > > SELECT > dblink_get_result('conn1') > UNION ALL > dblink_get_result('conn2') > > ... > > > However, using fdw foreign data wrappers, I was not able to run any > query in parallel. I came across this presentation > http://de.slideshare.net/EnterpriseDB/foreign-data-wrappers-and-you-with-postgres > where it says on page 12 "parallel FDW access". Is there any example > available on how to run FDW queries in parallel? > > My approach with FDW is similar to the dblink example above: > > SELECT * FROM myFdwTable WHERE (clause based on partitioning field) > UNION ALL > SELECT * FROM myFdwTable WHERE (clause based on partitioning field) > ... > > My experience is however that the statements are carried out sequentially. I am not sure of the status of parallel query in FDW, but for those that are some more information would be helpful: 1) You are using postgres_fdw, correct? 2) What version of the FDW are you using? 3) What version(s) of Postgres are you connecting from/to? > > Thanks > > Klaus Pieper -- Adrian Klaver adrian.klaver@aklaver.com
> -----Ursprüngliche Nachricht----- > Von: Adrian Klaver [mailto:adrian.klaver@aklaver.com] > > I am not sure of the status of parallel query in FDW, but for those that are some > more information would be helpful: > > 1) You are using postgres_fdw, correct? Yes. > > 2) What version of the FDW are you using? Not sure if FDW could be different from the cluster version (see below). Postgres_fdw.dll shows file version 9.5.2.16088- this is the one installed with the Windows PostgreSQL package. > > 3) What version(s) of Postgres are you connecting from/to? For testing I have installed a fresh download 9.5.2 Windows x64 (on Windows Server 2008 R2, if this is of interest) There is only one test database running in this cluster. I am connecting in a "loopback" mode to that database. I could repeat my tests with two different databases or run it on a Linux / Debian machine if that makes a difference. Thanks Klaus
>From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Klaus P. >Sent: Mittwoch, 4. Mai 2016 10:28 >To: pgsql-general@postgresql.org >Subject: [GENERAL] Do parallel queries work with only dblink not with fdw? > >In order to improve cpu and disk utilization, I am testing parallel queries. > >The approach with dblink_send_query() and dblink_get_result() works in my proof-of-concept. Runtime of my reference querywas reduced from 55 seconds to ~20seconds using 4 parallel connections. Not what I had hoped but certainly a significantimprovement. >My approach is fairly easy: >db_link_send_query('conn1', 'statement based on partitioning field'); >db_link_send_query('conn2', 'statement based on partitioning field'); > >... >SELECT >dblink_get_result('conn1') >UNION ALL >dblink_get_result('conn2') > >... Hello, I'm using the db_link approach too, but rather than using dblink_get_result, I store the result in an additional table. This was faster, at least for my use case and at the time when I implemented that solution... something like db_link_send_query('conn0', 'CREATE UNLOGGED TABLE my_result... db_link_send_query('conn1', 'INSERT INTO my_result statement based on partitioning field'); db_link_send_query('conn2', 'INSERT INTO my_result statement based on partitioning field'); ... select * from my_result regards, Marc Mamin
Hi, Sorry for appending to that thread, but I think this is related: Does anyone have experience with parsel [1] and/or it's extension parallelsql [2]? :Stefan [1] http://geeohspatial.blogspot.ch/2013/12/a-simple-function-for-parallel-queries_18.html [2] https://github.com/k1aus/parallelsql 2016-05-04 16:10 GMT+02:00 Marc Mamin <M.Mamin@intershop.de>: >>From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Klaus P. >>Sent: Mittwoch, 4. Mai 2016 10:28 >>To: pgsql-general@postgresql.org >>Subject: [GENERAL] Do parallel queries work with only dblink not with fdw? >> >>In order to improve cpu and disk utilization, I am testing parallel queries. >> >>The approach with dblink_send_query() and dblink_get_result() works in my proof-of-concept. Runtime of my reference querywas reduced from 55 seconds to ~20seconds using 4 parallel connections. Not what I had hoped but certainly a significantimprovement. >>My approach is fairly easy: >>db_link_send_query('conn1', 'statement based on partitioning field'); >>db_link_send_query('conn2', 'statement based on partitioning field'); >> >>... >>SELECT >>dblink_get_result('conn1') >>UNION ALL >>dblink_get_result('conn2') >> >>... > > > Hello, > > I'm using the db_link approach too, but rather than using dblink_get_result, I store the result in an additional table. > This was faster, at least for my use case and at the time when I implemented that solution... > > something like > > db_link_send_query('conn0', 'CREATE UNLOGGED TABLE my_result... > db_link_send_query('conn1', 'INSERT INTO my_result statement based on partitioning field'); > db_link_send_query('conn2', 'INSERT INTO my_result statement based on partitioning field'); > ... > select * from my_result > > regards, > > Marc Mamin > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general