Joining on CTE is unusually slow? - Mailing list pgsql-general
From | Jon Rosebaugh |
---|---|
Subject | Joining on CTE is unusually slow? |
Date | |
Msg-id | 1406676160.968728.147067773.4399E74D@webmail.messagingengine.com Whole thread Raw |
Responses |
Re: Joining on CTE is unusually slow?
|
List | pgsql-general |
I have a CTE that produces some row ids. I want to do a query with a complicated join based on those row ids. I've tried running them split into two (run CTE query, collect row ids, then run the complicated query with id IN (id_1, id_2, id_3)) and it takes only a few seconds to run, but when I run this as a single query it takes several minutes. Is this expected behavior? We have a message table (msg) where each message has a parent_id (a nullable FK to the same table). I use the CTE to assemble a list of all messages downstream of a given message (including that message). It works pretty well: WITH RECURSIVE downstream_thread(id, parent_id) AS (SELECT pn_msg.id AS id, pn_msg.parent_id AS parent_id FROM msg_table AS pn_msg JOIN msg_asset_table ON pn_msg.id = msg_asset_table.msg_id JOIN asset_table ON asset_table.id = msg_asset_table.asset_id WHERE pn_msg.message_type = 'interest_notification' AND asset_table.type = 'tp' AND asset_table.public_id = '59c89bdcaf6711e3b67f12313b0a607d' UNION SELECT pr_msg.id AS id, pr_msg.parent_id AS parent_id FROM msg_table AS pr_msg JOIN downstream_thread ON pr_msg.parent_id = downstream_thread.id) SELECT id from downstream_thread; There are 2.3 million rows in msg_table, but for any expected asset public_id, this query will return only a dozen rows at most, and runs in 6 seconds. Once I have the row ids, I use this complicated join; basically I'm finding replies to the original message that come from any user at the company the original message was addressed to. This takes under a second. SELECT notification_reply_msg.* FROM msg_table AS notification_reply_msg JOIN reference_table AS notification_reply_ref ON msg_table.id = notification_reply_ref.msg_id JOIN thread_table AS notification_reply_thread ON notification_reply_ref.thread_id = notification_reply_thread.id JOIN user_table AS notification_reply_user ON notification_reply_thread.user_id = notification_reply_user.id JOIN user_table AS interest_notification_user ON interest_notification_user.company_id = notification_reply_user.company_id JOIN thread_table AS interest_notification_thread ON interest_notification_thread.user_id = interest_notification_user.id JOIN reference_table AS interest_notification_ref ON interest_notification_ref.thread_id = interest_notification_thread.id JOIN msg_table AS interest_notification_msg ON interest_notification_ref.msg_id = interest_notification_msg.id WHERE interest_notification_msg.id IN (2648995, 2648996) and notification_reply_msg.id IN (2648995, 2648996) AND interest_notification_msg.message_type = 'interest_notification' AND interest_notification_ref.header = 'to' AND notification_reply_ref.header = 'from'; However, I tried combining the two queries: WITH RECURSIVE downstream_thread(id, parent_id) AS (SELECT pn_msg.id AS id, pn_msg.parent_id AS parent_id FROM msg_table AS pn_msg JOIN msg_asset_table ON pn_msg.id = msg_asset_table.msg_id JOIN asset_table ON asset_table.id = msg_asset_table.asset_id WHERE pn_msg.message_type = 'interest_notification' AND asset_table.type = 'tp' AND asset_table.public_id = '59c89bdcaf6711e3b67f12313b0a607d' UNION SELECT pr_msg.id AS id, pr_msg.parent_id AS parent_id FROM msg_table AS pr_msg JOIN downstream_thread ON pr_msg.parent_id = downstream_thread.id) SELECT notification_reply_msg.* FROM msg_table AS notification_reply_msg JOIN reference_table AS notification_reply_ref ON msg_table.id = notification_reply_ref.msg_id JOIN thread_table AS notification_reply_thread ON notification_reply_ref.thread_id = notification_reply_thread.id JOIN user_table AS notification_reply_user ON notification_reply_thread.user_id = notification_reply_user.id JOIN user_table AS interest_notification_user ON interest_notification_user.company_id = notification_reply_user.company_id JOIN thread_table AS interest_notification_thread ON interest_notification_thread.user_id = interest_notification_user.id JOIN reference_table AS interest_notification_ref ON interest_notification_ref.thread_id = interest_notification_thread.id JOIN msg_table AS interest_notification_msg ON interest_notification_ref.msg_id = interest_notification_msg.id WHERE interest_notification_msg.id IN (SELECT id from downstream_thread) and notification_reply_msg.id IN (SELECT id from downstream_thread) AND interest_notification_msg.message_type = 'interest_notification' AND interest_notification_ref.header = 'to' AND notification_reply_ref.header = 'from'; The only difference is that I've added the WITH RECURSIVE expression at the beginning and changed the id list to "SELECT id FROM downstream_thread". This takes over eight minutes to run. Is this the expected behavior when joining on CTE expressions? I realize I haven't given the full schema/metadata/explain output as explained in the "Slow Query Questions" wiki page; I wasn't sure if that applied in this case because it's more a question of why combining these two parts is much slower. However, I'm happy to provide it all; I'm just worried about how to identify what's relevant, because that's a lot of tables in that join.
pgsql-general by date: