Re: New hook after raw parsing, before analyze - Mailing list pgsql-hackers
From | Greg Stark |
---|---|
Subject | Re: New hook after raw parsing, before analyze |
Msg-id | CAM-w4HPv6Cpd+LsEEoO8V48TbvAtxhaTTEP1VY4dVR0jD9q4AA@mail.gmail.com |
In response to | Re: New hook after raw parsing, before analyze (David Beck <dbeck@starschema.net>) |
Responses | Re: New hook after raw parsing, before analyze |
List | pgsql-hackers |
On Sat, Feb 15, 2014 at 2:06 PM, David Beck <dbeck@starschema.net> wrote:

> - when the query arrives a smart rewrite would know 1) what tables are local 2) what tables need new catalog entries 3) what can be joined on the other side
> - the rewriter would potentially add SQL statements in the beginning of the query for creating the missing FDW catalog entries if needed
> - the FDW would be handled by the same extension so they can easily talk to each other about the status of the objects, so the rewriter would know if the background transfer of the small table is completed and should do the rewrite accordingly

I think you have a pretty big impedance mismatch with Postgres, which assumes the schema of the database is fairly static and known when parsing begins. To do what you describe you would pretty much need to write your own SQL parser.

There is a hook, "post_parse_analyze_hook", but I think it comes too late: it runs after the analyze step, which is when Postgres looks up the schema information for every relation mentioned in the query. What you would need is a post_parse_hook that works on the raw parse tree before the analyze step. That doesn't seem terribly controversial to add, though there may be some technical details. The API would of course be completely unstable from major release to major release -- the parse tree gets knocked around quite a bit.

But that only gets you so far. I think you would be able to get lazy creation of schema objects fairly straightforwardly, i.e. any legacy objects referenced in a query get corresponding FDW schema objects created before analyze is done -- though I would expect a few gotchas, possibly deadlocks, with concurrent queries creating the same objects. But I don't think that gets you the joins, which I think would be quite a battle.

And I have to wonder if you aren't going the long way around to do something that can be done more simply some other way. If you have 150k objects I wonder whether your objects aren't all very similar and could be handled by a single Postgres schema object: either a single FDW object or a simple function. If they really are different schema objects, perhaps a single function that returns a more flexible data type like an hstore blob? That has obvious disadvantages over an object the planner understands better, but at least you would be working against a well supported, stable API (well, for interesting definitions of "stable", given 9.4 will have a whole new version of hstore).

As a side note, you should evaluate carefully what lazily creating objects will buy you. Perhaps just creating 150k objects would be cheaper than maintaining this code. In particular, since the user *might* access all 150k, you still have to worry about the worst case anyway, and it might be cheaper to just engineer for it in the first place.

-- greg
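For reference, a minimal sketch of the registration pattern the existing post_parse_analyze_hook uses in the 9.3/9.4-era tree (signature from src/include/parser/analyze.h; later releases change it). The post_parse_hook discussed above does not exist; the comments only suggest, as an assumption, that it would be installed the same way but handed the raw parse tree instead of the analyzed Query.

```c
/*
 * Sketch: installing post_parse_analyze_hook from an extension
 * (PostgreSQL 9.3/9.4-era signature). A hypothetical post_parse_hook
 * would presumably follow the same chaining convention but run on the
 * raw parse tree, before analyze looks up any catalog entries.
 */
#include "postgres.h"
#include "fmgr.h"
#include "parser/analyze.h"

PG_MODULE_MAGIC;

/* Save any hook installed by another extension so we can chain to it. */
static post_parse_analyze_hook_type prev_post_parse_analyze_hook = NULL;

static void
my_post_parse_analyze(ParseState *pstate, Query *query)
{
	/*
	 * By this point analyze has already run, so every relation in the
	 * query has been looked up in the catalogs -- which is why this
	 * hook comes too late for lazily creating FDW objects.
	 */
	if (prev_post_parse_analyze_hook)
		prev_post_parse_analyze_hook(pstate, query);
}

void
_PG_init(void)
{
	prev_post_parse_analyze_hook = post_parse_analyze_hook;
	post_parse_analyze_hook = my_post_parse_analyze;
}
```

Loaded via shared_preload_libraries (or LOAD), _PG_init runs once per backend and chains to any previously installed hook, which is the convention the other in-core hooks follow.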