Thread: Merged Model for libpq
Hi All,
I would like to know about the best approach to take for providing a merged model of libpq library. When I say "merged model" it means that the client and server would be running as a single process. A single client libpq application can be linked to either the client-server libpq library or merged libpq library. For more clarity here is a small flow diagram:
Client Server Model:
Application -> libpq library (cs) -> TCP/IP network -> libpq (backend) -> pgsql server
Merged Model:
Application -> libpq library (merged) -> pgsql server
One approach we have in mind is to use the SPI interface to re-implement the libpq APIs. Is there a better approach? Would it be possible to implement the client-server protocol as an API interface, without involving the TCP/IP network?
Your thoughts and suggestions on this would be highly appreciated.
Rgds,
anna
--
'Truth will come out one day, and in it all hearts will become clear.
Patience will one day become a tiger, and lies and deceit will fall prey to it.'
- Pattukkottai Kalyanasundaram
http://www.youtube.com/watch?v=0J71uLUvjnU&feature=related
On Thu, Mar 31, 2011 at 11:34 AM, Annamalai Gurusami <annamalai.gurusami@gmail.com> wrote:
> Hi All,
>
> I would like to know about the best approach to take for providing a merged
> model of libpq library. When I say "merged model" it means that the client
> and server would be running as a single process. A single client libpq
> application can be linked to either the client-server libpq library or
> merged libpq library. For more clarity here is a small flow diagram:
>
> Client Server Model:
>
> Application -> libpq library (cs) -> TCP/IP network -> libpq (backend) ->
> pgsql server
>
> Merged Model:
>
> Application -> libpq library (merged) -> pgsql server
>
> One approach that we are having in mind is to use the SPI interface and
> re-implement the libpq APIs. Is there any other better approach? Would it
> be possible to implement the client server protocol into an API interface,
> without involving the TCP/IP network?
>
> Your thoughts and suggestions on this would be highly appreciated.

One big issue with SPI that you need to be aware of is that you have no explicit transaction control. Once you are in SPI land, you are in one transaction and one transaction only -- this means all locks are held indefinitely, as well as other issues. I don't think fully server side applications are practical until we get stored procedures with explicit transaction control (and once we have them, that's all I'll ever write if given a choice).

merlin
On 03/31/11 9:34 AM, Annamalai Gurusami wrote:
> Would it be possible to implement the client server protocol into an
> API interface, without involving the TCP/IP network?

sure, done already. 'domain sockets', the default for local connections that don't expressly call for localhost
On Fri, Apr 1, 2011 at 4:47 PM, John R Pierce <pierce@hogranch.com> wrote:
> On 03/31/11 9:34 AM, Annamalai Gurusami wrote:
>>
>> Would it be possible to implement the client server protocol into an API
>> interface, without involving the TCP/IP network?
>
> sure, done already. 'domain sockets', the default for local connections
> that don't expressly call for localhost

er, yes, but that's not the whole story -- everything still has to go through the protocol, parsing, marshaling, etc. It's of course a fairly trivial problem to wrap SPI in a libpq-ish interface, but it's pointless until we have explicit transactions, imnsho.

merlin
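To make the marshaling point concrete, here is a minimal sketch (not libpq code) that frames a simple-query message the way the version 3 frontend/backend protocol lays it out (a 'Q' type byte, a 4-byte big-endian length that counts itself, then the NUL-terminated query text) and pushes it through a Unix-domain socket pair. The function names `frame_query` and `roundtrip_query` are made up for illustration, and the socket pair stands in for a real server connection. The bytes are still built, copied and parsed even though no TCP/IP is involved.

```c
#include <assert.h>
#include <arpa/inet.h>   /* htonl, ntohl */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Marshal a simple-query message: 'Q', int32 length (counts itself and the
 * NUL-terminated text, but not the type byte), then the query text.
 * Returns a malloc'd buffer and stores its total size in *out_len. */
unsigned char *frame_query(const char *sql, size_t *out_len)
{
    size_t body;
    uint32_t len;
    unsigned char *buf;

    assert(sql != NULL && out_len != NULL);
    body = strlen(sql) + 1;                 /* query text plus NUL */
    len = htonl((uint32_t)(4 + body));      /* network byte order */
    buf = malloc(1 + 4 + body);
    if (buf == NULL)
        return NULL;
    buf[0] = 'Q';
    memcpy(buf + 1, &len, 4);
    memcpy(buf + 5, sql, body);
    *out_len = 1 + 4 + body;
    return buf;
}

/* Send one framed query across a Unix-domain socket pair and unmarshal it
 * on the other end. Returns 0 on success and copies the query into out. */
int roundtrip_query(const char *sql, char *out, size_t out_size)
{
    int fds[2];
    int rc = -1;
    size_t n = 0;
    unsigned char hdr[5];
    uint32_t len;
    unsigned char *msg = frame_query(sql, &n);

    if (msg == NULL)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) {
        free(msg);
        return -1;
    }
    if (write(fds[0], msg, n) == (ssize_t)n &&
        read(fds[1], hdr, sizeof hdr) == (ssize_t)sizeof hdr &&
        hdr[0] == 'Q') {
        memcpy(&len, hdr + 1, 4);
        len = ntohl(len);                   /* back to host byte order */
        if (len > 4 && len - 4 <= out_size &&
            read(fds[1], out, len - 4) == (ssize_t)(len - 4))
            rc = 0;
    }
    close(fds[0]);
    close(fds[1]);
    free(msg);
    return rc;
}
```

Calling `roundtrip_query("SELECT 1", buf, sizeof buf)` copies the query text three times (into the frame, into the socket, out of the socket) before any SQL parsing starts, which is the overhead being discussed here.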
On 04/01/11 2:54 PM, Merlin Moncure wrote:
> On Fri, Apr 1, 2011 at 4:47 PM, John R Pierce<pierce@hogranch.com> wrote:
>> On 03/31/11 9:34 AM, Annamalai Gurusami wrote:
>>> Would it be possible to implement the client server protocol into an API
>>> interface, without involving the TCP/IP network?
>> sure, done already. 'domain sockets', the default for local connections
>> that don't expressly call for localhost
> er, yes, but that's not the whole story -- everything still has to go
> through the protocol, parsing, marshaling, etc.

How would you implement SQL without parsing, etc.? Annamalai asked specifically for an implementation of the existing client-server protocol without TCP/IP, and that's exactly what the Unix socket interface is.
On 2 April 2011 03:47, John R Pierce <pierce@hogranch.com> wrote:
> how would you implement SQL without parsing, etc? Annamali asked specifically for an implementation of the existing client-server protocol without TCP/IP, and thats exactly what the Unix socket interface is.

Maybe a little background here would help to understand our situation. We have an in-memory storage engine implemented in-house, and we have successfully ported the postgresql engine (the SQL engine) on top of our in-memory storage engine. So what we have is:

Postgres SQL Engine + Our proprietary main-memory storage engine

We have introduced our own C API to write clients. For the client-server (CS) model, we have this C API implemented on top of the libpq library. For the embedded model (EM) (client and server in the same process, as different threads), we have this C API implemented using SPI. So we have two libraries of this C API: one is the CS model and the other is the EM model. A client that uses this C API can be linked to either the CS library or the EM library based on its needs. The application program itself need not be modified.

Now, we are trying to see whether we can avoid this C API layer and instead implement libpq itself using the SPI interface. If we do this, then any libpq client can be either client-server or embedded. In this context, I am trying to explore whether, for the embedded model of libpq, using the SPI interface is the only option. Or would you recommend some other approach for client and server communication when they run in the same process?

So Unix domain sockets are not satisfactory for us. We need something with better performance because client and server are in the same process, in different threads. In this context, I was trying to find out if the client-server protocol can be implemented without involving sockets. Since the client and server are in the same process (different threads), would it be possible to implement the protocol using something like ACE_Message_Queue? If we do this, then serialization of objects would not be necessary and a lot of data copying can be avoided. We can just pass pointers from server to client (wherever appropriate). I thought that this could be an alternative to using SPI. But is this feasible? Is the client-server protocol, as implemented now, amenable to such refactoring?

It is a big story, but I thought the background would help highlight our context. Can you guys provide more information that would help us make informed decisions?

Thank you.

Rgds,
anna
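The pointer-passing idea in this message can be sketched with plain POSIX threads instead of ACE_Message_Queue. This is only an illustration under stated assumptions: `ptr_queue`, `row`, `server_thread` and `fetch_one_row` are hypothetical names, nothing here is PostgreSQL or ACE code, and the "server" simply hands over a pointer to a statically allocated row. It shows that no bytes are serialized, but also that both sides must agree on who owns the memory behind each pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define QUEUE_CAP 16

/* Bounded queue of pointers shared by a "client" and a "server" thread. */
typedef struct {
    void *items[QUEUE_CAP];
    size_t head, tail, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
} ptr_queue;

void ptr_queue_init(ptr_queue *q)
{
    assert(q != NULL);
    q->head = q->tail = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

void ptr_queue_destroy(ptr_queue *q)
{
    pthread_cond_destroy(&q->not_full);
    pthread_cond_destroy(&q->not_empty);
    pthread_mutex_destroy(&q->lock);
}

/* Enqueue a pointer; blocks while the queue is full. */
void ptr_queue_put(ptr_queue *q, void *item)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->tail] = item;
    q->tail = (q->tail + 1) % QUEUE_CAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Dequeue a pointer; blocks while the queue is empty. */
void *ptr_queue_get(ptr_queue *q)
{
    void *item;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    item = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return item;
}

/* Hypothetical result row owned by the server side. */
typedef struct {
    int id;
    const char *name;
} row;

/* "Server": hands a pointer to its row to the client, with no copying. */
static void *server_thread(void *arg)
{
    static row r = { 42, "anna" };  /* static storage outlives the thread */

    ptr_queue_put((ptr_queue *)arg, &r);
    return NULL;
}

/* "Client": starts the server thread and receives the row pointer. */
const row *fetch_one_row(void)
{
    ptr_queue q;
    pthread_t tid;
    const row *r;

    ptr_queue_init(&q);
    if (pthread_create(&tid, NULL, server_thread, &q) != 0) {
        ptr_queue_destroy(&q);
        return NULL;
    }
    r = ptr_queue_get(&q);
    pthread_join(tid, NULL);
    ptr_queue_destroy(&q);
    return r;
}
```

The row here lives in static storage only to keep the sketch safe. In a real server the pointed-to data would belong to server-managed memory, so its lifetime would have to be negotiated between the two sides.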
On 04/01/11 10:15 PM, Annamalai Gurusami wrote:
> It is a big story, but I thought the background will help highlight
> our context. Can you guys provide more information that would help us
> to make informed decisions?

what you describe is neither postgres nor SQL

perhaps you should look at a storage engine like BerkeleyDB
On 2 April 2011 11:17, John R Pierce <pierce@hogranch.com> wrote:
> what you describe is neither postgres nor SQL
>
> perhaps you should look at a storage engine like BerkeleyDB

I hope that not everybody dismisses this mail thread because of the above response. We are deriving our product from pgsql. And since we are customizing pgsql for our proprietary telecom products, we are using things in ways they were not designed for. For example, we are using SPI to come up with an embedded client. I was basically trying to find out if there are better alternatives. Has the pgsql development team thought about embedded clients, and is SPI the way to go?

What we are trying to achieve is that a single application can work as an ordinary client or an embedded client. For example, if we implement libpq using the SPI interface, then any libpq client can behave as an ordinary client (using the current libpq library) or as an embedded client (by making use of libpq over SPI, which we are implementing).

I have no clue as to why you have recommended BerkeleyDB in this context! What I have described is pgsql, and the applications all use SQL queries. If something is not clear and requires further elaboration from me, kindly let me know. Providing inputs on how to extend pgsql in a proper, well-defined way will help us contribute the feature back to pgsql (if my company decides so and if pgsql needs it). Even if the feature is not contributed back, if the pgsql dev team finds it useful, anybody can implement it.

Thank you.

Rgds,
anna
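The "same application, two libraries" goal described in this message relies on link-time substitution, which cannot be shown in a single file. As a stand-in, the sketch below uses a function-pointer table chosen at run time, which is a different technique with the same shape: application code calls one interface and does not know which implementation sits behind it. All names (`db_driver`, `cs_exec`, `em_exec`, `db_select`, `app_run_query`) are hypothetical, and the two `exec` functions are stubs that only label their input. A real CS driver would call libpq and a real EM driver would call SPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical driver interface: one entry point, two implementations. */
typedef struct db_driver {
    const char *name;
    int (*exec)(const char *sql, char *result, size_t result_size);
} db_driver;

/* Stub for the client-server path (a real one would wrap libpq calls). */
static int cs_exec(const char *sql, char *result, size_t result_size)
{
    int n = snprintf(result, result_size, "cs:%s", sql);

    return (n < 0 || (size_t)n >= result_size) ? -1 : 0;
}

/* Stub for the embedded path (a real one would run inside the server). */
static int em_exec(const char *sql, char *result, size_t result_size)
{
    int n = snprintf(result, result_size, "em:%s", sql);

    return (n < 0 || (size_t)n >= result_size) ? -1 : 0;
}

static const db_driver cs_driver = { "cs", cs_exec };
static const db_driver em_driver = { "em", em_exec };

/* Pick a driver by mode name; returns NULL for an unknown mode. */
const db_driver *db_select(const char *mode)
{
    assert(mode != NULL);
    if (strcmp(mode, cs_driver.name) == 0)
        return &cs_driver;
    if (strcmp(mode, em_driver.name) == 0)
        return &em_driver;
    return NULL;
}

/* Application code: identical for both models, as the message requires. */
int app_run_query(const db_driver *drv, const char *sql,
                  char *result, size_t result_size)
{
    if (drv == NULL || sql == NULL || result == NULL)
        return -1;
    return drv->exec(sql, result, result_size);
}
```

With this layout, `app_run_query(db_select("cs"), ...)` and `app_run_query(db_select("em"), ...)` run the same application code against either model. What the sketch cannot show is the difference in transaction control and memory ownership between the two paths.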
On 04/04/11 12:43, Annamalai Gurusami wrote:
> What we are trying to achieve is that a single application can work as
> an ordinary client or an embedded client.

That makes a lot of sense, and would be useful for testing too.

> I have no clue as to why you have recommended BerkeleyDB in this
> context! What I have described is pgsql and the applications all use
> SQL queries.

Yeah... I'd think that FireBird, SQLite or embedded MySQL would make a lot more sense than BDB. Personally, I suspect that anybody who suggests Berkeley DB for a job hasn't programmed with it!

I can personally see some advantages in being able to use the same API for in-database and outside-database clients. The biggest issue, though, is transaction management. Until/unless Pg gains support for autonomous transactions, there are operations that can be performed in libpq that just don't make sense in an SPI context.

--
Craig Ringer
On Sun, Apr 3, 2011 at 11:43 PM, Annamalai Gurusami <annamalai.gurusami@gmail.com> wrote:
> On 2 April 2011 11:17, John R Pierce <pierce@hogranch.com> wrote:
>
>> what you describe is neither postgres nor SQL
>>
>> perhaps you should look at a storage engine like BerkeleyDB
>
> I hope that not everybody dismisses this mail thread because of the
> above response. We are deriving our product from pgsql. And since we
> are customizing pgsql to our proprietary telecom products, we are
> using things that are not designed for that purpose. For example, we
> are using SPI to come up with an embedded client. I was basically
> trying to find out if there are better alternatives. Have the pgsql
> development team thought about embedded clients and is SPI the way to
> go?
>
> What we are trying to achieve is that a single application can work as
> an ordinary client or an embedded client. For example, if we
> implement libpq using SPI interface then any libpq client can behave
> as an ordinary client (using current libpq library) or as an embedded
> client (by making use of libpq over SPI - which we are implementing).
>
> I have no clue as to why you have recommended BerkeleyDB in this
> context! What I have described is pgsql and the applications all use
> SQL queries. If somethings are not clear and requires further
> elaboration from me, kindly let me know. Providing inputs to extend
> pgsql in a proper well-defined way will help us to contribute back the
> feature to pgsql (if my company decides so and if pgsql needs it.)
> Even if the feature is not contributed back, if the pgsql dev team
> finds it a useful feature, anybody can implement it.

I'm not sure you grasped the ramification of my message upthread. There is a lot of use for libpq (or libpq-ish) api in the backend to execute queries. Unfortunately, that api can not wrap the SPI interface as it exists today. The SPI interface is for writing backend functions, not application code. Those functions *must* be called from the application layer, and *must* terminate within a reasonable amount of time (think seconds).

I think you are looking in the wrong place -- if you want to embed a libpq api in the backend, perhaps you might want to look at wrapping the backend in standalone mode. This has issues that will prevent general use in an application, but it's a start, and should give you an idea of what you are up against.

A more involved project would be to look at modifying the postgresql internals so that you could usefully embed code and run it with explicit transaction control. This is a pretty big task and would likely end up as a complete stored procedure implementation. If done though, you could run in a more or less clientless way.

PostgreSQL today can not usefully operate without participation from a client (although that client can be quite thin if you want it to be). Having 100% of your application in SPI layer is *not* going to work.

merlin
Annamalai Gurusami <annamalai.gurusami@gmail.com> writes:
> On 2 April 2011 11:17, John R Pierce <pierce@hogranch.com> wrote:
>> what you describe is neither postgres nor SQL
>> perhaps you should look at a storage engine like BerkeleyDB

> I hope that not everybody dismisses this mail thread because of the
> above response. We are deriving our product from pgsql. And since we
> are customizing pgsql to our proprietary telecom products, we are
> using things that are not designed for that purpose. For example, we
> are using SPI to come up with an embedded client. I was basically
> trying to find out if there are better alternatives. Have the pgsql
> development team thought about embedded clients and is SPI the way to
> go?

I don't think you've entirely grasped the seriousness of that response. The PG development team *has* thought about embedded scenarios, and explicitly rejected them. There is no interest at all here in that line of development, and we are unlikely to even consider patches that might make it easier. We don't like the reliability implications of having random client code in the same address space as the database code. Moreover, the general trend of recent development has been towards making the database more, not less, dependent on auxiliary processes such as autovacuum and bgwriter. There's no way to manage that in an embedded scenario ... at least not without resorting to threads, which is another thing that we are unprepared to support.

So really you should be looking at some other DBMS if you want an embedded implementation. It'd be nice if PG could be all things to all people, but it can't; and this is one of the things it can't be.

regards, tom lane
On Mon, Apr 4, 2011 at 9:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Annamalai Gurusami <annamalai.gurusami@gmail.com> writes:
>> On 2 April 2011 11:17, John R Pierce <pierce@hogranch.com> wrote:
>>> what you describe is neither postgres nor SQL
>>> perhaps you should look at a storage engine like BerkeleyDB
>
>> I hope that not everybody dismisses this mail thread because of the
>> above response. We are deriving our product from pgsql. And since we
>> are customizing pgsql to our proprietary telecom products, we are
>> using things that are not designed for that purpose. For example, we
>> are using SPI to come up with an embedded client. I was basically
>> trying to find out if there are better alternatives. Have the pgsql
>> development team thought about embedded clients and is SPI the way to
>> go?
>
> I don't think you've entirely grasped the seriousness of that response.
> The PG development team *has* thought about embedded scenarios, and
> explicitly rejected them. There is no interest at all here in that line
> of development, and we are unlikely to even consider patches that might
> make it easier. We don't like the reliability implications of having
> random client code in the same address space as the database code.
> Moreover, the general trend of recent development has been towards
> making the database more, not less, dependent on auxiliary processes
> such as autovacuum and bgwriter. There's no way to manage that in an
> embedded scenario ... at least not without resorting to threads, which
> is another thing that we are unprepared to support.
>
> So really you should be looking at some other DBMS if you want an
> embedded implementation. It'd be nice if PG could be all things to all
> people, but it can't; and this is one of the things it can't be.

That's a perhaps overly strong statement. First of all, we already support user provided code (in C no less) in the database. It is raw and problematic for most people but it's also pretty cool.

True embedding where the user application is in direct control of the process is of course not practical, but that doesn't mean a tighter coupling of user code and the database is not possible. Stored procedures (I know I'm way into broken record mode on this) would likely cover what Annamalai is looking to do IMNSHO, even if they were limited to a high level language like plpgsql, since you could still dip into C appropriately using classic methods. Getting there isn't easy, of course.

In the current state of affairs you can kinda sorta emulate this by having a client side 'ticker' that dials in every period of time and executes a control function which kicks off your server side logic and maintains this state. That way the bulk of your code and data manipulation is database side and you more or less completely bypass the overhead of streaming information through the protocol to the client unless you want to pay that cost. There are a lot of reasons not to do this...it's a 'wrong tool' type of thing, but people want to do it and it's interesting to think about what doors you could open if it could be taken further.

merlin
Merlin Moncure <mmoncure@gmail.com> writes:
> On Mon, Apr 4, 2011 at 9:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So really you should be looking at some other DBMS if you want an
>> embedded implementation. It'd be nice if PG could be all things to all
>> people, but it can't; and this is one of the things it can't be.

> That's a perhaps overly strong statement. First of all, we already
> support user provided code (in C no less) in the database. It is raw
> and problematic for most people but it's also pretty cool.

Sure. The difference there is that it's understood by all parties that user-supplied server-side C code has to conform to the expectations and coding practices of the server. The server code is in charge, not the user-supplied code. Generally people who ask for an embedded database expect the opposite. Certainly people who are accustomed to coding against the libpq API expect that they are in charge, not libpq. This is not only a matter of "who calls whom" but who controls memory management, error recovery practices, etc.

> True embedding where the user application is in direct control of the
> process is of course not practical, but that doesn't mean a tighter
> coupling of user code and the database is not possible. Stored
> procedures (I know I'm way into broken record mode on this) would
> likely cover what Annamalai is looking to do IMSNO, even if they were
> limited to a high level language like plpgsql, since you could still
> dip into C appropriately using classic methods.

Possibly. Annamalai's stated goal of driving a locally-implemented database through a libpq-ish API, and having that be interchangeable with a traditional client setup, doesn't seem to fit into this viewpoint though. I guess to get much further we'd have to ask why is that the goal and what's the wider purpose?

regards, tom lane