Thread: Merged Model for libpq

Merged Model for libpq

From

Annamalai Gurusami

Date:

31 March 2011, 13:35:06

Hi All,

I would like to know the best approach for providing a merged model of the libpq library. By "merged model" I mean that the client and server would run as a single process. A single libpq client application could then be linked against either the client-server libpq library or the merged libpq library. For more clarity, here is a small flow diagram:

Client Server Model:

Application -> libpq library (cs) -> TCP/IP network -> libpq (backend) -> pgsql server

Merged Model:

Application -> libpq library (merged) -> pgsql server

One approach we have in mind is to use the SPI interface and re-implement the libpq APIs on top of it. Is there a better approach? Would it be possible to implement the client-server protocol as an API, without involving the TCP/IP network?

Your thoughts and suggestions on this would be highly appreciated.

Rgds,
anna

--
'Truth will come out one day, and then all hearts will become clear.
Patience will one day become a tiger, and lies and deceit will fall prey to it.'
- Pattukkottai Kalyanasundaram
http://www.youtube.com/watch?v=0J71uLUvjnU&feature=related

Re: Merged Model for libpq

From

Merlin Moncure

Date:

01 April 2011, 18:33:27

On Thu, Mar 31, 2011 at 11:34 AM, Annamalai Gurusami
<annamalai.gurusami@gmail.com> wrote:
> Hi All,
>
> I would like to know about the best approach to take for providing a merged
> model of libpq library.  When I say "merged model" it means that the client
> and server would be running as a single process.  A single client libpq
> application can be linked to either the client-server libpq library or
> merged libpq library.  For more clarity here is a small flow diagram:
>
> Client Server Model:
>
> Application -> libpq library (cs) -> TCP/IP network -> libpq (backend) ->
> pgsql server
>
> Merged Model:
>
> Application -> libpq library (merged) -> pgsql server
>
> One approach that we are having in mind is to use the SPI interface and
> re-implement the libpq APIs.  Is there any other better approach?   Would it
> be possible to implement the client server protocol into an API interface,
> without involving the TCP/IP network?
>
> Your thoughts and suggestions on this would be highly appreciated.

One big issue with SPI that you need to be aware of is that you have no
explicit transaction control. Once you are in SPI land, you are in one
transaction and one transaction only -- which means all locks are held
indefinitely, among other issues.

I don't think fully server-side applications are practical until we
get stored procedures with explicit transaction control (and once we
have them, that's all I'll ever write if given a choice).

merlin

Re: Merged Model for libpq

From

John R Pierce

Date:

01 April 2011, 18:47:51

On 03/31/11 9:34 AM, Annamalai Gurusami wrote:
> Would it be possible to implement the client server protocol into an
> API interface, without involving the TCP/IP network?

sure, done already.  'domain sockets', the default for local connections
that don't expressly call for localhost
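A minimal sketch of the point above, assuming a POSIX system: `socketpair(AF_UNIX, SOCK_STREAM, 0, ...)` yields two connected endpoints that never touch the TCP/IP stack. The helper name `unix_roundtrip` is hypothetical. A real libpq client reaches the server's Unix socket by giving a directory path as `host` (or leaving `host` unset), not by calling anything like this.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes msg into one end of an AF_UNIX socket pair and reads it back
 * from the other end into buf (NUL-terminated). Returns the number of
 * bytes received, or -1 on error. No TCP/IP is involved. */
ssize_t unix_roundtrip(const char *msg, char *buf, size_t bufsize)
{
    int fds[2];
    if (bufsize == 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    size_t len = strlen(msg);
    size_t got = 0;
    ssize_t result = -1;

    /* The same thread writes and then reads, so msg must fit in the
     * socket buffer or the write would block. */
    if (write(fds[0], msg, len) == (ssize_t) len) {
        /* A stream read may return fewer bytes than were written; loop. */
        while (got < len && got < bufsize - 1) {
            ssize_t r = read(fds[1], buf + got, bufsize - 1 - got);
            if (r <= 0)
                break;
            got += (size_t) r;
        }
        buf[got] = '\0';
        result = (ssize_t) got;
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Round-tripping `"SELECT 1"` returns 8 bytes. The data still passes through kernel socket buffers, which is the copying overhead discussed later in the thread.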

Re: Merged Model for libpq

From

Merlin Moncure

Date:

01 April 2011, 18:54:55

On Fri, Apr 1, 2011 at 4:47 PM, John R Pierce <pierce@hogranch.com> wrote:
> On 03/31/11 9:34 AM, Annamalai Gurusami wrote:
>>
>> Would it be possible to implement the client server protocol into an API
>> interface, without involving the TCP/IP network?
>
> sure, done already.  'domain sockets', the default for local connections
> that don't expressly call for localhost

er, yes, but that's not the whole story -- everything still has to go
through the protocol, parsing, marshaling, etc.
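To make the marshaling cost concrete, here is a sketch of framing one frontend/backend protocol (version 3) Query message: a type byte `'Q'`, an Int32 length in network byte order that counts itself plus the NUL-terminated query text (but not the type byte), then the text. Only this single message type is covered, and `encode_query`/`decode_query` are hypothetical names, not libpq functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>   /* htonl, ntohl */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Encodes sql as a Query message. Stores the total size in *out_len.
 * The caller frees the returned buffer; NULL on allocation failure. */
unsigned char *encode_query(const char *sql, size_t *out_len)
{
    size_t body = strlen(sql) + 1;             /* include terminating NUL */
    if (body > UINT32_MAX - 5)
        return NULL;
    uint32_t len_field = (uint32_t) (4 + body); /* length counts itself */
    unsigned char *msg = malloc((size_t) len_field + 1);
    if (msg == NULL)
        return NULL;

    uint32_t be = htonl(len_field);            /* network byte order */
    msg[0] = 'Q';
    memcpy(msg + 1, &be, 4);
    memcpy(msg + 5, sql, body);
    *out_len = (size_t) len_field + 1;
    return msg;
}

/* Validates the framing and returns a pointer to the query text inside
 * msg (no copy), or NULL if the message is malformed. */
const char *decode_query(const unsigned char *msg, size_t len)
{
    uint32_t be;
    if (len < 6 || msg[0] != 'Q')              /* 'Q' + Int32 + "" */
        return NULL;
    memcpy(&be, msg + 1, 4);
    if (ntohl(be) != len - 1 || msg[len - 1] != '\0')
        return NULL;
    return (const char *) (msg + 5);
}
```

For `"SELECT 1"` the length field is 13 and the whole message is 14 bytes. Results come back as similarly framed messages, so each exchange pays an encode/decode step even over a Unix domain socket.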

it's of course a fairly trivial problem to wrap SPI in a libpq-ish
interface, but it's pointless until we have explicit transactions,
imnsho.

merlin

Re: Merged Model for libpq

From

John R Pierce

Date:

01 April 2011, 19:17:34

On 04/01/11 2:54 PM, Merlin Moncure wrote:
> On Fri, Apr 1, 2011 at 4:47 PM, John R Pierce<pierce@hogranch.com>  wrote:
>> On 03/31/11 9:34 AM, Annamalai Gurusami wrote:
>>> Would it be possible to implement the client server protocol into an API
>>> interface, without involving the TCP/IP network?
>> sure, done already.  'domain sockets', the default for local connections
>> that don't expressly call for localhost
> er, yes, but that's not the whole story -- everything still has to go
> through the protocol, parsing, marshaling, etc.

how would you implement SQL without parsing, etc.? Annamalai asked
specifically for an implementation of the existing client-server
protocol without TCP/IP, and that's exactly what the Unix socket
interface is.

Re: Merged Model for libpq

From

Annamalai Gurusami

Date:

02 April 2011, 02:15:46

On 2 April 2011 03:47, John R Pierce <pierce@hogranch.com> wrote:

> how would you implement SQL without parsing, etc? Annamali asked specifically for an implementation of the
> existing client-server protocol without TCP/IP, and thats exactly what the Unix socket interface is.
>

Maybe a little background here would help you understand our situation.
We have an in-memory storage engine implemented in-house, and we have
successfully ported the PostgreSQL engine (the SQL engine) on top of our
in-memory storage engine. So what we have is:

Postgres SQL Engine + Our proprietary main-memory storage engine

We have introduced our own C API for writing clients. For the client-server
(CS) model, we have this C API implemented on top of the libpq library.
For the embedded model (EM) (client and server in the same process, as
different threads), we have this C API implemented using SPI. So we
have two libraries for this C API, one for the CS model and the other for
the EM model. A client that uses this C API can be linked against either
the CS library or the EM library, depending on its needs. The application
program itself need not be modified.
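The arrangement described above (one C API, two interchangeable implementations) can be sketched with a table of function pointers. All names here (`db_ops`, `cs_backend`, `em_backend`, `run_app`) are hypothetical, and both backends are stubs that only tag the query. The message describes choosing the implementation at link time; this sketch approximates that with runtime dispatch so both fit in one file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical backend-neutral API: application code sees only this. */
struct db_ops {
    const char *name;
    int (*exec)(const char *sql, char *out, size_t outsize);
};

/* Shared helper for the stubs: writes "<tag>:<sql>" into out.
 * Returns 0 on success, -1 on error or truncation. */
static int tag_result(const char *tag, const char *sql,
                      char *out, size_t outsize)
{
    int n = snprintf(out, outsize, "%s:%s", tag, sql);
    return (n >= 0 && (size_t) n < outsize) ? 0 : -1;
}

/* Stub client-server backend; a real one would forward sql via libpq. */
static int cs_exec(const char *sql, char *out, size_t outsize)
{
    return tag_result("cs", sql, out, outsize);
}

/* Stub embedded backend; a real one would run sql through SPI. */
static int em_exec(const char *sql, char *out, size_t outsize)
{
    return tag_result("em", sql, out, outsize);
}

const struct db_ops cs_backend = { "client-server", cs_exec };
const struct db_ops em_backend = { "embedded", em_exec };

/* Application logic: unchanged whichever backend it is handed. */
int run_app(const struct db_ops *ops, char *out, size_t outsize)
{
    return ops->exec("SELECT 1", out, outsize);
}
```

With true link-time selection, each library would instead export the same function names, and the application would simply be linked against one or the other.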

Now, we are trying to see whether we can avoid this C API layer and
instead implement libpq itself using the SPI interface. If we do
this, then any libpq client can be either client-server or embedded.
In this context, I am trying to explore whether, for the embedded model
of libpq, using the SPI interface is the only option. Or would you
recommend some other approach for client-server communication when
they run in the same process?

So Unix domain sockets are not satisfactory for us. We need something
with better performance, because the client and server are in the same
process, in different threads. In this context, I was trying to find out
whether the client-server protocol can be implemented without involving
sockets. Since the client and server are in the same process (in
different threads), would it be possible to implement the protocol using
something like ACE_Message_Queue? If we did this, serialization of
objects would not be necessary and a lot of data copying could be avoided.
We could just pass pointers from server to client (wherever
appropriate). I thought that this could be an alternative to using
SPI. But is this feasible? Is the client-server protocol, as
implemented now, amenable to such refactoring?
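For the in-process alternative raised here, a minimal sketch of a thread-safe pointer queue using POSIX threads follows. It is a hand-rolled stand-in for a message queue such as ACE_Message_Queue, not ACE's API, and `ptr_queue`, `ptrq_*`, `struct result` and `server_thread` are hypothetical names. The consumer receives the very pointer the producer enqueued, so nothing is serialized or copied. It says nothing about whether the PostgreSQL backend can run safely in such a thread, which Tom Lane's reply later in this thread disputes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* One queued item: just a pointer, never a copy of the pointee. */
struct node {
    void *item;
    struct node *next;
};

/* Unbounded FIFO of pointers guarded by a mutex and condition variable. */
struct ptr_queue {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    struct node *head, *tail;
};

void ptrq_init(struct ptr_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->head = q->tail = NULL;
}

/* Appends item; returns 0 on success, -1 if node allocation fails. */
int ptrq_put(struct ptr_queue *q, void *item)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->item = item;
    n->next = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->tail != NULL)
        q->tail->next = n;
    else
        q->head = n;
    q->tail = n;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Removes and returns the oldest item, blocking while the queue is empty. */
void *ptrq_get(struct ptr_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->head == NULL)
        pthread_cond_wait(&q->nonempty, &q->lock);
    struct node *n = q->head;
    q->head = n->next;
    if (q->head == NULL)
        q->tail = NULL;
    pthread_mutex_unlock(&q->lock);

    void *item = n->item;
    free(n);
    return item;
}

/* Hypothetical result object produced by the "server" thread. */
struct result {
    int rows;
    const char *first_value;
};

static struct result shared_result = { 1, "42" };

/* "Server" thread body: hands the client a pointer, not a serialized copy. */
void *server_thread(void *arg)
{
    ptrq_put(arg, &shared_result);
    return NULL;
}
```

Passing a pointer also passes responsibility for the object: the producer must not free or reuse it until the consumer is finished, which is exactly the kind of memory-management question a shared-address-space design has to settle.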

It is a big story, but I thought the background will help highlight
our context. Can you guys provide more information that would help us
to make informed decisions?

Thank you.

Rgds,
anna


Re: Merged Model for libpq

From

John R Pierce

Date:

02 April 2011, 02:47:51

On 04/01/11 10:15 PM, Annamalai Gurusami wrote:
> It is a big story, but I thought the background will help highlight
> our context.  Can you guys provide more information that would help us
> to make informed decisions?

what you describe is neither postgres nor SQL

perhaps you should look at a storage engine like BerkeleyDB

Re: Merged Model for libpq

From

Annamalai Gurusami

Date:

04 April 2011, 01:43:10

On 2 April 2011 11:17, John R Pierce <pierce@hogranch.com> wrote:

> what you describe is neither postgres nor SQL
>
> perhaps you should look at a storage engine like BerkeleyDB

I hope that not everybody dismisses this mail thread because of the
above response. We are deriving our product from pgsql. And since we
are customizing pgsql for our proprietary telecom products, we are
using things that were not designed for that purpose. For example, we
are using SPI to build an embedded client. I was basically
trying to find out if there are better alternatives. Has the pgsql
development team thought about embedded clients, and is SPI the way to
go?

What we are trying to achieve is that a single application can work as
an ordinary client or an embedded client. For example, if we
implement libpq using SPI interface then any libpq client can behave
as an ordinary client (using current libpq library) or as an embedded
client (by making use of libpq over SPI - which we are implementing).

I have no clue why you have recommended BerkeleyDB in this
context! What I have described is pgsql, and the applications all use
SQL queries. If something is not clear and requires further
elaboration from me, kindly let me know. Input on extending
pgsql in a proper, well-defined way will help us contribute the
feature back to pgsql (if my company decides so and if pgsql needs it).
Even if the feature is not contributed back, if the pgsql dev team
finds it a useful feature, anybody can implement it.

Thank you.

Rgds,
anna

Re: Merged Model for libpq

From

Craig Ringer

Date:

04 April 2011, 02:18:34

On 04/04/11 12:43, Annamalai Gurusami wrote:

> What we are trying to achieve is that a single application can work as
> an ordinary client or an embedded client.

That makes a lot of sense, and would be useful for testing too.

> I have no clue as to why you have recommended BerkeleyDB in this
> context!   What I have described is pgsql and the applications all use
> SQL queries.

Yeah... I'd think that FireBird, SQLite or embedded MySQL would make a
lot more sense than BDB. Personally, I suspect that anybody who suggests
Berkeley DB for a job hasn't programmed with it!

I can personally see some advantages in being able to use the same API
for in-database and outside-database clients. The biggest issue, though,
is transaction management. Until/unless Pg gains support for autonomous
transactions, there are operations that can be performed in libpq that
just don't make sense in an SPI context.
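One concrete consequence for a libpq-over-SPI shim: as documented for `SPI_execute`, SPI refuses transaction-control commands and reports `SPI_ERROR_TRANSACTION`, so statements such as `BEGIN` or `COMMIT` that a libpq client expects to work would have to fail. The sketch below shows a deliberately naive first-word check a shim might use to reject them up front. `is_txn_control` is a hypothetical name, and in practice mapping SPI's own error code to a libpq-style error is likely more robust than re-parsing SQL.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>   /* strncasecmp */

/* Returns true if sql begins with a transaction-control keyword.
 * Deliberately naive: it checks only the first word, so it ignores
 * leading comments and misses PREPARE TRANSACTION (PREPARE alone also
 * names ordinary prepared statements). */
bool is_txn_control(const char *sql)
{
    static const char *const keywords[] = {
        "BEGIN", "START", "COMMIT", "END",
        "ROLLBACK", "ABORT", "SAVEPOINT", "RELEASE"
    };

    while (isspace((unsigned char) *sql))
        sql++;

    size_t len = 0;                      /* length of the first word */
    while (isalpha((unsigned char) sql[len]))
        len++;

    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strlen(keywords[i]) == len &&
            strncasecmp(sql, keywords[i], len) == 0)
            return true;
    return false;
}
```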

--
Craig Ringer

Re: Merged Model for libpq

From

Merlin Moncure

Date:

04 April 2011, 11:09:12

On Sun, Apr 3, 2011 at 11:43 PM, Annamalai Gurusami
<annamalai.gurusami@gmail.com> wrote:
> On 2 April 2011 11:17, John R Pierce <pierce@hogranch.com> wrote:
>
>> what you describe is neither postgres nor SQL
>>
>> perhaps you should look at a storage engine like BerkeleyDB
>
> I hope that not everybody dismisses this mail thread because of the
> above response.  We are deriving our product from pgsql.  And since we
> are customizing pgsql to our proprietary telecom products, we are
> using things that are not designed for that purpose.  For example, we
> are using SPI to come up with an embedded client.  I was basically
> trying to find out if there are better alternatives.  Have the pgsql
> development team thought about embedded clients and is SPI the way to
> go?
>
> What we are trying to achieve is that a single application can work as
> an ordinary client or an embedded client.  For example, if we
> implement libpq using SPI interface then any libpq client can behave
> as an ordinary client (using current libpq library) or as an embedded
> client (by making use of libpq over SPI - which we are implementing).
>
> I have no clue as to why you have recommended BerkeleyDB in this
> context!  What I have described is pgsql and the applications all use
> SQL queries.  If somethings are not clear and requires further
> elaboration from me, kindly let me know.  Providing inputs to extend
> pgsql in a proper well-defined way will help us to contribute back the
> feature to pgsql (if my company decides so and if pgsql needs it.)
> Even if the feature is not contributed back, if the pgsql dev team
> finds it a useful feature, anybody can implement it.

I'm not sure you grasped the ramifications of my message upthread.
There is a lot of use for a libpq (or libpq-ish) API in the backend to
execute queries.  Unfortunately, that API cannot wrap the SPI
interface as it exists today.  The SPI interface is for writing
backend functions, not application code.  Those functions *must* be
called from the application layer, and *must* terminate within a
reasonable amount of time (think seconds).  I think you are looking in
the wrong place -- if you want to embed a libpq API in the backend,
perhaps you should look at wrapping the backend in standalone
mode.  This has issues that will prevent general use in an
application, but it's a start, and it should give you an idea of what
you are up against.

A more involved project would be to look at modifying the postgresql
internals so that you could usefully embed code and run it with
explicit transaction control.  This is a pretty big task and would
likely end up as a complete stored procedure implementation.  If done
though, you could run in a more or less clientless way.

PostgreSQL today cannot usefully operate without participation from a
client (although that client can be quite thin if you want it to be).
Having 100% of your application in the SPI layer is *not* going to work.

merlin

Re: Merged Model for libpq

From

Tom Lane

Date:

04 April 2011, 11:31:35

Annamalai Gurusami <annamalai.gurusami@gmail.com> writes:
> On 2 April 2011 11:17, John R Pierce <pierce@hogranch.com> wrote:
>> what you describe is neither postgres nor SQL
>> perhaps you should look at a storage engine like BerkeleyDB

> I hope that not everybody dismisses this mail thread because of the
> above response.  We are deriving our product from pgsql.  And since we
> are customizing pgsql to our proprietary telecom products, we are
> using things that are not designed for that purpose.  For example, we
> are using SPI to come up with an embedded client.  I was basically
> trying to find out if there are better alternatives.  Have the pgsql
> development team thought about embedded clients and is SPI the way to
> go?

I don't think you've entirely grasped the seriousness of that response.
The PG development team *has* thought about embedded scenarios, and
explicitly rejected them.  There is no interest at all here in that line
of development, and we are unlikely to even consider patches that might
make it easier.  We don't like the reliability implications of having
random client code in the same address space as the database code.
Moreover, the general trend of recent development has been towards
making the database more, not less, dependent on auxiliary processes
such as autovacuum and bgwriter.  There's no way to manage that in an
embedded scenario ... at least not without resorting to threads, which
is another thing that we are unprepared to support.

So really you should be looking at some other DBMS if you want an
embedded implementation.  It'd be nice if PG could be all things to all
people, but it can't; and this is one of the things it can't be.

            regards, tom lane

Re: Merged Model for libpq

From

Merlin Moncure

Date:

04 April 2011, 12:52:34

On Mon, Apr 4, 2011 at 9:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Annamalai Gurusami <annamalai.gurusami@gmail.com> writes:
>> On 2 April 2011 11:17, John R Pierce <pierce@hogranch.com> wrote:
>>> what you describe is neither postgres nor SQL
>>> perhaps you should look at a storage engine like BerkeleyDB
>
>> I hope that not everybody dismisses this mail thread because of the
>> above response.  We are deriving our product from pgsql.  And since we
>> are customizing pgsql to our proprietary telecom products, we are
>> using things that are not designed for that purpose.  For example, we
>> are using SPI to come up with an embedded client.  I was basically
>> trying to find out if there are better alternatives.  Have the pgsql
>> development team thought about embedded clients and is SPI the way to
>> go?
>
> I don't think you've entirely grasped the seriousness of that response.
> The PG development team *has* thought about embedded scenarios, and
> explicitly rejected them.  There is no interest at all here in that line
> of development, and we are unlikely to even consider patches that might
> make it easier.  We don't like the reliability implications of having
> random client code in the same address space as the database code.
> Moreover, the general trend of recent development has been towards
> making the database more, not less, dependent on auxiliary processes
> such as autovacuum and bgwriter.  There's no way to manage that in an
> embedded scenario ... at least not without resorting to threads, which
> is another thing that we are unprepared to support.
>
> So really you should be looking at some other DBMS if you want an
> embedded implementation.  It'd be nice if PG could be all things to all
> people, but it can't; and this is one of the things it can't be.

That's a perhaps overly strong statement.  First of all, we already
support user provided code (in C no less) in the database.  It is raw
and problematic for most people but it's also pretty cool.

True embedding where the user application is in direct control of the
process is of course not practical, but that doesn't mean a tighter
coupling of user code and the database is not possible.  Stored
procedures (I know I'm way into broken record mode on this) would
likely cover what Annamalai is looking to do, IMNSHO, even if they were
limited to a high level language like plpgsql, since you could still
dip into C appropriately using classic methods.  Getting there isn't
easy, of course.

In the current state of affairs you can kinda sorta emulate this by
having a client side 'ticker' that dials in every period of time and
executes a control function which kicks off your server side logic and
maintains this state.  That way the bulk of your code and data
manipulation is database side and you more or less completely bypass
the overhead of streaming information through the protocol to the
client unless you want to pay that cost.  There are a lot of reasons
not to do this...it's a 'wrong tool' type of thing, but people want to
do it and it's interesting to think about what doors you could open if
it could be taken further.
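The "ticker" workaround can be sketched as a client-side loop that invokes a control routine at a fixed interval. The callback is abstracted so the loop runs without libpq; in the pattern described it would issue something like `PQexec(conn, "SELECT my_control_fn()")`, where `my_control_fn` is a hypothetical server-side function. `run_ticker` and `count_calls` are likewise hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Called on every tick; returns nonzero to stop the ticker. A real
 * callback would run the server-side control function via libpq. */
typedef int (*tick_fn)(void *ctx);

/* Calls fn every interval_ms milliseconds, at most max_ticks times,
 * stopping early if fn asks to. Returns the number of ticks run. */
int run_ticker(tick_fn fn, void *ctx, long interval_ms, int max_ticks)
{
    struct timespec ts = {
        .tv_sec = interval_ms / 1000,
        .tv_nsec = (interval_ms % 1000) * 1000000L
    };
    int ticks = 0;
    while (ticks < max_ticks) {
        ticks++;
        if (fn(ctx) != 0)
            break;
        if (ticks < max_ticks)
            nanosleep(&ts, NULL);   /* wait before the next dial-in */
    }
    return ticks;
}

/* Example callback: counts invocations and asks to stop after three. */
int count_calls(void *ctx)
{
    int *n = ctx;
    return ++*n >= 3;
}
```

Each tick is a separate, short statement, so under autocommit every call of the control function runs in its own transaction, which keeps the pattern within the one-call, one-transaction limit of SPI described earlier in the thread.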

merlin

Re: Merged Model for libpq

From

Tom Lane

Date:

04 April 2011, 13:19:59

Merlin Moncure <mmoncure@gmail.com> writes:
> On Mon, Apr 4, 2011 at 9:31 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So really you should be looking at some other DBMS if you want an
>> embedded implementation. It'd be nice if PG could be all things to all
>> people, but it can't; and this is one of the things it can't be.

> That's a perhaps overly strong statement.  First of all, we already
> support user provided code (in C no less) in the database.  It is raw
> and problematic for most people but it's also pretty cool.

Sure.  The difference there is that it's understood by all parties that
user-supplied server-side C code has to conform to the expectations and
coding practices of the server.  The server code is in charge, not the
user-supplied code.  Generally people who ask for an embedded database
expect the opposite.  Certainly people who are accustomed to coding
against the libpq API expect that they are in charge, not libpq.  This
is not only a matter of "who calls whom" but who controls memory
management, error recovery practices, etc.

> True embedding where the user application is in direct control of the
> process is of course not practical, but that doesn't mean a tighter
> coupling of user code and the database is not possible.  Stored
> procedures (I know I'm way into broken record mode on this) would
> likely cover what Annamalai is looking to do, IMNSHO, even if they were
> limited to a high level language like plpgsql, since you could still
> dip into C appropriately using classic methods.

Possibly.  Annamalai's stated goal of driving a locally-implemented
database through a libpq-ish API, and having that be interchangeable
with a traditional client setup, doesn't seem to fit into this viewpoint
though.  I guess to get much further we'd have to ask why is that the
goal and what's the wider purpose?

            regards, tom lane