Re: PostgresDataSource Question - Mailing list pgsql-jdbc

From	Kovács Péter
Subject	Re: PostgresDataSource Question
Date	January 3, 2002 10:41:47
Msg-id	8A2DDD7ED7876A4698F6FF204F62CBFC01EC456D@budg112a.sysdata.siemens.hu
In response to	PostgresDataSource Question (Ned Wolpert <wolpert@yahoo.com>)
Responses	Re: PostgresDataSource Question Re: PostgresDataSource Question
List	pgsql-jdbc
Ned,

I am afraid that (with my effective contribution :-)) things have got mixed
up a bit here, so I will try to sort them out.

First of all: the XADataSourceImpl does some kind of pooling, but it is
probably not intended to implement the functionality you want. I will
explain what this sort of pooling does, but before doing so I'd like to make
a statement that has not yet been made: support for connection pooling
and support for distributed transactions can be implemented separately. So
you do not need to bother with the xa package at all if you want to support
connection pooling, or even if you want to _implement_ connection pooling
(in my view there is a difference between supporting connection pooling and
implementing it). The reason why I was kind of pushing the xa package is
that people on this list keep talking about the incompleteness of
Exoffice's xa package -- and they are right from their perspective. But if
you look at it in another way, this is good stuff -- and it is clearly my
fault that I did not fully explain my perspective.

I think that first I should explain my view. What I really need is a way to
handle transactions in an environment where there are only local
transactions. Furthermore, I would like to do it so that the transaction
handling mechanism is abstract enough for me to be able to replace the
database server at will. I want it to be abstract enough to allow me to
use not only JDBC but also other interfaces (OODBMSs have completely
different client interfaces). In a world where you have only JDBC with
local transactions, you can happily pass around Connection instances, each
representing exactly one local transaction. But if you want your code to be
more generic, you will want to have something more abstract.

What I said above may be obvious, but a brief explanation may be helpful. I
am using a simple model here: I partition my application into two layers, a
business logic layer (BLL) and a data access layer (DAL). When I was talking
about "more generic code" in the previous paragraph I meant the BLL. Let me
further explain what I mean. With any real-life application I will want to
start and end transactions in the BLL. [I could implement interfaces in
the DAL that each span a whole transaction (wherever atomicity is needed),
thus obviating the need for the BLL to do transaction demarcation. But in
that case part of the business logic would inevitably have to be implemented
in the DAL, which is not desirable since it reduces modularity, with all the
disadvantages that I will not detail here.] My point is to use an abstract
notion/type/concept for transaction demarcation in the BLL, so that if I
want to replace an OODBMS (for example Versant) with an RDBMS (for example
PostgreSQL), I will have to change only the data access layer and will not
have to touch the BLL.
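
To make the BLL/DAL split concrete, here is a minimal sketch in Java. All
names in it (UnitOfWork, RecordingUnitOfWork, TransferService) are
hypothetical and invented for illustration; they are not part of JDBC, JTA
or the xa package. The recording implementation only logs the calls, where
a real one would presumably wrap a JDBC Connection or an OODBMS session.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction: the only transaction type the BLL ever sees.
interface UnitOfWork {
    void begin();
    void commit();
    void rollback();
}

// Hypothetical DAL-side implementation. It only records the calls it gets;
// a real one would delegate to a JDBC Connection or an OODBMS session.
class RecordingUnitOfWork implements UnitOfWork {
    final List<String> log = new ArrayList<>();
    private final String backend;

    RecordingUnitOfWork(String backend) { this.backend = backend; }

    public void begin()    { log.add(backend + ":begin"); }
    public void commit()   { log.add(backend + ":commit"); }
    public void rollback() { log.add(backend + ":rollback"); }
}

// Business logic layer: demarcates the transaction, knows nothing about
// which backend sits behind the UnitOfWork.
class TransferService {
    private final UnitOfWork tx;

    TransferService(UnitOfWork tx) { this.tx = tx; }

    boolean transfer(long amount) {
        tx.begin();
        try {
            if (amount <= 0) {
                throw new IllegalArgumentException("amount must be positive");
            }
            // ... calls into the DAL would go here ...
            tx.commit();
            return true;
        } catch (RuntimeException e) {
            tx.rollback();
            return false;
        }
    }
}

class BllDalDemo {
    public static void main(String[] args) {
        RecordingUnitOfWork pg = new RecordingUnitOfWork("postgresql");
        new TransferService(pg).transfer(100);
        System.out.println(pg.log);

        RecordingUnitOfWork oo = new RecordingUnitOfWork("oodbms");
        new TransferService(oo).transfer(-1);
        System.out.println(oo.log);
    }
}
```

Swapping the backend changes only which UnitOfWork is handed to
TransferService; the service itself stays untouched.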

Even though the JTA has been designed to handle distributed transactions, it
can handle local transactions as well. And if you look at the interface
exposed to the application server (TransactionManager), this interface does
all that I need, and is completely agnostic of whether the underlying
transaction is distributed or local. (Besides meeting the requirement I
described above, it also offers the benefit that I do not have to pass
around any transaction objects between function calls, because a transaction
can be attached to and detached from a thread.) So why should I not use the
JTA, if it does the job (well)?
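
The thread-attachment benefit can be sketched as follows. Note that
MiniTxManager is a hypothetical stand-in written only for this
illustration; it is NOT javax.transaction.TransactionManager, it merely
borrows the general shape (begin, commit, suspend, resume, getTransaction)
to show why no transaction object has to be passed between calls.

```java
// Hypothetical stand-in for illustration only; NOT the real JTA interface.
final class MiniTxManager {
    static final class Tx {
        final String id;
        boolean committed;
        Tx(String id) { this.id = id; }
    }

    private final ThreadLocal<Tx> current = new ThreadLocal<>();
    private int counter;

    // Start a transaction and attach it to the calling thread.
    synchronized Tx begin() {
        if (current.get() != null) {
            throw new IllegalStateException("thread already has a transaction");
        }
        Tx tx = new Tx("tx-" + (++counter));
        current.set(tx);
        return tx;
    }

    // The transaction attached to the calling thread, or null if none.
    Tx getTransaction() { return current.get(); }

    // Detach the transaction from the calling thread without ending it.
    Tx suspend() {
        Tx tx = current.get();
        current.remove();
        return tx;
    }

    // Re-attach a previously suspended transaction to the calling thread.
    void resume(Tx tx) {
        if (current.get() != null) {
            throw new IllegalStateException("thread already has a transaction");
        }
        current.set(tx);
    }

    // Mark the thread's transaction committed and detach it.
    void commit() {
        Tx tx = current.get();
        if (tx == null) {
            throw new IllegalStateException("no transaction on this thread");
        }
        tx.committed = true;
        current.remove();
    }
}

class ThreadBoundTxDemo {
    // A DAL-style method: it receives no transaction argument and looks the
    // current transaction up through the manager instead.
    static String save(MiniTxManager tm, String row) {
        MiniTxManager.Tx tx = tm.getTransaction();
        return row + "@" + (tx == null ? "no-tx" : tx.id);
    }

    public static void main(String[] args) {
        MiniTxManager tm = new MiniTxManager();
        tm.begin();
        System.out.println(save(tm, "row1"));
        tm.commit();
        System.out.println(save(tm, "row2"));
    }
}
```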

So I set out to create my own implementation of JTA. But how to do
it? The JTA defines its own version of the XA interface to integrate the
resource manager(s). Why should I invent another interface? Most of the
RDBMSs already provide implementations of these interfaces anyway. Does it
do any harm if the architecture is also capable of handling distributed
transactions? (My strong belief is that it does not, and it is even good
that I can use the same infrastructure for distributed transactions as my
needs and tools advance -- but you may disagree.)

((I actually implemented a mechanism for handling local transactions in an
abstract way using a custom API. It was not easy, but you learn a lot from
doing this kind of stuff, because you have to go through and find solutions
to problems which arise in such an environment. After I thought the
implementation was complete and worked nicely, another potential problem
came to my mind. The problem was the following. I used ThreadLocal to attach
transaction objects to threads. Also, I used CORBA for IPC. Now, every
decent CORBA implementation uses thread pools to process incoming requests.
What happens -- I asked myself -- if the user-programmer forgets to end
[commit or roll back] the transaction? It may take some time before the
timer for the transaction expires and the transaction is cleaned up from the
thread it has been attached to. During this time, the thread can be reused
by the ORB to process another incoming CORBA request, and the implementation
that executes in the reused thread will be confused, because it will find
that it is part of an ongoing transaction. The clean solution to this is to
use a mechanism which is integrated in the CORBA infrastructure. CORBA
provides the local interface Current to move around thread-specific
information. But, I asked myself, if I am already bogged down so deeply in
this mess, why should I not use OMG's Object Transaction Service -- and why
should I not go the standard way across the board? And that settled it.
[Just one small distinction between standards: some of the standards are
open in the sense that you can freely make AND distribute complying
implementations, and some are open in the sense that you can make clean-room
implementations, but you cannot distribute a complying implementation
without further arrangement with the standard's owner/author/...the lawyers
know better what. My understanding is that JNDI, JDBC and JTA fall in the
former category and EJB and Servlets in the latter, but I may be completely
wrong.]))
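
The thread-reuse problem described above is easy to reproduce in plain
Java. The sketch below is only an analogy: a single-thread
ExecutorService stands in for the ORB's thread pool, and the class and
method names (ThreadLeakDemo, secondRequestSees) are made up for the
example. The first "request" attaches a transaction marker to its thread
via ThreadLocal; if it forgets to clean up, the second request on the
reused thread finds itself inside that transaction.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLeakDemo {
    // Transaction marker attached to the current thread.
    static final ThreadLocal<String> CURRENT_TX = new ThreadLocal<>();

    // Runs two "requests" one after the other on a one-thread pool and
    // returns the transaction marker that the second request observes.
    static String secondRequestSees(boolean firstCleansUp) {
        ExecutorService orb = Executors.newSingleThreadExecutor();
        try {
            Runnable first = () -> {
                CURRENT_TX.set("TX1");       // begin: attach TX1 to the thread
                if (firstCleansUp) {
                    CURRENT_TX.remove();     // commit/rollback: detach it again
                }
            };
            Callable<String> second = () -> CURRENT_TX.get();

            orb.submit(first).get();         // wait until the first request is done
            return orb.submit(second).get(); // same pooled thread is reused
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            orb.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("forgot to end TX: " + secondRequestSees(false));
        System.out.println("ended TX properly: " + secondRequestSees(true));
    }
}
```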

My current implementation of the JTA uses the OMG's OTS Version 1.1 (OpenORB
Transaction Service 1.2.0). So the transactions are global, but since I make
sure that a PostgreSQL connection participates only in transactions in
which all other participants are connections from the same datasource, the
transactions will practically remain local in the sense that there will be
no two-phase commit (2pc).

Summary: I need Exoffice's xa package, because it can be used to integrate
PostgreSQL into a JTA implementation. I am not interested in how it
implements 2pc, whether it fakes it or not, or whether it implements 2pc at
all. You cannot use PostgreSQL for 2pc anyway (as has already been repeated
here too many times).

So what kind of pooling is done in XADataSourceImpl? The best way to
describe it is to go through a scenario.

We have the following components:
-- DataSource: implemented in the middleware;
-- Pool: a pool of connections implemented in the middleware;
-- PostgresqlXADataSource: implemented by the JDBC driver. It extends
org.postgresql.xa.XADataSourceImpl.

The application requests a connection from the DataSource. Assume that we
are right after startup, so there is nothing in the Pool, and the DataSource
forwards the request to PostgresqlXADataSource by calling
PostgresqlXADataSource.getXAConnection(). This returns an "empty"
XAConnectionImpl instance. It is empty in the sense that it has no physical
connection assigned to it. The XAConnectionImpl instance is returned to the
DataSource.

Now there are two possibilities: (1) we are in a global transaction or (2)
we are NOT in a global transaction. In case (1) the DataSource calls
XAConnectionImpl.start() with the XID of the transaction. The result is that
XADataSourceImpl a) creates a new physical connection, b) maps it
internally to the XID, and c) creates a ClientConnection and returns it to
the application. When the application calls methods on the ClientConnection,
the physical connection is always retrieved (ultimately through the XID) and
is used to do the real job.

When the application calls ClientConnection.close(), the DataSource gets
notified, calls XAConnectionImpl.end(xid, TMSUCCESS) and puts the
XAConnectionImpl into the Pool. Calling end(xid, TMSUCCESS) has the result
that the XAConnectionImpl is "emptied", i.e. detached from the physical
connection (which remains internally mapped to the XID in XADataSourceImpl).
At this point the DataSource might think that it has a free connection in
the Pool, whereas what it has is only a shell that will be attached to a
physical connection the next time one is needed. There also exists in the
system, at this point in time, a physical connection, but it has not been
committed, so it is not free: it is tied (internally mapped) to the ongoing
transaction.

Let's assume that the application does not commit the transaction (TX) and
reuses the XAConnectionImpl in the Pool to do more work in the same TX. The
XAConnectionImpl will be enlisted via XAConnectionImpl.start(xid, TMRESUME),
with the result that the physical connection with the open local transaction
is attached back to the (single) XAConnectionImpl instance. Assume that the
app again calls ClientConnection.close() while the TX is still open. The
XAConnectionImpl instance will be put back into the Pool.

Also assume that another app thread in another global TX (TX2) requests a
connection from the DataSource. [If the other thread had requested a
connection from the DataSource before the first thread called
ClientConnection.close(), the DataSource (the Pool being empty) would have
had to request a new connection from XADataSourceImpl, which would have
resulted in the construction of another instance of XAConnectionImpl. This
is also a possible scenario, but it is not the case now.] The DataSource
takes the XAConnectionImpl instance from the Pool and enlists it, which
results in XAConnectionImpl.start(xid2, TMRESUME). The XADataSourceImpl
finds no physical connection for this XID in its internal map, so it
creates a new one (PHC2) and attaches it to our (only) XAConnectionImpl
instance.

Now we have only one XAConnectionImpl instance (so far it was always
available in the Pool when the DataSource needed one), but there are two
physical connections: one which is in use by the second thread as part of
TX2, and one which is mapped to the first transaction and is awaiting commit
or further use. This state represents the adverse effect of the decoupling
of the physical connections from the PooledConnections (XAConnectionImpl) I
talked about in one of my previous mails: the DataSource is pooling/handling
XAConnectionImpl instances that are only loosely coupled to physical
connections. We can probably agree that the main purposes of connection
pooling are (a) reusing existing connections and (b) limiting the number of
connections open at any point in time. Requirement (a) will always be met by
the above mechanism, but requirement (b) will be met only over time (on
average, if you wish).
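
The scenario can be replayed with a small model. This is a sketch under
strong simplifications, not the xa package's code: DriverSide plays the
role described above for XADataSourceImpl (the XID-to-physical-connection
map), ShellConnection plays the role of XAConnectionImpl, physical
connections are just integer ids, and flags such as TMRESUME are left out.
All class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the driver side: maps XIDs to physical connections.
class DriverSide {
    private final Map<String, Integer> physicalByXid = new HashMap<>();
    private int opened;

    // Return the physical connection mapped to xid, opening one if none exists.
    int physicalFor(String xid) {
        Integer id = physicalByXid.get(xid);
        if (id == null) {
            id = ++opened;
            physicalByXid.put(xid, id);
        }
        return id;
    }

    int openPhysicalConnections() { return physicalByXid.size(); }
}

// Hypothetical model of the shell that the middleware pools.
class ShellConnection {
    private final DriverSide driver;
    Integer attached; // physical connection id, or null while "empty"

    ShellConnection(DriverSide driver) { this.driver = driver; }

    void start(String xid) { attached = driver.physicalFor(xid); }

    // Empties the shell; the xid -> physical mapping survives in DriverSide.
    void end() { attached = null; }
}

class ShellPoolScenario {
    // Replays the scenario; returns {shells created, physical connections open}.
    static int[] run() {
        DriverSide driver = new DriverSide();
        Deque<ShellConnection> pool = new ArrayDeque<>(); // the middleware Pool
        int shells = 0;

        // Thread 1, TX1: the Pool is empty, so a new shell is requested.
        ShellConnection c = pool.poll();
        if (c == null) {
            c = new ShellConnection(driver);
            shells++;
        }
        c.start("TX1");
        c.end();      // application closes its connection, TX1 not committed
        pool.push(c);

        // Thread 1 again, still TX1: the shell is re-attached to the same
        // physical connection.
        c = pool.poll();
        c.start("TX1");
        c.end();
        pool.push(c);

        // Thread 2, TX2: the same shell is handed out, but a second physical
        // connection has to be opened for the new XID.
        c = pool.poll();
        c.start("TX2");

        return new int[] { shells, driver.openPhysicalConnections() };
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("shells=" + r[0] + " physical=" + r[1]);
    }
}
```

In this model the middleware has only ever seen one pooled object, while
two physical connections are open, which is the gap between requirements
(a) and (b).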

Now let's say the app in TX2 calls ClientConnection.close() [the DataSource
puts the XAConnectionImpl instance back in the Pool] and commits. PHC2 will
be committed and put (releaseTxConnection) in the internal pool of
XADataSourceImpl. Note that this is the first time that a physical
connection has been put into the internal pool of XADataSourceImpl. Our
first physical connection is still mapped to the first TX and will be put
into the internal pool only after the transaction it is mapped to has been
committed (and commit() has been successfully called on the physical
connection). It is clear that when a connection is requested from
XADataSourceImpl, it will first look for a free one in its internal pool
before creating a new one, but this pooling mechanism does not (and in fact,
based on the spec, is not supposed to) do anything along the lines of
meeting pooling requirement (b). I can imagine for example an RDBMS-JDBC
driver combination where physical connections can be effectively detached
from and attached to transactions. In such a case, the JDBC driver does not
need to implement any internal pooling. The XADataSourceImpl in our case
needs to maintain a pool of physical connections (if you wish) perforce,
because the PostgreSQL implementation does not allow physical connections
to be detached from transactions. (I do not know the internals of the
backend, but I do not think it is impossible [or even very complicated] to
implement such a feature, though I am not sure how useful it would be
anyway.)
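
The behaviour of the internal pool can be modelled in a few lines. Again
this is only a sketch with hypothetical names (InternalPoolModel, acquire,
commit), not the actual XADataSourceImpl code: a physical connection stays
mapped to its XID until commit, only then moves to the free list, and
nothing caps the total number of connections.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the driver's internal pool of physical connections.
class InternalPoolModel {
    private final Map<String, Integer> inTransaction = new HashMap<>();
    private final Deque<Integer> free = new ArrayDeque<>();
    private int created;

    // Physical connection for xid: the one already mapped to it, else a
    // free one from the internal pool, else a newly created one.
    int acquire(String xid) {
        Integer mapped = inTransaction.get(xid);
        if (mapped != null) {
            return mapped;
        }
        Integer conn = free.poll();
        if (conn == null) {
            conn = ++created; // no upper bound is enforced here
        }
        inTransaction.put(xid, conn);
        return conn;
    }

    // Only on commit does the physical connection return to the free list.
    void commit(String xid) {
        Integer conn = inTransaction.remove(xid);
        if (conn == null) {
            throw new IllegalStateException("unknown xid " + xid);
        }
        free.push(conn);
    }

    int created()   { return created; }
    int freeCount() { return free.size(); }
}

class InternalPoolDemo {
    public static void main(String[] args) {
        InternalPoolModel pool = new InternalPoolModel();
        pool.acquire("TX1");
        int phc2 = pool.acquire("TX2");
        pool.commit("TX2");                 // PHC2 becomes free, TX1 still open
        int reused = pool.acquire("TX3");   // picks up PHC2 again
        System.out.println("reused=" + (reused == phc2)
                + " created=" + pool.created());
    }
}
```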

Peter


> -----Original Message-----
> From: Ned Wolpert [mailto:wolpert@yahoo.com]
> Sent: Thursday, January 03, 2002 2:48 AM
> To: Ned Wolpert; Kovács Péter; pgsql-jdbc@postgresql.org
> Subject: PostgresDataSource Question
>
>
> Folks-
>
>   I'm re-examining the PostgresDataSource class, and it seems that I
> missed a few things.  I need someone to verify what it is I'm looking
> at. This is based on the pooled stuff I submitted earlier, and the
> current conversation that has been going on about my submittal.
>
>   Basically, it seems that the XADataSourceImpl is a working pooling
> manager.  It is an abstract class, only extended by
> PostgresqlDataSource.
> The XADataSourceImpl provides the access to the pool from their method
> newConnection() and releaseConnection(), neither of which are called
> elsewhere.
>
>   It looks like the code was expecting the org.postgresql.jdbc2.Connection
> object to 'release' it if it was called by the datasource, when the
> connection was closed, but the Connection class was never modified. In
> short, the pool is almost there already, just not complete. The class
> PostgresqlDataSource _can_ pool, it just doesn't.  Does this look like
> a proper analysis to others?
>
>   I can do one of two things at this point, and I would like people's
> opinion as to what I should do. One, I can continue working on my pool
> manager, which will extend XADataSourceImpl and will still have to wrap
> the connection classes to notify my pooling manager of changes that
> occur.  Or two, create a set of patches that will impact the jdbc2
> package and PostgresDataSource class to finish what was started.
>
>   What do you think folks? I'm starting to lean to option two, but would
> like to hear other people's opinions.  If we pick two, that means
> that my pooling manager is _part_ of the PostgresDataSource, not a
> separate class.  Could some of the CVS committers comment on this?
> (Also, I'll be having patches for basically all the classes in the jdbc2
> and xa package.)
>
> =====
> Virtually,        |                   "Must you shout too?"
> Ned Wolpert       |                                  -Dante
> wolpert@yahoo.com |
> _________________/              "Who watches the watchmen?"
> 4e75                                       -Juvenal, 120 AD
>
> -- Place your commercial here --                      fnord