Re: PostgresDataSource Question - Mailing list pgsql-jdbc
From | Kovács Péter |
---|---|
Subject | Re: PostgresDataSource Question |
Date | |
Msg-id | 8A2DDD7ED7876A4698F6FF204F62CBFC01EC456D@budg112a.sysdata.siemens.hu |
In response to | PostgresDataSource Question (Ned Wolpert <wolpert@yahoo.com>) |
List | pgsql-jdbc |
Ned,

I am afraid that (with my effective contribution :-)) things have got mixed up a bit here, so I will try to sort them out.

First of all: XADataSourceImpl does some kind of pooling, but it is probably not intended to implement the functionality you want. I will explain what this sort of pooling does, but before doing so I'd like to make a statement that has not yet been made: support for connection pooling and support for distributed transactions can be implemented separately. So you do not need to bother with the xa package at all if you want to support connection pooling, or even if you want to _implement_ connection pooling (in my view there is a difference between supporting connection pooling and implementing it).

The reason I was kind of pushing the xa package is that people on this list keep talking about the incompleteness of Exoffice's xa package -- and they are right from their perspective. But if you look at it another way, this is good stuff -- and it is clearly my fault that I did not fully explain my perspective. So first I should explain my view.

What I really need is a way to handle transactions in an environment where there are only local transactions. Furthermore, I would like the transaction handling mechanism to be abstract enough that I can replace the database server at will. I want it to be abstract enough to let me use not only JDBC but also other interfaces (OODBMSs have completely different client interfaces). In a world where you have only JDBC with local transactions, you can happily pass around Connection instances, each representing exactly one local transaction. But if you want your code to be more generic, you will want something more abstract.

What I said above may be obvious, but a brief explanation may be helpful. I am using a simple model here: I partition my application into two layers, a business logic layer (BLL) and a data access layer (DAL).
When I was talking about "more generic code" in the previous paragraph, I meant the BLL. Let me explain further. With any real-life application I will want to start and end transactions in the BLL. [I could implement interfaces in the DAL that span a whole transaction (wherever atomicity is needed), thus obviating the need for the BLL to use transaction demarcation. But in that case part of the business logic would inevitably have to be implemented in the DAL, which is not desirable, since it reduces modularity, with all the disadvantages that I will not detail here.] My point is to use an abstract notion/type/concept for transaction demarcation in the BLL, so that if I want to replace an OODBMS (for example Versant) with an RDBMS (for example PostgreSQL), I will have to change only the data access layer and will not have to touch the BLL.

Even though the JTA has been designed to handle distributed transactions, it can handle local transactions as well. And if you look at the interface exposed to the application server (TransactionManager), it does all that I need and is completely agnostic of whether the underlying transaction is distributed or local. (Besides meeting the requirement I described above, it also offers the benefit that I do not have to pass any transaction objects around between function calls, because a transaction can be attached to and detached from a thread.) So why should I not use the JTA, if it does the job (well)?

So I set out to create my own implementation of the JTA. But how to do it? The JTA defines its own versions of the XA interfaces to integrate the resource manager(s). Why should I invent another interface? Most RDBMSs already provide implementations of these interfaces anyway. Does it do any harm if the architecture is also capable of handling distributed transactions?
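To make the BLL/DAL argument above a bit more concrete, here is a minimal sketch of business logic that demarcates transactions through an abstract interface only. Everything in it is an assumption made for illustration: TxDemarcation, AccountStore, InMemoryBackend and TransferService are hypothetical names, and TxDemarcation merely mimics the begin/commit/rollback shape of a transaction manager; it is not the JTA interface itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction used by the BLL for transaction demarcation.
// It only mimics the begin/commit/rollback shape; it is NOT the JTA API.
interface TxDemarcation {
    void begin();
    void commit();
    void rollback();
}

// Hypothetical DAL contract: the BLL never sees JDBC or an OODBMS client API.
interface AccountStore {
    int balance(String account);
    void setBalance(String account, int amount);
}

// Toy in-memory back end standing in for "the DAL of one particular database".
final class InMemoryBackend implements TxDemarcation, AccountStore {
    private final Map<String, Integer> committed = new HashMap<>();
    private Map<String, Integer> working; // non-null only while a transaction is open

    @Override public void begin() {
        if (working != null) throw new IllegalStateException("transaction already open");
        working = new HashMap<>(committed);
    }
    @Override public void commit() {
        requireTx();
        committed.clear();
        committed.putAll(working);
        working = null;
    }
    @Override public void rollback() {
        requireTx();
        working = null; // discard uncommitted changes
    }
    @Override public int balance(String account) {
        Map<String, Integer> view = (working != null) ? working : committed;
        return view.getOrDefault(account, 0);
    }
    @Override public void setBalance(String account, int amount) {
        requireTx();
        working.put(account, amount);
    }
    private void requireTx() {
        if (working == null) throw new IllegalStateException("no open transaction");
    }
}

// BLL: starts and ends the transaction, but knows nothing about the database.
final class TransferService {
    private final TxDemarcation tx;
    private final AccountStore store;

    TransferService(TxDemarcation tx, AccountStore store) {
        this.tx = tx;
        this.store = store;
    }

    boolean transfer(String from, String to, int amount) {
        tx.begin();
        boolean ok = false;
        try {
            int available = store.balance(from);
            if (available >= amount) {
                store.setBalance(from, available - amount);
                store.setBalance(to, store.balance(to) + amount);
                ok = true;
            }
        } finally {
            if (ok) tx.commit(); else tx.rollback();
        }
        return ok;
    }
}

final class DemarcationDemo {
    public static void main(String[] args) {
        InMemoryBackend backend = new InMemoryBackend();
        backend.begin();
        backend.setBalance("alice", 100);
        backend.commit();
        TransferService service = new TransferService(backend, backend);
        System.out.println(service.transfer("alice", "bob", 30));
        System.out.println(backend.balance("alice") + " " + backend.balance("bob"));
    }
}
```

Replacing InMemoryBackend with a JDBC-backed or OODBMS-backed implementation would leave TransferService untouched, which is the whole point of demarcating through an abstract type.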
(My strong belief is that it does no harm, and that it is even good that I can use the same infrastructure for distributed transactions as my needs and tools advance -- but you may disagree.)

((I actually implemented a mechanism for handling local transactions in an abstract way using a custom API. It was not easy, but you learn a lot from doing this kind of stuff, because you have to go through and find solutions to the problems that arise in such an environment. After I thought the implementation was complete and worked nicely, another potential problem came to my mind. I used ThreadLocal to attach transaction objects to threads, and I used CORBA for IPC. Now, every decent CORBA implementation uses thread pools to process incoming requests. What happens -- I asked myself -- if the user-programmer forgets to end [commit or roll back] the transaction? It may take some time before the timer for the transaction expires and the transaction is cleaned up from the thread it has been attached to. During this time the thread can be reused by the ORB to process another incoming CORBA request, and the implementation that executes in the reused thread will be confused, because it will find that it is part of an ongoing transaction. The clean solution is to use a mechanism that is integrated into the CORBA infrastructure. CORBA provides the local interface Current to move thread-specific information around. But, I asked myself, if I am already bogged down so deeply in this mess, why should I not use OMG's Object Transaction Service -- and why should I not go the standard way across the board? And that settled it.
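The thread-reuse problem described above can be reproduced without CORBA. The sketch below is only an illustration under assumptions: a single-thread executor stands in for the ORB's thread pool, and CURRENT_TX, forgetfulRequest and carefulRequest are hypothetical names, not part of any real transaction API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreadLocalLeakDemo {
    // Hypothetical stand-in for "the transaction attached to the current thread".
    private static final ThreadLocal<String> CURRENT_TX = new ThreadLocal<>();

    // A request handler that begins a transaction and forgets to end it.
    static void forgetfulRequest(String txId) {
        CURRENT_TX.set(txId);
    }

    // A request handler that always detaches the transaction from the thread.
    static void carefulRequest(String txId) {
        CURRENT_TX.set(txId);
        try {
            // ... business logic would run here ...
        } finally {
            CURRENT_TX.remove();
        }
    }

    // Runs two requests on the same pooled thread and returns what the
    // second request finds attached to that thread.
    static String seenByNextRequest(boolean careful) {
        // One worker thread, so the second request is guaranteed to reuse it.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable first = () -> {
                if (careful) carefulRequest("tx-1"); else forgetfulRequest("tx-1");
            };
            Callable<String> second = CURRENT_TX::get;
            pool.submit(first).get();
            return pool.submit(second).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("forgetful: " + seenByNextRequest(false));
        System.out.println("careful:   " + seenByNextRequest(true));
    }
}
```

With the forgetful handler the second request sees "tx-1", i.e. it believes it is part of somebody else's transaction; with the careful handler it sees null. Relying on every programmer to write the finally block is exactly what an infrastructure-level mechanism avoids.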
[Just one small distinction between standards: some standards are open in the sense that you can freely make AND distribute complying implementations, and some are open in the sense that you can make clean-room implementations but cannot distribute a complying implementation without further arrangement with the standard's owner/author/...the lawyers know better what. My understanding is that JNDI, JDBC and JTA fall into the former category and EJB and Servlets into the latter, but I may be completely wrong.]))

My current implementation of the JTA uses the OMG's OTS Version 1.1 (OpenORB Transaction Service 1.2.0). So the transactions are global, but since I make sure that a PostgreSQL connection participates only in transactions in which all the other participants are connections from the same datasource, the transaction practically remains local in the sense that there will be no 2pc.

Summary: I need Exoffice's xa package because it can be used to integrate PostgreSQL into a JTA implementation. I am not interested in how it implements 2pc, whether it fakes it or not, or whether it implements 2pc at all. You cannot use PostgreSQL for 2pc anyway (as we have already repeated too many times).

So what kind of pooling is done in XADataSourceImpl? The best way to describe it is to go through a scenario. We have the following components:

-- DataSource: implemented in the middleware;
-- Pool: a pool of connections implemented in the middleware;
-- PostgresqlXADataSource: implemented by the JDBC driver. It implements org.postgresql.xa.XADataSourceImpl.

The application requests a connection from the DataSource. Assume that we are right after startup, so there is nothing in the Pool, and the DataSource forwards the request to PostgresqlXADataSource by calling PostgresqlXADataSource.getXAConnection(). This returns an "empty" XAConnectionImpl instance. It is empty in the sense that it has no physical connection assigned to it.
The XAConnectionImpl instance is returned to the DataSource. Now there are two possibilities: (1) we are in a global transaction or (2) we are NOT in a global transaction.

In case (1) the DataSource calls XAConnectionImpl.start() with the XID of the transaction. The result is that XADataSourceImpl a) creates a new physical connection, b) maps it internally to the XID, and c) creates a ClientConnection and returns it to the application. When the application calls methods on the ClientConnection, the physical connection is always retrieved (ultimately through the XID) and is used to do the real job.

When the application calls ClientConnection.close(), the DataSource gets notified, calls XAConnectionImpl.end(xid, TMSUCCESS) and puts the XAConnectionImpl into the Pool. Calling end(xid, TMSUCCESS) has the result that the XAConnectionImpl is "emptied", i.e. detached from the physical connection (which remains internally mapped to the XID in XADataSourceImpl). At this point the DataSource might think that it has a free connection in the Pool, whereas what it has is only a shell that will be attached to a physical connection next time, as needed. There also exists in the system, at this point in time, a physical connection, but its transaction has not been committed, so it is not free: it is tied (internally mapped) to the ongoing transaction.

Let's assume that the application does not commit the transaction (TX) and reuses the XAConnectionImpl in the Pool to do work in the same TX. The DataSource will enlist it via XAConnectionImpl.start(xid, TMRESUME), which has the result that the physical connection with the open local transaction is attached back to the (single) XAConnectionImpl instance. Assume that the app again calls ClientConnection.close() while the TX is still open. The XAConnectionImpl instance is put back into the Pool. Also assume that another app thread in another global TX (TX2) requests a connection from the DataSource.
[If the other thread had requested a connection from the DataSource before the first thread called ClientConnection.close(), the DataSource (the Pool being empty) would have had to request a new connection from XADataSourceImpl, which would have resulted in the construction of another XAConnectionImpl instance. This is also a possible scenario, but it is not the case now.]

The DataSource takes the XAConnectionImpl instance from the Pool and enlists it, which results in XAConnectionImpl.start(xid2, TMRESUME). XADataSourceImpl finds no physical connection with this XID in its internal map, so it creates a new one (PHC2) and attaches it to our (only) XAConnectionImpl instance. Now we have only one XAConnectionImpl instance (so far it was always available in the Pool when the DataSource needed one), but there are two physical connections: one that is in use by the second thread as part of TX2, and one that is mapped to the first transaction and is awaiting commit or further use.

This state represents the adverse effect of the decoupling of the physical connections from the PooledConnections (XAConnectionImpl) that I talked about in one of my previous mails: the DataSource is pooling/handling XAConnectionImpl instances that are only loosely coupled to physical connections. We can probably agree that the main purposes of connection pooling are (a) reusing existing connections and (b) limiting the number of connections open at any point in time. Requirement (a) will always be met by the above mechanism, but requirement (b) will be met only over time (on average, if you wish).

Now let's say the app in TX2 calls ClientConnection.close() [the DataSource puts the XAConnectionImpl instance back in the Pool] and commits. PHC2 is committed and put (releaseTxConnection) into the internal pool of XADataSourceImpl. Note that this is the first time that a physical connection has been put into the internal pool of XADataSourceImpl.
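The scenario so far can be replayed with a small model. This is a sketch of the behaviour as described in this mail, not the org.postgresql.xa source: PhysicalConn, XaSourceSketch and ShellConn are hypothetical names, XIDs are plain strings, and the middleware's Pool is reduced to a single shell object.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a physical database connection.
final class PhysicalConn {
    final int id;
    PhysicalConn(int id) { this.id = id; }
}

// Hypothetical model of the data source: an XID map plus an internal free pool.
final class XaSourceSketch {
    private final Map<String, PhysicalConn> byXid = new HashMap<>();
    private final Deque<PhysicalConn> freePool = new ArrayDeque<>();
    private int created = 0;

    // Find the connection mapped to this XID; else reuse a free one; else open one.
    PhysicalConn attach(String xid) {
        PhysicalConn conn = byXid.get(xid);
        if (conn == null) {
            conn = freePool.isEmpty() ? new PhysicalConn(++created) : freePool.pop();
            byXid.put(xid, conn);
        }
        return conn;
    }

    // Only a commit releases the physical connection into the internal pool.
    void commit(String xid) {
        PhysicalConn conn = byXid.remove(xid);
        if (conn != null) freePool.push(conn);
    }

    int createdCount() { return created; }
    int busyCount() { return byXid.size(); }
    int freeCount() { return freePool.size(); }
}

// Hypothetical model of the "shell" that the middleware keeps in its Pool.
final class ShellConn {
    private final XaSourceSketch source;
    private PhysicalConn attached;

    ShellConn(XaSourceSketch source) { this.source = source; }

    void start(String xid) { attached = source.attach(xid); }
    void end() { attached = null; } // the physical connection stays mapped to its XID
    Integer attachedId() { return attached == null ? null : attached.id; }
}

final class ShellPoolDemo {
    public static void main(String[] args) {
        XaSourceSketch source = new XaSourceSketch();
        ShellConn shell = new ShellConn(source);   // the only shell in the Pool

        shell.start("tx1"); shell.end();           // TX1 works, closes, does not commit
        shell.start("tx1"); shell.end();           // TX1 resumes on the same physical connection
        shell.start("tx2"); shell.end();           // TX2 forces a second physical connection
        System.out.println("shells=1 physical=" + source.createdCount()
                + " busy=" + source.busyCount());

        source.commit("tx2");                      // first entry in the internal pool
        System.out.println("free=" + source.freeCount() + " busy=" + source.busyCount());
    }
}
```

The model shows the point about requirement (b): the middleware sees a single pooled object throughout, while the number of open physical connections is driven by the number of uncommitted transactions.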
Our first physical connection is still mapped to the first TX and will be put into the internal pool only after the transaction it is mapped to has been committed (and commit() has been called successfully on the physical connection). It is clear that when a connection is requested from XADataSourceImpl, it will first look for a free one in its internal pool before creating a new one, but this pooling mechanism does not do (and in fact, based on the spec, is not supposed to do) anything along the lines of meeting pooling requirement (b).

I can imagine, for example, an RDBMS/JDBC driver combination where physical connections can be effectively detached from and attached to transactions. In such a case the JDBC driver does not need to implement any internal pooling. In our case XADataSourceImpl is forced to maintain a pool of physical connections (if you wish), because the PostgreSQL implementation does not allow physical connections to be detached from transactions. (I do not know the internals of the backend, but I do not think it is impossible [or even very complicated] to implement such a feature, and I am not sure how useful it would be anyway.)

Peter

> -----Original Message-----
> From: Ned Wolpert [mailto:wolpert@yahoo.com]
> Sent: Thursday, January 03, 2002 2:48 AM
> To: Ned Wolpert; Kovács Péter; pgsql-jdbc@postgresql.org
> Subject: PostgresDataSource Question
>
>
> Folks-
>
> I'm re-examing the PostgresDataSource class, and it seems that I missed
> a few things. I need someone to verify what it is I'm looking at. This is
> based on my pooled stuff I submitted eariler, and the current conversation
> that has been going on about my submittal.
>
> Basically, it seems that the XADataSourceImpl is a working pooling
> manager. It is an abstract class, only extended by PostgresqlDataSource.
> The XADataSourceImpl provides the access to the pool from their method
> newConnection() and releaseConnection(), neither of which are called
> elsewhere.
>
> It looks like the code was expecting the org.postgresql.jdbc2.Connection
> object to 'release' it if it was called by the datasource, when the
> connection was closed, but the Connection class was never modified. In
> short, the pool is almost there already, just not complete. The class
> PostgresqlDataSource _can_ pool, it just doesn't. Does this look like
> a proper analysis to others?
>
> I can do one of two things at this point, and I would like people's
> opinion as to what I should do. One, I can continue working on my pool
> manager, which will extend XADataSourceImpl and will still have to wrap
> the connection classes to notify my pooling manager of changes that
> occurs. or Two, create a set of patches that will impact the jdbc2
> package and PostgresDataSource class to finish what was started.
>
> What do you think folks? I'm starting to lean to option two, but would
> like to hear other people's opinions. If we pick two, that means
> that my pooling manager is _part_ of the PostgresDataSource, not a
> seperate class. Could some of the CVS committers comment on this?
> (Also, I'll be having patches for basically all the classes in the jdbc2
> and xa package.)
>
> =====
> Virtually,        |                   "Must you shout too?"
> Ned Wolpert       |                                  -Dante
> wolpert@yahoo.com |
> _________________/              "Who watches the watchmen?"
> 4e75                                       -Juvenal, 120 AD
>
> -- Place your commercial here --                      fnord