Pl/Java - next step? - Mailing list pgsql-hackers
From | Thomas Hallgren |
---|---|
Subject | Pl/Java - next step? |
Date | |
Msg-id | c17ae3$dst$1@news.hub.org |
List | pgsql-hackers |
Two Pl/Java implementations exist today. Due to the architecture of PostgreSQL, compromises have been made in both of them to deal with the fact that each connection lives in its own process. One, which I'll call "Pl/Java_JNI", will spawn a JVM on demand for each connection, and the other, "Pl/Java_remote", will spawn at least one JVM that lives in a process of its own and use an inter-process calling mechanism.

I can see PostgreSQL moving forward in one of four different directions:

1. Select Pl/Java_JNI
2. Select Pl/Java_remote
3. Choose both and agree on the SQL + Java semantics
4. Make the postmaster spawn threads rather than processes (controversial? Nah :-) )

As the one behind Pl/Java_JNI I'm perhaps not the most objective person when it comes to the choice, but I'll make an effort here and try to list the pros and cons of each option. My objective is to start a healthy discussion. I think Pl/Java might boost the usability of PostgreSQL quite a bit, and with the almost explosive growth of the Java community it's essential that we conclude this sooner rather than later.

** 1. Select Pl/Java_JNI **

#Pros:#
- Each call becomes extremely lightweight.
  JNI is in essence a straightforward in-process function invocation. Minimizing call overhead becomes very important for functions that a) are called very often and b) need to call back into the backend several times.
- Minimum resource utilization when passing values.
  Values can be passed by reference. TriggerData, TupleDesc, HeapTuple, byte arrays etc. need not be copied. Return values can be allocated directly in the correct MemoryContext.
- Transaction visibility
  Using a JDBC driver that's implemented directly on top of SPI ensures that the transaction visibility is correct without the need to either propagate a transaction context or make remote calls back into the backend.
- Connection isolation
  Easy to use since the developer "owns" the whole JVM.
  There's no need to terminate all connections in order to replace code or to establish a debug session. Migration can take place gradually.
- Simplicity
  No hassle setting up inter-process communication or maintaining a separate JVM.
- Modern JVMs are less demanding
  Sun and other JVM vendors are making serious efforts to make the JVM more adaptable. Java is no longer used only for heavyweight server processing. Small utility programs are becoming more and more common. Thus, decreasing start-up time and the ability to adapt resource consumption have very high priority. See what Java 1.5 does: http://java.sun.com/j2se/1.5.0/docs/relnotes/features.html#vm
- Well-known programming environment
  JNI is standard. A potential developer of the code has access to on-line training.

#Cons:#
- Resource consumption.
  A JVM is expensive from a resource perspective.
- Connection start-up time is high.
  Booting a JVM takes time. Setups where connections that invoke Pl/Java are created and closed frequently will suffer from this.
- Java execution model differs from the one used by PostgreSQL
  Java uses multithreading whether you like it or not. And the JVM will throw exceptions. Pl/Java_JNI handles this by introducing some macros that any developer who makes additions to the port must be aware of. This also introduces limitations for the user of Pl/Java_JNI (such as very limited functionality once an error has been generated by the backend).

** 2. Select Pl/Java_remote **

#Pros:#
- Each connection becomes fairly lightweight.
  A connection is represented as a thread in the remote JVM. Threads are much less expensive than a full-blown JVM.
- Connection start-up time is low
  Start-up will be very quick since thread creation is cheap. Even quicker if a thread pool is utilized.
- Reuse of an existing JVM
  Small systems might use the same JVM to run an app-server as the one used by triggers and functions.
  Albeit not great from a "separation of concerns" perspective, it might be very efficient for special needs.
- Ability to run the JVM on another server
  The JVM can run on a server different from the one running the backend process. If the number of calls is small in relation to the actual work performed in each call, this might be interesting.

#Cons:#
- RPC calls are slow
  Calls between processes are inherently very slow compared to in-process calls.
- RPC resources needed
  Each connection will need an additional socket or shared memory segment.
- Transaction visibility
  A connection established in the remote JVM must have the same transaction visibility as the invoker. In essence, a transaction context must be propagated to the remote JVM, or the remote JVM must have a JDBC driver that calls back into the backend.
- RPC management
  CORBA or some other mechanism must be installed and maintained.
- Starting/Stopping JVM affects all connections
  Attaching a debugger or generating profiling information implies a restart of the JVM, killing all existing connections that make use of Pl/Java_remote. Code migration implies full stop + restart (the JSR121 Isolation API didn't make it into the 1.5 release).
- Complex programming environment
  A potential developer of the code base has a lot to learn. The API between backend and Java code is non-standard.

** 3. Choose both and agree on the SQL + Java semantics **

#Pros:#
- Best of two worlds
  The user can decide, depending on his/her setup, thus gaining optimal performance.
- Everyone wins
  Nobody needs to feel sad because their implementation was rejected.

#Cons:#
- Might be perceived as a kludge
  The competitors don't need multiple implementations. Offering two ways of doing it might be perceived as a way to get around a less than perfect design, resulting in uncertainty and the choice of another database.
- The choice is not evident
  The user has to make a choice. Sometimes the choice is not evident.
- Project synchronization
  Someone needs to synchronize the projects.
- Double effort
  Almost everything needs to be developed twice since the approaches have fundamental differences.

** 4. Make the postmaster spawn threads rather than processes **

I know this is very controversial and perhaps I should not bring it up at all. But then again, why not? Most readers are open-minded, right?

#Pros:#
- Really best of two worlds
  There would be one JVM per postmaster and in-process calls would be used throughout.
- Other pl<lang> could benefit?
  Other languages where multithreading is an option could benefit the same way Java does.
- Other pros
  Beyond the scope of the topic.

#Cons:#
- Code rewrite
  Right. All PostgreSQL code would need an overhaul. That would be a serious effort to say the least.
- Code base selection
  We'd still need to choose which existing Pl/Java implementation should be used as the base for the in-process + multithreaded implementation.
- Other cons
  Beyond the scope of the topic.

What are the next steps? Setting up benchmarks and testing performance, perhaps? That should not be done by me, nor by the people behind Pl/Java_remote, but rather by someone who is truly objective.

Kind regards
Thomas Hallgren
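
The difference between the two call paths discussed above (values passed by reference in-process versus values copied across a process boundary) can be illustrated with a small Java sketch. This is not code from either implementation. All names (InvocationSketch, callInProcess, callRemoteStyle, sumAndClearFirst) are hypothetical, Java serialization merely stands in for whatever RPC mechanism a remote design would use, and a thread pool stands in for the per-connection threads of a separate JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.ToIntFunction;

// Hypothetical illustration only; neither Pl/Java implementation exposes this API.
final class InvocationSketch {

    // In-process style (comparable to a JNI call): the function receives the
    // caller's array by reference, so nothing is copied.
    static int callInProcess(ToIntFunction<int[]> fn, int[] arg) {
        return fn.applyAsInt(arg);
    }

    // Simulates marshalling: the argument is written to bytes and read back,
    // which yields an independent copy, as a call across processes would.
    static int[] marshalCopy(int[] arg) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(wire)) {
                out.writeObject(arg);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(wire.toByteArray()))) {
                return (int[]) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("marshalling failed", e);
        }
    }

    // Remote style: the function runs on a pooled thread and only ever sees
    // the marshalled copy of the argument.
    static int callRemoteStyle(ExecutorService pool, ToIntFunction<int[]> fn, int[] arg) {
        final int[] copy = marshalCopy(arg);
        Callable<Integer> task = () -> fn.applyAsInt(copy);
        try {
            return pool.submit(task).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("remote-style call failed", e);
        }
    }

    // Example user function: sums the values, then zeroes the first element so
    // the difference between by-reference and by-copy becomes visible.
    static int sumAndClearFirst(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        values[0] = 0;
        return sum;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            int[] a = {1, 2, 3};
            int[] b = {1, 2, 3};
            System.out.println(callInProcess(InvocationSketch::sumAndClearFirst, a) + " first=" + a[0]);
            System.out.println(callRemoteStyle(pool, InvocationSketch::sumAndClearFirst, b) + " first=" + b[0]);
        } finally {
            pool.shutdown();
        }
    }
}
```

Run as written, this prints "6 first=0" followed by "6 first=1". Both paths compute the same result, but only the in-process call touches the caller's data, while the remote-style call pays for a copy and a thread hand-off. A real benchmark would of course need to measure the actual implementations rather than a stand-in like this.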