Thread: protocol change in 7.4
There has been some previous discussion of changing the FE/BE protocol in 7.4, in order to fix several problems. I think this is worth doing: if we can resolve all these issues in a single release, it will lessen the upgrade difficulties for users.

I'm aware of the following problems that need a protocol change to fix them:

(1) Add an optional textual message to NOTIFY

(2) Remove the hard-coded limits on database and user names (SM_USER, SM_DATABASE), replace them with variable-length fields.

(3) Remove some legacy elements in the startup packet ('unused' can go -- perhaps 'tty' as well). I think the 'length' field of the password packet is also not used, but I'll need to double-check that.

(4) Fix the COPY protocol (Tom?)

(5) Fix the Fastpath protocol (Tom?)

(6) Protocol-level support for prepared queries, in order to bypass the parser (and maybe be more compatible with the implementation of prepared queries in other databases).

(7) Include the current transaction status, since it's difficult for the client app to determine it for certain (Tom/Bruce?)

If I've missed anything or if there is something you think we should add, please let me know.

I can implement (1), (2), (3), and possibly (7), if someone can tell me exactly what is required (my memory of the discussion relating to this is fuzzy). The rest is up for grabs.

Finally, how should we manage the transition? I wasn't around for the earlier protocol changes, so I'd appreciate any input on steps we can take to improve backward-compatibility.

Cheers,

Neil

--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC
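For context on items (2) and (3): the pre-7.4 startup packet is a fixed-size structure, so the name limits and the legacy fields are baked directly into its layout. The sketch below approximates that layout -- the sizes and field order are quoted from memory of src/include/libpq/pqcomm.h and should be double-checked against the tree -- but the shape is the point: every field is a fixed-length array, which is exactly what variable-length fields would replace.

    #include <stdint.h>

    /* Approximate layout of the fixed-size startup packet (sizes from
     * memory; verify against src/include/libpq/pqcomm.h).  Items (2) and
     * (3) above would replace the fixed-length name fields with
     * variable-length, null-terminated values and drop the legacy
     * 'unused' and 'tty' fields entirely. */
    #define SM_DATABASE  64     /* hard limit on database name length */
    #define SM_USER      32     /* hard limit on user name length */
    #define SM_OPTIONS   64
    #define SM_UNUSED    64     /* legacy, candidate for removal */
    #define SM_TTY       64     /* legacy, candidate for removal */

    typedef struct StartupPacket
    {
        uint32_t    protoVersion;           /* protocol major/minor version */
        char        database[SM_DATABASE];
        char        user[SM_USER];
        char        options[SM_OPTIONS];
        char        unused[SM_UNUSED];
        char        tty[SM_TTY];
    } StartupPacket;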
Neil Conway wrote: > There has been some previous discussion of changing the FE/BE protocol > in 7.4, in order to fix several problems. I think this is worth doing: > if we can resolve all these issues in a single release, it will lessen > the upgrade difficulties for users. <snip> > > If I've missed anything or if there is something you think we should > add, please let me know. Is there any thought about changing the protocol to support two-phase commit? Not that 2PC and distributed transactions would be implemented in 7.4, but to prevent another protocol change in the future? Mike Mascari mascarm@mascari.com
Mike Mascari <mascarm@mascari.com> writes: > Is there any thought about changing the protocol to support > two-phase commit? Not that 2PC and distributed transactions would be > implemented in 7.4, but to prevent another protocol change in the > future? My understanding is that 2PC is one way to implement multi-master replication. If that's what you're referring to, then I'm not sure I see the point: the multi-master replication project (pgreplication) doesn't use 2PC, due to apparent scalability problems (not to mention that it also uses a separate channel for communications between backends on different nodes). Cheers, Neil -- Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC
Neil Conway wrote: > Mike Mascari <mascarm@mascari.com> writes: > >>Is there any thought about changing the protocol to support >>two-phase commit? Not that 2PC and distributed transactions would be >>implemented in 7.4, but to prevent another protocol change in the >>future? > > My understanding is that 2PC is one way to implement multi-master > replication. If that's what you're referring to, then I'm not sure I > see the point: the multi-master replication project (pgreplication) > doesn't use 2PC, due to apparent scalability problems (not to mention > that it also uses a separate channel for communications between > backends on different nodes). Actually, I was thinking along the lines of a true CREATE DATABASE LINK implementation, where multiple databases could participate in a distributed transaction. That would require the backend in which the main query is executing to act as the "coordinator" and each of the other participating databases to act as "cohorts". And would require a protocol change to support the PREPARE, COMMIT-VOTE/ABORT-VOTE reply, and an ACK message following the completion of the distributed COMMIT or ABORT. Mike Mascari mascarm@mascari.com
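To make the coordinator/cohort exchange described above concrete, here is a minimal sketch of the message flow in C. The message names follow Mike's description; the transport helpers are stubs invented for the example (a real implementation would send these over whatever backend-to-backend channel the protocol change defines, and would also have to log state for crash recovery, which the sketch ignores).

    #include <stdbool.h>
    #include <stdio.h>

    /* 2PC message types, per the exchange described above. */
    typedef enum
    {
        TPC_PREPARE,        /* coordinator -> cohort */
        TPC_COMMIT_VOTE,    /* cohort -> coordinator */
        TPC_ABORT_VOTE,     /* cohort -> coordinator */
        TPC_COMMIT,         /* coordinator -> cohort */
        TPC_ABORT,          /* coordinator -> cohort */
        TPC_ACK             /* cohort -> coordinator, after COMMIT/ABORT */
    } TpcMessage;

    /* Stub transport: here every cohort simply answers as hoped. */
    static void
    send_to_cohort(int cohort, TpcMessage msg)
    {
        printf("-> cohort %d: message %d\n", cohort, (int) msg);
    }

    static TpcMessage
    recv_from_cohort(int cohort, TpcMessage pretend_answer)
    {
        (void) cohort;
        return pretend_answer;
    }

    /* Coordinator side: returns true if the distributed commit succeeded. */
    static bool
    coordinate_commit(int ncohorts)
    {
        bool        commit = true;
        int         i;

        /* Phase 1: ask every cohort to prepare and collect the votes. */
        for (i = 0; i < ncohorts; i++)
        {
            send_to_cohort(i, TPC_PREPARE);
            if (recv_from_cohort(i, TPC_COMMIT_VOTE) != TPC_COMMIT_VOTE)
                commit = false;
        }

        /* Phase 2: broadcast the outcome and wait for each ACK. */
        for (i = 0; i < ncohorts; i++)
        {
            send_to_cohort(i, commit ? TPC_COMMIT : TPC_ABORT);
            (void) recv_from_cohort(i, TPC_ACK);
        }
        return commit;
    }

    int
    main(void)
    {
        return coordinate_commit(3) ? 0 : 1;
    }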
On Mon, Nov 04, 2002 at 08:10:29PM -0500, Mike Mascari wrote:
> Actually, I was thinking along the lines of a true CREATE
> DATABASE LINK implementation, where multiple databases could
> participate in a distributed transaction. That would require the
> backend in which the main query is executing to act as the
> "coordinator" and each of the other participating databases to
> act as "cohorts". And would require a protocol change to support
> the PREPARE, COMMIT-VOTE/ABORT-VOTE reply, and an ACK message
> following the completion of the distributed COMMIT or ABORT.

Right, you need 2PC in order for pgsql to participate in transactions that span anything outside the DB proper. A DB link is one example; another is an external transaction manager that coordinates DB and filesystem updates. Zope could use this, to coordinate the DB with its internal object store.

Ross
Hi,

Mike Mascari <mascarm@mascari.com> wrote:
> Is there any thought about changing the protocol to support
> two-phase commit? Not that 2PC and distributed transactions
> would be implemented in 7.4, but to prevent another protocol
> change in the future?

I'm now implementing 2PC replication and distributed transactions. My 2PC needs some support in the startup packet to establish a replication session or a recovery session.

BTW, my 2PC replication is working, and I'm implementing 2PC recovery now.

--
NAGAYASU Satoshi <snaga@snaga.org>
> I'm now implementing 2PC replication and distributed transactions. My 2PC
> needs some support in the startup packet to establish a replication session
> or a recovery session.
>
> BTW, my 2PC replication is working, and I'm implementing 2PC recovery now.

I would like to hear more about your implementation. Do you have some documentation that I could read? If not, perhaps (if you have the time) you could put together a post describing your work: is it an internal or external solution? Are you sending SQL or tuples in your update messages? How are you handling failure detection? Is this partial or full replication?

Please forgive me for asking so many questions, but I'm rather intrigued by database replication.

Darren
Darren Johnson <darren@up.hrcoxmail.com> wrote:
> I would like to hear more about your implementation. Do you have some
> documentation that I could read?

Documentation is not available, but I have some slides for my presentation:

http://snaga.org/pgsql/20021018_2pc.pdf

Some answers to your questions may be in these slides. And the current source code is available from:

http://snaga.org/pgsql/pgsql-20021025.tgz

> If not, perhaps (if you have the time) you could put together a post
> describing your work: is it an internal or external solution? Are you
> sending SQL or tuples in your update messages? How are you handling
> failure detection? Is this partial or full replication?

It is an internal solution. In 2PC, pre-commit and commit are required, so my implementation has some internal modifications to transaction handling, log recording and so on.

--
NAGAYASU Satoshi <snaga@snaga.org>
I don't see why 2PC would require any protocol-level change. I would think that the API would be something like

    BEGIN;
    issue some commands ...
    PRECOMMIT;
    -- if the above does not return an error, then
    COMMIT;

In other words, 2PC would require some new commands, but a new command doesn't affect the protocol layer.

			regards, tom lane
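With that command-level design, an external coordinator could drive two-phase commit across several databases through ordinary libpq calls; no new message types are needed, only the new commands. A rough sketch, assuming the PRECOMMIT command proposed above existed (the table touched in the work step is just an example):

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Run one SQL command and report whether it succeeded. */
    static int
    run(PGconn *conn, const char *sql)
    {
        PGresult   *res = PQexec(conn, sql);
        int         ok = (PQresultStatus(res) == PGRES_COMMAND_OK);

        if (!ok)
            fprintf(stderr, "%s failed: %s", sql, PQerrorMessage(conn));
        PQclear(res);
        return ok;
    }

    /* Two-phase commit across two ordinary connections, using the
     * hypothetical PRECOMMIT command from the message above. */
    static void
    two_phase_commit(PGconn *a, PGconn *b)
    {
        run(a, "BEGIN");
        run(b, "BEGIN");

        run(a, "UPDATE accounts SET balance = balance - 100 WHERE id = 1");
        run(b, "UPDATE accounts SET balance = balance + 100 WHERE id = 1");

        /* Phase 1: both sides must precommit successfully. */
        if (run(a, "PRECOMMIT") && run(b, "PRECOMMIT"))
        {
            /* Phase 2: commit everywhere. */
            run(a, "COMMIT");
            run(b, "COMMIT");
        }
        else
        {
            run(a, "ROLLBACK");
            run(b, "ROLLBACK");
        }
    }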
Questions have arisen during discussions about errors relating to how to support error codes without changing the FE/BE protocols. (see TODO.detail/error) Now that the protocol is up for revision, how about supporting sql state strings, error codes, and other information directly in the protocol. Regards, Grant Neil Conway wrote: > There has been some previous discussion of changing the FE/BE protocol > in 7.4, in order to fix several problems. I think this is worth doing: > if we can resolve all these issues in a single release, it will lessen > the upgrade difficulties for users. > > I'm aware of the following problems that need a protocol change to fix > them: > > (1) Add an optional textual message to NOTIFY > > (2) Remove the hard-coded limits on database and user names > (SM_USER, SM_DATABASE), replace them with variable-length > fields. > > (3) Remove some legacy elements in the startup packet > ('unused' can go -- perhaps 'tty' as well). I think the > 'length' field of the password packet is also not used, > but I'll need to double-check that. > > (4) Fix the COPY protocol (Tom?) > > (5) Fix the Fastpath protocol (Tom?) > > (6) Protocol-level support for prepared queries, in order to > bypass the parser (and maybe be more compatible with the > implementation of prepared queries in other databases). > > (7) Include the current transaction status, since it's > difficult for the client app to determine it for certain > (Tom/Bruce?) > > If I've missed anything or if there is something you think we should > add, please let me know. > > I can implement (1), (2), (3), and possibly (7), if someone can tell > me exactly what is required (my memory of the discussion relating to > this is fuzzy). The rest is up for grabs. > > Finally, how should we manage the transition? I wasn't around for the > earlier protocol changes, so I'd appreciate any input on steps we can > take to improve backward-compatibility. > > Cheers, > > Neil >
Grant Finnemore <grantf@guruhut.co.za> writes: > Now that the protocol is up for revision, how about supporting > sql state strings, error codes, and other information directly in > the protocol. Ah, thanks for pointing that out. Error codes would be another thing we can ideally support in 7.4, and we'd need a protocol change to do it properly, AFAICS. IIRC, Peter E. expressed some interest in doing this... Cheers, Neil -- Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC
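If error codes do go into the protocol, one plausible shape is an error message built from tagged fields (an SQLSTATE code, the human-readable message, perhaps detail and hint text) instead of a single free-form string, so old clients can simply skip fields they don't understand. The tags and the sample SQLSTATE below are purely illustrative, not a settled design:

    #include <stdio.h>

    /* Illustrative only: an error report carried as self-describing,
     * tagged fields rather than one free-form string.  The tag letters
     * are made up for this example. */
    typedef struct ErrorField
    {
        char        tag;        /* e.g. 'C' = SQLSTATE code, 'M' = message */
        const char *value;
    } ErrorField;

    static void
    print_error(const ErrorField *fields, int nfields)
    {
        int         i;

        for (i = 0; i < nfields; i++)
        {
            switch (fields[i].tag)
            {
                case 'C':
                    printf("SQLSTATE: %s\n", fields[i].value);
                    break;
                case 'M':
                    printf("message:  %s\n", fields[i].value);
                    break;
                default:        /* unknown field: a client can just skip it */
                    printf("(field '%c'): %s\n", fields[i].tag, fields[i].value);
            }
        }
    }

    int
    main(void)
    {
        ErrorField  err[] = {
            {'C', "23505"},     /* a unique-violation code, for example */
            {'M', "duplicate key violates unique constraint"}
        };

        print_error(err, 2);
        return 0;
    }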
Tom Lane wrote:
> I don't see why 2PC would require any protocol-level change. I would
> think that the API would be something like
>
>     BEGIN;
>     issue some commands ...
>     PRECOMMIT;
>     -- if the above does not return an error, then
>     COMMIT;
>
> In other words, 2PC would require some new commands, but a new command
> doesn't affect the protocol layer.

I think the precommit-vote-commit phase of 2PC can be implemented at the command level or at the protocol level.

In command-level 2PC, a user application (or application programmer) must know whether the DBMS is clustered or not (in order to use the PRECOMMIT command).

In protocol-level 2PC, no new SQL command is required. The precommit-vote-commit phase is invoked implicitly, which means a user application can be used without any modification. An application can keep using the traditional way (BEGIN...COMMIT).

So I made my decision to use a protocol-level implementation. It doesn't affect the SQL command layer.

--
NAGAYASU Satoshi <snaga@snaga.org>
On Mon, Nov 04, 2002 at 07:22:54PM -0500, Neil Conway wrote:
> (1) Add an optional textual message to NOTIFY
>
> (2) Remove the hard-coded limits on database and user names
>     (SM_USER, SM_DATABASE), replace them with variable-length
>     fields.
>
> (3) Remove some legacy elements in the startup packet
>     ('unused' can go -- perhaps 'tty' as well). I think the
>     'length' field of the password packet is also not used,
>     but I'll need to double-check that.
>
> (4) Fix the COPY protocol (Tom?)
>
> (5) Fix the Fastpath protocol (Tom?)
>
> (6) Protocol-level support for prepared queries, in order to
>     bypass the parser (and maybe be more compatible with the
>     implementation of prepared queries in other databases).
>
> (7) Include the current transaction status, since it's
>     difficult for the client app to determine it for certain
>     (Tom/Bruce?)

(8) Error codes (maybe needn't change the protocol) -- without these, PostgreSQL is useless in real DB applications

(9) Think about fully dynamic charset encoding (add new encodings on the fly)

    Karel

--
Karel Zak <zakkr@zf.jcu.cz>  http://home.zf.jcu.cz/~zakkr/
C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz
On 11/05/2002 04:42:55 AM Neil Conway wrote:
> Mike Mascari <mascarm@mascari.com> writes:
> > Is there any thought about changing the protocol to support
> > two-phase commit? Not that 2PC and distributed transactions would be
> > implemented in 7.4, but to prevent another protocol change in the
> > future?
>
> My understanding is that 2PC is one way to implement multi-master
> replication. If that's what you're referring to, then I'm not sure I

Another use of two-phase commit is in messaging middleware (MOM, message oriented middleware), where both the middleware and the database participate in the same transaction. Consider:

- DB: begin
- MOM: begin
- DB: insert
- MOM: send message
- DB: prepare
- MOM: prepare ==> fails
- DB: rollback
- MOM: rollback

just a simple example...

Maarten
Satoshi Nagayasu wrote on Tue, 05.11.2002 at 08:05:
> Tom Lane wrote:
> > I don't see why 2PC would require any protocol-level change. I would
> > think that the API would be something like
> >
> >     BEGIN;
> >     issue some commands ...
> >     PRECOMMIT;
> >     -- if the above does not return an error, then
> >     COMMIT;
> >
> > In other words, 2PC would require some new commands, but a new command
> > doesn't affect the protocol layer.
>
> I think the precommit-vote-commit phase of 2PC can be implemented at
> the command level or at the protocol level.
>
> In command-level 2PC, a user application (or application programmer)
> must know whether the DBMS is clustered or not (to use the PRECOMMIT command).
>
> In protocol-level 2PC, no new SQL command is required. The
> precommit-vote-commit phase is invoked implicitly, which means a user
> application can be used without any modification. An application can
> keep using the traditional way (BEGIN...COMMIT).

If the application continues to use just BEGIN/COMMIT, then the protocol level must parse the command stream and recognize COMMIT in order to replace it with PRECOMMIT, COMMIT.

If the communication library has to do that anyway, it could still do the replacement without affecting the wire protocol, no?

------------------
Hannu
Hannu Krosing <hannu@tm.ee> wrote:
> > I think the precommit-vote-commit phase of 2PC can be implemented at
> > the command level or at the protocol level.
> >
> > In command-level 2PC, a user application (or application programmer)
> > must know whether the DBMS is clustered or not (to use the PRECOMMIT command).
> >
> > In protocol-level 2PC, no new SQL command is required. The
> > precommit-vote-commit phase is invoked implicitly, which means a user
> > application can be used without any modification. An application can
> > keep using the traditional way (BEGIN...COMMIT).
>
> If the application continues to use just BEGIN/COMMIT, then the protocol
> level must parse the command stream and recognize COMMIT in order to replace
> it with PRECOMMIT, COMMIT.
>
> If the communication library has to do that anyway, it could still do
> the replacement without affecting the wire protocol, no?

In my implementation, 'the extended (2PC) FE/BE protocol' is used only in the communication between the master and slave server(s), not between a client app and the master server.

    libpq <--Normal FE/BE--> (master)postgres <--Extended(2PC)FE/BE--> (slave)postgres
                                              <--Extended(2PC)FE/BE--> (slave)postgres
                                              <--Extended(2PC)FE/BE--> (slave)postgres

A client application and the client's libpq can continue to work without any modification. This is very important. And the protocol modification between the master and slave server(s) is not such a serious issue (I think).

--
NAGAYASU Satoshi <snaga@snaga.org>
On Tue, Nov 05, 2002 at 08:54:46PM +0900, Satoshi Nagayasu wrote:
> Hannu Krosing <hannu@tm.ee> wrote:
> > > In protocol-level 2PC, no new SQL command is required. The
> > > precommit-vote-commit phase is invoked implicitly, which means a user
> > > application can be used without any modification. An application can
> > > keep using the traditional way (BEGIN...COMMIT).
> >
> > If the application continues to use just BEGIN/COMMIT, then the protocol
> > level must parse the command stream and recognize COMMIT in order to replace
> > it with PRECOMMIT, COMMIT.
> >
> > If the communication library has to do that anyway, it could still do
> > the replacement without affecting the wire protocol, no?

No, I think Satoshi is suggesting that from the client's point of view, he's embedded the entire precommit-vote-commit cycle inside the COMMIT command.

> In my implementation, 'the extended (2PC) FE/BE protocol' is used only in
> the communication between the master and slave server(s), not between a
> client app and the master server.
>
>     libpq <--Normal FE/BE--> (master)postgres <--Extended(2PC)FE/BE--> (slave)postgres
>                                               <--Extended(2PC)FE/BE--> (slave)postgres
>                                               <--Extended(2PC)FE/BE--> (slave)postgres
>
> A client application and the client's libpq can continue to work without
> any modification. This is very important. And the protocol modification
> between the master and slave server(s) is not such a serious issue (I think).

Ah, but this limits your use of 2PC to transparent DB replication - since the client doesn't have access to the PRECOMMIT phase (usually called the prepare phase, but that's another overloaded term in the DB world!) it _can't_ serve as the transaction master, so the other use cases that people have mentioned here (Zope, MOMs, etc.) wouldn't be possible.

Hmm, unless a connection can be switched into 2PC mode, so something other than a postgresql server can act as the transaction master.

Does your implementation cascade? Can slaves have slaves?

Ross
"Ross J. Reedstrom" <reedstrm@rice.edu> wrote: > > > If application continues to use just BEGIN/COMMIT, then the protocol > > > level must parse command stream and recognize COMMIT in order to replace > > > it with PRECOMMIT, COMMIT. > > > > > > If the communication library has to do that anyway, it could still do > > > the replacement without affecting wire protocol, no ? > > No, I think Satoshi is suggesting that from the client's point of view, > he's embedded the entire precommit-vote-commit cycle inside the COMMIT > command. Exactly. When user send the COMMIT command to the master server, the master.talks to the slaves to process precommit-vote-commit using the 2PC. The 2PC cycle is hidden from user application. User application just talks the normal FE/BE protocol. > > > In my implementation, 'the extended(2PC) FE/BE protocol' is used only in > > the communication between the master and slave server(s), not between a > > client app and the master server. > > > > libpq <--Normal FE/BE--> (master)postgres <--Extended(2PC)FE/BE--> (slave)postgres > > <--Extended(2PC)FE/BE--> (slave)postgres > > <--Extended(2PC)FE/BE--> (slave)postgres > > > > A client application and client's libpq can work continuously without > > any modification. This is very important. And protocol modification > > between master and slave server(s) is not so serious issue (I think). > > > > Ah, but this limits your use of 2PC to transparent DB replication - since > the client doesn't have access to the PRECOMMIT phase (usually called > prepare phase, but that's anothor overloaded term in the DB world!) it > _can't_ serve as the transaction master, so the other use cases that > people have mentioned here (zope, MOMs, etc.) wouldn't be possible. > > Hmm, unless a connection can be switched into 2PC mode, so something > other than a postgresql server can act as the transaction master. I think the client should not act as the transaction master. But if it is needed, the client can talk to postgres servers with the extended 2PC FE/BE protocol. Because all postgres servers(master and slave) can understand both the normal FE/BE protocol and the extended 2PC FE/BE protocol. It is switched in the startup packet. See 10 page. http://snaga.org/pgsql/20021018_2pc.pdf I embeded 'the connection type' in the startup packet to switch postgres backend's behavior (normal FE/BE protocol or 2PC FE/BE protocol). In current implementation, if the connection type is 'R', it is handled as the 2PC FE/BE connection (replication connection). > Does your implementation cascade? Can slaves have slaves? It is not implemented, but I hope so. :-) And I think it is not so difficult. -- NAGAYASU Satoshi <snaga@snaga.org>
Satoshi Nagayasu wrote on Wed, 06.11.2002 at 04:15:
> "Ross J. Reedstrom" <reedstrm@rice.edu> wrote:
> > > > If the application continues to use just BEGIN/COMMIT, then the protocol
> > > > level must parse the command stream and recognize COMMIT in order to replace
> > > > it with PRECOMMIT, COMMIT.
> > > >
> > > > If the communication library has to do that anyway, it could still do
> > > > the replacement without affecting the wire protocol, no?
> >
> > No, I think Satoshi is suggesting that from the client's point of view,
> > he's embedded the entire precommit-vote-commit cycle inside the COMMIT
> > command.
>
> Exactly. When the user sends the COMMIT command to the master server, the
> master talks to the slaves to process precommit-vote-commit using the
> 2PC. The 2PC cycle is hidden from the user application. The user application
> just talks the normal FE/BE protocol.

But _can_ a client (libpq/jdbc/...) also talk the 2PC FE/BE protocol, i.e. act as "master"?

> > > In my implementation, 'the extended (2PC) FE/BE protocol' is used only in
> > > the communication between the master and slave server(s), not between a
> > > client app and the master server.
> > >
> > >     libpq <--Normal FE/BE--> (master)postgres <--Extended(2PC)FE/BE--> (slave)postgres
> > >                                               <--Extended(2PC)FE/BE--> (slave)postgres
> > >                                               <--Extended(2PC)FE/BE--> (slave)postgres
> > >
> > > A client application and the client's libpq can continue to work without
> > > any modification. This is very important. And the protocol modification
> > > between the master and slave server(s) is not such a serious issue (I think).
> >
> > Ah, but this limits your use of 2PC to transparent DB replication - since
> > the client doesn't have access to the PRECOMMIT phase (usually called
> > the prepare phase, but that's another overloaded term in the DB world!) it
> > _can't_ serve as the transaction master, so the other use cases that
> > people have mentioned here (Zope, MOMs, etc.) wouldn't be possible.
> >
> > Hmm, unless a connection can be switched into 2PC mode, so something
> > other than a postgresql server can act as the transaction master.
>
> I think the client should not act as the transaction master. But if it
> is needed, the client can talk to the postgres servers with the extended 2PC
> FE/BE protocol.
>
> Because all postgres servers (master and slave) can understand both the
> normal FE/BE protocol and the extended 2PC FE/BE protocol. It is
> switched in the startup packet.

Why is the protocol change necessary? Is there some fundamental reason that the slave backends can't just wait and see if the first "commit" command is PRECOMMIT or COMMIT and then act accordingly for each transaction?

-----------------
Hannu
Neil Conway wrote:
> (6) Protocol-level support for prepared queries, in order to
>     bypass the parser (and maybe be more compatible with the
>     implementation of prepared queries in other databases).

Let me add

(6b) Protocol-level support for query parameters. This would actually make (6) more powerful and speed up non-prepared (but similar) queries via the query cache (which is already there IIRC). [I talk about <statement> USING :var ... ]

(n) Platform-independent binary representation of parameters and results (like in CORBA). This can _really_ speed up communication with compiled programs if you take the time to implement it. This was previously planned for a future CORBA fe/be protocol, but that does not seem to be coming any time soon.

(n+1) Optional additional result qualifiers. E.g. dynamic embedded SQL has a flag to indicate that a column is a key. Previously it was impossible to set this flag to a meaningful value. Also, the standard has additional statistical information about the size of the column etc. If it's unclear what I'm talking about, I will look up the exact location in the standard (it's embedded SQL, dynamic SQL, GET DESCRIPTOR).

Yours
   Christof
Christof Petig wrote:
> Neil Conway wrote:
> >> (6) Protocol-level support for prepared queries, in order to
> >>     bypass the parser (and maybe be more compatible with the
> >>     implementation of prepared queries in other databases).
>
> Let me add
>
> (6b) Protocol-level support for query parameters. This would actually
>      make (6) more powerful and speed up non-prepared (but similar)
>      queries via the query cache (which is already there IIRC).
>      [I talk about <statement> USING :var ... ]
>
> (n) Platform-independent binary representation of parameters and
>     results (like in CORBA). This can _really_ speed up
>     communication with compiled programs if you take the time to
>     implement it. This was previously planned for a future
>     CORBA fe/be protocol, but that does not seem to be coming any time
>     soon.

After one night's sleep I think that perhaps a CORBA-based protocol might be less work (but I have no idea about a decent authentication schema; I'd tend to reuse the already authenticated stream). A corbaized query-only interface might easily cover these issues and be less work than full CORBA backend access. JDBC (I don't know much about it) might give a reasonable interface design (perhaps combined with a libpq[++|xx]-like interface if there's benefit to it).

> (n+1) Optional additional result qualifiers. E.g. dynamic embedded
>       SQL has a flag to indicate that a column is a key. Previously it was
>       impossible to set this flag to a meaningful value. Also,
>       the standard has additional statistical information about the
>       size of the column etc. If it's unclear what I'm talking about,
>       I will look up the exact location in the standard (it's embedded
>       SQL, dynamic SQL, GET DESCRIPTOR).

This does not need an implementation soon. But the new protocol should allow future things like this.

All these proposals are motivated by (future) ecpg [C/C++] needs. So IMHO the ODBC, JDBC, libpqxx people might be interested in many of these issues, too. We definitely should make sure to have asked them.

Yours
   Christof
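For a sense of what protocol-level parameter passing would give the client libraries, here is the kind of call it could enable. The function name and signature are invented for this example -- nothing like it existed in libpq when this was written -- but the idea is that parameter values (and, per item (n), optionally binary results) travel outside the query text, so the backend never has to re-parse quoted literals and can reuse a cached plan for the same statement shape.

    #include <libpq-fe.h>

    /* Hypothetical libpq entry point for protocol-level parameters.
     * The query text carries placeholders; the values are shipped
     * separately, so no quoting/escaping is needed and the statement
     * text stays constant across calls. */
    extern PGresult *PQexecWithParams(PGconn *conn,
                                      const char *query,
                                      int nparams,
                                      const char *const *values,
                                      int resultsInBinary);

    /* Example use of the hypothetical call above. */
    static PGresult *
    lookup_account(PGconn *conn, const char *account_id)
    {
        const char *params[1];

        params[0] = account_id;
        return PQexecWithParams(conn,
                                "SELECT balance FROM accounts WHERE id = $1",
                                1, params, 0 /* text results */);
    }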
Hannu Krosing <hannu@tm.ee> wrote:
> > Exactly. When the user sends the COMMIT command to the master server, the
> > master talks to the slaves to process precommit-vote-commit using the
> > 2PC. The 2PC cycle is hidden from the user application. The user application
> > just talks the normal FE/BE protocol.
>
> But _can_ a client (libpq/jdbc/...) also talk the 2PC FE/BE protocol, i.e. act
> as "master"?

Not for now. The current libpq/jdbc can talk only the normal FE/BE protocol. But it can be implemented.

Because my (experimental) libpq can talk the 2PC FE/BE protocol. :-)

> > I think the client should not act as the transaction master. But if it
> > is needed, the client can talk to the postgres servers with the extended 2PC
> > FE/BE protocol.
> >
> > Because all postgres servers (master and slave) can understand both the
> > normal FE/BE protocol and the extended 2PC FE/BE protocol. It is
> > switched in the startup packet.
>
> Why is the protocol change necessary?

Because the postgres backend must detect the type of incoming connection (from the client app or the master).

If it is coming from the client, the backend relays the queries to the slaves (acts as the master).

But if it is coming from the master server, the backend must act as a slave, and does not relay the queries.

How does the backend detect this in multi-master replication? Detecting it inside the startup packet is a simple way.

My implementation is working without a protocol modification, because the session type information is embedded in the 'unused' field now. So the backend can understand both the normal FE/BE protocol and the extended 2PC FE/BE protocol. But if the unused field is removed in 7.4, my replication will not work.

I think there are several types of connection in the sync replication or the distributed transaction. Especially, the bulk transfer of tables or indexes will be necessary for the distributed query in future.

So, I think embedding the connection type information in the startup packet is a good idea.

> Is there some fundamental reason that the slave backends can't just wait
> and see if the first "commit" command is PRECOMMIT or COMMIT and then
> act accordingly for each transaction?

Are two "commit" commands required on a clustered postgres? And is one "commit" command required on a single postgres? I think it will confuse the application programmer.

--
NAGAYASU Satoshi <snaga@snaga.org>
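A rough sketch of the dispatch Satoshi describes follows. The 'R' tag for a replication connection comes from his description and slides; the recovery tag and everything else here are invented for illustration, since the whole question is where such a type byte should live once the 'unused' field is gone.

    /* Sketch only: how a backend might branch on a connection type
     * carried in the startup packet.  In the current patch the type
     * letter rides in the packet's 'unused' field, which is why removing
     * that field in 7.4 would break it unless the new packet format
     * reserves an explicit slot. */
    typedef enum
    {
        CONN_NORMAL,        /* ordinary client, standard FE/BE protocol */
        CONN_REPLICATION,   /* master-to-slave session, extended 2PC protocol */
        CONN_RECOVERY       /* slave catching up after a failure */
    } ConnectionType;

    static ConnectionType
    classify_connection(char type_byte)
    {
        switch (type_byte)
        {
            case 'R':           /* per the description above */
                return CONN_REPLICATION;
            case 'C':           /* hypothetical tag for a recovery session */
                return CONN_RECOVERY;
            default:
                return CONN_NORMAL;
        }
    }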
"Ross J. Reedstrom" <reedstrm@rice.edu> wrote: > > Because the postgres backend must detect a type of incomming connection > > (from the client app or the master). > > > > If it is comming from the client, the backend relays the queries to the > > slaves (act as the master). > > > > But if it is comming from the master server, the backend must act as a > > slave, and does not relay the queries. > > So, your replication is an all-or-nothing type of thing? you can't > replicate some tables and not others? If only some tables are replicated, > then you can't decide if this is a distributed transaction until it's > been parsed. Yes. My current replication implementation is 'query based' replication, so all queries to the database (except SELECT command) are replicated. The database will be completely replicated, not partial. I know this 'query based' design can't be used for a distributed transaction. I think more internal communication between distributed servers is required. We need 'the partial transfer of tables', 'the bulk transfer of the index' or something like that for a distributed transaction. I'm working for it now. As I said, several connection types, a client application connection, an internal transfer connection or a recovery connection, will be required on replication and distributed transaction in near future. Embedding connection types in the startup packet is a good way to decide how the backend should behave. It is simple and extendable, isn't it? If the backend can't understand the incoming connection type, the backend will answer "I can't understand." and need only disconnect it. > > Also, if we want to cascade, then one server can be both master and slave, > as it were. For full-on-2PC, I'm not sure cascading is a good idea, but > it's something to consider, especially if there's provisions for partial > replication, or 'optional' slaves. Yes. There are several implementation designs for replication. Sync or async, pre- or post-, full or partial, query-level or I/O-level or journal-level. I think there is no "best way" for replication, because applications have different requirements in different situations. So the protocol should be more extendable. > I think Hannu is suggesting that COMMIT could occur from either of two > states in the transaction state diagram: from an open transaction, or > from PRECOMMIT. There's no need to determine before that moment if > this particular transaction is part of a 2PC or not, is there? So, no > you don't _require_ PRECOMMIT/COMMIT because it's clustered: if a > 'bare' COMMIT shows up, do what you currently do: hide the details. > If a PRECOMMIT shows up, report status back to the 'client'. After status is returned, what does the 'client' do? Should the client talk the 2PC protocol? For example, if the database is replicated in 8 servers, does the client application keep 8 connections for each server? Is this good? -- NAGAYASU Satoshi <snaga@snaga.org>
Hi all,

Mike Mascari <mascarm@mascari.com> wrote:
> Is there any thought about changing the protocol to support
> two-phase commit? Not that 2PC and distributed transactions
> would be implemented in 7.4, but to prevent another protocol
> change in the future?

I'm now implementing 2PC replication and distributed transactions. My 2PC needs some support in the startup packet to establish a replication session and a recovery session.

BTW, 2PC replication is working, and I'm implementing 2PC recovery now.

--
NAGAYASU Satoshi <snaga@snaga.org>
On Wed, Nov 06, 2002 at 05:02:14PM +0900, Satoshi Nagayasu wrote:
> Hannu Krosing <hannu@tm.ee> wrote:
> > > Exactly. When the user sends the COMMIT command to the master server, the
> > > master talks to the slaves to process precommit-vote-commit using the
> > > 2PC. The 2PC cycle is hidden from the user application. The user application
> > > just talks the normal FE/BE protocol.
> >
> > But _can_ a client (libpq/jdbc/...) also talk the 2PC FE/BE protocol, i.e. act
> > as "master"?
>
> Not for now. The current libpq/jdbc can talk only the normal FE/BE protocol.
> But it can be implemented.
>
> Because my (experimental) libpq can talk the 2PC FE/BE protocol. :-)

<snip>

> Because the postgres backend must detect the type of incoming connection
> (from the client app or the master).
>
> If it is coming from the client, the backend relays the queries to the
> slaves (acts as the master).
>
> But if it is coming from the master server, the backend must act as a
> slave, and does not relay the queries.

So, your replication is an all-or-nothing type of thing? You can't replicate some tables and not others? If only some tables are replicated, then you can't decide if this is a distributed transaction until it's been parsed.

Also, if we want to cascade, then one server can be both master and slave, as it were. For full-on 2PC, I'm not sure cascading is a good idea, but it's something to consider, especially if there are provisions for partial replication, or 'optional' slaves.

> I think there are several types of connection in the sync replication or
> the distributed transaction. Especially, the bulk transfer of tables or
> indexes will be necessary for the distributed query in future.
>
> So, I think embedding the connection type information in the startup
> packet is a good idea.
>
> > Is there some fundamental reason that the slave backends can't just wait
> > and see if the first "commit" command is PRECOMMIT or COMMIT and then
> > act accordingly for each transaction?
>
> Are two "commit" commands required on a clustered postgres?
> And is one "commit" command required on a single postgres?

I think Hannu is suggesting that COMMIT could occur from either of two states in the transaction state diagram: from an open transaction, or from PRECOMMIT. There's no need to determine before that moment if this particular transaction is part of a 2PC or not, is there? So, no, you don't _require_ PRECOMMIT/COMMIT because it's clustered: if a 'bare' COMMIT shows up, do what you currently do: hide the details. If a PRECOMMIT shows up, report status back to the 'client'.

So, it seems to me that the minimum protocol change necessary to support this model is reporting the current transaction status to the client.

> I think it will confuse the application programmer.

I think your mental image of an application programmer needs to be expanded: it should also include middleware vendors, who very much want to be able to control a distributed transaction, one part of which may be a postgresql replicated cluster.

Ross
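Ross's point about the state diagram can be drawn as a tiny sketch: PRECOMMIT adds one state, and a bare COMMIT stays legal from an open transaction, so non-2PC clients are unaffected. The state names here are invented for illustration.

    /* Sketch of the transaction states described above. */
    typedef enum
    {
        TRANS_IDLE,             /* no transaction in progress */
        TRANS_OPEN,             /* BEGIN issued, commands running */
        TRANS_PRECOMMITTED      /* PRECOMMIT succeeded, awaiting COMMIT/ROLLBACK */
    } TransState;

    static int
    commit_allowed(TransState s)
    {
        /* COMMIT is valid from an open transaction (the ordinary case)
         * or from the precommitted state (the 2PC case). */
        return s == TRANS_OPEN || s == TRANS_PRECOMMITTED;
    }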
Does a patch exist for 7.4devel?

regards
Haris Peco

On Tuesday 05 November 2002 01:14 am, Satoshi Nagayasu wrote:
> Hi all,
>
> Mike Mascari <mascarm@mascari.com> wrote:
> > Is there any thought about changing the protocol to support
> > two-phase commit? Not that 2PC and distributed transactions
> > would be implemented in 7.4, but to prevent another protocol
> > change in the future?
>
> I'm now implementing 2PC replication and distributed transactions. My 2PC
> needs some support in the startup packet to establish a replication session
> and a recovery session.
>
> BTW, 2PC replication is working, and I'm implementing 2PC recovery now.
> There has been some previous discussion of changing the FE/BE protocol
> in 7.4, in order to fix several problems. I think this is worth doing:
> if we can resolve all these issues in a single release, it will lessen
> the upgrade difficulties for users.

Here are a couple of other changes you might consider (maybe these changes already exist and I just don't know about them):

a) Make much of the metadata sent to the client optional. When I execute 20 fetches against the same cursor, I don't need the same metadata 20 times. For narrow result sets, the metadata can easily double or triple the number of bytes sent across the net. It looks like the protocol needs the field count, but everything else seems to be sent for the convenience of the client application.

b) Send a decoded version of atttypmod - specifically, decode the precision and scale for numeric types.
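For a sense of the overhead in (a): the server sends per-column metadata with every result set, including each FETCH from an already-described cursor. The field list below is quoted from memory of the v2 RowDescription message and should be verified against the protocol documentation; it is only meant to show the fixed per-column cost.

    #include <stdint.h>

    /* Rough shape of the per-column metadata in a RowDescription ('T')
     * message (from memory; verify before relying on it).  Twenty
     * FETCHes against the same cursor repeat this for every column,
     * twenty times. */
    typedef struct FieldDescription
    {
        const char *name;       /* null-terminated column name, variable length */
        int32_t     typeOid;    /* data type OID, 4 bytes on the wire */
        int16_t     typeSize;   /* pg_type.typlen, 2 bytes on the wire */
        int32_t     typeMod;    /* atttypmod, 4 bytes on the wire */
    } FieldDescription;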
> b) Send a decoded version of atttypmod - specifically, decode the
> precision and scale for numeric types.

I want to decode type, length, precision and scale.

regards
Haris Peco
> > b) Send a decoded version of atttypmod - specifically, decode the
> > precision and scale for numeric types.
>
> I want to decode type, length, precision and scale.

Type is returned by PQftype(), length is returned by PQfsize(). Precision and scale are encoded in the return value from PQfmod() and you have to have a magic decoder ring to understand them. (Magic decoder rings are available, you just have to read the source code :-)

PQftype() is not easy to use because it returns an OID instead of a name (or a standardized symbol), but I can't think of anything better to return to the client. Of course if you really want to make use of PQftype(), you can preload a client-side cache of type definitions. I seem to remember seeing a patch a while back that would build the cache and decode precision and scale too.

PQfsize() is entertaining, but not often what you really want (you really want the width of the widest value in the column after conversion to some string format - it seems reasonable to let the client application worry about that, although maybe that would be a useful client-side libpq function).
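For anyone who wants the decoder ring spelled out: for numeric columns the precision and scale are packed into the typmod, offset by the 4-byte varlena header size. The decoding below matches my reading of the backend source for NUMERIC -- treat the exact constants as something to verify, and note that other types encode their typmod differently, so check PQftype() first.

    #include <libpq-fe.h>

    #define NUMERIC_TYPMOD_OFFSET 4     /* typmod for numeric is stored offset by VARHDRSZ */

    /* Decode precision and scale for a numeric column from PQfmod().
     * Assumes the column really is of type numeric. */
    static void
    numeric_precision_scale(const PGresult *res, int col,
                            int *precision, int *scale)
    {
        int         typmod = PQfmod(res, col);

        if (typmod < NUMERIC_TYPMOD_OFFSET)
        {
            *precision = *scale = -1;   /* unconstrained numeric */
            return;
        }
        typmod -= NUMERIC_TYPMOD_OFFSET;
        *precision = (typmod >> 16) & 0xffff;
        *scale = typmod & 0xffff;
    }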
On Thursday 07 November 2002 09:50 pm, korry wrote: > > > b) Send a decoded version of atttypmod - specifically, decode the > > > precision and scale for numeric types. > > > >I want decode type,length,precision and scale > > Type is returned by PQftype(), length is returned by PQfsize(). Precision > and scale are encoded in the return value from PQfmod() and you have to > have a magic decoder ring to understand them. (Magic decoder rings are > available, you just have to read the source code :-) > > PQftype() is not easy to use because it returns an OID instead of a name > (or a standardized symbol), but I can't think of anything better to return > to the client. Of course if you really want to make use of PQftype(), you > can preload a client-side cache of type definitions. I seem to remember > seeing a patch a while back that would build the cache and decode precision > and scale too. > > PQfsize() is entertaining, but not often what you really want (you really > want the width of the widest value in the column after conversion to some > string format - it seems reasonable to let the client applicatin worry > about that, although maybe that would be a useful client-side libpq > function). > > I want this in any catalog view regards Haris Peco
> Does a patch exist for 7.4devel?

The full tarball is based on 7.3devel; there is no patch for 7.4devel.

http://snaga.org/pgsql/

--
NAGAYASU Satoshi <snaga@snaga.org>
snpe wrote on Fri, 08.11.2002 at 03:49:
> > PQfsize() is entertaining, but not often what you really want (you really
> > want the width of the widest value in the column after conversion to some
> > string format - it seems reasonable to let the client application worry
> > about that, although maybe that would be a useful client-side libpq
> > function).
>
> I want this in any catalog view

But this will make such a view terribly slow, as it has to do max(length(field)) over the whole table for any field displayed.

------------
Hannu
On Friday 08 November 2002 09:29 am, Hannu Krosing wrote:
> snpe wrote on Fri, 08.11.2002 at 03:49:
> > > PQfsize() is entertaining, but not often what you really want (you
> > > really want the width of the widest value in the column after
> > > conversion to some string format - it seems reasonable to let the
> > > client application worry about that, although maybe that would be a
> > > useful client-side libpq function).
> >
> > I want this in any catalog view
>
> But this will make such a view terribly slow, as it has to do
> max(length(field)) over the whole table for any field displayed

Why not with these functions?
I have added the following TODO item on protocol changes:

> * Wire Protocol Changes
>   o Show transaction status in psql
>   o Allow binding of query parameters, support for prepared queries
>   o Add optional textual message to NOTIFY
>   o Remove hard-coded limits on user/db/password names
>   o Remove unused elements of startup packet (unused, tty, passlength)
>   o Fix COPY/fastpath protocol?
>   o Replication support?
>   o Error codes
>   o Dynamic character set handling
>   o Special passing of binary values in platform-neutral format (bytea?)
>   o ecpg improvements?
>   o Add decoded type, length, precision

---------------------------------------------------------------------------

snpe wrote:
> On Thursday 07 November 2002 09:50 pm, korry wrote:
> > > > b) Send a decoded version of atttypmod - specifically, decode the
> > > > precision and scale for numeric types.
> > >
> > > I want to decode type, length, precision and scale
> >
> > Type is returned by PQftype(), length is returned by PQfsize(). Precision
> > and scale are encoded in the return value from PQfmod() and you have to
> > have a magic decoder ring to understand them. (Magic decoder rings are
> > available, you just have to read the source code :-)
> >
> > PQftype() is not easy to use because it returns an OID instead of a name
> > (or a standardized symbol), but I can't think of anything better to return
> > to the client. Of course if you really want to make use of PQftype(), you
> > can preload a client-side cache of type definitions. I seem to remember
> > seeing a patch a while back that would build the cache and decode precision
> > and scale too.
> >
> > PQfsize() is entertaining, but not often what you really want (you really
> > want the width of the widest value in the column after conversion to some
> > string format - it seems reasonable to let the client application worry
> > about that, although maybe that would be a useful client-side libpq
> > function).
>
> I want this in any catalog view
>
> regards
> Haris Peco

--
Bruce Momjian                         | http://candle.pha.pa.us
pgman@candle.pha.pa.us                | (610) 359-1001
+  If your life is a hard drive,      |  13 Roberts Road
+  Christ can be your backup.         |  Newtown Square, Pennsylvania 19073