Thread: pg_sequence catalog
While I was hacking around sequence stuff, I felt the urge to look into an old peeve: That sequence metadata is not stored in a proper catalog. Right now in order to find out some metadata about sequences (min, max, increment, etc.), you need to look into the sequence. That is like having to query a table in order to find out about its schema.

There are also known issues with the current storage such as that we can't safely update the sequence name stored in the sequence when we rename, so we just don't.

This patch introduces a new catalog pg_sequence that stores the sequence metadata. The sequences themselves now only store the counter and supporting information.

I don't know if this is a net improvement. Maybe this introduces as many new issues as it removes. But I figured I'll post this, so that at least we can discuss it.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
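To make the difference concrete, a rough sketch of the two ways of reading sequence metadata; the pg_sequence column names follow the posted patch and are only illustrative, so they may differ from what eventually gets committed:

CREATE SEQUENCE myseq INCREMENT BY 5 MINVALUE 1 MAXVALUE 1000;

-- Today: the metadata lives inside the sequence relation itself
SELECT min_value, max_value, increment_by FROM myseq;

-- With the patch: the metadata comes from a real catalog, and the sequence
-- relation keeps only the counter and supporting state (column names here
-- are assumptions based on the posted patch)
SELECT seqmin, seqmax, seqincrement
  FROM pg_sequence
 WHERE seqrelid = 'myseq'::regclass;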
On 31 August 2016 at 21:17, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> While I was hacking around sequence stuff, I felt the urge to look into an old peeve: That sequence metadata is not stored in a proper catalog. Right now in order to find out some metadata about sequences (min, max, increment, etc.), you need to look into the sequence. That is like having to query a table in order to find out about its schema.
>
> There are also known issues with the current storage such as that we can't safely update the sequence name stored in the sequence when we rename, so we just don't.

... and don't have a comment warning the poor confused reader about that, either. As I discovered when doing my first pass at sequence decoding.

> I don't know if this is a net improvement. Maybe this introduces as many new issues as it removes. But I figured I'll post this, so that at least we can discuss it.

This will change behaviour subtly. Probably not in ways we care much about, but I'd rather that be an informed decision than an implicit "oops, we didn't think about that" one.

Things stored in the Form_pg_sequence are affected immediately and non-transactionally by ALTER SEQUENCE. If you:

BEGIN;
ALTER SEQUENCE myseq INCREMENT BY 10;
ROLLBACK;

then the *next call to nextval* after the ALTER will step by 10. Well, roughly; there's some slush there due to caching etc. Rolling back has no effect.

Yet other effects of ALTER SEQUENCE, notably RENAME, are transactional and are subject to normal locking, visibility and rollback rules.

Even more fun, ALTER SEQUENCE ... RESTART is immediate and non-transactional, but TRUNCATE ... RESTART IDENTITY *is* transactional and takes effect only at commit. ALTER SEQUENCE writes a new Form_pg_sequence with the new value to the existing relfilenode. TRUNCATE instead updates pg_class for the sequence with a new relfilenode and writes its changes to the new relfilenode. So even two operations that seem the same are very different.

If I understand the proposed change correctly, it will make previously non-transactional ALTER SEQUENCE operations transactional and subject to normal rules, since the relevant information is now in a proper catalog. Personally I think that's a big improvement. The current situation is warts upon warts.

Prior proposals to move sequences away from a one-relation-and-one-filenode-per-sequence model have fallen down in early planning stages. This seems like a simpler, more sensible step to take, and it won't block later cleanups of how we store sequences if we decide to go there.

It'll also make it slightly easier to handle logical decoding of sequence advances, since it'll be possible to send *just* the new sequence value. Right now we can't tell if the increment, etc., also got changed, and have to send most of the Form_pg_sequence on the wire for every sequence advance, which is a little sucky.

This change won't solve the problem I outlined in the other thread, though: that sequences are transactional sometimes and not other times.

So +1 for the idea from me. It'll just need relnotes warning of the subtle behaviour change.

--
Craig Ringer                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer <craig@2ndquadrant.com> writes:
> On 31 August 2016 at 21:17, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>> I don't know if this is a net improvement. Maybe this introduces as many new issues as it removes. But I figured I'll post this, so that at least we can discuss it.

> This will change behaviour subtly.

Uh, not as subtly as all that, because "select * from sequence" will now return a different set of columns, which will flat out break a lot of clients that use that method to get sequence properties.

In previous discussions of this idea, we had speculated about turning sequences into, effectively, views, so that we could still return all the same columns --- only now some of them would be coming from a catalog rather than the non-transactional per-sequence storage. I'm not sure how feasible that is, but it's worth thinking about.

Personally, my big beef with the current approach to sequences is that we eat a whole relation (including a whole relfilenode) per sequence. I wish that we could reduce a sequence to just a single row in a catalog, including the nontransactional state. Not sure how feasible that is either, but accomplishing it would move the benefits of making a change out of the "debatable whether it's worth it" category, IMO.

        regards, tom lane
I wrote:
> Personally, my big beef with the current approach to sequences is that we eat a whole relation (including a whole relfilenode) per sequence. I wish that we could reduce a sequence to just a single row in a catalog, including the nontransactional state. Not sure how feasible that is either, but accomplishing it would move the benefits of making a change out of the "debatable whether it's worth it" category, IMO.

BTW, another thing to keep in mind here is the ideas that have been kicked around in the past about alternative sequence implementations managed through a "sequence AM API". I dunno whether now is the time to start creating that API abstraction, but let's at least consider it if we're whacking the catalog representation around.

        regards, tom lane
On 31 August 2016 at 22:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Personally, my big beef with the current approach to sequences is that we eat a whole relation (including a whole relfilenode) per sequence. I wish that we could reduce a sequence to just a single row in a catalog, including the nontransactional state.

I'd be happy to see incremental improvement in this space as Peter has suggested, though I certainly see the value of something like seqam too.

It sounds like you're thinking of something like a normal(ish) heap tuple where we just overwrite some fields in-place without fiddling xmin/xmax and making a new row version. Right? Like we currently overwrite the lone Form_pg_sequence on the 1-page sequence relations.

I initially thought that TRUNCATE ... RESTART IDENTITY would be somewhat of a problem with this. We effectively have a temporary "timeline" fork in the sequence value, where it's provisionally restarted and we start using values from the restarted sequence within the xact that restarted it.

But actually, it'd fit pretty well. TRUNCATE ... RESTART IDENTITY would write a new row version with a new xmin, and set xmax on the old sequence row. nextval(...) within the truncating xact would update the new row's non-transactional fields when it allocated new sequence chunks. On commit, everyone starts using the new row due to normal transactional visibility rules. On rollback, everyone ignores it like they would any other dead tuple from an aborted xact and uses the old tuple's nontransactional fields. It Just Works(TM).

nextval(...) takes AccessShareLock on a sequence relation. TRUNCATE ... RESTART IDENTITY takes AccessExclusiveLock. So we can never have nextval(...) advancing the "old" timeline in other xacts at the same time as we consume values on the restarted sequence inside the xact that did the restarting. We still need the new "timeline" though, because we have to retain the old value for rollback.

It feels intuitively pretty gross to effectively dirty-read and write a few fields of a tuple. But that's what we do all the time with xmin/xmax etc; it's not really that different.

It'd certainly make sequence decoding easier too. A LOT easier.

--
Craig Ringer                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
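The "timeline fork" described here is already observable from SQL; a rough sketch with hypothetical names (a table tbl whose serial column owns the sequence tbl_id_seq):

CREATE TABLE tbl (id serial);

SELECT nextval('tbl_id_seq');   -- after some use, suppose this returns 100

BEGIN;
TRUNCATE tbl RESTART IDENTITY;
SELECT nextval('tbl_id_seq');   -- 1: values come from the provisional restart
ROLLBACK;

SELECT nextval('tbl_id_seq');   -- 101: the old, untouched state is back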
On 31/08/16 16:10, Tom Lane wrote:
> I wrote:
>> Personally, my big beef with the current approach to sequences is that we eat a whole relation (including a whole relfilenode) per sequence. I wish that we could reduce a sequence to just a single row in a catalog, including the nontransactional state. Not sure how feasible that is either, but accomplishing it would move the benefits of making a change out of the "debatable whether it's worth it" category, IMO.
>
> BTW, another thing to keep in mind here is the ideas that have been kicked around in the past about alternative sequence implementations managed through a "sequence AM API". I dunno whether now is the time to start creating that API abstraction, but let's at least consider it if we're whacking the catalog representation around.

FWIW, if I were going to continue with the sequence AM API, the next patch would have included a split of sequence metadata and sequence state into separate catalogs, so from that point of view this actually seems like an improvement (I didn't look at the code, though).

As a side note, I don't plan to resurrect the seqam patch at least until we have reasonable built-in logical replication functionality.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer <craig@2ndquadrant.com> writes:
> On 31 August 2016 at 22:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Personally, my big beef with the current approach to sequences is that we eat a whole relation (including a whole relfilenode) per sequence. I wish that we could reduce a sequence to just a single row in a catalog, including the nontransactional state.

> It sounds like you're thinking of something like a normal(ish) heap tuple where we just overwrite some fields in-place without fiddling xmin/xmax and making a new row version. Right? Like we currently overwrite the lone Form_pg_sequence on the 1-page sequence relations.

That would be what to do with the nontransactional state. If I recall previous discussions correctly, there's a stumbling block if you want to treat ALTER SEQUENCE changes as transactional --- but maybe that doesn't make sense anyway. If we did want to try that, maybe we need two auxiliary catalogs, one for the transactionally-updatable sequence fields and one for the nontransactional fields.

> It feels intuitively pretty gross to effectively dirty-read and write a few fields of a tuple. But that's what we do all the time with xmin/xmax etc, it's not really that different.

True. I think two rows would work around that, but maybe we don't have to.

Another issue is what is the low-level interlock between nextvals in different processes. Right now it's the buffer lock on the sequence's page. With a scheme like this, if we just kept doing that, we'd have a single lock covering probably O(100) different sequences, which might lead to contention problems. We could probably improve on that with some thought.

        regards, tom lane
On 2016-08-31 11:23:27 -0400, Tom Lane wrote:
> Another issue is what is the low-level interlock between nextvals in different processes. Right now it's the buffer lock on the sequence's page. With a scheme like this, if we just kept doing that, we'd have a single lock covering probably O(100) different sequences which might lead to contention problems. We could probably improve on that with some thought.

I was thinking of forcing the rows to be spread to exactly one page per sequence...

Andres
Andres Freund wrote:
> On 2016-08-31 11:23:27 -0400, Tom Lane wrote:
>> Another issue is what is the low-level interlock between nextvals in different processes. Right now it's the buffer lock on the sequence's page. With a scheme like this, if we just kept doing that, we'd have a single lock covering probably O(100) different sequences which might lead to contention problems. We could probably improve on that with some thought.
>
> I was thinking of forcing the rows to be spread to exactly one page per sequence...

I was thinking that nextval could grab a shared buffer lock and release immediately, just to ensure no one holds exclusive buffer lock concurrently (which would be used for things like dropping one seq tuple from the page, when a sequence is dropped); then control access to each sequence tuple using LockDatabaseObject. This is a HW lock, heavier than a buffer's LWLock, but it seems better than wasting a full 8kb for each sequence.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2016-08-31 12:56:45 -0300, Alvaro Herrera wrote:
> I was thinking that nextval could grab a shared buffer lock and release immediately, just to ensure no one holds exclusive buffer lock concurrently (which would be used for things like dropping one seq tuple from the page, when a sequence is dropped); then control access to each sequence tuple using LockDatabaseObject. This is a HW lock, heavier than a buffer's LWLock, but it seems better than wasting a full 8kb for each sequence.

That's going to be a *lot* slower; I don't think that's OK. I have a hard time worrying about the space waste here, especially considering where we're coming from.

Andres
Andres Freund <andres@anarazel.de> writes:
> On 2016-08-31 12:56:45 -0300, Alvaro Herrera wrote:
>> I was thinking that nextval could grab a shared buffer lock and release immediately, just to ensure no one holds exclusive buffer lock concurrently (which would be used for things like dropping one seq tuple from the page, when a sequence is dropped); then control access to each sequence tuple using LockDatabaseObject. This is a HW lock, heavier than a buffer's LWLock, but it seems better than wasting a full 8kb for each sequence.

> That's going to be a *lot* slower; I don't think that's OK. I have a hard time worrying about the space waste here, especially considering where we're coming from.

Improving on the space wastage is exactly the point IMO. If it's still going to be 8k per sequence on disk (*and* in shared buffers, remember), I'm not sure it's worth all the work to change things at all.

We're already dealing with taking a heavyweight lock for each sequence (the relation AccessShareLock). I wonder whether it'd be possible to repurpose that effort somehow.

Another idea would be to have nominally per-sequence LWLocks (or spinlocks?) controlling nextval's nontransactional accesses to the catalog rows, but to map those down to some fixed number of locks in a way similar to the current fallback implementation for spinlocks, which maps them onto a fixed number of semaphores. You'd trade off shared memory against contention while choosing the underlying number of locks.

        regards, tom lane
Hi,

On 2016-08-31 12:53:30 -0400, Tom Lane wrote:
> Improving on the space wastage is exactly the point IMO. If it's still going to be 8k per sequence on disk (*and* in shared buffers, remember), I'm not sure it's worth all the work to change things at all.

A separate file is a heck of a lot more heavyweight than another 8 kB in an existing file.

> Another idea would be to have nominally per-sequence LWLocks (or spinlocks?) controlling nextval's nontransactional accesses to the catalog rows, but to map those down to some fixed number of locks in a way similar to the current fallback implementation for spinlocks, which maps them onto a fixed number of semaphores. You'd trade off shared memory against contention while choosing the underlying number of locks.

If we could rely on spinlocks to actually be spinlocks, we could just put the spinlock beside the actual state data... Not entirely pretty, but probably pretty simple.

- Andres
Andres Freund wrote:
> On 2016-08-31 12:53:30 -0400, Tom Lane wrote:
>> Improving on the space wastage is exactly the point IMO. If it's still going to be 8k per sequence on disk (*and* in shared buffers, remember), I'm not sure it's worth all the work to change things at all.
>
> A separate file is a heck of a lot more heavyweight than another 8 kB in an existing file.

Yes, sure, we're still improving even if we stick to one-seq-per-bufpage, but while we're at it, we could as well find a way to make it as best as we can. And allowing multiple seqs per page seems a much better situation, so let's try to get there.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2016-08-31 14:25:43 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
>> On 2016-08-31 12:53:30 -0400, Tom Lane wrote:
>>> Improving on the space wastage is exactly the point IMO. If it's still going to be 8k per sequence on disk (*and* in shared buffers, remember), I'm not sure it's worth all the work to change things at all.
>>
>> A separate file is a heck of a lot more heavyweight than another 8 kB in an existing file.
>
> Yes, sure, we're still improving even if we stick to one-seq-per-bufpage, but while we're at it, we could as well find a way to make it as best as we can. And allowing multiple seqs per page seems a much better situation, so let's try to get there.

It's not really that simple. Having independent sequence rows closer together will have its own performance cost. Suddenly independent sequences will compete for the same page-level lock, NUMA systems will have to transfer the page/cacheline around even if it's independent sequences being accessed in different backends, and we'll have to take care about cacheline padding...
andres@anarazel.de (Andres Freund) writes:
> On 2016-08-31 14:25:43 -0300, Alvaro Herrera wrote:
>> Yes, sure, we're still improving even if we stick to one-seq-per-bufpage, but while we're at it, we could as well find a way to make it as best as we can. And allowing multiple seqs per page seems a much better situation, so let's try to get there.

> It's not really that simple. Having independent sequence rows closer together will have its own performance cost.

You are ignoring the performance costs associated with eating 100x more shared buffer space than necessary.

        regards, tom lane
On 2016-08-31 13:59:48 -0400, Tom Lane wrote:
> andres@anarazel.de (Andres Freund) writes:
>> On 2016-08-31 14:25:43 -0300, Alvaro Herrera wrote:
>>> Yes, sure, we're still improving even if we stick to one-seq-per-bufpage, but while we're at it, we could as well find a way to make it as best as we can. And allowing multiple seqs per page seems a much better situation, so let's try to get there.
>
>> It's not really that simple. Having independent sequence rows closer together will have its own performance cost.
>
> You are ignoring the performance costs associated with eating 100x more shared buffer space than necessary.

I doubt that's measurable in any real-world scenario. You seldom have hundreds of thousands of sequences that you all select from at a high rate.
Andres Freund <andres@anarazel.de> writes:
> On 2016-08-31 13:59:48 -0400, Tom Lane wrote:
>> You are ignoring the performance costs associated with eating 100x more shared buffer space than necessary.

> I doubt that's measurable in any real-world scenario. You seldom have hundreds of thousands of sequences that you all select from at a high rate.

If there are only a few sequences in the database, cross-sequence contention is not going to be a big issue anyway. I think most of the point of making this change at all is to make things work better when you do have a boatload of sequences.

Also, we could probably afford to add enough dummy padding to the sequence tuples so that they're each exactly 64 bytes, thereby having only one or two active counters per cache line.

        regards, tom lane
On 2016-08-31 14:23:41 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2016-08-31 13:59:48 -0400, Tom Lane wrote:
>>> You are ignoring the performance costs associated with eating 100x more shared buffer space than necessary.
>
>> I doubt that's measurable in any real-world scenario. You seldom have hundreds of thousands of sequences that you all select from at a high rate.
>
> If there are only a few sequences in the database, cross-sequence contention is not going to be a big issue anyway.

Isn't that *precisely* when it's going to matter? If you have 5 active tables & sequences where the latter previously used independent locks, they'd now be contending on a single lock. If you have hundreds of thousands of active sequences, I doubt individual page locks would become a point of contention.

Andres
On Wed, Aug 31, 2016 at 3:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Uh, not as subtly as all that, because "select * from sequence" will now return a different set of columns, which will flat out break a lot of clients that use that method to get sequence properties.

So? Clients expect changes like this between major releases, surely. Subtle changes that cause silent breakage for end-users seem scarier than unsubtle breakage that tool authors can fix.

On Wed, Aug 31, 2016 at 7:30 PM, Andres Freund <andres@anarazel.de> wrote:
> Isn't that *precisely* when it's going to matter? If you have 5 active tables & sequences where the latter previously used independent locks, they'd now be contending on a single lock. If you have hundreds of thousands of active sequences, I doubt individual page locks would become a point of contention.

I think even two sequences could be a point of contention if you, for example, are using COPY to load data into two otherwise completely independent tables in two separate processes. But that just means the row needs to be padded out to a cache line, no?

Or was the concern about things like trying to pin the same page, parse the same page header, follow nearby line pointers...? I'm not sure how effective all that caching is today, but it doesn't seem impossible to imagine getting it all optimized away.

-- greg
On 4 September 2016 at 23:17, Greg Stark <stark@mit.edu> wrote:
> On Wed, Aug 31, 2016 at 3:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Uh, not as subtly as all that, because "select * from sequence" will now return a different set of columns, which will flat out break a lot of clients that use that method to get sequence properties.
>
> So? Clients expect changes like this between major releases, surely. Subtle changes that cause silent breakage for end-users seem scarier than unsubtle breakage that tool authors can fix.

Agreed; some change in the behaviour of SELECT * FROM sequence is effectively part of this proposal. I was going to make the same comment myself.

I think we should balance the problems caused by not having a sequence catalog against the problems caused by changing the currently somewhat broken situation.

I vote in favour of applying Peter's idea/patch, though if that is not acceptable I don't think it's worth spending further time on for any of us, so if that's the case let's Return with Feedback and move on.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes:
> On 4 September 2016 at 23:17, Greg Stark <stark@mit.edu> wrote:
>> So? Clients expect changes like this between major releases, surely. Subtle changes that cause silent breakage for end-users seem scarier than unsubtle breakage that tool authors can fix.

> Agreed; some change in the behaviour of SELECT * FROM sequence is effectively part of this proposal. I was going to make the same comment myself.

Well, if we're going to blow off compatibility on that score, I suggest that we blow it off all the way. Make sequences not be relations anymore, and what you do instead of "SELECT * FROM sequence" is "SELECT * FROM pg_sequences WHERE seqname = 'sequence'". Or more likely, since sequences should still belong to schemas, we need a "regsequence" OID-alias type like "regclass" and you do "SELECT * FROM pg_sequences WHERE oid = 'foo.bar'::regsequence".

The main problem I can see with this is that serial columns will have default expressions that are written out as "nextval('foo_f1_seq'::regclass)". I do not think we can afford to break dumps containing that, but I'm not sure how to get the regclass cast replaced with a regsequence cast.

        regards, tom lane
On September 5, 2016 7:26:42 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> On 4 September 2016 at 23:17, Greg Stark <stark@mit.edu> wrote:
>>> So? Clients expect changes like this between major releases, surely. Subtle changes that cause silent breakage for end-users seem scarier than unsubtle breakage that tool authors can fix.
>
>> Agreed; some change in the behaviour of SELECT * FROM sequence is effectively part of this proposal. I was going to make the same comment myself.
>
> Well, if we're going to blow off compatibility on that score, I suggest that we blow it off all the way. Make sequences not be relations anymore, and what you do instead of "SELECT * FROM sequence" is "SELECT * FROM pg_sequences WHERE seqname = 'sequence'". Or more likely, since sequences should still belong to schemas, we need a "regsequence" OID-alias type like "regclass" and you do "SELECT * FROM pg_sequences WHERE oid = 'foo.bar'::regsequence".
>
> The main problem I can see with this is that serial columns will have default expressions that are written out as "nextval('foo_f1_seq'::regclass)". I do not think we can afford to break dumps containing that, but I'm not sure how to get the regclass cast replaced with a regsequence cast.

Why not just continue having a pg_class entry, but no relfilenode?

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Andres Freund <andres@anarazel.de> writes:
> On September 5, 2016 7:26:42 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The main problem I can see with this is that serial columns will have default expressions that are written out as "nextval('foo_f1_seq'::regclass)". I do not think we can afford to break dumps containing that, but I'm not sure how to get the regclass cast replaced with a regsequence cast.

> Why not just continue having a pg_class entry, but no relfilenode?

Yeah, maybe. I was hoping to dispense with the pg_attribute rows, but maybe that's not enough overhead to worry about.

In this viewpoint, we'd keep the sequence-specific data in a pg_sequence catalog. pg_sequence rows would be extensions of the associated pg_class rows in much the same way that pg_index rows extend the pg_class entries for indexes. We should supply a view pg_sequences that performs the implied join, and encourage users to select from that rather than directly from pg_sequence (compare the pg_indexes view).

        regards, tom lane
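As a rough illustration of the "implied join" described here, in the spirit of the pg_indexes view; this is a sketch only, not the actual definition from any patch, and the pg_sequence column names are assumptions based on the posted catalog patch:

CREATE VIEW pg_sequences AS
    SELECT n.nspname      AS schemaname,
           c.relname      AS sequencename,
           s.seqincrement AS increment_by,
           s.seqmin       AS min_value,
           s.seqmax       AS max_value,
           s.seqcycle     AS cycle
      FROM pg_class c
           JOIN pg_namespace n ON n.oid = c.relnamespace
           JOIN pg_sequence s ON s.seqrelid = c.oid
     WHERE c.relkind = 'S';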
On 9/5/16 10:35 PM, Tom Lane wrote:
> In this viewpoint, we'd keep the sequence-specific data in a pg_sequence catalog. pg_sequence rows would be extensions of the associated pg_class rows in much the same way that pg_index rows extend the pg_class entries for indexes. We should supply a view pg_sequences that performs the implied join, and encourage users to select from that rather than directly from pg_sequence (compare the pg_indexes view).

Let's start with that. Here is a patch that adds a pg_sequences view in the style of pg_tables, pg_indexes, etc. This seems useful independent of anything else, but would give us more freedom to change things around behind the scenes.

A slight naming wart: I added a function lastval(regclass) for internal use to get a sequence's "last value". But we already have a public function lastval(), which gets the most recent nextval() result of any sequence. Those are two quite different things. I don't want to abandon the term "last value" here, however, because that is what the sequence relation uses internally, and also Oracle uses it in its system views with the same semantics that I propose here. We could use a more verbose name like sequence_last_value(regclass), perhaps.

lastval has been kept separate from pg_sequence_parameters, because if we were to go ahead with a new catalog layout later, then pg_sequence_parameters would become obsolescent while we would possibly still need a lastval function.

The column names of the new view have been deliberately tuned to use a more conventional style than the information schema while avoiding what I would consider some past naming mistakes. (For example, I hate "is_cycled", which reads like "sequence has wrapped around at least once in the past").

Here are some similar views in other places:

https://docs.oracle.com/cd/B28359_01/server.111/b28320/statviews_2053.htm
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_9.7.0/com.ibm.db2.luw.sql.ref.doc/doc/r0004203.html
https://msdn.microsoft.com/en-us/library/ff877934.aspx

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
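For orientation, a hypothetical query against the proposed view; the column names follow the naming style described above but may differ from what is eventually committed:

SELECT schemaname, sequencename, start_value, min_value, max_value,
       increment_by, cycle, cache_size, last_value
  FROM pg_sequences
 WHERE sequencename = 'myseq';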
On Thu, Sep 1, 2016 at 12:00 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-08-31 14:23:41 -0400, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> On 2016-08-31 13:59:48 -0400, Tom Lane wrote:
>>>> You are ignoring the performance costs associated with eating 100x more shared buffer space than necessary.
>>
>>> I doubt that's measurable in any real-world scenario. You seldom have hundreds of thousands of sequences that you all select from at a high rate.
>>
>> If there are only a few sequences in the database, cross-sequence contention is not going to be a big issue anyway.
>
> Isn't that *precisely* when it's going to matter? If you have 5 active tables & sequences where the latter previously used independent locks, they'd now be contending on a single lock.

I may be missing something here, but why would it contend on a lock? As per the locking scheme proposed by Alvaro, access to the sequence object will need a share lock on the buffer page.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 2016-09-10 17:23:21 +0530, Amit Kapila wrote:
> On Thu, Sep 1, 2016 at 12:00 AM, Andres Freund <andres@anarazel.de> wrote:
>> On 2016-08-31 14:23:41 -0400, Tom Lane wrote:
>>> If there are only a few sequences in the database, cross-sequence contention is not going to be a big issue anyway.
>>
>> Isn't that *precisely* when it's going to matter? If you have 5 active tables & sequences where the latter previously used independent locks, they'd now be contending on a single lock.
>
> I may be missing something here, but why would it contend on a lock? As per the locking scheme proposed by Alvaro, access to the sequence object will need a share lock on the buffer page.

To make checkpointing/bgwriter work correctly we need exclusive locks on pages while writing..., or some new lock level preventing the page from being written out while "shared dirtying" locks are being held.

Andres
On Sun, Sep 11, 2016 at 12:39 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-09-10 17:23:21 +0530, Amit Kapila wrote:
>> I may be missing something here, but why would it contend on a lock? As per the locking scheme proposed by Alvaro, access to the sequence object will need a share lock on the buffer page.
>
> To make checkpointing/bgwriter work correctly we need exclusive locks on pages while writing..., or some new lock level preventing the page from being written out while "shared dirtying" locks are being held.

Right, and I think you have a very valid concern, but if we think that storing multiple sequences on the same page is a reasonable approach, then we can invent some locking mechanism as indicated by you such that two writes on the same page won't block each other, but they will be blocked by the bgwriter/checkpointer.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sun, Sep 11, 2016 at 2:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Right, and I think you have a very valid concern, but if we think that storing multiple sequences on the same page is a reasonable approach, then we can invent some locking mechanism as indicated by you such that two writes on the same page won't block each other, but they will be blocked by the bgwriter/checkpointer.

This thread died a couple of weeks back, so I am marking the patch as returned with feedback based on the discussion so far. Feel free to update the patch if you think that's not appropriate.

--
Michael
On 9/10/16 7:17 AM, Peter Eisentraut wrote:
> Let's start with that. Here is a patch that adds a pg_sequences view in the style of pg_tables, pg_indexes, etc. This seems useful independent of anything else, but would give us more freedom to change things around behind the scenes.

I have registered the pg_sequences view patch separately in the upcoming commit fest: https://commitfest.postgresql.org/11/865/ (The posted patch still applies, so no new patch.)

Attached is an updated patch for the pg_sequence catalog. I haven't received an actual technical review in the last CF, so I'll move it forward. However, if the pg_sequences view is accepted, the catalog patch will of course need an update. So if someone is looking here with limited time, review the view patch first.

As before, this patch requires the "sequences and pg_upgrade" patch in order for pg_upgrade to work.

I think together these three changes will make sequence metadata inspection and modification easier and more robust and will facilitate future changes in this area.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Review of the pg_sequences view.

This seems like a useful addition to me, making life easier for administrators and monitoring tools. While there is already a view in information_schema, it is missing cache_size and last_value.

= Functional review

- The patch applies and passes the test suite without any issue.

- A worry is that it might get a bit confusing to have both the future catalog pg_sequence and the view pg_sequences.

- I think it would be useful to include is_cycled in the view.

- When creating a temporary sequence and then running "SELECT * FROM pg_sequences" in another session, I get the following error:

  ERROR: cannot access temporary tables of other sessions

- Shouldn't last_value be NULL directly after we have created the sequence but nobody has called nextval() yet?

- I noticed that last_value includes the cached values, but that also seems to me like the correct thing to do.

- I do not like the name of the new function, lastval(regclass). I think, like you suggested, it would be better with something more verbose. sequence_lastval()? sequence_last_value()?

= Code

- There is an XXX comment still in the code. It is about the name of the lastval1() function.

= Documentation

- The documentation does not mention the last_value column.

- The extra empty line after "</table>" does not fit with the formatting of the rest of the SGML file.

Andreas
On 11/8/16 6:43 PM, Andreas Karlsson wrote:
> - A worry is that it might get a bit confusing to have both the future catalog pg_sequence and the view pg_sequences.

We already have this in other cases: pg_index/pg_indexes, pg_user_mapping/pg_user_mappings. It's an established naming system by now.

> - I think it would be useful to include is_cycled in the view.

It's there under the name "cycle".

> - When creating a temporary sequence and then running "SELECT * FROM pg_sequences" in another session, I get the following error:
>
>   ERROR: cannot access temporary tables of other sessions

Fixed that by adding pg_is_other_temp_schema() to the view definition. We use that in the information schema but not in the system views so far. That might be worth looking into.

> - Shouldn't last_value be NULL directly after we have created the sequence but nobody has called nextval() yet?
>
> - I noticed that last_value includes the cached values, but that also seems to me like the correct thing to do.

The documentation now emphasizes that this is the value stored on disk. This matches what Oracle does.

> - I do not like the name of the new function, lastval(regclass). I think, like you suggested, it would be better with something more verbose. sequence_lastval()? sequence_last_value()?

changed

> - There is an XXX comment still in the code. It is about the name of the lastval1() function.

fixed

> - The documentation does not mention the last_value column.

fixed

> - The extra empty line after "</table>" does not fit with the formatting of the rest of the SGML file.

fixed

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
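For reference, pg_is_other_temp_schema() can be used in a filter along these lines; this is only a sketch of the kind of clause being described, not the actual view definition from the patch:

-- List sequences while skipping other sessions' temporary schemas
SELECT n.nspname, c.relname
  FROM pg_class c
       JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relkind = 'S'
   AND NOT pg_is_other_temp_schema(n.oid);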
On 11/10/2016 05:29 AM, Peter Eisentraut wrote:
> On 11/8/16 6:43 PM, Andreas Karlsson wrote:
>> - A worry is that it might get a bit confusing to have both the future catalog pg_sequence and the view pg_sequences.
>
> We already have this in other cases: pg_index/pg_indexes, pg_user_mapping/pg_user_mappings. It's an established naming system by now.

Then I am fine with it.

>> - I think it would be useful to include is_cycled in the view.
>
> It's there under the name "cycle".

You are right, my bad, I managed to confuse myself.

>> - Shouldn't last_value be NULL directly after we have created the sequence but nobody has called nextval() yet?
>>
>> - I noticed that last_value includes the cached values, but that also seems to me like the correct thing to do.
>
> The documentation now emphasizes that this is the value stored on disk. This matches what Oracle does.

We also store is_called on disk; I think that if is_called is false then last_value should be NULL. Either that or we should add is_called.

I will give the patch another review some time this week when I can find the time.

Andreas
On 11/10/2016 06:27 AM, Andreas Karlsson wrote:
> On 11/10/2016 05:29 AM, Peter Eisentraut wrote:
>> On 11/8/16 6:43 PM, Andreas Karlsson wrote:
>>> - Shouldn't last_value be NULL directly after we have created the sequence but nobody has called nextval() yet?
>>>
>>> - I noticed that last_value includes the cached values, but that also seems to me like the correct thing to do.
>>
>> The documentation now emphasizes that this is the value stored on disk. This matches what Oracle does.
>
> We also store is_called on disk; I think that if is_called is false then last_value should be NULL. Either that or we should add is_called.
>
> I will give the patch another review some time this week when I can find the time.

Other than my comment above about is_called and last_value, I think the patch looks great.

Andreas
Review for pg_sequence catalog

I like this change since it moves all the parts which should be transactional to the system catalogs, while keeping only the non-transactional stuff in the sequence relations. There was some discussion upthread about more compact representations for the sequences, but I feel that is a separate issue mostly unrelated to this patch.

I might play around more with it, but it seems to work well so far. As pointed out by Peter, this patch also requires the changes to pg_upgrade. I have not looked at those patches.

= Functional review

- The patch applies and compiles and seems to work fine after some quick manual testing.

- The pg_dump tests fail due to the pg_dump code not being updated. I have attached a patch which fixes this.

= Benchmarks

I was a bit worried that the extra syscache lookups might slow down nextval(), but I got a measurable speedup on a very simple workload which consisted of only calls to nextval() for one sequence. The speedup was about 10% on my machine.

= Code

The changes to the code look generally good.

> @@ -1155,6 +1156,8 @@ doDeletion(const ObjectAddress *object, int flags)
>  			else
>  				heap_drop_with_catalog(object->objectId);
>  		}
> +		if (relKind == RELKIND_SEQUENCE)
> +			DeleteSequenceTuple(object->objectId);
>  		break;
>  	}

I think it might be cleaner here to put this as an "else if", just like "relKind == RELKIND_INDEX".

= Documentation

The patch does not update catalogs.sgml, which it should do.

Andreas
Attachment
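The microbenchmark described above could look roughly like the following; the sequence name, file name, and pgbench options are assumptions for illustration, not details taken from the review:

-- bench_nextval.sql: a one-statement custom pgbench script
-- Setup (run once beforehand):  CREATE SEQUENCE bench_seq;
-- Run with something like:      pgbench -n -f bench_nextval.sql -c 1 -T 30
SELECT nextval('bench_seq');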
On 11/11/16 12:53 PM, Andreas Karlsson wrote:
> Other than my comment above about is_called and last_value, I think the patch looks great.

I have made that change and committed the patch. Thanks.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/11/16 10:06 PM, Andreas Karlsson wrote:
> As pointed out by Peter, this patch also requires the changes to pg_upgrade. I have not looked at those patches.

The required changes to pg_upgrade have been committed, so that will work now.

> - The pg_dump tests fail due to the pg_dump code not being updated. I have attached a patch which fixes this.

fixed

> I was a bit worried that the extra syscache lookups might slow down nextval(), but I got a measurable speedup on a very simple workload which consisted of only calls to nextval() for one sequence. The speedup was about 10% on my machine.

I have done a fair amount of performance testing and the results were usually in the neighborhood of 1%-2% slower or faster, nothing really clear. But running nextval by itself in a tight loop isn't really a normal use. The important thing is the concurrency behavior.

>> @@ -1155,6 +1156,8 @@ doDeletion(const ObjectAddress *object, int flags)
>>  			else
>>  				heap_drop_with_catalog(object->objectId);
>>  		}
>> +		if (relKind == RELKIND_SEQUENCE)
>> +			DeleteSequenceTuple(object->objectId);
>>  		break;
>>  	}
>
> I think it might be cleaner here to put this as an "else if", just like "relKind == RELKIND_INDEX".

The sequence tuple has to be deleted in addition to the heap drop. I have added a comment to make that clearer.

> The patch does not update catalogs.sgml, which it should do.

added

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
I think this patch looks good now so I am setting it to ready for committer.

I like the idea of the patch, and I think that while this change will break some tools which look at the sequence relations, the advantages are worth it (for example, making more sequence DDL respect MVCC).

Andreas
On Fri, Dec 2, 2016 at 1:47 PM, Andreas Karlsson <andreas@proxel.se> wrote:
> I think this patch looks good now so I am setting it to ready for committer.
>
> I like the idea of the patch, and I think that while this change will break some tools which look at the sequence relations, the advantages are worth it (for example, making more sequence DDL respect MVCC).
Moved to next CF with the same status (ready for committer).
Regards,
Hari Babu
Fujitsu Australia
On 12/1/16 9:47 PM, Andreas Karlsson wrote:
> I think this patch looks good now so I am setting it to ready for committer.

committed, thanks

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Dec 20, 2016 at 7:14 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> On 12/1/16 9:47 PM, Andreas Karlsson wrote:
>> I think this patch looks good now so I am setting it to ready for committer.
>
> committed, thanks

The regression test for the hot standby check fails since it uses the following statement:

-select min_value as sequence_min_value from hsseq;

which is no longer supported, I guess. It should be modified as follows:

select min_value as sequence_min_value from pg_sequences where sequencename = 'hsseq';

Attached is a patch which reflects the above changes.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
Attachment
On 1/3/17 7:23 AM, Kuntal Ghosh wrote:
> The regression test for the hot standby check fails since it uses the following statement:
>
> -select min_value as sequence_min_value from hsseq;
>
> which is no longer supported, I guess. It should be modified as follows:
>
> select min_value as sequence_min_value from pg_sequences where sequencename = 'hsseq';
>
> Attached is a patch which reflects the above changes.

Fixed, thanks.

I made a note to self to port this test to the TAP framework.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 01/03/2017 03:30 PM, Peter Eisentraut wrote:
> On 1/3/17 7:23 AM, Kuntal Ghosh wrote:
>> The regression test for the hot standby check fails since it uses the following statement:
>>
>> -select min_value as sequence_min_value from hsseq;
>>
>> which is no longer supported, I guess. It should be modified as follows:
>>
>> select min_value as sequence_min_value from pg_sequences where sequencename = 'hsseq';
>>
>> Attached is a patch which reflects the above changes.
>
> Fixed, thanks.
>
> I made a note to self to port this test to the TAP framework.

Hm, doesn't this change the intent of the test case? As I read the test, it seems to make sure that we are allowed to do a read from a sequence relation on the slave. If so, I think it should be changed to something like the below.

select is_called from hsseq;

Andreas
On 1/3/17 1:17 PM, Andreas Karlsson wrote:
> Hm, doesn't this change the intent of the test case? As I read the test, it seems to make sure that we are allowed to do a read from a sequence relation on the slave. If so, I think it should be changed to something like the below.
>
> select is_called from hsseq;

You're right. Fixed that way.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services