Thread: pg_sequence catalog
While I was hacking around sequence stuff, I felt the urge to look into an old peeve: That sequence metadata is not stored in a proper catalog. Right now in order to find out some metadata about sequences (min, max, increment, etc.), you need to look into the sequence. That is like having to query a table in order to find out about its schema.

There are also known issues with the current storage such as that we can't safely update the sequence name stored in the sequence when we rename, so we just don't.

This patch introduces a new catalog pg_sequence that stores the sequence metadata. The sequences themselves now only store the counter and supporting information.

I don't know if this is a net improvement. Maybe this introduces as many new issues as it removes. But I figured I'll post this, so that at least we can discuss it.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
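To make the difference concrete, a rough sketch of the two ways of reading sequence metadata; the pg_sequence column names follow the posted patch and are only illustrative, so they may differ from what eventually gets committed:

CREATE SEQUENCE myseq INCREMENT BY 5 MINVALUE 1 MAXVALUE 1000;

-- Today: the metadata lives inside the sequence relation itself
SELECT min_value, max_value, increment_by FROM myseq;

-- With the patch: the metadata comes from a real catalog, and the sequence
-- relation keeps only the counter and supporting state (column names here
-- are assumptions based on the posted patch)
SELECT seqmin, seqmax, seqincrement
  FROM pg_sequence
 WHERE seqrelid = 'myseq'::regclass;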
On 31 August 2016 at 21:17, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> While I was hacking around sequence stuff, I felt the urge to look into an old peeve: That sequence metadata is not stored in a proper catalog. Right now in order to find out some metadata about sequences (min, max, increment, etc.), you need to look into the sequence. That is like having to query a table in order to find out about its schema.
>
> There are also known issues with the current storage such as that we can't safely update the sequence name stored in the sequence when we rename, so we just don't.

... and don't have a comment warning the poor confused reader about that, either. As I discovered when doing my first pass at sequence decoding.

> I don't know if this is a net improvement. Maybe this introduces as many new issues as it removes. But I figured I'll post this, so that at least we can discuss it.

This will change behaviour subtly. Probably not in ways we care much about, but I'd rather that be an informed decision than an implicit "oops, we didn't think about that" one.

Things stored in the Form_pg_sequence are affected immediately and non-transactionally by ALTER SEQUENCE. If you:

BEGIN;
ALTER SEQUENCE myseq INCREMENT BY 10;
ROLLBACK;

then the *next call to nextval* after the ALTER will step by 10. Well, roughly; there's some slush there due to caching etc. Rolling back has no effect.

Yet other effects of ALTER SEQUENCE, notably RENAME, are transactional and are subject to normal locking, visibility and rollback rules.

Even more fun, ALTER SEQUENCE ... RESTART is immediate and non-transactional, but TRUNCATE ... RESTART IDENTITY *is* transactional and takes effect only at commit. ALTER SEQUENCE writes a new Form_pg_sequence with the new value to the existing relfilenode. TRUNCATE instead updates pg_class for the sequence with a new relfilenode and writes its changes to the new relfilenode. So even two operations that seem the same are very different.

If I understand the proposed change correctly, it will make previously non-transactional ALTER SEQUENCE operations transactional and subject to normal rules, since the relevant information is now in a proper catalog. Personally I think that's a big improvement. The current situation is warts upon warts.

Prior proposals to move sequences away from a one-relation-and-one-filenode-per-sequence model have fallen down in early planning stages. This seems like a simpler, more sensible step to take, and it won't block later cleanups of how we store sequences if we decide to go there.

It'll also make it slightly easier to handle logical decoding of sequence advances, since it'll be possible to send *just* the new sequence value. Right now we can't tell if the increment, etc., also got changed, and have to send most of the Form_pg_sequence on the wire for every sequence advance, which is a little sucky.

This change won't solve the problem I outlined in the other thread, though: that sequences are transactional sometimes and not other times.

So +1 for the idea from me. It'll just need relnotes warning of the subtle behaviour change.

--
Craig Ringer                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer <craig@2ndquadrant.com> writes:
> On 31 August 2016 at 21:17, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>> I don't know if this is a net improvement. Maybe this introduces as many new issues as it removes. But I figured I'll post this, so that at least we can discuss it.

> This will change behaviour subtly.

Uh, not as subtly as all that, because "select * from sequence" will now return a different set of columns, which will flat out break a lot of clients that use that method to get sequence properties.

In previous discussions of this idea, we had speculated about turning sequences into, effectively, views, so that we could still return all the same columns --- only now some of them would be coming from a catalog rather than the non-transactional per-sequence storage. I'm not sure how feasible that is, but it's worth thinking about.

Personally, my big beef with the current approach to sequences is that we eat a whole relation (including a whole relfilenode) per sequence. I wish that we could reduce a sequence to just a single row in a catalog, including the nontransactional state. Not sure how feasible that is either, but accomplishing it would move the benefits of making a change out of the "debatable whether it's worth it" category, IMO.

        regards, tom lane
I wrote:
> Personally, my big beef with the current approach to sequences is that we eat a whole relation (including a whole relfilenode) per sequence. I wish that we could reduce a sequence to just a single row in a catalog, including the nontransactional state. Not sure how feasible that is either, but accomplishing it would move the benefits of making a change out of the "debatable whether it's worth it" category, IMO.

BTW, another thing to keep in mind here is the ideas that have been kicked around in the past about alternative sequence implementations managed through a "sequence AM API". I dunno whether now is the time to start creating that API abstraction, but let's at least consider it if we're whacking the catalog representation around.

        regards, tom lane
On 31 August 2016 at 22:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Personally, my big beef with the current approach to sequences is that we eat a whole relation (including a whole relfilenode) per sequence. I wish that we could reduce a sequence to just a single row in a catalog, including the nontransactional state.

I'd be happy to see incremental improvement in this space as Peter has suggested, though I certainly see the value of something like seqam too.

It sounds like you're thinking of something like a normal(ish) heap tuple where we just overwrite some fields in-place without fiddling xmin/xmax and making a new row version. Right? Like we currently overwrite the lone Form_pg_sequence on the 1-page sequence relations.

I initially thought that TRUNCATE ... RESTART IDENTITY would be somewhat of a problem with this. We effectively have a temporary "timeline" fork in the sequence value, where it's provisionally restarted and we start using values from the restarted sequence within the xact that restarted it.

But actually, it'd fit pretty well. TRUNCATE ... RESTART IDENTITY would write a new row version with a new xmin, and set xmax on the old sequence row. nextval(...) within the truncating xact would update the new row's non-transactional fields when it allocated new sequence chunks. On commit, everyone starts using the new row due to normal transactional visibility rules. On rollback, everyone ignores it like they would any other dead tuple from an aborted xact and uses the old tuple's nontransactional fields. It Just Works(TM).

nextval(...) takes AccessShareLock on a sequence relation. TRUNCATE ... RESTART IDENTITY takes AccessExclusiveLock. So we can never have nextval(...) advancing the "old" timeline in other xacts at the same time as we consume values on the restarted sequence inside the xact that did the restarting. We still need the new "timeline" though, because we have to retain the old value for rollback.

It feels intuitively pretty gross to effectively dirty-read and write a few fields of a tuple. But that's what we do all the time with xmin/xmax etc; it's not really that different.

It'd certainly make sequence decoding easier too. A LOT easier.

--
Craig Ringer                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
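The "timeline fork" described here is already observable from SQL; a rough sketch with hypothetical names (a table tbl whose serial column owns the sequence tbl_id_seq):

CREATE TABLE tbl (id serial);

SELECT nextval('tbl_id_seq');   -- after some use, suppose this returns 100

BEGIN;
TRUNCATE tbl RESTART IDENTITY;
SELECT nextval('tbl_id_seq');   -- 1: values come from the provisional restart
ROLLBACK;

SELECT nextval('tbl_id_seq');   -- 101: the old, untouched state is back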
On 31/08/16 16:10, Tom Lane wrote:
> I wrote:
>> Personally, my big beef with the current approach to sequences is that we eat a whole relation (including a whole relfilenode) per sequence. I wish that we could reduce a sequence to just a single row in a catalog, including the nontransactional state. Not sure how feasible that is either, but accomplishing it would move the benefits of making a change out of the "debatable whether it's worth it" category, IMO.
>
> BTW, another thing to keep in mind here is the ideas that have been kicked around in the past about alternative sequence implementations managed through a "sequence AM API". I dunno whether now is the time to start creating that API abstraction, but let's at least consider it if we're whacking the catalog representation around.

FWIW, if I were going to continue with the sequence AM API, the next patch would have included a split of sequence metadata and sequence state into separate catalogs, so from that point of view this actually seems like an improvement (I didn't look at the code, though).

As a side note, I don't plan to resurrect the seqam patch at least until we have reasonable built-in logical replication functionality.

--
Petr Jelinek                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Craig Ringer <craig@2ndquadrant.com> writes:
> On 31 August 2016 at 22:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Personally, my big beef with the current approach to sequences is that we eat a whole relation (including a whole relfilenode) per sequence. I wish that we could reduce a sequence to just a single row in a catalog, including the nontransactional state.

> It sounds like you're thinking of something like a normal(ish) heap tuple where we just overwrite some fields in-place without fiddling xmin/xmax and making a new row version. Right? Like we currently overwrite the lone Form_pg_sequence on the 1-page sequence relations.

That would be what to do with the nontransactional state. If I recall previous discussions correctly, there's a stumbling block if you want to treat ALTER SEQUENCE changes as transactional --- but maybe that doesn't make sense anyway. If we did want to try that, maybe we need two auxiliary catalogs, one for the transactionally-updatable sequence fields and one for the nontransactional fields.

> It feels intuitively pretty gross to effectively dirty-read and write a few fields of a tuple. But that's what we do all the time with xmin/xmax etc, it's not really that different.

True. I think two rows would work around that, but maybe we don't have to.

Another issue is what is the low-level interlock between nextvals in different processes. Right now it's the buffer lock on the sequence's page. With a scheme like this, if we just kept doing that, we'd have a single lock covering probably O(100) different sequences, which might lead to contention problems. We could probably improve on that with some thought.

        regards, tom lane
On 2016-08-31 11:23:27 -0400, Tom Lane wrote:
> Another issue is what is the low-level interlock between nextvals in different processes. Right now it's the buffer lock on the sequence's page. With a scheme like this, if we just kept doing that, we'd have a single lock covering probably O(100) different sequences which might lead to contention problems. We could probably improve on that with some thought.

I was thinking of forcing the rows to be spread to exactly one page per sequence...

Andres
Andres Freund wrote:
> On 2016-08-31 11:23:27 -0400, Tom Lane wrote:
>> Another issue is what is the low-level interlock between nextvals in different processes. Right now it's the buffer lock on the sequence's page. With a scheme like this, if we just kept doing that, we'd have a single lock covering probably O(100) different sequences which might lead to contention problems. We could probably improve on that with some thought.
>
> I was thinking of forcing the rows to be spread to exactly one page per sequence...

I was thinking that nextval could grab a shared buffer lock and release immediately, just to ensure no one holds exclusive buffer lock concurrently (which would be used for things like dropping one seq tuple from the page, when a sequence is dropped); then control access to each sequence tuple using LockDatabaseObject. This is a HW lock, heavier than a buffer's LWLock, but it seems better than wasting a full 8kb for each sequence.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2016-08-31 12:56:45 -0300, Alvaro Herrera wrote:
> I was thinking that nextval could grab a shared buffer lock and release immediately, just to ensure no one holds exclusive buffer lock concurrently (which would be used for things like dropping one seq tuple from the page, when a sequence is dropped); then control access to each sequence tuple using LockDatabaseObject. This is a HW lock, heavier than a buffer's LWLock, but it seems better than wasting a full 8kb for each sequence.

That's going to be a *lot* slower; I don't think that's OK. I have a hard time worrying about the space waste here, especially considering where we're coming from.

Andres
Andres Freund <andres@anarazel.de> writes:
> On 2016-08-31 12:56:45 -0300, Alvaro Herrera wrote:
>> I was thinking that nextval could grab a shared buffer lock and release immediately, just to ensure no one holds exclusive buffer lock concurrently (which would be used for things like dropping one seq tuple from the page, when a sequence is dropped); then control access to each sequence tuple using LockDatabaseObject. This is a HW lock, heavier than a buffer's LWLock, but it seems better than wasting a full 8kb for each sequence.

> That's going to be a *lot* slower; I don't think that's OK. I have a hard time worrying about the space waste here, especially considering where we're coming from.

Improving on the space wastage is exactly the point IMO. If it's still going to be 8k per sequence on disk (*and* in shared buffers, remember), I'm not sure it's worth all the work to change things at all.

We're already dealing with taking a heavyweight lock for each sequence (the relation AccessShareLock). I wonder whether it'd be possible to repurpose that effort somehow.

Another idea would be to have nominally per-sequence LWLocks (or spinlocks?) controlling nextval's nontransactional accesses to the catalog rows, but to map those down to some fixed number of locks in a way similar to the current fallback implementation for spinlocks, which maps them onto a fixed number of semaphores. You'd trade off shared memory against contention while choosing the underlying number of locks.

        regards, tom lane
Hi,

On 2016-08-31 12:53:30 -0400, Tom Lane wrote:
> Improving on the space wastage is exactly the point IMO. If it's still going to be 8k per sequence on disk (*and* in shared buffers, remember), I'm not sure it's worth all the work to change things at all.

A separate file is a heck of a lot more heavyweight than another 8 kB in an existing file.

> Another idea would be to have nominally per-sequence LWLocks (or spinlocks?) controlling nextval's nontransactional accesses to the catalog rows, but to map those down to some fixed number of locks in a way similar to the current fallback implementation for spinlocks, which maps them onto a fixed number of semaphores. You'd trade off shared memory against contention while choosing the underlying number of locks.

If we could rely on spinlocks to actually be spinlocks, we could just put the spinlock beside the actual state data... Not entirely pretty, but probably pretty simple.

- Andres
Andres Freund wrote:
> On 2016-08-31 12:53:30 -0400, Tom Lane wrote:
>> Improving on the space wastage is exactly the point IMO. If it's still going to be 8k per sequence on disk (*and* in shared buffers, remember), I'm not sure it's worth all the work to change things at all.
>
> A separate file is a heck of a lot more heavyweight than another 8 kB in an existing file.

Yes, sure, we're still improving even if we stick to one-seq-per-bufpage, but while we're at it, we could as well find a way to make it as best as we can. And allowing multiple seqs per page seems a much better situation, so let's try to get there.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2016-08-31 14:25:43 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
>> On 2016-08-31 12:53:30 -0400, Tom Lane wrote:
>>> Improving on the space wastage is exactly the point IMO. If it's still going to be 8k per sequence on disk (*and* in shared buffers, remember), I'm not sure it's worth all the work to change things at all.
>>
>> A separate file is a heck of a lot more heavyweight than another 8 kB in an existing file.
>
> Yes, sure, we're still improving even if we stick to one-seq-per-bufpage, but while we're at it, we could as well find a way to make it as best as we can. And allowing multiple seqs per page seems a much better situation, so let's try to get there.

It's not really that simple. Having independent sequence rows closer together will have its own performance cost. Suddenly independent sequences will compete for the same page-level lock, NUMA systems will have to transfer the page/cacheline around even if it's independent sequences being accessed in different backends, and we'll have to take care about cacheline padding...
andres@anarazel.de (Andres Freund) writes:
> On 2016-08-31 14:25:43 -0300, Alvaro Herrera wrote:
>> Yes, sure, we're still improving even if we stick to one-seq-per-bufpage, but while we're at it, we could as well find a way to make it as best as we can. And allowing multiple seqs per page seems a much better situation, so let's try to get there.

> It's not really that simple. Having independent sequence rows closer together will have its own performance cost.

You are ignoring the performance costs associated with eating 100x more shared buffer space than necessary.

        regards, tom lane
On 2016-08-31 13:59:48 -0400, Tom Lane wrote:
> andres@anarazel.de (Andres Freund) writes:
>> On 2016-08-31 14:25:43 -0300, Alvaro Herrera wrote:
>>> Yes, sure, we're still improving even if we stick to one-seq-per-bufpage, but while we're at it, we could as well find a way to make it as best as we can. And allowing multiple seqs per page seems a much better situation, so let's try to get there.
>
>> It's not really that simple. Having independent sequence rows closer together will have its own performance cost.
>
> You are ignoring the performance costs associated with eating 100x more shared buffer space than necessary.

I doubt that's measurable in any real-world scenario. You seldom have hundreds of thousands of sequences that you all select from at a high rate.
Andres Freund <andres@anarazel.de> writes:
> On 2016-08-31 13:59:48 -0400, Tom Lane wrote:
>> You are ignoring the performance costs associated with eating 100x more shared buffer space than necessary.

> I doubt that's measurable in any real-world scenario. You seldom have hundreds of thousands of sequences that you all select from at a high rate.

If there are only a few sequences in the database, cross-sequence contention is not going to be a big issue anyway. I think most of the point of making this change at all is to make things work better when you do have a boatload of sequences.

Also, we could probably afford to add enough dummy padding to the sequence tuples so that they're each exactly 64 bytes, thereby having only one or two active counters per cache line.

        regards, tom lane
On 2016-08-31 14:23:41 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2016-08-31 13:59:48 -0400, Tom Lane wrote:
>>> You are ignoring the performance costs associated with eating 100x more shared buffer space than necessary.
>
>> I doubt that's measurable in any real-world scenario. You seldom have hundreds of thousands of sequences that you all select from at a high rate.
>
> If there are only a few sequences in the database, cross-sequence contention is not going to be a big issue anyway.

Isn't that *precisely* when it's going to matter? If you have 5 active tables & sequences where the latter previously used independent locks, they'd now be contending on a single lock. If you have hundreds of thousands of active sequences, I doubt individual page locks would become a point of contention.

Andres
On Wed, Aug 31, 2016 at 3:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Uh, not as subtly as all that, because "select * from sequence" will now return a different set of columns, which will flat out break a lot of clients that use that method to get sequence properties.

So? Clients expect changes like this between major releases, surely. Subtle changes that cause silent breakage for end-users seem scarier than unsubtle breakage that tool authors can fix.

On Wed, Aug 31, 2016 at 7:30 PM, Andres Freund <andres@anarazel.de> wrote:
> Isn't that *precisely* when it's going to matter? If you have 5 active tables & sequences where the latter previously used independent locks, they'd now be contending on a single lock. If you have hundreds of thousands of active sequences, I doubt individual page locks would become a point of contention.

I think even two sequences could be a point of contention if you, for example, are using COPY to load data into two otherwise completely independent tables in two separate processes. But that just means the row needs to be padded out to a cache line, no?

Or was the concern about things like trying to pin the same page, parse the same page header, follow nearby line pointers...? I'm not sure how effective all that caching is today, but it doesn't seem impossible to imagine getting it all optimized away.

-- greg
On 4 September 2016 at 23:17, Greg Stark <stark@mit.edu> wrote:
> On Wed, Aug 31, 2016 at 3:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Uh, not as subtly as all that, because "select * from sequence" will now return a different set of columns, which will flat out break a lot of clients that use that method to get sequence properties.
>
> So? Clients expect changes like this between major releases, surely. Subtle changes that cause silent breakage for end-users seem scarier than unsubtle breakage that tool authors can fix.

Agreed; some change in the behaviour of SELECT * FROM sequence is effectively part of this proposal. I was going to make the same comment myself.

I think we should balance the problems caused by not having a sequence catalog against the problems caused by changing the currently somewhat broken situation.

I vote in favour of applying Peter's idea/patch, though if that is not acceptable I don't think it's worth spending further time on for any of us, so if that's the case let's Return with Feedback and move on.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes:
> On 4 September 2016 at 23:17, Greg Stark <stark@mit.edu> wrote:
>> So? Clients expect changes like this between major releases, surely. Subtle changes that cause silent breakage for end-users seem scarier than unsubtle breakage that tool authors can fix.

> Agreed; some change in the behaviour of SELECT * FROM sequence is effectively part of this proposal. I was going to make the same comment myself.

Well, if we're going to blow off compatibility on that score, I suggest that we blow it off all the way. Make sequences not be relations anymore, and what you do instead of "SELECT * FROM sequence" is "SELECT * FROM pg_sequences WHERE seqname = 'sequence'". Or more likely, since sequences should still belong to schemas, we need a "regsequence" OID-alias type like "regclass" and you do "SELECT * FROM pg_sequences WHERE oid = 'foo.bar'::regsequence".

The main problem I can see with this is that serial columns will have default expressions that are written out as "nextval('foo_f1_seq'::regclass)". I do not think we can afford to break dumps containing that, but I'm not sure how to get the regclass cast replaced with a regsequence cast.

        regards, tom lane
On September 5, 2016 7:26:42 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> On 4 September 2016 at 23:17, Greg Stark <stark@mit.edu> wrote:
>>> So? Clients expect changes like this between major releases, surely. Subtle changes that cause silent breakage for end-users seem scarier than unsubtle breakage that tool authors can fix.
>
>> Agreed; some change in the behaviour of SELECT * FROM sequence is effectively part of this proposal. I was going to make the same comment myself.
>
> Well, if we're going to blow off compatibility on that score, I suggest that we blow it off all the way. Make sequences not be relations anymore, and what you do instead of "SELECT * FROM sequence" is "SELECT * FROM pg_sequences WHERE seqname = 'sequence'". Or more likely, since sequences should still belong to schemas, we need a "regsequence" OID-alias type like "regclass" and you do "SELECT * FROM pg_sequences WHERE oid = 'foo.bar'::regsequence".
>
> The main problem I can see with this is that serial columns will have default expressions that are written out as "nextval('foo_f1_seq'::regclass)". I do not think we can afford to break dumps containing that, but I'm not sure how to get the regclass cast replaced with a regsequence cast.

Why not just continue having a pg_class entry, but no relfilenode?

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
Andres Freund <andres@anarazel.de> writes:
> On September 5, 2016 7:26:42 AM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The main problem I can see with this is that serial columns will have default expressions that are written out as "nextval('foo_f1_seq'::regclass)". I do not think we can afford to break dumps containing that, but I'm not sure how to get the regclass cast replaced with a regsequence cast.

> Why not just continue having a pg_class entry, but no relfilenode?

Yeah, maybe. I was hoping to dispense with the pg_attribute rows, but maybe that's not enough overhead to worry about.

In this viewpoint, we'd keep the sequence-specific data in a pg_sequence catalog. pg_sequence rows would be extensions of the associated pg_class rows in much the same way that pg_index rows extend the pg_class entries for indexes. We should supply a view pg_sequences that performs the implied join, and encourage users to select from that rather than directly from pg_sequence (compare the pg_indexes view).

        regards, tom lane
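As a rough illustration of the "implied join" described here, in the spirit of the pg_indexes view; this is a sketch only, not the actual definition from any patch, and the pg_sequence column names are assumptions based on the posted catalog patch:

CREATE VIEW pg_sequences AS
    SELECT n.nspname      AS schemaname,
           c.relname      AS sequencename,
           s.seqincrement AS increment_by,
           s.seqmin       AS min_value,
           s.seqmax       AS max_value,
           s.seqcycle     AS cycle
      FROM pg_class c
           JOIN pg_namespace n ON n.oid = c.relnamespace
           JOIN pg_sequence s ON s.seqrelid = c.oid
     WHERE c.relkind = 'S';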
On 9/5/16 10:35 PM, Tom Lane wrote:
> In this viewpoint, we'd keep the sequence-specific data in a pg_sequence catalog. pg_sequence rows would be extensions of the associated pg_class rows in much the same way that pg_index rows extend the pg_class entries for indexes. We should supply a view pg_sequences that performs the implied join, and encourage users to select from that rather than directly from pg_sequence (compare the pg_indexes view).

Let's start with that. Here is a patch that adds a pg_sequences view in the style of pg_tables, pg_indexes, etc. This seems useful independent of anything else, but would give us more freedom to change things around behind the scenes.

A slight naming wart: I added a function lastval(regclass) for internal use to get a sequence's "last value". But we already have a public function lastval(), which gets the most recent nextval() result of any sequence. Those are two quite different things. I don't want to abandon the term "last value" here, however, because that is what the sequence relation uses internally, and also Oracle uses it in its system views with the same semantics that I propose here. We could use a more verbose name like sequence_last_value(regclass), perhaps.

lastval has been kept separate from pg_sequence_parameters, because if we were to go ahead with a new catalog layout later, then pg_sequence_parameters would become obsolescent while we would possibly still need a lastval function.

The column names of the new view have been deliberately tuned to use a more conventional style than the information schema while avoiding what I would consider some past naming mistakes. (For example, I hate "is_cycled", which reads like "sequence has wrapped around at least once in the past").

Here are some similar views in other places:

https://docs.oracle.com/cd/B28359_01/server.111/b28320/statviews_2053.htm
https://www.ibm.com/support/knowledgecenter/en/SSEPGG_9.7.0/com.ibm.db2.luw.sql.ref.doc/doc/r0004203.html
https://msdn.microsoft.com/en-us/library/ff877934.aspx

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
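For orientation, a hypothetical query against the proposed view; the column names follow the naming style described above but may differ from what is eventually committed:

SELECT schemaname, sequencename, start_value, min_value, max_value,
       increment_by, cycle, cache_size, last_value
  FROM pg_sequences
 WHERE sequencename = 'myseq';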
On Thu, Sep 1, 2016 at 12:00 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-08-31 14:23:41 -0400, Tom Lane wrote:
>> Andres Freund <andres@anarazel.de> writes:
>>> On 2016-08-31 13:59:48 -0400, Tom Lane wrote:
>>>> You are ignoring the performance costs associated with eating 100x more shared buffer space than necessary.
>>
>>> I doubt that's measurable in any real-world scenario. You seldom have hundreds of thousands of sequences that you all select from at a high rate.
>>
>> If there are only a few sequences in the database, cross-sequence contention is not going to be a big issue anyway.
>
> Isn't that *precisely* when it's going to matter? If you have 5 active tables & sequences where the latter previously used independent locks, they'd now be contending on a single lock.

I may be missing something here, but why would it contend on a lock? As per the locking scheme proposed by Alvaro, access to the sequence object will need a share lock on the buffer page.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 2016-09-10 17:23:21 +0530, Amit Kapila wrote:
> On Thu, Sep 1, 2016 at 12:00 AM, Andres Freund <andres@anarazel.de> wrote:
>> On 2016-08-31 14:23:41 -0400, Tom Lane wrote:
>>> If there are only a few sequences in the database, cross-sequence contention is not going to be a big issue anyway.
>>
>> Isn't that *precisely* when it's going to matter? If you have 5 active tables & sequences where the latter previously used independent locks, they'd now be contending on a single lock.
>
> I may be missing something here, but why would it contend on a lock? As per the locking scheme proposed by Alvaro, access to the sequence object will need a share lock on the buffer page.

To make checkpointing/bgwriter work correctly we need exclusive locks on pages while writing..., or some new lock level preventing the page from being written out while "shared dirtying" locks are being held.

Andres
On Sun, Sep 11, 2016 at 12:39 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-09-10 17:23:21 +0530, Amit Kapila wrote:
>> I may be missing something here, but why would it contend on a lock? As per the locking scheme proposed by Alvaro, access to the sequence object will need a share lock on the buffer page.
>
> To make checkpointing/bgwriter work correctly we need exclusive locks on pages while writing..., or some new lock level preventing the page from being written out while "shared dirtying" locks are being held.

Right, and I think you have a very valid concern, but if we think that storing multiple sequences on the same page is a reasonable approach, then we can invent some locking mechanism as indicated by you such that two writes on the same page won't block each other, but they will be blocked by the bgwriter/checkpointer.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sun, Sep 11, 2016 at 2:15 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Right, and I think you have a very valid concern, but if we think that storing multiple sequences on the same page is a reasonable approach, then we can invent some locking mechanism as indicated by you such that two writes on the same page won't block each other, but they will be blocked by the bgwriter/checkpointer.

This thread died a couple of weeks back, so I am marking the patch as returned with feedback based on the discussion so far. Feel free to update the patch if you think that's not appropriate.

--
Michael
On 9/10/16 7:17 AM, Peter Eisentraut wrote:
> Let's start with that. Here is a patch that adds a pg_sequences view in the style of pg_tables, pg_indexes, etc. This seems useful independent of anything else, but would give us more freedom to change things around behind the scenes.

I have registered the pg_sequences view patch separately in the upcoming commit fest: https://commitfest.postgresql.org/11/865/ (The posted patch still applies, so no new patch.)

Attached is an updated patch for the pg_sequence catalog. I haven't received an actual technical review in the last CF, so I'll move it forward. However, if the pg_sequences view is accepted, the catalog patch will of course need an update. So if someone is looking here with limited time, review the view patch first.

As before, this patch requires the "sequences and pg_upgrade" patch in order for pg_upgrade to work.

I think together these three changes will make sequence metadata inspection and modification easier and more robust and will facilitate future changes in this area.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Review of the pg_sequences view.

This seems like a useful addition to me, making life easier for administrators and monitoring tools. While there is already a view in information_schema, it is missing cache_size and last_value.

= Functional review

- The patch applies and passes the test suite without any issue.

- A worry is that it might get a bit confusing to have both the future catalog pg_sequence and the view pg_sequences.

- I think it would be useful to include is_cycled in the view.

- When creating a temporary sequence and then running "SELECT * FROM pg_sequences" in another session, I get the following error:

  ERROR: cannot access temporary tables of other sessions

- Shouldn't last_value be NULL directly after we have created the sequence but nobody has called nextval() yet?

- I noticed that last_value includes the cached values, but that also seems to me like the correct thing to do.

- I do not like the name of the new function, lastval(regclass). I think, like you suggested, it would be better with something more verbose. sequence_lastval()? sequence_last_value()?

= Code

- There is an XXX comment still in the code. It is about the name of the lastval1() function.

= Documentation

- The documentation does not mention the last_value column.

- The extra empty line after "</table>" does not fit with the formatting of the rest of the SGML file.

Andreas
On 11/8/16 6:43 PM, Andreas Karlsson wrote:
> - A worry is that it might get a bit confusing to have both the future catalog pg_sequence and the view pg_sequences.

We already have this in other cases: pg_index/pg_indexes, pg_user_mapping/pg_user_mappings. It's an established naming system by now.

> - I think it would be useful to include is_cycled in the view.

It's there under the name "cycle".

> - When creating a temporary sequence and then running "SELECT * FROM pg_sequences" in another session, I get the following error:
>
>   ERROR: cannot access temporary tables of other sessions

Fixed that by adding pg_is_other_temp_schema() to the view definition. We use that in the information schema but not in the system views so far. That might be worth looking into.

> - Shouldn't last_value be NULL directly after we have created the sequence but nobody has called nextval() yet?
>
> - I noticed that last_value includes the cached values, but that also seems to me like the correct thing to do.

The documentation now emphasizes that this is the value stored on disk. This matches what Oracle does.

> - I do not like the name of the new function, lastval(regclass). I think, like you suggested, it would be better with something more verbose. sequence_lastval()? sequence_last_value()?

changed

> - There is an XXX comment still in the code. It is about the name of the lastval1() function.

fixed

> - The documentation does not mention the last_value column.

fixed

> - The extra empty line after "</table>" does not fit with the formatting of the rest of the SGML file.

fixed

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
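For reference, pg_is_other_temp_schema() can be used in a filter along these lines; this is only a sketch of the kind of clause being described, not the actual view definition from the patch:

-- List sequences while skipping other sessions' temporary schemas
SELECT n.nspname, c.relname
  FROM pg_class c
       JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relkind = 'S'
   AND NOT pg_is_other_temp_schema(n.oid);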
On 11/10/2016 05:29 AM, Peter Eisentraut wrote:
> On 11/8/16 6:43 PM, Andreas Karlsson wrote:
>> - A worry is that it might get a bit confusing to have both the future catalog pg_sequence and the view pg_sequences.
>
> We already have this in other cases: pg_index/pg_indexes, pg_user_mapping/pg_user_mappings. It's an established naming system by now.

Then I am fine with it.

>> - I think it would be useful to include is_cycled in the view.
>
> It's there under the name "cycle".

You are right, my bad, I managed to confuse myself.

>> - Shouldn't last_value be NULL directly after we have created the sequence but nobody has called nextval() yet?
>>
>> - I noticed that last_value includes the cached values, but that also seems to me like the correct thing to do.
>
> The documentation now emphasizes that this is the value stored on disk. This matches what Oracle does.

We also store is_called on disk; I think that if is_called is false then last_value should be NULL. Either that or we should add is_called.

I will give the patch another review some time this week when I can find the time.

Andreas
On 11/10/2016 06:27 AM, Andreas Karlsson wrote:
> On 11/10/2016 05:29 AM, Peter Eisentraut wrote:
>> On 11/8/16 6:43 PM, Andreas Karlsson wrote:
>>> - Shouldn't last_value be NULL directly after we have created the sequence but nobody has called nextval() yet?
>>>
>>> - I noticed that last_value includes the cached values, but that also seems to me like the correct thing to do.
>>
>> The documentation now emphasizes that this is the value stored on disk. This matches what Oracle does.
>
> We also store is_called on disk; I think that if is_called is false then last_value should be NULL. Either that or we should add is_called.
>
> I will give the patch another review some time this week when I can find the time.

Other than my comment above about is_called and last_value, I think the patch looks great.

Andreas
Review for pg_sequence catalog

I like this change since it moves all the parts which should be transactional to the system catalogs, while keeping only the non-transactional stuff in the sequence relations. There was some discussion upthread about more compact representations for the sequences, but I feel that is a separate issue mostly unrelated to this patch.

I might play around more with it, but it seems to work well so far. As pointed out by Peter, this patch also requires the changes to pg_upgrade. I have not looked at those patches.

= Functional review

- The patch applies and compiles and seems to work fine after some quick manual testing.

- The pg_dump tests fail due to the pg_dump code not being updated. I have attached a patch which fixes this.

= Benchmarks

I was a bit worried that the extra syscache lookups might slow down nextval(), but I got a measurable speedup on a very simple workload which consisted of only calls to nextval() for one sequence. The speedup was about 10% on my machine.

= Code

The changes to the code look generally good.

> @@ -1155,6 +1156,8 @@ doDeletion(const ObjectAddress *object, int flags)
>  			else
>  				heap_drop_with_catalog(object->objectId);
>  		}
> +		if (relKind == RELKIND_SEQUENCE)
> +			DeleteSequenceTuple(object->objectId);
>  		break;
>  	}

I think it might be cleaner here to put this as an "else if", just like "relKind == RELKIND_INDEX".

= Documentation

The patch does not update catalogs.sgml, which it should do.

Andreas
Attachment
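The microbenchmark described above could look roughly like the following; the sequence name, file name, and pgbench options are assumptions for illustration, not details taken from the review:

-- bench_nextval.sql: a one-statement custom pgbench script
-- Setup (run once beforehand):  CREATE SEQUENCE bench_seq;
-- Run with something like:      pgbench -n -f bench_nextval.sql -c 1 -T 30
SELECT nextval('bench_seq');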
On 11/11/16 12:53 PM, Andreas Karlsson wrote:
> Other than my comment above about is_called and last_value, I think the patch looks great.

I have made that change and committed the patch. Thanks.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 11/11/16 10:06 PM, Andreas Karlsson wrote:
> As pointed out by Peter, this patch also requires the changes to pg_upgrade. I have not looked at those patches.

The required changes to pg_upgrade have been committed, so that will work now.

> - The pg_dump tests fail due to the pg_dump code not being updated. I have attached a patch which fixes this.

fixed

> I was a bit worried that the extra syscache lookups might slow down nextval(), but I got a measurable speedup on a very simple workload which consisted of only calls to nextval() for one sequence. The speedup was about 10% on my machine.

I have done a fair amount of performance testing and the results were usually in the neighborhood of 1%-2% slower or faster, nothing really clear. But running nextval by itself in a tight loop isn't really a normal use. The important thing is the concurrency behavior.

>> @@ -1155,6 +1156,8 @@ doDeletion(const ObjectAddress *object, int flags)
>>  			else
>>  				heap_drop_with_catalog(object->objectId);
>>  		}
>> +		if (relKind == RELKIND_SEQUENCE)
>> +			DeleteSequenceTuple(object->objectId);
>>  		break;
>>  	}
>
> I think it might be cleaner here to put this as an "else if", just like "relKind == RELKIND_INDEX".

The sequence tuple has to be deleted in addition to the heap drop. I have added a comment to make that clearer.

> The patch does not update catalogs.sgml, which it should do.

added

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
I think this patch looks good now so I am setting it to ready for committer.

I like the idea of the patch, and I think that while this change will break some tools which look at the sequence relations, the advantages are worth it (for example, making more sequence DDL respect MVCC).

Andreas
On Fri, Dec 2, 2016 at 1:47 PM, Andreas Karlsson <andreas@proxel.se> wrote:
> I think this patch looks good now so I am setting it to ready for committer.
>
> I like the idea of the patch, and I think that while this change will break some tools which look at the sequence relations, the advantages are worth it (for example, making more sequence DDL respect MVCC).
Moved to next CF with the same status (ready for committer).
Regards,
Hari Babu
Fujitsu Australia
On 12/1/16 9:47 PM, Andreas Karlsson wrote:
> I think this patch looks good now so I am setting it to ready for committer.

committed, thanks

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Dec 20, 2016 at 7:14 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> On 12/1/16 9:47 PM, Andreas Karlsson wrote:
>> I think this patch looks good now so I am setting it to ready for committer.
>
> committed, thanks

The regression test for the hot standby check fails since it uses the following statement:

-select min_value as sequence_min_value from hsseq;

which is no longer supported, I guess. It should be modified as follows:

select min_value as sequence_min_value from pg_sequences where sequencename = 'hsseq';

Attached is a patch which reflects the above changes.

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
Attachment
On 1/3/17 7:23 AM, Kuntal Ghosh wrote:
> The regression test for the hot standby check fails since it uses the following statement:
>
> -select min_value as sequence_min_value from hsseq;
>
> which is no longer supported, I guess. It should be modified as follows:
>
> select min_value as sequence_min_value from pg_sequences where sequencename = 'hsseq';
>
> Attached is a patch which reflects the above changes.

Fixed, thanks.

I made a note to self to port this test to the TAP framework.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 01/03/2017 03:30 PM, Peter Eisentraut wrote:
> On 1/3/17 7:23 AM, Kuntal Ghosh wrote:
>> The regression test for the hot standby check fails since it uses the following statement:
>>
>> -select min_value as sequence_min_value from hsseq;
>>
>> which is no longer supported, I guess. It should be modified as follows:
>>
>> select min_value as sequence_min_value from pg_sequences where sequencename = 'hsseq';
>>
>> Attached is a patch which reflects the above changes.
>
> Fixed, thanks.
>
> I made a note to self to port this test to the TAP framework.

Hm, doesn't this change the intent of the test case? As I read the test, it seems to make sure that we are allowed to do a read from a sequence relation on the slave. If so, I think it should be changed to something like the below.

select is_called from hsseq;

Andreas
On 1/3/17 1:17 PM, Andreas Karlsson wrote:
> Hm, doesn't this change the intent of the test case? As I read the test, it seems to make sure that we are allowed to do a read from a sequence relation on the slave. If so, I think it should be changed to something like the below.
>
> select is_called from hsseq;

You're right. Fixed that way.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services