Thread: Global Sequences
Sequences, as defined by SQL Standard, provide a series of unique values. The current implementation on PostgreSQL isolates the generation mechanism to only a single node, as is common on many RDBMS.

For sharded or replicated systems it forces people to various hackish mechanisms in user space for emulating a global or cluster-wide sequence.

The solution to this problem is an in-core solution that allows coordination between nodes to guarantee unique values.

There are a few options
1) Manual separation of the value space, so that N1 has 50% of possible values and N2 has 50%. That has problems when we reconfigure the cluster, and requires complex manual reallocation of values. So it starts good but ends badly.
2) Automatic separation of the value space. This could mimic the manual operation, so it does everything for you - but that's just making a bad idea automatic.
3) Lazy allocation from the value space. When a node is close to running out of values, it requests a new allocation and coordinates with all nodes to confirm the new allocation is good.

(3) is similar to the way values are allocated currently, so the only addition is a multi-node allocation algorithm to allocate new value ranges. That seems to be the best way to go. Any implementation for that presumes how the node configuration and inter-node transport works, which we would like to keep open for use by various external tools.

So, proposal is to allow nextval() allocation to access a plugin, rather than simply write a WAL record and increment. If the plugin is loaded all sequences call it (not OIDs). We'd call this the Global Sequence API. The API looks like it would be pretty stable to me. We can put something in contrib if required to prove it works, as well as providing some optional caching to avoid noticeable performance effects.

Note that if you did just want to implement manual separation of ranges then this would also make it slightly easier, so this approach supports all flavors, which a more hardcoded solution would not.

Any comments before I demonstrate a patch to do this? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
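For illustration, here is a rough sketch of the kind of hook this proposal points at: a single process-global function pointer that, when set by a loaded plugin, is consulted by nextval() instead of the built-in WAL-logged increment. Every name and signature below is an assumption made for the sketch, not an existing PostgreSQL symbol, and plain C types stand in for the usual Oid/int64 typedefs.

    #include <stdint.h>
    #include <stddef.h>

    /* Plugin-provided allocator: given a sequence (identified here by its OID
     * as a plain integer) and its configured increment, return the next value. */
    typedef int64_t (*global_nextval_hook_type)(uint32_t seq_oid, int64_t increment);

    /* NULL means "no plugin loaded"; nextval() then behaves exactly as today. */
    global_nextval_hook_type global_nextval_hook = NULL;

    static int64_t
    nextval_sketch(uint32_t seq_oid, int64_t increment)
    {
        if (global_nextval_hook != NULL)
            return global_nextval_hook(seq_oid, increment);   /* plugin decides */

        /* ... otherwise the existing path: WAL record plus local increment ... */
        return 0;   /* placeholder for the built-in behaviour */
    }

A replication plugin would set the pointer from its _PG_init()-style entry point; with the pointer unset, single-node behaviour is unchanged, which is what makes the plugin optional.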
On 10/15/12 2:33 PM, Simon Riggs wrote: > 3) Lazy allocation from the value space. When a node is close to > running out of values, it requests a new allocation and coordinates > with all nodes to confirm the new allocation is good. > > (3) is similar to the way values are allocated currently, so the only > addition is a multi-node allocation algorithm to allocate new value > ranges. That seems to be the best way to go. Any implementation for > that presumes how the node configuration and inter-node transport > works, which we would like to keep open for use by various external > tools. So you're talking about allocating a block of values for each node? Seems straightforward. That does mean that there needs to be a "master" node in charge of the allocations, though, yes? How would this be tooled/managed? I'd also love to hear from the PostgresXC folks on whether this solution works for them. Postgres-R too. If it works for all three of those tools, it's liable to work for any potential new tool. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Mon, Oct 15, 2012 at 2:33 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > Sequences, as defined by SQL Standard, provide a series of unique > values. The current implementation on PostgreSQL isolates the > generation mechanism to only a single node, as is common on many > RDBMS. > > For sharded or replicated systems it forces people to various hackish > mechanisms in user space for emulating a global or cluster-wide > sequence. > > The solution to this problem is an in-core solution that allows > coordination between nodes to guarantee unique values. > > There are a few options > 1) Manual separation of the value space, so that N1 has 50% of > possible values and N2 has 50%. That has problems when we reconfigure > the cluster, and requires complex manual reallocation of values. So it > starts good but ends badly. > 2) Automatic separation of the value space. This could mimic the > manual operation, so it does everything for you - but thats just > making a bad idea automatic > 3) Lazy allocation from the value space. When a node is close to > running out of values, it requests a new allocation and coordinates > with all nodes to confirm the new allocation is good. While useful to some people, it seems like a way to avoid a crazy amount of complexity whereas most people are fine just using uncoordinated 128-bit integers. There are some who want temporal locality and smaller datums, but couldn't the problem be minimized somewhat by presuming non-coordinated identifier generation? It seems like a proper subset of what you propose, so perhaps that's a nice way to bisect the problem. I agree with you that an in-database way to do this -- even if in principle it could be done by clients most of the time -- would lend a lot more cohesion to the system. FWIW, I like "3" otherwise -- much like the foibles of most partitioning schemes and some of the stated design considerations of segment exclusion, I think absolute rigidity at all times makes it really hard to gradually move things into a desired alignment incrementally while the system is online. -- fdr
Josh, * Josh Berkus (josh@agliodbs.com) wrote: > I'd also love to hear from the PostgresXC folks on whether this solution > works for them. Postgres-R too. If it works for all three of those > tools, it's liable to work for any potential new tool. AIUI, PG-XC and PG-R need an order, so they both use an independent system (eg, the PG-XC GTM) to provide that ordering. Again, AIUI, Simon's proposal would not guarantee any ordering but instead would only guarantee non-overlap. Since the proposal being pushed appeared to involve all the complexity of dealing with something like a GTM, by having to have some third system which manages the allocations, figure out what to do if it isn't available, etc, perhaps the requirement to provide an ordering should be added on to it and then the PG-XC GTM simply used for it. If we're not going to have an ordering requirement then I'm not convinced that the pre-allocation approach (where you break the space up into many more blocks than you would ever expect to use and then double-up those blocks on to the same physical system) is a bad solution. It's certainly also been done a number of times, typically quite successfully. Thanks, Stephen
Stephen, > AIUI, PG-XC and PG-R need an order, so they both use an independent > system (eg, the PG-XC GTM) to provide that ordering. You're thinking of XIDs. This is a proposal for user-defined sequences. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
* Josh Berkus (josh@agliodbs.com) wrote: > > AIUI, PG-XC and PG-R need an order, so they both use an independent > > system (eg, the PG-XC GTM) to provide that ordering. > > You're thinking of XIDs. This is a proposal for user-defined sequences. Right, I got that it's a proposal for user sequences. I stand by my comments regarding that proposal- if you're going to have all the complexity of having to talk to some other system to figure out what value(s) you can use, you might as well get an ordering with it. If that's not necessary, then doing pre-allocation is probably sufficient and would provide a direct way, without having to ask any other system, to figure out which system a given value is on. Thanks, Stephen
On Tue, Oct 16, 2012 at 10:30 AM, Josh Berkus <josh@agliodbs.com> wrote: > Stephen, >> AIUI, PG-XC and PG-R need an order, so they both use an independent >> system (eg, the PG-XC GTM) to provide that ordering. > You're thinking of XIDs. This is a proposal for user-defined sequences. XC also provides global sequence values with the GTM. -- Michael Paquier http://michael.otacoo.com
On Mon, Oct 15, 2012 at 5:33 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > Sequences, as defined by SQL Standard, provide a series of unique > values. The current implementation on PostgreSQL isolates the > generation mechanism to only a single node, as is common on many > RDBMS. I remember constructing at least the thought process surrounding your "option #3" as the one thing I arrived at that seemed as though it might be valuable for the many-years-ago Slony-II summit. The notion of having each node give out sections of a sequence seems pretty viable; as a hugely loose analogy, DHCP servers take a somewhat similar approach in assigning IP addresses in ranges shared across those servers. At the time, back in, I think, 2005, there was some agreement that it was a viable idea to give out chunks of sequence range; it wasn't one of the tough problems warranting Heavy Thought, so there wasn't any concentration on it, and that pretty much all went by the wayside. Today, I am somewhat skeptical that there's much value to the exercise. It isn't difficult to come up with pretty unique values, between constructing something with a node ID prefix or such, or using a DCE UUID that is very likely to be globally unique. The reason to want a "global sequence" is that it's supposed to give out values in pretty much a sequential order. But a "shared global sequence" will have aspects of that that are decidedly *not* in sequential order. If it's partially *un*ordered, I suspect this undermines the value of it. There's a necessary trade-off; you can either have it globally *strongly* ordered, and, if so, you'll have to pay a hefty coordination price, or you can have the cheaper answer of a weakly ordered sequence. The latter leaves me feeling rather "meh." -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
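As a concrete illustration of the "node ID prefix" approach mentioned above, a minimal sketch; the 16/48-bit split and the function name are arbitrary assumptions:

    #include <stdint.h>

    /* Reserve the top 16 bits of a 64-bit value for a node identifier and
     * fill the low 48 bits from a purely local counter.  Values from
     * different nodes can never collide, but, as noted, they are only
     * ordered within a node, not globally. */
    static int64_t
    node_prefixed_id(uint16_t node_id, uint64_t local_counter)
    {
        return (int64_t) (((uint64_t) node_id << 48) |
                          (local_counter & ((UINT64_C(1) << 48) - 1)));
    }

Node 3 would then hand out 0x0003000000000000, 0x0003000000000001, and so on: collision-free across nodes, but ordered only within each node.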
* Christopher Browne (cbbrowne@gmail.com) wrote: > There's a necessary trade-off; you can either have it globally > *strongly* ordered, and, if so, you'll have to pay a hefty > coordination price, or you can have the cheaper answer of a weakly > ordered sequence. The latter leaves me feeling rather "meh." If all the systems involved are local to the system giving out the sequences, a la PG-XC's GTM, I don't believe it's really all *that* expensive. Thanks, Stephen
On 16 October 2012 03:03, Christopher Browne <cbbrowne@gmail.com> wrote: > There's a necessary trade-off; you can either have it globally > *strongly* ordered, and, if so, you'll have to pay a hefty > coordination price, or you can have the cheaper answer of a weakly > ordered sequence. The latter leaves me feeling rather "meh." Oracle allows you to define whether you want ORDER or not for a sequence when used in clustered mode. Requesting a sequence to be strongly ordered across a generic distributed system is very much like asking for performance=none and high_availability=off, which is why I didn't suggest it. So you're right about the "hefty coordination price" but our conclusions differ because of our understanding of that price. I don't think it makes sense to spend the time implementing that option. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2012-10-15 23:33, Simon Riggs wrote: > So, proposal is to allow nextval() allocation to access a plugin, > rather than simply write a WAL record and increment. If the plugin is > loaded all sequences call it (not OIDs). +1. It is currently impossible to alter nextval's behaviour without making changes in core. It is possible to define an alternative implementation and try to force its use via the search_path, but serial datatypes are always bound to pg_catalog.nextval(). This would enable every distributed PostgreSQL system to make a cleaner implementation for global sequences than they currently have, and would also encourage reuse of distributed nextval implementations. regards, Yeb -- Yeb Havinga http://www.mgrid.net/ Mastering Medical Data
On 10/16/2012 12:47 AM, Josh Berkus wrote: > I'd also love to hear from the PostgresXC folks on whether this solution > works for them. Postgres-R too. In Postgres-R, option 3) is implemented. Though, by default sequences work just like on a single machine, giving you monotonically increasing sequence values - independent from the node you call nextval() from. IMO that's the user's expectation. (And yes, this has a performance penalty. But no, there's no compromise in availability). It is implemented very much like the per-backend cache we already have in vanilla Postgres, but taken to the per-node level. This gives the user a nice compromise between strongly ordered and entirely random values, allowing fine-tuning the trade off between performance and laziness in the ordering (think of CACHE 10 vs. CACHE 10000). > If it works for all three of those > tools, it's liable to work for any potential new tool. In Postgres-R, this per-node cache uses additional attributes in the pg_sequence system catalog to store state of this cache. This is something I'm sure is not feasible to do from within a plugin. Why does a "Global Sequences" API necessarily hook at the nextval() and setval() level? That sounds like it yields an awkward amount of duplicate work. Reading this thread, so far it looks like we agree that option 3) is the most feasible optimization (the strict ordering being the un-optimized starting point). Do we really need an API that allows for implementations of options 1) and 2)? What I'd appreciate more is a common implementation for option 3) with an API to plug in different solutions to the underlying consensus problem. Regards Markus Wanner
On 16 October 2012 13:26, Markus Wanner <markus@bluegap.ch> wrote: > Why does a "Global Sequences" API necessarily hook at the nextval() and > setval() level? That sounds like it yields an awkward amount of > duplicate work. Reading this thread, so far it looks like we agree that > option 3) is the most feasible optimization (the strict ordering being > the un-optimized starting point). Do we really need an API that allows > for implementations of options 1) and 2)? Where else would you put the hook? The hook's location as described won't change whether you decide you want 1, 2 or 3. > What I'd appreciate more is a common implementation for option 3) with > an API to plug in different solutions to the underlying consensus problem. Implementations will be similar, differing mostly in the topology and transport layer, which means it's not going to be possible to provide such a thing initially without slowing it down to the point we don't actually get it at all. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 10/15/12 5:33 PM, Simon Riggs wrote: > There are a few options > 1) Manual separation of the value space, so that N1 has 50% of > possible values and N2 has 50%. That has problems when we reconfigure > the cluster, and requires complex manual reallocation of values. So it > starts good but ends badly. > 2) Automatic separation of the value space. This could mimic the > manual operation, so it does everything for you - but thats just > making a bad idea automatic > 3) Lazy allocation from the value space. When a node is close to > running out of values, it requests a new allocation and coordinates > with all nodes to confirm the new allocation is good. What would the allocation service look like? Is it another PostgreSQL server? What's the communication protocol? How would backups work? Crash recovery? Option 4 is of course to use UUIDs.
On 10/16/2012 08:54 AM, Peter Eisentraut wrote: > Option 4 is of course to use UUIDs. Yeah, I was wondering what this would really solve that using UUIDs wouldn't solve. cheers andrew
On Tuesday, October 16, 2012 02:58:11 PM Andrew Dunstan wrote: > On 10/16/2012 08:54 AM, Peter Eisentraut wrote: > > Option 4 is of course to use UUIDs. > > Yeah, I was wondering what this would really solve that using UUIDs > wouldn't solve. Large indexes over random values perform notably worse than mostly/completely ordered ones, as the latter can be packed nearly perfectly. Besides that, uuids have 4 times the storage overhead of an int4 and twice that of an int8. That influences both query and modification performance. Also, not allowing plainly numeric pk's makes porting an application pretty annoying... Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 16 October 2012 13:54, Peter Eisentraut <peter_e@gmx.net> wrote: > On 10/15/12 5:33 PM, Simon Riggs wrote: >> There are a few options >> 1) Manual separation of the value space, so that N1 has 50% of >> possible values and N2 has 50%. That has problems when we reconfigure >> the cluster, and requires complex manual reallocation of values. So it >> starts good but ends badly. >> 2) Automatic separation of the value space. This could mimic the >> manual operation, so it does everything for you - but thats just >> making a bad idea automatic >> 3) Lazy allocation from the value space. When a node is close to >> running out of values, it requests a new allocation and coordinates >> with all nodes to confirm the new allocation is good. > > What would the allocation service look like? Is it another PostgreSQL > server? What's the communication protocol? How would backups work? > Crash recovery? I've proposed a plugin for the allocation only. So the allocation looks like anything you want. Crash recovery and backups would not need changes. > Option 4 is of course to use UUIDs. That is a user-level option. If a user chooses sequences, then we must support them. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > So, proposal is to allow nextval() allocation to access a plugin, > rather than simply write a WAL record and increment. If the plugin is > loaded all sequences call it (not OIDs). I think this is a fundamentally wrong way to go about doing what you want to do. It presumes that DDL-level manipulation of global sequences is exactly like local sequences; an assumption that is obviously false. What you really want is something vaguely like nextval but applied to a distinct type of object. That is, I think we first need a different kind of object called a "global sequence" with its own DDL operations. The nearby thread about "message queue" objects seems rather closely related. Perhaps it would be fruitful to think about the commonalities involved in two (or more?) new relkinds for global objects. regards, tom lane
On 16 October 2012 15:15, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> So, proposal is to allow nextval() allocation to access a plugin, >> rather than simply write a WAL record and increment. If the plugin is >> loaded all sequences call it (not OIDs). > > I think this is a fundamentally wrong way to go about doing what you > want to do. It presumes that DDL-level manipulation of global sequences > is exactly like local sequences; an assumption that is obviously false. > What you really want is something vaguely like nextval but applied to > a distinct type of object. That is, I think we first need a different > kind of object called a "global sequence" with its own DDL operations. > > The nearby thread about "message queue" objects seems rather closely > related. Perhaps it would be fruitful to think about the commonalities > involved in two (or more?) new relkinds for global objects.

The message queue concept doesn't exist at all yet, so when we create it we can specify anything we want. That is a different situation and hence a different solution. CREATE SEQUENCE is SQL Standard and used by SERIAL, many people's SQL, SQL generation tools, etc. My objective is to come up with something that makes the standard code work correctly in a replicated environment.

If rewriting the application was acceptable, we could just do as Peter suggests and say "use UUIDs". Many other people who think rewriting everything is OK spell that "CouchDB", etc. But that doesn't solve the problem at hand, which is making existing things work, rather than forcing people to rethink and rewrite. So CREATE GLOBAL SEQUENCE as a new kind of object altogether wouldn't solve the problem I'm trying to address.

I guess we could use a decoration syntax on a sequence, like this...

ALTER SEQUENCE foo GLOBAL

or

ALTER SEQUENCE foo ALLOCATION FUNCTION myglobalalloc();  -- if we cared to specify the alloc function on a per-object basis

or

ALTER SEQUENCE foo WITH (allocation=global);

So that we can explicitly specify which sequences to replicate globally and which locally. We'd need to have a userset GUC default_sequence_allocation = local (default) | global so that SERIAL and other new sequences could pick up the required behaviour when created.

In any case, we need to have a plugin/user definable allocation function for the next few years at least.

Thoughts? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On 16 October 2012 15:15, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think this is a fundamentally wrong way to go about doing what you >> want to do. It presumes that DDL-level manipulation of global sequences >> is exactly like local sequences; an assumption that is obviously false. > The message queue concept doesn't exist at all yet, so when we create > it we can specify anything we want. That is a different situation and > hence a different solution. CREATE SEQUENCE is SQL Standard and used > by SERIAL, many people's SQL, SQL generation tools etc.. My objective > is to come up with something that makes the standard code work > correctly in a replicated environment. I challenge you to find anything in the SQL standard that suggests that sequences have any nonlocal behavior. If anything, what you propose violates the standard, it doesn't make us follow it more closely. Furthermore, I find it hard to believe that people won't want both local and global sequences in the same database --- so one way or the other we need some DDL-level reflection of the difference. A larger point though is that the various implementation choices you mentioned probably need to be configurable by DDL options. I doubt that it will work well to say "install plugin A to get behavior X, or install plugin B to get behavior Y, and whichever you choose is not further configurable, it'll be the same behavior for all sequences". So I fully expect that we're going to need something different from bog-standard CREATE SEQUENCE. Exactly what isn't clear --- but I think modifying nextval's behavior is way down the list of concerns. regards, tom lane
On 10/16/12 9:20 AM, Simon Riggs wrote: > I've proposed a plugin for the allocation only. So the allocation > looks like anything you want. Are you planning to provide a reference implementation of some kind?
On Tue, Oct 16, 2012 at 5:54 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > On 10/15/12 5:33 PM, Simon Riggs wrote: >> There are a few options >> 1) Manual separation of the value space, so that N1 has 50% of >> possible values and N2 has 50%. That has problems when we reconfigure >> the cluster, and requires complex manual reallocation of values. So it >> starts good but ends badly. >> 2) Automatic separation of the value space. This could mimic the >> manual operation, so it does everything for you - but thats just >> making a bad idea automatic >> 3) Lazy allocation from the value space. When a node is close to >> running out of values, it requests a new allocation and coordinates >> with all nodes to confirm the new allocation is good. > > What would the allocation service look like? Is it another PostgreSQL > server? What's the communication protocol? How would backups work? > Crash recovery? As a reasonable proxy for the first question, one may look at how Twitter uses their home-grown software, snowflake. https://github.com/twitter/snowflake A colleague, Blake Mizerany, wrote a smaller version called "noeqd", based on the same ideas, but he wanted something with fewer dependencies. Unless you are very Java-library-and-toolchain adept you might find this more fun to play with. https://github.com/bmizerany/noeqd -- fdr
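For orientation, a sketch of the rough shape of id such services hand out; the 41/10/12-bit split below follows the commonly described snowflake layout and is not taken from either project's code:

    #include <stdint.h>

    /* Time-ordered 64-bit id: milliseconds since some custom epoch in the high
     * bits, then a worker/node id, then a per-millisecond counter.  Ids stay
     * roughly time-ordered across nodes without coordination beyond handing
     * out worker ids and keeping clocks loosely in sync. */
    static int64_t
    snowflake_style_id(uint64_t ms_since_epoch, uint16_t worker_id, uint16_t counter)
    {
        return (int64_t) (((ms_since_epoch & ((UINT64_C(1) << 41) - 1)) << 22) |
                          ((uint64_t) (worker_id & 0x3FF) << 12) |
                          (counter & 0xFFF));
    }

The point, relative to plain UUIDs, is that the ids remain roughly time-ordered and still fit in a bigint.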
On 16 October 2012 17:15, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On 16 October 2012 15:15, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I think this is a fundamentally wrong way to go about doing what you >>> want to do. It presumes that DDL-level manipulation of global sequences >>> is exactly like local sequences; an assumption that is obviously false. > >> The message queue concept doesn't exist at all yet, so when we create >> it we can specify anything we want. That is a different situation and >> hence a different solution. CREATE SEQUENCE is SQL Standard and used >> by SERIAL, many people's SQL, SQL generation tools etc.. My objective >> is to come up with something that makes the standard code work >> correctly in a replicated environment. > > I challenge you to find anything in the SQL standard that suggests that > sequences have any nonlocal behavior. No need for challenge, I agree, the SQL standard doesn't speak about that. I didn't say it did. > If anything, what you propose > violates the standard, And so that doesn't follow, but anyway... > it doesn't make us follow it more closely. I wasn't arguing that my proposal did that (made us follow standard more closely). My point is that application code exists that expects sequences to Just Work, and so the aim of the proposal was to do that in a replicated environment as well as single node. > Furthermore, I find it hard to believe that people won't want both > local and global sequences in the same database --- so one way or the > other we need some DDL-level reflection of the difference. Agreed, which is why I proposed some DDL-level syntax. Was that OK? > A larger point though is that the various implementation choices you > mentioned probably need to be configurable by DDL options. I doubt that > it will work well to say "install plugin A to get behavior X, or install > plugin B to get behavior Y, and whichever you choose is not further > configurable, it'll be the same behavior for all sequences". Again, I accept that as of my last post, and I proposed syntax to provide it. > So I fully > expect that we're going to need something different from bog-standard > CREATE SEQUENCE. There's no point in that at all, as explained. It's sequences that need to work. We can already call my_nextval() rather than nextval() if we want a roll-your own sequence facility and can rewrite applications to call that, assuming UUID isn't appropriate. Please don't force people to rewrite their applications; it might not go in the direction we want. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 16 October 2012 17:17, Peter Eisentraut <peter_e@gmx.net> wrote: > On 10/16/12 9:20 AM, Simon Riggs wrote: >> I've proposed a plugin for the allocation only. So the allocation >> looks like anything you want. > > Are you planning to provide a reference implementation of some kind? I'll provide hooks and a stub for testing. A full implementation relies upon the physical transport used. For BDR, there will be a full working version that relies upon that. TPL, PGDG. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On 16 October 2012 17:15, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> So I fully >> expect that we're going to need something different from bog-standard >> CREATE SEQUENCE. > There's no point in that at all, as explained. It's sequences that > need to work. We can already call my_nextval() rather than nextval() > if we want a roll-your own sequence facility and can rewrite > applications to call that, assuming UUID isn't appropriate. I wasn't objecting to the concept of allowing nextval() to have overloaded behaviors; more saying that that wasn't where to start the design process. In particular, the reason proposing a hook first seems backwards is that if we have a catalog-level representation that some sequences are local and others not, we should be using that to drive the determination of whether to call a substitute function --- and maybe which one to call. For instance, I could see attaching a function OID to each sequence and then having nextval() call that function, instead of a hook per se. Or maybe better, invent a level of indirection like a "sequence access method" (comparable to index access methods) that provides a compatible set of substitute functions for sequence operations. If you want to override nextval() for a sequence, don't you likely also need to override setval(), currval(), etc? Not to mention overriding ALTER SEQUENCE's behavior. regards, tom lane
Simon, On 10/16/2012 02:36 PM, Simon Riggs wrote: > Where else would you put the hook? The hook's location as described > won't change whether you decide you want 1, 2 or 3. You assume we want an API that supports all three options. In that case, yes, the hooks need to be very general. Given that option 3 got by far the most support, I question whether we need such a highly general API. I envision an API that keeps the bookkeeping and cache lookup functionality within Postgres. So we have a single, combined-effort, known working implementation for that. What remains to be done within the plugin effectively is the consensus problem: it all boils down to the question of which node gets the next chunk of N sequence numbers. Where N can be 1 (default CACHE setting in Postgres) or any higher number for better performance (reduces the total communication overhead by a factor of N - or at least pretty close to that, if you take into account "lost" chunks due to node failures). A plugin providing that has to offer a method to request a global ordering and would have to trigger a callback upon reaching consensus with other nodes on who gets the next chunk of sequence numbers. That works for all N >= 1. And properly implements option 3 (but doesn't allow implementations of options 1 or 2, which I claim we don't need, anyway). > Implementations will be similar, differing mostly in the topology and > transport layer I understand that different users have different needs WRT transport layers - moving the hooks as outlined above still allows flexibility in that regard. What different topologies do you have in mind? I'd broadly categorize this all as multi-master. Do you need finer-grained differentiation? Or do you somehow include slaves (i.e. read-only transactions) in this process? As you yourself are saying, implementations will only differ in that way, let's keep the common code the same. And not require plugins to duplicate that. (This also allows us to use the system catalogs for bookkeeping, as another benefit). > which means its not going to be possible to provide > such a thing initially without slowing it down to the point we don't > actually get it at all. Sorry, I don't quite understand what you are trying to say here. Overall, thanks for bringing this up. I'm glad to see something happening in this area, after all. Regards Markus Wanner
Tom, On 10/16/2012 06:15 PM, Tom Lane wrote: > I challenge you to find anything in the SQL standard that suggests that > sequences have any nonlocal behavior. If anything, what you propose > violates the standard, it doesn't make us follow it more closely. If you look at a distributed database as a transparent equivalent of a single-node system, I'd say the SQL standard applies to the entire distributed system. From that point of view, I'd rather argue that any "local-only" behavior violates the standard. Regards Markus Wanner
On 17 October 2012 09:10, Markus Wanner <markus@bluegap.ch> wrote: > Simon, > > On 10/16/2012 02:36 PM, Simon Riggs wrote: >> Where else would you put the hook? The hook's location as described >> won't change whether you decide you want 1, 2 or 3. > > You assume we want an API that supports all three options. In that case, > yes, the hooks need to be very general. I'm not assuming that, so much of what you say is moot, though it is good and welcome input. > Given that option 3 got by far the most support, I question whether we > need such a highly general API. I envision an API that keeps the > bookkeeping and cache lookup functionality within Postgres. So we have a > single, combined-effort, known working implementation for that. IMHO an API is required for "give me the next allocation of numbers", essentially a bulk equivalent of nextval(). Anything lower level is going to depend upon implementation details that I don't think we should expose. I'm sure there will be much commonality between 2 similar implementations, just as there is similar code in each index type. But maintaining modularity is important and ahead of us actually seeing 2 implementations, trying to prejudge that is going to slow us all down and likely screw us up. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Simon, On 10/17/2012 10:34 AM, Simon Riggs wrote: > IMHO an API is required for "give me the next allocation of numbers", > essentially a bulk equivalent of nextval(). Agreed. That pretty exactly matches what I described (and what's implemented in Postgres-R). The API then only needs to be called every N invocations of nextval(), because otherwise nextval() can simply return a cached number previously allocated in a single step, eliminating a lot of the communication overhead. You realize an API at that level doesn't allow for an implementation of options 1 and 2? (Which I'm convinced we don't need, so that's fine with me). > Anything lower level is going to depend upon implementation details > that I don't think we should expose. Exactly. Just like we shouldn't expose other implementation details, like writing to system catalogs or WAL. > I'm sure there will be much commonality between 2 similar > implementations, just as there is similar code in each index type. But > maintaining modularity is important and ahead of us actually seeing 2 > implementations, trying to prejudge that is going to slow us all down > and likely screw us up. Agreed. Let me add that modularity only serves a purpose if the boundaries between the modules are chosen wisely. It sounds like we are on the same page, though. To put that to the test: IMHO an API for setval() is required to invalidate all nodes' caches and re-set an initial value, as a starting point for the next bulk of numbers that nextval() will return. currval() doesn't need to be changed or "hooked" at all, because it's a read-only operation. Regards Markus Wanner
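To make that division of labour concrete, a loose sketch under the assumptions above: core keeps a per-node cache and only asks the plugin the consensus question of which node gets the next chunk of N values. All types and names are invented for illustration, not an existing or agreed API.

    #include <stdint.h>

    /* Chunk granted once the nodes have agreed this node owns the range. */
    typedef struct SeqChunk {
        int64_t first;
        int64_t last;
    } SeqChunk;

    /* The only plugin-side piece: block until consensus is reached and the
     * cluster grants this node the next chunk of nvalues for the sequence. */
    typedef SeqChunk (*acquire_chunk_fn)(uint32_t seq_oid, int64_t nvalues);

    /* Per-node cache kept in core; initialise with next > last so the first
     * call fetches a chunk.  nvalues plays the role of the CACHE setting
     * (think CACHE 10 vs. CACHE 10000). */
    typedef struct SeqCache {
        int64_t next;
        int64_t last;
    } SeqCache;

    static int64_t
    cached_nextval(SeqCache *cache, uint32_t seq_oid, int64_t nvalues,
                   acquire_chunk_fn acquire_chunk)
    {
        if (cache->next > cache->last)      /* cache empty or exhausted */
        {
            SeqChunk chunk = acquire_chunk(seq_oid, nvalues);

            cache->next = chunk.first;
            cache->last = chunk.last;
        }
        return cache->next++;
    }

In this picture setval() has to invalidate every node's cache and seed a fresh chunk, which is why it needs hooking too, while currval() remains a purely local read.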
Simon Riggs <simon@2ndQuadrant.com> writes: > On 16 October 2012 15:15, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> What you really want is something vaguely like nextval but applied to >> a distinct type of object. That is, I think we first need a different >> kind of object called a "global sequence" with its own DDL operations. >> > hence a different solution. CREATE SEQUENCE is SQL Standard and used > by SERIAL, many people's SQL, SQL generation tools etc.. My objective > is to come up with something that makes the standard code work > correctly in a replicated environment. I think we can still have both. I like Tom's suggestion better, as it provides for a cleaner implementation in the long run, I think. Now, the way I see to get a GLOBAL SEQUENCE by default when creating a SERIAL column would be with an Event Trigger. To get there, we need to implement some kind of "INSTEAD OF" Event Trigger, and the good news is that we only need to do that in a very specialized command, not as a generic facility. At least as a starter. CREATE EVENT TRIGGER global_sequences_please ON ddl_create_sequence WHEN context in('generated') EXECUTE PROCEDURE create_global_sequence_instead_thanks(); That would take care of any SERIAL or BIGSERIAL column creation and leave alone manual CREATE SEQUENCE commands, as those would have a context of 'toplevel' as opposed to 'generated'. This context exposure and filtering is included in my current patch for Event Triggers that I'm submitting quite soon to the next commit fest. We would still need to implement the "ddl_create_sequence" event, which fires just before CREATE SEQUENCE, and refrain from creating the sequence if an event trigger was attached, whatever that trigger did. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 17 October 2012 11:21, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On 16 October 2012 15:15, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> What you really want is something vaguely like nextval but applied to >>> a distinct type of object. That is, I think we first need a different >>> kind of object called a "global sequence" with its own DDL operations. >>> >> hence a different solution. CREATE SEQUENCE is SQL Standard and used >> by SERIAL, many people's SQL, SQL generation tools etc.. My objective >> is to come up with something that makes the standard code work >> correctly in a replicated environment. > > I think we still can have both. I like Tom's suggestion better, as it > provides for a cleaner implementation in the long run, I think. Not sure how it is cleaner when we have to have trigger stuff hanging around to make one object pretend to be another. That also creates a chain of dependency which puts this into the future, rather than now. The goal is make-sequences-work, not to invent something new that might be cooler or more useful. If we create something new, then we need to consider the references Daniel described, but that is a whole different thing and already accessible if you need/want that. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 17 October 2012 10:19, Markus Wanner <markus@bluegap.ch> wrote: > On 10/17/2012 10:34 AM, Simon Riggs wrote: >> IMHO an API is required for "give me the next allocation of numbers", >> essentially a bulk equivalent of nextval(). > > Agreed. That pretty exactly matches what I described (and what's > implemented in Postgres-R). The API then only needs to be called every N > invocations of nextval(), because otherwise nextval() can simply return > a cached number previously allocated in a single step, eliminating a lot > of the communication overhead. > > You realize an API at that level doesn't allow for an implementation of > options 1 and 2? (Which I'm convinced we don't need, so that's fine with > me). > >> Anything lower level is going to depend upon implementation details >> that I don't think we should expose. > > Exactly. Just like we shouldn't expose other implementation details, > like writing to system catalogs or WAL. > >> I'm sure there will be much commonality between 2 similar >> implementations, just as there is similar code in each index type. But >> maintaining modularity is important and ahead of us actually seeing 2 >> implementations, trying to prejudge that is going to slow us all down >> and likely screw us up. > > Agreed. Let me add, that modularity only serves a purpose, if the > boundaries between the modules are chosen wisely. It sounds like we are > on the same page, though. > > To testify this: IMHO an API for setval() is required to invalidate all > node's caches and re-set an initial value, as a starting point for the > next bulk of numbers that nextval() will return. > > currval() doesn't need to be changed or "hooked" at all, because it's a > read-only operation. Agreed.... API calls for sam_nextval_alloc() and sam_setval() using Tom's Sequence Access Method naming. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
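A sketch of what that callback table could look like, borrowing the sam_* names used in this sub-thread; the struct and exact signatures are guesses for illustration, not an agreed API:

    #include <stdint.h>

    /* One "sequence access method": a pluggable set of substitute functions
     * for sequence operations, comparable to an index access method. */
    typedef struct SequenceAccessMethod {
        /* Bulk equivalent of nextval(): allocate the next range of values
         * for the sequence, e.g. by coordinating with the other nodes. */
        void (*sam_nextval_alloc)(uint32_t seq_oid, int64_t nvalues,
                                  int64_t *first, int64_t *last);

        /* setval(): re-seed the sequence and invalidate any cached ranges,
         * on every node. */
        void (*sam_setval)(uint32_t seq_oid, int64_t new_value);
    } SequenceAccessMethod;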
On 16 October 2012 18:29, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Or maybe better, invent a level of indirection like a "sequence access > method" (comparable to index access methods) that provides a compatible > set of substitute functions for sequence operations. If you want to > override nextval() for a sequence, don't you likely also need to > override setval(), currval(), etc? Not to mention overriding ALTER > SEQUENCE's behavior. Agreed, though with the exact API as discussed in the portion of the thread with Markus. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndquadrant.com> writes: > Not sure how it is cleaner when we have to have trigger stuff hanging > around to make one object pretend to be another. That also creates a > chain of dependency which puts this into the future, rather than now. Yes, that part isn't cleaner at all. The part where we have a separate Object to deal with I like better, and I tried to reconcile the two viewpoints. Note that the event trigger would come installed but disabled; the user would only have to activate it: ALTER EVENT TRIGGER distribute_my_cluster ENABLE; Still the same issue. > The goal is make-sequences-work, not to invent something new that > might be cooler or more useful. If we create something new, then we > need to consider the references Daniel described, but that is a whole > different thing and already accessible if you need/want that. So ok, I withdraw my consensus proposal. I tried. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Tue, Oct 16, 2012 at 1:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > In particular, the reason proposing a hook first seems backwards is that > if we have a catalog-level representation that some sequences are local > and others not, we should be using that to drive the determination of > whether to call a substitute function --- and maybe which one to call. > For instance, I could see attaching a function OID to each sequence > and then having nextval() call that function, instead of a hook per se. Yeah, I like that. That makes it easy to configure your database so that some sequences have special behavior (which the database designer can set up however they like) and others can be just vanilla, and the plugin doesn't have to try to figure out which ones are which (which was my first concern in reading Simon's original proposal). To make it even better, add some generic options that can be passed through to the underlying handler. So something like: ALTER SEQUENCE wump SET HANDLER (nextval my_magical_nextval, setval my_magical_setval) OPTIONS (any_label_you_want_the_handlers_to_get 'some_text_associated_with_the_label', another_label 'some_more_text'); That way you could say, for example, that sequence wump should get its values from coordinator node 172.24.16.93 and that the global identifier for this sequence is UUID e15ea6e6-43d5-4f65-8efd-cf28a14a2d70. That way you can avoid having to make any assumptions about how local sequence names on particular nodes are mapped onto global names. > Or maybe better, invent a level of indirection like a "sequence access > method" (comparable to index access methods) that provides a compatible > set of substitute functions for sequence operations. If you want to > override nextval() for a sequence, don't you likely also need to > override setval(), currval(), etc? Not to mention overriding ALTER > SEQUENCE's behavior. This might be better, but it's also possibly more mechanism than we truly need here. But then again, if we're going to end up with more than a handful of handlers, we probably do want to do this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Oct 16, 2012 at 1:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Or maybe better, invent a level of indirection like a "sequence access >> method" (comparable to index access methods) that provides a compatible >> set of substitute functions for sequence operations. If you want to >> override nextval() for a sequence, don't you likely also need to >> override setval(), currval(), etc? Not to mention overriding ALTER >> SEQUENCE's behavior. > This might be better, but it's also possibly more mechanism than we > truly need here. But then again, if we're going to end up with more > than a handful of handlers, we probably do want to do this. It's definitely a lot of mechanism, and if we can get away with something simpler that's fine with me. But I'd want to see a pretty bulletproof argument why overriding *only* nextval is sufficient (and always will be) before accepting a hook for just nextval. If we build an equivalent amount of functionality piecemeal it's going to be a lot uglier than if we recognize we need this type of concept up front. regards, tom lane
On 18 October 2012 16:08, Robert Haas <robertmhaas@gmail.com> wrote: > To make > it even better, add some generic options that can be passed through to > the underlying handler. Agreed. >> Or maybe better, invent a level of indirection like a "sequence access >> method" (comparable to index access methods) that provides a compatible >> set of substitute functions for sequence operations. If you want to >> override nextval() for a sequence, don't you likely also need to >> override setval(), currval(), etc? Not to mention overriding ALTER >> SEQUENCE's behavior. > > This might be better, but it's also possibly more mechanism than we > truly need here. But then again, if we're going to end up with more > than a handful of handlers, we probably do want to do this. Let me have a play and see what comes out simplest. Somewhere in the middle seems about right. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 18 October 2012 16:15, Tom Lane <tgl@sss.pgh.pa.us> wrote: > But I'd want to see a pretty > bulletproof argument why overriding *only* nextval is sufficient > (and always will be) before accepting a hook for just nextval. If we > build an equivalent amount of functionality piecemeal it's going to > be a lot uglier than if we recognize we need this type of concept > up front. We discussed elsewhere that we need only nextval() and setval(), but adding others is fairly cheap, so we can chew on that when we have a patch to discuss. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services