Re: Built-in Raft replication - Mailing list pgsql-hackers

From	Konstantin Osipov
Subject	Re: Built-in Raft replication
Date	April 16, 2025 12:53:09
Msg-id	Z_9-BR89w-DLeFv3@ark
In response to	Re: Built-in Raft replication (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
List	pgsql-hackers

* Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> [25/04/16 11:06]:
> > My view is what Konstantin wants is automatic replication topology management. For some reason this technology is
> > called HA, DCS, Raft, Paxos and many other scary words. But basically it manages primary_conn_info of some nodes to
> > provide some fault-tolerance properties. I'd start to design from here, not from Raft paper.
 
> >
> In my experience, the load of managing hundreds of replicas which all
> participate in RAFT protocol becomes more than regular transaction
> load. So making every replica a RAFT participant will affect the
> ability to deploy hundreds of replica.

I think this experience needs to be described in more detail.
Some implementations in the field are less efficient than others.

Early etcd-raft didn't have pre-voting and had a "bastardized"
(their own word) implementation of configuration changes
which didn't use joint consensus.

Then there is a liveness issue if leader election is implemented
in a straightforward way in large clusters. But this can be
addressed by scaling up the randomized election timeout with the
cluster size and by converting most of the participants to
non-voters in large clusters.
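
To make the two mitigations concrete, here is a minimal Python sketch.
It is an illustration only, not code from any existing Raft
implementation. The names BASE_TIMEOUT_MS, election_timeout_ms and
split_membership, the logarithmic scaling rule and the cap of 5 voters
are all my assumptions.

```python
import math
import random

BASE_TIMEOUT_MS = 150  # hypothetical base election timeout


def election_timeout_ms(voters: int, rng: random.Random) -> float:
    """Pick a randomized election timeout whose spread grows with the voter count.

    Assumption: the randomization window widens logarithmically with the
    number of voters, so that candidates in a large cluster are less
    likely to time out at the same moment and split the vote.
    """
    if voters < 1:
        raise ValueError("a cluster needs at least one voter")
    spread = BASE_TIMEOUT_MS * (1 + math.log2(voters))
    return rng.uniform(BASE_TIMEOUT_MS, BASE_TIMEOUT_MS + spread)


def split_membership(nodes: list, max_voters: int = 5) -> tuple:
    """Keep a small voting core; every other node becomes a non-voter.

    Non-voters still receive the log but take no part in elections or
    in commit quorums, so their number does not affect liveness.
    """
    return nodes[:max_voters], nodes[max_voters:]


if __name__ == "__main__":
    voters, non_voters = split_membership([f"node{i}" for i in range(200)])
    print(len(voters), "voters,", len(non_voters), "non-voters")
    print(election_timeout_ms(len(voters), random.Random()))
```

With a single voter the window above is 150-300 ms. How fast it should
grow with the cluster size is a tuning question that the sketch does
not try to answer.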

Raft replication, again, if implemented in a naive way, would
require O(outstanding transactions) * number of replicas of RAM.
But the implementation doesn't have to be naive.
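
One way to avoid the per-replica multiplication, sketched in Python
under my own assumptions: keep a single copy of the outstanding
entries and track only an integer position per replica, so memory is
O(outstanding transactions) + O(number of replicas). The SharedLog
class and its method names are hypothetical and only illustrate the
idea.

```python
class SharedLog:
    """One shared buffer of outstanding entries plus a cursor per replica."""

    def __init__(self):
        self.entries = []      # single copy of each outstanding entry
        self.first_index = 0   # log index of entries[0]
        self.next_index = {}   # replica name -> next log index to send

    def add_replica(self, name):
        # A new replica starts from the oldest entry still retained.
        self.next_index[name] = self.first_index

    def append(self, entry):
        self.entries.append(entry)

    def pending_for(self, name):
        # Entries this replica has not acknowledged yet. The slice holds
        # references to the shared entries, it does not copy payloads.
        start = self.next_index[name] - self.first_index
        return self.entries[start:]

    def ack(self, name, upto):
        # The replica confirms every entry with a log index below `upto`.
        self.next_index[name] = max(self.next_index[name], upto)
        self._truncate()

    def _truncate(self):
        # Drop entries that every replica has already acknowledged.
        low = min(self.next_index.values())
        drop = low - self.first_index
        if drop > 0:
            del self.entries[:drop]
            self.first_index = low
```

A lagging replica still pins entries in the buffer, so a real
implementation would need a policy for that case as well. The sketch
leaves it out.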

To sum up, I am not aware of any fundamental limitations in this
area.

-- 
Konstantin Osipov, Moscow, Russia
