Home > mailing lists

Re: Proposal: Exploring LSM Tree‑Based Storage Engine for PostgreSQL (Inspired by MyRocks) - Mailing list pgsql-hackers

From	Aleksander Alekseev
Subject	Re: Proposal: Exploring LSM Tree‑Based Storage Engine for PostgreSQL (Inspired by MyRocks)
Date	May 12 23:58:49
Msg-id	CAJ7c6TNnz8aER62BO6Zu-QHZiR-cLGV=rir1XbsDPgtpcPoefw@mail.gmail.com Whole thread Raw
List	pgsql-hackers

Tree view

Hi Manish,

> I’ve been exploring the idea of integrating an LSM tree–based storage engine into PostgreSQL — similar in spirit to
MyRocksfor MySQL — by replacing the underlying storage while preserving PostgreSQL’s upper layers (planner, executor,
MVCC,etc.). 
> [...]

Personally I wouldn't expect too much from RocksDB considering that
historically LSM-trees were designed to improve performance on HDDs.
The idea was to replace random writes with sequential ones. These days
we seem to be shifting to SSDs so this approach should be beneficial
in relatively niche cases when the amount of data is so large that
it's economically justified to buy HDDs... and you still want Postgres
for these tens or hundreds of terabytes of data, as a single instance
(no FDWs etc). In theory at least.

This being said, I don't want to discourage you from implementing a
TAM that would use RocksDB. This would be an excellent project for
several reasons:

* We could check if the above statement is true or false
* This could be considered as a reference 3rd party TAM implementation
* This also would be an implementation of index-organized tables
* Perhaps you will discover certain shortcomings of TAM API in the
process so we could improve it
* This promises to be a very-very fun project

> I’d love to hear your thoughts:
>
> Does this direction make sense for experimentation within the Postgres ecosystem?

Yes.

> Are there known architectural blockers or prior discussions/attempts in this space worth revisiting?

From the performance point of view, maybe. Note that RocksDB basically
has its own shared buffers, transactions and write ahead logging.
Makes me wonder if it would be simpler to implement LSM-tree from
scratch. On the flip side you can start by using RocksDB as a
key-value storage basically, see the results, and then replace it with
your own implementation with the same interface.

Supporting backups could be a challenge, including incremental ones.
I'm not sure if/how pg_basebackup and our other tools interact with
TAMs.

> Would such a project be best developed entirely as a fork, or is there openness to evolving TAM to better support
pluggablestorage with LSM‑like semantics? 

Please don't create forks. As a person whose responsibility besides
other things was to help maintain several PG forks and upgrade them to
the next upstream releases I assure you that you don't want doing this
ever.

Just use the TAM interface and create a seperate PostgreSQL extension
on GitHub. This mailing list could be used for general TAM
discussions. When and if your extension will become mature enough it
can be discussed whether it's worth adding to /contrib/ or not.

--
Best regards,
Aleksander Alekseev

pgsql-hackers by date:

From: Bruce Momjian
Date: 12 May, 22:22:22
Subject: Re: Request for Implementation of Custom Error Messages for CHECK Constraints

From: "Robin Haberkorn"
Date: 13 May, 00:00:17
Subject: Re: Adding error messages to a few slash commands

Re: Proposal: Exploring LSM Tree‑Based Storage Engine for PostgreSQL (Inspired by MyRocks) - Mailing list pgsql-hackers

Previous

Next