Re: Proposal: Exploring LSM Tree‑Based Storage Engine for PostgreSQL (Inspired by MyRocks) - Mailing list pgsql-hackers

From Aleksander Alekseev
Subject Re: Proposal: Exploring LSM Tree‑Based Storage Engine for PostgreSQL (Inspired by MyRocks)
Date
Msg-id CAJ7c6TNnz8aER62BO6Zu-QHZiR-cLGV=rir1XbsDPgtpcPoefw@mail.gmail.com
Whole thread Raw
List pgsql-hackers
Hi Manish,

> I’ve been exploring the idea of integrating an LSM tree–based storage engine into PostgreSQL — similar in spirit to
MyRocksfor MySQL — by replacing the underlying storage while preserving PostgreSQL’s upper layers (planner, executor,
MVCC,etc.). 
> [...]

Personally I wouldn't expect too much from RocksDB considering that
historically LSM-trees were designed to improve performance on HDDs.
The idea was to replace random writes with sequential ones. These days
we seem to be shifting to SSDs so this approach should be beneficial
in relatively niche cases when the amount of data is so large that
it's economically justified to buy HDDs... and you still want Postgres
for these tens or hundreds of terabytes of data, as a single instance
(no FDWs etc). In theory at least.

This being said, I don't want to discourage you from implementing a
TAM that would use RocksDB. This would be an excellent project for
several reasons:

* We could check if the above statement is true or false
* This could be considered as a reference 3rd party TAM implementation
* This also would be an implementation of index-organized tables
* Perhaps you will discover certain shortcomings of TAM API in the
process so we could improve it
* This promises to be a very-very fun project

> I’d love to hear your thoughts:
>
> Does this direction make sense for experimentation within the Postgres ecosystem?

Yes.

> Are there known architectural blockers or prior discussions/attempts in this space worth revisiting?

From the performance point of view, maybe. Note that RocksDB basically
has its own shared buffers, transactions and write ahead logging.
Makes me wonder if it would be simpler to implement LSM-tree from
scratch. On the flip side you can start by using RocksDB as a
key-value storage basically, see the results, and then replace it with
your own implementation with the same interface.

Supporting backups could be a challenge, including incremental ones.
I'm not sure if/how pg_basebackup and our other tools interact with
TAMs.

> Would such a project be best developed entirely as a fork, or is there openness to evolving TAM to better support
pluggablestorage with LSM‑like semantics? 

Please don't create forks. As a person whose responsibility besides
other things was to help maintain several PG forks and upgrade them to
the next upstream releases I assure you that you don't want doing this
ever.

Just use the TAM interface and create a seperate PostgreSQL extension
on GitHub. This mailing list could be used for general TAM
discussions. When and if your extension will become mature enough it
can be discussed whether it's worth adding to /contrib/ or not.

--
Best regards,
Aleksander Alekseev



pgsql-hackers by date:

Previous
From: Andreas Karlsson
Date:
Subject: Re: INSERT ... ON CONFLICT DO SELECT [FOR ...] take 2
Next
From: "Robin Haberkorn"
Date:
Subject: Re: Adding error messages to a few slash commands