Re: QSoC proposal: Rewrite pg_dump and pg_restore - Mailing list pgsql-hackers
From | Craig Ringer |
---|---|
Subject | Re: QSoC proposal: Rewrite pg_dump and pg_restore |
Date | |
Msg-id | 532BA367.3060604@2ndquadrant.com |
In response to | Re: QSoC proposal: Rewrite pg_dump and pg_restore (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: QSoC proposal: Rewrite pg_dump and pg_restore
|
List | pgsql-hackers |
On 03/21/2014 09:28 AM, Robert Haas wrote:
> On Tue, Mar 18, 2014 at 8:41 PM, Alexandr <askellio@gmail.com> wrote:
>> Rewrite (add) pg_dump and pg_restore utilities as libraries (.so, .dll &
>> .dylib)
>
> This strikes me as (1) pretty vague and (2) probably too hard for a
> summer project.
>
> I mean, getting the existing binaries to build libraries that you can
> call with some trivial interface that mimics the existing command-line
> functionality of pg_dump might be doable, but that's not all that
> interesting. What people are really going to want is a library with a
> sophisticated API that lets you do interesting things
> programmatically. But that's going to be hard. AFAIK, nobody's even
> tried to figure out what that API should look like. Even if we had
> that worked out, a non-trivial task, the pg_dump source code is a
> mess, so refactoring it to provide such an API is likely to be a job
> and a half.

... and still wouldn't solve one of the most frequently requested things
for pg_dump / pg_restore, which is the ability to use them *server-side*
over a regular PostgreSQL connection.

It'd be useful progress toward that, though.

Right now, we can't even get the PostgreSQL server to emit DDL for a
table, let alone do anything more sophisticated.

Here's how I think it needs to look:

- Design a useful API for pg_dump and pg_restore that is practical to
  use for pg_dump and pg_restore's current tasks (fast database
  dump/restore) and also useful for extracting specific objects from
  the database. When designing, consider that we'll want to expose
  this API, or functions that use it, over SQL later.

- Create a new "libpqdump" library.

- Implement the designed API in the new library, moving and adjusting
  code from pg_dump / pg_restore where possible, writing new code
  where not.

- Refactor (closer to rewrite) pg_dump and pg_restore to use libpqdump,
  removing as much knowledge of the system catalogs etc as possible
  from them.

- Make sure the result still performs OK.

THEN, once that's settled in:

- Modify libpqdump to support compilation as a backend extension, with
  use of the SPI for queries and use of syscaches or direct scans
  where possible.

- Write a "pg_dump" extension that uses libpqdump in SPI mode to expose
  its API over SQL, or at least uses it to provide SQL functions to
  describe database objects. So you can dump a DB, or a subset of it,
  over SQL.

After all, a "libpqdump" won't do much good for the large proportion of
PostgreSQL users who use Java/JDBC, who can't use a native library
(without hideous hacks with JNI). For the very large group who use libpq
via language-specific client interfaces like the Pg gem for Ruby,
psycopg2 for Python, DBD::Pg for Perl, etc, it'll require a lot of work
to wrap the API and maintain it.

Whereas a server-side SQL-callable interface would be useful and
immediately usable for all of them.

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
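
To make the "can't even get the server to emit DDL for a table" point concrete, here is a rough sketch of the kind of work a client has to do itself today: take column metadata it has read from the catalogs and rebuild a CREATE TABLE statement by hand. Everything named here (`Column`, `quote_ident`, `render_create_table`) is hypothetical and for illustration only. None of it is part of PostgreSQL, pg_dump, or the proposed libpqdump, and it ignores nearly everything the real pg_dump handles (constraints, ownership, inheritance, storage options and so on).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Column:
    """One row of column metadata, as a client might read it from the catalogs.

    Hypothetical structure for this sketch only.
    """
    name: str
    type_name: str                  # already-formatted type, e.g. "integer"
    not_null: bool = False
    default: Optional[str] = None   # default expression as SQL text, if any


def quote_ident(ident: str) -> str:
    """Quote an SQL identifier: wrap it in double quotes and double any
    embedded double quote."""
    return '"' + ident.replace('"', '""') + '"'


def render_create_table(schema: str, table: str,
                        columns: Sequence[Column]) -> str:
    """Hypothetical helper: rebuild a minimal CREATE TABLE statement
    client-side from column metadata."""
    lines = []
    for col in columns:
        parts = [quote_ident(col.name), col.type_name]
        if col.default is not None:
            parts.append("DEFAULT " + col.default)
        if col.not_null:
            parts.append("NOT NULL")
        lines.append("    " + " ".join(parts))
    return "CREATE TABLE {}.{} (\n{}\n);".format(
        quote_ident(schema), quote_ident(table), ",\n".join(lines))


if __name__ == "__main__":
    cols = [
        Column("id", "integer", not_null=True),
        Column("name", "text", default="'unknown'"),
    ]
    print(render_create_table("public", "customer", cols))
```

Every client library that wants this has to carry its own copy of such logic and keep it in step with catalog changes, which is the argument above for a server-side, SQL-callable interface rather than only a native library.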