Re: pg_dump --split patch - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: pg_dump --split patch |
Date | |
Msg-id | AANLkTinuS5H5QxhS=PQajpDc3W0Sg9tXVw6JkgDsQiCs@mail.gmail.com |
In response to | Re: pg_dump --split patch (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: pg_dump --split patch
Re: pg_dump --split patch |
List | pgsql-hackers |
On Mon, Jan 3, 2011 at 1:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On the specific issue of overloaded functions, I have a feeling that
>> the only feasible option is going to be to put them all in the same
>> file. If you put them in different files, the names will either be
>> very long (because they'll have to include the argument types) or
>> fairly incomprehensible (if you did something like hash the argument
>> types and append 8 hex digits to the function name) or not all that
>> static (if you use OIDs; or if you number them sequentially, like
>> foo1.sql, foo2.sql, foo3.sql, then foo3.sql might end up as foo2.sql
>> on a system where there are only two variants of foo, making diff not
>> work very well).
>
> If you put all the variants in the same file, diff is *still* not going
> to work very well. At least not unless you solve the problems that keep
> pg_dump from dumping objects in a consistent order ... and once you do
> that, you don't need this patch.

That's not really true. It's a whole lot easier to look at a diff of two
100-line files and then repeat that N times than to look at a single diff
of two N*100 line files. I certainly spend enough of my patch-review time
doing "git diff master <some particular source file>", and then if what's
going on isn't clear you can look at just that file in more detail without
worrying about every other source file in the system. And I have
encountered this problem when comparing database schemas (and sometimes
data) also. Yes, I've done that using diff. Yes, it did suck. Yes, I got
it done before my boss fired me.

>> I think the problem with this patch is that different people are
>> likely to want slightly different things, and there may not be any
>> single format that pleases everyone, and supporting too many variants
>> will become confusing for users and hard for us to maintain.
>
> Yeah, that's exactly it. I can think of some possible uses for
> splitting up pg_dump output, but frankly "to ease diff-ing" is not
> one of them. For that problem, it's nothing but a crude kluge that
> only sort-of helps. If we're to get anywhere on this, we need a
> better-defined problem statement that everyone can agree is worth
> solving and is well solved with this particular approach.

I have to admit I'm a bit unsold on the approach as well. It seems like
you could write a short Perl script which would transform a text format
dump into the proposed format pretty easily, and if you did that and
published the script, then the next poor shmuck who had the same problem
could either use the script as-is or hack it up to meet some slightly
different set of requirements. Or maybe you'd be better off basing such a
script on the custom or tar format instead, in order to avoid the problem
of misidentifying a line beginning with --- as a comment when it's really
part of a data item. Or maybe even writing a whole "schema diff" tool that
would take two custom-format dumps as inputs.

On the other hand, I can certainly think of times when even a pretty dumb
implementation of this would have saved me some time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
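For illustration only, the kind of short script suggested in the message could look roughly like the sketch below (in Python rather than Perl). It is not the patch under discussion and not the patch's output format, which this message does not spell out. It assumes the plain-text dump carries pg_dump's usual per-object header comments of the form `-- Name: ...; Type: ...; Schema: ...; Owner: ...`. The function names `split_dump`, `path_for` and `write_chunks`, and the `schema/TYPE/name.sql` layout, are hypothetical. Overloaded functions are grouped into one file by dropping the argument list, as proposed in the quoted text, and lines inside `COPY ... FROM stdin;` blocks are never treated as headers.

```python
import re
from collections import defaultdict
from pathlib import Path

# Per-object header comment found in plain-text dumps, for example:
#   -- Name: foo(integer); Type: FUNCTION; Schema: public; Owner: postgres
#   -- Data for Name: t; Type: TABLE DATA; Schema: public; Owner: postgres
HEADER = re.compile(
    r"^-- (?:Data for )?Name: (?P<name>.+?); Type: (?P<type>.+?); "
    r"Schema: (?P<schema>.+?);"
)


def split_dump(lines):
    """Group dump lines by (schema, object type, name without arguments)."""
    chunks = defaultdict(list)
    key = ("_", "PREAMBLE", "preamble")  # everything before the first header
    in_copy = False
    for line in lines:
        if in_copy:
            # Inside COPY data: never interpret the line, just keep it.
            chunks[key].append(line)
            if line.rstrip("\n") == "\\.":
                in_copy = False
            continue
        m = HEADER.match(line)
        if m:
            # The bare "--" line just above the header belongs to the new object.
            prev = chunks[key]
            lead = [prev.pop()] if prev and prev[-1].strip() == "--" else []
            # Dropping "(argtypes)" puts all overloads of foo in one file.
            base = m.group("name").split("(", 1)[0]
            key = (m.group("schema"), m.group("type"), base)
            chunks[key].extend(lead)
        elif line.startswith("COPY ") and line.rstrip().endswith("FROM stdin;"):
            in_copy = True
        chunks[key].append(line)
    return chunks


def path_for(key):
    """Hypothetical layout: schema/TYPE/name.sql, with odd characters replaced."""
    schema, objtype, base = key
    clean = lambda s: re.sub(r"[^A-Za-z0-9_.-]", "_", s)
    return "/".join([clean(schema), clean(objtype), clean(base) + ".sql"])


def write_chunks(chunks, outdir):
    """Write each non-empty group of lines to its own file under outdir."""
    for key, body in chunks.items():
        if not body:
            continue
        target = Path(outdir) / path_for(key)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("".join(body))
```

A call such as `write_chunks(split_dump(open("dump.sql")), "split")` would then produce one file per object, which can be compared with an ordinary recursive diff. The COPY tracking only covers data rows; a header-like line inside a function body would still fool it, which is the argument made above for starting from the custom or tar format instead.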