Re: PostgreSQL Developer meeting minutes up - Mailing list pgsql-hackers
From | Markus Wanner |
---|---|
Subject | Re: PostgreSQL Developer meeting minutes up |
Date | |
Msg-id | 20090529084109.14871sskioiu9gud@mail.bluegap.ch |
In response to | Re: PostgreSQL Developer meeting minutes up (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: PostgreSQL Developer meeting minutes up
|
List | pgsql-hackers |
Hi,

Quoting "Robert Haas" <robertmhaas@gmail.com>:
> That's not the best news I've had today...

Sorry :-(

> To me they sound complex and inconvenient. I guess I'm kind of
> mystified by why we can't make this work reliably. Other than the
> "broken tags" issue we've discussed, it seems like the only real issue
> should be how to group changes to different files into a single
> commit. Once you do that, you should be able to construct a
> well-defined, total function f : <cvs-file, cvs-revision> -> <git
> commit> which is surjective on the space of git commits. In fact it
> might be a good idea to explicitly construct this mapping and drop it
> into a database table somewhere so that people can sanity check it as
> much as they wish. Why is this harder than I think it is?

Well, as CVS doesn't guarantee any consistency between files, you end up with silly situations more often than you think. One of the simplest possible examples is something like:

  commit 1: fileA @ 1.1, fileB @ 1.2
  commit 2: fileA @ 1.2, fileB @ 1.1

Seen from fileA, it's obvious that commit 1 (@1.1) comes before commit 2 (@1.2), but seen from fileB it's the exact opposite.

The most promising approach to solving these problems seems to be based on graph theory, where you work with a graph of dependencies, such as the one from fileA @ 1.1 to fileA @ 1.2. To resolve the above situation, you'd have to "split" a blob of single-file commits into two end-result commits (for monotone / git). In the above example, you'd have two options to resolve the conflict:

  commit 1a: fileA @ 1.1
  commit 2:  fileA @ 1.2, fileB @ 1.1
  commit 1b: fileB @ 1.2

Or:

  commit 2a: fileB @ 1.1
  commit 1:  fileA @ 1.1, fileB @ 1.2
  commit 2b: fileA @ 1.2

(Note that often enough, these have actually been separate commits in CVS as well; there's just no way to represent that. And no, timestamps are simply not reliable enough.)
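To make the fileA/fileB example concrete, here is a minimal Python sketch of the idea. It is not the code used by monotone's cvs_import or by cvs2svn; the function names (`blob_dependencies`, `has_cycle`, `split_blob`) and the dictionary layout are made up for illustration. Each blob maps file names to CVS revisions, per-file revision order yields "must come before" edges between blobs, and a cycle in that graph means one blob has to be split.

```python
from collections import defaultdict

def parse_rev(rev):
    """Turn a CVS revision string such as '1.2' into a comparable tuple."""
    return tuple(int(part) for part in rev.split("."))

def blob_dependencies(blobs):
    """Return (earlier, later) edges between blobs implied by per-file revision order."""
    by_file = defaultdict(list)
    for name, revisions in blobs.items():
        for path, rev in revisions.items():
            by_file[path].append((parse_rev(rev), name))
    edges = set()
    for entries in by_file.values():
        entries.sort()
        for (_, earlier), (_, later) in zip(entries, entries[1:]):
            if earlier != later:
                edges.add((earlier, later))
    return edges

def has_cycle(nodes, edges):
    """Kahn's algorithm: there is a cycle if not every node can be ordered."""
    indegree = {node: 0 for node in nodes}
    successors = defaultdict(list)
    for earlier, later in edges:
        successors[earlier].append(later)
        indegree[later] += 1
    ready = [node for node, degree in indegree.items() if degree == 0]
    ordered = 0
    while ready:
        node = ready.pop()
        ordered += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    return ordered != len(indegree)

def split_blob(blobs, name, first_files):
    """Split blob `name` into `name`+'a' (files in first_files) and `name`+'b' (the rest)."""
    result = {key: dict(value) for key, value in blobs.items() if key != name}
    result[name + "a"] = {p: r for p, r in blobs[name].items() if p in first_files}
    result[name + "b"] = {p: r for p, r in blobs[name].items() if p not in first_files}
    return result

# The two conflicting blobs from the example above.
blobs = {
    "1": {"fileA": "1.1", "fileB": "1.2"},
    "2": {"fileA": "1.2", "fileB": "1.1"},
}
edges = blob_dependencies(blobs)
conflict = has_cycle(blobs, edges)

# First resolution option: split commit 1 into 1a (fileA) and 1b (fileB).
fixed = split_blob(blobs, "1", {"fileA"})
fixed_edges = blob_dependencies(fixed)
resolved = not has_cycle(fixed, fixed_edges)
```

Choosing which blob to split, and where, is the hard part that this sketch leaves to the caller; a real converter needs heuristics for that, plus handling for tags and branches.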
Now add tags, branches and cyclic dependencies involving many files and many hundreds of commits to the example above and you start to get an idea of the complexity of the problem in general. See my description and diagrams of the steps used for cvs_import in monotone at [1] or follow descriptions of how cvs2svn works internally.

A few numbers about a conversion I'm running to test my algorithm and heuristics. It's converting a pretty recent snapshot of the Postgres repository:

 * running at 100% CPU time since: April 17
 * total number of files involved: 6'847
 * total number of blobs (before splitting): 28'010
 * blobs split due to cyclic dependencies: 12'801

Admittedly, my algorithm isn't optimized at all. However, I'm focusing on good results rather than speed of conversion.

Also note that monotone uses SQLite, so it actually stores the results of this conversion in an SQL database, as you proposed. Recently, a git_export command has been added, so that's definitely worth a try for converting CVS to git. However, I fear cvs2git is more mature.

Regards

Markus Wanner

[1]: a description of the various steps in conversion from CVS to monotone:
http://www.monotone.ca/wiki/CvsImport/
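For anyone who wants to try the sanity check Robert suggested, the following is a rough sketch using Python's standard sqlite3 module. The table and column names (`commits`, `file_revisions`, `commit_id`) are hypothetical and are not monotone's actual schema. The primary key on (path, revision) keeps the mapping well-defined, one query looks for file revisions with no commit (so the function would not be total), and another looks for commits with no file revision (so it would not be surjective).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE commits (commit_id TEXT PRIMARY KEY);
CREATE TABLE file_revisions (
    path      TEXT NOT NULL,
    revision  TEXT NOT NULL,
    commit_id TEXT REFERENCES commits(commit_id),
    PRIMARY KEY (path, revision)  -- each file revision maps to at most one commit
);
""")

# Result of the first resolution option for the fileA/fileB example.
conn.executemany("INSERT INTO commits VALUES (?)", [("1a",), ("2",), ("1b",)])
conn.executemany(
    "INSERT INTO file_revisions VALUES (?, ?, ?)",
    [
        ("fileA", "1.1", "1a"),
        ("fileA", "1.2", "2"),
        ("fileB", "1.1", "2"),
        ("fileB", "1.2", "1b"),
    ],
)

def unmapped_revisions(conn):
    """File revisions without a commit: the mapping is not total if any exist."""
    return conn.execute(
        "SELECT path, revision FROM file_revisions "
        "WHERE commit_id IS NULL ORDER BY path, revision"
    ).fetchall()

def empty_commits(conn):
    """Commits without any file revision: the mapping is not surjective if any exist."""
    return conn.execute(
        "SELECT c.commit_id FROM commits c "
        "LEFT JOIN file_revisions f ON f.commit_id = c.commit_id "
        "WHERE f.commit_id IS NULL ORDER BY c.commit_id"
    ).fetchall()
```

Both functions return an empty list for the four rows above; any row they do return points at a file revision or commit worth inspecting by hand.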