TheJach.com

Jach's personal blog

(Largely containing a mind-dump to myselves: past, present, and future)
Current favorite quote: "Supposedly smart people are weirdly ignorant of Bayes' Rule." William B Vogt, 2010

Migrating and remerging perforce histories into git with git grafts

I recently migrated a perforce project to git. The perforce project had some branches, but the git-p4 script sucks with branches. (Well, it's really perforce, but I digress.) Anyway, as a first step I got all the branches from p4 into the git repo as totally separate branches.


$ git-p4 clone //open/dev/@all project
# 'open/dev/' becomes the new 'master'
$ cd project
$ git-p4 sync --branch=dy //open/dy/dev/@all


I repeated the sync command for the rest of the branches. So now I had everything, but if you looked at a graph of the git commit history you would see four "swim lanes" for the four branches, each completely separated from the others. Naturally we made integrations on the perforce side to merge things together, but git-p4 sucks at detecting that. After some trial and error I finally got most of those integrations back. (Some integrations were from branches we're not migrating into git so they sort of just pop into existence.)

Git provides a way to graft a new parent-child topology into an existing repo. The general idea is you create a text file with "childsha parentsha [parent2sha parent3sha ...]" where each sha is the full hash of the commit you want to form a relationship with. So we're saved! Sort of. We were fortunate that we had a pretty standard and consistent commit message requirement for integrations, otherwise we would have been hosed. (Perforce's project-specific differences are also probably why git-p4 can't generalize, but I don't know.)
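To make the file format concrete, here's a minimal Python sketch that builds and sanity-checks one graft line. The helper name graft_line and the hashes are made up for illustration (they are not part of git), and it assumes full 40-character SHA-1 hashes.

```python
import re

# A full SHA-1 commit id is 40 lowercase hex characters.
SHA_RE = re.compile(r"[0-9a-f]{40}")


def graft_line(child, parents):
    """Return one graft-file line: the child sha followed by its parent shas."""
    for sha in [child, *parents]:
        if not SHA_RE.fullmatch(sha):
            raise ValueError("not a full commit hash: %r" % sha)
    return " ".join([child, *parents])


# Made-up hashes, purely for illustration.
child = "a" * 40
mainline_parent = "b" * 40
integrated_parent = "c" * 40
print(graft_line(child, [mainline_parent, integrated_parent]))
```

Each line like this goes on its own row of the graft file, one row per commit whose parents you want to redefine.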

Our integration messages are generally of the form: "DY/DEV: Integrate down to dy/dev from open/dev@14428". git-p4 happily puts some perforce metadata at the end of the commit messages as well, so in addition on the git side we also see "[git-p4: depot-paths = "//open/dy/dev/": change = 14431]" in the body section of every commit message migrated over. That means we can parse out the "@14428", then go find the git-commit where "change = 14428" is in its message, and establish a parent-child link between that commit and the current commit to store in our graft file. I did manually link the parent-child relationships of the branch "creation" points before scripting it.
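Here's a rough Python sketch of that parsing idea, using the two example strings quoted above. The function name parse_commit and both patterns are my own assumptions about the message shapes, and unlike the bash version later on it only looks for the integration marker in the subject line.

```python
import re

# Hypothetical patterns modelled on the two message shapes quoted above.
INTEGRATION_RE = re.compile(r"integrat.*@(\d+)", re.IGNORECASE)
GIT_P4_RE = re.compile(r'\[git-p4: depot-paths = "[^"]*": change = (\d+)\]')


def parse_commit(subject, body):
    """Return (own_change, integrated_from_change or None) for one commit."""
    own = GIT_P4_RE.search(body)
    if own is None:
        return None  # no git-p4 metadata, so not a migrated commit
    source = INTEGRATION_RE.search(subject)
    source_change = int(source.group(1)) if source else None
    return (int(own.group(1)), source_change)


subject = "DY/DEV: Integrate down to dy/dev from open/dev@14428"
body = '[git-p4: depot-paths = "//open/dy/dev/": change = 14431]'
print(parse_commit(subject, body))  # (14431, 14428)
```

So the commit for change 14431 should gain the commit for change 14428 as an extra parent.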

I implemented the graft generation in three steps. The first was to generate a list of current commits and their current parents, and stash that in a file called "orig".


git checkout master
git rev-list --branches --parents --all > orig


Since the very first (in time) commit technically doesn't have parents per se, the last line of the file contained only one hash id, so I just got rid of it.
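If you'd rather not edit the file by hand, a small Python sketch like the following can read that kind of output and skip parentless lines. The name parse_parents and the shortened ids in the sample are made up for illustration; real lines carry full 40-character hashes.

```python
def parse_parents(text):
    """Map each commit sha to its list of parents, given text in the
    "commit parent1 [parent2 ...]" shape that --parents produces."""
    parents = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line, or a root commit with no parents
        child, *rest = fields
        parents[child] = rest
    return parents


# Shortened, made-up ids standing in for real hashes.
sample = "c3 b2\nb2 a1\na1\n"
print(parse_parents(sample))  # {'c3': ['b2'], 'b2': ['a1']}
```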

In my second step, I generated a list of all the parent-child pairs implied by integrations. I did this with some cool bash tricks: the first is to use a dictionary (aka hash table, associative array, etc.) to store "perforce-change-id" => "git-sha-for-commit", and the second is to extract groups from a regex pattern match.

I'm sure there's a better way to do this, particularly for getting a master log. (I think git log --branches --all might work.) Since I can't count on newlines as delimiters, I just used ß, which is Compose key + s + s on Linux systems. The '~~' in there was mostly just for my own sanity during testing since there wasn't always a clear distinction between subject and body.


# perforce change id => git sha for the commit
declare -A changes

# Build one big lowercased log across all four branches. Each commit becomes
# one record terminated by ß; newlines inside bodies are flattened to spaces
# so the whole log is a single line that read can split on ß.
log=''
for branch in master dy dt lu; do
    log+=$(git log --format='%H %s ~~ %b ß' "$branch" | tr '[:upper:]' '[:lower:]' | tr '\n' ' ')
done

# Pass 1: remember which git commit corresponds to each perforce change.
while IFS='ß' read -ra A; do
    for e in "${A[@]}"; do
        if [[ $e =~ ([0-9a-f]+).+~~.+\[.*change\ =\ ([0-9]+)\] ]]; then
            commit=${BASH_REMATCH[1]}
            p4rev=${BASH_REMATCH[2]}
            changes[$p4rev]=$commit
        fi
    done
done <<< "$log"

# Pass 2: for each integration commit, print "commit parent", where parent
# is the git commit for the change number after the @.
while IFS='ß' read -ra A; do
    for e in "${A[@]}"; do
        if [[ $e =~ ([0-9a-f]+).+integrat.+@([0-9]+).+~~.+\[.*change\ =\ ([0-9]+)\] ]]; then
            commit=${BASH_REMATCH[1]}
            parentp4rev=${BASH_REMATCH[2]}
            parent=${changes[$parentp4rev]}
            # Skip integrations from branches that were never migrated.
            if [ -n "$parent" ]; then
                echo "$commit" "$parent"
            fi
        fi
    done
done <<< "$log" > integs


Because we didn't preserve all possible branches, there were cases where changes[parentid] didn't exist because it was an integration from some other branch and so our log wouldn't know about it. Anyway, at the end of that we have a file called integs that stores the new parent relationships we want.
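For comparison, here's the same two-pass idea as a small Python sketch, including the "parent isn't in our log" case. It assumes the commits have already been parsed into (sha, own change, integrated-from change) tuples; the function name integration_links and the sample ids are made up for illustration.

```python
def integration_links(commits):
    """commits: iterable of (sha, own_change, integrated_from_change or None).

    Returns a list of (child_sha, parent_sha) pairs for integrations whose
    source change is known.
    """
    commits = list(commits)
    # Pass 1: perforce change id -> git sha, for every migrated commit.
    sha_for_change = {change: sha for sha, change, _ in commits}
    # Pass 2: link each integration commit to the commit it integrated from.
    links = []
    for sha, _, source in commits:
        if source is None:
            continue  # not an integration
        parent = sha_for_change.get(source)
        if parent is not None:  # None means the source branch wasn't migrated
            links.append((sha, parent))
    return links


# Made-up ids standing in for real hashes and change numbers.
sample = [
    ("sha_c", 14431, 14428),  # integration from a change we know about
    ("sha_a", 14428, None),   # ordinary commit
    ("sha_d", 14440, 9999),   # integration from a branch we didn't migrate
]
print(integration_links(sample))  # [('sha_c', 'sha_a')]
```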

The third step is to combine "orig" with "integs" and basically give hashes two or more parents. If you just use integs as-is for the grafts file, each graft line will replace that commit's previous parents instead of adding to them, and so everything will squash down into the master branch when you graph it.

I combined them with a simple (and stupid) python script.


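# Merge the original parents ("orig") with the integration parents ("integs")
# into a grafts file, so each commit keeps its old parent and gains new ones.
parents = {}  # child sha -> list of parent shas, in the order they were added
seen = set()  # (child, parent) pairs already recorded; enforces uniqueness


def add_parents(filename):
    with open(filename) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 2:
                continue  # blank line, or a commit with no parents
            child = fields[0]
            for parent in fields[1:]:
                if (child, parent) not in seen:
                    seen.add((child, parent))
                    parents.setdefault(child, []).append(parent)


# Read orig first so the existing parent stays the first parent in git.
add_parents('orig')
add_parents('integs')

with open('grafts', 'w') as out:
    for child, ps in parents.items():
        out.write(child + ' ' + ' '.join(ps) + '\n')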


Yes, that's a horribly ugly script (just like the bash example) but it got the job done. I used a set to enforce uniqueness of parents since I had some duplicates otherwise.

So I have a grafts file now! Copy it into .git/info/grafts and then for every branch run:


git filter-branch -f


At the end you should have something interesting.

Hopefully this helps someone else.


Posted on 2011-12-21 by Jach

Tags: bash, git, programming, tips

Permalink: https://www.thejach.com/view/id/226

Trackback URL: https://www.thejach.com/view/2011/12/migrating_and_remerging_perforce_histories_into_git_with_git_grafts
