Migrate from Perforce to Git
written on Saturday, December 12, 2015
One of our customers is currently using Perforce as their version control system. Some parts of their software are already in Git, but two huge repositories are still maintained in Perforce. Years ago, they decided to move to Git and weed out Perforce altogether. Since then, their Perforce server has been basically unmaintained, and the maintenance contract with Perforce expired years ago. The server is in very bad shape and the limitations of Perforce are daunting.
Now, after several years of basically leaving everything as is, the customer decided to tackle this issue again and finally get rid of Perforce. A simplified version of the transition plan looks as follows:
- Bring the internal Git server (GitLab) up to date.
- Create a one-way bridge between Perforce and Git to periodically sync new changes from Perforce to Git.
- Move the build system and infrastructure to Git.
- Teach Git to the developers who don't know it yet.
- Shut down the Perforce server permanently.
This blog post is about the second step, the one-way bridge between Perforce and Git.
Here is an informal list of requirements:
- Keep the entire history of interesting branches. These include the current master branch, a few long-running development branches, and many release branches.
- Preserve branch points, i.e. the point in time where two branches diverged.
- During the transition period, all new changes committed in Perforce should be periodically synced to Git (incremental updates).
- Make incremental updates fast.
The one-way bridge
Here is how one can migrate a repository from Perforce to Git with support for multiple branches, an intact history, and incremental updates.
Create the Git repository
Create a new local Git repository and set up a Git remote. For the initial import, the remote is empty and no data can be fetched from it. On subsequent runs, most of the data is already in Git, and fetching it directly from Git is way faster than extracting the commits from Perforce again.
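As a rough sketch of this step (not the actual bridge script), the commands could be assembled like this. The function name, the directory name, the remote name origin and the URL are all made up for illustration, and the sketch uses Python 3 syntax.

```python
def bootstrap_commands(repo_dir, remote_url, repo_exists, remote="origin"):
    """Return the git commands (as argv lists) that prepare the local repository.

    All names here are hypothetical; adapt them to your own setup.
    """
    cmds = []
    if not repo_exists:
        # First run: create the repository and register the (still empty) remote.
        cmds.append(["git", "init", repo_dir])
        cmds.append(["git", "-C", repo_dir, "remote", "add", remote, remote_url])
    # Every run: fetch whatever is already on the remote. On the first run
    # the remote is empty, so this fetches nothing.
    cmds.append(["git", "-C", repo_dir, "fetch", remote])
    return cmds


if __name__ == "__main__":
    for cmd in bootstrap_commands("bridge", "git@example.com:team/project.git", False):
        print(" ".join(cmd))
```

Each list can be handed to subprocess.check_call to actually run it.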
Sync Perforce branches into Git
Each Perforce branch of interest needs to be checked for updates. Git-p4 provides the sync subcommand which may be used for this purpose. If a copy of the Perforce branch is not yet in Git, import the entire history of it into a dedicated Git branch. If it is already in Git, just update the local Git branch and import all new changes since the last run. I used the prefix p4/ for Git branches that track a Perforce branch, e.g. p4/main tracks the Perforce main branch.
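A minimal sketch of how the sync command for one branch might be built. It assumes the p4/ branches live under refs/heads/ and that git p4 sync, when given an existing branch, picks up the depot path recorded by the earlier import. The depot path //depot/main and the function name are examples, not taken from the real configuration.

```python
def sync_command(branch, depot_path, already_imported):
    """Build the `git p4 sync` call for one Perforce branch.

    Assumption: the Git branch p4/<branch> is stored as refs/heads/p4/<branch>.
    """
    ref = "refs/heads/p4/" + branch
    cmd = ["git", "p4", "sync", "--branch=" + ref]
    if not already_imported:
        # Initial import: "@all" asks git-p4 for the entire history
        # instead of only the latest revision.
        cmd.append(depot_path + "@all")
    # Incremental update: no depot path, git-p4 continues after the
    # last imported change (assumed behaviour, see the lead-in).
    return cmd


if __name__ == "__main__":
    print(" ".join(sync_command("main", "//depot/main", already_imported=False)))
    print(" ".join(sync_command("main", "//depot/main", already_imported=True)))
```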
Find all branches with updates
Find all Git branches that were updated, since they need further processing. On the first run, all branches need to be updated. On all subsequent runs, the number of branches to update should be fairly small, if any.
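One possible way to detect updated branches is to snapshot the branch heads before and after the sync and compare them. This is only a sketch of that idea. The snapshot text is assumed to come from git for-each-ref --format="%(refname:short) %(objectname)" refs/heads/p4/, and the function names and shortened SHA-1 values are invented.

```python
def parse_refs(output):
    """Parse lines of the form '<branch> <sha1>' into a dict."""
    refs = {}
    for line in output.splitlines():
        if line.strip():
            name, sha = line.split()
            refs[name] = sha
    return refs


def updated_branches(before, after):
    """Return the branches that are new or whose head commit changed."""
    return sorted(name for name, sha in after.items() if before.get(name) != sha)


if __name__ == "__main__":
    # Shortened, made-up SHA-1 values for illustration.
    before = parse_refs("p4/main 1111\np4/release 2222\n")
    after = parse_refs("p4/main 3333\np4/release 2222\np4/dev 4444\n")
    print(updated_branches(before, after))
```

On the first run the "before" snapshot is empty, so every branch is reported as updated.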
Rewrite history to restore branch points
After the import, the branches have no relationship with each other and the repository might look as follows:
J---K---L---M  p4/dev
G---H---I  p4/release
A---B---C---D---E---F  p4/main
In reality, the branches do relate to each other and the repository should look like this:
      G---H---I  p4/release
     /
A---B---C---D---E---F  p4/main
         \
          J---K---L---M  p4/dev
Unfortunately, the commit-parent relationship got lost during the import. One can use grafts to restore the relationship between commits:
Graft points or grafts enable two otherwise different lines of development to be joined together. It works by letting users record fake ancestry information for commits. This way you can make git pretend the set of parents a commit has is different from what was recorded when the commit was created.
Since the commit volume on the Perforce repository is relatively low, obtaining the graft points is straightforward in my case:
- Extract the current graft points from the Git repository.
- Find the SHA1 of the first commit on each branch and add the SHA1 of the commit that happened right before as a second parent. If no commit is available, do nothing.
- Store the modified grafts file under .git/info/grafts.
- Rewrite history using git filter-branch to permanently apply the grafts.
- Delete the grafts file.
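The graft computation in the steps above can be sketched as follows. Each line of the grafts file lists a commit followed by the parent(s) Git should pretend it has. The sketch assumes that "the commit that happened right before" can be decided by commit timestamp alone, which is only reasonable at a low commit volume. The Commit type, the function names, the shortened SHA-1 values and the timestamps are all illustrative, and the code uses Python 3 syntax.

```python
from typing import NamedTuple


class Commit(NamedTuple):
    sha: str
    timestamp: int  # seconds since the epoch


def commit_right_before(root, candidates):
    """Return the latest candidate that is older than `root`, or None."""
    earlier = [c for c in candidates if c.timestamp < root.timestamp]
    return max(earlier, key=lambda c: c.timestamp) if earlier else None


def graft_lines(roots, candidates):
    """Build grafts file lines of the form '<commit sha> <parent sha>'."""
    lines = []
    for root in roots:
        parent = commit_right_before(root, candidates)
        if parent is not None:  # no earlier commit: leave the root untouched
            lines.append(f"{root.sha} {parent.sha}")
    return lines


if __name__ == "__main__":
    main = [Commit("aaaa", 10), Commit("bbbb", 20), Commit("cccc", 30)]
    roots = [Commit("aaaa", 10), Commit("9999", 25)]  # first commit of each branch
    print("\n".join(graft_lines(roots, main)))
```

The resulting lines go into .git/info/grafts. Running git filter-branch on the affected branches afterwards makes the grafts permanent, because filter-branch honours the grafts file when it rewrites commits.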
From the Perforce branch to the final Git branch
Since rewriting history permanently alters the repository, it is a very bad idea to do it on public branches. Furthermore, Git-p4 is not amused if one messes with the history of the p4/ branches. After some experiments, I settled on the following workflow to migrate a Perforce branch to a final Git branch:
Perforce depot
    main
     |
     | clone / sync            rewrite history
     v                               |
Git repository (local)               v
    p4/main ------------------>  tmp/main ------------------>  main
                  branch                     branch / merge
     ^                                                          ^
     | push / pull                                              | push / pull
     v                                                          v
Git repository (remote)
    p4/main                                                    main
The above diagram illustrates the workflow for a single branch, main:
- Clone or sync the Perforce branch into the local Git branch.
- Create a temporary branch for each branch that got updated. Those branches have the prefix tmp/, e.g. tmp/main is the temporary branch for p4/main.
- Create the grafts file and rewrite history for all temporary branches.
- Create the final branch from the temporary branch, e.g. main is the final branch for the Perforce main branch and is branched from tmp/main. On the first run, the final branch does not exist, so simply create it from the temporary branch. If it does exist, perform a fast-forward-only merge to get new commits from the temporary branch into the final branch.
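The last step of the list above, creating the final branch or fast-forwarding it, might look like this as a sketch. The function name is hypothetical, and the commands are returned as argv lists rather than executed, so they are easy to inspect first.

```python
def promote_commands(final, tmp, final_exists):
    """Return the git commands that bring `tmp` into the final branch."""
    if not final_exists:
        # First run: the final branch simply starts at the temporary branch.
        return [["git", "branch", final, tmp]]
    # Later runs: only accept a fast-forward. If the rewritten history
    # diverged from the published one, the merge fails instead of
    # silently creating a merge commit.
    return [
        ["git", "checkout", final],
        ["git", "merge", "--ff-only", tmp],
    ]


if __name__ == "__main__":
    for cmd in promote_commands("main", "tmp/main", final_exists=True):
        print(" ".join(cmd))
```

A failing --ff-only merge is a useful alarm here: it means the rewritten temporary branch no longer contains the commits that were already published.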
Cleanup and publish
Clean up the local Git repository and remove all temporary branches, since they should not be pushed to the remote. After the cleanup, publish all branches under p4/ and all final branches to the remote.
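A small sketch of the cleanup and publish step. It assumes the local repository only contains p4/ branches, tmp/ branches and final branches, and that the remote is called origin. The function name is made up, and the code uses Python 3 syntax.

```python
def cleanup_and_publish_commands(branches, remote="origin"):
    """Delete all tmp/ branches, then push everything else to the remote."""
    temporary = [b for b in branches if b.startswith("tmp/")]
    public = [b for b in branches if not b.startswith("tmp/")]
    # -D forces the deletion even though the rewritten commits of a
    # temporary branch are usually not merged into the current branch.
    cmds = [["git", "branch", "-D", b] for b in temporary]
    if public:
        cmds.append(["git", "push", remote, *public])
    return cmds


if __name__ == "__main__":
    local = ["p4/main", "tmp/main", "main"]
    for cmd in cleanup_and_publish_commands(local):
        print(" ".join(cmd))
```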
I wrote a small one-way bridge in Python that implements the above steps. Unfortunately, it is written in Python 2 due to a fairly ancient server infrastructure where Python 3 is not available. You can download the script and a sample configuration from here.
Thanks to Lukas for reviewing this blog post and for his valuable input on this topic.
Until next time.