Maintaining a fork of a repository

happy fork, happy life

Sometimes I find myself having to maintain a fork of an upstream open source project. There can be various reasons for this. For example, a bug fix is needed and it will take some time until that fix makes it upstream, or I require completely custom changes that do not fit the scope of the upstream project.

Over the years, I have seen others having this exact same need. People have various approaches for this, some better, some worse.

§the awful method

Disclaimer: This narrative is based on true events and factual occurrences. However, to protect the privacy and identities of the individuals involved, specific details, names, locations, and timelines have been altered or generalized. Any resemblance to actual persons, living or dead, or actual events is purely coincidental. The changes made are intended to preserve anonymity while still conveying the essence and implications of the events that took place. This account should be read as a representation of the underlying truths, not as a literal historical document.

I have seen some truly awful approaches. The top contender looks something like this:

git log -- subproject/
commit 21f206fa5c4a4e9573a93d5baadebf9b212f6be2 (HEAD)
Author: A Committer <a.committer@example.com>
Date: Wed Jan 24 13:24:56 2024 +0100

    JIRA-1234: some fix

...

commit 29a88911377ca6c2e09bcafd6ccecfc524c2027c
Author: Committer Two <committer.two@example.com>
Date: Tue Jun 6 12:04:47 2023 +0200

    Upgrade subproject to 0.3.2 (#17)

    * bring v0.3.2 from upstream
    * update build commands

commit ae0db845f7a661635d8ca1ac27ba49478e22ec5a
Author: Committer Two <committer.two@example.com>
Date: Fri Mar 31 10:19:22 2023 +0100

    Upgrade subproject to 0.2.3 (#13)

    * bring v0.2.3 from upstream
    * update readme

commit af425f71b92dcfc7819997ddbb92ec25767fd25c
Author: Committer One <committer.one@example.com>
Date: Wed Jan 11 13:11:01 2023 +0100

    update build process

commit 7d9e500ab81cf129fe97cd35bd236bf85e263e04
Author: Committer One <committer.one@example.com>
Date: Fri Jan 6 11:15:29 2023 +0100

    fix to a previous fix

commit c6cade349b2f74be50ce41ee383b0affa96cd892
Author: Fork Creator <fork.creator@example.com>
Date: Mon Dec 5 15:15:30 2022 +0100

    fix subproject patch to include some business requirement

commit 2a49ea1cdd4cecb041f80d0382b48bfb3e0c574c
Author: Fork Creator <fork.creator@example.com>
Date: Fri Dec 3 11:22:33 2022 +0100

    add fork of subproject at version 0.1.1

Please, don’t do this. There is a lot wrong with this approach:

  • It’s not a fork: the subproject resides in a subdirectory of a larger repository. There is no easy way to find the diff between your changes and the upstream project.
  • All changes end up on the main branch. Again, this makes bringing in future upstream changes impossible to reason about.
  • There are multiple committers repeating these mistakes over and over again.
  • Pull requests bringing in new versions contain additional commits, and PRs are squashed. In the resulting commit it is not possible to clearly distinguish the upstream version upgrade from the additional changes made in the same PR.
  • No commit in this subdirectory matches an upstream commit. What do you do if you ever have to patch a version earlier than the one you currently have on the main branch?

§a better approach

As mentioned in the lead, the two most common reasons for having to fork an upstream project are:

  1. A fix needs to be introduced to a released version of the upstream project.

    • For some upstream version v1.2.3, we will have a custom v1.2.3-patched version.
    • When the upstream project releases version v1.2.4, we want to have the ability to create a clean v1.2.4-patched.
    • If our changes are applicable to the upstream project, we will also apply our fixes to the main branch and contribute those to the upstream main branch.
    • Our changes will eventually get merged into the upstream project and released in, say, v1.3.0. At this stage we do not need a v1.3.0-patched version because we can go back to using the upstream code: our fixes have been incorporated.
    • Normally, a version released by the upstream project will be an immutable tag. It means: once a tag is released, the tag is set in stone. Any problem in a released version mandates releasing another version containing the fix. A tag, once created, is never deleted or recreated.
  2. We require changes that do not fit in the upstream project.

    • For every new version of the upstream project, we want to have the [upstream-version]-patched version available, potentially forever.

In both cases we want our main branch to track the upstream main branch, and for each new version we want to be able to extract a clean patch. A patch may evolve.

§clone the fork

How do we do this? Let’s assume a https://github.com/someproject/project.git repository. We create a fork under https://github.com/ours/project.git.

git clone git@github.com:ours/project.git

§add the upstream remote

After cloning, we have only one remote, the origin:

git remote -v
> origin  git@github.com:ours/project.git (fetch)
> origin  git@github.com:ours/project.git (push)

Add the upstream remote:

git remote add upstream https://github.com/someproject/project.git

And verify:

git remote -v
> origin    git@github.com:ours/project.git (fetch)
> origin    git@github.com:ours/project.git (push)
> upstream  https://github.com/someproject/project.git (fetch)
> upstream  https://github.com/someproject/project.git (push)
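
If this setup is scripted (for example on a fresh machine), it can help to make the step safe to re-run. The following is a minimal sketch, not part of the workflow above: `ensure_remote` is a hypothetical helper name, and the demo runs in a throwaway repository so that no real clone is touched.

```shell
#!/bin/sh
# Sketch: add a remote only if it is missing, otherwise just fix its URL.
set -eu

# Hypothetical helper; the name is an assumption, not a git command.
ensure_remote() {
    name=$1
    url=$2
    if git remote get-url "$name" >/dev/null 2>&1; then
        # The remote already exists: make sure it points at the expected URL.
        git remote set-url "$name" "$url"
    else
        git remote add "$name" "$url"
    fi
}

# Demo in a temporary repository (no network access is needed).
demo=$(mktemp -d)
cd "$demo"
git init -q
git remote add origin git@github.com:ours/project.git

ensure_remote upstream https://github.com/someproject/project.git
# Running it a second time changes nothing.
ensure_remote upstream https://github.com/someproject/project.git

git remote -v
```

In a real clone only the `ensure_remote upstream ...` call would be needed; everything after the function definition is demo scaffolding.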

§synchronizing our main with upstream main

To bring changes from the upstream main branch into our main branch:

# fetch first, otherwise upstream/main is missing or stale:
git fetch upstream
git checkout main
git merge upstream/main
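
Since our main is supposed to carry no commits of its own, one option is to allow only fast-forward merges, so that git refuses to continue if main has accidentally diverged. This is a sketch under that assumption, not a required part of the method. The local directories under `$work` are stand-ins for the upstream project and for our fork, so the example runs without network access.

```shell
#!/bin/sh
# Sketch: keep main an exact mirror of upstream/main using --ff-only.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Stand-in for the upstream project.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/main
git -C "$work/upstream" commit -q --allow-empty -m "initial upstream commit"

# Stand-in for our fork on the hosting service, plus a local clone of it.
git clone -q --bare "$work/upstream" "$work/ours.git"
git clone -q "$work/ours.git" "$work/clone"
cd "$work/clone"
git remote add upstream "$work/upstream"

# Upstream moves on after we forked.
git -C "$work/upstream" commit -q --allow-empty -m "new upstream work"

# The actual synchronization.
git fetch -q upstream
git checkout -q main
git merge -q --ff-only upstream/main   # fails if main has diverged
git push -q origin main

git log --oneline -n 2
```

If the `--ff-only` merge fails, that is a hint that something was committed to main directly and should be moved to a *-patched branch instead.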

§bringing new upstream tags

To bring any new upstream tags and push them to our fork:

git fetch upstream
git push origin --tags
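
By default, `git fetch` only follows tags that point into the history it downloads; adding `--tags` asks for every upstream tag. The sketch below shows the effect on the fork by listing its tags before and after. As an assumption for the demo, local directories under `$work` play the roles of the upstream project and of our fork, and the releases are empty commits.

```shell
#!/bin/sh
# Sketch: mirror a new upstream tag into the fork and show the difference.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Stand-in for the upstream project with one release.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/main
git -C "$work/upstream" commit -q --allow-empty -m "release 1.2.3"
git -C "$work/upstream" tag v1.2.3

# Stand-in for our fork and a local clone of it.
git clone -q --bare "$work/upstream" "$work/ours.git"
git clone -q "$work/ours.git" "$work/clone"
cd "$work/clone"
git remote add upstream "$work/upstream"

# Upstream publishes a new release.
git -C "$work/upstream" commit -q --allow-empty -m "release 1.2.4"
git -C "$work/upstream" tag v1.2.4

# Tag names known to the fork before the sync.
before=$(git ls-remote --tags origin | sed 's|.*refs/tags/||')

git fetch -q --tags upstream
git push -q origin --tags

# Tag names known to the fork after the sync.
after=$(git ls-remote --tags origin | sed 's|.*refs/tags/||')
printf 'before:\n%s\nafter:\n%s\n' "$before" "$after"
```

Note that `git push origin --tags` pushes every local tag, so any private tags in the clone would be published as well.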

§creating and maintaining a patch for a tag

Finally, the core of this article. Let’s assume that the v1.2.3 tag is the most recent released version of the upstream project that we need to patch. The course of action is:

  1. Check out the release tag:

    git checkout v1.2.3
    
  2. Create a branch from the tag:

    git checkout -b v1.2.3-patched
    
  3. Make any changes and leave the branch as is. Do not merge, do not create any PR. If your patch requires additional commits, just add them to this branch. To see the differences between upstream v1.2.3 and v1.2.3-patched, execute:

    git diff v1.2.3..v1.2.3-patched
    
  4. A new version is available and requires patching. How do we go about it?

    # get the patch from the previous, last released version:
    git diff v1.2.3..v1.2.3-patched > ~/v1.2.3-patched.diff
    # bring new versions in:
    git fetch upstream
    git push origin --tags
    # check out the clean tag:
    git checkout v1.2.4
    # create a branch to patch:
    git checkout -b v1.2.4-patched
    # apply the patch from the previous version:
    git apply ~/v1.2.3-patched.diff
    

    There are two possible outcomes:

    1. The patch applies cleanly. Commit the result and push the branch to the origin. Do not create any PRs, do not merge.
    2. The patch does not apply. In this case, fix the patch so that it applies, commit the result, and use it as the baseline patch for any future versions.
  5. What’s the result?

    git branch
    
      main
      v1.2.3-patched
    * v1.2.4-patched
    
    git tag -l
    
    ...
    v1.2.0
    v1.2.1
    v1.2.3
    v1.2.4
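
The porting steps above can be tried end to end in a throwaway repository. This is only a sketch under stated assumptions: a single local repository plays both upstream and fork, `app.txt` and its contents are made up, and the patch file lives in a temporary directory instead of the home directory. It also shows the commit that is needed after `git apply`, since `git apply` itself only changes files.

```shell
#!/bin/sh
# Sketch: carry a patch from v1.2.3-patched over to v1.2.4-patched.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main

# "Upstream" release v1.2.3: a ten-line file (made-up content).
printf 'line %s\n' 1 2 3 4 5 6 7 8 9 10 > app.txt
git add app.txt
git commit -q -m "release 1.2.3"
git tag v1.2.3

# Our patch on top of v1.2.3.
git checkout -q -b v1.2.3-patched v1.2.3
sed 's/^line 5$/line 5 (patched)/' app.txt > app.tmp && mv app.tmp app.txt
git commit -q -am "our fix"

# "Upstream" release v1.2.4 changes a different part of the file.
git checkout -q main
echo 'line 11' >> app.txt
git commit -q -am "release 1.2.4"
git tag v1.2.4

# Port the patch to the new release.
patch_file="$work/v1.2.3-patched.diff"
git diff v1.2.3..v1.2.3-patched > "$patch_file"
git checkout -q -b v1.2.4-patched v1.2.4
if git apply --check "$patch_file"; then
    # Outcome 1: the patch applies cleanly, so apply and commit it.
    git apply --index "$patch_file"
    git commit -q -m "Apply patch carried over from v1.2.3-patched"
else
    # Outcome 2: the patch has to be fixed by hand first.
    echo "patch does not apply to v1.2.4, fix it manually" >&2
    exit 1
fi

git diff --stat v1.2.4..v1.2.4-patched
```

When the plain apply fails, `git apply --3way` is one option worth trying before editing the patch by hand: it attempts a three-way merge and may leave conflict markers to resolve.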
    

§closing words

By following this method you will always have:

  • A clean origin main branch tracking the upstream main branch.
  • A patch per version on a respective *-patched branch, with the most recent version branch being the one where the patch can be extended if needed.
    • Any previous version can have changes backported by cherry-picking changes from more recent versions.
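
A backport to an older patched branch could look like the following sketch. The repository, the empty release commits, and `fix.txt` are made up for the demo; the `-x` flag is my own addition and simply records the original commit id in the backported commit message.

```shell
#!/bin/sh
# Sketch: backport a fix from v1.2.4-patched to v1.2.3-patched.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/main

# Two "upstream" releases (empty commits are enough for the demo).
git commit -q --allow-empty -m "release 1.2.3"
git tag v1.2.3
git commit -q --allow-empty -m "release 1.2.4"
git tag v1.2.4

# One patched branch per release.
git branch v1.2.3-patched v1.2.3
git checkout -q -b v1.2.4-patched v1.2.4

# A later fix made on the most recent patched branch.
echo "extra fix" > fix.txt
git add fix.txt
git commit -q -m "extra fix for our patch"
fix_commit=$(git rev-parse HEAD)

# Backport it to the older patched branch.
git checkout -q v1.2.3-patched
git cherry-pick -x "$fix_commit"

git log --oneline v1.2.3..v1.2.3-patched
```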