Removing crap from a git repository's history

When I google "git remove from history" (because I frequently forget the exact sequence of commands as I don't have to remove history very often), this is the first result. It almost works. Don't use it, use the second result. (Another point in favor of the second link: the first is from 2009, while the second is from github itself, and they're pretty good at keeping their material up-to-date with recent git versions.)

My current git version is 1.7.3.4; not the bleeding edge, but if you're using 1.7 at the end of 2011 you're generally in good shape. Anyway, here's the "I don't know, I don't wanna know" version of getting rid of crap you don't want, with some commentary in between:

$ du -sh .git
946M	.git

As you can see, the git repo I'm using is huge. Github soft-limits free users to 300MB; if I want people to fork, it needs to get much smaller. Fortunately, almost all that size comes from a glaring thirdparty/ directory and its history over four branches. (This git repo comes from a perforce one.) So let's kill it!

$ git filter-branch --prune-empty --tree-filter 'rm -rf thirdparty/' HEAD


Why tree-filter instead of the faster index-filter? Who cares! I don't wanna know, I want this binary crap gone!
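If you do end up wanting to know: below is a rough sketch of the index-filter form, which I did not run on this particular repo. It edits the index directly instead of checking out every commit's tree, which is why it tends to be faster. The function name purge_dir is made up for illustration; the git options are the standard ones from the filter-branch and rm man pages.

```shell
#!/bin/sh
# purge_dir is a hypothetical helper name; run it from inside the repository.
# It removes a directory from every commit reachable from HEAD by rewriting
# the index of each commit, without checking out the files.
purge_dir() {
    dir=$1
    # --cached touches only the index; --ignore-unmatch keeps git rm from
    # failing on commits where the directory does not exist.
    git filter-branch --prune-empty \
        --index-filter "git rm -r -q --cached --ignore-unmatch '$dir'" HEAD
}
```

Usage would be `purge_dir thirdparty/` from the top of the working tree, in place of the tree-filter command above.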

If you have multiple branches where the vile may rest, you have to switch to each of them and rerun the command. filter-branch will refuse to run again while the backup from the previous run (.git/refs/original) is still there, so remove that before rerunning.
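If hopping between branches by hand gets old, one alternative is to hand filter-branch every ref at once. This is a sketch under assumptions, not what I did here: purge_dir_everywhere is a hypothetical name, and it rewrites tags in place (via --tag-name-filter cat) rather than deleting them. The manual, one-branch-at-a-time version follows.

```shell
#!/bin/sh
# purge_dir_everywhere is a hypothetical helper name; run it inside the repo.
# It rewrites every ref matched by --all in a single pass instead of one
# checkout at a time.
purge_dir_everywhere() {
    dir=$1
    # -f overwrites the refs/original backup left by an earlier run.
    # --tag-name-filter cat re-points existing tags at the rewritten commits.
    # "-- --all" passes every ref to the rewrite rather than just HEAD.
    git filter-branch -f --prune-empty --tag-name-filter cat \
        --tree-filter "rm -rf '$dir'" -- --all
}
```

Usage would be `purge_dir_everywhere thirdparty/`. The old commits are still kept alive by refs/original and the reflog afterwards, so the cleanup steps further down still apply.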

$ git checkout otherbranch
$ rm -rf .git/refs/original
$ git filter-branch --prune-empty --tree-filter 'rm -rf thirdparty/' HEAD

Etc. Also note that if you have any tags that contained the evilness, make sure you delete those tags or they'll hoard it even after you complete this process. Anyway, presuming your tags are gone and you've killed the data for all the branches you care about:

$ du -sh .git
986M	.git
$ git reflog expire --expire=now --all
$ git gc --prune=now
Counting objects: 116947, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (33198/33198), done.
Writing objects: 100% (116947/116947), done.
Total 116947 (delta 91775), reused 94657 (delta 76507)
Removing stale temporary file .git/objects/pack/tmp_pack_5HLLcY
$ du -sh .git
712M	.git
$ git gc --aggressive --prune=now
Counting objects: 116947, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (109654/109654), done.
Writing objects: 100% (116947/116947), done.
Total 116947 (delta 92214), reused 17437 (delta 0)
$ du -sh .git
551M	.git

What's the problem here? At first it was even bigger than before! Now it's at least manageable but it should be smaller... Oh look. I forgot to delete my origin remote tracker and .git/refs/remotes. Let's do that and re-gc.

$ git remote rm origin
$ rm -rf .git/refs/remotes
$ git gc --prune=now
Counting objects: 105938, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (16254/16254), done.
Writing objects: 100% (105938/105938), done.
Total 105938 (delta 82503), reused 105817 (delta 82398)
$ du -sh .git
87M	.git

Huzzah! Much better. Anyway, there you go. It's such a freaky process when there are lots of places where your data could still be hiding; it's typical of a migration problem rather than an everyday-use problem. (Typically, if you do something that calls for history deletion like accidentally committing a password file, you can amend it before anyone notices.) I hope someone finds this useful for when the first google result fails them and instead of clicking the second one they click the Nth one that this blog shows up as. (Okay so neither the first nor second tells you our little secret of --prune-empty with --tree-filter or if it really matters ;) You don't know and don't wanna know!)

Posted on 2011-12-20 by Jach
Tags: git, programming, tips
Permalink: https://www.thejach.com/view/id/225

Anonymous, February 03, 2012 09:49:34 AM

Now try running "git gc --aggressive". It's a good idea to squeeze your repo as much as possible before publishing it.

Sam, February 06, 2013 09:09:46 AM

Here's my bad-ass version, called git-gc-all-ferocious!

#!/bin/sh -ev
git remote rm origin || true
git branch -D in || true
(
cd .git
rm -rf refs/remotes/ refs/original/ *_HEAD logs/
)
git for-each-ref --format="%(refname)" refs/original/ | xargs -n1 --no-run-if-empty git update-ref -d
git -c gc.reflogExpire=0 -c gc.reflogExpireUnreachable=0 -c gc.rerereresolved=0 -c gc.rerereunresolved=0 -c gc.pruneExpire=now gc --aggressive "$@"

No doubt I will have to add further crud to it as I discover new ways git tries to hold on to unwanted objects!
