Git Unite – Fix Case Sensitive File Paths on Windows

Git Unite is a utility that fixes case sensitive file paths present in a git repository index on Windows. Since Windows is not case sensitive, the git index case sensitivity issue does not manifest itself until browsing the code repository on GitHub or cloning the repository to a case sensitive file system on Linux.

Introducing case sensitive file paths into the git index on a case insensitive operating system like Windows is easier than you think. A simple ‘git mv .\Where\Waldo where\is\Waldo’ is all you need to create two separate paths in the git index, but the Windows working directory will only report one. There might be git config settings that help avoid this problem, but controlling the settings and behavior of 20+ contributors on a project team is nearly impossible.

The problem is exacerbated when hundreds of files are moved during a repository layout reorganization. If the user moving the files is not careful, these case sensitive path names will pollute the git index but appear fine in the working directory. Cleaning up these case sensitive file path issues on Windows is tedious, and this is where Git Unite helps out.

Git Unite will search the git repository index for file paths whose case does not match the case that Windows is using. For each git index path case mismatch found, Git Unite will update the git index entry with the case reported by the Windows file system.
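The matching step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Git Unite's actual code (the tool itself is built on LibGit2Sharp); the function name `find_case_mismatches` and the sample paths are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Tuple


def find_case_mismatches(
    index_paths: Iterable[str], disk_paths: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return (index_path, disk_path) pairs that differ only by case."""
    # Key every on-disk path by its case-folded form so lookups ignore case.
    disk_by_key: Dict[str, str] = {p.casefold(): p for p in disk_paths}
    mismatches = []
    for index_path in index_paths:
        disk_path = disk_by_key.get(index_path.casefold())
        # A mismatch is a path that exists on disk, but spelled with different case.
        if disk_path is not None and disk_path != index_path:
            mismatches.append((index_path, disk_path))
    return mismatches


# Hypothetical data: what the index tracks vs. what the file system reports.
index_paths = ["Where/IsHere", "Where/Is/He", "where/is/Waldo"]
disk_paths = ["Where/IsHere", "Where/Is/He", "Where/Is/Waldo"]

for old, new in find_case_mismatches(index_paths, disk_paths):
    print(f"{old} -> {new}")
```

With the sample data, the only pair reported is where/is/Waldo -> Where/Is/Waldo. Each reported pair is an index entry that would then be rewritten to the on-disk spelling.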

Usage

Usage: Git.Unite [OPTIONS]+ repository
Unite the git repository index file paths with current Windows case usage.
If no repository path is specified, the current directory is used.

Options:
      --dry-run              dry run without making changes
  -h, --help                 show this message and exit

History

I work on a project that has one particular git repository tracking over 7,000 files. The repository contains a mixture of ASP.NET MVC3 code, SQL Server SSIS ETL packages, and PowerShell scripts. It all started one day when an ETL developer could not locate the package she developed on the GitHub web site.

I took a look at the git repository on her machine and the ETL package was clearly there under an Etl\Some\Dir\Path folder. The repository reported being up to date with origin/master, but it took several minutes before I noticed both an etl and an Etl folder on the GitHub web site.

It turns out that the ETL team was in the process of reorganizing the ETL packages into a new directory layout. I booted up a VM running Ubuntu and cloned the repository down to a case sensitive file system. I found 694 ETL files that were tracked in the git index with a directory path case different than the one reported by the Windows file system.

I fixed the problem by using a combination of find, sort, and awk to build a bash script to run the 694 git mv commands. This was a painful process that I did not want to repeat so I decided to build a tool anyone on the team could use on Windows to fix the problem.
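The original bash script is not shown here, but the idea of turning a list of mismatched paths into git mv commands can be sketched as follows. This is a hedged Python illustration, not the find/sort/awk pipeline I actually used; the function name and the ETL file names are hypothetical, and it assumes a case sensitive clone where the destination directories already exist.

```python
import shlex
from typing import Iterable, List, Tuple


def build_git_mv_commands(renames: Iterable[Tuple[str, str]]) -> List[str]:
    """Turn (old_path, new_path) pairs into shell-safe 'git mv' command lines."""
    # shlex.quote protects paths containing spaces or shell metacharacters.
    return [
        f"git mv {shlex.quote(old)} {shlex.quote(new)}"
        for old, new in renames
    ]


# Hypothetical example pairs: wrong-case path in the index, desired path.
renames = [
    ("etl/Load/Orders.dtsx", "Etl/Load/Orders.dtsx"),
    ("etl/Load/Daily Sales.dtsx", "Etl/Load/Daily Sales.dtsx"),
]

# Print the script body; review it before running it in the clone.
print("\n".join(build_git_mv_commands(renames)))
```

Printing the commands instead of running them keeps the step reviewable, which matters when the list is several hundred lines long.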

In fact, two months later the same issue appeared again in a different repository. This time I was able to install the Git Unite utility on the user’s machine and fix the issue in a couple of minutes. We tracked down the source of the problem to a developer who hand-typed the target directory of a git mv command in all lowercase.

Example Scenario

Here is a representative example, using Posh-Git on Windows 7, of how someone can introduce case sensitive file paths on a case insensitive file system.

Step 1 – Create a new git repository and push it to GitHub

C:\demo> mkdir Where
C:\demo> touch .\Where\Waldo
C:\demo> touch .\Where\IsHere
C:\demo> git init .
Initialized empty Git repository in C:/demo/.git/
C:\demo [master +1 ~0 -0 !]> git add .
C:\demo [master +2 ~0 -0]> git commit -m initial
[master (root-commit) 42ea0fc] initial
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 Where/IsHere
 create mode 100644 Where/Waldo

C:\demo [master]> git remote add origin git@github.com:tawman/waldo.git
C:\demo [master]> git push -u origin master
Counting objects: 4, done.
Delta compression using up to 6 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (4/4), 265 bytes, done.
Total 4 (delta 0), reused 0 (delta 0)
To git@github.com:tawman/waldo.git
 * [new branch]      master -> master
Branch master set up to track remote branch master from origin.

When we look on GitHub, the repository appears as expected:

[Image: Initial repository as seen on GitHub]

Step 2 – Start asking some questions

C:\demo [master]> mkdir .\Where\Is
C:\demo [master]> touch .\Where\Is\He
C:\demo [master +1 ~0 -0 !]> git add -A
C:\demo [master +1 ~0 -0]> git commit -m "Good Question"
[master 3d9006e] Good Question
 0 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 Where/Is/He

Keep a close eye on where Waldo is going…

C:\demo [master]> git mv .\Where\Waldo where\is\Waldo
C:\demo [master +0 ~1 -0]> git commit -m "Find Me"
[master 35f843b] Find Me
 1 files changed, 0 insertions(+), 0 deletions(-)
 rename {Where => where/is}/Waldo (100%)

C:\demo [master]> find Where
Where
Where/Is
Where/Is/He
Where/Is/Waldo
Where/IsHere
C:\demo [master]> ls


    Directory: C:\demo


Mode                LastWriteTime     Length Name
----                -------------     ------ ----
d----         1/12/2013  10:54 PM            Where

Seems quite obvious Where Waldo is, but let’s check what GitHub thinks:

C:\demo [master]> git push
Counting objects: 11, done.
Delta compression using up to 6 threads.
Compressing objects: 100% (5/5), done.
Writing objects: 100% (9/9), 683 bytes, done.
Total 9 (delta 1), reused 0 (delta 0)
To git@github.com:tawman/waldo.git
   42ea0fc..35f843b  master -> master

It would appear that git and GitHub have narrowed down the location of Waldo to one of two possible locations:

[Image: GitHub is not exactly sure where he is at]
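One way to spot this situation before it reaches GitHub is to look for directories in the index that differ only by case. The following is a minimal Python sketch under stated assumptions: it is not part of Git Unite, the function name is hypothetical, and the sample paths are the three index entries of this demo repository after the "Find Me" commit.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def find_directory_case_collisions(index_paths: Iterable[str]) -> Dict[str, List[str]]:
    """Map each case-folded directory to its spellings when more than one exists."""
    spellings = defaultdict(set)
    for path in index_paths:
        # Walk every parent directory of the tracked file.
        for parent in PurePosixPath(path).parents:
            directory = str(parent)
            if directory != ".":  # skip the repository root
                spellings[directory.casefold()].add(directory)
    # Keep only directories that appear with two or more different spellings.
    return {key: sorted(names) for key, names in spellings.items() if len(names) > 1}


# Index entries of the demo repository after Waldo was moved.
index_paths = ["Where/IsHere", "Where/Is/He", "where/is/Waldo"]

for names in find_directory_case_collisions(index_paths).values():
    print(" vs ".join(names))
```

For the demo index this flags both the top-level directory and the nested one, which matches the two separate folders GitHub displays.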

Step 3 – Let the confusion begin

C:\demo [master]> ls .\Where\Is\Waldo


    Directory: C:\demo\Where\Is


Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---         1/12/2013  10:50 PM          0 Waldo

According to Windows, Waldo should be hanging out right here:

[Image: Is he here?]

Unfortunately, according to git he is hanging out over there:

[Image: Or is he here?]

Step 4 – Get everyone back on the same page with Git Unite

C:\demo [master]> Git.Unite.exe C:\demo
C:\demo [master +0 ~1 -0]> git status
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       renamed:    where/is/Waldo -> Where/Is/Waldo
#
C:\demo [master +0 ~1 -0]> git commit -m fixed
[master 4495f40] fixed
 1 files changed, 0 insertions(+), 0 deletions(-)
 rename {where/is => Where/Is}/Waldo (100%)
C:\demo [master]> git push
Counting objects: 7, done.
Delta compression using up to 6 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (4/4), 354 bytes, done.
Total 4 (delta 0), reused 0 (delta 0)
To git@github.com:tawman/waldo.git
   35f843b..4495f40  master -> master

Git Unite clears up the confusion by reconciling the git index file path with the same case Windows is using. When I go back and look at the repository on GitHub, there is only one place Where Waldo could be:

[Image: Everyone is back Where expected]

As far as Windows was concerned, Waldo was here the whole time:

[Image: I knew he was here the whole time]

Fork me on GitHub


About Todd Wood

Solution Architect and owner of Wood Consulting Practice, LLC. C#, ASP.NET MVC, Oracle, RoR, and Linux et al rolled up into one on my Mac.
  • mikethescott

    Todd, this turned out to be a *very* timely post for us. I’ve pulled down the source, built it, and can perform a dry run, but running it I get an error (sorry for the wall of text):

    Unhandled Exception: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.NullReferenceException: Object reference not set to an instance of an object.

    at LibGit2Sharp.Core.Ensure.Success(Int32 result, Boolean allowPositiveResult)

    at LibGit2Sharp.Index.RemoveFromIndex(String relativePath)

    --- End of inner exception stack trace ---

    at System.RuntimeMethodHandle._InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeType typeOwner)

    at System.RuntimeMethodHandle.InvokeMethodFast(IRuntimeMethodInfo method, Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeType typeOwner)

    at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)

    at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)

    at System.Reflection.MethodBase.Invoke(Object obj, Object[] parameters)

    at LibGitUnite.UniteRepository.Unite(IEnumerable`1 sourcePaths, IEnumerable`1 destinationPaths) in c:\Projects\git-unite\src\LibGitUnite\UniteRepository.cs:line 69

    at LibGitUnite.UniteRepository.Unite(String sourcePath, String destinationPath) in c:\Projects\git-unite\src\LibGitUnite\UniteRepository.cs:line 43

    at LibGitUnite.GitUnite.Process(String path, Boolean dryrun) in c:\Projects\git-unite\src\LibGitUnite\GitUnite.cs:line 64

    at Git.Unite.Program.c__DisplayClass4.b__3(String p) in c:\Projects\git-unite\src\Git.Unite\Program.cs:line 44

    at System.Collections.Generic.List`1.ForEach(Action`1 action)

    at Git.Unite.Program.Main(String[] args) in c:\Projects\git-unite\src\Git.Unite\Program.cs:line 44

    Any insight you could provide would be greatly appreciated!

    • http://www.woodcp.com/ Todd A. Wood

      What command line invocation are you using to specify the working dir? I will clone down the repo and try to reproduce.

      Thanks.

      • mikethescott

        Hi. You’re correct. After a dry run picked out the changes it would make, I ran it from the bin/Debug directory with no parameters other than the directory containing the repository:

        C:\Projects\git-unite\src\Git.Unite\bin\Debug> git.unite C:\Projects\matlab

        and got the stack trace above. Sorry if that wasn’t clear from my description.

        • http://www.woodcp.com/ Todd A. Wood

          It looks like there might be one or more files for which simply changing the case runs into unintended limitations. I pushed up a change to wrap the LibGit2Sharp remove/add to index calls in a try/catch. Pull down the changes and see if it identifies the file(s) causing problems with a simple rename.

          https://github.com/tawman/git-unite/commit/84ce1a4635168023fd4cdc9feb866b9042fbc623

          Thanks.

          • mikethescott

            Sure enough, there’s some files matching that description:

            C:\Projects\git-unite\src\Git.Unite\bin\Debug>Git.Unite C:\Projects\matlab

            Git.Unite c:\Projects\matlab

            error changing: third_party\application_components\MATLAB\pmtk3-1nov12\demos\catFAdemoAuto.m~ -> Third_Party\Application_Components\MATLAB\pmtk3-1nov12\demos\catFAdemoAuto.m~ [Exception has been thrown by the target of an invocation.]

            … and several (dozen) more…

            They appear to be artifacts of one of my cow-orkers using emacs or the like to edit some of their files.

            The rest of the changes applied, and all of the simple moves were successful. I should be able to manually remove these stragglers on one of our linux machines.

          • http://www.woodcp.com/ Todd A. Wood

            Excellent. Glad I could help you out, and I will try to track down that edge case. By chance, does the catFAdemoAuto.m~ file exist in both directory locations?

          • mikethescott

            It does indeed exist in both locations.

            Thanks for the help!

    • http://www.woodcp.com/ Todd A. Wood

      The error ‘at LibGitUnite.GitUnite.Process(String path, Boolean dryrun) in c:\Projects\git-unite\src\LibGitUnite\GitUnite.cs:line 64’ suggests it was not a --dry-run.

      I cloned it down, did a ‘build.bat’ and ran a dry run against the repo itself i.e.

      C:\Projects\git-unite [master]> .\src\Git.Unite\bin\Debug\Git.Unite.exe C:\Projects\git-unite --dry-run

      Let me know your command line parms and thanks for looking into the util. Hope it helps.

  • http://twitter.com/andrewwlane Andrew Lane

    Helped us out a ton!

    • http://www.woodcp.com/ Todd A. Wood

      Thanks for the feedback and glad I could help out. All Win.

  • philipoakley

    Hi, in http://stackoverflow.com/questions/16863012/how-safely-remove-entry-from-git-tree a user reports that the script can create an extra ‘.’ directory, which is then hidden. Probably a special case, but worth noting.

    • http://www.woodcp.com/ Todd A. Wood

      Thanks for the heads up and I am looking into this.

  • vchandru

    Thanks!