Git Submodules vs Git Subtrees
Subtrees vs Submodules
The simplest way to think of subtrees and submodules is that a subtree is a copy of a repository that is pulled into a parent repository while a submodule is a pointer to a specific commit in another repository.
This difference means that it is trivial to push updates back to a submodule, because we’re just pushing commits back to the original repository that is pointed to, but more complex to push updates back to a subtree, because the parent repository has no knowledge of the origin of the contents of the subtree.
It also means that subtrees are much easier for other people to come and pull, as they are just part of the parent repository.
So an ultra-dumbed-down ELI5 comparison of submodules to subtrees could be:
- Submodules are easier to push but harder to pull – This is because they are pointers to the original repository
- Subtrees are easier to pull but harder to push – This is because they are copies of the original repository
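To make the "easier to push" half concrete, here is a rough sketch of pushing a fix back through a submodule. Everything in it is an assumption for illustration: the repository names (lib-src, lib.git, parent), the libs/lib path, and the throwaway identity are hypothetical, and the upstream is a local bare repository rather than a real server.

```shell
#!/bin/sh
# Sketch only: all repository names and paths below are hypothetical.
set -eu
work=$(mktemp -d)
cd "$work"
# Throwaway identity so commits succeed even where Git is not configured.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in for the shared library's upstream: a bare repo with one commit.
git init -q lib-src
echo "v1" > lib-src/lib.txt
git -C lib-src add lib.txt
git -C lib-src commit -q -m "Initial library commit"
branch=$(git -C lib-src symbolic-ref --short HEAD)
git clone -q --bare lib-src lib.git

# The parent project, with the library added as a submodule.
git init -q parent
cd parent
echo "app" > app.txt
git add app.txt
git commit -q -m "Initial app commit"
# protocol.file.allow is only needed because this upstream is a local path.
git -c protocol.file.allow=always submodule --quiet add "$work/lib.git" libs/lib
git commit -q -m "Add lib as a submodule"

# Fix something inside the submodule and push it to the library's own origin.
cd libs/lib
echo "v2" > lib.txt
git commit -q -am "Fix in library"
git push -q origin HEAD:"$branch"
cd ../..

# The parent only has to record which commit the pointer now references.
git add libs/lib
git commit -q -m "Bump lib to the fixed commit"
```

The push itself is an ordinary git push from inside the submodule directory; the only extra step is committing the moved pointer in the parent.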
Summary
In my opinion subtrees are not a direct replacement for submodules. The way I believe you should split your shared code between subtrees and submodules is this:
- Is the external repository something you own yourself and are likely to push code back to? Then use a submodule. This gives you the quickest and easiest way for you to push your changes back.
- Is the external repository third-party code that you are unlikely to push anything back to? Then use a subtree. This gives the advantage of not having to give people permissions to an extra repo when you are giving them access to the code base, and also reduces the chance that someone will forget to run git submodule update.
If you think I’m a complete idiot who has totally misunderstood and misrepresented submodules or subtrees, please let me know in the comments.
GIT: SUBMODULES VS. SUBTREES
submodule
Remember, a Git submodule is just a link to a specific ref in another repository. When another person clones your repository, they won’t see the Pikaday source there. To get it, they will have to run:
git submodule init
git submodule update
An alternative is cloning with the --recursive option:
git clone --recursive <repo-path>
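The difference between the two clone styles can be tried out locally. This is a minimal sketch under stated assumptions: the repositories lib and parent, the libs/lib path, and the demo identity are all made up for illustration, and local paths stand in for real remotes.

```shell
#!/bin/sh
# Sketch only: repository names and paths are hypothetical.
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A small library and a parent project that uses it as a submodule.
git init -q lib
echo "v1" > lib/lib.txt
git -C lib add lib.txt
git -C lib commit -q -m "Initial library commit"

git init -q parent
echo "app" > parent/app.txt
git -C parent add app.txt
git -C parent commit -q -m "Initial app commit"
# protocol.file.allow is only needed because the submodule URL is a local path.
git -C parent -c protocol.file.allow=always submodule --quiet add "$work/lib" libs/lib
git -C parent commit -q -m "Add lib as a submodule"

# Plain clone: the submodule directory is created but left empty.
git clone -q parent plain
plain_before=$(ls -A plain/libs/lib)
if [ -z "$plain_before" ]; then
    echo "libs/lib is empty after a plain clone"
fi

# The two extra steps fill it in.
git -C plain submodule --quiet init
git -C plain -c protocol.file.allow=always submodule --quiet update

# One-step alternative (--recursive is an older spelling of --recurse-submodules).
git -c protocol.file.allow=always clone -q --recursive parent onestep
```

Both clones end up with the same files in libs/lib; the only question is whether the person cloning remembers the extra steps.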
subtree
Subtrees are much simpler than submodules. As opposed to submodules, subtrees’ source files are stored in the repo. It’s not just a link; the code is really there. There are also fewer steps required and fewer changes to the workflow.
As opposed to submodules, someone who clones your repo won’t have to do anything else to have all the code.
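For comparison, a sketch of the same setup with a subtree. It assumes the git subtree command is installed (it ships in Git's contrib area, so some minimal installs lack it). The names lib, parent, copy and the vendor/lib prefix are hypothetical.

```shell
#!/bin/sh
# Sketch only: repository names and the vendor/lib prefix are hypothetical.
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# The external library.
git init -q lib
echo "v1" > lib/lib.txt
git -C lib add lib.txt
git -C lib commit -q -m "Initial library commit"
branch=$(git -C lib symbolic-ref --short HEAD)

# The parent project; git subtree add needs at least one existing commit.
git init -q parent
cd parent
echo "app" > app.txt
git add app.txt
git commit -q -m "Initial app commit"

# Copy the library into vendor/lib; --squash folds its history into one commit.
git subtree add --prefix=vendor/lib --squash "$work/lib" "$branch"
cd ..

# A plain clone already contains the library files: no init, no update.
git clone -q parent copy
cat copy/vendor/lib/lib.txt
# The files are ordinary blob entries in the parent's tree, not a link.
git -C copy ls-tree -r HEAD vendor/lib
```

There is no .gitmodules file and nothing for the person cloning to remember.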
Recap
Submodules | Subtrees |
---|---|
Harder (especially for Git beginners) | Easier |
Just a link to a commit ref in another repository | Code is merged into the outer repository’s history |
Requires the submodule to be accessible on a server (like GitHub) | Decentralized |
Requires additional steps | Just clone, pull, and push the way you are already familiar with |
Smaller repository size | Bigger repository size |
Differences between git submodule and subtree
- submodule is a better fit for component-based development, where your main project depends on a fixed version of another component (repo).
Your parent repo keeps only references to it (gitlinks, which are special entries in the index).
What if I want the links to always point to the HEAD of the external repo?
You can make a submodule follow the HEAD of a branch of its remote repo with:
  - git submodule add -b <branch> <repository> [<path>] (to specify a branch to follow)
  - git submodule update --remote, which updates the content of the submodule to the latest HEAD of <repository>/<branch> (by default origin/master; newer Git versions fall back to the remote’s HEAD branch when no branch is configured).
Even when --remote is used, your main project still records a specific commit hash for the submodule, so the updated pointer has to be committed in the parent.
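Here is a small sketch of that branch-following behaviour, showing that the parent's recorded hash only moves once you commit it. The repositories lib and parent, the libs/lib path, and the demo identity are assumptions made up for the example.

```shell
#!/bin/sh
# Sketch only: repository names and paths are hypothetical.
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q lib
echo "v1" > lib/lib.txt
git -C lib add lib.txt
git -C lib commit -q -m "Initial library commit"
branch=$(git -C lib symbolic-ref --short HEAD)

git init -q parent
cd parent
echo "app" > app.txt
git add app.txt
git commit -q -m "Initial app commit"
# -b records the branch to follow in .gitmodules.
# protocol.file.allow is only needed because the URL is a local path.
git -c protocol.file.allow=always submodule --quiet add -b "$branch" "$work/lib" libs/lib
git commit -q -m "Add lib, following $branch"
pinned_before=$(git rev-parse HEAD:libs/lib)

# Upstream moves on.
echo "v2" > "$work/lib/lib.txt"
git -C "$work/lib" commit -q -am "Upstream change"

# Fetch and check out the new tip of the followed branch in the submodule.
git -c protocol.file.allow=always submodule --quiet update --remote
checked_out=$(git -C libs/lib rev-parse HEAD)

# The parent's last commit still records the old hash...
pinned_after=$(git rev-parse HEAD:libs/lib)

# ...until the moved pointer is committed.
git add libs/lib
git commit -q -m "Bump lib to latest $branch"
pinned_final=$(git rev-parse HEAD:libs/lib)

echo "before: $pinned_before"
echo "after update --remote: $pinned_after"
echo "after commit: $pinned_final"
```

So --remote changes what gets checked out in the submodule, not what the parent's history says; the parent stays pinned to an exact commit either way.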
- subtree is a better fit for system-based development, where your whole repo contains everything at once, and you can modify any part.
See an example in this answer.
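As a rough illustration of the "modify any part" point, the sketch below changes the project's own file and the vendored library in a single commit, then sends only the library part upstream with git subtree push. It assumes git subtree is installed; the names lib-src, lib.git, parent and the vendor/lib prefix are hypothetical, and a local bare repository stands in for the real upstream.

```shell
#!/bin/sh
# Sketch only: repository names and the vendor/lib prefix are hypothetical.
set -eu
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Stand-in upstream for the library: a bare repo with one commit.
git init -q lib-src
echo "v1" > lib-src/lib.txt
git -C lib-src add lib.txt
git -C lib-src commit -q -m "Initial library commit"
branch=$(git -C lib-src symbolic-ref --short HEAD)
git clone -q --bare lib-src lib.git

git init -q parent
cd parent
echo "app v1" > app.txt
git add app.txt
git commit -q -m "Initial app commit"
# Without --squash, the library's history is merged into the parent's.
git subtree add --prefix=vendor/lib "$work/lib.git" "$branch"

# One ordinary commit can touch the app and the vendored library together.
echo "app v2" > app.txt
echo "v2" > vendor/lib/lib.txt
git commit -q -am "Change app and vendored lib together"

# Pushing back first extracts the history of vendor/lib, then pushes it.
git subtree push --prefix=vendor/lib "$work/lib.git" "$branch"
```

The upstream receives only the lib.txt change; app.txt never leaves the parent. The extraction step is the extra work that makes subtrees "harder to push".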