Manage large files using Git LFS
In this episode, we introduce Git LFS as a way to manage large files such as images, videos, or datasets in your Git repository. Usually, you do not want to add those files directly because they can considerably increase the overall size of your Git repository. As a rule of thumb, a Git repository should not grow much beyond about 1 gigabyte in total so that you can still work with it efficiently. In addition, we look into the Git LFS support for locking files for exclusive editing.
A Brief Introduction to Git LFS
Git Large File Storage (LFS) is a Git extension that helps you manage large files in your Git repository. One of its main design goals is to keep the Git repository at a manageable size while larger files can still logically be kept in the same repository and you can keep using the familiar Git workflow. It also helps with files that you cannot (easily) merge by allowing you to lock them.
How does Git LFS work?
Git LFS introduces a central server for storing the (large) files. Such Git LFS servers are already integrated into collaboration platforms such as GitLab or GitHub.
In addition, it provides a Git client extension which takes care of the Git LFS protocol specifics and integrates Git LFS into the usual Git workflow. Technically, the content of files managed by Git LFS is stored on the Git LFS server. The Git repository only contains a minimal configuration and tiny pointer files that link to the actual file content on the Git LFS server. As a result, Git LFS avoids downloading all versions of your large files at once, which is what usually happens when you manage the files in the Git repository directly (a regular clone transfers the complete history). Instead, Git LFS lazily downloads only those files which are required for the currently checked-out commit.
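To make the pointer idea more concrete, the following shell sketch builds by hand the kind of pointer file that Git LFS stores in the Git repository instead of the real content: a version line, the SHA-256 object ID, and the size in bytes, following the pointer format of the Git LFS specification. The function name lfs_pointer_for is made up for illustration, and the sketch assumes sha256sum from GNU coreutils. Git LFS itself can print the same information with git lfs pointer --file=<path>.

```shell
#!/bin/sh
# Hypothetical helper (not part of Git LFS): print a Git LFS pointer for a file.
# Assumes GNU coreutils' sha256sum; on macOS, "shasum -a 256" is similar.
lfs_pointer_for() {
    file="$1"
    # SHA-256 hash of the content; Git LFS uses it as the object ID on the server
    oid=$(sha256sum < "$file" | cut -d ' ' -f 1)
    # Size in bytes; tr removes the padding some wc implementations add
    size=$(wc -c < "$file" | tr -d ' ')
    printf 'version https://git-lfs.github.com/spec/v1\n'
    printf 'oid sha256:%s\n' "$oid"
    printf 'size %s\n' "$size"
}

# Example: lfs_pointer_for images/mars.jpg
```

However large the original file is, the pointer stays tiny because only the hash and the size change between versions.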
Is Git LFS for me?
Git LFS is a good choice if you want to reduce the size of your Git repository and optionally want to use file locking. In particular, larger files (> 500 kilobytes) which change rather frequently and do not compress well are good candidates. However, please be aware that Git LFS does not reduce the overall storage size, because the Git LFS server stores a full copy of every new version of a file. Therefore, you should carefully consider your concrete case.
Here are a few examples:
- A reference development environment with a size of one gigabyte is managed as a ZIP archive in the Git repository. The file changes on average once per quarter and compresses rather poorly. This would be a sensible use case for Git LFS.
- A LabVIEW project contains many smaller binary files (some megabytes in size) which compress well. The files change on average once per week. In this case, it might be better to manage the files directly in Git because Git can take advantage of the compression.
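To judge whether a file compresses well, a rough check is to compress it with gzip and compare the sizes. This is only a heuristic under the assumption that gzip's DEFLATE compression roughly reflects the zlib compression Git applies to stored objects; it ignores the delta compression Git uses between versions in pack files. The function name compressed_percent is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the gzip-compressed size of a file as an
# integer percentage of its original size (lower = compresses better).
compressed_percent() {
    original=$(wc -c < "$1" | tr -d ' ')
    # Read from stdin so gzip does not store the file name in its header
    compressed=$(gzip -c < "$1" | wc -c | tr -d ' ')
    if [ "$original" -eq 0 ]; then
        echo 0   # avoid dividing by zero for empty files
        return
    fi
    echo $(( compressed * 100 / original ))
}

# Example: compressed_percent images/mars.jpg
```

A value close to 100, as is typical for already compressed formats such as ZIP archives or JPEG images, suggests that Git's own compression will not help much.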
Some alternatives to Git LFS:
- git-annex keeps the distributed nature of Git for the large files as well, but requires you to follow an adapted Git workflow.
- Data Version Control (DVC) builds on Git but introduces an additional command line tool which is used side by side with Git. It focuses on data science use cases.
- DataLad is based on Git and git-annex but provides a new command line tool. It focuses on data science use cases.
Manage Files via Git LFS in GitLab
In the following, we start managing files using Git LFS in our Git repository, which is already hosted on GitLab.
Initial Setup
- Make sure that Git LFS is activated for your GitLab project:
  - Visit your GitLab settings page (Settings => General => Visibility, project features, permissions) and ensure that the Git Large File Storage (LFS) option is activated.
- Make sure that the Git LFS plugin is installed for your Git client:
  - You can check your Git command line client via: git lfs --version
  - Some Git distributions (e.g., Git for Windows) already include Git LFS. See the Git LFS Web page for further installation details.
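Besides checking the version, Git LFS has to be registered once per user account with git lfs install, which adds filter.lfs.* entries to your global Git configuration. The following sketch, with a hypothetical function name, checks for these entries; it assumes that the clean and smudge filter settings are present, which is what git lfs install normally writes.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the Git LFS clean/smudge filters are
# registered in the Git configuration (as done by "git lfs install").
lfs_filter_configured() {
    [ -n "$(git config --get filter.lfs.clean)" ] &&
        [ -n "$(git config --get filter.lfs.smudge)" ]
}

if lfs_filter_configured; then
    echo "Git LFS filters are configured."
else
    echo "Git LFS filters not found: run 'git lfs install' once." >&2
fi
```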
Add an Image of Mars using Git LFS
In the following, we add an image of Mars to our repository and reference it in the README.md. The Mars image will be managed using Git LFS.
- Switch to your local Git repository.
- Add the image file locally:
  - mkdir images: Creates a directory for all image files
  - Download the image and copy it to the created directory as mars.jpg.
- Configure Git LFS to manage all image files:
  - git lfs track "images/**": Ensures that all files in the directory images are managed using Git LFS
  - git status: Shows the changes and indicates that there is a new file named .gitattributes
- Commit the initial Git LFS configuration:
  - git add .gitattributes: Adds the configuration to the staging area
  - git commit -m "Track all image files via Git LFS": Commits the changes
- Now, let us commit the Mars image and reference it in the README file:
  - Add the README file as follows:
    - touch README.md: Creates an empty README file
    - Edit the file and add a reference to the image, for example the Markdown line ![Mars](images/mars.jpg)
  - git add images/ README.md: Adds the Mars image and the README file to the staging area
  - git commit -m "Add Mars image and reference it in the README": Commits the changes
- Now let us push the changes to GitLab, because our changes are, as usual, only committed locally:
  - git push: Pushes the changes to the Git repository managed by GitLab
  - At that point, the Git LFS server is contacted and the image file is stored there. The output of git push indicates this aspect by reporting the upload of the Git LFS objects.
  - In the GitLab Web interface, you should find the file images/mars.jpg as well. The Web interface also attaches a small LFS icon to the file to indicate that this file is managed using Git LFS.
Apart from the initial configuration, the overall Git change workflow stays the same. The Git client transparently accesses the Git LFS server as soon as a Git command such as push, clone, fetch, or pull is used.
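Because the Git repository only holds the pointer file, you can check what was actually committed for the Mars image: git cat-file -p prints the blob stored in a commit without running the Git LFS filter. The helper below is a hypothetical sketch that tests whether this blob starts with the version line of a Git LFS pointer.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the version of <path> stored in HEAD is a
# Git LFS pointer file rather than the file content itself.
# Run it inside the Git repository, e.g.: is_lfs_pointer_in_head images/mars.jpg
is_lfs_pointer_in_head() {
    # cat-file prints the raw blob, i.e. no Git LFS smudge filter is applied
    first_line=$(git cat-file -p "HEAD:$1" 2>/dev/null | head -n 1)
    [ "$first_line" = "version https://git-lfs.github.com/spec/v1" ]
}
```

For a file committed directly to Git, or a path that does not exist in HEAD, the helper fails.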
Lock Files with Git LFS and GitLab
Some files cannot (easily) be merged when two different versions of them exist. For these cases, Git LFS offers the option to lock them so that you can edit them exclusively. In the following, we introduce this feature and consider how it is supported by GitLab.
Initial Setup
- git lfs track "images/**" --lockable: Makes all files in images lockable, which is done by changing the Git LFS definition in .gitattributes
  - As a consequence, the mars.jpg file is immediately made read-only, which is a basic hint that you have to acquire a lock before editing the file.
- ls -la images/: The result should contain a line similar to:
  -r--r--r-- 1 user 1178261 41118 Apr 14 11:20 mars.jpg
  Here, -r--r--r-- indicates that the file is read-only.
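With --lockable, the tracking entry in .gitattributes gains the lockable attribute. The sketch below writes such an entry by hand into a throw-away repository (so it runs without Git LFS installed) and asks Git which attributes apply to images/mars.jpg. The exact entry is an assumption based on the attributes git lfs track usually writes; compare it with the .gitattributes file in your own repository.

```shell
#!/bin/sh
# Sketch in a temporary repository: the .gitattributes entry we expect after
# 'git lfs track "images/**" --lockable' (assumed; check your own file).
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
printf '%s\n' 'images/** filter=lfs diff=lfs merge=lfs -text lockable' > .gitattributes

# git check-attr evaluates .gitattributes; the file itself need not exist.
git check-attr filter lockable -- images/mars.jpg
```

Git should print the two lines images/mars.jpg: filter: lfs and images/mars.jpg: lockable: set. Running the same git check-attr command in your real repository is a quick way to confirm that the Mars image is both managed by Git LFS and lockable.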
Please be aware that the enforcement of locks depends on your Git LFS server implementation:
- You should consider enabling a local push check via git config lfs.<Git Remote Repository URL>/info/lfs.locksverify true. With this setting, git push checks the locks on the server and refuses to push changes to lockable, Git LFS managed files that someone else has locked.
- The GitLab Premium edition adds lock features via the GitLab UI and additional checks.
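The URL part of the locksverify key can be derived from your Git remote. The sketch below assumes an HTTPS remote and that the Git LFS endpoint is the remote URL (with a .git suffix) followed by /info/lfs, which is the default convention Git LFS applies to such remotes; git lfs env shows the endpoint Git LFS actually uses. The function name enable_locksverify is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: enable lock verification on push for one remote.
# Assumes an HTTPS remote URL; SSH remotes are not handled here.
enable_locksverify() {
    remote="${1:-origin}"
    url=$(git remote get-url "$remote") || return 1
    # Assumed default: the Git LFS endpoint is <URL ending in .git>/info/lfs
    case "$url" in
        *.git) ;;
        *) url="$url.git" ;;
    esac
    git config "lfs.$url/info/lfs.locksverify" true
}

# Example (inside your repository): enable_locksverify origin
```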
Replace the Mars Image with a Real Photo
In the following, we replace our initial Mars image with a real photo of Mars. In this context, we apply the lock/unlock workflow.
- git lfs lock images/mars.jpg: Locks mars.jpg centrally
- git lfs locks: Shows all locked files and indicates who created each lock
- Now, we can replace the initial Mars image with the real photo.
- git add images/mars.jpg: Adds the file to the staging area
- git commit -m "Replace the initial image by a real photo": Commits the changes
- git push: Publishes the changes on the remote repository
- git lfs unlock images/mars.jpg: Unlocks the file
- git lfs locks: Shows that there are no more active locks
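The lock, commit, push, and unlock sequence can be wrapped in a small shell function so that the lock is only released after a successful push. This is a hypothetical helper, not part of Git LFS; it assumes that the new photo already exists as a local file and that you push directly to the checked-out branch, as in the steps above.

```shell
#!/bin/sh
# Hypothetical helper: replace a lockable file while holding an exclusive lock.
# Usage: replace_locked_file <file in repo> <new content file> <commit message>
replace_locked_file() {
    target="$1"
    new_file="$2"
    message="$3"
    # Acquire the lock first; stop if someone else already holds it.
    git lfs lock "$target" || return 1
    # Lockable files are kept read-only by Git LFS; ensure we can overwrite it.
    chmod u+w "$target"
    cp "$new_file" "$target" &&
        git add "$target" &&
        git commit -m "$message" &&
        git push || return 1   # keep the lock so you can fix the problem and retry
    # Release the lock only after the changes are published.
    git lfs unlock "$target"
}

# Example (hypothetical photo path):
# replace_locked_file images/mars.jpg ~/Downloads/mars-photo.jpg "Replace the initial image by a real photo"
```

If any step after locking fails, the function returns without unlocking, so the file stays protected until you have sorted out the problem.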
When you additionally apply the lock/unlock workflow:
- Make sure to lock all relevant files before starting your work.
- Make sure to unlock all relevant files once your results are merged into the main branch.
Key Points
- Git LFS is one option to manage large files in Git repositories.
- Git LFS allows you to largely follow the usual Git/GitLab workflows.
- Git LFS offers a lightweight approach for locking and unlocking files so that you can work on files exclusively.