Software and Computer Engineering

Notes About Git, The Version Control System

Git is a Version Control System (VCS) because it is able to record and recover any state of a directory tree, meaning it can keep track of all the files and subdirectories contained in a given directory over time. The version-controlled instance is called a repository.

Git is designed especially for version-controlling software source code, as it includes helpers and optimizations for handling plain-text files, although in principle it is capable of dealing with any arbitrary binary file. (In particular, the git Large File Storage extension -- LFS -- is optimized for handling large binary files.)

A Distributed VCS

Git is called a distributed VCS because every client has a full copy of the history of changes and can therefore operate completely autonomously. Each client can also operate as a peer-to-peer node, sending and receiving changes produced for a shared repository.

This capability of git is a product of its design as the VCS of choice for the Linux Kernel itself, where it has to support versioning of a huge number of files across thousands of contributors in a tractable manner.

Despite that, the most common mode of using git is through a central repository, typically on a managed host such as GitHub or Bitbucket, which enables every contributor to access and contribute to the shared code at any time. (With a peer-to-peer model, the nodes would have to be online at the same time to share the changes.)

Still, it is very nice that the operations central to git are performed completely locally, and thus run almost instantaneously, irrespective of network connectivity.

A System for Snapshots

In git, one works with commits, which can be thought of as full snapshots of the directory tree. Once changes are committed, or consolidated into a git commit, they are stored and this state can be recovered at any time in the future.

The way that git actually stores the snapshots is quite clever. Each file is indexed not by its path or filename, but through a SHA-1 hash of its contents. This means that git has integrity checking built in and will efficiently handle moved, renamed and duplicated files, since identical contents always map to the same stored object.
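To make the idea of content-based indexing concrete, here is a minimal Python sketch. It assumes a repository using git's default SHA-1 object format, where the identifier of a file's contents is computed over a short header ("blob", the size in bytes, and a null byte) followed by the raw contents. The function name git_blob_id, the toy snapshot dictionary and the file paths in it are made up for illustration; this is not how git is implemented internally, only a way to reproduce the identifiers it would assign.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 identifier git would assign to these file contents.

    Assumes the default SHA-1 object format: the hash covers a header of
    the form b"blob <size>\\0" followed by the raw bytes of the file.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# A toy "snapshot": hypothetical paths mapped to the ID of their contents.
# The file name never enters the hash, only the bytes do.
snapshot = {
    "README.txt": git_blob_id(b"hello\n"),
    "docs/copy-of-readme.txt": git_blob_id(b"hello\n"),  # duplicated file
    "notes.txt": git_blob_id(b"something else\n"),
}

# Duplicated (or renamed) files share one ID, so only one object is needed.
unique_objects = set(snapshot.values())
print(len(snapshot), "paths,", len(unique_objects), "distinct objects")

# The empty file has a well-known ID in SHA-1 repositories.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Inside a real repository, the output of git_blob_id can be compared against the command "git hash-object <file>" for any given file. Because the path plays no part in the hash, renaming or copying a file produces no new stored contents, and any corruption of the stored bytes would change the hash and thus be detectable.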

The States of Git

To use git, it is fundamental to understand that files within a git directory can be in one of these states:

The Working Directory consists of the files within your file system -- outside of the hidden “.git” directory -- that make up a single checkout of the repository. These are the actual files and directory tree that you are version-controlling, in the form needed for their end use (for example, to build a piece of software from source).