Victor's Guide to Git: Part 1. Concepts

2023-11-05

1. Conceptual Overview

Hello and welcome to my guide to git!

Git is a fundamental tool that you are going to interact with multiple times a day while working on software collaboratively. It pays dividends over your programming career to take the time to learn it well.

Let's explore the three main concepts related to git: repository, commit and branch.

Repository

A repository -- or “repo” for short -- is a collection of commits which retains a complete history of every change that was ever recorded in it.

Typically, a repository contains the source code for a piece of software, enabling multiple people to work on it in parallel and coordinating the modifications submitted to the files being tracked.

You can think of a repository as a regular folder on your computer, possibly containing files and subdirectories with their own files.

The difference lies in the capabilities enabled by using the git command in that directory (or any of its subdirectories).

Commit

A commit is a point-in-time snapshot of the particular directory tree being tracked, with enough information to reconstruct the contents and arrangement of all of the files and subdirectories being tracked by git.

You can think of a commit as taking a copy of the whole directory, putting it in a “zip” file, and storing that permanently in the repository history.

In principle, a commit is identified by a 20-byte number, which may be represented as 40 hexadecimal digits (e.g. 3688f6aaa92eae5c5a308f247ce649548af26924), but usually the first few hexadecimal digits are enough to uniquely identify a commit within a repository (e.g. 3688f6a).
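
To make the idea of a shortened hash concrete, here is a minimal Python sketch of how a prefix can be resolved against the hashes known to a repository. The function name resolve_prefix and the second hash in the list are made up for illustration; this is not how git implements the lookup internally.

```python
def resolve_prefix(prefix, known_hashes):
    """Return the single full hash that starts with `prefix`.

    Raises ValueError when the prefix matches no hash or more than one.
    """
    matches = [h for h in known_hashes if h.startswith(prefix.lower())]
    if len(matches) != 1:
        raise ValueError(f"prefix {prefix!r} matched {len(matches)} commits")
    return matches[0]


# The first hash is the example from the text; the second one is
# invented so that very short prefixes become ambiguous.
hashes = [
    "3688f6aaa92eae5c5a308f247ce649548af26924",
    "36a1c0ffee" + "0" * 22 + "deadbeef",
]

# Seven digits are enough to single out one commit in this tiny "repo".
full = resolve_prefix("3688f6a", hashes)
```

With only these two commits, the prefix 36 would be rejected as ambiguous, which is why a few more digits are normally used.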

This 20-byte number is called the commit hash and is solely determined by the contents of the commit, that is, the snapshot plus metadata such as the author, the message and the parent commit (by digesting them with the SHA-1 hashing algorithm).

While the details of how the hash is computed are unimportant, one important guarantee yielded by this approach is that any two different machines will agree on the “name” of the commit as long as its contents are exactly the same -- and, conversely, arrive at different hashes for commits which differ by even one byte in their contents.
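
For the curious, the guarantee can be demonstrated with a few lines of Python. Git names an object by taking the SHA-1 of a short header (the object type, the size in bytes and a zero byte) followed by the object's contents. The sketch below follows that scheme; the helper name git_object_hash and the sample commit text (author, timestamp, message) are invented for illustration.

```python
import hashlib


def git_object_hash(object_type, body):
    """Hash `body` (bytes) the way git names an object of `object_type`."""
    header = f"{object_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Illustrative commit contents: a tree reference, author and committer
# lines, a blank line and the message. All values here are made up.
commit_body = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author Example Author <author@example.com> 1699142400 +0000\n"
    b"committer Example Author <author@example.com> 1699142400 +0000\n"
    b"\n"
    b"Initial commit\n"
)

# Same bytes, same name: any machine computes the identical hash.
first = git_object_hash("commit", commit_body)
second = git_object_hash("commit", bytes(commit_body))

# Changing a single byte (the final newline becomes "!") changes the name.
changed = git_object_hash("commit", commit_body[:-1] + b"!")
```

Here first and second are equal, while changed is a completely different 40-digit value even though only one byte differs.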

Every commit -- other than the first -- will retain a reference to the commit that preceded it -- called its parent commit -- which allows retracing the history of commits and modifications for the whole repository.
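
Retracing history is just a matter of following parent references until there are none left. Here is a minimal sketch, assuming each commit has at most one parent (merge commits in real git have more than one) and using short made-up identifiers instead of real hashes.

```python
def history(commit_id, parents):
    """Yield commit_id, then each ancestor, by following parent links."""
    current = commit_id
    while current is not None:
        yield current
        current = parents[current]


# Hypothetical three-commit history: a1 <- b2 <- c3.
# The first commit has no parent, which is recorded as None.
parents = {"a1": None, "b2": "a1", "c3": "b2"}

log = list(history("c3", parents))  # newest commit first
```

Starting from c3, the walk visits c3, b2 and a1, in that order, and stops at the first commit.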

Branches and Branching

Dealing with raw commit hashes, even the shorter ones, is still burdensome. Branches allow us to refer to commits in a more friendly way.

A branch works as a pointer to a particular commit, keeping track of where in the commit history graph we are currently working.
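
One way to picture this is as a small lookup table from names to commits. The sketch below is a toy model with made-up short identifiers; the resolve helper is hypothetical and only illustrates that a branch name and a hash are two ways of referring to the same commit.

```python
# Toy data: commit ids mapped to their messages, and one branch.
commits = {"a1": "Initial commit", "b2": "Add README"}
branches = {"main": "b2"}


def resolve(ref):
    """Turn a branch name or a commit id into a commit id."""
    return branches.get(ref, ref)


# The friendly name and the raw id lead to the same commit.
by_name = commits[resolve("main")]
by_id = commits[resolve("b2")]
```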

Git does not impose a branching strategy -- branches may be named and used as you'd like -- but the convention which is usually agreed upon goes as follows:

There's a main branch -- it is also called the master branch, but both names refer to the exact same concept -- containing the bleeding edge, the latest version of the software source code a team is working on (step 1).

In a typical work day, a developer will start their journey by creating a new branch which points to the same commit as main (that is, the latest commit in the repository's main history) (step 2).

NOTE: That branch may be called a feature branch, because the goal of such a branch is usually to implement a discrete feature or piece of functionality.

Modifications will then be appended as new commits to the feature branch -- making it diverge from main (step 3). Note that the feature branch will be updated to keep pointing to the latest commit on its branch, while main will remain unchanged.

As the final step, the finished modifications are incorporated into main and the feature branch may be deleted. The main branch steps forward, the modifications become the new latest version of the source code (step 4), and we return to step 1.
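
The four steps can be simulated with a toy model in Python. This is only a sketch under a few assumptions: the ToyRepo class and its method names are invented, commit ids come from a made-up scheme rather than real commit hashes, and step 4 is modeled as simply moving main to the tip of the feature branch, which only works when main has not changed in the meantime.

```python
import hashlib


class ToyRepo:
    """A toy model: commits keep a parent id, branches map names to ids."""

    def __init__(self):
        self.parents = {}   # commit id -> parent id (None for the first commit)
        self.branches = {}  # branch name -> id of the commit it points to

    def commit(self, branch, message):
        parent = self.branches.get(branch)
        # Made-up id scheme for this sketch, not git's real commit hash.
        commit_id = hashlib.sha1(f"{parent}:{message}".encode()).hexdigest()[:7]
        self.parents[commit_id] = parent
        self.branches[branch] = commit_id  # the branch moves to the new commit
        return commit_id

    def create_branch(self, name, start):
        # A new branch is just another pointer to an existing commit.
        self.branches[name] = self.branches[start]

    def fast_forward(self, target, source):
        # Move `target` so it points at the same commit as `source`.
        self.branches[target] = self.branches[source]

    def delete_branch(self, name):
        del self.branches[name]


repo = ToyRepo()
base = repo.commit("main", "Initial commit")    # step 1: main is the latest version
repo.create_branch("feature", "main")           # step 2: feature starts at main
repo.commit("feature", "Start feature")         # step 3: feature moves ahead...
tip = repo.commit("feature", "Finish feature")
main_during_work = repo.branches["main"]        # ...while main stays where it was
repo.fast_forward("main", "feature")            # step 4: main steps forward
repo.delete_branch("feature")
```

After the last line, main points at the commit that finished the feature, the feature branch is gone, and the cycle can start again from step 1.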