SystemCraft

Introduction to Version Control Systems

Authors: Frank Mayer

A version control system (VCS) is a tool that is used to keep track of the changes made to a project over time, allowing you to revert to previous versions of the project, compare changes, and collaborate with others.

It is used by developers, but also by engineers, artists, and more.

There are different types of VCSs for different purposes.

Different VCSs use different terminology. For example, Git uses the term “commit” to refer to a change (point in time in the history of a project), while Mercurial uses “changeset”. However, the concepts are the same.

The Origin of Git

Git is currently the default version control system for the vast majority of projects, so let’s talk about it.

Until 2005, the Linux kernel project, one of the largest open-source projects in the world, used a proprietary, distributed version control system (VCS) called BitKeeper. However, the license for free use for the kernel team was revoked. This created an acute problem: a new VCS was needed that could meet the project’s extreme requirements:

Since no existing solution met these criteria, Linus Torvalds, the initiator of Linux, took matters into his own hands.

Within a few weeks, Linus Torvalds developed the core of Git. His goal was not to create a user-friendly system, but an extremely fast and robust foundation. The first version was minimalistic, consisting of simple command-line tools that already implemented the core principles of Git.

Linus Torvalds’ main interest remained the Linux kernel. After laying the foundation for Git, he handed over the project in July 2005 to Junio C Hamano, one of the earliest and most important contributors.

Under Hamano’s leadership, Git became what we know today.

Git’s real breakthrough with the general public came with the rise of code-hosting platforms, also known as “forges.”

These platforms extend pure version control with crucial collaboration features:

More than just Git

Both before and after Git, there have been many other VCSs.

Changes

Many believe that Git only stores the modifications from one change to the next. Conceptually, it does not: reconstructing a version would mean replaying a whole chain of modifications, which is inefficient.

Most VCSs store snapshots. A snapshot is a list of all files contained in a change. A file in Git has a name (e.g. foo/bar.txt), an executable flag, and content. Thus, every change lists all files, not just those that have been modified. Furthermore, it does not matter how much a file has changed; its content is saved completely anew.

However, there is an important optimization: the same file content very often appears multiple times. If a file is not modified in a change, its content does not need to be saved a second time. Likewise, if two files have the same content, it only needs to be stored once.
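The snapshot model and the deduplication described above can be sketched in a few lines. This is only an illustration, not Git's actual storage format: the names `ObjectStore` and `snapshot` are hypothetical, and using SHA-256 of the raw content as the key is an assumption made for the example.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self):
        self.blobs = {}  # content hash -> content bytes

    def put(self, content: bytes) -> str:
        # Assumption for this sketch: the key is the SHA-256 of the content.
        key = hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(key, content)  # no-op if already stored
        return key


def snapshot(store, files):
    """Record every file of a change as (executable flag, content hash)."""
    return {
        name: (executable, store.put(content))
        for name, (executable, content) in files.items()
    }


store = ObjectStore()
first = snapshot(store, {
    "foo/bar.txt": (False, b"hello\n"),
    "foo/copy.txt": (False, b"hello\n"),  # same content as foo/bar.txt
    "run.sh": (True, b"echo hi\n"),
})
second = snapshot(store, {
    "foo/bar.txt": (False, b"hello world\n"),  # modified
    "foo/copy.txt": (False, b"hello\n"),       # unchanged
    "run.sh": (True, b"echo hi\n"),            # unchanged
})

# Both snapshots list all three files, yet only three distinct contents exist.
print(len(first), len(second), len(store.blobs))  # 3 3 3
```

Each snapshot is complete on its own, so reading a version never requires looking at its predecessors, while the store grows only when genuinely new content appears.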

In short:

History

Most VCSs store history as a graph. Each node in the graph is a change.
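As a rough sketch of such a graph, each change can simply hold references to the changes it was based on. The `Change` class and the `ancestors` helper are hypothetical names chosen for this example, not part of any VCS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    name: str
    parents: tuple = ()  # the changes this one builds on; empty for the first


def ancestors(change):
    """Return the names of all changes reachable through parent links."""
    seen = set()
    stack = list(change.parents)
    while stack:
        node = stack.pop()
        if node.name not in seen:
            seen.add(node.name)
            stack.extend(node.parents)
    return seen


a = Change("A")
b = Change("B", (a,))
c = Change("C", (a,))
m = Change("M", (b, c))  # a change with two parents joins two lines of history

print(sorted(ancestors(m)))  # ['A', 'B', 'C']
```

Because every node only points backwards to its parents, the entire history of a change can be recovered by following those links.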

Branches

When multiple people work on the same project, it is common to have different branches.

A branch is a split in the history. It’s like a parallel universe. This is used to work on different features at the same time without them interfering with each other.

When a feature is finished, it can be merged back into the main branch.
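One simple way to picture this, similar to how Git treats branches, is to model a branch as a name that points at the most recent change on that line of work. The following is a minimal sketch under that assumption; `commit`, `create_branch`, and `merge` are hypothetical helpers, and the merge here only records the two parents without combining any file contents.

```python
history = {"A": []}       # change name -> list of parent change names
branches = {"main": "A"}  # branch name -> most recent change on that branch


def commit(branch, name):
    """Add a new change on top of a branch and move the branch forward."""
    history[name] = [branches[branch]]
    branches[branch] = name


def create_branch(new, source):
    """Start a new branch at the current change of an existing one."""
    branches[new] = branches[source]


def merge(target, source, name):
    """Record a change with two parents, joining `source` into `target`."""
    history[name] = [branches[target], branches[source]]
    branches[target] = name


create_branch("feature", "main")
commit("feature", "B")         # work on the feature
commit("main", "C")            # unrelated work on main
merge("main", "feature", "M")  # bring the finished feature back

print(branches)      # {'main': 'M', 'feature': 'B'}
print(history["M"])  # ['C', 'B']
```

Until the merge, changes B and C know nothing about each other, which is what lets both lines of work proceed without interference.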

3-Way Merge

When two changes are merged, the VCS must perform a three-way merge. In this process, the history is treated as a DAG (Directed Acyclic Graph). In a DAG, it is easy to find the LCA (Lowest Common Ancestor). This is the change that is an ancestor of both changes to be merged and lies deepest in the DAG, meaning it is furthest from the initial change.
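The definition above translates almost directly into code: collect the ancestors of both changes, keep the ones they share, and pick the one furthest from the initial change. This is a naive sketch on a hypothetical five-change history; real VCSs use more efficient algorithms, and some histories have more than one candidate, which this example ignores.

```python
from functools import lru_cache

# Hypothetical history: change name -> list of parent names.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
    "E": ["D"],
}


def ancestors(change):
    """All changes reachable via parent links, including the change itself."""
    seen, stack = set(), [change]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(PARENTS[node])
    return seen


@lru_cache(maxsize=None)
def depth(change):
    """Length of the longest path from the initial change to `change`."""
    parents = PARENTS[change]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


def lowest_common_ancestor(x, y):
    """Deepest change that is an ancestor of both x and y."""
    common = ancestors(x) & ancestors(y)
    return max(common, key=depth)


print(lowest_common_ancestor("C", "E"))  # B
```

Here A and B are both shared ancestors of C and E, but B is deeper in the graph, so it is the one chosen.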

This sounds complicated. However, represented graphically, it looks quite simple:

When changes B and C are to be merged, a common base is needed against which the modifications on both sides can be compared. This common base is the most recent change that is an ancestor of both B and C (the LCA).
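To illustrate why the base matters, here is a minimal sketch of a three-way merge that works on whole files. The function `three_way_merge` and the sample snapshots are hypothetical; real tools additionally merge line by line within a file, which this example does not attempt.

```python
def three_way_merge(base, ours, theirs):
    """Merge two snapshots (dicts of path -> content) against a common base.

    Returns the merged snapshot and a list of conflicting paths.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o  # both sides agree
        elif o == b:
            result = t  # only their side modified the file
        elif t == b:
            result = o  # only our side modified the file
        else:
            conflicts.append(path)  # both sides modified it differently
            continue
        if result is not None:  # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"a.txt": "1", "b.txt": "1", "c.txt": "1"}    # the LCA
ours = {"a.txt": "2", "b.txt": "1", "c.txt": "2"}    # change B
theirs = {"a.txt": "1", "b.txt": "3", "c.txt": "4"}  # change C

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)     # {'a.txt': '2', 'b.txt': '3'}
print(conflicts)  # ['c.txt']
```

Without the base, a.txt and b.txt would look like plain disagreements between B and C. Comparing each side against the base shows which side actually modified the file, so only c.txt, modified differently on both sides, needs a human decision.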