Running a Webservice, Part 3: Using Version Control

Intro

Hi!

This is part 3 of the mini series about running a webservice.

We previously learned how to write a basic app in part 1, and how we can use Docker to containerize it in part 2.

Today we will focus on something different: How to use Git to keep track of our project.

Overview

In any project (not only coding/programming!), there will at some point exist multiple versions of files.

We all probably had a folder looking like this at some point in our lives:

Project/
├── Important_Presentation_01.pptx
├── Important_Presentation_02.pptx
├── Important_Presentation_03.pptx
├── Important_Presentation_final.pptx
├── Important_Presentation_FINAL.pptx
└── Important_Presentation_REAL_FINAL.pptx

This is manageable if you work alone on a presentation - but having hundreds or thousands of people collaborating on writing millions of lines of code?

Luckily, there are great tools that can help us with keeping track of all these different versions of files, and they work especially well for text-based files like code: Version Control Systems.

While multiple VCSes exist, Git has become the de facto standard for distributed version control these days - and we will jump right on to the hype train :-)

At first, we will only use git as a sort of time machine that allows us to go back to previous versions of our code - concepts like branching and merging will be introduced later on.

Getting started with Git

To get started, we first need to make sure we have git installed on our system. Most Linux distros either ship it by default or have it available in their package manager. You can check whether it is already installed via git --version.

If it is not yet installed, just follow THESE steps.

Now that we have git up and running, let’s change into our project directory and initialize a fresh git repository:

cd /path/to/project
git init

You should see some output like this:

Initialized empty Git repository in /path/to/project/.git/

Also, there should be a new (hidden) directory .git in our project directory:

$ ls -lA
-rw-rw-r-- 1 manu manu  771 Aug 18 18:47 Dockerfile
drwxrwxr-x 8 manu manu 4096 Okt 30 14:42 .git             <-- This one!
drwxrwxr-x 3 manu manu 4096 Aug 29 18:33 ip_calc
-rw-rw-r-- 1 manu manu  218 Aug 18 18:47 README.md
drwxrwxr-x 2 manu manu 4096 Aug 18 18:47 requirements
drwxrwxr-x 4 manu manu 4096 Aug 29 18:21 .venv

It is important to understand that files located in our project directory are not necessarily part of our git repository!

Git does not start to track any files unless we explicitly tell it to. This means that for now, even though there are some files in our project directory, they are not yet part of our git repo.

You can verify this by running git status inside the project directory - you should see some output like this:

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        Dockerfile
        ip_calc/main.py
        README.md
        requirements.txt
        .venv
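To see this behaviour in isolation, here is a minimal sketch that runs in a throwaway directory rather than in our real project. The scratch directory and the single README.md file are assumptions made for the demo only.

```shell
#!/bin/sh
# Demo: a file on disk is not tracked until we explicitly add it.
set -eu

demo_dir=$(mktemp -d)      # hypothetical scratch directory
cd "$demo_dir"
git init --quiet

echo "hello" > README.md   # exists on disk, but git does not track it yet

# Short status format: "??" marks untracked files
git status --short

# List the files git actually tracks (nothing so far)
tracked=$(git ls-files)
echo "tracked files: '${tracked}'"
```

The status line starts with ??, and git ls-files prints nothing, because no file has been added yet.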

We could just add everything to the repo using git add --all - but wait! What about the .venv directory? Do we really want all that runtime stuff in our git repo? There are thousands of files in there, and they will change a lot, e.g. when we install a new Python module or update one… It is a good idea to keep that stuff out of our repo (we would of course also like to keep logs etc. out of version control).

To exclude files or directories, we can create a file called .gitignore and put the names of everything we want to exclude from git in there:

echo ".venv" >> .gitignore

This tells git to ignore anything named .venv. A pattern without a trailing slash matches both files and directories, so this works perfectly fine for our directory.

More details on how to use the .gitignore file can be found HERE.
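As a rough sketch of how ignore patterns behave, the following script builds a throwaway repo and asks git which paths it would ignore. The *.log pattern and the file names (pyvenv.cfg, app.log, main.py at the top level) are assumptions for illustration, not part of our project.

```shell
#!/bin/sh
# Demo: check which paths a .gitignore file excludes.
set -eu

demo_dir=$(mktemp -d)      # hypothetical scratch directory
cd "$demo_dir"
git init --quiet

mkdir .venv logs
touch .venv/pyvenv.cfg logs/app.log main.py

# One pattern per line: the .venv entry from above plus an assumed *.log pattern
printf '%s\n' '.venv' '*.log' > .gitignore

# check-ignore -v prints the .gitignore line that matched each ignored path
git check-ignore -v .venv/pyvenv.cfg logs/app.log

# Ignored paths no longer show up as untracked
git status --short
```

Only .gitignore and main.py remain in the status output. Note that the .gitignore file itself is usually committed, so everyone working on the project shares the same rules.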

Now that this is taken care of, we can begin putting our code into the git repo. Let’s check the status first, add all our files, then check the status again:

git status
git add --all
git status

Notice something?

All our files are now in “staging”: when you run git status, the files shown in green are exactly what would be recorded if you committed right now.

After verifying that these are in fact the changes we want (adding a few files to the repo), we go ahead and commit them (with a little message):

git commit -m "initial commit, adding files to repo"

Great! We made our first commit!
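If you prefer a plain list over the coloured status output, the staged files can also be listed directly. This is a small sketch in a throwaway repo. The file contents and the "Demo User" identity are placeholders, set only for this scratch repo.

```shell
#!/bin/sh
# Demo: inspect the staging area before and after a commit.
set -eu

demo_dir=$(mktemp -d)      # hypothetical scratch directory
cd "$demo_dir"
git init --quiet
# Identity for this throwaway repo only (placeholder values)
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "# demo" > README.md
echo "print('hi')" > main.py

git add --all

# Names of the files the next commit would record
staged=$(git diff --cached --name-only)
echo "$staged"

git commit --quiet -m "initial commit, adding files to repo"

# After the commit, staging is empty and the working tree is clean
git status --short
git --no-pager log --oneline
```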

Becoming a Time Traveler

One of the great features of git is that it allows us to go back to any previous version of a file that has ever been committed to the repo.

To test this, just make a small change to the README.md and commit it:

# assuming that README.md has been changed
git add README.md
git commit -m "test-commit one" # the '-m' flag lets you append a commit message

You just created a new commit.

Of course, as time goes on it would be useful to have an overview of all commits in a repository. git log (or git log --oneline, if you want more compact output) does exactly that: it shows a list of all commits.

Note that each commit has a unique hash value associated with it.

If we want to go to a certain version of our project, we can just git checkout COMMIT-HASH and BOOM! We just travelled back in time!

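To jump back to “the present”, we check out our branch again (for example git checkout master, if the branch is called master). Checking out the hash of the latest commit would restore the same files, but it would leave us in a “detached HEAD” state, where new commits do not belong to any branch.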
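The round trip can be tried safely in a scratch repo. The sketch below assumes nothing about our real project: the file contents, commit messages and the "Demo User" identity are made up for the demo, and the branch name is read from git instead of being hard-coded.

```shell
#!/bin/sh
# Demo: check out an old commit, then return to the branch tip.
set -eu

demo_dir=$(mktemp -d)      # hypothetical scratch directory
cd "$demo_dir"
git init --quiet
# Identity for this throwaway repo only (placeholder values)
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "version 1" > README.md
git add README.md
git commit --quiet -m "first version"
first_commit=$(git rev-parse HEAD)        # hash of the first commit
branch=$(git symbolic-ref --short HEAD)   # e.g. master or main

echo "version 2" > README.md
git commit --quiet -am "second version"

git checkout --quiet "$first_commit"      # travel back (detached HEAD)
past=$(cat README.md)
echo "in the past: $past"

git checkout --quiet "$branch"            # return to the branch tip
present=$(cat README.md)
echo "in the present: $present"
```

Returning via the branch name rather than a hash means that later commits are attached to the branch again.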

Adding a remote Repository

Even though git is a fully decentralised version control system (meaning that there is no inherent “master repo” or any kind of hierarchy), having a central repository to keep track of all changes in a project oftentimes massively reduces the amount of coordination needed between individual contributors.

Different solutions for these centralised git services exist. The most common ones are GitHub and GitLab, but there is a plethora of other, more or less similar tools.

For this example we will use GitLab, as their service is more geared towards DevOps and Continuous Integration and Delivery (although GitHub has somewhat caught up in that regard), which will be important later on.

While we could self-host GitLab on our own server, for this example we will use their public service.

Therefore, you need a GitLab account, and you need to store your public SSH key there so you can access repositories from the CLI (see Instructions).

Once this is all set up, we can add our local repository to GitLab. For the user “paketb0te” and project name “testproject” (on the default branch master), it looks like this:

# While in the project directory
git push --set-upstream git@gitlab.com:paketb0te/testproject.git master

We should see output ending with something like

To gitlab.com:paketb0te/testproject.git
 * [new branch]      master -> master
Branch 'master' set up to track remote branch 'master' from 'git@gitlab.com:paketb0te/testproject.git'.

The project should now also be visible on our GitLab profile.

Now, whenever we make changes to our code locally, we can commit them and later push the commits to the remote repository:

git add path/to/changed_file  # add our changed files to staging (placeholder path)
git status  # check status to make sure we commit the right things
git commit -m "helpful message"  # commit with a helpful message
git push  # push to remote

Likewise, if you want to get the latest version from the remote repo (let’s say your friend added some code), you would use git fetch to download the updated version and git merge to merge the changes into your current working copy.

This lets you review the incoming changes between the fetch and the merge, so you can decide whether you want to accept the changes coming from the remote repo or keep your local version.

If you simply want to update your repo to the newest version without reviewing the changes, you can use git pull (which actually runs git fetch and git merge under the hood, but without the pause in between to review any changes).
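The fetch, review, merge sequence can be rehearsed entirely on the local disk. In this sketch a bare repository stands in for GitLab and a second clone plays the friend. All paths, names, file contents and the remote name origin are assumptions for the demo.

```shell
#!/bin/sh
# Demo: fetch a friend's commit, review it, then merge it.
set -eu

work=$(mktemp -d)          # hypothetical scratch directory

# "ours": a local repo with one commit
git init --quiet "$work/ours"
cd "$work/ours"
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"
echo "line 1" > notes.txt
git add notes.txt
git commit --quiet -m "first commit"
branch=$(git symbolic-ref --short HEAD)    # e.g. master or main

# A bare clone stands in for the remote service
git clone --quiet --bare "$work/ours" "$work/remote.git"
git remote add origin "$work/remote.git"

# The friend clones the remote, adds a commit and pushes it
git clone --quiet "$work/remote.git" "$work/friend"
cd "$work/friend"
git config user.name "Friend"              # placeholder identity
git config user.email "friend@example.com"
echo "line 2" >> notes.txt
git commit --quiet -am "friend adds a line"
git push --quiet origin "$branch"

# Back in our repo: download, review, then merge
cd "$work/ours"
git fetch --quiet origin
git --no-pager log --oneline "HEAD..origin/$branch"   # commits we lack
git --no-pager diff "HEAD...origin/$branch"           # what they change
git merge --quiet "origin/$branch"
merged=$(cat notes.txt)
```

The log and diff commands between fetch and merge are the review step that git pull skips.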

This already works very well for our small project, but what if there are more people working on the same project? Or we want to work on different parts of our app, without messing up the current version?

This is where git branches come into play, but we will cover that in a future article.

Wrapping up

In this article, we learned some basics about the version control system Git: how to add our project files to a git repository, how to commit changes to it, and how we can become time travellers and go back to previous versions of our project.

In the upcoming articles we will learn how to set up basic CI/CD to push our container image to a container registry, how to deploy our container to a live server and more.

See you around!