
Demo Project - Continuous Integration

Gems and Jewels to Collect

At the end of this episode you will understand the purpose of the demo project that we use during the workshop and the use cases for a potential CI pipeline that we will develop in the fourth episode. Additionally, we describe the essential structure of a CI job and how to create an initial CI job that runs the demo project. We wrap up the episode with criteria you can apply to decide which Docker image best meets your quality requirements.

Introduction

In this episode we introduce a demo project that is used throughout the workshop as a real-world example to illustrate Continuous Integration (CI) pipelines and how Continuous Integration works in GitLab. We will develop a first CI job and explain the most important keywords it uses. Later in the workshop, we will continue implementing a CI pipeline for the demo project and build a common foundation that you can reuse in your own projects that you would like to automate with a CI pipeline.

Demo Project - Astronaut Analysis

The example is a Python project consisting of a Python script that analyses the given data and generates plots that illustrate the findings. The data has been collected from Wikidata using a SPARQL query in the Wikidata Query Service. It contains information about astronauts, such as their name, birthdate, date of death and gender, as well as when and for how long they were in space. The resulting plots show the total time that humans, female astronauts and male astronauts spent in space, statistics on deceased and living astronauts, and the age distribution of deceased and living astronauts:

Figures: Total Time Humans in Space; Total Time Females in Space; Total Time Males in Space; Age Distribution Box Plot; Age Distribution Histogram

Please read the README of the example project to learn more about it and how the data analysis is conducted.

Exercise 1: Which CI Use Cases could be Promising for a Potential CI Pipeline?

The demo project offers much potential for automation. What do you think? Which tasks could be automated in a CI pipeline?

CI Use Cases

This section introduces some common use cases that you might also face in your GitLab projects. We will cover installing project dependencies, checking license compliance, linting the project, testing with different Python interpreters and running the demo application.

Use Case: Dependency Management

In this Python project, packaging and dependency management are done with Poetry. Libraries such as Pytest and all other dependencies are pinned to the specific versions used in this project. Other tools exist for non-Python projects, and you can easily adapt your pipeline to them as long as you know how to use them on the command line.

Use Case: License Compliance

All open-source projects should include a proper license, so that other people know the conditions under which they can reuse the project. A tool called reuse helps you with the compliance check. You need to make sure that every file has a proper license and copyright header and that the full text of every license used is included in the project.
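A minimal sketch of how this check could later become a CI job, assuming reuse is installed from PyPI with pip and that a stage named lint is declared under the stages keyword (the job name license_compliance and the stage name are hypothetical). The reuse lint command reports files without copyright or licensing information as well as missing license texts, and it exits with a non-zero status if the project is not compliant, which makes the job fail:

```yaml
# Hypothetical job and stage names; assumes "lint" is listed under "stages".
license_compliance:
  image: python:3.10
  stage: lint
  script:
    - pip install reuse   # REUSE helper tool from PyPI
    - reuse lint          # non-zero exit code (failed job) if not compliant
```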

Use Case: Linting

Linting, i.e. checking code style compliance, is done with Black and isort in this project. You can use these tools not only to check the code but also to automatically format it according to the style rules they enforce.
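In a pipeline, the tools should only check the code instead of rewriting it. A minimal sketch, assuming Black and isort are declared as development dependencies in the project's pyproject.toml and that a stage named lint exists (job and stage names are hypothetical). With --check (Black) and --check-only (isort), both tools report violations and exit with a non-zero status instead of modifying files, while --diff shows what would change:

```yaml
# Hypothetical job and stage names; assumes Black and isort are Poetry dev dependencies.
codestyle:
  image: python:3.10
  stage: lint
  before_script:
    - pip install poetry
    - poetry install
  script:
    - poetry run black --check --diff .       # report formatting issues, do not rewrite
    - poetry run isort --check-only --diff .  # report unsorted imports, do not rewrite
```

Locally, running black . and isort . without these flags formats the files in place.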

Use Case: Unit Tests

To test the Python script, we use Pytest for unit testing during pipeline runs. It is good practice to run the unit tests with different Python interpreters to make sure that the application works with each of them.
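One way to run the same tests with several interpreters is GitLab's parallel:matrix keyword, which creates one job per value of a variable. This is a minimal sketch, assuming that the project supports the listed Python versions (an assumption, check the project's pyproject.toml), that Pytest is installed via Poetry and that a stage named test exists; the job name unit_tests and the variable PYTHON_VERSION are hypothetical:

```yaml
# Hypothetical job and variable names; the Python versions are assumptions.
unit_tests:
  image: python:${PYTHON_VERSION}   # one official Python image per matrix entry
  stage: test
  parallel:
    matrix:
      - PYTHON_VERSION: ["3.8", "3.9", "3.10"]
  before_script:
    - pip install poetry
    - poetry install
  script:
    - poetry run pytest
```

Each matrix entry shows up as a separate job in the pipeline, so a failure with a single interpreter is easy to spot.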

Use Case: Run the Application

The last CI use case we would like to implement in the CI pipeline is running the application itself, which generates the plots in the results folder in the project’s root directory.

Develop an Initial CI Pipeline for the Project - Step-by-Step

To start with, we explain step by step how to create a first CI job that runs the real-world Astronaut Analysis example. To do so, we picked the first and the last CI use case, i.e. installing dependencies and running the application. We will extend, optimize and polish the CI pipeline later in the workshop.

Define a First CI Job

Since a GitLab CI pipeline is defined in a YAML file (named .gitlab-ci.yml by default), we use regular YAML syntax to define the CI jobs that run in the pipeline. A basic CI job needs only a few of the most common keywords, e.g. image, stage, before_script and script. This is what it could look like:

my_custom_ci_job:
  image: python:3.10  # format: image_name:image_version
  stage: build        # format: stage_name
  before_script:
    - echo "Optional Shell commands to be executed before the script section."
  script:
    - echo "Main Shell commands to be executed in this CI job."

We will talk about each of these keywords in this section of the episode.

Choose a Base Image

GitLab CI uses Docker images to create the Docker containers in which the pipeline jobs run. You can choose any image that contains all dependencies needed to run the application, either from a registry such as Docker Hub or from another registry such as the GitLab Container Registry of a GitLab project. For our example we only need a Python interpreter and pip, which is why we chose the official Python image in version 3.10 for our first CI job.

In this example, we specify the image as part of the CI job:

my_ci_job:
  image: python:3.10
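Alternatively, an image can be set once for all jobs. A minimal sketch using GitLab's default keyword; the job names and the second image tag python:3.11 are only illustrative. Jobs without their own image keyword fall back to the default image, while a job-level image overrides it:

```yaml
# Illustrative job names; no "stages" list here, so both jobs land in the default "test" stage.
default:
  image: python:3.10      # used by every job that does not set its own image

my_ci_job:
  script:
    - python --version    # runs in python:3.10

my_other_ci_job:
  image: python:3.11      # overrides the default for this job only
  script:
    - python --version
```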

Name All Stages Used in the Pipeline

After choosing a Docker image, you can declare which stages you want to use. This is done by listing all of them under the stages keyword, usually at the top of the YAML file, e.g.:

stages:
  - run

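In this example there is only one stage, which contains the CI job that runs the application, but you can define as many stages as you like. Stages run in the order in which they are listed, and by default all jobs of a stage must succeed before the next stage starts.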
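A minimal sketch of a pipeline with several stages; the stage and job names are hypothetical and the echo commands only stand in for real tasks. Without further keywords such as needs, GitLab runs the stages from top to bottom. If the stages keyword is omitted entirely, GitLab falls back to its default stages build, test and deploy.

```yaml
# Hypothetical stage and job names; stages run in the listed order.
stages:
  - lint
  - test
  - run

lint_job:
  image: python:3.10
  stage: lint
  script:
    - echo "Runs in the first stage."

test_job:
  image: python:3.10
  stage: test
  script:
    - echo "Starts after all jobs in the lint stage have succeeded."

run_job:
  image: python:3.10
  stage: run
  script:
    - echo "Starts after all jobs in the test stage have succeeded."
```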

Install Dependencies

Poetry does a great job at dependency management: the command poetry install installs all project dependencies. Poetry itself must be installed beforehand, and you usually also want to upgrade pip first.

The shell commands that install the project dependencies can be placed under the before_script keyword, so that they are executed before the actual CI tasks in the script section. In a CI job, this before_script section could look like this:

my_ci_job:
  before_script:
    - pip install --upgrade pip
    - pip install poetry
    - poetry install

Run the Project

Finally, we complete our first CI job, which runs the application in the CI pipeline. The shell command to run the application is python -m astronaut_analysis. Because poetry install installs the dependencies into a virtual environment managed by Poetry, the command is prefixed with poetry run in the CI job so that it runs inside that environment. The job is assigned to the stage run:

my_ci_job:
  stage: run
  script:
    - poetry run python -m astronaut_analysis

Put together, the complete .gitlab-ci.yml file with the stages declaration and the final CI job run then looks like this:

stages:
  - run

run:
  image: python:3.10
  stage: run
  before_script:
    - pip install --upgrade pip
    - pip install poetry
    - poetry install
  script:
    - poetry run python -m astronaut_analysis

Now that we have implemented our first CI job in the GitLab CI YAML file, we can push it to the repository and inspect the newly triggered GitLab CI pipeline, which should pass successfully.
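The plots are written into the job's container, which is discarded when the job finishes. If you want to download them from the job page afterwards, GitLab's artifacts keyword could be added. A minimal sketch, assuming the plots end up in the results folder in the project's root directory as described above; the retention period of one week is an arbitrary example:

```yaml
run:
  image: python:3.10
  stage: run
  before_script:
    - pip install --upgrade pip
    - pip install poetry
    - poetry install
  script:
    - poetry run python -m astronaut_analysis
  artifacts:
    paths:
      - results/        # folder into which the application writes the plots
    expire_in: 1 week   # example retention period, adjust as needed
```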

Exercise 2: CI Use Cases for the C++ Exercise Project

For the exercises in this workshop we created a small C++ exercise project: a very basic hello-world command-line program including a small test suite that checks its basic functionality. If you set the environment variable GITLAB_USER_NAME to a name, the program prints “Hello $name!”; otherwise it defaults to “Hello world!”. Build automation and testing are done with CMake and GoogleTest.

Now that you have learned about common CI use cases in general, which CI use cases would suit this example program?

Exercise 3: First CI Job for the C++ Exercise Project

Following the example CI pipeline, let’s create the first CI job of our exercise pipeline. Please create the file .gitlab-ci.yml for the exercise project and write a CI job that executes the exercise program on the command line. The corresponding shell commands are:

cmake -S . -B build
cmake --build build
./build/bin/helloWorld

Please make sure that CMake is installed while the CI job runs, e.g. with the following commands:

apt-get update && apt-get -qy install python3 python3-pip
pip3 install cmake

Docker Images - Which One to Pick?

Picking a base image is not always as simple as choosing the official Python image, since your requirements regarding a Docker image may differ. In this section we discuss some criteria that help you select high-quality Docker images that meet your specific needs.

When searching for suitable Docker images, Docker Hub lets you filter for images with trusted content.

Figure: Docker Hub Trusted Content filter

Official vs. Community Images

Most importantly, Docker Hub offers Docker images that are published by Docker. These are called Docker Official Images. Community images, on the other hand, are created by the community and not by Docker itself. Anyone can publish images on Docker Hub, and after searching for a while you might even decide to create your own Docker image that better fits your needs.

Figure: Docker Hub Trusted Content - Official Images

Verified vs. Unverified Publishers

A second category to look at is Verified Publishers: high-quality Docker images from publishers that Docker has verified. Images from unverified publishers might be of poor quality, but this is not necessarily the case. A huge number of community images that do not come from verified publishers are still of high quality.

Figure: Docker Hub Trusted Content - Verified Publishers

Images from Open Source Program

Additionally, images from the Open Source Program might be what you are looking for, because these projects often have a wider community of contributors and hence might be of high quality as well.

Figure: Docker Hub Trusted Content - Open Source Program

Criteria to Pick the Right Image for Your Use Case

In general, we recommend Docker Official Images, images from Verified Publishers and images from the Open Source Program, since they already cover the most common use cases. For all other community images, i.e. those from unverified publishers that are not part of the Open Source Program, you can answer the following questions to get a rough idea of the quality of these images and their projects:

  1. Is the image highly rated by the community?
  2. Did other users give good reviews on the image?
  3. How many stars does the image have?
  4. Is the image widely used and downloaded often?
  5. How frequently are new Docker image tags / versions published?
  6. How large is the Docker image?
  7. Does the image have a well-written description?
  8. Does the Dockerfile look alright from your point of view?
  9. How many contributors contribute to the image?
  10. How frequently do contributors maintain the project’s repository?

Take Home Messages

To wrap up, here is a summary of what you learned in this episode. We introduced an example project and several useful tools for the resulting CI pipeline, and we developed a GitLab CI pipeline containing a first CI job that runs the example project. We also explored some very common use cases that will be turned into CI jobs later on, such as:

  • Checking the license information in the project.
  • Checking the code style compliance of the project.
  • Testing the application with different Python interpreters.

Finally, we defined criteria to evaluate whether a Docker image meets your quality requirements.

Next Episodes

In the next episode we will build on the final state of this project and its CI pipeline to dive deeper into the remaining CI use cases of the demo project described in this episode, and we will introduce more useful GitLab CI keywords.