Docker is a container-based virtualization platform. Unlike hypervisor virtualization, where you have to create complete new machines to isolate applications from each other and ensure their independence, Docker lets you create containers that contain only your app and what it needs to run. Packaged as containers, these applications can be easily deployed on any host running Docker, and each container remains fully independent.
The Docker platform consists of two components:
- the Docker daemon running in the background and responsible for managing your containers;
- the Docker client that allows you to interact with the daemon through a command-line tool.
What kinds of problems can Docker solve?
Let’s start with the most widespread problem. Your team has developed a product for a customer and tested it, and now it is time to deliver the built solution to the client. How do you do it in the shortest time and with minimum resources? Usually you prepare a lot of different configuration files and scripts, write the installation instructions, and then spend a lot of time resolving user errors or environment incompatibilities. Suppose you did it once; what if you need to install your product several times? This is the replicability problem. Instead of one customer you have hundreds or thousands, and for each of them you have to repeat all the installation and configuration steps. Doing this manually would be slow, expensive and error-prone. And it becomes even more difficult when you need to update the product to a newer version.
Another problem is reusability. Suppose you have a department that makes browser games. It develops several games, and they all use the same technology stack. To publish a new game, you have to configure a new server with almost the same configuration as all the others.
As with any problem, there is more than one solution we can apply:
The installation script – The first approach is to write a script that installs everything you need and run it on all the relevant servers. The script can be a simple “sh” file or something more complex, like an SFX module. The disadvantages of this approach are fragility and instability. If the script has not been written well, sooner or later it will fail. After the failure the customer environment will effectively be “corrupt”, and it won’t be easy to “roll back” the actions that the script managed to perform.
Cloud services – The second approach is to use cloud services. You manually install everything you need on a virtual server and then make an image. After that you clone it as many times as you need. There are some disadvantages here too. First, you are dependent on the cloud service, and the client may disagree with the server choice you’ve made. Second, cloud servers can be slow: the virtual servers offered by cloud providers are often inferior in performance to dedicated servers.
Virtual machines – The third approach is to use virtual machines. There are still some drawbacks. If you are limited in disk space or network traffic, it is not always convenient to download a virtual machine image, which can be quite large. Moreover, any change within the VM image requires downloading the whole image again. Also, not all virtual machines support memory or CPU sharing, and those that do require fine-tuning.
What are Docker’s strengths?
I would say Docker is a groundbreaking solution. Here are the most important differences between a Docker container and a simple server:
Stateless – The container configuration cannot and must not be changed after startup. All the preliminary work has to be done when the image is created (also called compilation time) or when the container starts: configurations, ports, shared folders, environment variables, all of these must be known by the time the container starts. Of course, Docker lets a running container do whatever it wants with its memory and file system, but changing at runtime something that could have been set at startup is considered bad practice.
Pure – The container does not know anything about the host system and cannot interfere with other containers: it can neither get into another container’s file system nor send a signal to another container’s process. Of course, Docker allows containers to communicate with each other, but only through explicitly declared methods. You can also run containers endowed with some superpowers, for example access to the real network or to a physical device.
Lazy – When you start a container, it does not copy the file system of the image from which it is created. It only creates an empty file system on top of the image: a layer. The image itself consists of a list of stacked layers. That is why a container starts so quickly, often in less than a second.
Declarative – All the container properties are stored in a declarative form. The steps for creating an image are also described as strictly separate steps. Network settings, file system contents, memory capacity, public ports and so on are set in the Dockerfile or through static options at startup.
Functionality – The container does only one thing, but does it well. It is assumed that the container lives only as long as a single process that performs a single function in the application. Because the container has no kernel of its own, no boot partition, no init process, and often only a single pseudo-root user, it is not a full-fledged operating system. This specialization makes the function performed by the container predictable and scalable.
Strict – By default, a Docker container prohibits everything except access to the network (which can also be disabled). However, if necessary, any of these rules can be relaxed.
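The “Lazy” property above can be illustrated with a small Python model. This is only a sketch of the idea, not how Docker implements its layered file system: each layer is modelled as a dictionary, and the file names and contents are hypothetical.

```python
from collections import ChainMap

# Read-only image layers, listed from top to bottom (hypothetical contents).
image_layers = [
    {"/app/app.jar": "application build"},
    {"/etc/os-release": "base system files"},
]

def start_container(layers):
    """Return a container file system: a new empty writable layer over the image."""
    # Nothing is copied; the image layers are only referenced.
    return ChainMap({}, *layers)

first = start_container(image_layers)
second = start_container(image_layers)

# A write lands only in the first container's own top layer.
first["/tmp/test.log"] = "written at runtime"

print(first["/etc/os-release"])   # the read falls through to an image layer
print("/tmp/test.log" in second)  # the second container does not see the write
print(image_layers)               # the image layers are unchanged
```

Starting a container in this model costs one empty dictionary regardless of how large the image is, which is the intuition behind the short start-up time.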
At the same time, Docker allows us to design a more agile test architecture, by having each container incorporate a brick of the application (database, languages, components). To test a new version of a brick, you need only to replace the corresponding container.
With Docker, it is possible to build an application from containers, with the components of each layer isolated from the others. This is the concept of a microservices architecture.
Docker and Jenkins
In our company, we started to use the Docker philosophy a few months ago. One of our colleagues investigated how it could help us and what benefits we would get from it. In the beginning, we were using Docker only for some automated tests, but after we saw how powerful it is we decided to go even further.
Whether we are talking about automated tests or integration tests, we face the same issues: parallel testing and isolated environments. Each test case needs a clean database in order to be executed. The database loading procedure takes about 1 minute and 30 seconds. If we performed this procedure at the beginning of every automated test, it would take days to properly test the product. We tried exporting and importing a database dump, but that was still not fast enough.
With Docker, we load the database during the image compilation, only once. It means that the image already contains a clean database and every container started from that image provides a clean environment ready for testing.
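To put rough numbers on this, the following sketch compares the two strategies. The 90-second load time comes from the text above; the number of tests is a hypothetical figure chosen only for illustration, and container start-up time is ignored.

```python
DB_LOAD_SECONDS = 90  # 1 minute 30 seconds, as stated above

def load_time_per_test(test_count, load_seconds=DB_LOAD_SECONDS):
    """Total seconds spent loading the database when every test reloads it."""
    return test_count * load_seconds

def load_time_in_image(load_seconds=DB_LOAD_SECONDS):
    """Total seconds spent when the database is loaded once, at image build."""
    return load_seconds

# Hypothetical suite size; the real number of tests is not stated here.
tests = 2000

hours = load_time_per_test(tests) / 3600
print(f"reload per test: {hours:.1f} hours")
print(f"load at build:   {load_time_in_image()} seconds")
```

With these assumed figures, reloading for every test costs about two days of pure loading time, while loading at build time costs the same 90 seconds no matter how many tests run.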
Another point concerns the resources that are common to all the tests. On the one hand we want to isolate the System Under Test (SUT); on the other hand we want to share some artifacts between SUTs, namely all the required dependencies. For example, we don’t want copies of all the library JARs in each image, because that would waste several GB of storage, not to mention the performance cost. To solve this, we configured a local Maven repository that is shared between the host system, where Docker is installed, and the containers. Inside the Docker image we keep only the instructions needed for the project to be tested. All the common libraries are shared between all the containers.
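One common way to share a host directory with containers is a bind mount passed to docker run. The sketch below only builds and prints such commands; it does not run them. The image name, container names and both repository paths are assumptions for illustration, not the actual setup described here.

```python
import shlex
from pathlib import Path

def docker_run_command(image, name, shared_dirs):
    """Build a `docker run` command that bind-mounts host directories."""
    command = ["docker", "run", "--detach", "--name", name]
    for host_dir, container_dir in shared_dirs.items():
        # --volume HOST_DIR:CONTAINER_DIR exposes the host directory in the container
        command += ["--volume", f"{host_dir}:{container_dir}"]
    command.append(image)
    return command

# Assumed locations: ~/.m2/repository is Maven's default local repository;
# the container-side path depends on the user the build runs as.
maven_repo = Path.home() / ".m2" / "repository"
shared = {str(maven_repo): "/root/.m2/repository"}

# Hypothetical image and container names.
for index in (1, 2):
    cmd = docker_run_command("example/sut-tests:latest", f"sut_{index}", shared)
    print(shlex.join(cmd))
```

Both printed commands mount the same host directory, so a library downloaded by one container is immediately available to the other and is never baked into the image.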
Besides the automated tests, we also run integration tests on our product. These tests also need a separate database, so that they can load data and test against it.
We ensure the continuous integration of our projects with Jenkins. Until recently we had one virtual machine serving as the Jenkins master and two virtual machines as slaves. Not long ago, we decided to remove all the executors from the Jenkins master and let it handle only the Jenkins configuration. Thus, only the 2 slaves can actually run jobs. However, in addition to the automated and integration tests, we have some other jobs in Jenkins that also require an environment for execution. To this end, we configured one slave to run the integration tests and the other to execute the jobs that don’t need a database. So, with this setup (Fig. 1), we could run only one integration test at a time.
At a certain point, the test executions were taking too much time, because everything was executed sequentially. When all the slaves’ executors were active, that is, when each of them was compiling a project, at least one job would hang because of a lack of resources. “This is not good. We need a change”, we said. And the change came with Docker.
We configured the Docker engine on a Linux machine, wrote the Dockerfiles for our tests and started triggering them from Jenkins. This gave us an isolated environment for executing the integration tests, but as a consequence we faced some other issues. After each test execution, we needed to do some cleaning and export the logs. We could have done it with some Linux scripts, but that would have brought us a lot of headaches.
We decided to change our approach in a way that would give us more performance and fewer problems. We moved Jenkins from Windows to Linux, and now we have Jenkins running inside Docker (Fig. 2). The Docker community is very innovative; on Docker Hub you can find a lot of Jenkins images. With some additional configuration you can build a serious CI environment.
There we found an image with Jenkins installed inside it. When you start a container from this image you can assign the container a role: Jenkins master or Jenkins slave.
One Docker container is used as the Jenkins master: it stores all the configuration (the jobs, the plugins, the build history, etc.), takes care of the workspaces and logs, and does all the necessary cleaning. Another four containers serve as slaves. Did you notice? I said 4 slaves, compared to 2 before. Extensibility is another reason why we use Docker. Every slave has its own isolated environment, meaning its own database. In short, it is like having 4 virtual machines, but much better. Now we can execute at least 3 integration tests at the same time.
The Docker image we found in the repository has only Jenkins installed. But in order to ensure the CI of our projects we need some other tools inside the slave containers. For this purpose we extended the Jenkins image: we installed some Linux-specific tools, Java, Maven, Node.js, ActiveMQ and PostgreSQL, configured the database roles and schemas, and created some default directories that are used later by the projects.
All these tools are installed only in the image that is used to start the slaves. The master doesn’t need them, because it doesn’t run any jobs.
Here are our images:
REPOSITORY              TAG      IMAGE ID       CREATED        SIZE
inther/jenkins-slave    jdk8     027aeb7ecf07   13 days ago    1.708 GB
inther/jenkins-slave    jdk7     12712cdf36f4   13 days ago    984.6 MB
inther/jenkins-master   latest   e74c32a835a8   5 weeks ago    443.2 MB
appcontainers/jenkins   ubuntu   49069637832b   5 months ago   402.7 MB
There are 2 ways to start containers from these images: we can start them one by one, or we can use Docker Compose, a tool for defining and running multi-container Docker applications. It is very easy to add one more slave container even from the command line, not to mention with Docker Compose. It is amazing. The docker-compose.yml has the following content:
version: '2'
volumes:
  keyvolume:
  datavolume:
  repositoryvolume:
services:
  slave_1:
    image: inther/jenkins-slave:jdk8
    container_name: jenkins_slave_1
    hostname: jenkins-slave
    network_mode: jenkins
    stdin_open: true
    tty: true
    restart: always
    privileged: true
    dns:
    environment:
      - ROLE=slave
      - SSH_PASS=jenkinspassword
    volumes:
      - keyvolume:/var/lib/jenkins_slave_key
      - /home/jenkins/scripts:/home/jenkins/scripts
      - /home/jenkins/logs:/home/jenkins/logs
      - repositoryvolume:/home/repo
    command: /bin/bash
  slave_2: …
Now we have everything defined for the first container to run. We have similar configurations for all the other containers; only the name is different. With the command docker-compose up we can start all the containers, and we can stop and remove them in a similar way with docker-compose down. It is good to know that, by default, Docker Compose commands have to be executed from the directory where the docker-compose.yml file is located; otherwise the file has to be passed with the -f option.
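Because the slave definitions differ only by name, they could also be generated instead of copied by hand. The sketch below is one possible way to do that, not the way our file was actually written. It reuses the values of the slave_1 definition shown above, leaves out the dns entry because its value is not shown, and assumes that every other field, including the hostname, is identical for all slaves.

```python
import json

def slave_service(index):
    """Compose service definition for one slave; only the container name varies."""
    return {
        "image": "inther/jenkins-slave:jdk8",
        "container_name": f"jenkins_slave_{index}",
        "hostname": "jenkins-slave",
        "network_mode": "jenkins",
        "stdin_open": True,
        "tty": True,
        "restart": "always",
        "privileged": True,
        "environment": ["ROLE=slave", "SSH_PASS=jenkinspassword"],
        "volumes": [
            "keyvolume:/var/lib/jenkins_slave_key",
            "/home/jenkins/scripts:/home/jenkins/scripts",
            "/home/jenkins/logs:/home/jenkins/logs",
            "repositoryvolume:/home/repo",
        ],
        "command": "/bin/bash",
    }

def compose_config(slave_count):
    """Build the whole Compose document for the given number of slaves."""
    return {
        "version": "2",
        "volumes": {"keyvolume": {}, "datavolume": {}, "repositoryvolume": {}},
        "services": {
            f"slave_{i}": slave_service(i) for i in range(1, slave_count + 1)
        },
    }

config = compose_config(4)
print(json.dumps(config, indent=2))
```

The output is JSON, and since YAML is a superset of JSON it should be usable as a Compose file passed with the -f option, although that is worth verifying with your Compose version. Adding a fifth slave then means changing a single number.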
Of course, we have thought about the trouble we could face if the master container gets killed or deleted. What happens then? Will we lose our whole CI system? No, we have a backup plan: we store all the configuration inside Docker volumes (Fig. 3). A volume is a special directory within one or more containers, designed to provide persistent or shared data. So we persist the Jenkins home on the Docker host. If we delete all the containers related to Jenkins, we don’t lose anything; we still have all the data necessary to start another Jenkins instance.