Building a light-weight Kubernetes cluster with Raspberry Pi and k3s

Here at Boogie Software, our experts use Kubernetes and container technology every day in several projects, so we were quite intrigued to find out how Kubernetes works on-premises on more modest hardware. In this post, I will describe how we built a light-weight bare-metal cluster on Raspberry Pi boards for running applications and services in our company network. There were two reasons for doing this project. Firstly, the cluster gives us a platform where we can reliably and flexibly run apps and services internal to our company network. Secondly, it was a nice opportunity to learn more about Kubernetes, microservices and containerisation. For those interested in following these steps or building a similar system, a basic knowledge of Docker containers, Kubernetes concepts (nodes, pods, services, deployments etc.) and IP networking is recommended.

Hardware setup

What you would need for this kind of setup is:
  • At least one Raspberry Pi 2B/3B/3B+. You can run some apps even on a single board but getting two or more boards is recommended for spreading the load and for increased redundancy.
  • Power supplies and SD cards for the Pis, an ethernet switch or free ports in your existing one, and some cables.
In our setup, we currently have four Raspberry Pi 3 Model B+ boards, so in the cluster, there is one master/server and three agent nodes. The Raspberry Pi boards of course need some kind of housing and this is where things got a little out of hand. A fellow Boogieman who is very able with CAD and 3D printers designed and printed a neat case for the boards, which would deserve a story on its own. The casing has two fans for cooling in the back and each board sits on a tray that can be hot-swapped in and out for maintenance. The trays also have places at the front for an activity/heartbeat LED and a shutdown/power switch that both connect back to the board’s GPIO header.

Software stack

For the Kubernetes implementation, we chose k3s from Rancher Labs. For such a young project it is remarkably stable and usable, likely because it simply bundles the official Kubernetes components in a smaller, easy-to-install package. Two things set k3s apart from other small Kubernetes distributions: it is intended for production use, whereas projects like microk8s or Minikube target development purposes, and it is very lightweight and runs nicely also on ARM-based hardware. In k3s, the essentials of a Kubernetes system have been combined into a single ~40 MB binary that integrates all the required components and processes.

K3s runs on almost any Linux distribution, and we decided to go with Raspbian Stretch Lite as the base OS because we don't need any additional services or desktop user interfaces on the boards. K3s does require cgroups to be enabled in the Linux kernel, which can be done on Raspbian by adding the following parameters to /boot/cmdline.txt:
cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory
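Note that cmdline.txt must remain a single line, so the parameters are appended to the existing line rather than added on a new one. A small sketch of doing this safely (run here against a scratch copy with stand-in content so it can be tested anywhere; on a real Pi, point CMDLINE at /boot/cmdline.txt and run with sudo):

```shell
# Append the cgroup parameters k3s needs to the kernel command line.
# CMDLINE points at a scratch copy here; on a real Pi use /boot/cmdline.txt.
CMDLINE=/tmp/cmdline.txt
printf 'console=serial0,115200 root=/dev/mmcblk0p2 rootwait\n' > "$CMDLINE"  # stand-in content

PARAMS='cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory'
# Guard against appending twice, and append in place to keep the file one line
if ! grep -q 'cgroup_enable=memory' "$CMDLINE"; then
  sed -i "s/\$/ $PARAMS/" "$CMDLINE"
fi
cat "$CMDLINE"
```

A reboot is needed afterwards for the new kernel parameters to take effect.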

Installing k3s

The authors of k3s have done a nice job of smoothing the installation process. Once your server hardware is ready, the cluster is super easy to set up in just a couple of minutes: it takes only one command to install the server (master node):
curl -sfL https://get.k3s.io | sh -
and the same goes for agent nodes:
curl -sfL https://get.k3s.io | K3S_TOKEN=<token_from_server> K3S_URL=https://<server_ip>:6443 sh -
where token_from_server is the contents of the file /var/lib/rancher/k3s/server/node-token from the server and server_ip is the IP address of the server node. At this point, our cluster was already up and running, and we could start deploying workloads:
root@k3s-server:~# kubectl get nodes
NAME         STATUS   ROLES    AGE    VERSION
k3s-node1    Ready    <none>   40s    v1.13.4-k3s.1
k3s-server   Ready    <none>   108s   v1.13.4-k3s.1
For administering and monitoring the cluster, we installed Kubernetes Dashboard, which provides a convenient web interface for checking the overall system status, performing admin operations and accessing logs. Installing and running the kubectl command locally is also very helpful because it allows administering the cluster from your own computer without needing to ssh into it. To do that, install kubectl and copy the cluster information from the server node's config file /etc/rancher/k3s/k3s.yaml into your local kubeconfig file (usually ${HOME}/.kube/config).
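One detail worth noting when copying the config: k3s writes the API server address as the node's loopback address, so it has to be rewritten to the server's real IP. A sketch of the rewrite step (a stub file and a hypothetical server address are used here so the snippet can be run safely; in real use you would first copy /etc/rancher/k3s/k3s.yaml from the server, e.g. with scp):

```shell
# Point a copied k3s kubeconfig at the server's real address.
SERVER_IP=10.10.10.10                  # hypothetical address of the k3s server
KUBECONFIG_FILE=/tmp/config            # real target: ${HOME}/.kube/config

# Stub with the loopback server line as k3s writes it (127.0.0.1 or localhost)
printf '    server: https://127.0.0.1:6443\n' > "$KUBECONFIG_FILE"

# Rewrite the loopback address to the server's reachable IP
sed -E -i "s#https://(127\.0\.0\.1|localhost):6443#https://$SERVER_IP:6443#" "$KUBECONFIG_FILE"
cat "$KUBECONFIG_FILE"
```

After this, `kubectl get nodes` from your workstation should talk to the cluster.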

Exposing the services with a load balancer

By default, applications deployed to a Kubernetes cluster are only reachable from within the cluster (the default service type is ClusterIP). To make them reachable from outside, there are two options: configure the service with type NodePort, which exposes the service on each node's IP at a static port, or use a load balancer (service type LoadBalancer). NodePort services are, however, quite limited: they use their own dedicated port range and apps can only be differentiated by their port number. K3s also provides a simple built-in service load balancer, but since it uses the nodes' IP addresses, we might quickly run out of IP/port combinations, and binding services to a specific virtual IP is not possible. For these reasons, we decided to deploy MetalLB - a load-balancer implementation intended for bare-metal clusters.
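To make the NodePort limitation concrete, here is a sketch of such a service (the app name and ports are hypothetical). The service becomes reachable at <any_node_ip>:30080, so every exposed app occupies its own port from the default 30000-32767 range:

```shell
# Write a NodePort service manifest; on a cluster you would apply it with
# `kubectl apply -f /tmp/my-web-app-nodeport.yaml`. Names are hypothetical.
cat > /tmp/my-web-app-nodeport.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: my-web-app
spec:
  type: NodePort
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30080      # must fall in the default 30000-32767 range
  selector:
    app: my-web-app
EOF
cat /tmp/my-web-app-nodeport.yaml
```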

MetalLB can be installed simply by applying its YAML manifest. The simplest way to run MetalLB in an existing network is to use the so-called layer 2 mode, which means that the cluster nodes announce the virtual IPs of the services in the local network with the ARP protocol. For that purpose, we reserved a small pool of IP addresses from our internal network for the cluster services. The config for MetalLB thus looked like this:
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: company-office
      protocol: layer2
      addresses:
      - 10.10.10.50-10.10.10.99
With this config, the cluster services would be exposed at addresses in the range 10.10.10.50-10.10.10.99. To bind a service to a specific IP, you can use the loadBalancerIP parameter in your service manifest:
apiVersion: v1
kind: Service
metadata:
  name: my-web-app
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 8080
  loadBalancerIP: 10.10.10.51
  selector:
    app: my-web-app
  type: LoadBalancer
Load balancing is where we encountered most of our challenges. For example, Kubernetes has a limitation that prevents having both TCP and UDP ports in a single load-balancer service. To work around it, you can define two service instances, one for the TCP ports and another for the UDP ports. The downside is that the two services then run at different IP addresses, unless you enable IP address sharing. And as MetalLB is a young project, there was a small wrinkle with this as well, but we are confident these issues will be ironed out soon.
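For the IP-sharing workaround, MetalLB's documentation describes a `metallb.universe.tf/allow-shared-ip` annotation: services that carry the same annotation value and request the same loadBalancerIP can share one address. A sketch with a hypothetical TCP/UDP service pair (written to a file here; on a cluster you would apply it with kubectl):

```shell
# Two services (one TCP, one UDP) sharing a single virtual IP via MetalLB's
# allow-shared-ip annotation. Names, ports and the IP are hypothetical.
cat > /tmp/shared-ip-services.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: my-app-tcp
  annotations:
    metallb.universe.tf/allow-shared-ip: my-app
spec:
  type: LoadBalancer
  loadBalancerIP: 10.10.10.52
  ports:
  - name: tcp-port
    protocol: TCP
    port: 5000
  selector:
    app: my-app
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-udp
  annotations:
    metallb.universe.tf/allow-shared-ip: my-app
spec:
  type: LoadBalancer
  loadBalancerIP: 10.10.10.52
  ports:
  - name: udp-port
    protocol: UDP
    port: 5000
  selector:
    app: my-app
EOF
cat /tmp/shared-ip-services.yaml
```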

Adding storage

K3s doesn't have a built-in storage solution yet, so in order to give pods access to persistent file storage, we need to create one using one of the plugins supported by Kubernetes. Since one of Kubernetes' goals is to decouple applications from the infrastructure and make them portable, Kubernetes defines an abstraction layer for storage with the concepts of PersistentVolume (PV) and PersistentVolumeClaim (PVC). PVs are storage resources that are typically configured and made available to apps by the administrator. PVCs, on the other hand, describe an application's need for a certain kind and amount of storage. When a PVC is created (typically as part of the application), it is bound to a PV if one is available that is not yet in use and satisfies the PVC's requirements. Configuring and maintaining all of this by hand would mean manual work, which is why there is a way to provision volumes dynamically.

In our infrastructure we already had an existing NFS server available, so we decided to use that for the cluster's persistent file storage. The easiest way to accomplish this in our case was the NFS-Client Provisioner, which supports dynamic provisioning of PVs. The provisioner simply creates a new directory on the existing NFS share for each new PV (which the cluster maps to a PVC), and that directory is then mounted in the container that uses it. This way there is no need to configure NFS shares as volumes in individual pods; everything works dynamically.
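With the provisioner in place, an application only needs to declare a PVC that references the provisioner's storage class; the PV and the backing NFS directory are created automatically. A sketch (the class name `nfs-client` is the provisioner chart's usual default and may differ in your setup; the claim name and size are hypothetical):

```shell
# PVC requesting dynamically provisioned NFS storage; on a cluster you
# would apply it with `kubectl apply -f /tmp/my-app-pvc.yaml`.
cat > /tmp/my-app-pvc.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data
spec:
  storageClassName: nfs-client    # assumed default class of NFS-Client Provisioner
  accessModes:
  - ReadWriteMany                 # NFS allows mounting from several nodes at once
  resources:
    requests:
      storage: 1Gi
EOF
cat /tmp/my-app-pvc.yaml
```

A pod then mounts the claim by name in its volumes section, without knowing anything about the underlying NFS server.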

Cross-building container images for ARM

Obviously, when running app containers on ARM-based hardware like the Raspberry Pi, the containers need to be built for the ARM architecture, and there are a few gotchas you might face when doing so. First of all, the base image needs to be available for your target architecture. In the case of the Raspberry Pi 3, you typically want to use an "arm32v7" base image, as they are called in most Docker registries. So, when cross-building your app, make sure your Dockerfile contains e.g.
FROM arm32v7/alpine:latest
The second thing to note is that your host Docker needs to be able to run ARM binaries. If you are running Docker for Mac, things are easy because it has built-in support for this. On Linux, there are a few steps you must take, outlined below.

Adding QEMU binary into your base image

To run ARM binaries in Docker on Linux, the image needs to have a QEMU binary. You can either choose a base image that already contains the QEMU binary, like the images from Balena, or copy the qemu-arm-static binary into the image during the build, e.g. by adding the following line to your Dockerfile:
COPY --from=biarms/qemu-bin /usr/bin/qemu-arm-static /usr/bin/qemu-arm-static

Security notice: Please be aware that downloading and running an unknown container is like downloading and running an unknown EXE. For anything else but hobby projects, you should always use either scanned/vetted images (e.g. Docker Official Images) or container images from trusted organizations or companies.

Then, QEMU needs to be registered on the host OS where you build your Docker images. This can be achieved simply with:
docker run --rm --privileged multiarch/qemu-user-static:register --reset
This command can be added to your build script before building the actual image. To wrap things up, your Dockerfile.arm would then look something like this:
FROM arm32v7/alpine:latest
COPY --from=biarms/qemu-bin /usr/bin/qemu-arm-static /usr/bin/qemu-arm-static
# commands to build your app go here…
# e.g. RUN apk add --update <pkgs that you need…>
and your build/CI script is essentially:
docker run --rm --privileged multiarch/qemu-user-static:register --reset
docker build -t my-custom-image-arm . -f Dockerfile.arm
which will give you an ARM-architecture container image as a result. For those interested in the details, there is more information available on cross-building, and if your registry supports v2.2 manifests, the different architectures can even be combined into a multi-arch image.

Automating builds and registry uploads

The final step is to automate the whole process so that the container images are built automatically and uploaded to a registry from where they can easily be deployed to our k3s cluster. Internally, we are using GitLab for our source code management and CI/CD, so we naturally wanted to get these builds running in there. It even includes a built-in container registry, so there was no need to set up a separate one.

GitLab has good documentation on building Docker images, so we won't repeat all of it here. After configuring the GitLab Runner for Docker builds, all that is left to do is create a .gitlab-ci.yml file for the project. In our case it looked like this:
image: docker:stable

stages:
  - build
  - release

variables:
  DOCKER_DRIVER: overlay2
  CONTAINER_TEST_IMAGE: ${CI_REGISTRY_IMAGE}/${CI_PROJECT_NAME}-arm:${CI_COMMIT_REF_SLUG}
  CONTAINER_RELEASE_IMAGE: ${CI_REGISTRY_IMAGE}/${CI_PROJECT_NAME}-arm:latest

before_script:
  - docker info
  - docker login -u gitlab-ci-token -p $CI_JOB_TOKEN $CI_REGISTRY

build_image:
  stage: build
  script:
    - docker pull $CONTAINER_RELEASE_IMAGE || true
    - docker run --rm --privileged multiarch/qemu-user-static:register --reset
    - docker build --cache-from $CONTAINER_RELEASE_IMAGE -t $CONTAINER_TEST_IMAGE . -f Dockerfile.arm
    - docker push $CONTAINER_TEST_IMAGE

release:
  stage: release
  script:
    - docker pull $CONTAINER_TEST_IMAGE
    - docker tag $CONTAINER_TEST_IMAGE $CONTAINER_RELEASE_IMAGE
    - docker push $CONTAINER_RELEASE_IMAGE
Now that we have our images in the container registry, we just need to deploy them into our cluster. To grant our cluster access to the registry, we create a deploy token in GitLab and then add the token credentials into the cluster as a docker-registry secret:
kubectl create secret docker-registry deploycred --docker-server=<your-registry-server> --docker-username=<token-username> --docker-password=<token-password> --docker-email=<your-email>
After that, the deploy token secret can be used in the PodSpec of your deployment YAML:
imagePullSecrets:
- name: deploycred
containers:
- name: myapp
  image: gitlab.mycompany.com:4567/my/project/my-app-arm:latest
With all these pieces in place, we finally have it: an automated CI/CD pipeline from source code to ARM container images in a private registry, ready to be deployed to the cluster.

Conclusions

All in all, getting your own bare-metal Kubernetes cluster up and running turned out to be easier than expected. There are some rough edges and limitations stemming from the fact that this technology has its roots in the cloud, but nevertheless, k3s proved to be a sound choice for running containerised services at the edge and on lower-spec hardware in general.

One small downside is that k3s doesn't support a high-availability (multi-master) configuration yet. Although a single-master setup is already quite resilient, because the services continue running on the agent nodes even if the master goes offline, we'd like to get some redundancy for the master node as well. Apparently this feature is in the works, but until it is available, we recommend taking backups of the server node's configuration.
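A minimal backup sketch (shown against a stub directory with stand-in content so it is safe to run anywhere; on the actual server node, point K3S_SERVER_DIR at /var/lib/rancher/k3s/server and run as root):

```shell
# Archive the k3s server state (certificates, node token, datastore) so a
# failed master can be rebuilt from the backup.
K3S_SERVER_DIR=/tmp/k3s-server-demo     # real node: /var/lib/rancher/k3s/server
mkdir -p "$K3S_SERVER_DIR"
echo 'dummy-token' > "$K3S_SERVER_DIR/node-token"    # stand-in for the real files

BACKUP=/tmp/k3s-server-backup.tar.gz
tar -czf "$BACKUP" -C "$(dirname "$K3S_SERVER_DIR")" "$(basename "$K3S_SERVER_DIR")"
tar -tzf "$BACKUP"
```

Storing the archive off the node (e.g. on the NFS server) means the master can be reinstalled and restored if its SD card fails.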


Boogie Software Oy is a private Finnish company headquartered in Oulu, Finland. Our unique company profile builds upon top level software competence, entrepreneurial spirit and humane work ethics. The key to our success is close co-operation with the most demanding customers, understanding their business and providing accurate software solutions for complex problems.
