
Kubernetes master nodes

In this blogpost I show how to set up highly available kubernetes master nodes. We build on the etcd cluster created in one of my previous posts on how to set up such an etcd cluster.

First, we create three servers on DigitalOcean. In my case I’ll use Terraform, which makes it easy to destroy the cluster and bring it back to the same state in a few seconds. Note that I define a tag called kube-master; this makes it easier to secure the cluster later with a DigitalOcean firewall, because DigitalOcean Firewalls allow you to define allowed inbound and outbound traffic by such tags (e.g. allow traffic to etcd port 2379 only from kube-master). Additionally, we generate a domain entry for each of the master servers (in my case automatically with Cloudflare) to make the SSL certificate handling easier:

resource "digitalocean_tag" "master_tag" {
  name = "kube-master"
}

resource "digitalocean_droplet" "master_server" {
  image = "ubuntu-16-04-x64"
  name = "k8s-master-${count.index}.${var.cloudflare_domain}"
  region = "fra1"
  size = "512mb"
  private_networking = true

  count = 3
  tags = ["${digitalocean_tag.master_tag.id}"]

  ssh_keys = [
    "${var.ssh_fingerprint}"
  ]
}

resource "cloudflare_record" "master" {
  domain = "${var.cloudflare_domain}"
  name = "${element(digitalocean_droplet.master_server.*.name, count.index)}"
  type = "A"
  count = "${digitalocean_droplet.master_server.count}"
  value = "${element(digitalocean_droplet.master_server.*.ipv4_address, count.index)}"
}
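For completeness, the snippets above assume a few Terraform variables and provider configurations that are not shown in this post. A minimal sketch might look like the following; the credential variable names (do_token, cloudflare_email, cloudflare_token) are my own choice, only cloudflare_domain and ssh_fingerprint are referenced above:

# Hypothetical variable and provider definitions assumed by the resources above
variable "do_token" {}
variable "cloudflare_email" {}
variable "cloudflare_token" {}
variable "cloudflare_domain" {}
variable "ssh_fingerprint" {}

provider "digitalocean" {
  token = "${var.do_token}"
}

provider "cloudflare" {
  email = "${var.cloudflare_email}"
  token = "${var.cloudflare_token}"
}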

After the servers are available we can connect to them and install the required components:

  • Flannel
  • Kube-Apiserver
  • Kube-Controller-Manager
  • Kube-Scheduler

Flannel

We begin with the flannel installation. For this, we download flannel from the GitHub release page, untar it and copy the binary to /usr/bin:

wget https://github.com/coreos/flannel/releases/download/v0.9.0/flannel-v0.9.0-linux-amd64.tar.gz
tar xvzf flannel-v0.9.0-linux-amd64.tar.gz
cp flanneld /usr/bin

Then we have to create a service definition for flanneld and place it at /etc/systemd/system/flanneld.service:

[Unit]
Description=flanneld
Requires=etcd-lookup.service
After=etcd-lookup.service

[Service]
EnvironmentFile=/etc/kube-apiserver/etcd-members
ExecStart=/usr/bin/flanneld \
    -etcd-endpoints=${ETCD_MEMBERS} \
    -etcd-prefix=/flannel.com/network \
    -etcd-cafile=/etc/kube-apiserver/etcd-ca.pem \
    -etcd-certfile=/etc/kube-apiserver/etcd-client.pem \
    -etcd-keyfile=/etc/kube-apiserver/etcd-client-key.pem \
    -ip-masq
Restart=always
RestartSec=15

[Install]
WantedBy=multi-user.target

In the snippet you can see that the service starts after the etcd-lookup.service has started. That service, shown below, runs a shell script that curls the DigitalOcean API to get a list of etcd members. This list is written to /etc/kube-apiserver/etcd-members as a variable and injected into the service as an EnvironmentFile. Then I define an etcd-prefix; this is the place where flannel stores its network configuration. The three parameters cafile, certfile and keyfile point to the client certificate, its private key and the public key of the root certificate; please copy your certificates to the paths shown. They are required to talk to our etcd members, because the members communicate via HTTPS with a self-signed certificate and are secured by X.509 client certificate authentication.
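As a quick checklist before moving on, the unit above expects the following files in /etc/kube-apiserver (the file names are taken from the flags above; generating the etcd client certificates was covered in the etcd post):

mkdir -p /etc/kube-apiserver
# Files the flanneld unit above expects in this directory:
#   etcd-ca.pem          - public key of the etcd root certificate
#   etcd-client.pem      - etcd client certificate
#   etcd-client-key.pem  - etcd client private key
#   etcd-members         - generated by the etcd-lookup script shown below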

Next we copy the shell script that lists the members and the etcd-lookup.service to the DigitalOcean server. Please replace the { digitalocean_token } placeholder with your DigitalOcean read-only API key; it is required to look up the etcd members. Here we use the tag_name etcd, because all etcd droplets are tagged with this name.

#!/bin/bash

do_response=$(curl -X GET -H "Content-Type: application/json" -H "Authorization: Bearer { digitalocean_token }" "https://api.digitalocean.com/v2/droplets?tag_name=etcd")

# collect the hostnames of all droplets tagged "etcd"
etcd_member_hosts=$(echo ${do_response} | jq '.droplets[].name' -r)
ETCD_MEMBERS=''
echo $etcd_member_hosts

# build the comma separated list of etcd client endpoints
for member_host in $etcd_member_hosts; do
        ETCD_MEMBERS="${ETCD_MEMBERS},https://${member_host}:2379"
done

# remove leading ,
ETCD_MEMBERS=$(echo $ETCD_MEMBERS | cut -c 2-)

# write the list of etcd members to the environment file
cat > "/etc/kube-apiserver/etcd-members" <<EOF
ETCD_MEMBERS="$ETCD_MEMBERS"
EOF

exit 0

And the matching etcd-lookup.service unit:

[Unit]
Description=Find etcd members by digitalocean api
Wants=network-online.target
After=network-online.target

[Service]
Type=oneshot
ExecStart=/etc/kube-apiserver/etcd_lookup.sh
RemainAfterExit=true
StandardOutput=journal
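To verify the lookup works before wiring it into flanneld, you can make the script executable, reload systemd and run the unit once (paths as referenced in the unit above):

chmod +x /etc/kube-apiserver/etcd_lookup.sh
systemctl daemon-reload
systemctl start etcd-lookup.service
# the generated environment file should now contain the ETCD_MEMBERS list
cat /etc/kube-apiserver/etcd-members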

Before we can start the flannel network we have to configure it. To do so, execute the following curl command:

curl \
  --cert certificate/client/client.pem \
  --key certificate/client/client-key.pem \
  --insecure \
  https://{ etcd-member }:2379/v2/keys/flannel.com/network/config \
  -XPUT \
  -d value="{ \"Network\": \"10.200.0.0/16\", \"Backend\": { \"Type\": \"vxlan\", \"VNI\": 1 } }"

Please replace the { etcd-member } part with the domain name of one etcd member. This defines the 10.200.0.0/16 network that flannel will use for communication between the nodes.

After this you can start flanneld. You should then see an additional network interface when you run ifconfig in the terminal.
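For example (the interface name flannel.1 is an assumption; it is what the vxlan backend usually creates):

systemctl daemon-reload
systemctl enable flanneld
systemctl start flanneld
# a new flannel interface should show up with an address from 10.200.0.0/16
ip addr show flannel.1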

Kube-Apiserver

The kube-apiserver is the central endpoint that clients and nodes talk to in order to get information about and configure the kubernetes cluster and the applications running on it. Therefore it is very important to secure this endpoint against unauthorized access.

First download the kube-apiserver and place it at /usr/bin:

wget https://storage.googleapis.com/kubernetes-release/release/v1.8.0/bin/linux/amd64/kube-apiserver
cp kube-apiserver /usr/bin
chmod 740 /usr/bin/kube-apiserver

After this we copy the following policy file to /etc/kube-apiserver/abac-roles.jsonl (the path referenced later in the service file):

{"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user": "admin", "namespace": "*", "resource": "*", "apiGroup": "*", "nonResourcePath": "*"}}
{"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"user":"kubelet", "namespace": "*", "resource": "*", "apiGroup": "*", "nonResourcePath": "*"}}
{"apiVersion": "abac.authorization.kubernetes.io/v1beta1", "kind": "Policy", "spec": {"group":"system:serviceaccounts", "namespace": "*", "resource": "*", "apiGroup": "*", "nonResourcePath": "*"}}

This is the kubernetes configuration for Attribute-Based Access Control (ABAC). It defines, for example, that the user admin has access to all namespaces, resources, apiGroups and nonResourcePaths. Here we define three such access policies: one for the normal user who deploys the applications running on the cluster (admin), one for kubelet, the identity the nodes use to connect to the API and pick up work, and one for the group system:serviceaccounts, which is used for example by kube-dns running as a pod inside the cluster to talk to the internal kubernetes API.

After this we have to generate some SSL certificates. We use cfssl for this, which makes generating the certificates easier:

#!/usr/bin/env bash

mkdir -p {root,kube-api,admin,kubelet}

cfssl gencert -initca ca-csr.json | cfssljson -bare root/ca -

cfssl gencert -ca=root/ca.pem -ca-key=root/ca-key.pem -config=ca-config.json -profile=kubernetes kube-api.json | cfssljson -bare kube-api/kube-api
cfssl gencert -ca=root/ca.pem -ca-key=root/ca-key.pem -config=ca-config.json -profile=kubernetes admin.json | cfssljson -bare admin/admin
cfssl gencert -ca=root/ca.pem -ca-key=root/ca-key.pem -config=ca-config.json -profile=kubernetes kubelet.json | cfssljson -bare kubelet/kubelet

This bash script first generates a root certificate defined by the ca-csr.json file:

{
  "CN": "Kubernetes",
  "key": {
    "algo": "ecdsa",
    "size": 256
  },
  "names": [
    {
      "C": "DE",
      "L": "NW",
      "ST": "Wesel"
    }
  ]
}

After this the script generates three certificates: a server certificate for the kube-apiserver, a client certificate for the admin user and a client certificate for the nodes. For all of them it uses the kubernetes profile described in ca-config.json:

{
  "signing": {
    "default": {
      "expiry": "43800h"
    },
    "profiles": {
      "kubernetes": {
        "usages": ["signing", "key encipherment", "server auth", "client auth"],
        "expiry": "43800h"
      }
    }
  }
}

Together with the definition (kube-api.json) the next line generates the kube-apiserver server certificate:

{
  "CN": "kubernetes api-server",
  "hosts": [
    "*.domainname.com",
    "10.32.0.1"
  ],
  "key": {
    "algo": "ecdsa",
    "size": 256
  },
  "names": [
    {
      "C": "DE",
      "L": "NW",
      "ST": "Wesel"
    }
  ]
}

Here it is important that we add 10.32.0.1 to the list of hosts, because this is the IP under which the kubernetes API server is reachable from inside the cluster (the first address of the service cluster IP range). Additionally we add *.domainname.com to the list: since each of our kubernetes master nodes has a domain name under this wildcard, a single certificate works for all of them and not every master needs its own certificate.
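The admin.json and kubelet.json client CSRs used by the script above are not shown in this post; a minimal sketch of admin.json could look like the following (kubelet.json analogous with "CN": "kubelet"). The CN matters because the kube-apiserver takes the user name for the ABAC policies from the certificate’s common name:

{
  "CN": "admin",
  "key": {
    "algo": "ecdsa",
    "size": 256
  },
  "names": [
    {
      "C": "DE",
      "L": "NW",
      "ST": "Wesel"
    }
  ]
}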

With the certificates, the ABAC jsonl file and the already existing etcd client certificates we are now able to start the kube-apiserver via the following service file:

[Unit]
Description=kube-apiserver
Requires=etcd-lookup.service
After=etcd-lookup.service

[Service]
EnvironmentFile=/etc/kube-apiserver/etcd-members
ExecStart=/usr/bin/kube-apiserver \
    --admission-control=ServiceAccount \
    --allow-privileged=true \
    --authorization-mode=ABAC \
    --authorization-policy-file=/etc/kube-apiserver/abac-roles.jsonl \
    --client-ca-file=/etc/kube-apiserver/kube-ca.pem \
    --etcd-cafile=/etc/kube-apiserver/etcd-ca.pem \
    --etcd-certfile=/etc/kube-apiserver/etcd-client.pem \
    --etcd-keyfile=/etc/kube-apiserver/etcd-client-key.pem \
    --etcd-servers=${ETCD_MEMBERS} \
    --service-cluster-ip-range=10.32.0.0/24 \
    --service-node-port-range=30000-32767 \
    --secure-port=443 \
    --tls-ca-file=/etc/kube-apiserver/kube-ca.pem \
    --tls-cert-file=/etc/kube-apiserver/kube-api.pem \
    --tls-private-key-file=/etc/kube-apiserver/kube-api-key.pem \
    --service-account-key-file=/etc/kube-apiserver/kube-api-key.pem
Restart=always
RestartSec=15

[Install]
WantedBy=multi-user.target

A few things to note here. The unit uses the etcd-members list generated by the etcd-lookup script to get the list of etcd members to talk to; this ETCD_MEMBERS variable is injected into the kube-apiserver process (--etcd-servers=${ETCD_MEMBERS}), and additionally we specify the etcd root certificate (--etcd-cafile) and the client certificate used for X.509 authentication (--etcd-certfile and --etcd-keyfile). You can also see the --authorization-policy-file attribute, where we specify the path to our abac-roles file so kubernetes can use it for attribute-based access control. Our newly generated certificates are used in --tls-cert-file and --tls-private-key-file, which point to the kube-api public and private key. The --service-account-key-file attribute provides the key used to verify the bearer tokens of Service Accounts inside the cluster that access the kube-apiserver; in our case we reuse the kube-api private key. Additionally we specify the CIDR range for kubernetes services and the port range used for NodePort services on the nodes. With this in place we can reload the systemd daemon and start our kube-apiserver.service after the etcd-members list has been generated.
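Putting it together, assuming the unit above is installed as kube-apiserver.service:

systemctl daemon-reload
systemctl start etcd-lookup.service     # generate the etcd-members list first
systemctl enable kube-apiserver
systemctl start kube-apiserver
systemctl status kube-apiserver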

After this it should be possible to curl the kube-apiserver from the machine itself. The apiserver also opens an insecure port on localhost:8080, which is only bound to the machine and not accessible from the outside. With curl http://localhost:8080/healthz we get the health status of our kube-apiserver, and with curl http://localhost:8080/healthz/etcd we get information about the health of etcd and the communication between kube-apiserver and etcd.

Kube-Controller-Manager

The kube-controller-manager is responsible for the state of the cluster and checks the current state against the desired state.

The installation of the kube-controller-manager is easier than the kube-apiserver installation. First we download the kube-controller-manager and copy it to the /usr/bin directory:

wget https://storage.googleapis.com/kubernetes-release/release/v1.8.0/bin/linux/amd64/kube-controller-manager
cp kube-controller-manager /usr/bin
chmod 740 /usr/bin/kube-controller-manager

After this we can install the kube-controller-manager.service to /etc/systemd/system:

[Unit]
Description=kube-controller-manager
Requires=kube-apiserver.service
After=kube-apiserver.service

[Service]
ExecStart=/usr/bin/kube-controller-manager \
    --allocate-node-cidrs=false \
    --cloud-provider= \
    --cluster-cidr=10.200.0.0/16 \
    --cluster-signing-cert-file=/etc/kube-controller-manager/ca.pem \
    --cluster-signing-key-file=/etc/kube-controller-manager/ca-key.pem \
    --leader-elect=true \
    --master=http://localhost:8080 \
    --root-ca-file=/etc/kube-apiserver/kube-ca.pem \
    --service-account-private-key-file=/etc/kube-apiserver/kube-api-key.pem \
    --service-cluster-ip-range=10.32.0.0/24
Restart=always
RestartSec=15

[Install]
WantedBy=multi-user.target

In this case we specify no cloud provider, because the DigitalOcean cloud provider is only in beta and, in my opinion, not that interesting yet. For this example I generated a new root certificate that allows user certificates to be signed from inside the cluster (--cluster-signing-cert-file and --cluster-signing-key-file); this is not essential and can just as easily be omitted. Because the kube-apiserver is deployed on the same droplet as the kube-controller-manager, we use the local insecure port to talk to the apiserver. We also specify the CIDR range for newly created pods; this range (--cluster-cidr) must match the CIDR range configured for the flannel network. Additionally we specify the service-cluster-ip-range again, as already done for the kube-apiserver.

After reloading the systemd daemon we can start the kube-controller-manager and check its health by running the following curl command on the droplet: curl http://localhost:10252/healthz

Kube-Scheduler

The last step is to install the kube-scheduler. This is the simplest part, because the configuration is very easy. First we download it and place it in our /usr/bin directory:

wget https://storage.googleapis.com/kubernetes-release/release/v1.8.0/bin/linux/amd64/kube-scheduler
cp kube-scheduler /usr/bin
chmod 740 /usr/bin/kube-scheduler

Then we copy the kube-scheduler.service file to /etc/systemd/system:

[Unit]
Description=kube-scheduler
Requires=kube-apiserver.service
After=kube-apiserver.service

[Service]
ExecStart=/usr/bin/kube-scheduler \
    --master=http://localhost:8080 \
    --leader-elect=true
Restart=always
RestartSec=15

[Install]
WantedBy=multi-user.target

We define kube-apiserver.service as a dependency because the kube-scheduler communicates with the apiserver, so it is important that the apiserver is already running. Again we use the internal port 8080, because we are on the same droplet.

Next, after reloading the systemd daemon and starting the kube-scheduler service, we can check its health with curl. It works the same as described for the kube-controller-manager, only the port changes from 10252 to 10251: curl http://localhost:10251/healthz.

That’s all it takes to install a kubernetes master node. You can run these steps again on the other master nodes for high availability; this is very easy because we used domain names and the certificates are configured with a wildcard domain name.
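As a final sanity check (not part of the walkthrough above, and the domain is a placeholder), you could point kubectl at one of the master domain names using the generated admin client certificate and the root CA from the cfssl output directories:

kubectl get componentstatuses \
  --server=https://k8s-master-0.yourdomain.com \
  --certificate-authority=root/ca.pem \
  --client-certificate=admin/admin.pem \
  --client-key=admin/admin-key.pem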

Björn Wenzel

My name is Björn Wenzel. I’m a Platform Engineer working for Schenker with interests in Kubernetes, CI/CD, Spring and NodeJS.