
Custom Kubernetes Operators in Golang with OperatorSDK and Kubebuilder

Kubernetes Operators

Kubernetes Operators are a pattern that helps us extend the behavior of the cluster. Operators let us treat an application deployed on Kubernetes as a single item. Your application may be composed of a Pod, a Service, a ConfigMap, a Deployment, and so on, but you get to manage it as one unit and have much better control over its lifecycle. That lifecycle includes, but is not limited to, installation and configuration, as well as failover and recovery, all built on the APIs and patterns that Kubernetes provides.

If you're not new to Kubernetes, you have probably used a couple of operators, such as the Grafana and Prometheus operators. I have also blogged about a few in the past: Strimzi, for deploying and managing Kafka, and Cass Operator, which does the same for Cassandra or DSE.

Now that we know what operators are, how do we build one?

There are a couple of ways to build an operator. In this post we focus on building a simple one with Operator SDK. Operator SDK is part of the Operator Framework, a set of developer tools and Kubernetes components that aid in Operator development and central management on a multi-tenant cluster. Operator SDK supports three options for Operator development: Go, Ansible and Helm. This post focuses on Go. To get started, we first need to install Operator SDK.

Kubebuilder is a framework for building Kubernetes Operators, and Operator SDK uses it under the hood for Go projects.

Putting it all together

Putting it all together, we can build a Mock Operator. It won't do much, but it lets us use Operator SDK to build a custom Operator that launches a Kubernetes Deployment and helps us manage its lifecycle. The source is available on GitHub at https://github.com/malike/mock-operator.

Before we start building, let's define the specification for our Mock Operator.

i. Operator Specification

apiVersion: app.malike.kendeh.com/v1alpha1
kind: SampleKind
metadata:
  name: mock-sample
spec:
  image:
    repository: ghcr.io/malike/mock-operator/sample-mock-service
    tag: latest
    pullPolicy: Always
    pullSecretName:
      - name: regcred
  nodes: 2
  containerPort: 80
  servicePort: 80

Things to note:

i. The API version (group/version) of the Operator's custom resource is app.malike.kendeh.com/v1alpha1.
ii. The Operator defines one kind, SampleKind.
iii. The Operator deploys a custom image, which is passed as image.repository.
iv. The other parameters that describe the image (tag, pullPolicy, pullSecretName) are in the image: {} block.
v. nodes specifies the number of instances we want to deploy.
vi. servicePort and containerPort configure the ports for the Service and the Pod respectively.
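The sketch below shows how this specification maps onto Go types. It uses only the standard library and decodes the sample spec into Go structs. The names ImageSpec, LocalObjectReference and parseSpec are assumptions, chosen to match the YAML keys. The real operator imports the Kubernetes API types, and its own definitions may differ. Custom resources travel between the API server and Go clients as JSON, so the sample spec is written here in its JSON form. The json struct tags decide which YAML key fills which field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LocalObjectReference is a hypothetical stand-in for corev1.LocalObjectReference
// (the real operator would import k8s.io/api/core/v1 instead).
type LocalObjectReference struct {
	Name string `json:"name"`
}

// ImageSpec is a hypothetical Go type for the image: {} block.
type ImageSpec struct {
	Repository     string                 `json:"repository,omitempty"`
	Tag            string                 `json:"tag,omitempty"`
	PullPolicy     string                 `json:"pullPolicy,omitempty"`
	PullSecretName []LocalObjectReference `json:"pullSecretName,omitempty"`
}

// SampleKindSpec mirrors the spec: {} block; each json tag matches a YAML key.
type SampleKindSpec struct {
	Image         ImageSpec `json:"image,omitempty"`
	Nodes         int32     `json:"nodes,omitempty"`
	ContainerPort int32     `json:"containerPort,omitempty"`
	ServicePort   int32     `json:"servicePort,omitempty"`
}

// sampleSpecJSON is the spec block of the sample custom resource in JSON form.
const sampleSpecJSON = `{
  "image": {
    "repository": "ghcr.io/malike/mock-operator/sample-mock-service",
    "tag": "latest",
    "pullPolicy": "Always",
    "pullSecretName": [{"name": "regcred"}]
  },
  "nodes": 2,
  "containerPort": 80,
  "servicePort": 80
}`

// parseSpec decodes a JSON spec into a SampleKindSpec.
func parseSpec(data string) (SampleKindSpec, error) {
	var spec SampleKindSpec
	err := json.Unmarshal([]byte(data), &spec)
	return spec, err
}

func main() {
	spec, err := parseSpec(sampleSpecJSON)
	if err != nil {
		panic(err)
	}
	fmt.Printf("image=%s:%s nodes=%d ports=%d/%d\n",
		spec.Image.Repository, spec.Image.Tag, spec.Nodes, spec.ContainerPort, spec.ServicePort)
}
```

Note that pullSecretName is a list of objects with a name field, which is why it decodes into a slice rather than a single string.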

After defining the specification, we can proceed to the next step: actually building the operator.

ii. Start coding

Once Operator SDK is installed, we can generate the project files for our Mock Operator. Running this command scaffolds the initial project:

    operator-sdk init --domain malike.kendeh.com --repo github.com/malike/mock-operator

After generating the project, we can generate the API and its controller. We want our API kind to be called SampleKind, in the group app:

    operator-sdk create api --group app --version v1alpha1 --kind SampleKind --resource --controller

iv. Building the Operator

Before we start coding, let us look at the structure of the source that the two operator-sdk commands above generated.


The project layout page in the Operator SDK documentation explains this structure in more detail.

Our changes will mainly focus on the api and controllers folders.

Based on our specification, we can update the API spec of the operator to match what we defined. Our API has four main parameters: image: {}, nodes: 2, containerPort: 80 and servicePort: 80. We update the SampleKindSpec struct in api/v1alpha1/samplekind_types.go to include these parameters:

// SampleKindSpec defines the desired state of SampleKind
type SampleKindSpec struct {
	//+kubebuilder:validation:Type:=object
	// Image defines the image configuration
	Image ImageSpec `json:"image,omitempty"`
	//+kubebuilder:validation:Type:=integer
	//+kubebuilder:default:=2
	// Nodes defines the number of instances (Deployment replicas)
	Nodes int32 `json:"nodes,omitempty"`
	//+kubebuilder:validation:Type:=integer
	//+kubebuilder:default:=80
	// ContainerPort defines the port the container listens on
	ContainerPort int32 `json:"containerPort,omitempty"`
	//+kubebuilder:validation:Type:=integer
	//+kubebuilder:default:=80
	// ServicePort defines the port exposed by the Service
	ServicePort int32 `json:"servicePort,omitempty"`
}

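Using kubebuilder CRD validation markers, we can enforce rules on these parameters. For example, nodes must be a number (the Go field is an int32) and defaults to 2 when omitted. Once the spec is defined, we run make generate manifests. The generate target uses controller-gen to regenerate api/v1alpha1/zz_generated.deepcopy.go. That file contains the DeepCopy, DeepCopyInto and DeepCopyObject methods every Kubernetes API type needs so objects can be copied safely; DeepCopyObject is what makes SampleKind satisfy runtime.Object. The manifests target regenerates the CRD YAML under config/crd/bases, including the validation rules and default values declared by the markers.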
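The sketch below illustrates what the generated deep-copy code does, using a hand-written version for a hypothetical, trimmed-down ImageSpec. It follows the pattern controller-gen emits: copy the struct by value, then reallocate slices, maps and pointers. It is not the generated file itself. DeepCopyObject is left out, because only root types such as SampleKind need it to satisfy runtime.Object.

```go
package main

import "fmt"

// LocalObjectReference is a simplified stand-in for corev1.LocalObjectReference.
type LocalObjectReference struct {
	Name string
}

// ImageSpec is a hypothetical, trimmed-down version of the operator's image block.
type ImageSpec struct {
	Repository     string
	Tag            string
	PullSecretName []LocalObjectReference
}

// DeepCopyInto copies the receiver into out. Assigning *in copies the scalar
// fields; the slice is then reallocated so out does not share in's backing
// array. copy() is enough here because LocalObjectReference holds only a string.
func (in *ImageSpec) DeepCopyInto(out *ImageSpec) {
	*out = *in
	if in.PullSecretName != nil {
		out.PullSecretName = make([]LocalObjectReference, len(in.PullSecretName))
		copy(out.PullSecretName, in.PullSecretName)
	}
}

// DeepCopy allocates a new ImageSpec and deep-copies the receiver into it.
func (in *ImageSpec) DeepCopy() *ImageSpec {
	if in == nil {
		return nil
	}
	out := new(ImageSpec)
	in.DeepCopyInto(out)
	return out
}

func main() {
	original := &ImageSpec{
		Repository:     "ghcr.io/malike/mock-operator/sample-mock-service",
		Tag:            "latest",
		PullSecretName: []LocalObjectReference{{Name: "regcred"}},
	}
	clone := original.DeepCopy()
	clone.PullSecretName[0].Name = "other-secret" // mutate only the copy
	fmt.Println(original.PullSecretName[0].Name, clone.PullSecretName[0].Name)
}
```

Without the slice reallocation, the copy would share its backing array with the original, so editing the copy would silently change the original. That matters because controller-runtime hands out copies of cached objects that controllers are free to modify.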

After defining our API, we can move to the controllers package. There we find the controller Operator SDK generated for our kind, samplekind_controller.go. Its Reconcile function is responsible for driving the system toward the desired state declared in the custom resource (CR) that was applied. Any changes that should happen in response to a CR therefore go into this function.

The logic of the reconciliation function is pretty simple:

  1. Confirm whether the resource needs to exist
  2. If the resource is not found but should exist, create it
  3. If the resource exists, confirm it matches what the CR specifies
  4. If the resource is not found and is not supposed to exist, do nothing

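Remember that this function is called repeatedly, not just once: whenever the SampleKind or a watched resource changes, whenever it returns an error or asks to be requeued, and periodically. Each pass must therefore be safe to run again.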
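The four steps above, and the fact that the function runs repeatedly, can be sketched in plain Go. This is a hypothetical model, not controller-runtime code. fakeCluster stands in for the API server, and Deployment and SampleKind keep only a name and a replica count. Step 3 (fixing drift) is included even though the reconcile code in this post only creates missing resources.

```go
package main

import "fmt"

// SampleKind is a hypothetical stand-in for the custom resource; only the
// field this sketch needs is kept.
type SampleKind struct {
	Name  string
	Nodes int32
}

// Deployment is a hypothetical, heavily simplified stand-in for appsv1.Deployment.
type Deployment struct {
	Name     string
	Replicas int32
}

// fakeCluster stands in for the API server. writes counts create and update
// calls so the effect of repeated reconciles is visible.
type fakeCluster struct {
	deployments map[string]*Deployment
	writes      int
}

// reconcile drives the Deployment called name toward the state declared by cr.
// A nil cr models "the SampleKind was not found".
func reconcile(c *fakeCluster, name string, cr *SampleKind) {
	// Step 1: decide whether the resource should exist at all.
	if cr == nil {
		// Step 4: nothing should exist, so do nothing. In a real cluster, owned
		// objects are removed by garbage collection via owner references.
		return
	}
	existing, found := c.deployments[name]
	switch {
	case !found:
		// Step 2: it should exist but does not, so create it.
		c.deployments[name] = &Deployment{Name: name, Replicas: cr.Nodes}
		c.writes++
	case existing.Replicas != cr.Nodes:
		// Step 3: it exists but has drifted from the spec, so update it.
		existing.Replicas = cr.Nodes
		c.writes++
	}
	// Otherwise the observed state already matches the desired state: no write.
}

func main() {
	c := &fakeCluster{deployments: map[string]*Deployment{}}
	cr := &SampleKind{Name: "mock-sample", Nodes: 2}
	for i := 0; i < 3; i++ {
		reconcile(c, cr.Name, cr) // only the first pass writes anything
	}
	cr.Nodes = 3
	reconcile(c, cr.Name, cr) // the spec changed, so the Deployment is updated
	fmt.Println(c.deployments[cr.Name].Replicas, c.writes)
}
```

Every pass compares the observed state with the desired state before writing. As a result, calling reconcile any number of times with an unchanged spec performs only the first write, which is the idempotency that repeated reconcile cycles rely on.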

For our MockOperator, we need two Kubernetes resources:

  1. A Deployment
  2. A Service to expose the Deployment

Putting this together, our Reconcile function will look something like this:

	// Logger scoped to this reconcile request, as in the scaffolded Reconcile
	log := log.FromContext(ctx)

	// Fetch the SampleKind instance
	sampleApp := &appv1alpha1.SampleKind{}
	err := r.Get(ctx, req.NamespacedName, sampleApp)
	if err != nil {
		if errors.IsNotFound(err) {
			// Request object not found, could have been deleted after reconcile request.
			// Owned objects are automatically garbage collected. For additional cleanup logic use finalizers.
			// Return and don't requeue
			log.Info("SampleKind resource not found. Ignoring since object must be deleted")
			return ctrl.Result{}, nil
		}
		// Error reading the object: return it so the request is requeued
		return ctrl.Result{}, err
	}
	log.V(1).Info("Detected existing SampleKind", "SampleKind.Name", sampleApp.Name)

	// Check if the Deployment already exists, if not create a new one
	deployment := &appsv1.Deployment{}
	deploymentName := sampleApp.Name
	err = r.Get(ctx, types.NamespacedName{Name: deploymentName, Namespace: sampleApp.Namespace}, deployment)
	if err != nil && errors.IsNotFound(err) {
		// Define a new Deployment from the SampleKind spec
		deployment = r.newSampleAppDeployment(deploymentName, sampleApp)
		log.Info("Creating a new SampleApp", "SampleKind.Namespace", sampleApp.Namespace, "SampleKind.Name", sampleApp.Name)
		err = r.Create(ctx, deployment)
		if err != nil {
			log.Error(err, "Failed to create Deployment", "Deployment.Name", deploymentName)
			return ctrl.Result{}, err
		}
		// Requeue so the next pass can check the Service
		return ctrl.Result{Requeue: true}, nil
	} else if err != nil {
		return ctrl.Result{}, err
	}

	// Check if the Service already exists, if not create a new one
	service := &corev1.Service{}
	serviceName := getServiceName(deploymentName)
	err = r.Get(ctx, types.NamespacedName{Name: serviceName, Namespace: sampleApp.Namespace}, service)
	if err != nil && errors.IsNotFound(err) {
		// New service
		service = r.newSampleAppService(deploymentName, sampleApp)
		log.Info("Creating a new Service for SampleApp", "Service.Namespace", service.Namespace, "Service.Name", service.Name)

		err = r.Create(ctx, service)
		if err != nil {
			log.Error(err, "Failed to create Service", "Service.Name", serviceName)
			return ctrl.Result{}, err
		}
		return ctrl.Result{Requeue: true}, nil
	} else if err != nil {
		log.Error(err, "Failed to get Service", "Service.Name", serviceName)
		return ctrl.Result{}, err
	}
	log.V(1).Info("Detected existing Service", "Service.Name", service.Name)
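The code above relies on three helpers whose bodies are not shown in this post: getServiceName, newSampleAppDeployment and newSampleAppService. As a hypothetical sketch, the function below computes the values those constructors would need from the spec. It uses simplified stand-in types instead of appsv1.Deployment and corev1.Service. The "<name>-service" naming and the app.kubernetes.io/instance label are assumptions inferred from the CI workflow later in the post. That workflow port-forwards svc/lewis-sample-service and waits on pods labelled app.kubernetes.io/instance=lewis-sample.

```go
package main

import "fmt"

// Spec is a hypothetical, flattened version of SampleKindSpec.
type Spec struct {
	Repository    string
	Tag           string
	Nodes         int32
	ContainerPort int32
	ServicePort   int32
}

// Desired collects the values the Deployment and Service constructors would
// set. It is a simplified stand-in, not a Kubernetes type.
type Desired struct {
	DeploymentName string
	ServiceName    string
	Replicas       int32             // Deployment spec.replicas
	Image          string            // container image reference
	ContainerPort  int32             // containerPort, also the Service targetPort
	ServicePort    int32             // port exposed by the Service
	Labels         map[string]string // pod template labels == Service selector
}

// getServiceName is a hypothetical version of the helper called above,
// assuming the "<name>-service" convention seen in the CI workflow.
func getServiceName(deploymentName string) string {
	return deploymentName + "-service"
}

// imageRef joins repository and tag into an image reference.
func imageRef(repository, tag string) string {
	if tag == "" {
		return repository
	}
	return repository + ":" + tag
}

// desiredFor derives everything the two objects need from the CR name and spec.
func desiredFor(name string, s Spec) Desired {
	return Desired{
		DeploymentName: name,
		ServiceName:    getServiceName(name),
		Replicas:       s.Nodes,
		Image:          imageRef(s.Repository, s.Tag),
		ContainerPort:  s.ContainerPort,
		ServicePort:    s.ServicePort,
		Labels:         map[string]string{"app.kubernetes.io/instance": name},
	}
}

func main() {
	d := desiredFor("lewis-sample", Spec{
		Repository:    "ghcr.io/malike/mock-operator/sample-mock-service",
		Tag:           "latest",
		Nodes:         2,
		ContainerPort: 80,
		ServicePort:   80,
	})
	fmt.Printf("%+v\n", d)
}
```

The key design constraints are that the Service selector uses exactly the labels placed on the Deployment's pod template, and that the Service targets the containerPort. A real constructor would also typically call controllerutil.SetControllerReference on each object, so that deleting the SampleKind garbage-collects the Deployment and Service.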

One other important thing: we need to make sure the MockOperator is allowed to create, read, update and delete these resources. So we add these three RBAC markers, for Deployments, Pods and Services, above the Reconcile function. With the help of kubebuilder, the next time we run make generate manifests, the ClusterRole in config/rbac/role.yaml is regenerated with the right permissions for the operator.

//+kubebuilder:rbac:groups="apps",resources=deployments,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups="",resources=pods,verbs=get;list;watch;create;update;delete;patch
//+kubebuilder:rbac:groups="",resources=services,verbs=get;list;watch;create;update;patch;delete
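Each marker becomes one rule in the generated ClusterRole. groups maps to apiGroups, resources to resources and verbs to verbs, with ; separating list entries. The parser below is a simplified, hypothetical illustration of that mapping. It is not controller-gen's implementation and only understands the groups=/resources=/verbs= form used here. Note that groups="" produces the empty string, which is how RBAC refers to the core API group that Pods and Services belong to.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule holds the three PolicyRule fields these markers populate.
type Rule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

const rbacPrefix = "//+kubebuilder:rbac:"

// parseRBACMarker is a hypothetical, simplified parser (not controller-gen's)
// for markers of the form groups=...,resources=...,verbs=a;b;c.
func parseRBACMarker(line string) (Rule, error) {
	if !strings.HasPrefix(line, rbacPrefix) {
		return Rule{}, fmt.Errorf("not an RBAC marker: %q", line)
	}
	var rule Rule
	for _, arg := range strings.Split(strings.TrimPrefix(line, rbacPrefix), ",") {
		key, value, ok := strings.Cut(arg, "=")
		if !ok {
			return Rule{}, fmt.Errorf("malformed argument %q", arg)
		}
		// Drop optional quotes, then split list values on ';'.
		// groups="" therefore yields [""], the core API group.
		values := strings.Split(strings.Trim(value, `"`), ";")
		switch key {
		case "groups":
			rule.APIGroups = values
		case "resources":
			rule.Resources = values
		case "verbs":
			rule.Verbs = values
		default:
			return Rule{}, fmt.Errorf("unsupported key %q", key)
		}
	}
	return rule, nil
}

func main() {
	markers := []string{
		`//+kubebuilder:rbac:groups="apps",resources=deployments,verbs=get;list;watch;create;update;patch;delete`,
		`//+kubebuilder:rbac:groups="",resources=pods,verbs=get;list;watch;create;update;delete;patch`,
		`//+kubebuilder:rbac:groups="",resources=services,verbs=get;list;watch;create;update;patch;delete`,
	}
	for _, m := range markers {
		rule, err := parseRBACMarker(m)
		if err != nil {
			panic(err)
		}
		fmt.Printf("apiGroups=%q resources=%q verbs=%q\n", rule.APIGroups, rule.Resources, rule.Verbs)
	}
}
```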

v. Sample Image for the Operator

This part is optional, but it gives the operator something to deploy so we can test the Kubernetes Deployment it manages. It is a simple Docker image, called sample-mock-service, that packages a custom HTML page served by nginx. It can be found in its own folder in the repository.

vi. Testing with Ginkgo and Gomega

Testing an Operator is a large topic, and not one I can fully cover in this subsection. There are better resources, such as the articles on envtest and on writing tests for Kubernetes operators listed in the references below.

Our sample test looks like this. It uses Ginkgo's BDD style to confirm that when we create a SampleKind, a Deployment also gets created.

var _ = Describe("Deployment test", func() {

	const (
		name      = "deployment-test"
		namespace = "default"
	)

	Context("When SampleKind is created, Deployment is created", func() {

		It("allows deployment to be created and deleted", func() {

			By("set up deployment", func() {

				skMockOperator := &samplekind.SampleKind{
					ObjectMeta: metav1.ObjectMeta{
						Name:      name,
						Namespace: namespace,
					},
					Spec: samplekind.SampleKindSpec{
						Nodes: 1,
					},
				}
				Expect(k8sClient.Create(ctx, skMockOperator)).Should(Succeed())

				// The reconciler runs asynchronously, so poll until the Deployment shows up
				Eventually(func() bool {
					smDeployment := &v1.Deployment{}
					err := k8sClient.Get(ctx, types.NamespacedName{Name: skMockOperator.Name, Namespace: skMockOperator.Namespace}, smDeployment)
					return err == nil
				}).WithTimeout(20 * time.Second).Should(BeTrue())

				// Delete the SampleKind. envtest runs no garbage collector, so the owned
				// Deployment is not cleaned up automatically in this test environment
				Expect(k8sClient.Delete(ctx, skMockOperator)).To(Succeed())

			})

		})
	})

})

As you can see, the test is not extensive, but since the MockOperator does little, the test coverage is still pretty high.


vii. Automated Testing

Next, we add a simple integration test to confirm our operator works as expected. Using https://github.com/helm/kind-action, we can create a Kind (Kubernetes in Docker) cluster inside the GitHub Actions runner and deploy the MockOperator to it. We then use curl to confirm that the custom HTML page served by nginx is reachable.

A section of the GitHub Actions workflow looks like this:

name: Build and Test for Operator

on: workflow_call

jobs:
  test:
    name: Test
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v1

      - name: Create k8s Kind Cluster
        uses: helm/kind-action@v1.3.0

      - name: Login to Github Packages
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: $
          password: $

      - name: Operator deployment
        run: |
          kubectl cluster-info
          kubectl get pods -n kube-system
          echo "current-context:" $(kubectl config current-context)
          echo "environment-kubeconfig:" ${KUBECONFIG}
          kubectl create ns mock-operator-system --save-config
          kubectl create secret generic regcred --from-file=.dockerconfigjson=${HOME}/.docker/config.json --type=kubernetes.io/dockerconfigjson -n mock-operator-system
          make deploy | grep created
          kubectl rollout status deployment mock-operator-controller-manager -n mock-operator-system --timeout=30s
          kubectl get crd | grep samplekind

      - name: Create deployment
        run: |
          kubectl create secret generic regcred --from-file=.dockerconfigjson=${HOME}/.docker/config.json --type=kubernetes.io/dockerconfigjson -n default
          kubectl apply -f ci/sample.yaml | grep "lewis-sample"
          sleep 5 ; kubectl get all
          kubectl wait pods --selector app.kubernetes.io/instance=lewis-sample --for condition=Ready --timeout=40s | grep "condition met"
          kubectl get po --show-labels | grep lewis-sample | grep "1/1"
          kubectl port-forward svc/lewis-sample-service 8080:80 &
          sleep 5
          curl localhost:8080 | grep mock


      - name: Delete operator deployment
        run: |
          kubectl delete samplekind lewis-sample | grep deleted

Conclusion

Hopefully you found this useful and it helps you kick-start your operator journey. The source code for this MockOperator can be found at https://github.com/malike/mock-operator.

References

https://kubernetes.io/docs/concepts/extend-kubernetes/operator/

https://kubebyexample.com/learning-paths

https://www.infracloud.io/blogs/testing-kubernetes-operator-envtest/

https://betterprogramming.pub/write-tests-for-your-kubernetes-operator-d3d6a9530840

This post is licensed under CC BY 4.0 by the author.