Create your first Ansible-based Kubernetes Operator

This is a hands-on tutorial for building a fully functional Kubernetes Operator, as a follow-up to Introduction to Kubernetes Operators. This proof of concept shows how you can reuse your Ansible code and skills to automate tasks on Kubernetes, which also lets you wrap any application (even legacy ones) in a declarative, cloud-native style.

This tutorial covers writing Kubernetes Operators in no time by reusing your previous experience with the Ansible ecosystem, without writing real code. You will learn the following:

What Kubernetes operators are and how they work.
How Ansible-based operators automate tasks within Kubernetes and how they differ from other operator types.
How to use your previous experience with Ansible to write a Kubernetes operator in no time.
How to build your first operator with a real use case.

Introduction
What is the operator pattern in detail?
Operator types and Ansible
Create your first Ansible-based operator
Step 0 — Prerequisites
Step 1 — Initiate the operator files
Step 2 — Create the manifest structure
Step 3 — Handle the logic of WordPress deployment
Step 4 — Handle the logic of WordPress users
Step 5 — Apply WordPress CustomResourceDefinition
Step 6 — Build and deploy WordPress operator
Step 7 — Apply WordPress CustomResource instance
What's next?
Conclusion
Resources

Introduction

Last June, Kubernetes celebrated its 7th birthday, and according to statistics, its adoption keeps growing steadily. However, Kubernetes changed the whole automation paradigm in many ways, and most automation tools from the pre-containerization era have no place in that new paradigm.

Modern problems need modern solutions, but it would be a big waste to just dump everything and start over. Hence an idea! Why not use what we already have? There should be a tool that somehow fits into the new ecosystem.

Ansible was the best fit for that role! It's easy to use, declarative, agentless, and extensible, and it has a large number of modules for all uses. Most importantly, it can interact with Kubernetes fluently!

In this post, we will look at Ansible in the Kubernetes world and at how to use a so-called “Ansible-based Operator” for cloud-native orchestration and automation.

Before we start, let's take a look at some concepts that we will use in this post.

Kubernetes (aka K8s): An open-source system for automating deployment, scaling, and management of containerized applications.
Controllers: A piece of software that watches the state of your cluster, then makes or requests changes where needed. Each controller tries to set the current cluster state closer to the desired state. Kubernetes comes with a bunch of default controllers like the ReplicaSet controller.
Custom Resources: A native way to extend the Kubernetes API. Resources like Pod, ConfigMap, and Service are built in, and Kubernetes provides a way to add new custom resources. For example, you could have a new kind called “WordPress”.
Operators: A pattern that uses both controllers and custom resources to add new functionality. In simple words, an operator is a controller that watches one or more resources (built-in or custom) in the Kubernetes API and applies some logic.

We will learn more about all of that in the next sections.

What is the operator pattern in detail?

As you know, Kubernetes is a general-purpose orchestration platform, so it works out of the box with stateless applications and services. Stateful applications, however, need some assistance to handle their state. For example, automating database replication requires knowing how to add and remove replica instances, and so on.

That's why Kubernetes operators were invented! CoreOS introduced Kubernetes operators in 2016. The real boom came when the Operator Framework was released in 2018, followed by the OperatorHub.io platform in 2019. Since then, the number of operators has grown rapidly as more and more organizations have released operators for their software.

So what exactly is an operator? A Kubernetes operator is a piece of software that runs within the Kubernetes cluster and interacts with the Kubernetes API to perform actions inside or outside the cluster, like managing another application or resource.

[Figure: Kubernetes operator reconciliation loop]
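To make the loop concrete, here is a toy model in Python. It is not how the Operator SDK implements reconciliation. The "cluster" is just a dict of replica counts, and reconcile, apply_actions, and run_until_converged are hypothetical names. It does show the core idea: observe the current state, diff it against the desired state, act, and repeat until nothing is left to change.

```python
from typing import Dict, List, Tuple

# (operation, resource name, amount); a toy stand-in for real API calls
Action = Tuple[str, str, int]


def reconcile(desired: Dict[str, int], current: Dict[str, int]) -> List[Action]:
    """Diff desired vs. current replica counts and return the actions
    needed to move the current state toward the desired state."""
    actions: List[Action] = []
    for name, want in sorted(desired.items()):
        have = current.get(name, 0)
        if have < want:
            actions.append(("scale_up", name, want - have))
        elif have > want:
            actions.append(("scale_down", name, have - want))
    # Anything that exists but is no longer desired gets deleted.
    for name in sorted(set(current) - set(desired)):
        actions.append(("delete", name, current[name]))
    return actions


def apply_actions(actions: List[Action], current: Dict[str, int]) -> Dict[str, int]:
    """Apply actions to a copy of the simulated cluster state."""
    state = dict(current)
    for op, name, amount in actions:
        if op == "scale_up":
            state[name] = state.get(name, 0) + amount
        elif op == "scale_down":
            state[name] -= amount
        elif op == "delete":
            del state[name]
    return state


def run_until_converged(desired: Dict[str, int], current: Dict[str, int],
                        max_rounds: int = 10) -> Dict[str, int]:
    """Observe -> diff -> act, repeated until nothing is left to change."""
    for _ in range(max_rounds):
        actions = reconcile(desired, current)
        if not actions:
            return current
        current = apply_actions(actions, current)
    raise RuntimeError("state did not converge")


if __name__ == "__main__":
    cluster = {"wordpress": 1, "old-app": 2}
    print(reconcile({"wordpress": 3}, cluster))
    # [('scale_up', 'wordpress', 2), ('delete', 'old-app', 2)]
    print(run_until_converged({"wordpress": 3}, cluster))
    # {'wordpress': 3}
```

A real controller does not spin in a tight loop like this. It typically reacts to watch events from the API server, but the observe-diff-act cycle is the same.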

The main goal of operators is to put domain knowledge into software that simplifies the management of complex applications on Kubernetes. In other words, operators help to:

Simplify the management of complex applications on Kubernetes where domain knowledge is required.
Unify automation in Kubernetes, which gives a homogeneous operational experience.
Deal with day-2 operations. Day-1 operations are mainly about installation and setup, while day-2 operations cover everything else, like monitoring, backup, upgrades, scaling, and so on.

Operator types and Ansible

As mentioned, a Kubernetes operator is a piece of software, usually written in a programming language like Go or Python. All operators share some common parts, like authentication, interacting with the Kubernetes API, and probably more. So the Operator SDK project was born to provide that common base.

However, a programming language isn't the only way to create an operator. In the end, you just need a way to interact with the Kubernetes API and run some logic depending on the operator's purpose. Therefore, other types of operators have been created, like Ansible-based, Helm-based, and even shell-based ones! (But please don't do that! The world has enough shell scripts already!)

Our focus here is Ansible-based operators. According to the Operator SDK capability levels, Ansible-based operators can do everything that operators written in a programming language like Go can do, up to the highest capability level.

[Figure: Operator Capability Levels (Operator SDK)]

That's because Ansible is extensible and has a large collection of modules that can automate almost anything inside or outside the Kubernetes cluster. In Ansible-based operators, you write automation in the same style as in standalone Ansible; for example, you can use Ansible roles and modules.

Create your first Ansible-based operator

Now it's time for the hands-on part: let's create our first Ansible-based operator.

You could create an operator to automate your own application workflow. For this demonstration, let's assume the following scenario: your company runs a WordPress development workshop, and for every workshop you need to deploy a new WordPress instance with some configuration for each participant.

So we will create an Ansible-based operator to manage WordPress automatically. What's the idea of our operator? It deploys WordPress and manages WordPress users, and it is flexible enough to extend later. It's just a base for what you can do with Ansible-based operators.

Step 0 — Prerequisites

As mentioned before, the Operator SDK provides all the essential parts needed to interact with the Kubernetes API, so all you need to write is the actual automation logic. But first, let's look at the prerequisites.

Essential knowledge of Kubernetes and Ansible.
Understanding of Kubernetes Operators (take a look at Introduction to Kubernetes Operators).
Kubernetes v1.16.0+ cluster (Minikube will be used in this post).
Operator SDK v1.0.0+ (A local binary or a Docker image could be used).

Step 1 — Initiate the operator files

The Operator SDK makes building an operator easy. Its recent versions (v1.10.0+) come with a lot of utilities that make developing and building operators easier than ever! So let's get started.

First, we initialize the operator's main structure:

operator-sdk init     \
    --plugins=ansible \
    --domain=cloud.example.com

Here we set the plugin type to ansible, since we are building an Ansible-based operator, and we set the domain under which our API groups will be created.

Now, we create an API group:

operator-sdk create api \
    --group web         \
    --version v1alpha1  \
    --kind WordPress    \
    --generate-role

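This creates the web API group (combined with the domain, it becomes web.cloud.example.com), the v1alpha1 version, and the WordPress kind for the new CustomResource. The --generate-role flag also scaffolds an empty Ansible role named wordpress.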

Together, these commands generate boilerplate that includes Kubernetes manifests, the CR/CRD, a Dockerfile, the Ansible role, and more. That lets us put all our effort into the operator's logic.
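The mapping from these flags to the resulting CustomResource is mechanical, and a tiny Python sketch makes it explicit. The helpers below are hypothetical. They only encode the naming visible in the generated files: the full API group is the group prefixed to the domain, and the sample file name combines the group, the version, and the lowercased kind.

```python
def api_version(group: str, domain: str, version: str) -> str:
    """apiVersion of a CR whose API was created with --group/--version
    in a project initialized with --domain."""
    return f"{group}.{domain}/{version}"


def sample_file_name(group: str, version: str, kind: str) -> str:
    """Name of the sample CR under config/samples/."""
    return f"{group}_{version}_{kind.lower()}.yaml"


if __name__ == "__main__":
    print(api_version("web", "cloud.example.com", "v1alpha1"))
    # web.cloud.example.com/v1alpha1
    print(sample_file_name("web", "v1alpha1", "WordPress"))
    # web_v1alpha1_wordpress.yaml
```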

Step 2 — Create the manifest structure

Let's think about the first goals of our operator: what do we want it to do? Following the MVP (Minimum Viable Product) approach, we will start with two simple tasks: deploy WordPress, and manage WordPress users using WP-CLI.

We will define the data first and then build the logic on top of it. The CustomResource file looks like this:

# File: config/samples/web_v1alpha1_wordpress.yaml
# <group>.<domain>/<version>, as set by "operator-sdk init" and "create api"
apiVersion: web.cloud.example.com/v1alpha1
kind: WordPress
metadata:
  name: wordpress-devel-workshop
  namespace: default
spec:
  deployment:
    name: wordpress-devel-workshop
    namespace: wp-dev
    replicas: 1
    chart:
      name: wordpress
      version: 12.1.21
      repo: https://charts.bitnami.com/bitnami
      values:
        image:
          debug: true
        wordpressUsername: wp-admin
  users:
  - username: alice
    display_name: Alice
    email: alice@example.com
    role: author
  - username: bob
    display_name: Bob
    email: bob@example.com
    role: author

Here are some highlights of this manifest for our new CustomResource:

A custom Kubernetes apiVersion, web.cloud.example.com/v1alpha1.
A custom Kubernetes kind, WordPress.
The spec has two WordPress-related sections: deployment and users.
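These spec fields are what the role will work with. The Ansible operator passes them to Ansible as variables, so deployment and users become top-level variables. By default, the operator also converts camelCase keys to snake_case, nested ones included, so a Helm value such as wordpressUsername may reach the chart as wordpress_username. The snakeCaseParameters option in watches.yaml controls this behavior; check the reference for your Operator SDK version. The Python sketch below approximates the conversion. camel_to_snake and spec_to_vars are hypothetical helpers, and their regex rules may differ from the operator's in edge cases such as acronyms.

```python
import re
from typing import Any

# Two passes handle both "fooBar" and "HTTPServer"-style boundaries.
_WORD_BOUNDARY = re.compile(r"(.)([A-Z][a-z]+)")
_LOWER_UPPER = re.compile(r"([a-z0-9])([A-Z])")


def camel_to_snake(name: str) -> str:
    """Convert a camelCase key to snake_case, e.g. wordpressUsername."""
    name = _WORD_BOUNDARY.sub(r"\1_\2", name)
    return _LOWER_UPPER.sub(r"\1_\2", name).lower()


def spec_to_vars(spec: Any) -> Any:
    """Recursively rename dict keys; lists are walked, values are kept."""
    if isinstance(spec, dict):
        return {camel_to_snake(k): spec_to_vars(v) for k, v in spec.items()}
    if isinstance(spec, list):
        return [spec_to_vars(item) for item in spec]
    return spec


if __name__ == "__main__":
    spec = {
        "deployment": {"chart": {"values": {"wordpressUsername": "wp-admin"}}},
        "users": [{"username": "alice", "display_name": "Alice"}],
    }
    print(spec_to_vars(spec))
    # {'deployment': {'chart': {'values': {'wordpress_username': 'wp-admin'}}},
    #  'users': [{'username': 'alice', 'display_name': 'Alice'}]}
```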

Step 3 — Handle the logic of WordPress deployment

Now that the structure and data of the operator are ready, let's create the logic that deploys WordPress and adds the WordPress users.

For ease of use, we will install WordPress via the Bitnami Helm chart. First, we need to add the Helm binary to our Ansible operator image so we can use the Ansible Helm module.

The operator Dockerfile looks like this; we just copy the Helm binary from the “alpine/helm” image:

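# File: Dockerfile
# Helm version to copy into the operator image
ARG HELM_VERSION=3.7.0

# Named stage that only provides the Helm binary
FROM alpine/helm:${HELM_VERSION} AS helm

FROM quay.io/operator-framework/ansible-operator:v1.12.0

# Add Helm binary to the Ansible operator image
COPY --from=helm /usr/bin/helm /usr/bin/helm
[...]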

Now we add the Ansible part which will deploy WordPress via the Helm module and wait for the deployment to be ready.

# File: roles/wordpress/tasks/main.yml
# Install (or upgrade) the WordPress release using values from the CR spec
- name: Install WordPress Helm chart
  community.kubernetes.helm:
    state: present
    name: "{{ deployment.name }}"
    release_namespace: "{{ deployment.namespace }}"
    chart_ref: "{{ deployment.chart.name }}"
    chart_version: "{{ deployment.chart.version }}"
    chart_repo_url: "{{ deployment.chart.repo }}"
    release_values: "{{ deployment.chart.values }}"

# Block until the WordPress Service accepts TCP connections on port 80
- name: Wait for WordPress to be up and running
  wait_for:
    host: "{{ deployment.name }}.{{ deployment.namespace }}"
    port: 80

That's all we need to install WordPress. As you can see, the Ansible operator takes the data under “spec” and passes it to Ansible as variables.
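The wait_for task works by polling. It keeps trying to open a TCP connection to the WordPress Service's in-cluster DNS name on port 80 until it succeeds or times out (Ansible's default timeout is 300 seconds). Here is the same idea as a small Python sketch. wait_for_port is a hypothetical helper, not Ansible's actual implementation, which supports many more options.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0,
                  interval: float = 1.0) -> bool:
    """Poll host:port until a TCP connection succeeds.

    Returns True as soon as a connection is accepted, or False once
    `timeout` seconds have passed without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Raises OSError (refused, DNS failure, timeout, ...) until
            # something is listening on host:port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, wait_for_port("wordpress-devel-workshop.wp-dev", 80) only succeeds from inside the cluster, such as from the operator Pod. That name is resolved by the cluster DNS, just as in the Ansible task.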

Step 4 — Handle the logic of WordPress users

Now let's add the part that manages WordPress users. The first thing to check is whether there is an Ansible module for WordPress. Unfortunately, there isn't, so we will run WP-CLI inside the WordPress Pod using the k8s_exec module.

# File: roles/wordpress/tasks/main.yml
[...]
# List the WordPress Pods of this release; matching the chart name too
# skips Pods of other components in the same release (e.g. the database)
- name: Get WordPress pod
  community.kubernetes.k8s_info:
    kind: Pod
    namespace: "{{ deployment.namespace }}"
    label_selectors:
      - app.kubernetes.io/instance={{ deployment.name }}
      - app.kubernetes.io/name={{ deployment.chart.name }}
  register: wordpress_pods

- name: Set pod name var
  set_fact:
    wordpress_pod_name: "{{ wordpress_pods.resources | map(attribute='metadata.name') | list | first }}"

# Create each user from the CR spec unless it already exists
- name: Add WordPress users
  community.kubernetes.k8s_exec:
    namespace: "{{ deployment.namespace }}"
    pod: "{{ wordpress_pod_name }}"
    command: |
      bash -c '
        wp user get {{ user.username }} ||
        wp user create {{ user.username }} {{ user.email }} \
          --display_name={{ user.display_name }}            \
          --role={{ user.role }}                            \
          --porcelain
      '
  loop: "{{ users }}"
  loop_control:
    loop_var: user

What happens here is the following:

Get all Pods of the WordPress deployment, in case there is more than one.
Pick one of the WordPress Pods to run WP-CLI inside it.
Loop over the “users” section in the spec and run WP-CLI inside the Pod to add each defined WordPress user if it doesn't exist yet.
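The same two decisions (which Pod to use and which users to create) can be modeled in plain Python. This is only an illustrative sketch. pick_pod and users_to_create are hypothetical helpers, and the Pod labels are assumed for illustration. In the real operator, Kubernetes does the label matching and `wp user get` does the existence check. Matching on the chart name label as well as the release label is a safer way to make sure the chosen Pod runs WordPress and not another component of the same Helm release, such as the database.

```python
from typing import Dict, List, Optional

# Hypothetical Pods of one Helm release (names borrowed from the Step 7
# output, labels assumed for illustration).
EXAMPLE_PODS: List[dict] = [
    {"metadata": {"name": "wordpress-devel-workshop-mariadb-0",
                  "labels": {"app.kubernetes.io/instance": "wordpress-devel-workshop",
                             "app.kubernetes.io/name": "mariadb"}}},
    {"metadata": {"name": "wordpress-devel-workshop-7c466c7569-tzkgd",
                  "labels": {"app.kubernetes.io/instance": "wordpress-devel-workshop",
                             "app.kubernetes.io/name": "wordpress"}}},
]


def matches(labels: Dict[str, str], selector: Dict[str, str]) -> bool:
    """Equality-based label selector: every key=value pair must be present."""
    return all(labels.get(key) == value for key, value in selector.items())


def pick_pod(pods: List[dict], selector: Dict[str, str]) -> Optional[str]:
    """Name of the first Pod matching the selector, or None."""
    for pod in pods:
        if matches(pod["metadata"].get("labels", {}), selector):
            return pod["metadata"]["name"]
    return None


def users_to_create(desired: List[dict], existing: List[str]) -> List[str]:
    """Mirror `wp user get X || wp user create X ...`: only missing users
    are created, so running the loop again changes nothing."""
    existing_set = set(existing)
    return [u["username"] for u in desired if u["username"] not in existing_set]


if __name__ == "__main__":
    release = {"app.kubernetes.io/instance": "wordpress-devel-workshop"}
    # The release label alone matches the database Pod first.
    print(pick_pod(EXAMPLE_PODS, release))
    # Adding the chart name narrows it down to WordPress.
    print(pick_pod(EXAMPLE_PODS, {**release, "app.kubernetes.io/name": "wordpress"}))
    print(users_to_create([{"username": "alice"}, {"username": "bob"}],
                          ["wp-admin", "alice"]))  # ['bob']
```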

Step 5 — Apply WordPress CustomResourceDefinition

As mentioned before, the operator is software that watches a Kubernetes custom resource, so first we need to install that CustomResourceDefinition in the Kubernetes cluster.

The following command uses Kustomize to apply the CustomResourceDefinition to the cluster (it downloads Kustomize automatically if you don't have it).

$ make install

Now we are ready to build and deploy the actual operator that will handle this CustomResourceDefinition.

Step 6 — Build and deploy WordPress operator

Now let's build and deploy our WordPress operator. As mentioned before, we assume this is done locally using Minikube, and there are two options for local development.

The first option is to run the operator on your local machine (outside Minikube) using the Operator SDK tooling, which connects it to the Kubernetes cluster. The downside of this approach is that it requires extra prerequisites, since all the binaries Ansible needs must be installed locally.

The second option, which we will use here, is to build a Docker image on Minikube and redeploy it with each change. It only takes a couple of commands.

First, we configure the local Docker client (the “docker” command) to connect to the Docker daemon on Minikube. The image is then built there directly, so there's no need to push it to a remote registry like Docker Hub.

All that we need is to run this in the terminal:

$ eval $(minikube docker-env)

Now let's build the image:

$ make docker-build IMG=wordpress-operator:latest

Before deploying, we need to change the operator deployment manifest. make deploy sets the image, but we also need to set imagePullPolicy to Never so Kubernetes uses the local image we just built instead of trying to pull it.

Here is the Kustomize patch that changes imagePullPolicy (this only needs to be done once):

# File: config/manager/manager_image_pull_policy_patch.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller-manager
  namespace: system
spec:
  template:
    spec:
      containers:
      - name: manager
        imagePullPolicy: Never

Now add that patch using Kustomize:

$ cd config/manager && kustomize edit add patch manager_image_pull_policy_patch.yaml

We also need to append some permissions so our operator can deploy the WordPress chart:

# File: config/rbac/role.yaml
# This should be appended to the role.yaml file.
  - apiGroups:
      - ""
    resources:
      - configmaps
      - persistentvolumeclaims
      - services
    verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
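To see why these rules are needed, it helps to model how RBAC authorizes a request. A request is allowed if a single rule lists its API group, resource, and verb (or uses the "*" wildcard). The Python sketch below is a simplified model that ignores resourceNames, non-resource URLs, and aggregation. allows and APPENDED_RULE are hypothetical names, and the rule is copied from the snippet above.

```python
from typing import Dict, List

Rule = Dict[str, List[str]]


def _covers(values: List[str], wanted: str) -> bool:
    """A rule field matches if it lists the value or the "*" wildcard."""
    return "*" in values or wanted in values


def allows(rules: List[Rule], api_group: str, resource: str, verb: str) -> bool:
    """True if a single rule grants `verb` on `resource` in `api_group`.

    The core API group (Pods, Services, ...) is the empty string "",
    just like in role.yaml.
    """
    return any(
        _covers(rule.get("apiGroups", []), api_group)
        and _covers(rule.get("resources", []), resource)
        and _covers(rule.get("verbs", []), verb)
        for rule in rules
    )


# The rule appended to config/rbac/role.yaml, as Python data.
APPENDED_RULE: Rule = {
    "apiGroups": [""],
    "resources": ["configmaps", "persistentvolumeclaims", "services"],
    "verbs": ["create", "delete", "get", "list", "patch", "update", "watch"],
}

if __name__ == "__main__":
    print(allows([APPENDED_RULE], "", "services", "create"))      # True
    print(allows([APPENDED_RULE], "apps", "deployments", "get"))  # False
```

If the operator logs show a "forbidden" error, the same reasoning tells you which apiGroups, resources, or verbs entry is missing.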

Now we are ready to deploy our operator:

$ make deploy IMG=wordpress-operator:latest

Let's check if the operator is ready:

$ kubectl get po                  \
    -n wordpress-operator-system  \
    -l control-plane=controller-manager

Output:

NAME                                                     READY   STATUS    RESTARTS   AGE
wordpress-operator-controller-manager-578fc6887b-n4kvr   2/2     Running   0          2m22s

Now that the operator is up and running, let's deploy the WordPress CustomResource, which the operator will consume.

Step 7 — Apply WordPress CustomResource instance

In Step 2, we already created the WordPress CustomResource, so now we just apply it:

$ kubectl apply -f config/samples/web_v1alpha1_wordpress.yaml

It's a good idea to take a look at the operator logs to see what happens during that:

$ kubectl logs -f                       \
    -n wordpress-operator-system        \
    -l control-plane=controller-manager \
    -c manager

The operator now watches WordPress objects through the Kubernetes API. When it detects a new WordPress CustomResource, it applies its logic: it deploys a WordPress instance and adds the new users. So let's check whether the logic has been applied.
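Under the hood, the Ansible operator decides which role to run by looking up the object's group, version, and kind in watches.yaml, a file that operator-sdk create api filled in for us. The following Python sketch models that lookup. The WATCHES table and handle_event function are hypothetical stand-ins. The real operator runs the selected role through Ansible Runner instead of returning a tuple, and any key renaming it applies to spec is left out here.

```python
from typing import Dict, Optional, Tuple

# (group, version, kind) -> role; mirrors the entry that
# `operator-sdk create api --generate-role` added to watches.yaml.
WATCHES: Dict[Tuple[str, str, str], str] = {
    ("web.cloud.example.com", "v1alpha1", "WordPress"): "wordpress",
}


def handle_event(obj: dict) -> Optional[Tuple[str, dict]]:
    """Return (role, extra_vars) for a watched object, or None if the
    operator doesn't watch this kind."""
    # "web.cloud.example.com/v1alpha1" -> group, version ("v1" -> "", "v1")
    group, _, version = obj["apiVersion"].rpartition("/")
    role = WATCHES.get((group, version, obj["kind"]))
    if role is None:
        return None
    # Each top-level field under spec becomes an Ansible variable.
    return role, dict(obj.get("spec", {}))


if __name__ == "__main__":
    cr = {
        "apiVersion": "web.cloud.example.com/v1alpha1",
        "kind": "WordPress",
        "metadata": {"name": "wordpress-devel-workshop"},
        "spec": {"deployment": {"name": "wordpress-devel-workshop"}, "users": []},
    }
    print(handle_event(cr))
    # ('wordpress', {'deployment': {'name': 'wordpress-devel-workshop'}, 'users': []})
    print(handle_event({"apiVersion": "v1", "kind": "Pod"}))  # None
```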

First, let's check if WordPress has been deployed.

$ kubectl get po -n wp-dev \
    -l app.kubernetes.io/instance=wordpress-devel-workshop,app.kubernetes.io/name=wordpress

Output:

NAME                                        READY   STATUS    RESTARTS   AGE
wordpress-devel-workshop-7c466c7569-tzkgd   1/1     Running   0          4m44s

Besides WordPress, the chart also deployed MariaDB as its database:

$ kubectl get po -n wp-dev \
    -l app.kubernetes.io/instance=wordpress-devel-workshop,app.kubernetes.io/name=mariadb

Output:

NAME                                 READY   STATUS    RESTARTS   AGE
wordpress-devel-workshop-mariadb-0   1/1     Running   0          4m44s

What about the WordPress users? Let's check them too.

$ kubectl exec -it -n wp-dev wordpress-devel-workshop-7c466c7569-tzkgd -- wp user list

Output:

+----+------------+--------------+-------------------+---------------------+---------------+
| ID | user_login | display_name | user_email        | user_registered     | roles         |
+----+------------+--------------+-------------------+---------------------+---------------+
| 2  | alice      | Alice        | alice@example.com | 2021-10-10 22:22:33 | author        |
| 3  | bob        | Bob          | bob@example.com   | 2021-10-10 22:22:55 | author        |
| 1  | wp-admin   | wp-admin     | user@example.com  | 2021-10-10 22:22:11 | administrator |
+----+------------+--------------+-------------------+---------------------+---------------+

Mission accomplished! WordPress has been deployed successfully and the users have been added. That's all!

You can find the operator files and all the examples mentioned in this post in the repo: k8s-wordpress-operator-poc

What's next?

We have a functional operator that deploys WordPress and manages its users. Now let's extend it. What can you do next?

Specify a structural schema for WordPress CustomResourceDefinition.
Manage WordPress themes using “wp theme”.
Collect operator metrics in Prometheus format, which operators scaffolded with Operator SDK expose by default. Check the service wordpress-operator-controller-manager-metrics-service on port 8443.

Do you have other ideas or use cases to extend this operator? Please write them in the comments.

Conclusion

As you saw, with essential Kubernetes and Ansible knowledge, you can automate almost anything in no time, in a declarative style, and without writing a single line of code! You have unlimited possibilities!

Before finishing this post, let's have a quick recap:

The Kubernetes operator pattern consists of two parts. The first is one or more Kubernetes CustomResourceDefinitions. The second is software that watches the Kubernetes API and takes actions inside or outside the cluster based on what's in the CustomResource.
The main goal of a Kubernetes operator is to put domain knowledge into software that helps with automation.
You don't always need to build your own operator; it's often better to use official operators and those already available on OperatorHub.io.
Writing code is not the only way to build a fully functional operator! Ansible allows you to build proper operators in no time and without code, and you can make use of the whole Ansible ecosystem.

But always remember: use automation the right way! Don't create an operator unless you have a strong, specific use case for it. Don't be tempted into creating technical debt, even for good causes like automation. And most importantly, don't use automation to fix symptoms!