31/12/2022

2022 Highlights


Just a random image generated with AI!

Finally, 2022 is over! What a crazy year! In many countries, the Covid-19 pandemic is about to come to an end, but a global economic recession is almost at the door!

On a personal level, it wasn't an easy year for sure, but it was good in many different ways.

Top 5 highlights in 2022

  1. Career: Started the Distribution team at Camunda 🤩️ which is responsible for building and deploying the Camunda Platform 8 Self-Managed (now using an umbrella Helm chart). Later on, there will be a Kubernetes Operator. That's a great career boost; I just started with many new and exciting challenges. And BTW, my team will begin hiring in 2023!

  2. Coding for Kubernetes: A big refactoring of the Bank-Vaults operator, which is my biggest open-source contribution to a project I don't own/manage. It polished my Golang skills, and I learned many new things (and had fun redesigning the operator logo 😂️).

  3. Security Knowledge-sharing: In 2021, I got my CKS certificate. Then, at the beginning of 2022, I started a security initiative at Camunda to enhance security practices. Then, later on, I conducted a session about Kubernetes Security Best Practices (with some tips for the CKS exam) which was a great case that includes theory, applied practice, and knowledge-sharing!

  4. Advanced CI/CD Knowledge-sharing: I wrote a detailed post about my experience with custom step conditionalRetry, which handles failures on spot/preemptible infrastructure so you could save up to 90% of the costs and have stable builds as well! It's been released as open source, and you can use that in your pipeline!

  5. Activities: Helped more people in their careers, once by moderating the DevOps circle at JobStack 2022, and also in the voluntary mentorship that I do from time to time.

Besides these highlights, I had some nice stuff during the year. For example:

  • Added more features to kubech (which is a tool to set kubectl context/namespace per shell/terminal)
  • Virtually attended KubeCon Europe 2022, and the content was great!
  • Reached my writing goal this year and wrote 12 blog posts in 2022!

And since we are on this topic, here are the top 5 visited blog posts in 2022!

Top 5 posts in 2022

  1. Delete a manifest from Kustomize base - Kubernetes

  2. 3 ways to customize off-the-shelf Helm charts with Kustomize - Kubernetes

  3. Validate, format, lint, secure, and test Terraform IaC - CI/CD

  4. Now I'm a Certified Kubernetes Application Developer + 10 exam tips

  5. Continuous Delivery and Maturity Model - DevOps

No wonder the Kustomize post is the most visited one; there is not much content about Kustomize even though it has been built into kubectl since v1.14! I probably need to give it more attention since demand for it is increasing.

For that reason, I just started Awesome Kustomize, which is a curated and collaborative list of awesome Kustomize resources 🎉️


Enjoy 🚀️


12/12/2022

22/11/2022

Set OpenAPI patch strategy for Kubernetes Custom Resources - Kustomize

Kustomize supports 2 main client-side patching methods for Kubernetes manifests, JSON Patching and Strategic Merge Patch. In the JSON Patching method, you have a "meta" syntax that specifies operation/target/value. In the Strategic Merge Patch method, you can override values by providing a patch file with the same structure but with new values, and it will override the original values (it simply merges the 2 files with the same structure).

Each method has pros and cons but generally speaking, I would arguably say that Strategic Merge Patch is better for big changes/patches, and JSON Patching is better for smaller fine-grained patches. And for my use case, I will use Strategic Merge Patch, but I just faced a problem with patching Kubernetes Custom Resources!
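
To make the difference between the two methods concrete, here is a minimal Python sketch. It is not how Kustomize implements patching; the helper names `apply_json_patch_replace` and `merge_maps` and the sample `deployment` data are made up for illustration, and only the JSON Patch `replace` operation on simple paths (no `~0`/`~1` escaping) is covered.

```python
import copy


def apply_json_patch_replace(doc, path, value):
    """Apply one JSON Patch style 'replace' operation addressed by a /-separated path."""
    result = copy.deepcopy(doc)
    parts = [part for part in path.split("/") if part]
    target = result
    # Walk down to the parent of the value that should be replaced.
    for part in parts[:-1]:
        target = target[int(part)] if isinstance(target, list) else target[part]
    last = parts[-1]
    if isinstance(target, list):
        target[int(last)] = value
    else:
        target[last] = value
    return result


def merge_maps(original, patch):
    """Merge a patch with the same structure into the original (maps only)."""
    result = copy.deepcopy(original)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_maps(result[key], value)
        else:
            # Scalars (and, in this sketch, lists) are simply overridden.
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical manifest, trimmed to the fields used below.
deployment = {"metadata": {"name": "web"}, "spec": {"replicas": 1, "paused": False}}

# JSON Patching: a "meta" syntax with operation/target/value.
json_patched = apply_json_patch_replace(deployment, "/spec/replicas", 3)

# Strategic Merge Patch style: a fragment with the same structure and new values.
merged = merge_maps(deployment, {"spec": {"replicas": 3}})

print(json_patched == merged)
```

Both calls produce the same manifest here; the difference is in how the change is expressed, which is why the fine-grained form suits small edits and the structural form suits larger ones.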

TL;DR

Kustomize's default patch strategy for the lists (arrays) is replace. That means the patch list will override the original list, which is not always the desired behavior. That behavior could be changed only if an OpenAPI schema for a Kubernetes resource is available to define the patch strategy.

The OpenAPI schema for Kubernetes core resources (like Namespace, Deployment, Pod, etc.) is already part of Kustomize, so changing the patch strategy works out of the box for these resources. However, if you have a Kubernetes Custom Resource, you need to provide Kustomize with the OpenAPI schema of that custom resource. And that's only useful if the schema includes the OpenAPI extensions related to the merging strategy.

This post shows how to add those extensions to have control over the patch strategy. You can jump directly to the solution section if you already know all these details.

1. Task

I want to use Kustomize to patch Kubernetes Custom Resources (like the Prometheus AlertmanagerConfig), and I want to use merge as the patch strategy for lists. That means the original list and the patch list in the same path should be merged, and the patch list should not override the original one. That works out of the box for Kubernetes core resources but not for custom resources. First, let's see that in action, then dive into the explanation afterward.

2. Issue reproduction

Let's have a look at this example using the core resource Pod, given this Kustomization file:

# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- pod.yaml

patches:
- pod-patch01.yaml
- pod-patch02.yaml

And these resources and patches files:

# pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    env:
    - name: MY_ENV_VAR_01
      value: source

# pod-patch01.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    env:
    - name: MY_ENV_VAR_01
      value: patch 01

# pod-patch02.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    env:
    - name: MY_ENV_VAR_02
      value: patch 02

The kustomize build . will return:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    env:
    - name: MY_ENV_VAR_02
      value: patch 02
    - name: MY_ENV_VAR_01
      value: patch 01

As you see, the env key MY_ENV_VAR_01 was overridden by the value from pod-patch01.yaml, and the env key MY_ENV_VAR_02 was added from pod-patch02.yaml. That's great; the lists are merged based on the name key.
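
The merge-by-key behavior can be sketched in a few lines of Python. This is only an illustration of the idea, not Kustomize's code; the function name `merge_list_by_key` is made up, and the sketch does not try to reproduce the item ordering of the real `kustomize build` output.

```python
def merge_list_by_key(original, patch, key="name"):
    """Merge two lists of maps, matching items by the value of `key`."""
    merged = [dict(item) for item in original]
    positions = {item[key]: index for index, item in enumerate(merged)}
    for item in patch:
        if item[key] in positions:
            # Same key: the patch values override the original ones.
            merged[positions[item[key]]].update(item)
        else:
            # New key: the item is added to the list.
            positions[item[key]] = len(merged)
            merged.append(dict(item))
    return merged


env = [{"name": "MY_ENV_VAR_01", "value": "source"}]
patch01 = [{"name": "MY_ENV_VAR_01", "value": "patch 01"}]
patch02 = [{"name": "MY_ENV_VAR_02", "value": "patch 02"}]

result = merge_list_by_key(merge_list_by_key(env, patch01), patch02)
for item in result:
    print(item["name"], "=", item["value"])
```

The first patch overrides the existing entry and the second one adds a new entry, which matches the content of the Pod output above.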

...

However, if you tried to do that with a CustomResource like AlertmanagerConfig, it would not work! Let's give it a try! Given this Kustomization file:

# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
- alertmanagerconfig.yaml

patches:
- alertmanagerconfig-patch01.yaml
- alertmanagerconfig-patch02.yaml

And these resources and patches files:

# alertmanagerconfig.yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: example
spec:
  receivers:
  - name: 'webhook01'
    webhookConfigs:
    - url: 'http://example.com/'

# alertmanagerconfig-patch01.yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: example
spec:
  receivers:
  - name: 'webhook01'
    webhookConfigs:
    - url: 'http://example01.com/'

# alertmanagerconfig-patch02.yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: example
spec:
  receivers:
  - name: 'webhook02'
    webhookConfigs:
    - url: 'http://example02.com/'

The kustomize build . will return:

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: example
spec:
  receivers:
  - name: webhook02
    webhookConfigs:
    - url: http://example02.com/

As you see, the last patch from the file alertmanagerconfig-patch02.yaml replaced everything in the spec.receivers list, and that's the default behavior in Kustomize. The patch list will replace everything in the original list. Why? Because that's the safest choice since Kustomize doesn't know anything about AlertmanagerConfig schema! Before diving into the fix, let's learn more about the why.

3. Background

The Strategic Merge Patch is a client-side merge method that merges 2 or more Kubernetes manifests together based on the manifest apiVersion, kind, and metadata.name. To merge 2 YAML files, you need to decide the "merge strategy" for different data types, i.e., what should happen for the "string", "int", "list", "map", and so on? Should they merge together? Or do the patch values override the original values?

Also, each data type could be patched differently; for example, how to patch a list? Kustomize provides different patch formats like merge, replace, and delete. In fact, in a previous post (Delete a manifest from Kustomize base), I mentioned the delete patch strategy, which works out of the box with core Kubernetes primitives (Namespace, Deployment, Pod, etc.) but not with CustomResources.

Why does it work with core resources only? Because of 2 things.

  1. The Kubernetes project includes specific keys (extensions) in the core resources OpenAPI schema to deal with that. Namely the OpenAPI extensions x-kubernetes-patch-strategy and x-kubernetes-patch-merge-key (see them in Kubernetes swagger.json).
  2. The OpenAPI schema for Kubernetes' core resources is embedded in Kustomize.

If those keys are not included in the OpenAPI schema, or Kustomize doesn't have access to the OpenAPI schema, the default behavior is applied: in Kustomize, the patch list fully replaces the original list.
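
As a rough mental model (not Kustomize's actual code), the decision can be sketched in Python as a lookup of the two extension keys with a fallback to replace. The function name `patch_list` and the flattened receiver data are assumptions for illustration, and only the plain `merge` strategy value is handled.

```python
def patch_list(original, patch, field_schema=None):
    """Patch a list according to the OpenAPI extensions of its field schema."""
    field_schema = field_schema or {}
    strategy = field_schema.get("x-kubernetes-patch-strategy", "replace")
    merge_key = field_schema.get("x-kubernetes-patch-merge-key")

    # No schema or no merge information: the patch list replaces the original.
    if strategy != "merge" or merge_key is None:
        return [dict(item) for item in patch]

    # Merge strategy: match items by the merge key, add the unknown ones.
    merged = {item[merge_key]: dict(item) for item in original}
    for item in patch:
        merged.setdefault(item[merge_key], {}).update(item)
    return list(merged.values())


receivers = [{"name": "webhook01", "url": "http://example.com/"}]
patch = [{"name": "webhook02", "url": "http://example02.com/"}]

# Custom resource without a schema: only the patch survives.
replaced = patch_list(receivers, patch)

# Same data, but with the extensions available in the schema.
schema = {
    "type": "array",
    "x-kubernetes-patch-strategy": "merge",
    "x-kubernetes-patch-merge-key": "name",
}
merged = patch_list(receivers, patch, schema)

print([item["name"] for item in replaced])
print([item["name"] for item in merged])
```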

4. Solution

Now there are 2 cases: first, the custom resource definition already has the x-kubernetes-patch-* keys; second, the custom resource definition doesn't have them at all. Kustomize supports an "openapi" field, which specifies where Kustomize gets its OpenAPI schema.

For the first case, it's easy; you just need to point Kustomize to the OpenAPI schema and that's it!

# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

openapi:
  # It could be also a URL.
  path: monitoring.coreos.com_v1alpha1_alertmanagerconfig.json
[...]

However, for the second case, in a Platonic world, you should contact the upstream to add the OpenAPI extension keys x-kubernetes-patch-strategy and x-kubernetes-patch-merge-key. But as you know, in reality, that will take ages, and in the best-case scenario, it will not happen overnight! So the pragmatic solution is to tell Kustomize how to deal with that custom resource via OpenAPI schema.

We will simply get the Custom Resource's OpenAPI schema and add the x-kubernetes-patch-* keys to it with the merge strategy, which can also be customized using different patch formats like merge, replace, and delete.

The following are the steps to get the OpenAPI schema of a custom resource, clean it, add the merge strategy keys, and finally use it in the kustomization.yaml file.

4.1 Get the custom resource OpenAPI schema

You can get the OpenAPI schema for the resource from the upstream project, or if you have already installed its CustomResourceDefinition, then you can get it directly by calling Kubernetes API. And since K8s API will return every definition it has (probably thousands of lines), we will use jq to get the exact custom resource OpenAPI schema.

Here is a snippet that will help to get the OpenAPI definition for a particular resource:

get_openapi_definition () {
    jq \
        --arg group "${1}" \
        --arg version "${2}" \
        --arg kind "${3}" \
        '.definitions | with_entries(select(.value."x-kubernetes-group-version-kind"[0] |
            .group==$group and
            .version==$version and
            .kind==$kind
        ))'
}

And we can get the schema for the exact resource from Kubernetes API by running the following (remember, you should have installed the CRD for that resource into your Kubernetes cluster to be able to do that):

kustomize openapi fetch | get_openapi_definition "monitoring.coreos.com" "v1alpha1" "AlertmanagerConfig" > alertmanagerconfig_openapi_schema_map.json
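
If jq is not at hand, the same selection can be sketched in Python. This is an assumed equivalent of the jq filter above, not something Kustomize ships; the function mirrors the shell helper's name, and the tiny `document` below is made-up sample data standing in for the real API output.

```python
import json


def get_openapi_definition(document, group, version, kind):
    """Keep only the definitions whose first group/version/kind entry matches."""
    wanted = {"group": group, "version": version, "kind": kind}
    selected = {}
    for name, definition in document.get("definitions", {}).items():
        # Definitions without the extension can never match, like in the jq filter.
        gvks = definition.get("x-kubernetes-group-version-kind") or [{}]
        first = gvks[0]
        if all(first.get(field) == value for field, value in wanted.items()):
            selected[name] = definition
    return selected


# Made-up sample; the real document has far more entries.
document = {
    "definitions": {
        "io.k8s.api.core.v1.Pod": {
            "type": "object",
            "x-kubernetes-group-version-kind": [
                {"group": "", "kind": "Pod", "version": "v1"}
            ],
        },
        "com.coreos.monitoring.v1alpha1.AlertmanagerConfig": {
            "type": "object",
            "x-kubernetes-group-version-kind": [
                {
                    "group": "monitoring.coreos.com",
                    "kind": "AlertmanagerConfig",
                    "version": "v1alpha1",
                }
            ],
        },
        "io.k8s.api.core.v1.PodSpec": {"type": "object"},
    }
}

schema_map = get_openapi_definition(
    document, "monitoring.coreos.com", "v1alpha1", "AlertmanagerConfig"
)
print(json.dumps(schema_map, indent=2))
```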

4.2 Find the desired key path

Here is most of the manual work, but the good news is that you need to do it once. We need 2 things, the data of the path spec.receivers and the key x-kubernetes-group-version-kind. Open the schema file and remove everything not under the hierarchy of the path we want to customize.

Here is what it looks like after removing everything unrelated:

# alertmanagerconfig_openapi_schema_map.json
{
  "com.coreos.monitoring.v1alpha1.AlertmanagerConfig": {
    "properties": {
      "spec": {
        "properties": {
          "receivers": {
            "type": "array"
          }
        },
        "type": "object"
      }
    },
    "type": "object",
    "x-kubernetes-group-version-kind": [
      {
        "group": "monitoring.coreos.com",
        "kind": "AlertmanagerConfig",
        "version": "v1alpha1"
      }
    ]
  }
}

4.3 Create the custom OpenAPI schema file

Now we just need to put everything together adding x-kubernetes-patch-* keys and the definitions parent. The final result will look like the following:

# monitoring.coreos.com_v1alpha1_alertmanagerconfig.json
{
  "definitions": {
    "com.coreos.monitoring.v1alpha1.AlertmanagerConfig": {
      "properties": {
        "spec": {
          "properties": {
            "receivers": {
              "x-kubernetes-patch-merge-key": "name",
              "x-kubernetes-patch-strategy": "merge",
              "type": "array"
            }
          },
          "type": "object"
        }
      },
      "type": "object",
      "x-kubernetes-group-version-kind": [
        {
          "group": "monitoring.coreos.com",
          "kind": "AlertmanagerConfig",
          "version": "v1alpha1"
        }
      ]
    }
  }
}
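
Editing the JSON by hand works fine, but the step can also be scripted. The following Python sketch is one possible way to do it under the assumptions of this post; the helper name `add_list_merge_strategy` is made up, and `schema_map` repeats the trimmed schema from step 4.2.

```python
import copy
import json


def add_list_merge_strategy(schema_map, definition, path, merge_key):
    """Add the merge extensions to a list field and wrap the result in "definitions"."""
    result = copy.deepcopy(schema_map)
    node = result[definition]
    # Follow the field path, e.g. spec -> receivers, through "properties".
    for field in path:
        node = node["properties"][field]
    if node.get("type") != "array":
        raise ValueError("the merge strategy only makes sense for list fields")
    node["x-kubernetes-patch-merge-key"] = merge_key
    node["x-kubernetes-patch-strategy"] = "merge"
    return {"definitions": result}


definition_name = "com.coreos.monitoring.v1alpha1.AlertmanagerConfig"
schema_map = {
    definition_name: {
        "properties": {
            "spec": {
                "properties": {"receivers": {"type": "array"}},
                "type": "object",
            }
        },
        "type": "object",
        "x-kubernetes-group-version-kind": [
            {
                "group": "monitoring.coreos.com",
                "kind": "AlertmanagerConfig",
                "version": "v1alpha1",
            }
        ],
    }
}

custom_schema = add_list_merge_strategy(
    schema_map, definition_name, ["spec", "receivers"], "name"
)
print(json.dumps(custom_schema, indent=2))
```

The printed JSON has the same content as the file above (key order may differ) and can be saved as monitoring.coreos.com_v1alpha1_alertmanagerconfig.json.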

4.4 Update kustomization.yaml with the OpenAPI schema file

As the final step, we need to tell Kustomize about our custom OpenAPI schema file as follows (it also accepts YAML files in case you'd like to convert it):

# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

openapi:
  path: monitoring.coreos.com_v1alpha1_alertmanagerconfig.json

resources:
- alertmanagerconfig.yaml

patches:
- alertmanagerconfig-patch01.yaml
- alertmanagerconfig-patch02.yaml

Now the kustomize build . will return:

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: example
spec:
  receivers:
  - name: webhook02
    webhookConfigs:
    - url: http://example02.com/
  - name: webhook01
    webhookConfigs:
    - url: http://example01.com/

Great, it works as expected! 🎉️ And the custom resource list is merged based on the name key (you can choose any merge key based on your use case).

Conclusion

Kustomize is super powerful and has many capabilities to manage your entire Kubernetes infrastructure as code! And the most fantastic thing? It's now part of kubectl, so it's almost the standard way to deal with advanced Kubernetes manifest structure.

Enjoy :-)


11/11/2022

09/09/2022

How to create Makefile targets with dynamic parameters and autocompletion - Make

Make is one of the oldest build automation tools ever (the original Make was created in 1976!). Since then, it has gotten many implementations such as BSD make, GNU make, and Microsoft nmake. It uses a declarative syntax, and sometimes that's the best and worst thing about it!

You probably used Make at least once. It's widely adopted in the tech industry, or as someone said, "Make is the second best tool to automate anything!". 😄️

After

Continue Reading »

08/08/2022

2 ways to route Ingress traffic across namespaces - Kubernetes

The tech industry is full of workarounds, and you probably rely on one or more. There is no problem with that per se, but it's important to review your workarounds from time to time because there could be a new standard/intuitive way to do the same thing.

The Problem

A couple of years ago, I had a use case where a single domain had 2 sub-paths, each of them using its own service in a different namespace. Let's see this example:

example.com/app => service "backend" in namespace "app"
example.com/blog => service "wordpress" in namespace "blog"

The problem was that the Ingress object is namespaced which means that it interacts with services within the same namespace. Also, only one ingress object per host/domain is allowed.

So at that time I found a generic solution which looks like a workaround. Actually, by thinking about it now, it was not a bad workaround. It depends on how you manage your infrastructure, and you can think about it as a centralized vs decentralized approach.

The Solution

So here are the 2 ways to route Ingress traffic across namespaces in Kubernetes. The 1st is a generic way that will work with any Ingress controller. The 2nd relies on the capabilities of a specific Ingress controller, the NGINX Ingress Controller by NGINX, Inc. (NOT the Ingress-NGINX Controller by Kubernetes).

Option One: Generic method - ExternalName Service

This method relies on the native Kubernetes ExternalName Service, which is simply a DNS CNAME! This method is centralized: it uses a normal Ingress object plus ExternalName Services within the same namespace as bridges to the services in any other namespace.

The following is an example of that setup with a single Ingress resource and 2 ExternalName services (3 endpoints which are /, /coffee, and /tea).

Config for shop.example.com including the 2 sub-paths /coffee and /tea in addition to the root /.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shop-ingress
  namespace: shop
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  tls:
  - hosts:
    - shop.example.com
    secretName: shop-secret
  rules:
  - host: shop.example.com
    http:
      paths:
      - path: /coffee
        pathType: Prefix
        backend:
          service:
            name: coffee-svc-bridge
            port:
              number: 80
      - path: /tea
        pathType: Prefix
        backend:
          service:
            name: tea-svc-bridge
            port:
              number: 80

The coffee-svc-bridge service in the shop namespace is a CNAME for the coffee-svc service in coffee namespace:

apiVersion: v1
kind: Service
metadata:
  name: coffee-svc-bridge
  namespace: shop
spec:
  type: ExternalName
  externalName: coffee-svc.coffee

The tea-svc-bridge service in the shop namespace is a CNAME for the tea-svc service in tea namespace:

apiVersion: v1
kind: Service
metadata:
  name: tea-svc-bridge
  namespace: shop
spec:
  type: ExternalName
  externalName: tea-svc.tea

As you see, the Ingress config comes in 1 part and is a normal Ingress. It uses the ExternalName services as a bridge to access the services in the other namespaces.
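
Since every bridge follows the same pattern, the manifests can be generated instead of written by hand. Here is a small Python sketch of that idea; the function name `bridge_service` and the `-bridge` naming convention are simply taken from the example above and are not a Kubernetes feature. It prints JSON, which kubectl accepts as well as YAML.

```python
import json


def bridge_service(service, target_namespace, bridge_namespace):
    """Build an ExternalName Service that points to a Service in another namespace."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": f"{service}-bridge",
            "namespace": bridge_namespace,
        },
        "spec": {
            "type": "ExternalName",
            # Same <service>.<namespace> form as in the manifests above.
            "externalName": f"{service}.{target_namespace}",
        },
    }


bridges = [
    bridge_service("coffee-svc", "coffee", "shop"),
    bridge_service("tea-svc", "tea", "shop"),
]
for bridge in bridges:
    print(json.dumps(bridge, indent=2))
```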

Option Two: Controller-specific method - Mergeable Ingress Resources

The other option is using controller-specific capabilities to achieve that goal. There are dozens of Ingress controllers for Kubernetes like the Ingress-NGINX (by Kubernetes project), NGINX Ingress Controller (by NGINX, Inc.), Traefik, HAProxy, Istio, and many more.

Here I will cover only NGINX Ingress Controller by NGINX, Inc., but the idea is the same, using the controller-specific features.

If you take a look at the official NGINX docs, you will find that the Cross-namespace Configuration page suggests using Mergeable Ingress Resources.

That approach relies on a simple idea: a single Ingress resource, called the "master", holds all configuration related to the host/domain, and any number of Ingress resources, each called a "minion", handle the paths under that host/domain.

The master and the minions can or cannot contain certain Ingress annotations based on their role. Here I will use the examples from the official documentation.

Config for shop.example.com like TLS and host-level annotations.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shop-ingress-master
  namespace: shop
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.org/mergeable-ingress-type: "master"
spec:
  tls:
  - hosts:
    - shop.example.com
    secretName: shop-secret
  rules:
  - host: shop.example.com

Config for shop.example.com/coffee, which lives in the coffee namespace and routes traffic to the coffee-svc service.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shop-ingress-coffee-minion
  namespace: coffee
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.org/mergeable-ingress-type: "minion"
spec:
  rules:
  - host: shop.example.com
    http:
      paths:
      - path: /coffee
        pathType: Prefix
        backend:
          service:
            name: coffee-svc
            port:
              number: 80

Config for shop.example.com/tea, which lives in the tea namespace and routes traffic to the tea-svc service.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shop-ingress-tea-minion
  namespace: tea
  annotations:
    kubernetes.io/ingress.class: "nginx"
    nginx.org/mergeable-ingress-type: "minion"
spec:
  rules:
  - host: shop.example.com
    http:
      paths:
      - path: /tea
        pathType: Prefix
        backend:
          service:
            name: tea-svc
            port:
              number: 80

As you see, the Ingress config is split into 2 parts, the host/domain config, and the paths config. Each one of them could be in a different namespace and handles the services in that namespace.
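
To picture how the two parts fit together, here is a conceptual Python sketch. It is only a mental model of the master/minion split and not how the NGINX Ingress Controller is implemented; the function names `merge_mergeable_ingresses` and `minion` are made up, and the manifests are trimmed to the fields used here.

```python
def merge_mergeable_ingresses(master, minions):
    """Combine the host-level config of the master with the paths of its minions."""
    host = master["spec"]["rules"][0]["host"]
    routes = []
    for item in minions:
        for rule in item["spec"]["rules"]:
            if rule["host"] != host:
                continue  # a minion only contributes paths for the master's host
            for path in rule["http"]["paths"]:
                service = path["backend"]["service"]
                routes.append({
                    "path": path["path"],
                    "namespace": item["metadata"]["namespace"],
                    "service": service["name"],
                    "port": service["port"]["number"],
                })
    return {"host": host, "tls": master["spec"].get("tls", []), "routes": routes}


def minion(namespace, path, service):
    """Build a trimmed-down minion Ingress like the ones above."""
    backend = {"service": {"name": service, "port": {"number": 80}}}
    return {
        "metadata": {"namespace": namespace},
        "spec": {"rules": [{
            "host": "shop.example.com",
            "http": {"paths": [{"path": path, "pathType": "Prefix", "backend": backend}]},
        }]},
    }


master = {
    "metadata": {"namespace": "shop"},
    "spec": {
        "tls": [{"hosts": ["shop.example.com"], "secretName": "shop-secret"}],
        "rules": [{"host": "shop.example.com"}],
    },
}

merged = merge_mergeable_ingresses(
    master, [minion("coffee", "/coffee", "coffee-svc"), minion("tea", "/tea", "tea-svc")]
)
for route in merged["routes"]:
    print(merged["host"] + route["path"], "->", route["namespace"] + "/" + route["service"])
```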

Conclusion

Maybe the first approach looks like a workaround, but for many workloads it could be better and easier to follow and digest. In general, it's good to have different options and use whichever fits better.

Enjoy :-)


27/07/2022

Notes about KRM Functions - Kustomize

Recently I dived into the new plugin system in Kustomize, KRM Functions, so I wanted to know more about it. Kubernetes Resource Model or KRM for short is simply a unified way to work with resources in Kubernetes ecosystem. For example, all plugins will have the same input and output format.
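
To illustrate the "same input and output format" idea, here is a minimal Python sketch of a function that takes a ResourceList and returns a ResourceList. It is a simplified illustration, not a complete KRM function: the name `run` is made up, the ConfigMap-style functionConfig whose data entries become annotations is an assumption for this example, and JSON is used for the wire format only to keep the sketch free of third-party YAML libraries.

```python
import json


def run(resource_list):
    """Add the entries of functionConfig.data as annotations to every item."""
    config = resource_list.get("functionConfig") or {}
    annotations = config.get("data") or {}
    for item in resource_list.get("items", []):
        metadata = item.setdefault("metadata", {})
        metadata.setdefault("annotations", {}).update(annotations)
    # The output has the same shape as the input: a ResourceList.
    return resource_list


# Made-up input; a real function would read it from stdin and write to stdout.
raw_input = json.dumps({
    "apiVersion": "config.kubernetes.io/v1",
    "kind": "ResourceList",
    "functionConfig": {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "data": {"team": "platform"},
    },
    "items": [
        {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": "demo"}},
    ],
})

output = run(json.loads(raw_input))
print(json.dumps(output, indent=2))
```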

Here is a summary I found useful to share:

  • Kustomize decided to adopt KRM (Kubernetes Resource Model) functions from kpt ... and that's actually not new, it's been there for some time (around 2020).
  • The goal is to deprecate the old plugin model. Kustomize already deprecated both Go Plugins and Exec plugins in favour of the KRM style.
  • The KRM Functions style has 2 ways to run plugins: Containerized KRM Functions and Exec KRM Functions.
  • The containerized KRM function is a really useful one because you don't need to manage and download the Kustomize plugins (it was super annoying to manage plugins, especially across multiple OSes).

KRM functions look super promising, but they are still alpha, and they are still buggy or incomplete for some use cases!

  • KRM exec has a bug which makes it almost unusable. In the PR no. #4654, I have a proposal to fix that issue.
  • KRM container also has some issues! It only works with KRM resources but not with any external files (for example, if a plugin reads files from the disk, like creating a ConfigMap from a text file, that will not work at the moment).

In May 2022, I decided to go a bit further and try to implement the KRM style in one of the existing plugins. So I selected the SopsSecretGenerator Kustomize plugin and introduced KRM support in the PR no. #32. So if you want to get an idea of what the KRM style looks like in action, take a look at the change I made in that PR (it has already been merged).

In conclusion, KRM Functions look super promising but they are not that mature yet in Kustomize and they don't fit all the use cases.


02/07/2022

Kubernetes Security Best Practices with tips for the CKS exam - Presentation

At the end of last year (2021), and after a couple of years of Kubernetes production hands-on, I found that it was time to dive more into Kubernetes security, and after a lot of reading and practising, I got my CKS certificate. Also, for the last 3 quarters, security was one of the focus areas in my team, and I was taking care of it.

For that reason, I decided to consolidate that into a session which is a combination of Kubernetes Security Best Practices and tips for the Certified Kubernetes Security Specialist (CKS) exam to share the knowledge in my team as well as across teams.

The session is just 15 Min in total. The first 6 Min are for everyone and the rest for Kubernetes specialists or anyone who wants to dive more into Kubernetes security topics. If you are just interested in the tools, then jump to section #5 Kubernetes Security Starter Kit. If you are just interested in the CKS exam tips, then jump to section #6 CKS Exam Overview and Tips.

Agenda:

  1. Introduction
  2. Shift-left and DevSecOps
  3. General Security Concepts
  4. The 4C's of Cloud Native Security
  5. Kubernetes Security Starter Kit
  6. CKS Exam Overview and Tips

Note: If you want more details about CKS, check out my previous post: Now I'm a Certified Kubernetes Security Specialist + exam tips.



The recording of Kubernetes Security Best Practices session

Overview:

A dive into Kubernetes Security Best Practices and tips for the Certified Kubernetes Security Specialist (CKS) exam.

The 1-3 sections are for everyone and will cover the container era's security. So it doesn't matter your title or background; they are a good start for anyone.

The 4-6 sections will dive more into Kubernetes security, so DevOps engineers and SREs will probably find that more interesting. But in general, anyone interested in Kubernetes security is more than welcome.


That's it, enjoy :-)


22/06/2022

Moderating DevOps circle at JobStack 2022

Last Saturday (18.06.2022), I had a great chance to moderate and participate in the DevOps circle at JobStack 2022 by Talents Arena. JobStack is the biggest virtual tech job fair in the region (MENA), and this was the 4th edition.

With Hussein El-Sayed (Software engineer III at AWS) and Mohamed Radwan (Sr. Cloud Architect at T-Systems) ... we answered many different questions about DevOps. The circle or the AMA session was heavily based on a previous collaborative session between us in 2022 (DevOps! What, Why, and How? - Arabic)

The whole event was great and had a lot of fruitful sessions and discussions.


04/04/2022

Apply Kustomize builtin transformers on a single resource - Kubernetes

Kustomize is a template-free declarative management tool for Kubernetes resources. Kustomize has 2 main concepts: Generators and Transformers. In short, the first is able to create K8s manifests, and the second is able to manipulate them. In this post, I'm interested in the Kustomize Transformers.

Continue Reading »

03/03/2022

Refactoring Bank-Vaults operator for full Vault management support

In Q1 2022, my friend Islam Wazery and I were working on an interesting enhancement for the open-source Vault Kubernetes Operator, Bank-Vaults.

It's one of my biggest open-source contributions recently. In this meta post, I'd like to share some details about the problem we were trying to solve, the goal, the available solutions, the implementation details, and the challenges we faced while working on the new feature.


1. Intro

Anyone who used Kubernetes knows that the Secret resources are encoded, not encrypted, so you probably need another solution to manage your secrets and sensitive data. HashiCorp Vault is one of the best tools for that purpose.

In case you didn't use Vault before, here is a short intro from its docs:

Vault is an identity-based secrets and encryption management system. A secret is anything that you want to tightly control access to, such as API encryption keys, passwords, or certificates. Vault provides encryption services that are gated by authentication and authorization methods. Using Vault's UI, CLI, or HTTP API, access to secrets and other sensitive data can be securely stored and managed, tightly controlled (restricted), and auditable.

HashiCorp already provides resources to install Vault on Kubernetes as well as a Helm chart for Vault. However, there is no official solution from HashiCorp to manage Vault itself on Kubernetes. And here comes Bank-Vaults, the Vault Swiss Army Knife!

Bank-Vaults by Banzai Cloud is an open-source umbrella project which provides various tools (Operator, Configurer, Vault Env injector, and more) for ease of use and operation of HashiCorp Vault.

The most exciting part here is the Vault Operator by Bank-Vaults, which allows you to manage Vault on Kubernetes. Based on my research, it's the only operator on the market for Vault, so I started a PoC to use it in production.

2. Problem

After the initial setup, it seemed that the Bank-Vaults operator was mature and production-ready. It had many features like bootstrap, sealing, unsealing, cloud backend, and all the features we needed, but it was missing an important one: it didn't support full Vault management! It handled the creation of Vault's config (like policies, secrets engines, auth methods, etc.), but it didn't handle the removal of the config! That was confirmed by the issue no. #605, which had been unresolved for more than 2 years (since Aug 2019)!

Next, I checked the operator code, and it turned out that the operator worked only with create/update and didn't have any mechanism to handle config removal. No one had fixed that because it's a full feature that needs much work (well, that's why it stayed open for more than 2 years).

That means the operator doesn't fully manage Vault! Unfortunately, this is a deal-breaker for using the operator in production. And to fix that, there are several ways to remove the unmanaged config. In the following sections, I dive more into the available mechanisms to handle the config removal, but first, let's set the goal.

3. Goal

I already have experience managing Vault using Terraform, but doing that on Kubernetes would be a snowflake since it needs an extra stateful tool! I like to use Terraform for infrastructure like Kubernetes clusters but not for the apps. Hence, I don't want to go that way.

So the ultimate goal is that the operator should be able to manage Vault completely. It should add and remove Vault config like policies, secrets engines, auth methods, etc. And that should be done using the Kubernetes ecosystem in a cloud-native approach.

4. Available solutions

Based on my previous experience with code, infrastructure as code, and configuration management tools, there are several ways to achieve config removal, and each one of them has pros and cons.

4.1 Purge anything not in the config

The first mechanism is simply the purge approach, where the operator removes anything not in the configuration. This mechanism compares Bank-Vaults config and Vault config and removes the differences.

So this approach is somewhat radical. It doesn't allow any manual changes, and any change outside the configuration will be removed. But the good side is that, well, it doesn't allow any manual changes! So the configuration is the source of truth. However, there is mitigation to allow some manual changes by excluding some of the configs. I will discuss it in the implementation section.

4.2 Compare differences between the old and new config

The second mechanism is the last-diff approach, where the operator compares the old and the new config and removes anything not in the new config. This way is considered "semi-stateful" where you need to have the old config and the new config to compare them. This approach allows manual changes outside the operator, but the operator is only aware of the last changes.
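
The last-diff idea can be sketched in Python as a plain set difference between two config versions. This is an illustration only, not Bank-Vaults code; the function name `removed_since_last_apply`, the config layout, and the policy names are made up.

```python
def removed_since_last_apply(old_config, new_config):
    """Return, per section, the names present in the old config but not in the new one."""
    removed = {}
    for section, old_names in old_config.items():
        gone = sorted(set(old_names) - set(new_config.get(section, [])))
        if gone:
            removed[section] = gone
    return removed


old_config = {"policies": ["app-read", "ci-deploy"], "auth": ["kubernetes"]}
new_config = {"policies": ["app-read"], "auth": ["kubernetes"]}

# Only "ci-deploy" is scheduled for removal. A policy created manually in Vault
# never appears in either config, so this approach leaves it alone.
print(removed_since_last_apply(old_config, new_config))
```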

4.3 Manage changes statefully

The third mechanism is the diff approach, where the operator maintains a state of all its operations, and with any new change, it compares the changes with the state (this is the Terraform way). This way is fully-stateful, which allows for tracking the changes done by the operator and allows manual changes outside the operator.
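
A fully stateful variant can be sketched in Python as follows. The names `plan` and `apply`, the in-memory `vault` set, and the policy names are assumptions for illustration; a real tool would persist the state somewhere and call Vault's API instead of mutating a set.

```python
def plan(state, desired):
    """Compare the desired config with the recorded state, not with Vault itself."""
    return {
        "create": sorted(desired - state),
        "delete": sorted(state - desired),
    }


def apply(state, desired, vault):
    """Apply the plan to Vault and return the new state."""
    changes = plan(state, desired)
    vault.update(changes["create"])
    vault.difference_update(changes["delete"])
    return set(desired)


# Vault already contains a policy that was created manually.
vault = {"manual-policy"}
state = set()

state = apply(state, {"app-read", "ci-deploy"}, vault)
state = apply(state, {"app-read"}, vault)

# "ci-deploy" is gone because the state knew about it; "manual-policy" stays.
print(sorted(vault))
```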

4.4 Handle config individually

Finally, the fourth mechanism is the flag approach, where the operator manages the config according to a config flag. For example, each policy in the operator config could have a field called "state", and its value could be "present" or "absent" (this is the style of config management tools like Ansible). In this solution, it's possible to have both managed and unmanaged config, but the biggest downside is that you need to deal with the config at the individual level.
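As a small illustration of the flag approach, here is a Go sketch. The `Policy` struct, the `actions` function, and treating an empty state as "present" are all assumptions for this example and not part of Bank-Vaults.

```go
package main

import "fmt"

// Policy (hypothetical) is one config entry carrying its own state flag.
type Policy struct {
	Name  string
	State string // "present" or "absent"; empty is assumed to mean "present"
}

// actions splits the entries into names to ensure and names to remove.
// Anything not listed in the config is simply never touched.
func actions(policies []Policy) (ensure, remove []string, err error) {
	for _, p := range policies {
		switch p.State {
		case "", "present":
			ensure = append(ensure, p.Name)
		case "absent":
			remove = append(remove, p.Name)
		default:
			return nil, nil, fmt.Errorf("policy %q: unknown state %q", p.Name, p.State)
		}
	}
	return ensure, remove, nil
}

func main() {
	ensure, remove, err := actions([]Policy{
		{Name: "app-read", State: "present"},
		{Name: "legacy", State: "absent"},
		{Name: "ci"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("ensure:", ensure) // prints ensure: [app-read ci]
	fmt.Println("remove:", remove) // prints remove: [legacy]
}
```

The downside mentioned above is visible here: to remove "legacy", somebody has to keep an explicit "absent" entry for it in the config.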

5. Implementation

In the cloud-native era, the first style looks like the most suitable approach, where full management is assumed. So anything that is not in the config would be removed. And to mitigate that behavior, it's possible to exclude some sections like policies, auth methods, etc., so they could have manual changes if needed.

Vault has 7 main configuration sections:

  • Audit
  • Auth
  • Groups
  • GroupAliases
  • Plugins
  • Policies
  • Secrets

Each section already has the "add" mechanism, and it's able to create the config in Vault, and the goal is to add the "remove" mechanism to have full CRUD (create, read, update, and delete). However, the "adding" code didn't follow Golang style, and it needed to be refactored. So for each section, the code is refactored first, then the "removing" code is added.

Let's take policies as an example (which will be the same way for all 7 configs mentioned above); the "removing" part works as follows:

  1. Bank-Vaults operator reads its config file with managed policies.
  2. Then, it calls Vault to get all already configured policies.
  3. Then, it compares what's in the config (the desired state) with Vault (the actual state).
  4. Finally, if there are differences, then the Bank-Vaults operator calls Vault to delete the unmanaged policies.
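
The four steps above can be sketched in Go roughly like this. It is a simplified illustration and not the actual Bank-Vaults code: the `PolicyStore` interface, `purgeUnmanagedPolicies`, and the in-memory `memoryStore` are hypothetical names standing in for the real Vault client.

```go
package main

import (
	"fmt"
	"sort"
)

// PolicyStore is a hypothetical, minimal view of Vault for this sketch.
type PolicyStore interface {
	ListPolicies() ([]string, error)
	DeletePolicy(name string) error
}

// purgeUnmanagedPolicies deletes every policy in the store that is not in the
// managed list and returns the names it deleted.
func purgeUnmanagedPolicies(store PolicyStore, managed []string) ([]string, error) {
	// Step 1: the managed policies from the config are the desired state.
	desired := make(map[string]struct{}, len(managed))
	for _, name := range managed {
		desired[name] = struct{}{}
	}

	// Step 2: ask the store for the actual state.
	actual, err := store.ListPolicies()
	if err != nil {
		return nil, fmt.Errorf("listing policies: %w", err)
	}

	// Steps 3 and 4: compare, then delete whatever is not managed.
	var purged []string
	for _, name := range actual {
		if _, ok := desired[name]; ok {
			continue
		}
		if err := store.DeletePolicy(name); err != nil {
			return purged, fmt.Errorf("deleting policy %q: %w", name, err)
		}
		purged = append(purged, name)
	}
	sort.Strings(purged)
	return purged, nil
}

// memoryStore is an in-memory stand-in for Vault, used only for the demo.
type memoryStore map[string]struct{}

func (m memoryStore) ListPolicies() ([]string, error) {
	names := make([]string, 0, len(m))
	for name := range m {
		names = append(names, name)
	}
	sort.Strings(names)
	return names, nil
}

func (m memoryStore) DeletePolicy(name string) error {
	delete(m, name)
	return nil
}

func main() {
	store := memoryStore{"app-read": {}, "legacy": {}, "manual-test": {}}

	purged, err := purgeUnmanagedPolicies(store, []string{"app-read"})
	if err != nil {
		panic(err)
	}
	fmt.Println("purged:", purged) // prints purged: [legacy manual-test]
}
```

A real implementation would also need to skip Vault's built-in policies (like "root" and "default"), which are left out here to keep the sketch short.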

The final step is creating E2E tests to run in the CI (GitHub Actions). The tests simply check different cases, like removing a config while the purge option is disabled, fully enabled, or partially enabled. Now let's take a look at the challenges I had while working on this feature.

6. Challenges

In the following sections, I'd like to share the top challenges while introducing full Vault management in the Bank-Vaults operator.

6.1 Project complexity

Bank-Vaults is not just the operator; it's an umbrella project to work with Vault. It's a mono repo with many shared parts. For example, the operator relies on a CLI tool with the same name.

Hence, the first challenge was to understand the project structure and where exactly to change, and how the changes could affect the rest of the project.

6.2 Refactoring the write path

After a thoughtful dive into the project, it was clear what and where I should change to fully manage Vault by Bank-Vaults (so it can add and remove config in Vault). However, the write path code in the operator (that's responsible for creating and updating managed config) doesn't follow the Golang style. It was more like Python written in Golang. It reminded me of when I wrote Golang for the first time, coming from a Python background.

Leaving the write path code as it is would make the code oddly bad and redundant. So the first step was refactoring the "write path" code, then adding the "remove path" code (which is responsible for removing any unmanaged config). And this was the second challenge to solve before the actual implementation.

6.3 Only generic acceptance tests

Another challenge was that the part I wanted to change didn't have any unit tests; only generic acceptance tests were available, which makes things harder to change. I needed to pay extra attention to ensure I didn't break anything while refactoring the existing code and introducing the new feature. That also means I should write some E2E tests to avoid this situation in the future.

6.4 Coordination

As I mentioned before, this feature is a bit big, and it would be implemented by 2 people (my friend Wazery and me). At the same time, it's a project we hadn't worked on before, and we hadn't worked together before. So we needed to make sure that everything was clear and both of us were aligned to deliver this feature with high quality.

7. Result

With PR #1538, Bank-Vaults v1.15.1 became able to fully or partially purge unmanaged configuration in Vault.

The user has the option to fully or partially purge unmanaged config as shown here:

purgeUnmanagedConfig:

  # This will purge any unmanaged config in Vault.
  enabled: true

  # This will prevent purging unmanaged config for secret engines in Vault.
  exclude:
    secrets: true

To avoid a behavior change, and since this feature is destructive, it was safer to keep it disabled by default. The user needs to enable it explicitly in the Bank-Vaults config. And as usual, it's recommended to test it in a non-production environment first.
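
To show how such an option can gate the purge per section, here is a small Go sketch. The `PurgeUnmanagedConfig` struct and `ShouldPurge` method are hypothetical and not the real Bank-Vaults types; only the "secrets" key comes from the config above, and the other section keys are my assumption based on the 7 sections listed earlier.

```go
package main

import "fmt"

// PurgeUnmanagedConfig (hypothetical) mirrors the purgeUnmanagedConfig block.
type PurgeUnmanagedConfig struct {
	Enabled bool
	Exclude map[string]bool // section name -> excluded from purging
}

// ShouldPurge reports whether unmanaged config in the section may be removed.
// The zero value has Enabled == false, so nothing is purged by default.
func (c PurgeUnmanagedConfig) ShouldPurge(section string) bool {
	return c.Enabled && !c.Exclude[section]
}

func main() {
	cfg := PurgeUnmanagedConfig{
		Enabled: true,
		Exclude: map[string]bool{"secrets": true},
	}

	// Section keys other than "secrets" are assumed names for this demo.
	sections := []string{"audit", "auth", "groups", "groupAliases", "plugins", "policies", "secrets"}
	for _, section := range sections {
		fmt.Printf("%-12s purge=%v\n", section, cfg.ShouldPurge(section))
	}
}
```

With this shape, a config that doesn't mention the option at all behaves exactly like before the feature existed, which matches the "disabled by default" decision.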

Wazery PRs:

My PRs:


That's it! It was a pretty exciting journey for me 🤩️


02/02/2022

Extending Jenkins to run resilient pipelines with long-running jobs - CI/CD

In 2021, I wrote a high-level blog post about how a small task force revamped and modernized a gigantic CI pipeline, which was an overview of the challenges we had while building advanced declarative pipelines.

In today's post, I dive more into the technical details and implementation and how we overcame long-running job limitations in Jenkins Declarative Pipelines.

Continue Reading »

11/01/2022


Hello, my name is Ahmed AbouZaid, I'm a passionate Tech Lead DevOps Engineer. 👋

I specialize in Cloud-Native and Kubernetes. I'm also a Free/Open source geek and book author. My favorite topics are DevOps transformation, DevSecOps, automation, data, and metrics.
