04/04/2022

Apply Kustomize builtin transformers on a single resource - Kubernetes

Kustomize is a template-free declarative management tool for Kubernetes resources. Kustomize has 2 main concepts: Generators and Transformers. In short, the first creates K8s manifests, and the second manipulates them. In this post, I'm interested in the Kustomize Transformers.

Continue Reading »

03/03/2022

Refactoring Bank-Vaults operator for full Vault management support

In Q1 2022, my friend Islam Wazery and I were working on an interesting enhancement for the open-source Vault Kubernetes Operator, Bank-Vaults.

It's one of my biggest open-source contributions recently. In this meta post, I'd like to share some details about the problem we were trying to solve, the goal, the available solutions, the implementation details, and the challenges we faced while working on the new feature.




1. Intro

Anyone who has used Kubernetes knows that Secret resources are only encoded (base64), not encrypted, so you probably need another solution to manage your secrets and sensitive data. HashiCorp Vault is one of the best tools for that purpose.
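
To see why "encoded" is not "encrypted", here is a minimal Go sketch that reverses the base64 encoding of a Secret value without any key. The function name and the sample value are made up for illustration.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodeSecretValue reverses the base64 encoding used for values under a
// Secret's "data" field. No key or password is involved.
func decodeSecretValue(encoded string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return "", fmt.Errorf("decode secret value: %w", err)
	}
	return string(raw), nil
}

func main() {
	// Made-up sample value, as it could appear in a Secret manifest.
	plain, err := decodeSecretValue("cGFzc3dvcmQxMjM=")
	if err != nil {
		panic(err)
	}
	fmt.Println(plain) // prints password123
}
```

Anyone who can read the Secret can do the same, which is the reason to look at a dedicated tool.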

In case you didn't use Vault before, here is a short intro from its docs:

Vault is an identity-based secrets and encryption management system. A secret is anything that you want to tightly control access to, such as API encryption keys, passwords, or certificates. Vault provides encryption services that are gated by authentication and authorization methods. Using Vault's UI, CLI, or HTTP API, access to secrets and other sensitive data can be securely stored and managed, tightly controlled (restricted), and auditable.

HashiCorp already provides resources to install Vault on Kubernetes as well as a Helm chart for Vault. However, there is no official solution from HashiCorp to manage Vault itself on Kubernetes. And here comes Bank-Vaults, the Vault Swiss Army Knife!

Bank-Vaults by Banzai Cloud is an open-source umbrella project which provides various tools (Operator, Configurer, Vault Env injector, and more) for ease of use and operation of HashiCorp Vault.

The most exciting part here is the Vault Operator by Bank-Vaults, which allows you to manage Vault on Kubernetes. And based on my research, it's the only operator on the market for Vault, so I started a PoC to use it in production.

2. Problem

After the initial setup, it seemed that the Bank-Vaults operator was mature and production-ready. It had many features like bootstrap, sealing, unsealing, cloud backend, and all the features we needed, but it was missing an important one: it didn't support full Vault management! It handled the creation of Vault's config (like policies, secrets engines, auth methods, etc.), but it didn't handle the removal of the config! And that was confirmed by issue #605, which had been unresolved for more than 2 years (since Aug 2019)!

Next, I checked the operator code, and it turned out that the operator only handles create/update and doesn't have any mechanism for config removal. No one had fixed that because it's a full feature that needs a lot of work (well, that's why the issue stayed open for more than 2 years).

That means the operator doesn't fully manage Vault! Unfortunately, this is a deal-breaker for using the operator in production. To fix that, there are several ways to remove the unmanaged config. In the following sections, I dive deeper into the available mechanisms to handle config removal, but first, let's set the goal.

3. Goal

I already have experience managing Vault using Terraform, but doing that on Kubernetes would be a snowflake because it needs an extra stateful tool! I like to use Terraform for infrastructure like Kubernetes clusters but not for the apps. Hence, I don't want to go that way.

So the ultimate goal is that the operator should be able to manage Vault completely. It should add and remove Vault config like policies, secrets engines, auth methods, etc. And that should be done using the Kubernetes ecosystem in a cloud-native approach.

4. Available solutions

Based on my previous experience with code, infrastructure as code, and configuration management tools, there are several ways to achieve config removal, and each one of them has pros and cons.

4.1 Purge anything not in the config

The first mechanism is simply the purge approach, where the operator removes anything not in the configuration. This mechanism compares the Bank-Vaults config with the Vault config and removes whatever exists in Vault but not in the Bank-Vaults config.

So this approach is somewhat radical. It doesn't allow any manual changes, and any change outside the configuration will be removed. But the good side is that, well, it doesn't allow any manual changes! So the configuration is the source of truth. However, there is a mitigation that allows some manual changes by excluding some of the config sections. I will discuss it in the implementation section.
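
The purge approach boils down to a set difference. Here is a minimal Go sketch of that comparison, assuming config items can be identified by name; the function and the policy names are hypothetical and not taken from the Bank-Vaults code.

```go
package main

import (
	"fmt"
	"sort"
)

// unmanaged returns every name found in Vault (actual) that is missing
// from the operator config (desired). These are the purge candidates.
func unmanaged(desired, actual []string) []string {
	want := make(map[string]bool, len(desired))
	for _, name := range desired {
		want[name] = true
	}
	var out []string
	for _, name := range actual {
		if !want[name] {
			out = append(out, name)
		}
	}
	sort.Strings(out) // deterministic order for logs and tests
	return out
}

func main() {
	desired := []string{"app-read", "app-write"}
	actual := []string{"app-read", "app-write", "manual-debug"}
	fmt.Println(unmanaged(desired, actual)) // prints [manual-debug]
}
```

Note that "manual-debug" is removed even though the operator never created it, which is exactly why this approach is radical.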

4.2 Compare differences between the old and new config

The second mechanism is the last-diff approach, where the operator compares the old and the new config and removes anything that was in the old config but is no longer in the new one. This way is considered "semi-stateful" because you need both the old and the new config to compare them. This approach allows manual changes outside the operator, but the operator is only aware of the last change.
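
A minimal Go sketch of the last-diff idea, assuming config items are identified by name. The function name and the sample policies are hypothetical.

```go
package main

import "fmt"

// removedSince returns the names that were in the previous config but are
// gone from the current one. Only these would be deleted from Vault.
func removedSince(previous, current []string) []string {
	keep := make(map[string]bool, len(current))
	for _, name := range current {
		keep[name] = true
	}
	var removed []string
	for _, name := range previous {
		if !keep[name] {
			removed = append(removed, name)
		}
	}
	return removed
}

func main() {
	previous := []string{"app-read", "legacy"}
	current := []string{"app-read", "app-write"}
	fmt.Println(removedSince(previous, current)) // prints [legacy]
}
```

A policy created by hand in Vault never shows up in either config, so it is never deleted. The flip side is that if one comparison is missed, the operator has no memory of what it should have removed.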

4.3 Manage changes statefully

The third mechanism is the diff approach, where the operator maintains a state of all its operations, and with any new change, it compares the changes with the state (this is the Terraform way). This way is fully-stateful, which allows for tracking the changes done by the operator and allows manual changes outside the operator.
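
A minimal Go sketch of the stateful idea, assuming the state is just the list of names the operator created earlier. The types and names are hypothetical, and a real state would also need to be stored somewhere durable.

```go
package main

import "fmt"

// Plan lists what has to change to move from the recorded state to the
// desired config.
type Plan struct {
	Create []string
	Delete []string
}

func toSet(names []string) map[string]bool {
	set := make(map[string]bool, len(names))
	for _, name := range names {
		set[name] = true
	}
	return set
}

// plan compares the recorded state (what the operator created before)
// with the desired config. Items the operator never created are not in
// the state, so they are never scheduled for deletion.
func plan(state, desired []string) Plan {
	recorded, want := toSet(state), toSet(desired)
	var p Plan
	for _, name := range desired {
		if !recorded[name] {
			p.Create = append(p.Create, name)
		}
	}
	for _, name := range state {
		if !want[name] {
			p.Delete = append(p.Delete, name)
		}
	}
	return p
}

func main() {
	p := plan([]string{"app-read", "legacy"}, []string{"app-read", "app-write"})
	fmt.Println("create:", p.Create) // prints create: [app-write]
	fmt.Println("delete:", p.Delete) // prints delete: [legacy]
}
```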

4.4 Handle config individually

Finally, the fourth mechanism is the flag approach, where the operator manages the config according to a config flag. For example, each policy in the operator config could have a field called "state", and its value could be "present" or "absent" (this is the style of config management tools like Ansible). In this solution, it's possible to have both managed and unmanaged config, but the biggest downside is that you need to deal with the config on the individual level.
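
A minimal Go sketch of the flag approach. The Policy type and its State field follow the "present"/"absent" example above and are hypothetical; treating an empty state as "present" is my assumption.

```go
package main

import "fmt"

// Policy is a hypothetical config entry carrying its own desired state.
type Policy struct {
	Name  string
	State string // "present" or "absent"; empty is treated as "present" (assumption)
}

// split sorts the entries into names to create/update and names to remove.
// Anything not listed in the config is left alone.
func split(policies []Policy) (apply, remove []string, err error) {
	for _, p := range policies {
		switch p.State {
		case "present", "":
			apply = append(apply, p.Name)
		case "absent":
			remove = append(remove, p.Name)
		default:
			return nil, nil, fmt.Errorf("policy %q: unknown state %q", p.Name, p.State)
		}
	}
	return apply, remove, nil
}

func main() {
	apply, remove, err := split([]Policy{
		{Name: "app-read", State: "present"},
		{Name: "legacy", State: "absent"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("apply:", apply)   // prints apply: [app-read]
	fmt.Println("remove:", remove) // prints remove: [legacy]
}
```

The "absent" entry has to stay in the config until the removal has been applied everywhere, which is the per-item bookkeeping mentioned above.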

5. Implementation

In the cloud-native era, the first style looks like the most suitable approach, where full management is assumed. So anything that is not in the config would be removed. And to mitigate that behavior, it's possible to exclude some sections like policies, auth methods, etc., so they could have manual changes if needed.

Vault has 7 main configuration sections:

  • Audit
  • Auth
  • Groups
  • GroupAliases
  • Plugins
  • Policies
  • Secrets

Each section already has the "add" mechanism, which is able to create the config in Vault, and the goal is to add the "remove" mechanism to have full CRUD (create, read, update, and delete). However, the "adding" code didn't follow Golang style and needed to be refactored. So for each section, the code is refactored first, then the "removing" code is added.

Let's take policies as an example (which will be the same way for all 7 configs mentioned above); the "removing" part works as follows:

  1. Bank-Vaults operator reads its config file with managed policies.
  2. Then, it calls Vault to get all already configured policies.
  3. Then, it compares what's in the config (the desired state) with Vault (the actual state).
  4. Finally, if there are differences, then the Bank-Vaults operator calls Vault to delete the unmanaged policies.
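
The four steps above can be sketched in Go as follows. This is an illustration under assumptions, not the operator's actual code: PolicyStore is a hypothetical stand-in for a Vault client, and memoryStore is an in-memory fake that keeps the example runnable.

```go
package main

import (
	"fmt"
	"sort"
)

// PolicyStore is a hypothetical stand-in for a Vault client; it is not
// the real Vault or Bank-Vaults API.
type PolicyStore interface {
	ListPolicies() ([]string, error)
	DeletePolicy(name string) error
}

// memoryStore is an in-memory fake of PolicyStore.
type memoryStore struct{ policies map[string]bool }

func (m *memoryStore) ListPolicies() ([]string, error) {
	names := make([]string, 0, len(m.policies))
	for name := range m.policies {
		names = append(names, name)
	}
	sort.Strings(names)
	return names, nil
}

func (m *memoryStore) DeletePolicy(name string) error {
	if !m.policies[name] {
		return fmt.Errorf("policy %q not found", name)
	}
	delete(m.policies, name)
	return nil
}

// purgePolicies takes the managed names from the config (step 1), lists
// what the store has (step 2), compares both (step 3), and deletes every
// policy that is not managed (step 4). It returns the deleted names.
func purgePolicies(store PolicyStore, managed []string) ([]string, error) {
	desired := make(map[string]bool, len(managed))
	for _, name := range managed {
		desired[name] = true
	}
	actual, err := store.ListPolicies()
	if err != nil {
		return nil, fmt.Errorf("list policies: %w", err)
	}
	var deleted []string
	for _, name := range actual {
		if desired[name] {
			continue
		}
		if err := store.DeletePolicy(name); err != nil {
			return deleted, fmt.Errorf("delete policy %q: %w", name, err)
		}
		deleted = append(deleted, name)
	}
	return deleted, nil
}

func main() {
	store := &memoryStore{policies: map[string]bool{"app-read": true, "manual-debug": true}}
	deleted, err := purgePolicies(store, []string{"app-read"})
	if err != nil {
		panic(err)
	}
	fmt.Println(deleted) // prints [manual-debug]
}
```

Hiding the client behind a small interface is a design choice of this sketch: it lets the compare-and-delete logic be unit tested without a running Vault.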

The final step is creating E2E tests to run in the CI (GitHub Actions). The tests simply check different cases like removing a config while the purge option is disabled, fully enabled, or partially enabled. Now let's take a look at the challenges I had while working on this feature.

6. Challenges

In the following sections, I'd like to share the top challenges while introducing full Vault management in the Bank-Vaults operator.

6.1 Project complexity

Bank-Vaults is not just the operator; it's an umbrella project to work with Vault. It's a mono repo with many shared parts. For example, the operator relies on a CLI tool with the same name.

Hence, the first challenge was to understand the project structure and where exactly to change, and how the changes could affect the rest of the project.

6.2 Refactoring the write path

After a thoughtful dive into the project, it was clear what and where I should change to fully manage Vault by Bank-Vaults (so it can add and remove config in Vault). However, the write path code in the operator (that's responsible for creating and updating managed config) didn't follow the Golang style. It was more like Python written in Golang. It reminded me of when I wrote Golang for the first time, coming from a Python background.

Leaving the write path code as it is would make the code oddly bad and redundant. So the first step was refactoring the "write path" code, then adding the "remove path" code (which is responsible for removing any unmanaged config). And this was the second challenge to solve before the actual implementation.

6.3 Only generic acceptance tests

Another challenge was that the part I wanted to change didn't have any unit tests; only generic acceptance tests were available, which makes things harder to change. I needed to pay extra attention to ensure I didn't break anything while refactoring the existing code and introducing the new feature. That also meant I should write some E2E tests to avoid this situation in the future.

6.4 Coordination

As I mentioned before, this feature is fairly big, and it was implemented by 2 people (my friend Wazery and me). At the same time, it was a project neither of us had worked on before, and we hadn't worked together before. So we needed to make sure that everything was clear and both of us were aligned to deliver this feature with high quality.

7. Result

With PR #1538 and Bank-Vaults v1.15.1, the operator is able to fully or partially purge unmanaged configuration in Vault.

The user has the option to fully or partially purge unmanaged config as shown here:

purgeUnmanagedConfig:

  # This will purge any unmanaged config in Vault.
  enabled: true

  # This will prevent purging unmanaged config for secret engines in Vault.
  exclude:
    secrets: true

To avoid a behavior change, and since this feature is destructive, it was safer to keep it disabled by default. The user needs to enable it explicitly in the Bank-Vaults config. And as usual, it's recommended to test it in a non-production environment first.
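
To illustrate how the "enabled" and "exclude" options combine, here is a minimal Go sketch. The PurgeConfig type and ShouldPurge method are illustrative and not the operator's actual types; only the "secrets" exclude key comes from the snippet above, and "policies" is assumed to follow the same naming.

```go
package main

import "fmt"

// PurgeConfig mirrors the purgeUnmanagedConfig snippet. The Go names are
// illustrative, not the operator's actual types.
type PurgeConfig struct {
	Enabled bool
	Exclude map[string]bool // section name -> true means "do not purge"
}

// ShouldPurge reports whether unmanaged config in a section may be removed.
// The zero value has Enabled == false, so nothing is purged by default.
func (c PurgeConfig) ShouldPurge(section string) bool {
	return c.Enabled && !c.Exclude[section]
}

func main() {
	cfg := PurgeConfig{Enabled: true, Exclude: map[string]bool{"secrets": true}}
	fmt.Println(cfg.ShouldPurge("policies")) // prints true
	fmt.Println(cfg.ShouldPurge("secrets"))  // prints false

	var off PurgeConfig // feature not enabled explicitly
	fmt.Println(off.ShouldPurge("policies")) // prints false
}
```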

Wazery PRs:

My PRs:


That's it! It was a pretty exciting journey for me 🤩️

Continue Reading »

02/02/2022

Extending Jenkins to run resilient pipelines with long-running jobs - CI/CD

In 2021, I wrote a high-level blog post about how a small task force revamped and modernized a gigantic CI pipeline, which was an overview of the challenges we had while building advanced declarative pipelines.

In today's post, I dive more into the technical details and implementation and how we overcame long-running job limitations in Jenkins Declarative Pipelines.

Continue Reading »

11/01/2022

01/11/2021

Now I'm a Certified Kubernetes Security Specialist + exam tips

On Saturday 30.10.2021, less than 24 hours after the exam, I got an email that I passed the CKS exam on the first try and I'm now a Certified Kubernetes Security Specialist. So now I have the 3 Kubernetes certificates (CKA, CKAD, and CKS). 🎉🎉

So is it now DevSecOps? 😄️ Well ... let's take a look at some details about the "why" of getting a certificate.

Continue Reading »

10/10/2021

Create your first Ansible-based Kubernetes Operator - Tutorial

This is a hands-on tutorial for a fully-functional Kubernetes Operator as a follow-up to the Introduction to Kubernetes Operators. This Proof-of-Concept shows how you can reuse your Ansible code/skills to do automation on Kubernetes, which also allows you to wrap any application (even legacy ones) in a declarative, cloud-native style.

This tutorial covers writing Kubernetes Operators in no time, using previous experience with the Ansible ecosystem to write a Kubernetes operator without writing real code. You will learn and understand the following:

  • What Kubernetes operators are and how they work.
  • How Ansible-based operators work to automate tasks within Kubernetes and how they differ from other operator types.
  • How to use your previous experience with Ansible to write a Kubernetes operator in no time.
  • How to build your first operator with a real use case.
Continue Reading »

22/09/2021

Docker Best Practices Workshop - Presentation

Well, this is the 3rd post in the same month. I haven't done that in a long time! But Q3 2021 has been super productive and many things have been done.

Yesterday, as part of the knowledge share and developer enablement at Camunda, I delivered a Docker Best Practices Workshop which was available for the whole engineering division at Camunda.

What I really liked about this workshop is that everyone told me they learned something new, even though they have been working with Docker for a pretty long time!

If I have a single piece of advice

Continue Reading »

11/09/2021

How a small task force revamped and modernized a gigantic CI pipeline - DevOps Transformation

One of the typical cases for DevOps transformation is when a team has the ownership of a component they shouldn't own in the first place. For example, when you have super complex CI/CD, and the developers don't manage the CI/CD jobs directly but with help from the operations team.

Let's imagine this scenario

Continue Reading »

09/09/2021


Hello, my name is Ahmed AbouZaid and this is my "lite" technical blog!

I'm a passionate DevOps engineer, Cloud/Kubernetes specialist, Free/Open source geek, and an author.

I believe in self CI/CD (Continuous Improvements/Development), also that "the whole is greater than the sum of its parts".

DevOps transformation, automation, data, and metrics are my preferred areas. And I like to help both businesses and people to grow.
