Kubernetes IAM in Depth

Intro

This blog post takes an in-depth look at identity and access management (IAM) in Kubernetes. Identity in Kubernetes is quite often an afterthought. That could be because cloud providers can give a false sense of security, or because of how complex the topic gets. This is the blog post I wish I had when I was learning about Kubernetes security and how to manage identities and access.

This blog post is quite long, but I recommend you read it all since some concepts build on each other.

Access Control

Any time you have to decide whether to allow or deny access of a client (user or application) to a resource, you have entered the access control problem domain.

Access controls define the allowable interactions between subjects and objects. It is based on the granting of rights, or privileges, to a subject with respect to an object. - Mike Chapple

Mike Chapple puts the definition of Access control beautifully! The description points out three essential keywords:

Subjects: Any entity requesting access, like a user or an application.
Objects: A resource a subject desires to access, like Pods or Deployments.
Privileges (Policies): Essentially defines what a subject can do on an object, like listing or deleting a Pod.

With these three keywords alone, you can describe any access control system.
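To make the three keywords concrete, here is a minimal sketch of a deny-by-default access check. It is not how Kubernetes implements anything; the subjects, actions, and the `POLICY` set are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Privilege:
    subject: str  # who is asking, e.g. a user or an application
    action: str   # what they want to do, e.g. "list" or "delete"
    obj: str      # what they want to do it to, e.g. "pods"


# Hypothetical policy: the full set of granted privileges.
POLICY = {
    Privilege("alice", "list", "pods"),
    Privilege("alice", "delete", "pods"),
    Privilege("ci-bot", "list", "deployments"),
}


def is_allowed(subject: str, action: str, obj: str) -> bool:
    """Deny by default; allow only if a matching privilege was granted."""
    return Privilege(subject, action, obj) in POLICY


print(is_allowed("alice", "delete", "pods"))   # True
print(is_allowed("ci-bot", "delete", "pods"))  # False
```

Every access control system discussed in this post is a richer version of this lookup: a subject, an action on an object, and a policy that says yes or no.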

Authentication and Authorization

It is important to distinguish between Authentication (Identity) and Authorization (Access Level).

Authentication: Verifies that a subject (user or application) is who they claim to be.

Authorization: Determines whether a subject can act on an object.

💡 Usually, we refer to authentication and authorization combined as IAM.

Principal components of Access Control in Kubernetes

Any access control system must consist of three components:

Subjects
Objects
Policies

Kubernetes IAM is no different, but it models them as:

Subjects
Resources (Objects)
Roles (Policy)
RoleBinding (Policy enforcement on a subject)

1- Kubernetes Subjects

Kubernetes has 3 types of subjects:

User
Group
Service account

Kubernetes has no resource or object called User or Group. These are just predefined names for the purposes of identification within RBAC RoleBindings. The method of determining the user and group is entirely dependent on the authentication method in use, and Kubernetes has no way to define or manage them internally. - Production Kubernetes

As described in the above quote, users and groups are not native concepts in Kubernetes. Both are modelled as plain strings and are used solely in RBAC RoleBindings.

2- Kubernetes Objects

Kubernetes has many types of objects; these are the things a subject acts on through the API.

Remember that there are two major types of objects in K8s:

Core resources (Pods, Deployments, Services, etc.)
Custom resources (CRs), such as those installed by cert-manager or Argo CD

These objects are also known as Kubernetes resources.

3- Kubernetes Policies (Roles)

Kubernetes uses role-based access control (RBAC) for policies.

To simplify, RBAC grants access to roles rather than to individual users or applications. This gives a logical grouping of who can access what based on their role. People (and apps) come and go, but their functions stay the same. Hence, we associate a user or a group of users with a role, and the role has access to the resource.

Kubernetes implements RBAC by attaching the policy rules to the role itself. That's why K8s doesn't use the term Policy to refer to the rules. Instead, these rules are defined in the Role.

Here is an example of Kubernetes Role
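As a minimal sketch, the snippet below builds a Role manifest as a Python dict and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The role name `pod-reader` and the `default` namespace are assumptions chosen for illustration.

```python
import json

# Hypothetical Role named "pod-reader" in the "default" namespace.
pod_reader_role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"namespace": "default", "name": "pod-reader"},
    "rules": [
        {
            "apiGroups": [""],          # "" is the core API group
            "resources": ["pods"],      # the objects this rule covers
            "verbs": ["get", "list", "watch"],  # what may be done to them
        }
    ],
}

print(json.dumps(pod_reader_role, indent=2))
```

Note that the manifest names resources and verbs only. It says nothing about who holds the role.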

Kubernetes has two types of roles. The difference lies in their scope.

Role: Is limited to the namespace it is defined in
ClusterRole: Is scoped to the entire cluster

Here are some common verbs supported in Kubernetes:

get, list, watch
create, update, patch
delete

💡 K8s can have custom resources, which can have their own verbs as well. For example, Argo Workflows adds a few verbs, such as `Submit`, which is the act of creating a workflow.

If you would like to learn more about the breakdown of roles, visit here

Sometimes you will see roles that also include the resourceNames property, so I would like to clarify how it is used in conjunction with a resource:

You can also refer to resources by name for certain requests through the resourceNames list. When specified, requests can be restricted to individual instances of a resource. Here is an example that restricts its subject to only get or update a ConfigMap named my-configmap

💡 You cannot restrict create or deletecollection requests by their resource name. For create, this limitation is because the name of the new object may not be known at authorization time. If you restrict list or watch by resourceName, clients must include a metadata.name field selector in their list or watch request that matches the specified resource name in order to be authorized. For example, kubectl get configmaps --field-selector=metadata.name=my-configmap

A resource name lets you reference one particular instance of a resource. Say you grant access to Deployments: you can limit the role to a single Deployment name. This is handy, as we will see later when creating sudo-like behaviour in Kubernetes.
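To illustrate how resourceNames narrows a rule, here is a simplified sketch of evaluating a single RBAC rule. It deliberately ignores apiGroups and wildcards, so it is a teaching model rather than the real authorizer; `rule_allows` is a hypothetical helper, while `my-configmap` comes from the text above.

```python
def rule_allows(rule: dict, verb: str, resource: str, name: str = "") -> bool:
    """Simplified check of one RBAC rule (ignores apiGroups and wildcards)."""
    if verb not in rule["verbs"] or resource not in rule["resources"]:
        return False
    names = rule.get("resourceNames", [])
    # No resourceNames means every instance of the resource is covered.
    return not names or name in names


# Rule that only allows get/update on the ConfigMap "my-configmap".
configmap_rule = {
    "apiGroups": [""],
    "resources": ["configmaps"],
    "resourceNames": ["my-configmap"],
    "verbs": ["get", "update"],
}

print(rule_allows(configmap_rule, "get", "configmaps", "my-configmap"))  # True
print(rule_allows(configmap_rule, "get", "configmaps", "other"))         # False
print(rule_allows(configmap_rule, "delete", "configmaps", "my-configmap"))  # False
```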

4- RoleBinding

It’s important to understand that a Role or a ClusterRole defines access to the Objects but it doesn’t reference anything about the subject. Hence, we need a mechanism to tie a role to a subject. This is where RoleBinding and ClusterRoleBinding come in.

A RoleBinding grants the permissions defined in a role to a subject. In other words, a Subject must have a role to access an Object, and this association is done through a RoleBinding.
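As a sketch of that association, the helper below builds a RoleBinding manifest that ties a Role to a list of subjects. The helper `role_binding` and the names `read-pods`, `pod-reader`, `jane`, and `my-app` are hypothetical.

```python
RBAC_GROUP = "rbac.authorization.k8s.io"


def role_binding(name: str, namespace: str, role: str, subjects: list) -> dict:
    """Build a RoleBinding manifest that grants `role` to `subjects`."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": name, "namespace": namespace},
        "subjects": subjects,
        # roleRef points at the Role; the Role itself never names a subject.
        "roleRef": {"apiGroup": RBAC_GROUP, "kind": "Role", "name": role},
    }


binding = role_binding(
    "read-pods",
    "default",
    "pod-reader",
    [
        # Users and groups are plain strings interpreted by the authenticator.
        {"kind": "User", "name": "jane", "apiGroup": RBAC_GROUP},
        # Service accounts are real objects, so they carry a namespace.
        {"kind": "ServiceAccount", "name": "my-app", "namespace": "default"},
    ],
)
```

A ClusterRoleBinding has the same shape, minus the namespace in its metadata.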

Here I would like to point you to some different ways to tweak how you create a role binding.

Section summary

Just to make sure you are still following, so far we discussed:

What is Access Control
What is RBAC
The principal components of Access Control in Kubernetes (Subject, Object, Roles, RoleBinding)

Here is a visual to remember the core Access Control components in K8s and their relationship.

Subjects Authentication in Kubernetes

Now that we covered the basic three types of subjects in Kubernetes, it is time to discuss them more practically.

We will start with Users and Groups: how we create them, model them, and authenticate and authorize them. Then we will talk about applications and their access using Service Account Tokens and Projected Service Account Tokens.

Users Authentication

In Kubernetes, users have three ways to authenticate:

Shared secrets
Public Key Infrastructure (PKI)
OpenID Connect (OIDC)

💡 Normal users cannot be added to a cluster through an API call.

We will discuss these in detail below.

1- Shared secret

A shared secret is a unique piece (or set) of information held by the client and validated by the server. Examples of shared secrets:

Username and password (No longer available)
Access Token

An important property to note about a shared secret is that both server and client must know the same information (hence the name shared).

Username and Password:

At some point in Kubernetes, you could pass a CSV file of usernames and passwords to the API server using the --basic-auth-file flag at startup. Users then supplied the credentials base64-encoded in the HTTP Basic Authorization header. This option was removed in Kubernetes 1.19. For the list of options you can pass to the API server, check here

Access Token:

Access tokens are static secrets. You provide the token-to-user mapping in a CSV file and pass it to the API server at startup using the --token-auth-file flag.

The CSV is in the following format: token,user,uid,"group1,group2,group3"

Here is an example of an auth file:

💡 Currently, tokens last indefinitely, and the token list cannot be changed without restarting the API server.

To use a static access token for authentication, you attach it to the Authorization header with the value Bearer <TOKEN>
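The sketch below parses a token file in the CSV format described above and builds the matching Authorization header. The file content, tokens, and user names are made up for illustration; real tokens should be long random strings.

```python
import csv
import io

# Hypothetical token file content (token,user,uid,"group1,group2").
TOKEN_FILE = '''\
3a1f9c0e,alice,1001,"devops,developers"
9b7d2e44,bob,1002,developers
'''


def load_tokens(text: str) -> dict:
    """Map each token to a (user, uid, groups) tuple."""
    tokens = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        token, user, uid = row[0], row[1], row[2]
        # The group column is optional and holds a comma-separated list.
        groups = row[3].split(",") if len(row) > 3 and row[3] else []
        tokens[token] = (user, uid, groups)
    return tokens


def bearer_header(token: str) -> dict:
    """HTTP header a client sends to authenticate with a static token."""
    return {"Authorization": f"Bearer {token}"}


tokens = load_tokens(TOKEN_FILE)
print(tokens["3a1f9c0e"])
print(bearer_header("3a1f9c0e"))
```

The quoted last column is what lets one user belong to several groups without breaking the CSV.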

2- PKI

The PKI model uses certificates and keys to uniquely identify and authenticate users to Kubernetes.

You must start your API server with the --client-ca-file=SOMEFILE flag to enable this feature. The referenced file must contain one or more certificate authorities (CAs) used to validate client certificates presented to the API server. The common name of the subject is used as the username. Client certificates can also indicate a user's group memberships using the certificate's organization fields.

Here is an example of generating a certificate using OpenSSL.

💡 This is just a demo, so make sure you check the K8s guide to creating certs

This would create a CSR for the username "obanby", belonging to two groups, "group1" and "group2".
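To show how the username and groups map onto the certificate subject, here is a small sketch that builds and parses a subject string of the shape OpenSSL accepts for its `-subj` option. The helpers `subject_for` and `identity_from` are hypothetical, and the parser only handles the simple CN/O form used here.

```python
def subject_for(username: str, groups: list) -> str:
    """Build a subject string: CN carries the user, each O carries a group."""
    return f"/CN={username}" + "".join(f"/O={group}" for group in groups)


def identity_from(subject: str) -> tuple:
    """Recover (username, groups) from a subject built by subject_for."""
    username, groups = "", []
    for part in subject.strip("/").split("/"):
        key, _, value = part.partition("=")
        if key == "CN":
            username = value
        elif key == "O":
            groups.append(value)
    return username, groups


subject = subject_for("obanby", ["group1", "group2"])
print(subject)                 # /CN=obanby/O=group1/O=group2
print(identity_from(subject))  # ('obanby', ['group1', 'group2'])
```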

Once the CSR has been signed by a CA that the API server trusts (the one passed via --client-ca-file), the user presents the resulting certificate, here obanby.pem, when connecting to the cluster. To inspect the certificate details, you can run openssl x509 -in obanby.pem -text.

You can automate this process end to end by generating the certs and supplying them to K8s programmatically, assuming:

You can validate the identity of the user requesting a new cert (SSO, for example)
Your automation tool has sufficient permissions to generate the request manifest and apply it to K8s

See Certificate Signing Requests in the K8s documentation here for practical details.

💡 Certificates provisioned through the Kubernetes Certificate Signing Request API cannot be revoked before expiry. There is currently no support for certificate revocation lists or Online Certificate Status Protocol (OCSP) stapling in Kubernetes.

3- OIDC

This is a pretty important topic. Essentially, OIDC allows you to connect to K8s by delegating authentication to a third party like AWS IAM, GitHub, etc.

I might write another blog post about OIDC in depth. For now, here is a good one-hour video that explains OAuth2 and OIDC in depth

The Kubernetes documentation does an excellent job explaining OIDC; here is the request flow diagram and the highlights from their docs.

Log in to your identity provider
Your identity provider will provide you with an access_token, id_token and a refresh_token
When using kubectl, use your id_token with the --token flag or add it directly to your kubeconfig
kubectl sends your id_token in a header called Authorization to the API server
The API server will make sure the JWT signature is valid by checking against the certificate named in the configuration
Check to make sure the id_token hasn't expired
Make sure the user is authorized
Once authorized the API server returns a response to kubectl
kubectl provides feedback to the user
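To make the id_token checks in the flow above less abstract, here is a sketch that decodes a JWT payload and checks its expiry. It deliberately does NOT verify the signature, which the API server always does and which needs the provider's keys and a proper JWT library. The token below is an unsigned, hypothetical one built only to exercise the functions.

```python
import base64
import json
import time
from typing import Optional


def decode_claims(jwt: str) -> dict:
    """Decode the payload of a JWT WITHOUT verifying its signature."""
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def is_expired(claims: dict, now: Optional[float] = None) -> bool:
    """True when the token's exp claim is at or before `now`."""
    now = time.time() if now is None else now
    return claims["exp"] <= now


def _b64(obj: dict) -> str:
    """Encode a dict the way JWT segments are encoded (no padding)."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical unsigned token: header.payload.(empty signature)
fake_token = ".".join(
    [_b64({"alg": "none"}), _b64({"sub": "obanby", "exp": 2000}), ""]
)

claims = decode_claims(fake_token)
print(claims["sub"])                 # obanby
print(is_expired(claims, now=1000))  # False
print(is_expired(claims, now=3000))  # True
```

Decoding a token like this is handy for debugging why a login is rejected, but it must never replace signature verification.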

Check out k8s documentation here on how to use OpenID Connect (OIDC)

💡 Each provider uses OIDC a little differently, so check your cloud provider documentation. For example, if you use a managed EKS, you can check the docs here. Typically most cloud utils like awscli or gcloud configure the returned token from OIDC in the .kube/config file.

Groups Authentication

Everything mentioned in the user section applies to groups since users and groups are not objects in Kubernetes and are merely a string representation of a named entity as described in the Kubernetes Subjects section of the Principal components.

Impersonation in Kubernetes

Kubernetes has impersonation capabilities. Impersonation means that you can act as another user or group. You can only impersonate another user if you have impersonation privileges.

A user can act as another user through impersonation headers attached to the request. Client tools like kubectl support this behaviour through the --as and --as-group flags.

Impersonation allows for an interesting pattern, the sudo pattern! In Linux, we often have to run operations like chown or chmod, which usually require admin privileges. To do that, we typically run sudo chown or sudo chmod, which allows us to impersonate the admin user.

Here is how to create a similar behaviour in Kubernetes.

To reiterate, I would like my DevOps users to be able to view the cluster resources, but to perform a potentially risky operation (like deleting an ingress) only if they impersonate devops-admin.

Steps:

First, we need to assume that your authentication will end up identifying the user as part of the “devops” group.
Create a ClusterRoleBinding to tie the group to view access. 💡 Kubernetes ships with default ClusterRoles, among them cluster-admin and view.

Now that we have the binding, we need to give the “devops” group impersonation privileges. 💡 When you think of adding privileges, automatically think of creating a new role or editing an existing one. The idea is privileges == roles.

By now, the devops group's primary role is view, and we have given devops members the ability to run kubectl --as=devops-admin. However, nothing is bound to the user "devops-admin" yet. 💡 Notice how we specified devops-admin as a user instead of a group. The explanation is in the next step.

Now we need to give the devops-admin user admin privileges.

Notice that "devops-admin" is just a string. It could be named anything. It is not a predefined user or group. The idea is that we don't assign the admin rights directly to the "devops" group. Instead, we assign the admin permissions to the "devops-admin" user. Remember that we have allowed the "devops" group to impersonate "devops-admin". This setup creates the sudo-like behaviour we desire.
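The steps above can be sketched as four manifests, built here as Python dicts. The object names (`devops-view`, `devops-impersonator`, and so on) are assumptions; `view` and `cluster-admin` are the default ClusterRoles mentioned earlier, and the impersonation rule uses the `impersonate` verb on the `users` resource.

```python
RBAC = "rbac.authorization.k8s.io"


def cluster_role_binding(name: str, role: str, kind: str, subject: str) -> dict:
    """Bind a ClusterRole to a single User or Group, cluster-wide."""
    return {
        "apiVersion": f"{RBAC}/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        "subjects": [{"kind": kind, "name": subject, "apiGroup": RBAC}],
        "roleRef": {"apiGroup": RBAC, "kind": "ClusterRole", "name": role},
    }


# Members of the "devops" group get the built-in read-only "view" role.
devops_view = cluster_role_binding("devops-view", "view", "Group", "devops")

# A role that may impersonate only the user "devops-admin"...
impersonator = {
    "apiVersion": f"{RBAC}/v1",
    "kind": "ClusterRole",
    "metadata": {"name": "devops-impersonator"},
    "rules": [
        {
            "apiGroups": [""],
            "resources": ["users"],
            "verbs": ["impersonate"],
            "resourceNames": ["devops-admin"],
        }
    ],
}

# ...bound to the "devops" group.
devops_can_impersonate = cluster_role_binding(
    "devops-impersonate", "devops-impersonator", "Group", "devops"
)

# The "devops-admin" user, which is only a string, holds the admin rights.
devops_admin = cluster_role_binding(
    "devops-admin", "cluster-admin", "User", "devops-admin"
)

manifests = [devops_view, impersonator, devops_can_impersonate, devops_admin]
```

The resourceNames entry is what keeps the group from impersonating arbitrary users: without it, the rule would allow acting as anyone.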

Testing your Kubernetes Permissions

So now that we have seen impersonation and such, we should wonder if there is a way we can test this without causing havoc!

Kubernetes has an elegant feature implemented in the Kubernetes Go client (client-go) and in kubectl. This feature is called can-i, exposed as kubectl auth can-i. As the name implies, you can ask it whether you have permission to perform the action you want, and it answers with either a yes or a no.

For example, to test the impersonation setup, we would run

which also reads pretty nicely! 😉
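As a sketch of what such a check can look like for the impersonation setup, the helper below assembles a `kubectl auth can-i` command line without running it. The helper name `can_i_command` is hypothetical; `--as`, `--as-group`, and `--namespace` are regular kubectl flags.

```python
import shlex


def can_i_command(verb, resource, as_user=None, as_groups=(), namespace=None):
    """Build (but do not run) a `kubectl auth can-i` command line."""
    if as_groups and not as_user:
        # Impersonating groups only makes sense together with a user.
        raise ValueError("as_groups requires as_user")
    cmd = ["kubectl", "auth", "can-i", verb, resource]
    if namespace:
        cmd += ["--namespace", namespace]
    if as_user:
        cmd += ["--as", as_user]
    for group in as_groups:
        cmd += ["--as-group", group]
    return cmd


command = can_i_command("delete", "ingress", as_user="devops-admin")
print(shlex.join(command))  # kubectl auth can-i delete ingress --as devops-admin
```

Passing the resulting list to `subprocess.run` would execute the check against whatever cluster the current kubeconfig points to.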

Applications authentication

Since we discussed authentication patterns for human clients in Kubernetes, let's now discuss how applications can authenticate.

Here are the three use cases where an application needs to authenticate

A workload is authenticating to obtain appropriate access to the Kubernetes API itself. This is an everyday use case for custom controllers that need to watch and act on Kubernetes resources.
A workload is authenticating to another workload within the cluster; Potentially establishing mutual authentication between them for additional security.
A workload is authenticating to external services. An external service could be anything outside the cluster, such as a vendor service running on AWS, GCP, etc.

I will only discuss a few approaches: Shared Secret, Service Account Token, and Projected Service Account Token. After that, I will point to more resources if you are interested in learning about the other techniques.

For reference, there are five ways for an application to authenticate

Shared Secret
Service Account Token
Projected Service Account Token
Network Primitives
Platform Mediated Node Identity

Shared Secret

A shared secret is a unique piece (or set) of information held by both the calling application and the server. This method requires that both client and server have access to that information in some form. For example, think of how a backend authenticates with a database.

You also need a way to easily rotate (change) secret credentials in case of a compromise, and you need to ensure that secrets are distributed to all calling applications and kept in sync.

Here are the pain points of this approach:

It suffers from the secure introduction problem.
Secret lifecycle management is challenging.

Here is a brief description of the secure introduction problem:

To securely get the first secret from an origin to a destination, you need to already have some authentication in place. This sounds a lot like the chicken-and-egg problem, and it raises the following questions:

How does a client (user or application) prove that it is the legitimate recipient of a secret so that it can acquire a token?
How can you avoid persisting raw token values during your secure introduction?

Read more about the secure introduction problem here.

You should use a secret management platform; I used HashiCorp Vault in many organizations, but I have seen clean secret management solutions using cloud providers’ toolings.

Using a shared secret is quite simple, though: you create a Kubernetes Secret and mount it into your pod. Otherwise, as mentioned, you can use a secret management tool.

Service Account authentication

In each namespace, there is a default service account. The default service account is attached and mounted when you deploy a pod without specifying the service account.

I hope you noticed that I said: “mounted”!

Kubernetes service account tokens are just JSON Web Tokens (JWTs) stored as a Secret in the cluster. Like any Secret in K8s, it gets mounted to a location in the pod filesystem where it can be read and used. The API server then evaluates the permissions tied to the token and either authorizes or denies the request.

If you are curious about the location where this secret gets mounted, it is /var/run/secrets/kubernetes.io/serviceaccount/
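As a sketch of how an application can pick up these credentials, the function below reads the files found in that directory: `token`, `namespace`, and `ca.crt`. The function name `read_service_account` is hypothetical, and because the directory only exists inside a pod, the demo runs against a temporary stand-in directory with made-up content.

```python
import tempfile
from pathlib import Path

SA_DIR = Path("/var/run/secrets/kubernetes.io/serviceaccount")


def read_service_account(sa_dir: Path = SA_DIR) -> dict:
    """Read the files Kubernetes mounts for a pod's service account."""
    return {
        "token": (sa_dir / "token").read_text().strip(),
        "namespace": (sa_dir / "namespace").read_text().strip(),
        # The CA bundle is used to verify the API server's TLS certificate.
        "ca_path": str(sa_dir / "ca.crt"),
    }


# Outside a pod the real directory is absent, so use a stand-in for the demo.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "token").write_text("hypothetical.jwt.value\n")
    (demo_dir / "namespace").write_text("default\n")
    creds = read_service_account(demo_dir)

print(creds["namespace"])  # default
```

Inside a pod, calling `read_service_account()` with no arguments reads the real mount, and the token then goes into a Bearer Authorization header.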

Here is an example of a service account

This service account has no permission whatsoever, and indeed we need to create a role with the privileges we want and a role binding to allow this service account to act as the defined role.

Projected Service Account Token Volume (aka PSAT)

You must start the Kubernetes API server with the following flags to use this feature.

Issues with regular service accounts:

The JSON Web Tokens (JWTs) used by service accounts are not audience-bound (explained a bit later). A service account user can masquerade as another user and launch masquerade attacks.
The service account token is stored in a Secret and delivered to the corresponding node as a file. A service account may be granted advanced permissions when a powerful system component is running. This results in a broad attack surface for the Kubernetes control plane. Attackers can obtain the service account used by this system component to launch privilege escalation attacks.
JWTs are not time-bound. A compromised JWT in the attacks mentioned above stays valid for as long as the service account exists. You can mitigate the issue only through service account signing key rotation, which is not supported by client-go and not automated by the control plane. Therefore, a complex operations process is required.
A Kubernetes secret must be created for each service account. This may put strains on elasticity and capacity in large-scale workload deployments.

Projected Service Account Token (PSAT) solves these issues by using a volume plugin in the kubelet. The token is mounted as a projected volume, which allows the kubelet to manage it inside the pod automatically. PSAT also allows for specifying an audience (more on that shortly).

Here is a list of benefits of PSAT:

Tight integration with the platform (Kubernetes).
Provides identity in a well-understood and consumable format (JWT).
Invalidated once the service account / Secret is deleted.
Scoped to individual Pods.
Configurable TTL.
Configurable audience.
Not persisted in the Kubernetes Secrets API.
Tokens are rotated before expiry automatically by the Kubelet (if using projection).
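To show where the TTL and audience are configured, here is a sketch of a Pod manifest with a projected service account token volume, built as a Python dict. The helper name, the mount path `/var/run/secrets/tokens`, and the example values are assumptions; `serviceAccountToken` with `path`, `audience`, and `expirationSeconds` is the projected volume source defined by Kubernetes.

```python
def pod_with_projected_token(name, image, service_account, audience, ttl_seconds=3600):
    """Pod manifest that mounts an audience-scoped, expiring token."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "serviceAccountName": service_account,
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "volumeMounts": [
                        {"name": "token", "mountPath": "/var/run/secrets/tokens"}
                    ],
                }
            ],
            "volumes": [
                {
                    "name": "token",
                    "projected": {
                        "sources": [
                            {
                                "serviceAccountToken": {
                                    "path": "token",  # file name in the mount
                                    "audience": audience,
                                    "expirationSeconds": ttl_seconds,
                                }
                            }
                        ]
                    },
                }
            ],
        },
    }


pod = pod_with_projected_token("my-app", "nginx", "my-app", audience="vault")
source = pod["spec"]["volumes"][0]["projected"]["sources"][0]["serviceAccountToken"]
print(source)
```

With this manifest, the application reads its token from /var/run/secrets/tokens/token, and the kubelet refreshes the file before the token expires.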

So what about that audience thing?

The audience field refers to the service that will consume the generated token. This means that the pod can use this particular token only to talk to the selected services. Therefore, if the token were compromised and presented to another service, it would be rejected, since it is only valid for its target audience.

For example, suppose we configured the audience to be vault. If this token was compromised and the user/application tried to issue a request to the Kubernetes API with it, the request would fail because the token's audience does not match the audience the API server expects.
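The audience check itself is small. Here is a sketch of how a receiving service could apply it to already-decoded JWT claims. The function name `accepts` is hypothetical, and the API server audience string is only an example value; the `aud` claim may be a single string or a list, so both are handled.

```python
def accepts(claims: dict, expected_audience: str) -> bool:
    """Reject a token unless it was issued for this service."""
    aud = claims.get("aud", [])
    audiences = [aud] if isinstance(aud, str) else list(aud)
    return expected_audience in audiences


vault_token_claims = {"sub": "system:serviceaccount:default:my-app", "aud": ["vault"]}

print(accepts(vault_token_claims, "vault"))                           # True
print(accepts(vault_token_claims, "https://kubernetes.default.svc"))  # False
```

This check comes on top of signature and expiry validation; a token must pass all of them.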

💡Note I tried to find more on this topic, but it's still relatively new and not widely adopted.

Other workload identity options

Network Primitives
Platform Mediated Node Identity

Both of these are out of scope for this post since it is getting way too long now! I will point you to this wonderful blog post, which goes more in depth about workload identity

Last words

I hope this blog post clarifies any confusion around IAM in Kubernetes.

IAM is a big topic, so give this another read and make sure to read the resources to get the whole picture.

If you like this post, please give it a share, and let me know if you want more or if you see a mistake 😊

follow me on Twitter @omarelbanby!

References and resources

Production Kubernetes [Book]
Managing Kubernetes [Book]
Access Control and Identity Management, 3rd Edition [Book]
Workload Identity https://tanzu.vmware.com/developer/guides/platform-security-workload-identity/ [vmware blog post]
Enable service account token volume projection https://www.alibabacloud.com/help/doc-detail/160384.htm [Alibaba’s blog post]
K8s authentication https://kubernetes.io/docs/reference/access-authn-authz/authentication/#service-account-tokens [k8s blog post]
K8s authorization https://kubernetes.io/docs/reference/access-authn-authz/authorization/ [k8s blog post]
K8s RBAC authorization https://kubernetes.io/docs/reference/access-authn-authz/rbac/ [k8s blog post]
K8s Certificate Signing Requests https://kubernetes.io/docs/reference/access-authn-authz/certificate-signing-requests/ [k8s blog post]
k8s Configure Service Accounts for Pods https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/ [k8s blog post]
k8s Kube-api server https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/ [k8s blog post]
K8s Object Names and IDs https://kubernetes.io/docs/concepts/overview/working-with-objects/names/ [k8s blog post]
How to display SSL cert content https://support.qacafe.com/knowledge-base/how-do-i-display-the-contents-of-a-ssl-certificate/ [blog post]
OAuth2 and OpenID Connect video https://www.youtube.com/watch?v=996OiexHze0&ab_channel=OktaDev [video]
Understanding service account token volume projection in Kubernetes https://mohammad-ayub.medium.com/understanding-service-account-token-volume-projection-in-kubernetes-15d5623e7cc7 [blog post]
Secure Introduction of Vault Clients https://learn.hashicorp.com/tutorials/vault/secure-introduction [vault tutorial]