
3.0 - Configuration

3.1 - Prerequisites: Commands and Arguments in Docker

  • Note: This is not a requirement for the CKAD curriculum.

  • Consider a simple scenario:

  • Run a Docker container from the Ubuntu image: docker run ubuntu
  • This runs an instance of the Ubuntu image, which exits immediately, as seen in the output of docker ps -a

  • This occurs because containers aren't designed to host an OS, but instead to run a specific task or process.

  • Example: host a web server or database
  • So long as that process stays active, so does the container.
  • If the service stops or crashes, the container exits.

  • The Ubuntu image's Dockerfile defines CMD ["bash"]; bash is not a service but a shell, which expects a terminal.

  • When the container runs, it starts Ubuntu and launches bash
  • By default, Docker doesn't attach a terminal to a container when it's run.
  • Bash finds no terminal, so the process exits and the container exits with it.

  • To solve, one can append commands to the docker run command e.g.
    docker run ubuntu sleep 5

  • Similarly, in a Dockerfile
    CMD <command> <param1>
    or in JSON:
    CMD ["command", "parameter"]

  • To build the new image: docker build -t <image name> .

  • To run: docker run <image name>

  • To hard-code the command while allowing the parameter value to change at runtime, use ENTRYPOINT instead of CMD i.e.:
    ENTRYPOINT ["command"]

  • Any parameters specified on the CLI will automatically be appended to the entrypoint command.

  • If using ENTRYPOINT and no command parameter is specified at runtime, an error is likely to occur; a default value should therefore be provided.

  • Therefore, ENTRYPOINT and CMD should be used together.

  • Example:

ENTRYPOINT ["command"]

CMD ["parameter"]
  • With this configuration, if no parameter is provided on the CLI, the CMD value is used as the default.
  • Any parameter provided on the CLI will override the CMD value

  • To override the entrypoint: docker run --entrypoint <new command> <image name>

  • Note: ENTRYPOINT and CMD values should be expressed in a JSON format.
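
  • Example: a minimal sleeper Dockerfile combining both instructions (the ubuntu-sleeper image name is an assumption for illustration):

FROM ubuntu

# Hard-coded command; CLI parameters are appended to this
ENTRYPOINT ["sleep"]

# Default parameter, used only if none is supplied on the CLI
CMD ["5"]

  • Built via docker build -t ubuntu-sleeper ., then docker run ubuntu-sleeper runs sleep 5, whilst docker run ubuntu-sleeper 10 runs sleep 10.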

3.2 - Commands and Arguments in Kubernetes

  • The Ubuntu sleeper image can be defined in a YAML file for Kubernetes similar to the following:
apiVersion: v1
kind: Pod
metadata:
  name: <pod name>
spec:
  containers:
  - name: <container name>
    image: <image name>:<image tag>
    command: ["<command>"]
    args: ["<arg1>"]
  • Anything that would be appended to the docker run command goes in the args attribute of the container spec, as in the example below.
  • The pod can then be created through standard means such as kubectl create -f <filename>.yaml
  • The Dockerfile's ENTRYPOINT is overridden by the command attribute, and its CMD by the args attribute.
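
  • Example, using the ubuntu-sleeper image assumed in 3.1:

apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-sleeper-pod
spec:
  containers:
  - name: ubuntu-sleeper
    image: ubuntu-sleeper
    command: ["sleep"]   # overrides the Dockerfile ENTRYPOINT
    args: ["10"]         # overrides the Dockerfile CMD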

  • Note: You cannot edit specifications of a pre-existing pod aside from:

  • spec.containers[*].image
  • spec.initContainers[*].image
  • spec.activeDeadlineSeconds
  • spec.tolerations

  • Environment variables, in general, cannot be edited on a live pod, nor can service accounts and resource limits.

  • If editing of these parameters is required, 2 methods are advised:
  • kubectl edit pod <pod name>
    • Edit the properties desired
    • As outlined above, certain properties cannot be edited whilst a pod is "live" - if such an edit is attempted, the rejected changes are saved to a temporary definition file.
    • Delete the original pod: kubectl delete pod <pod name>
    • Recreate the pod from the temp definition file: kubectl create -f <temp filename>.yaml
  • Extract the YAML of the pod via kubectl get pod <pod name> -o yaml > <file name>.yaml

    • Open the extracted YAML and edit as appropriate e.g. vim <filename>.yaml
    • Delete the original pod and recreate similar to the latter 2 steps of method 1.
  • Note: If part of a deployment, change any instance of pod in the commands above to deployment.

  • Any changes to deployments are automatically applied to the pods within.

3.5 - Environment Variables

  • For a given definition file, one can set environment variables via the env: field in containers spec.
  • Each environment variable is an array entry, with a name and value associated:
env:
- name: <ENV NAME>
  value: <VALUE>
  • Environment variables may be set via 1 of 3 ways (primarily):
  • Key-value pairs (above)
  • ConfigMaps
  • Secrets

  • The latter two are implemented in a similar manner to the following respective examples:

env:
- name: <ENV NAME>
  valueFrom:
    configMapKeyRef:
      name: <configmap name>
      key: <key name>

env:
- name: <ENV NAME>
  valueFrom:
    secretKeyRef:
      name: <secret name>
      key: <key name>

3.6 - ConfigMaps

  • When there are multiple pod definition files, it becomes difficult to manage environment data.
  • This information can be removed from the definition files and managed centrally via ConfigMaps
  • Used to pass configuration data as key-value pairs in Kubernetes
  • When a pod is created, one can inject the ConfigMap into the pod.
  • The key-value pairs are available to the pod as environment variables for the application within the pod.

  • Configuring ConfigMaps involves 2 phases:

  • Create the ConfigMap
  • Inject it to the pod.

  • Creation is achieved through standard means: kubectl create configmap <configmap name>

  • Or if a YAML file exists: kubectl create -f <filename>.yaml

  • Via the first method, one can automatically pass in key-value pairs:
    kubectl create configmap <configmap name> --from-literal=<key>=<value> --from-literal=<key2>=<value2>

  • Using the --from-literal=<key>=<value> flag multiple times allows multiple variables to be added.

  • Note: This becomes difficult when too many config items are present.
  • One can also create ConfigMaps from a file e.g.
    kubectl create configmap <configmap name> --from-file=/path/to/file

Creating a ConfigMap via Declaration

  • Create a definition file:
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  APP_COLOR: blue
  APP_MODE: prod
  • To create from the above: kubectl create -f <filename>.yaml

  • Can create as many ConfigMaps as required; just ensure they are named appropriately.

  • View ConfigMaps via kubectl get configmaps

  • Get detailed information of a ConfigMap via kubectl describe configmap <configmap name>
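
  • For reference, the imperative equivalent of the declarative app-config example above:

kubectl create configmap app-config --from-literal=APP_COLOR=blue --from-literal=APP_MODE=prod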

  • Configuring a pod with a ConfigMap:

  • In a pod definition file, under the container in spec, add the envFrom: list property.
  • Each item in the list references a ConfigMap to inject.

  • Example usage:

envFrom:
- configMapRef:
    name: app-config
  • Apply the ConfigMap and pod definitions with kubectl create -f <filename>.yaml, as in the example below
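
  • Example: a pod definition injecting the app-config ConfigMap (the simple-webapp image name is an assumption for illustration):

apiVersion: v1
kind: Pod
metadata:
  name: simple-webapp
spec:
  containers:
  - name: simple-webapp
    image: simple-webapp
    envFrom:
    - configMapRef:
        name: app-config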

Summary

  • ConfigMaps can be used to inject environmental variables into pods
  • Could also inject the data as a file or via a volume

env

envFrom:
- configMapRef:
    name: <configmap name>

single env

env:
- name: <env name>
  valueFrom:
    configMapKeyRef:
      name: <configmap name>
      key: <configmap key name>

Volumes

volumes:
- name: app-config-volume
  configMap:
    name: app-config

3.8 - Secrets

  • Considering a simple Python web server:
  • The hostname, username, and password are hardcoded - bad practice and a high security risk.
  • Based on the previous discussion, this data would be better stored in a ConfigMap - the problem though is that ConfigMap data is stored in a plaintext format.
  • Not suitable for sensitive info like passwords

  • Variables like usernames and passwords are better stored as Secrets in Kubernetes.

  • These are similar to ConfigMaps, except the values are stored in an encoded (base64) format.

  • Analogous to ConfigMaps, there are 2 steps:

  • Secret Creation
  • Inject secrets to a pod.

  • Secret creation is achieved either imperatively or declaratively:

  • Declarative: Use a YAML definition file to "declare" the desired configuration
  • Imperative: Use the kubectl create secret command with flags, letting Kubernetes figure out the desired configuration.

Imperative Secret Creation

  • kubectl create secret generic <secret name> --from-literal=<key>=<value>

  • As with ConfigMaps, data can be specified from the CLI in key-value pairs via the --from-literal flag, used multiple times.

  • Example: kubectl create secret generic app-secret --from-literal=DB_HOST=mysql --from-literal=DB_USER=root --from-literal=DB_PASSWORD=password

  • For larger amounts of secrets, the data can be imported from a file, achieved by using the --from-file flag.

  • Example: kubectl create secret generic app-secret --from-file=app-secret.properties

Declarative Secret Creation

  • Using a YAML definition file:
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
data:
  DB_HOST: bXlzcWw=
  DB_USER: cm9vdA==
  DB_PASSWORD: cGFzc3dvcmQ=
  • As discussed, secrets should not be stored in plaintext string format; values under data: must be base64-encoded (encoded, not encrypted).
  • To convert: echo -n '<secret plaintext value>' | base64
  • Create the secret via kubectl create -f .... as normal
  • Secrets can be viewed via: kubectl get secrets
  • Detailed information viewed via: kubectl describe secrets <secret name>

  • To view secret in more detail: kubectl get secret <secret name> -o yaml

  • To decode secret: echo -n '<secret base64 value>' | base64 --decode
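
  • Example, encoding and decoding the values used above:

echo -n 'mysql' | base64                    # bXlzcWw=
echo -n 'root' | base64                     # cm9vdA==
echo -n 'password' | base64                 # cGFzc3dvcmQ=
echo -n 'cGFzc3dvcmQ=' | base64 --decode    # password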

Secrets in Pods

  • With both a pod and secret YAML file, the secret data can be injected as environment variables:
spec:
  containers:
  - envFrom:
    - secretRef:
        name: <secret name>
  • When kubectl create -f ... is run, the secret data is available as environment variables in the pod.

  • As before, one can inject secrets to pods via environment variables (above) OR a single environment variable (below):

spec:
  containers:
  - env:
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: app-secret
          key: DB_PASSWORD

Secrets in Volumes

  • Secrets can also be added as volumes attached to pods:
volumes:
- name: app-secret-volume
  secret:
    secretName: app-secret
  • If mounting the secret as a volume, each attribute in the secret is created as a file, with the value being the content.
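
  • A minimal sketch of the corresponding mount (the mount path and the my-app image are assumptions for illustration):

spec:
  containers:
  - name: app
    image: my-app
    volumeMounts:
    - name: app-secret-volume
      mountPath: /opt/app-secret-volumes
      readOnly: true
  volumes:
  - name: app-secret-volume
    secret:
      secretName: app-secret

  • Inside the container, ls /opt/app-secret-volumes would then list DB_HOST, DB_USER and DB_PASSWORD, each file containing the decoded value.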

Additional Notes

  • Secrets are encoded in base64 format -> this can easily be decoded as they're not encrypted!
  • It's thought that secrets are a safer option, but this is only in the sense that "it's better than plaintext".

  • It's the practices regarding secret storage that makes them safer, including:

  • Not checking-in secret object definition files to source code repositories
  • Enabling encryption-at-rest for secrets

  • Kubernetes takes some actions to ensure safe handling of secrets:

  • A secret is only sent to a node if a pod on said node requires it
  • The kubelet stores the secret in tmpfs (in-memory storage) so it's not persisted to disk.
  • Once a pod is deleted, any local copies of secrets used by that pod are deleted.

  • For further improved safety regarding secrets, one could also use tools such as Helm Secrets and HashiCorp Vault.

Encrypting Secrets at Rest

  • Additional guidance can be found in the Kubernetes documentation.
  • Encryption at rest is enabled by configuring the kube-apiserver, which encrypts resources before writing them to etcd.

Secret Storage in ETCD

  • Creating a sample secret in Kubernetes: kubectl create secret generic secret1 -n default --from-literal=mykey=mydata
  • The secret can be read from etcd via the etcdctl utility:
ETCDCTL_API=3 etcdctl \
   --cacert=/etc/kubernetes/pki/etcd/ca.crt   \
   --cert=/etc/kubernetes/pki/etcd/server.crt \
   --key=/etc/kubernetes/pki/etcd/server.key  \
   get /registry/secrets/default/secret1 | hexdump -C
  • Without encryption at rest, the above command returns the secret value in plaintext format; this is a huge security risk.

Enabling Encryption at Rest

  • The kube-apiserver process accepts the flag --encryption-provider-config to determine API data encryption in ETCD.
  • To check if it's enabled: ps aux | grep kube-api | grep "encryption-provider-config" OR examine the kube-apiserver.yaml manifest for the same flag.
  • If not enabled, one can define an EncryptionConfiguration YAML manifest to attach to this flag.
  • An example (not suitable for production) from the documentation follows:
---
##
## CAUTION: this is an example configuration.
##          Do not use this for your own cluster!
##
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
      - configmaps
      - pandas.awesome.bears.example # a custom resource API
    providers:
      # This configuration does not provide data confidentiality. The first
      # configured provider is specifying the "identity" mechanism, which
      # stores resources as plain text.
      #
      - identity: {} # plain text, in other words NO encryption
      - aesgcm:
          keys:
            - name: key1
              secret: c2VjcmV0IGlzIHNlY3VyZQ==
            - name: key2
              secret: dGhpcyBpcyBwYXNzd29yZA==
      - aescbc:
          keys:
            - name: key1
              secret: c2VjcmV0IGlzIHNlY3VyZQ==
            - name: key2
              secret: dGhpcyBpcyBwYXNzd29yZA==
      - secretbox:
          keys:
            - name: key1
              secret: YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXoxMjM0NTY=
  - resources:
      - events
    providers:
      - identity: {} # do not encrypt Events even though *.* is specified below
  - resources:
      - '*.apps' # wildcard match requires Kubernetes 1.27 or later
    providers:
      - aescbc:
          keys:
          - name: key2
            secret: c2VjcmV0IGlzIHNlY3VyZSwgb3IgaXMgaXQ/Cg==
  - resources:
      - '*.*' # wildcard match requires Kubernetes 1.27 or later
    providers:
      - aescbc:
          keys:
          - name: key3
            secret: c2VjcmV0IGlzIHNlY3VyZSwgSSB0aGluaw==
  • As per usual, the first 2 lines determine the apiVersion and kind. From here, the general format is to list in arrays the resources and the provider(s) to be used in association with encryption and decryption.
  • Under providers, only the first provider listed is used to encrypt new writes; all listed providers may be used to decrypt data read from storage.
  • There are multiple providers available, such as identity (no encryption), kms, secretbox, etc.

  • Creating an example encryption configuration file, which leverages the aescbc encryption provider:

---
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
      - configmaps
      - pandas.awesome.bears.example
    providers:
      - aescbc:
          keys:
            - name: key1
              # See the following text for more details about the secret value
              secret: <BASE 64 ENCODED SECRET>
      - identity: {} # this fallback allows reading unencrypted secrets;
                     # for example, during initial migration
  • The secret for the aescbc provider can be a random 32-byte key encoded in base64 - head -c 32 /dev/urandom | base64.
  • Saving the encryption config file to a given path, it will then need to be mounted to the kube-apiserver static pod, achieved by editing the kube-apiserver manifest, adding the --encryption-provider-config=/path/to/config flag, and mounting the config's location as a volume.
---
##
## This is a fragment of a manifest for a static Pod.
## Check whether this is correct for your cluster and for your API server.
##
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 10.20.30.40:443
  creationTimestamp: null
  labels:
    app.kubernetes.io/component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    ...
    - --encryption-provider-config=/etc/kubernetes/enc/enc.yaml  # add this line
    volumeMounts:
    ...
    - name: enc                           # add this line
      mountPath: /etc/kubernetes/enc      # add this line
      readOnly: true                      # add this line
    ...
  volumes:
  ...
  - name: enc                             # add this line
    hostPath:                             # add this line
      path: /etc/kubernetes/enc           # add this line
      type: DirectoryOrCreate             # add this line
  ...
  • As a static pod is being edited, the API server should restart automatically; if it fails to do so, troubleshoot accordingly.
  • These steps should be repeated per control plane node, and for verification, the original steps with etcdctl can be repeated.
  • Note: To ensure all relevant data is encrypted, including that which is already stored, run the following command to update them with the relevant config as an administrator: kubectl get secrets --all-namespaces -o json | kubectl replace -f -

3.11 - Docker Security

  • Consider a host with Docker running on it.
  • The host will be running processes such as the Docker Daemon
  • Containers aren't completely isolated from their host -> they share the same kernel
  • In general, containers are separated by namespaces (Linux)
  • All processes run by a container are run by the host, just in their own namespace
  • The container can only see its own processes via ps aux

  • In the host, all processes are visible, container processes have differing IDs depending on their namespace

Security: Users

  • Docker hosts can have root users as well as non-root users
  • By default, Docker runs processes in containers as the root user.
  • True within and outside the container

  • To edit the default user for the container, use docker run in a similar manner to: docker run --user=<username> <container> <command>

  • To enforce security, one can add a USER instruction to the Dockerfile e.g. USER 1000

  • This sets the default user at image build time, applying whenever the container is run.
  • When running a container that defaults to the root user, Docker takes measures to prevent the root user from taking unnecessary actions via Linux Capabilities.

Linux Capabilities

  • Listed in /usr/include/linux/capability.h
  • In containers, Docker applies limited capabilities by default
  • To override, add --cap-add <CAPABILITY NAME> to the docker run command
  • To remove: --cap-drop <CAPABILITY NAME> in a similar manner
  • For all capabilities, add --privileged -> not recommended!

3.12 - Security Contexts

  • Container security may be configured by adding or specifying users and their associated capabilities in the docker run command.
  • Similar settings may also be handled via Kubernetes.

  • As containers are hosted within pods on Kubernetes, one can either configure security at the pod or container level.

  • If configured at pod level, any changes will automatically be applied to the containers within.
  • If configured at container level, these settings will override anything defined at pod level.

Security Context

  • To configure security in the definition file, add the securityContext: attribute
  • User is set by runAsUser: <user ID>
  • To configure at container-level, add the same fields to the containers list
  • To add capabilities, add capabilities:, then within it a list: add: ["<CAPABILITY NAME>", ...].
apiVersion: v1
kind: Pod
metadata:
  name: pod
spec:
  #securityContext:
  #    runAsUser: 1000
  containers:
  - name: ubuntu
    image: ubuntu
    command: ["sleep", "3600"]
    securityContext:
      runAsUser: 1000
      capabilities:
        add: ["MAC_ADMIN"]
  • Note: Capabilities are only supported at container-level, not pod-level.

3.14 - Service Account

  • Service accounts link to security concepts such as authentication, authorization, RBAC, etc.
  • One of 2 account types in Kubernetes, the other being a user account:
  • User accounts are used by humans e.g. development accounts.
  • Service accounts are those used by applications to interact with Kubernetes.

  • For an app to use/interact with the Kubernetes API, it needs to authenticate via service accounts.

  • Creation via: kubectl create serviceaccount <serviceaccount name>

  • View service accounts via kubectl get serviceaccount
  • When a serviceaccount is created, a service token is automatically created alongside it, to be used for authentication.

  • Token can be viewed (along with other details) via kubectl describe serviceaccount <serviceaccount name>

  • The token is stored as a Kubernetes secret by default; it can be viewed via kubectl describe secret <secret ID>

  • Suppose the app using the service account is already part of a Kubernetes cluster.

  • One can mount the service token secret as a volume inside the pod
  • This allows it to be easily accessible by the application

  • The default service account and its corresponding token are automatically mounted as a volume in the pod.

  • In the path of the mount, 3 files are stored, detailing:
  • Namespace
  • Token
  • ca.crt
  • The above can be viewed for a given pod via kubectl exec -it <pod name> -- ls /var/run/secrets/kubernetes.io/serviceaccount

  • Note: The default service account is restricted to basic operations.

  • Automatically created and mounted.
  • If you wish to switch it, edit the pod definition file to add serviceAccountName: under the spec:, then delete and recreate the pod.
  • Changes like this are automatically applied if editing a deployment.

  • Note: One can avoid automatic service account association via the addition of automountServiceAccountToken: false

  • Example:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  serviceAccountName: build-robot
  automountServiceAccountToken: false

Service Accounts Updates - 1.22/1.24

  • All namespaces have a default service account with its own secret token; when a pod is created, this service account is automatically associated with the pod, and the secret is mounted to a given location.
  • As a result of this, a process within the pod can query the Kubernetes API using the mounted token.
  • Checking the location where the secret is mounted, three files are found:
  • ca.crt
  • namespace
  • token - the ServiceAccount token.
  • The token can be decoded via jq (or some other means e.g. jwt.io) via the following command: jq -R 'split(".") | select(length > 0) | .[0],.[1] | @base64d | fromjson' <<< <token>
  • The output shows that the token has no expiry date defined in the payload section - this poses problems.

v1.22 Notes on Bound Service Account Tokens

  • Kubernetes already provisions JWTs to workloads, a functionality that is enabled by default, and therefore widely deployed - leading to the following problems:

  • These JWTs are not audience-bound -> Anyone with the JWT can pretend to be an authenticated user

  • The current model of storing the service account token in a secret delivered to nodes provides a large attack surface to the control plane nodes.
  • The JWTs are not time-bound: any JWT compromised via the previous 2 points is valid for as long as the service account exists.
  • Each JWT requires a Kubernetes secret -> Not Scalable

  • To overcome this, the TokenRequest API was introduced; it generates tokens that are audience-, time- and object-bound.

  • Since its introduction, when a new pod is created, it no longer relies on the service account's secret token; instead, a token with a defined lifetime is generated via the TokenRequest API by the ServiceAccount Admission Controller.
  • This token is then mounted as a projected volume into the pod.
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: default
spec:
  containers:
  - image: nginx
    name: nginx
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-<random string>
  volumes:
  - name: kube-api-access-<random string>
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace

v1.24 Enhancements

  • This version included changes to reduce the amount of secret-based service account tokens.
  • Previously, upon serviceaccount creation, a secret token with no expiry and no audience binding was automatically created, and mounted into pods upon their creation.
  • In v1.24, the serviceaccount token creation is no longer automatic, instead it must be created separately after serviceaccount creation i.e.:
  • kubectl create serviceaccount <serviceaccount name>
  • kubectl create token <serviceaccount name>
  • If this token is decoded via means defined previously, an expiry date can be seen in the payload.
  • The secrets can still be created with no expiry if desired, however this is REALLY not advised:
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: <secret name>
  annotations:
    kubernetes.io/service-account.name: <service account name>

3.17 - Resource Requirements

  • Consider a 3-node setup, each has a set amount of resources available i.e.:
  • CPU
  • Memory
  • Disk Space

  • The Kubernetes scheduler is responsible for allocating pods to nodes

  • To do so, it takes into account the node's current resource allocation and the resources requested by the pod.
  • If no node has sufficient resources available, the scheduler holds the pod back and it remains in a Pending state.
  • As a baseline, the course examples assume a pod or container within a pod will require at least:
  • 0.5 CPU Units
  • 256Mi Memory

  • If the pod or container requires more resources than allocated above, one can configure the pod definition file's spec, in particular, add the following under the containers list:

resources:
  requests:
    memory: "1Gi"
    cpu: 1

Resource - CPU

  • Can be set as low as 1m (one millicore, i.e. 0.001 CPU) and as high as required / supported by the host system.
  • 1 CPU = 1 AWS vCPU = 1 GCP Core = 1 Azure Core = 1 Hyperthread

Resources - Memory

  • Memory can be allocated using any of the following suffixes, depending on the amount needed and the system's capabilities:

Memory Metric   Shorthand Notation   Equivalency
Gigabyte        G                    1000M
Megabyte        M                    1000K
Kilobyte        K                    1000 bytes
Gibibyte        Gi                   1024Mi
Mebibyte        Mi                   1024Ki
Kibibyte        Ki                   1024 bytes
  • Docker containers have no inherent limit to the resources they can consume on a node, so one container can starve the rest.
  • If limits need setting or changing, update the pod definition file:
resources:
  requests:
    memory: <value and unit>
    cpu: <number>
  limits:
    memory: <value and unit>
    cpu: <number>
  • The limits and requests can be set for each pod and container
  • If a container attempts to exceed its CPU limit, its CPU usage is "throttled" so it does not go beyond the limit.
  • If a container exceeds its memory limit repeatedly over an extended period, the pod is terminated with an OOM (Out of Memory) error.
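
  • A minimal sketch with both requests and limits (the simple-webapp name is an assumption for illustration):

apiVersion: v1
kind: Pod
metadata:
  name: simple-webapp
spec:
  containers:
  - name: simple-webapp
    image: simple-webapp
    resources:
      requests:
        memory: "1Gi"
        cpu: 1
      limits:
        memory: "2Gi"
        cpu: 2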

Default Behavior

  • By default, Kubernetes doesn't set CPU or memory limits; pods are able to use as much of these resources as they like, potentially stopping other pods from functioning.

  • Rather than just putting blanket limits or requests in place, it's often advised to tailor the requests and limits per workload.

  • Alternatively, one could set requests without limits, ensuring that each pod is guaranteed its requested minimum while remaining free to use spare resources; note this doesn't prevent a single pod from consuming all unreserved capacity.

Limit Ranges

  • To ensure all pods are created with particular limits by default, one can create a LimitRange object, both CPU and Memory constraints can be defined by a single LimitRange:
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-resource-constraint
spec:
  limits:
  - default:
      cpu: 500m
    defaultRequest:
      cpu: 500m
    max:
      cpu: "1"
    min:
      cpu: 100m
    type: Container
  • Quotas can be set for a namespace itself by creating a ResourceQuota object - these are namespace-scoped:
apiVersion: v1
kind:  ResourceQuota
metadata:
  name: my-resource-quota
spec:
  hard:
    requests.cpu: 4
    requests.memory: 4Gi
    limits.cpu: 10
    limits.memory: 10Gi

3.19 - Taints and Tolerations

  • Used to set restrictions regarding what pods can be scheduled on a node.
  • Consider a cluster of 3 nodes with 4 pods preparing for launch:
  • The scheduler will place the pods across all nodes equally if no restriction applies

  • Suppose now only 1 node has resources available to run a particular application:

  • A taint can be applied to the node in question, preventing any intolerant pods from being scheduled on it.
  • Tolerations then need to be applied to the pod(s) intended to run on node 1

  • Pods can only run on a node if their tolerations match the taint applied to the node.

  • Taints and tolerations allow the scheduler to allocate pods to required nodes, such that all resources are used and allocated accordingly.

  • Note: By default, no tolerations are applied to pods.

Taints - Node

  • To apply a taint: kubectl taint nodes <nodename> key=value:<taint-effect>
  • The key-value pair defined could match labels defined for resources e.g. app=frontend
  • The taint effect determines what happens to pods that are intolerant to the taint, 1 of 3 possibilities can be specified:
  • NoSchedule - Pods won't be scheduled.
  • PreferNoSchedule - Try to avoid scheduling if possible.
  • NoExecute - New pods won't be scheduled, and any pre-existing pods intolerant to the taint are stopped and evicted.
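
  • Example: kubectl taint nodes node1 app=blue:NoSchedule (node1 is an illustrative node name)
  • To remove a taint, append - to the same specification: kubectl taint nodes node1 app=blue:NoSchedule-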

Tolerations - Pod

  • To apply a toleration to a pod, one can look at the definition file
  • In the spec section, add similar to the following:
...
spec:
  containers:
  ...
  tolerations:
  - key: "app"
    operator: "Equal"
    value: "blue"
    effect: "NoSchedule"
  • Be sure to apply the same values used when applying the taint to the node.
  • All values added need to be enclosed in " ".

Taint - NoExecute

  • Suppose Node1 is to be used for a particular application:
  • Apply a taint to node 1 with the app name and add a toleration to the pod running the app.
  • Setting the taint effect to NoExecute causes existing pods on the node that are intolerant to be stopped and evicted.

  • Taints and tolerations are only used to restrict pod access to nodes.

  • As there are no restrictions / taints applied to the other nodes, there's a chance the app's pod could still be placed on a different node.
  • If wanting the pod to go to a particular node, one can utilize node affinity.

  • Note: A taint is automatically applied to the master node, such that no pods can be scheduled to it.

  • View it via kubectl describe node kubemaster | grep Taint

3.20 - Node Selectors

  • Consider a 3-node cluster, with 1 node having a larger resource configuration:
  • In this scenario, one would like the task/process requiring more resources to go to the larger node.
  • To solve, can place limitations on pods
  • This can be done via the nodeSelector property in the definition file:
nodeSelector:
  size: Large
  • nodeSelector requires the node to be labelled first: kubectl label nodes <node name> <label key>=<label value>, e.g. size=Large

  • When the pod is created, it is assigned to the labelled node so long as resources allow it, as sketched below.
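
  • A minimal sketch, assuming a node labelled via kubectl label nodes node01 size=Large (the data-processor image name is an assumption for illustration):

apiVersion: v1
kind: Pod
metadata:
  name: data-processor
spec:
  containers:
  - name: data-processor
    image: data-processor
  nodeSelector:
    size: Large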

Limitations of NodeSelectors

  • NodeSelectors are beneficial for simple allocation tasks, but if more complex allocation is needed, Node Affinity is recommended, e.g. "go to either 1 of 2 nodes".

3.22 - Node Affinity

  • Node affinity looks to ensure that pods are hosted on the desired nodes
  • Can ensure high-resource consumption jobs are allocated to high-resource nodes

  • Node affinity allows more complex capabilities regarding pod-node limitation.

  • To specify, in the spec section of a pod definition file add in a new field:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: size
          operator: In
          values:
          - Large
  • Note: For the example above, the NotIn operator could also be used to avoid particular nodes.
  • Note: If just needing a pod to go to any node with a particular label, regardless of value, use the Exists operator -> no values are required in this case.

  • Additional operators are available, with further details provided in the documentation.
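
  • For example, the Exists form of the affinity rule above (the label key size is carried over from the earlier example):

      nodeSelectorTerms:
      - matchExpressions:
        - key: size
          operator: Exists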

  • If a pod cannot be allocated to a matching node (e.g. due to a label fault), the resulting action depends upon the node affinity type set.

Node Affinity Types

  • Defines the scheduler's behavior regarding Node Affinity and pod lifecycle stages

  • 2 main types available:

  • requiredDuringSchedulingIgnoredDuringExecution
  • preferredDuringSchedulingIgnoredDuringExecution

  • Other types are to be released such as requiredDuringSchedulingRequiredDuringExecution

  • Considering the 2 available types, one can break them down into the 2 stages of a pod's lifecycle:

  • DuringScheduling -> The pod has been created for the first time and is not yet placed on a node
  • DuringExecution -> The pod is already running on a node

  • If the node isn't available according to the NodeAffinity, the resultant action is dependent upon the NodeAffinity type:

  • Required:

  • Pod must be placed on a node that satisfies the node affinity criteria
  • If no node satisfies the criteria, the pod won't be scheduled
  • Generally used when the node placement is crucial

  • Preferred:

  • Used if the pod placement is less important than the need for running the task
  • If a matching node is not found, the scheduler ignores the NodeAffinity
  • Pod placed on any available node

  • Suppose a pod has been running and a change is made to the Node Affinity:

  • The response is determined by the prefix of DuringExecution:
    • Ignored:
    • Pods continue to run
    • Any changes in Node Affinity will have no effect once scheduled.
    • Required:
    • When applied, any current pods that don't meet the NodeAffinity requirements are evicted.

Taints and Tolerations vs Node Affinity

  • Consider a cluster of 5 nodes:
  • Blue Node: Runs the blue pod
  • Red Node: Runs the red pod
  • Green Node: runs the green pod
  • Node 1: To run the grey pod
  • Node 2: To run the grey pod

  • Apply a taint to each of the colored nodes so each accepts only its respective pod

  • Tolerations are then applied to the pods

  • Taints alone aren't enough: node 1 and node 2 would also need taints, as the colored pods can still be allocated to those untainted nodes.

  • To overcome, use Node Affinity:

  • Label nodes with respective colors
  • Pods end up on the correct nodes via use of node selectors / node affinity.

  • However, node affinity alone doesn't prevent unwanted placement: the grey pods could still be scheduled on the colored nodes.

  • A combination of taints and tolerations, and node affinity must be used.

  • Apply taints and tolerations to prevent unwanted pods from being placed on the dedicated nodes
  • Use node affinity to prevent the colored pods from being placed on the incorrect nodes.