How to Ensure Cluster Reliability with Chaos Mesh on Kubernetes

Updated on July 25, 2024

Introduction

Chaos engineering is a fault-detection practice that intentionally injects faults to test a system's reliability and resilience, identifying potential failures before they result in outages. Chaos Engineering gives engineers a better understanding of how a system reacts to stress so they can reduce or eliminate failures more efficiently.

Chaos Engineering benefits system reliability in several ways, including:

  • Weakness Identification: Chaos Engineering intentionally induces failures to reveal weaknesses in distributed systems, helping engineers understand how a system behaves under stress and how to recover from failures effectively.
  • Downtime Reduction: Identifying and addressing potential failure points reduces downtime occurrences and strengthens the system against unexpected conditions.
  • Improved Resilience: Chaos experiments give engineers insights into system strengths and weaknesses that inform decisions to improve resilience.

Chaos Mesh is an open-source chaos engineering platform that performs fault simulation and injection tasks in Kubernetes environments. Chaos Mesh supports multiple fault types, such as network latency, packet loss, pod deletion, and CPU and memory exhaustion, to simulate real-world failure scenarios and evaluate a cluster's resilience in different circumstances.

This article explains how to install Chaos Mesh on Kubernetes using the Vultr Kubernetes Engine (VKE). You will run NetworkChaos experiments to inject transient failures into HTTP Kubernetes services using kubectl and the Chaos Mesh dashboard.

Prerequisites

Before you begin:

  • Deploy a Vultr Kubernetes Engine (VKE) cluster.
  • Deploy a management machine with kubectl installed and configured to access the cluster.
  • Install the Helm package manager on your management machine.
  • Own a domain name and have access to its DNS records.

Deploy Sample HTTP Services

To test the Chaos Mesh fault-injection functionalities, follow the steps below to deploy two sample HTTP services, http-server-first and http-server-second, that you will later inject with network failures within your cluster.

  1. Create a new Deployment YAML file http-server-first.yaml to specify the first HTTP service pod template.

    console
    $ nano http-server-first.yaml
    
  2. Add the following configurations to the file.

    yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: http-server-first
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: http-server-first
      template:
        metadata:
          labels:
            app: http-server-first
        spec:
          containers:
          - name: http-server-first
            image: httpd:latest
            ports:
            - containerPort: 80
            env:
            - name: MESSAGE
              value: "Welcome to the first http server!"
    

    Save and close the file.

    The above deployment file creates three replicas of an HTTP service using the Apache httpd:latest image and sets a MESSAGE environment variable identifying the first server.

  3. Apply the deployment to your Kubernetes cluster.

    console
    $ kubectl apply -f http-server-first.yaml
    
  4. Create a new Deployment file to define the second HTTP service http-server-second pod template.

    console
    $ nano http-server-second.yaml
    
  5. Add the following configurations to the file.

    yaml
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: http-server-second
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: http-server-second
      template:
        metadata:
          labels:
            app: http-server-second
        spec:
          containers:
          - name: http-server-second
            image: httpd:latest
            ports:
            - containerPort: 80
            env:
            - name: MESSAGE
              value: "Welcome to our second http server!"
    

    Save and close the file.

  6. Apply the deployment to your cluster.

    console
    $ kubectl apply -f http-server-second.yaml
    
  7. View the cluster deployments and verify that the new resources are ready and available.

    console
    $ kubectl get deployment
    

    Output:

    NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
    http-server-first                        3/3     3            3           9m21s
    http-server-second                       3/3     3            3           9m16s
  8. Create a new Service YAML file to expose the sample deployments within the cluster.

    console
    $ nano service.yaml
    
  9. Add the following configurations to the file.

    yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: http-server-first
    spec:
      selector:
        app: http-server-first
      ports:
        - protocol: TCP
          port: 80
          targetPort: 80
    
    ---
    
    apiVersion: v1
    kind: Service
    metadata:
      name: http-server-second
    spec:
      selector:
        app: http-server-second
      ports:
        - protocol: TCP
          port: 80
          targetPort: 80
    

    Save and close the file.

    The above configuration creates a new Service resource for each HTTP application, exposing each on internal port 80.

  10. Apply the resource to your cluster.

    console
    $ kubectl apply -f service.yaml
    
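
    Optionally, verify that the services respond inside the cluster before exposing them. For example, a temporary busybox pod (a quick check, assuming the services run in the default namespace) should return the default Apache test page:

    console
    $ kubectl run http-test --rm -it --restart=Never --image=busybox -- wget -qO- http://http-server-first
    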

Expose the HTTP Services for External Access

To access the sample HTTP services from outside the cluster, follow the steps below to install the Nginx Ingress Controller, which provisions an external Vultr Load Balancer IP address to use with the domains associated with the cluster services.

  1. Add the Nginx Ingress repository to your Helm sources.

    console
    $ helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
    
  2. Update the Helm repositories.

    console
    $ helm repo update
    
  3. Install the Nginx Ingress Controller to your cluster using Helm.

    console
    $ helm install nginx-ingress ingress-nginx/ingress-nginx --set controller.publishService.enabled=true
    
  4. Wait for at least 5 minutes for the Nginx Ingress Controller deployment process to complete. Then, view the cluster services to verify the external Load Balancer IP address assigned to the Nginx Ingress Controller service.

    console
    $ kubectl get service --namespace default nginx-ingress-ingress-nginx-controller
    

    Your output should look like the one below:

    NAME                                      TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
    nginx-ingress-ingress-nginx-controller   LoadBalancer   10.98.94.144   192.0.2.102   80:32671/TCP,443:32325/TCP   2m58s

    Note the external IP address value, for example, 192.0.2.102, to use with your domain records. Verify that the assigned IP address matches the Vultr Load Balancer attached to your VKE cluster.

  5. Access your domain DNS configuration page and point your subdomain A records to the load balancer IP address. Replace 192.0.2.102 with your actual public IP address.

    • first.example.com:192.0.2.102
    • second.example.com:192.0.2.102
    • dashboard.example.com:192.0.2.102
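
    After updating the records, you can confirm DNS propagation before continuing. For example, query one of the new records and verify that it resolves to the load balancer IP address:

    console
    $ nslookup first.example.com
    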
  6. Create a new Ingress resource file ingress.yaml to expose the HTTP services.

    console
    $ nano ingress.yaml
    
  7. Add the following configurations to the file. Replace first.example.com and second.example.com with your actual domains.

    yaml
    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: kubernetes-ingress
      annotations:
        kubernetes.io/ingress.class: nginx
    spec:
      rules:
      - host: "first.example.com"
        http:
          paths:
          - pathType: Prefix
            path: "/"
            backend:
              service:
                name: http-server-first
                port:
                  number: 80
      - host: "second.example.com"
        http:
          paths:
          - pathType: Prefix
            path: "/"
            backend:
              service:
                name: http-server-second
                port:
                  number: 80

    Save and close the file.

    The above Ingress configuration forwards all requests from first.example.com and second.example.com to the matching HTTP services on port 80.

  8. Apply the Ingress resource to your cluster.

    console
    $ kubectl apply -f ingress.yaml
    
  9. View the cluster Ingress objects and verify that the new resource is available.

    console
    $ kubectl get ingress
    

    Output:

    NAME                 CLASS    HOSTS                                    ADDRESS         PORTS   AGE
    kubernetes-ingress   <none>   first.example.com,second.example.com                      80      3m
  10. To test the HTTP services, visit your first domain using a web browser such as Firefox or Chrome.

    http://first.example.com

    First HTTP service

  11. Visit the second service domain in a new browser window to verify access to the HTTP service.

    http://second.example.com
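
    If you prefer the command line, you can also verify the Ingress routing with a curl request that sets an explicit Host header. This works even before DNS propagates. Replace 192.0.2.102 with your actual load balancer IP address:

    console
    $ curl -H "Host: first.example.com" http://192.0.2.102
    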

Install Chaos Mesh to the Kubernetes Cluster

  1. Add the Chaos Mesh Helm repository to your sources.

    console
    $ helm repo add chaos-mesh https://charts.chaos-mesh.org
    
  2. Update the Helm repository index.

    console
    $ helm repo update
    
  3. Create a new Chaos Mesh namespace chaos-mesh in your cluster.

    console
    $ kubectl create ns chaos-mesh
    
  4. Create a new values.yaml file to customize the Chaos Mesh configuration by overriding the default values.

    console
    $ nano values.yaml
    
  5. Add the following configurations to the file.

    yaml
    chaosDaemon:
      runtime: containerd
      socketPath: /run/containerd/containerd.sock
    

    The above configuration overrides the default Chaos Mesh cluster details with the following values:

    • chaosDaemon: Configures the Chaos Daemon component that runs on each cluster node.
    • runtime: containerd: Sets containerd as the container runtime the Chaos Daemon interacts with.
    • socketPath: /run/containerd/containerd.sock: Specifies the Unix socket path the Chaos Daemon uses to communicate with the container runtime.

    You can confirm the container runtime your nodes use before deploying, as shown below.
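
    To check, view the runtime version each node reports. If your cluster uses containerd, the CONTAINER-RUNTIME column shows a containerd:// version string:

    console
    $ kubectl get nodes -o wide
    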
  6. Deploy Chaos Mesh to the cluster using the custom values.yaml configurations.

    console
    $ helm install chaos-mesh chaos-mesh/chaos-mesh -n=chaos-mesh -f values.yaml
    
  7. View all Pods in the Chaos Mesh namespace and verify that all resources are running.

    console
    $ kubectl get pods --namespace chaos-mesh -l app.kubernetes.io/instance=chaos-mesh
    

    Output:

    NAME                                       READY   STATUS    RESTARTS   AGE
    chaos-controller-manager-9d9895f4f-fd92d   1/1     Running   0          77s
    chaos-controller-manager-9d9895f4f-hpz8p   1/1     Running   0          77s
    chaos-controller-manager-9d9895f4f-wvq7m   1/1     Running   0          77s
    chaos-daemon-7ng72                         1/1     Running   0          78s
    chaos-daemon-nvklg                         1/1     Running   0          78s
    chaos-daemon-vjj6z                         1/1     Running   0          78s
    chaos-dashboard-54c7d9d-8zh4j              1/1     Running   0          77s
    chaos-dns-server-66d757d748-h9mxp          1/1     Running   0          77s
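
    Optionally, verify that the Chaos Mesh custom resource definitions (CRDs) are installed. The experiment types used in this article, such as NetworkChaos and Schedule, are registered under the chaos-mesh.org API group:

    console
    $ kubectl get crd | grep chaos-mesh.org
    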

Run a Chaos Experiment

To test the resilience of your Kubernetes applications, you can perform different types of chaos experiments with Chaos Mesh, such as NetworkChaos, DNSChaos, KernelChaos, and PodChaos. Follow the steps below to create an experiment using NetworkChaos that injects network failures into the existing HTTP services within the cluster.

  1. Create a new experiment YAML file network-corruption.yaml to define the NetworkChaos configuration.

    console
    $ nano network-corruption.yaml
    
  2. Add the following configurations to the file.

    yaml
    kind: NetworkChaos
    apiVersion: chaos-mesh.org/v1alpha1
    metadata:
      namespace: default
      name: network-corruption
    spec:
      selector:
        namespaces:
          - default
        labelSelectors:
          app: http-server-first
      mode: all
      action: corrupt
      corrupt:
        corrupt: '100'
      direction: to
    

    Save and close the file.

    The above configuration sets up a NetworkChaos experiment in the default cluster namespace that targets the http-server-first application pods and injects 100% corruption into the pods' network traffic.

  3. Apply the experiment to your cluster.

    console
    $ kubectl apply -f network-corruption.yaml
    
  4. Wait for at least 3 minutes and access the target HTTP application domain in your web browser.

    http://first.example.com

    Verify that the application returns a 504 Gateway Time-out error because of the active NetworkChaos experiment with a high corruption percentage (100%).

    first HTTP service error page
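
    You can also inspect the experiment from the command line. For example, describe the NetworkChaos object to view its current phase and targeted records:

    console
    $ kubectl describe networkchaos network-corruption
    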

  5. Run the following command to pause the experiment by annotating the NetworkChaos object.

    console
    $ kubectl annotate networkchaos network-corruption experiment.chaos-mesh.org/pause=true --overwrite
    
  6. Return to your browser, refresh the page, and verify that you can access the first HTTP service again.

    first HTTP service page
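
    To resume the paused experiment later, remove the pause annotation by appending a dash to the annotation key:

    console
    $ kubectl annotate networkchaos network-corruption experiment.chaos-mesh.org/pause-
    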

Schedule Chaos Experiments

Chaos Mesh supports scheduled experiments to continuously validate cluster reliability. Follow the steps below to create and schedule a NetworkChaos experiment that automates network corruption of the first HTTP service in your cluster.

  1. Create a new resource file network-corruption-schedule.yaml to define an experiment schedule.

    console
    $ nano network-corruption-schedule.yaml
    
  2. Add the following configurations to the file.

    yaml
    apiVersion: chaos-mesh.org/v1alpha1
    kind: Schedule
    metadata:
      name: network-corruption-schedule
    spec:
      schedule: '* * * * *'
      historyLimit: 2
      type: 'NetworkChaos'
      networkChaos:
        action: corrupt
        mode: all
        selector:
          namespaces:
            - default
          labelSelectors:
            app: http-server-first
        corrupt:
          corrupt: '100'
          correlation: '0'
        direction: to
        duration: '10s'
    

    Save and close the file.

    The above schedule runs a NetworkChaos experiment every minute that corrupts the first HTTP service's network traffic for 10 seconds per run. Within the configuration:

    • schedule: '* * * * *': Specifies the schedule to run every minute using the Cron syntax.
    • historyLimit: Sets the maximum number of completed schedules to keep within the cluster. The value 2 keeps two completed schedules.
    • action: Specifies the action to perform on the target application's network traffic.
    • labelSelectors: Defines the target application to apply the chaos experiment.
    • corrupt: Specifies the traffic percentage to corrupt with the experiment. The value 100 corrupts all network packets.
    • duration: Specifies how long the chaos experiment runs. The value 10s sets the corruption duration to 10 seconds.
  3. Apply the configuration to your cluster.

    console
    $ kubectl apply -f network-corruption-schedule.yaml
    
  4. View all available NetworkChaos objects in the cluster.

    console
    $ kubectl get networkchaos -w
    
  5. Verify that a new chaos experiment runs every minute, similar to the output below.

    NAME                 ACTION    DURATION
    network-corruption   corrupt   
    network-corruption-schedule-k576x   corrupt   10s
    network-corruption-schedule-k576x   corrupt   10s
    network-corruption-schedule-k576x   corrupt   10s
    network-corruption-schedule-k576x   corrupt   10s
    network-corruption-schedule-k576x   corrupt   10s
    network-corruption-schedule-k576x   corrupt   10s
    network-corruption-schedule-k576x   corrupt   10s
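
    Schedules support the same pause annotation used earlier. If you want to stop new experiments from the CLI instead of the dashboard (used later in this article), annotate the Schedule object:

    console
    $ kubectl annotate schedule network-corruption-schedule experiment.chaos-mesh.org/pause=true --overwrite
    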
  6. Access the HTTP service in your web browser to verify the experiment processes.

    http://first.example.com
  7. Verify that a 504 Gateway Time-out error displays in your browser session during the first 10 seconds of every minute.

    first HTTP service error page

Access the Chaos Mesh Dashboard

Chaos Mesh includes a web dashboard you can use to create and manage chaos experiments, monitor their progress, and analyze the results using a graphical interface. Follow the steps below to enable the Chaos Mesh dashboard in your cluster, set up an Ingress resource to expose the application, and access it using your domain.

  1. Open the values.yaml file.

    console
    $ nano values.yaml
    
  2. Add the following configurations at the end of the file. Replace dashboard.example.com with your actual domain.

    yaml
    dashboard:
      ingress:
        enabled: true
        ingressClassName: nginx
        hosts:
        - name: dashboard.example.com
    

    The above configuration enables the Chaos Mesh dashboard using the Nginx Ingress controller with the domain dashboard.example.com for external access.

  3. Apply the modified values.yaml to your cluster.

    console
    $ helm upgrade chaos-mesh chaos-mesh/chaos-mesh -n=chaos-mesh -f values.yaml
    
  4. View all pods in the chaos-mesh namespace and verify that the new dashboard resources are ready.

    console
    $ kubectl get pods --namespace chaos-mesh -l app.kubernetes.io/instance=chaos-mesh
    

    Output:

    NAME                                        READY   STATUS    RESTARTS   AGE
    chaos-controller-manager-85498c8b4c-kbdcw   1/1     Running   0          88s
    chaos-controller-manager-85498c8b4c-rl8s2   1/1     Running   0          84s
    chaos-controller-manager-85498c8b4c-sh9q9   1/1     Running   0          93s
    chaos-daemon-cnvb4                          1/1     Running   0          94s
    chaos-daemon-jnhkg                          1/1     Running   0          84s
    chaos-daemon-pdgv7                          1/1     Running   0          90s
    chaos-dashboard-54c7d9d-4zml4               1/1     Running   0          94s
    chaos-dns-server-66d757d748-h9mxp           1/1     Running   0          55m
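
    Additionally, verify that a new Ingress resource for the dashboard is available in the chaos-mesh namespace with your dashboard domain listed under HOSTS:

    console
    $ kubectl get ingress --namespace chaos-mesh
    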
  5. The Chaos Mesh dashboard requires an access token for authentication. Create a new RBAC configuration file dashboard-rbac.yaml to define cluster management resources.

    console
    $ nano dashboard-rbac.yaml
    
  6. Add the following configurations to the file.

    yaml
    kind: ServiceAccount
    apiVersion: v1
    metadata:
      namespace: default
      name: account-cluster-manager-dashboard

    ---

    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: role-cluster-manager-dashboard
    rules:
    - apiGroups: [""]
      resources: ["pods", "namespaces"]
      verbs: ["get", "watch", "list"]
    - apiGroups: ["chaos-mesh.org"]
      resources: ["*"]
      verbs: ["get", "list", "watch", "create", "delete", "patch", "update"]

    ---

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: bind-cluster-manager-dashboard
    subjects:
    - kind: ServiceAccount
      name: account-cluster-manager-dashboard
      namespace: default
    roleRef:
      kind: ClusterRole
      name: role-cluster-manager-dashboard
      apiGroup: rbac.authorization.k8s.io
    

    Save and close the file.

    Within the configuration:

    • ServiceAccount: Authenticates and enables access to resources within the Kubernetes cluster.
    • ClusterRole: Defines a new ClusterRole role-cluster-manager-dashboard that specifies the Kubernetes and Chaos Mesh resource access permissions.
    • ClusterRoleBinding: Binds the ServiceAccount to the ClusterRole and grants all defined permissions.
  7. Apply the configuration to your cluster.

    console
    $ kubectl apply -f dashboard-rbac.yaml
    
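
    Optionally, confirm that the binding grants the expected permissions by querying the Kubernetes authorization layer as the new service account. The command should return yes:

    console
    $ kubectl auth can-i list pods --as=system:serviceaccount:default:account-cluster-manager-dashboard
    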
  8. Create a new access token using the account-cluster-manager-dashboard service account.

    console
    $ kubectl create token account-cluster-manager-dashboard
    

    Copy the generated access token to your clipboard. Your output should be similar to the one below:

    eyJhbGciOiJSUzI1NiIsImtpZCI6IkNIaUZfU3RUc3lxanpvTVF2b0x3aWZZU2pibzg3dHVrSGtodEtROGRDa0kifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm51bHQuc3ZjIiwic3lzdGVtOmtvbm5lY3Rpdml0eS1zZXJ2ZXIiXSwiZXhwIjoxNzEwMzk0NjcwLCJpYXQiOjE3MTAzOTEwNzAsImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2YyIsImt1YmVybmV0ZXMuaW8iOnsibmFtZXNwYWNlIjoiZGVmYXVsdCIsInNlcnZpY2VhY2NvdW50Ijp7Im5hbWUiOiJhY2NvdW50LWNsdXN0ZXItbWFuYWdlci1kYXNoYm9hcmQiLCJ1aWQiOiI1NzM4ZjMyZC0wZGY1LTQ3MTEtYjBmNy05NDk4NTMzMTI0NWYifX0sIm5iZiI6MTcxMDM5MTA3MCwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50OmRlZmF1bHQ6YWNjb3VudC1jbHVzdGVyLW1hbmFnZXItZGFzaGJvYXJkIn0.afhENuTfKenZp_OfzlN4MtnPiBBEKYgU1wvozZcAnRnA1Dv5x_VqVWHYprDHnvTQEWGD_XDJwA9YE_VIV5IVUEMIYX2JD_ts3R-6IDm8hv4NsIO8XBzsQrtKO6SO6uDAkNvmZtIDTjpxWMijPCbzA6fYWk8-ZAiIg5AdJzhXKQZOoTp9S5w5lxmXY3-eBtklciup05zkGnG5jdjSJOPfT0sfBCQKFvZp8j7133aqF6cpSIzaunUrwSHSNl3VowWRfbgXg6_keGzMbkJyxgZHc2Y9xGMrmxfmMFaxTuraw7Z7OJIBrg5CN9TUJktlHP8teHZ_ZQ1ka6IfXAAZpBSiSg
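
    Tokens created this way are short-lived by default. If your dashboard session expires, generate a new token, or request a longer validity period with the --duration flag. For example:

    console
    $ kubectl create token account-cluster-manager-dashboard --duration=24h
    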
  9. Access the Chaos Mesh dashboard domain in a new web browser window to use the web interface.

    http://dashboard.example.com
  10. Paste your generated access token in the Token field, and enter your desired token name in the Name field.

    Enter Chaos Mesh token

  11. Click Submit to log in and access the Chaos Mesh dashboard.

    Chaos Mesh dashboard

Run a Chaos Experiment using the Chaos Mesh Dashboard

Follow the steps below to create, run, and monitor a chaos experiment using the graphical interface, in a process similar to the CLI procedure.

  1. Click Schedules on the left navigation menu to view all Scheduled experiments within the cluster.

    List schedule experiments

  2. Click the Pause button and select Confirm to pause the scheduled experiment.

  3. Click Experiments on the left navigation bar to view all deployed experiments.

    List all experiments

    Verify that all experiments appear in a paused state based on your earlier schedule.

  4. Click New experiment to create a new experiment and keep Kubernetes as the injection platform. Then, select Network Attack to set the experiment type.

    Select experiment type

  5. Click Corrupt as the experiment action, enter 100 as the corruption percentage, and click Submit to open the experiment information properties.

    Select experiment action

  6. Click the Namespace Selectors field and select default to use the default cluster namespace. Enter your desired experiment name in the Metadata field, select the http-server-second HTTP service in the Label Selectors field, and activate the Run continuously toggle. Then, click Submit to save the experiment information.

    Experiment information

  7. Click Submit in the New experiment summary to apply the Chaos experiment.

    Confirm all information

  8. Click the new running experiment on your Experiments page to view more information about the associated cluster events.

    View new Chaos experiments

  9. Click the running experiment to see more information about your experiment.

    View experiments information

  10. Access your target HTTP service domain in a browser window and verify that a 504 Gateway Time-out error displays in your session.

    http://second.example.com

    Second HTTP service error page

Run Chaos Workflows Using Kubectl

Chaos workflows allow you to deploy a set of chaos experiments at once to intentionally disrupt a system in a controlled way. In addition, you can track the workflow's progress, monitor individual experiment outcomes, and analyze the chaos impact on your system. Follow the steps below to run a Chaos workflow that combines a NetworkChaos experiment with a 9000 ms delay and a StatusCheck that continuously monitors the second HTTP service's availability within the cluster.

  1. Create a new chaos workflow configuration file.

    console
    $ nano networkchaos-statuscheck-workflow.yaml
    
  2. Add the following configurations to the file.

    yaml
    apiVersion: chaos-mesh.org/v1alpha1
    kind: Workflow
    metadata:
      name: networkchaos-statuscheck-workflow
    spec:
      entry: the-entry
      templates:
      - name: the-entry
        templateType: Parallel
        deadline: 180s
        children:
        - workflow-status-check
        - workflow-network-chaos
      - name: workflow-status-check
        templateType: StatusCheck
        deadline: 180s
        abortWithStatusCheck: true
        statusCheck:
          mode: Continuous
          type: HTTP
          intervalSeconds: 3
          failureThreshold: 2
          http:
            url: http://second.example.com
            method: GET
            criteria:
              statusCode: "200"
      - name: workflow-network-chaos
        templateType: NetworkChaos
        deadline: 180s
        networkChaos:
          direction: to
          action: delay
          mode: all
          selector:
            labelSelectors:
              "app": "http-server-second"
          delay:
            latency: "9000ms"
    

    Save and close the file.

    The above Chaos Workflow runs two templates in parallel: a continuous status check against the HTTP endpoint http://second.example.com, and a NetworkChaos experiment that delays traffic to the pods with the app=http-server-second label. Because the delay causes the status check to fail, the workflow aborts. Within the configuration:

    • name: networkchaos-statuscheck-workflow: Specifies the Chaos Workflow name.
    • entry: the-entry: Specifies the workflow starting point.
    • templateType: Parallel: Executes the template tasks in parallel.
    • deadline: 180s: Sets a deadline of 180 seconds to complete the task.
    • children: Specifies the child templates to execute in parallel.
    • workflow-status-check: Defines the first child template that performs the status check.
    • workflow-network-chaos: Defines the second child template that introduces network chaos.
    • abortWithStatusCheck: true: Stops the workflow when the status check fails.
    • mode: Continuous: Enables the status monitoring process to continue up to the deadline.
    • type: HTTP: Sets an HTTP request as the status check type.
    • intervalSeconds: 3: Sets the status check interval between attempts to 3 seconds.
    • failureThreshold: 2: Marks the status check as failed if two consecutive failures occur.
    • http: Sets the HTTP status check parameters.
    • url: http://second.example.com: Sets the target HTTP endpoint.
    • method: GET: Uses the GET HTTP request method.
    • criteria: Specifies the criteria that determine the status check result.
    • statusCode: "200": Marks the status check as successful when the HTTP response status code is 200.
    • direction: Specifies the direction of the affected network traffic.
    • action: Sets the network chaos type. The value delay introduces latency into network traffic.
    • mode: all: Applies the network chaos to all pods matching the selector.
    • selector: Specifies the target pods affected by the network chaos.
    • labelSelectors: Specifies the labels used to select the target pods.
    • "app": Selects the target pods by label value. The value http-server-second applies the experiment to the second HTTP service pods.
    • delay: Sets the network delay parameters.
    • latency: "9000ms": Sets the network delay's latency to 9000 milliseconds.
  3. Apply the workflow configuration to your cluster.

    console
    $ kubectl apply -f networkchaos-statuscheck-workflow.yaml
    
  4. View all workflows and verify that the new resource is available in your cluster.

    console
    $ kubectl get workflow
    

    Output:

    NAME                                AGE
    networkchaos-statuscheck-workflow   7s
  5. View all workflow nodes associated with the Chaos Workflow networkchaos-statuscheck-workflow.

    console
    $ kubectl get workflownode --selector="chaos-mesh.org/workflow=networkchaos-statuscheck-workflow"
    

    Output:

    NAME                           AGE
    the-entry-wjvjm                45s
    workflow-network-chaos-ndch6   45s
    workflow-status-check-jmbz2    45s
  6. Describe the status, metadata and events associated with any of the workflow nodes. For example, the-entry-wjvjm.

    console
    $ kubectl describe workflownode the-entry-wjvjm
    

    Your output should be similar to the one below:

    Type    Reason           Age   From                                Message
    
    ----    ------           ----  ----                                -------
    
    Normal  NodesCreated     69s   workflow-parallel-node-reconciler   child nodes created, workflow-status-check-jmbz2,workflow-network-chaos-ndch6
    Normal  WorkflowAborted  58s   workflow-abort-workflow-reconciler  abort the node because workflow networkchaos-statuscheck-workflow aborted

    Based on the above output, the status check fails shortly after the workflow starts and aborts the Workflow, because the high latency value of 9000 ms (9 seconds) prevents the HTTP checks from succeeding.

  7. Delete the above Workflow from the Kubernetes cluster to test a lower latency value.

    console
    $ kubectl delete -f networkchaos-statuscheck-workflow.yaml
    
  8. Open the Workflow configuration file and set the latency value of the NetworkChaos to a much smaller value.

    console
    $ nano networkchaos-statuscheck-workflow.yaml
    
  9. Find the delay section and change the latency value from 9000ms to 60ms.

    yaml
    delay:
     latency: "60ms"
    
  10. Apply the modified Workflow to your cluster.

    console
    $ kubectl apply -f networkchaos-statuscheck-workflow.yaml
    
  11. Wait for at least 180 seconds (3 minutes), the specified Workflow deadline value, then view the workflow events.

    console
    $ kubectl describe workflow networkchaos-statuscheck-workflow
    

    Output:

    Events:
     Type    Reason                Age   From                       Message
    
     ----    ------                ----  ----                       -------
    
     Normal  EntryCreated          3m    workflow-entry-reconciler  entry node created, entry node the-entry-rgb9j
     Normal  WorkflowAccomplished  1s    workflow-entry-reconciler  workflow accomplished

    Based on the above output, the workflow completes successfully once the 180-second deadline elapses.
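
    When you finish experimenting, you can remove the chaos resources created in this article to return the cluster to normal operation:

    console
    $ kubectl delete -f network-corruption.yaml -f network-corruption-schedule.yaml -f networkchaos-statuscheck-workflow.yaml
    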

Conclusion

You have deployed Chaos Mesh on a Vultr Kubernetes Engine (VKE) cluster and used its features to orchestrate chaos experiments. Chaos Mesh provides valuable insights into your Kubernetes cluster's behavior that help you improve the reliability and availability of your infrastructure. For more information, visit the Chaos Mesh documentation.