Extracting useful information from your Kubernetes cluster with custom-columns and jq

How to build custom queries for your Kubernetes cluster objects and how to create your own query collection.

> Image by max_duz at unsplash.com/photos/qAjJk-un3BI

When working with Kubernetes, we frequently query cluster objects such as nodes, deployments, builds and pods, and the information we need is not always exposed by default via kubectl get. In those cases we end up dumping the entire object and sifting through far more data than we actually want.

Using kubectl's custom-columns output option and the jq tool, we can create queries that return exactly what we want. In this article, we'll explore both and learn how to build your own query collection.

Problem

Let's consider two common scenarios where we need to fetch information from a Kubernetes cluster:

  • Retrieve node health information such as memory, CPU and disk pressure.
  • Retrieve the environment variables (env) and resource requests and limits (resources) defined in the cluster's deployments.

To retrieve information from cluster objects, in general, we can use the command:

kubectl get <OBJECT>

To query the nodes we can run:

~ kubectl get nodes -o wide

NAME                                         STATUS   ROLES    AGE   VERSION           INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-135-204.xyz.compute.internal   Ready    master   11d   v1.21.1+6438632   10.0.135.204   <none>        RHEL CoreOS 48.84.202110270303-0    4.18.0-305.19.1.el8_4.x86_64   cri-o://1.21.3-8.rhaos4.8.git7415a53.el8
ip-10-0-142-176.xyz.compute.internal   Ready    worker   11d   v1.21.1+6438632   10.0.142.176   <none>        RHEL CoreOS 48.84.202110270303-0    4.18.0-305.19.1.el8_4.x86_64   cri-o://1.21.3-8.rhaos4.8.git7415a53.el8
ip-10-0-160-187.xyz.compute.internal   Ready    master   11d   v1.21.1+6438632   10.0.160.187   <none>        RHEL CoreOS 48.84.202110270303-0    4.18.0-305.19.1.el8_4.x86_64   cri-o://1.21.3-8.rhaos4.8.git7415a53.el8
ip-10-0-176-188.xyz.compute.internal   Ready    worker   11d   v1.21.1+6438632   10.0.176.188   <none>        RHEL CoreOS 48.84.202110270303-0    4.18.0-305.19.1.el8_4.x86_64   cri-o://1.21.3-8.rhaos4.8.git7415a53.el8
ip-10-0-214-226.xyz.compute.internal   Ready    master   11d   v1.21.1+6438632   10.0.214.226   <none>        RHEL CoreOS 48.84.202110270303-0    4.18.0-305.19.1.el8_4.x86_64   cri-o://1.21.3-8.rhaos4.8.git7415a53.el8
ip-10-0-219-74.xyz.compute.internal    Ready    worker   11d   v1.21.1+6438632   10.0.219.74    <none>        RHEL CoreOS 48.84.202110270303-0    4.18.0-305.19.1.el8_4.x86_64   cri-o://1.21.3-8.rhaos4.8.git7415a53.el8

To query deployments we can use, for instance:

# You can also use -o wide to retrieve more information
~ kubectl get deployments --all-namespaces

NAMESPACE                                          NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
openshift-apiserver-operator                       openshift-apiserver-operator             1/1     1            1           11d
openshift-apiserver                                apiserver                                3/3     3            3           11d
openshift-cluster-storage-operator                 cluster-storage-operator                 1/1     1            1           11d
openshift-cluster-storage-operator                 csi-snapshot-controller                  2/2     2            2           11d
openshift-cluster-version                          cluster-version-operator                 1/1     1            1           11d
openshift-console-operator                         console-operator                         1/1     1            1           11d
openshift-console                                  console                                  2/2     2            2           11d

Both commands, despite returning a lot of information, do not contain what we are looking for. To get at it, we can fetch the complete objects in YAML or JSON format, for example: kubectl get deployments --all-namespaces -o json

The command output is as follows:

{
    "apiVersion": "v1",
    "items": [
        {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {
                "name": "openshift-apiserver-operator",
                "namespace": "openshift-apiserver-operator"
            },
            "spec": {
                "template": {
                    "spec": {
                        "containers": [
                            {
                                "env": [
                                    {
                                        "name": "IMAGE",
                                        "value": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f532d4e20932e1e6664b1b7003691d44a511bb626bc339fd883a624f020ff399"
                                    },
                                    {
                                        "name": "OPERATOR_IMAGE",
                                        "value": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a24bdc7bae31584af5a7e0cb0629dda9bb2b1d613a40e92e227e0d13cb326ef4"
                                    },
                                    {
                                        "name": "OPERATOR_IMAGE_VERSION",
                                        "value": "4.8.19"
                                    },
                                    {
                                        "name": "OPERAND_IMAGE_VERSION",
                                        "value": "4.8.19"
                                    },
                                    {
                                        "name": "KUBE_APISERVER_OPERATOR_IMAGE",
                                        "value": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0e56e34f980552a7ce3d55429a9a265307dc89da11c29f6366b34369cc2a9ba0"
                                    }
                                ],
                                "resources": {
                                    "requests": {
                                        "cpu": "10m",
                                        "memory": "50Mi"
                                    }
                                }
                            }
                        ]
                    }
                }
            }
        },
        // other items...
    ],
    "kind": "List",
    "metadata": {
        "resourceVersion": "",
        "selfLink": ""
    }
}

Using custom-columns to query nodes

Let's explore the custom-columns output option of kubectl get to retrieve just the information we need. The custom-columns option lets us define which data is extracted by mapping a column heading to the desired field.
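
In general terms, the option takes a comma-separated list of HEADER:<JSONPath> pairs, where the JSONPath is written without the surrounding curly braces. A minimal sketch of the shape, with placeholder headers and paths:

# HEADER1/HEADER2 and the paths below are placeholders for your own columns
kubectl get <OBJECT> -o custom-columns="HEADER1:.path.to.field1,HEADER2:.path.to.field2"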

Using the node JSON as a base to build our query:

{
  "apiVersion": "v1",
  "kind": "Node",
  "metadata": {
    "name": "ip-10-0-219-74.xyz.compute.internal"
  },
  "status": {
    "addresses": [
      {
        "address": "10.0.219.74",
        "type": "InternalIP"
      },
      {
        "address": "ip-10-0-219-74.xyz.compute.internal",
        "type": "Hostname"
      },
      {
        "address": "ip-10-0-219-74.xyz.compute.internal",
        "type": "InternalDNS"
      }
    ],
    "conditions": [
      {
        "message": "kubelet has sufficient memory available",
        "reason": "KubeletHasSufficientMemory",
        "status": "False",
        "type": "MemoryPressure"
      },
      {
        "message": "kubelet has no disk pressure",
        "reason": "KubeletHasNoDiskPressure",
        "status": "False",
        "type": "DiskPressure"
      },
      {
        "message": "kubelet has sufficient PID available",
        "reason": "KubeletHasSufficientPID",
        "status": "False",
        "type": "PIDPressure"
      },
      {
        "message": "kubelet is posting ready status",
        "reason": "KubeletReady",
        "status": "True",
        "type": "Ready"
      }
    ],
    "nodeInfo": {
      "architecture": "amd64",
      "bootID": "327671fc-3d6f-4bc4-ab5f-fa012687e839",
      "containerRuntimeVersion": "cri-o://1.21.3-8.rhaos4.8.git7415a53.el8",
      "kernelVersion": "4.18.0-305.19.1.el8_4.x86_64",
      "kubeProxyVersion": "v1.21.1+6438632",
      "kubeletVersion": "v1.21.1+6438632",
      "machineID": "ec2e23b2f3d554c78f67dc2e30ba230a",
      "operatingSystem": "linux",
      "osImage": "Red Hat Enterprise Linux CoreOS 48.84.202110270303-0 (Ootpa)",
      "systemUUID": "ec2e23b2-f3d5-54c7-8f67-dc2e30ba230a"
    }
  }
}

A simple query using custom-columns to return the names of the cluster nodes:

~ kubectl get nodes -o custom-columns="Name:.metadata.name"

Name
ip-10-0-135-204.xyz.compute.internal
ip-10-0-142-176.xyz.compute.internal
ip-10-0-160-187.xyz.compute.internal
ip-10-0-176-188.xyz.compute.internal
ip-10-0-214-226.xyz.compute.internal
ip-10-0-219-74.xyz.compute.internal

To query values from a group, such as the node addresses (InternalIP, Hostname, InternalDNS), we can use the notation .status.addresses[*].address:

~ kubectl get nodes -o custom-columns="Name:.metadata.name,Addresses:.status.addresses[*].address"

Name                                         Addresses
ip-10-0-135-204.xyz.compute.internal   10.0.135.204,ip-10-0-135-204.xyz.compute.internal,ip-10-0-135-204.xyz.compute.internal
ip-10-0-142-176.xyz.compute.internal   10.0.142.176,ip-10-0-142-176.xyz.compute.internal,ip-10-0-142-176.xyz.compute.internal
ip-10-0-160-187.xyz.compute.internal   10.0.160.187,ip-10-0-160-187.xyz.compute.internal,ip-10-0-160-187.xyz.compute.internal
ip-10-0-176-188.xyz.compute.internal   10.0.176.188,ip-10-0-176-188.xyz.compute.internal,ip-10-0-176-188.xyz.compute.internal
ip-10-0-214-226.xyz.compute.internal   10.0.214.226,ip-10-0-214-226.xyz.compute.internal,ip-10-0-214-226.xyz.compute.internal
ip-10-0-219-74.xyz.compute.internal    10.0.219.74,ip-10-0-219-74.xyz.compute.internal,ip-10-0-219-74.xyz.compute.internal

If we want a specific value from a group, we can reference it by its index. Putting it all together, our node health query becomes:

~ kubectl get nodes -o custom-columns="Name:.metadata.name,Kernel:.status.nodeInfo.kernelVersion,InternalIP:.status.addresses[0].address,MemoryPressure:.status.conditions[0].status,DiskPressure:.status.conditions[1].status,PIDPressure:.status.conditions[2].status,Ready:.status.conditions[3].status"

Name                                         Kernel                         InternalIP     MemoryPressure   DiskPressure   PIDPressure   Ready
ip-10-0-135-204.xyz.compute.internal   4.18.0-305.19.1.el8_4.x86_64   10.0.135.204   False            False          False         True
ip-10-0-142-176.xyz.compute.internal   4.18.0-305.19.1.el8_4.x86_64   10.0.142.176   False            False          False         True
ip-10-0-160-187.xyz.compute.internal   4.18.0-305.19.1.el8_4.x86_64   10.0.160.187   False            False          False         True
ip-10-0-176-188.xyz.compute.internal   4.18.0-305.19.1.el8_4.x86_64   10.0.176.188   False            False          False         True
ip-10-0-214-226.xyz.compute.internal   4.18.0-305.19.1.el8_4.x86_64   10.0.214.226   False            False          False         True
ip-10-0-219-74.xyz.compute.internal    4.18.0-305.19.1.el8_4.x86_64   10.0.219.74    False            False          False         True
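
One caveat: addressing conditions by index assumes they always appear in the same order. If you would rather select a condition by its type, kubectl's jsonpath output supports filter expressions; a minimal sketch that prints each node name with its Ready status:

~ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Ready")].status}{"\n"}{end}'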

Creating a query collection

With our custom query ready, we can save the field mapping to a file for easy reuse. The file follows a specific format of headers and values:

HEADER1 HEADER2 HEADER3
.field.value1 .field.value2 .field.value3

For our query, the file, which we'll call cluster-nodes-health.txt, would be:

Name Kernel InternalIP MemoryPressure DiskPressure PIDPressure Ready
.metadata.name .status.nodeInfo.kernelVersion .status.addresses[0].address .status.conditions[0].status .status.conditions[1].status .status.conditions[2].status .status.conditions[3].status

And we can perform the query using the custom-columns-file option:

~ kubectl get nodes -o custom-columns-file=cluster-nodes-health.txt

Name                                         Kernel                         InternalIP     MemoryPressure   DiskPressure   PIDPressure   Ready
ip-10-0-135-204.xyz.compute.internal   4.18.0-305.19.1.el8_4.x86_64   10.0.135.204   False            False          False         True
ip-10-0-142-176.xyz.compute.internal   4.18.0-305.19.1.el8_4.x86_64   10.0.142.176   False            False          False         True
ip-10-0-160-187.xyz.compute.internal   4.18.0-305.19.1.el8_4.x86_64   10.0.160.187   False            False          False         True
ip-10-0-176-188.xyz.compute.internal   4.18.0-305.19.1.el8_4.x86_64   10.0.176.188   False            False          False         True
ip-10-0-214-226.xyz.compute.internal   4.18.0-305.19.1.el8_4.x86_64   10.0.214.226   False            False          False         True
ip-10-0-219-74.xyz.compute.internal    4.18.0-305.19.1.el8_4.x86_64   10.0.219.74    False            False          False         True
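
If you accumulate several of these files, a small shell helper makes them easier to reuse. A minimal sketch, assuming the query files live in ~/.kube/queries (the directory and the function name kq are just suggestions):

# Usage: kq nodes cluster-nodes-health
kq() {
  kubectl get "$1" -o custom-columns-file="$HOME/.kube/queries/$2.txt"
}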

Using jq to query env from deployments

To query the envs we'll use the jq utility: it lets us fetch the objects as JSON and filter them down to just the information we want.

About jq

jq is a lightweight and flexible command-line JSON processor.

As the jq page itself describes:

> "jq is like sed for JSON data - you can use it to split, filter, map, and transform structured data as easily as sed, awk, grep."

It can be found at: stedolan.github.io/jq
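
As a quick taste of the syntax, jq reads JSON from stdin and applies a filter expression; here -r prints raw strings instead of quoted JSON (the document itself is just a made-up illustration):

~ echo '{"name": "my-app", "replicas": 3}' | jq -r '.name'

my-app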

Structuring the jq command

Let's start with a basic jq query that iterates over the deployments in .items[] and extracts just the name from .metadata.name:

~ kubectl get deployments --all-namespaces -o json | jq -r '.items[] | .metadata.name '

openshift-apiserver-operator
apiserver
authentication-operator
# other deployments...

Let's evolve our query to build a JSON object with the namespace, name and env of each deployment:

~ kubectl get deployments --all-namespaces -o json | jq -r '.items[] | { namespace: .metadata.namespace, name: .metadata.name, env: .spec.template.spec.containers[].env}'
{
  "namespace": "openshift-apiserver-operator",
  "name": "openshift-apiserver-operator",
  "env": [
    {
      "name": "IMAGE",
      "value": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:f532d4e20932e1e6664b1b7003691d44a511bb626bc339fd883a624f020ff399"
    },
    {
      "name": "OPERATOR_IMAGE",
      "value": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:a24bdc7bae31584af5a7e0cb0629dda9bb2b1d613a40e92e227e0d13cb326ef4"
    },
    {
      "name": "OPERATOR_IMAGE_VERSION",
      "value": "4.8.19"
    },
    {
      "name": "OPERAND_IMAGE_VERSION",
      "value": "4.8.19"
    },
    {
      "name": "KUBE_APISERVER_OPERATOR_IMAGE",
      "value": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0e56e34f980552a7ce3d55429a9a265307dc89da11c29f6366b34369cc2a9ba0"
    }
  ]
}
{
  "namespace": "openshift-apiserver",
  "name": "apiserver",
  "env": [
    {
      "name": "POD_NAME",
      "valueFrom": {
        "fieldRef": {
          "apiVersion": "v1",
          "fieldPath": "metadata.name"
        }
      }
    },
    {
      "name": "POD_NAMESPACE",
      "valueFrom": {
        "fieldRef": {
          "apiVersion": "v1",
          "fieldPath": "metadata.namespace"
        }
      }
    }
  ]
}
{
  "namespace": "openshift-apiserver",
  "name": "apiserver",
  "env": [
    {
      "name": "POD_NAME",
      "valueFrom": {
        "fieldRef": {
          "apiVersion": "v1",
          "fieldPath": "metadata.name"
        }
      }
    },
    {
      "name": "POD_NAMESPACE",
      "valueFrom": {
        "fieldRef": {
          "apiVersion": "v1",
          "fieldPath": "metadata.namespace"
        }
      }
    }
  ]
}

// remaining deployments omitted for brevity...

To produce a single valid JSON document, let's wrap the results in an array [] and use jq's map function:

~ kubectl get deployments --all-namespaces -o json | jq -r '.items | [ map(.) | .[] | { namespace: .metadata.namespace, name: .metadata.name, env: .spec.template.spec.containers[].env }]'
// output truncated for readability...

[
  {
    "namespace": "openshift-operator-lifecycle-manager",
    "name": "catalog-operator",
    "env": [
      {
        "name": "RELEASE_VERSION",
        "value": "4.8.19"
      }
    ]
  },
  {
    "namespace": "openshift-operator-lifecycle-manager",
    "name": "olm-operator",
    "env": [
      {
        "name": "RELEASE_VERSION",
        "value": "4.8.19"
      },
      {
        "name": "OPERATOR_NAMESPACE",
        "valueFrom": {
          "fieldRef": {
            "apiVersion": "v1",
            "fieldPath": "metadata.namespace"
          }
        }
      },
      {
        "name": "OPERATOR_NAME",
        "value": "olm-operator"
      }
    ]
  },
  {
    "namespace": "openshift-operator-lifecycle-manager",
    "name": "packageserver",
    "env": [
      {
        "name": "OPERATOR_CONDITION_NAME",
        "value": "packageserver"
      }
    ]
  }
]
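
The same approach covers the resource requests and limits from our second scenario; a minimal sketch that reuses the filter above, where resources is simply the output key I chose:

~ kubectl get deployments --all-namespaces -o json | jq -r '.items | [ map(.) | .[] | { namespace: .metadata.namespace, name: .metadata.name, resources: .spec.template.spec.containers[].resources }]'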

Querying with a jq filter file

Just like with custom-columns, jq gives us the option of passing a file containing our filter instead of writing it inline. Let's create a file called jq-deployments-envs.txt with the contents:

.items | [ map(.) | .[] | { namespace: .metadata.namespace, name: .metadata.name, env: .spec.template.spec.containers[].env }]

And our query can be executed with the command:

~ kubectl get deployments --all-namespaces -o json | jq -f jq-deployments-envs.txt
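
If you want to keep the result next to the filter, for instance to share it with the team, you can simply redirect the output to a file (the file name is arbitrary):

~ kubectl get deployments --all-namespaces -o json | jq -f jq-deployments-envs.txt > deployments-envs.json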

Conclusion

With kubectl's native custom-columns option and the jq utility, it is possible to extract custom information from a Kubernetes cluster. Furthermore, since both can read their queries from files, we can build a collection of useful views of the cluster and keep them in source control, ready to share with other team members or with the community.

References

kubernetes.io/docs/reference/kubectl/overvi..

kubernetes.io/pt-br/docs/reference/kubectl/..

stedolan.github.io/jq/tutorial

kubernetes.io/docs/tasks/access-application..

kubernetes.io/docs/reference/kubectl/jsonpath

gist.github.com/so0k/42313dbb3b547a0f51a547..

starkandwayne.com/blog/silly-kubectl-trick-..

michalwojcik.com.pl/2021/07/04/yaml-jsonpat..

laury.dev/snippets/combine-kubectl-jsonpath..

access.redhat.com/articles/2988581

sferich888.blogspot.com/2017/01/learning-us..