Troubleshooting
Although Kyverno’s goal is to make policy simple, sometimes trouble still strikes. The following points can be used to help troubleshoot Kyverno when things go wrong.
Symptom: My policies are created but nothing seems to happen when I create a resource that should trigger them.
Solution: There are a few moving parts that need to be checked to ensure Kyverno is receiving information from Kubernetes and is in good health.
- Check and ensure the Kyverno Pod(s) are running. Assuming Kyverno was installed into the default Namespace of `kyverno`, use the command `kubectl -n kyverno get po` to check their status. The status should be `Running` at all times.
- Kyverno registers as two types of webhooks with Kubernetes. Check the status of registered webhooks to ensure Kyverno is among them.

      $ kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations
      NAME                                                                                                  WEBHOOKS   AGE
      validatingwebhookconfiguration.admissionregistration.k8s.io/kyverno-policy-validating-webhook-cfg     1          46m
      validatingwebhookconfiguration.admissionregistration.k8s.io/kyverno-resource-validating-webhook-cfg   1          46m

      NAME                                                                                              WEBHOOKS   AGE
      mutatingwebhookconfiguration.admissionregistration.k8s.io/kyverno-policy-mutating-webhook-cfg     1          46m
      mutatingwebhookconfiguration.admissionregistration.k8s.io/kyverno-resource-mutating-webhook-cfg   1          46m
      mutatingwebhookconfiguration.admissionregistration.k8s.io/kyverno-verify-mutating-webhook-cfg     1          46m

  The age should be consistent with the age of the currently running Kyverno Pod(s). If the age of these webhooks shows, for example, a few seconds old, Kyverno may be having trouble registering with Kubernetes.
- Test that name resolution and connectivity to the Kyverno service works inside your cluster by starting a simple `busybox` Pod and trying to connect to Kyverno. Enter the `wget` command as shown below. If the response is not “remote file exists” then there is a network connectivity or DNS issue within your cluster. If your cluster was provisioned with kubespray, see if this comment helps you.

      $ kubectl run busybox --rm -ti --image=busybox -- /bin/sh
      If you don't see a command prompt, try pressing enter.
      / # wget --no-check-certificate --spider --timeout=1 https://kyverno-svc.kyverno.svc:443/health/liveness
      Connecting to kyverno-svc.kyverno.svc:443 (100.67.141.176:443)
      remote file exists
      / # exit
      Session ended, resume using 'kubectl attach busybox -c busybox -i -t' command when the pod is running
      pod "busybox" deleted
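The first two checks above can be scripted. The following is a minimal sketch, not part of Kyverno itself: the helper names `pods_not_running` and `check_kyverno` are hypothetical, and it assumes the default `kubectl get po` table layout in which STATUS is the third column.

```shell
NS="kyverno"  # default Namespace assumed by the steps above

# Read `kubectl get po` table output on stdin and print the name of
# every Pod whose STATUS column (third field) is not "Running".
pods_not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Hypothetical wrapper: list unhealthy Pods, then the webhook
# configurations whose names mention kyverno.
check_kyverno() {
  echo "Pods not Running in $NS:"
  kubectl -n "$NS" get po | pods_not_running
  echo "Kyverno webhook configurations:"
  kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations -o name | grep kyverno
}
```

Call `check_kyverno` from a shell with cluster access. An empty first section means that no Pod is in a state other than `Running`; it does not by itself prove that any Pods exist, so confirm with `kubectl -n kyverno get po` if in doubt.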
Symptom: Kyverno is working for some policies but not others. How can I see what’s going on?
Solution: Start by checking the logs from the Kyverno Pod to see whether they describe why a policy or rule isn’t working.
- Check the Pod logs from Kyverno. Assuming Kyverno was installed into the default Namespace called `kyverno`, use the command `kubectl -n kyverno logs <kyverno_pod_name>` to show the logs. To watch the logs live, add the `-f` switch for the “follow” option.
- If no helpful information is being displayed at the default logging level, increase the level of verbosity by editing the Kyverno Deployment. To edit the Deployment, assuming Kyverno was installed into the default Namespace, use the command `kubectl -n kyverno edit deploy kyverno`. Find the `args` section for the container named `kyverno` and change the `-v=2` switch to `-v=6`. This will increase the logging level to its highest. Take care to revert this back to `-v=2` once troubleshooting steps are concluded.
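If you prefer not to open an editor, the verbosity change can be sketched as a filter over the Deployment manifest. This is an illustration under assumptions: the helper names `with_verbosity` and `set_kyverno_verbosity` are hypothetical, and the `sed` expression rewrites every `-v=<number>` occurrence in the manifest, so review the result if other arguments happen to end in `-v=`.

```shell
NS="kyverno"  # default Namespace assumed by the steps above

# Rewrite any "-v=<n>" argument in a manifest read from stdin
# to the verbosity level given as the first parameter.
with_verbosity() {
  sed -E "s/-v=[0-9]+/-v=$1/"
}

# Fetch the Deployment, change the verbosity, and replace it in the cluster.
set_kyverno_verbosity() {
  kubectl -n "$NS" get deploy kyverno -o yaml | with_verbosity "$1" | kubectl replace -f -
}

# Typical session (requires cluster access):
#   set_kyverno_verbosity 6
#   kubectl -n "$NS" logs -f deploy/kyverno
#   set_kyverno_verbosity 2
```

Changing the arguments restarts the Kyverno Pod, so expect the log stream to begin from a fresh Pod after each change.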
Symptom: I have a large cluster with many objects and many Kyverno policies. Kyverno sometimes crashes.
Solution: At very large scale, you may need to increase the memory limit of the Kyverno Pod so it can keep track of these objects.
- Edit the Kyverno Deployment and increase the memory limit on the `kyverno` container by using the command `kubectl -n kyverno edit deploy kyverno`. Change the `resources.limits.memory` field to a larger value. Continue to monitor the memory usage by using something like the Kubernetes metrics-server.
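As a non-interactive alternative, the limit can be raised with `kubectl set resources` and usage watched with `kubectl top pod` (which requires metrics-server). The sketch below is illustrative only: the `512Mi` value and the helper names are hypothetical, and the filter assumes `kubectl top pod` reports memory in `Mi` in its third column.

```shell
NS="kyverno"        # default Namespace assumed by the steps above
NEW_LIMIT="512Mi"   # hypothetical value; size it to your cluster

# Raise the memory limit on the kyverno container without opening an editor.
raise_kyverno_memory() {
  kubectl -n "$NS" set resources deployment kyverno --containers=kyverno --limits="memory=$NEW_LIMIT"
}

# Read `kubectl top pod` output on stdin and print each Pod whose memory
# usage exceeds the threshold (in MiB) given as the first parameter.
pods_over_mib() {
  awk -v limit="$1" 'NR > 1 { mem = $3; sub(/Mi$/, "", mem); if (mem + 0 > limit) print $1, $3 }'
}

# Usage (requires metrics-server):
#   raise_kyverno_memory
#   kubectl -n "$NS" top pod | pods_over_mib 400
```

Picking a threshold somewhat below the configured limit gives early warning before the container is terminated for exceeding it.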
Symptom: I’m using GKE and after installing Kyverno, my cluster is either broken or I’m seeing timeouts and other issues.
Solution: Private GKE clusters block certain traffic from the control plane to the worker nodes, and Kyverno needs that traffic in order to receive webhook calls from the API server. To resolve this issue, create a firewall rule that allows the control plane to reach the workers on the Kyverno TCP port, which at this time defaults to 9443.
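A firewall rule of this kind might look like the sketch below. Every value except the port is a placeholder: the rule name, network, control plane CIDR, and node tag are hypothetical and must be replaced with the values for your cluster.

```shell
# Hypothetical values: substitute your own before running.
RULE_NAME="allow-control-plane-to-kyverno"
NETWORK="my-gke-network"
CONTROL_PLANE_CIDR="172.16.0.0/28"   # the cluster's control plane address range
NODE_TAG="my-gke-node-tag"           # network tag carried by the worker nodes
KYVERNO_PORT="9443"                  # default Kyverno port per the text above

# Allow ingress from the control plane to the workers on the Kyverno port.
create_kyverno_firewall_rule() {
  gcloud compute firewall-rules create "$RULE_NAME" \
    --network "$NETWORK" \
    --direction INGRESS \
    --source-ranges "$CONTROL_PLANE_CIDR" \
    --target-tags "$NODE_TAG" \
    --allow "tcp:$KYVERNO_PORT"
}
```

Run `create_kyverno_firewall_rule` once the variables match your environment.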
Symptom: I’m an EKS user and I’m finding that resources that should be blocked by a Kyverno policy are not.
Solution: When using EKS with a custom CNI, the API server cannot reach the Kyverno webhook because the control plane nodes, which cannot use a custom CNI, are configured differently from the worker nodes, which can. To resolve this, when installing Kyverno via Helm, set the `hostNetwork` option to `true`. See also this note.
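With Helm, the option can be passed at install time, roughly as follows. This is a sketch under assumptions: the release name, the `kyverno` repository alias, and the chart repository URL are not stated in the text above, so check them against the installation instructions for your Kyverno version.

```shell
NS="kyverno"  # default Namespace assumed elsewhere on this page

# Install Kyverno with host networking enabled so the EKS control plane
# can reach the webhook. Repository alias and URL are assumptions.
install_kyverno_host_network() {
  helm repo add kyverno https://kyverno.github.io/kyverno/
  helm repo update
  helm install kyverno kyverno/kyverno --namespace "$NS" --create-namespace --set hostNetwork=true
}
```

Run `install_kyverno_host_network` from a shell with access to the cluster.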