Mesh Health Check

This policy uses the new policy matching algorithm. Do not combine it with the HealthCheck policy.

This policy enables Kuma to keep track of the health of every data plane proxy, with the goal of minimizing the number of failed requests in case a data plane proxy is temporarily unhealthy.

By creating a MeshHealthCheck resource, you instruct a data plane proxy to keep track of the health status of every other data plane proxy. When health checks are properly configured, a data plane proxy never sends a request to another data plane proxy that is considered unhealthy. When an unhealthy proxy returns to a healthy state, Kuma resumes sending requests to it.

This policy provides active checks. If you want to configure passive checks, use the MeshCircuitBreaker policy. With active checks, data plane proxies explicitly send requests to other data plane proxies to determine whether the target proxies are healthy. This mode generates extra traffic to other proxies and services, as described in the policy configuration.

TargetRef support matrix

targetRef             Allowed kinds
targetRef.kind        Mesh, MeshSubset
to[].targetRef.kind   Mesh, MeshService

To learn more about the information in this table, see the matching docs.
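The matrix also allows Mesh as the kind at both levels. A minimal sketch of a mesh-wide policy, in which every data plane proxy health-checks every destination in the mesh, might look like the following. The policy name is hypothetical, and placing it in the kuma-system namespace assumes that is where your control plane is installed.

```yaml
apiVersion: kuma.io/v1alpha1
kind: MeshHealthCheck
metadata:
  name: mesh-wide-check        # hypothetical name
  namespace: kuma-system       # assumption: the control plane's system namespace
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: Mesh                 # applies to every data plane proxy in the mesh...
  to:
  - targetRef:
      kind: Mesh               # ...checking every destination in the mesh
    default:
      interval: 10s
      timeout: 2s
      unhealthyThreshold: 3
      healthyThreshold: 1
      tcp: {}                  # TCP block with no send/receive: connect-only check
```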

Configuration

The MeshHealthCheck policy supports both L4/TCP and L7/HTTP/gRPC checks.

Protocol selection

The health check protocol is selected by picking the most specific protocol. If that protocol has disabled set to true in the policy definition, selection falls back to a more general protocol. See the protocol fallback example.

Examples

Health check from web to backend service

apiVersion: kuma.io/v1alpha1
kind: MeshHealthCheck
metadata:
  name: web-to-backend-check
  namespace: kuma-demo
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: MeshSubset
    tags:
      kuma.io/service: web
  to:
  - targetRef:
      kind: MeshService
      name: backend_kuma-demo_svc_3001
    default:
      interval: 10s
      timeout: 2s
      unhealthyThreshold: 3
      healthyThreshold: 1
      http:
        path: "/health"
        expectedStatuses:
        - 200
        - 201

The same policy, referencing the MeshService by name, namespace, and sectionName:

apiVersion: kuma.io/v1alpha1
kind: MeshHealthCheck
metadata:
  name: web-to-backend-check
  namespace: kuma-demo
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: MeshSubset
    tags:
      kuma.io/service: web
  to:
  - targetRef:
      kind: MeshService
      name: backend
      namespace: kuma-demo
      sectionName: http
    default:
      interval: 10s
      timeout: 2s
      unhealthyThreshold: 3
      healthyThreshold: 1
      http:
        path: "/health"
        expectedStatuses:
        - 200
        - 201

Protocol fallback

apiVersion: kuma.io/v1alpha1
kind: MeshHealthCheck
metadata:
  name: web-to-backend-check
  namespace: kuma-demo
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: MeshSubset
    tags:
      kuma.io/service: web
  to:
  - targetRef:
      kind: MeshService
      name: backend_kuma-demo_svc_3001
    default:
      interval: 10s
      timeout: 2s
      unhealthyThreshold: 3
      healthyThreshold: 1
      tcp: {}
      http:
        disabled: true

The same policy, referencing the MeshService by name, namespace, and sectionName:

apiVersion: kuma.io/v1alpha1
kind: MeshHealthCheck
metadata:
  name: web-to-backend-check
  namespace: kuma-demo
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: MeshSubset
    tags:
      kuma.io/service: web
  to:
  - targetRef:
      kind: MeshService
      name: backend
      namespace: kuma-demo
      sectionName: http
    default:
      interval: 10s
      timeout: 2s
      unhealthyThreshold: 3
      healthyThreshold: 1
      tcp: {}
      http:
        disabled: true

gRPC health check from web to backend service

apiVersion: kuma.io/v1alpha1
kind: MeshHealthCheck
metadata:
  name: web-to-backend-check
  namespace: kuma-demo
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: MeshSubset
    tags:
      kuma.io/service: web
  to:
  - targetRef:
      kind: MeshService
      name: backend_kuma-demo_svc_3001
    default:
      interval: 15s
      timeout: 5s
      unhealthyThreshold: 3
      healthyThreshold: 2
      grpc:
        serviceName: grpc.health.v1.CustomHealth

The same policy, referencing the MeshService by name, namespace, and sectionName:

apiVersion: kuma.io/v1alpha1
kind: MeshHealthCheck
metadata:
  name: web-to-backend-check
  namespace: kuma-demo
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: MeshSubset
    tags:
      kuma.io/service: web
  to:
  - targetRef:
      kind: MeshService
      name: backend
      namespace: kuma-demo
      sectionName: http
    default:
      interval: 15s
      timeout: 5s
      unhealthyThreshold: 3
      healthyThreshold: 2
      grpc:
        serviceName: grpc.health.v1.CustomHealth

Common configuration

  • interval - (optional) interval between consecutive health checks; defaults to 1m
  • timeout - (optional) maximum time to wait for a health check response; defaults to 15s
  • unhealthyThreshold - (optional) number of consecutive failed checks before a host is considered unhealthy; defaults to 5
  • healthyThreshold - (optional) number of consecutive successful checks before a host is considered healthy; defaults to 1
  • initialJitter - (optional) if specified, Envoy starts health checking after a random delay between 0 and initialJitter. This only applies to the first health check.
  • intervalJitter - (optional) if specified, Envoy adds intervalJitter to the wait time during every interval.
  • intervalJitterPercent - (optional) if specified, Envoy adds interval * intervalJitterPercent / 100 to the wait time during every interval. If intervalJitter and intervalJitterPercent are both set, both are used to increase the wait time.
  • healthyPanicThreshold - (optional) panic threshold for the Envoy cluster; defaults to 50%. To disable panic mode, set it to 0%.
  • failTrafficOnPanic - (optional) if true, Envoy does not consider any hosts when the cluster is in ‘panic mode’. Instead, the cluster fails all requests as if all hosts were unhealthy. This can help avoid overwhelming a failing service.
  • noTrafficInterval - (optional) a special health check interval used when a cluster has never had traffic routed to it. This lower interval keeps cluster information up to date without sending a potentially large amount of active health checking traffic for no reason. Once the cluster has been used for traffic routing, Envoy switches back to the standard health check interval. Note that this interval takes precedence over any other. Defaults to 60 seconds.
  • eventLogPath - (optional) path to the file where Envoy logs health check events. If empty, no event log is written.
  • alwaysLogHealthCheckFailures - (optional) if true, all health check failure events are logged; if false, only the initial failure event is logged. Defaults to false.
  • reuseConnection - (optional) reuse the health check connection between health checks. Defaults to true.
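A sketch of a policy that combines several of the common options above. The values are illustrative only, the event log path is hypothetical, and healthyPanicThreshold is assumed to accept a plain integer percentage; check the full option reference before relying on any of these settings.

```yaml
apiVersion: kuma.io/v1alpha1
kind: MeshHealthCheck
metadata:
  name: web-to-backend-check
  namespace: kuma-demo
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: MeshSubset
    tags:
      kuma.io/service: web
  to:
  - targetRef:
      kind: MeshService
      name: backend_kuma-demo_svc_3001
    default:
      interval: 10s
      timeout: 2s
      unhealthyThreshold: 3
      healthyThreshold: 1
      initialJitter: 5s                      # first check starts after a random delay of up to 5s
      intervalJitter: 1s                     # fixed jitter added on every interval
      intervalJitterPercent: 10              # percentage-based jitter, combined with intervalJitter
      healthyPanicThreshold: 60              # assumption: integer percentage
      failTrafficOnPanic: true               # fail requests instead of using all hosts in panic mode
      noTrafficInterval: 30s                 # used until the cluster first receives traffic
      eventLogPath: /tmp/health-checks.log   # hypothetical path
      alwaysLogHealthCheckFailures: true     # log every failure, not only the first one
      reuseConnection: false                 # open a new connection for each check
      http:
        path: "/health"
```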

Protocol specific configuration

HTTP

HTTP health checks are executed using HTTP2.

  • disabled - (optional) if true, the HTTP health check is disabled
  • path - (optional) HTTP path used for the health checks; defaults to “/”
  • expectedStatuses - (optional) list of HTTP response statuses considered healthy
    • only statuses in the range [100, 600) are allowed
    • if this property is not provided, only responses with status code 200 are considered healthy
  • requestHeadersToAdd - (optional) a HeaderModifier describing the HTTP headers to add to each health check request

HeaderModifier

  • set - (optional) list of headers to set. Overrides the value if the header already exists.
    • name - header’s name
    • value - header’s value
  • add - (optional) list of headers to add. Appends the value if the header already exists.
    • name - header’s name
    • value - header’s value
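To illustrate requestHeadersToAdd, here is a sketch of the default section of an HTTP health check that uses both set and add. The header names and values are hypothetical.

```yaml
default:
  interval: 10s
  timeout: 2s
  unhealthyThreshold: 3
  healthyThreshold: 1
  http:
    path: "/health"
    expectedStatuses:
    - 200
    requestHeadersToAdd:
      set:
      - name: x-health-check        # hypothetical header; overrides any existing value
        value: "true"
      add:
      - name: x-request-source      # hypothetical header; appended if already present
        value: kuma-health-check
```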

TCP

  • disabled - (optional) if true, the TCP health check is disabled
  • send - (optional) Base64-encoded content of the message sent during the health checks
  • receive - (optional) list of Base64-encoded blocks that must be found in the response for the host to be considered healthy
    • when checking the response, “fuzzy” matching is performed: each block must be found, in the order specified, but not necessarily contiguously
    • if receive is not provided or is empty, the check is “connect only” and succeeds as soon as a TCP connection is established
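Because send and receive take Base64 input, plain-text payloads must be encoded first. The sketch below assumes a hypothetical service that answers PING with PONG; UElORw== and UE9ORw== are the Base64 encodings of the strings PING and PONG.

```yaml
default:
  interval: 10s
  timeout: 2s
  unhealthyThreshold: 3
  healthyThreshold: 1
  tcp:
    send: UElORw==      # Base64 of "PING" (hypothetical protocol)
    receive:
    - UE9ORw==          # Base64 of "PONG"; must appear in the response
```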

gRPC

  • disabled - (optional) if true, the gRPC health check is disabled
  • serviceName - (optional) service name parameter sent to the gRPC service
  • authority - (optional) value of the :authority header in the gRPC health check request; defaults to the name of the cluster this health check is associated with
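A sketch of the default section of a gRPC check that overrides the :authority header. The authority value is a hypothetical example; omitting it falls back to the cluster name, as described above.

```yaml
default:
  interval: 15s
  timeout: 5s
  unhealthyThreshold: 3
  healthyThreshold: 2
  grpc:
    serviceName: grpc.health.v1.CustomHealth
    authority: backend.kuma-demo.svc   # hypothetical value for the :authority header
```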

All policy options