Mesh Health Check

This policy uses the new policy matching algorithm. Do not combine it with HealthCheck.

This policy enables Kuma to keep track of the health of every data plane proxy, with the goal of minimizing the number of failed requests in case a data plane proxy is temporarily unhealthy.

By creating a MeshHealthCheck resource, you instruct a data plane proxy to keep track of the health status of any other data plane proxy. When health checks are properly configured, a data plane proxy does not send requests to another data plane proxy that is considered unhealthy. When an unhealthy proxy returns to a healthy state, Kuma resumes sending requests to it.

This policy provides active checks. To configure passive checks, use the MeshCircuitBreaker policy. Data plane proxies with active checks explicitly send requests to other data plane proxies to determine whether the target proxies are healthy. This mode generates extra traffic to other proxies and services, as described in the policy configuration.

TargetRef support matrix

See the Referencing Dataplanes, Services, and Routes inside policies section of the introduction to learn about all the available targetRef kinds.

Configuration

The MeshHealthCheck policy supports both L4/TCP and L7/HTTP/gRPC checks.

Protocol selection

The health check protocol is selected by picking the most specific protocol configured. If that protocol has disabled=true in the policy definition, the check falls back to a more general protocol. See the protocol fallback example.

Examples

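Each example is shown in two variants: the first names the destination service directly (for example backend_kuma-demo_svc_3001), the second targets a MeshService resource and selects a port with sectionName.

Health check from web to backend service

Kubernetes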

apiVersion: kuma.io/v1alpha1
kind: MeshHealthCheck
metadata:
  name: web-to-backend-check
  namespace: kuma-demo
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: Dataplane
    labels:
      app: web
  to:
  - targetRef:
      kind: MeshService
      name: backend_kuma-demo_svc_3001
    default:
      interval: 10s
      timeout: 2s
      unhealthyThreshold: 3
      healthyThreshold: 1
      http:
        path: "/health"
        expectedStatuses:
        - 200
        - 201

apiVersion: kuma.io/v1alpha1
kind: MeshHealthCheck
metadata:
  name: web-to-backend-check
  namespace: kuma-demo
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: Dataplane
    labels:
      app: web
  to:
  - targetRef:
      kind: MeshService
      name: backend
      namespace: kuma-demo
      sectionName: http
    default:
      interval: 10s
      timeout: 2s
      unhealthyThreshold: 3
      healthyThreshold: 1
      http:
        path: "/health"
        expectedStatuses:
        - 200
        - 201

Universal

type: MeshHealthCheck
name: web-to-backend-check
mesh: default
spec:
  targetRef:
    kind: Dataplane
    labels:
      app: web
  to:
  - targetRef:
      kind: MeshService
      name: backend
    default:
      interval: 10s
      timeout: 2s
      unhealthyThreshold: 3
      healthyThreshold: 1
      http:
        path: "/health"
        expectedStatuses:
        - 200
        - 201

type: MeshHealthCheck
name: web-to-backend-check
mesh: default
spec:
  targetRef:
    kind: Dataplane
    labels:
      app: web
  to:
  - targetRef:
      kind: MeshService
      name: backend
      sectionName: http
    default:
      interval: 10s
      timeout: 2s
      unhealthyThreshold: 3
      healthyThreshold: 1
      http:
        path: "/health"
        expectedStatuses:
        - 200
        - 201

Protocol fallback

Kubernetes

apiVersion: kuma.io/v1alpha1
kind: MeshHealthCheck
metadata:
  name: web-to-backend-check
  namespace: kuma-demo
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: Dataplane
    labels:
      app: web
  to:
  - targetRef:
      kind: MeshService
      name: backend_kuma-demo_svc_3001
    default:
      interval: 10s
      timeout: 2s
      unhealthyThreshold: 3
      healthyThreshold: 1
      tcp: {}
      http:
        disabled: true

apiVersion: kuma.io/v1alpha1
kind: MeshHealthCheck
metadata:
  name: web-to-backend-check
  namespace: kuma-demo
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: Dataplane
    labels:
      app: web
  to:
  - targetRef:
      kind: MeshService
      name: backend
      namespace: kuma-demo
      sectionName: http
    default:
      interval: 10s
      timeout: 2s
      unhealthyThreshold: 3
      healthyThreshold: 1
      tcp: {}
      http:
        disabled: true

Universal

type: MeshHealthCheck
name: web-to-backend-check
mesh: default
spec:
  targetRef:
    kind: Dataplane
    labels:
      app: web
  to:
  - targetRef:
      kind: MeshService
      name: backend
    default:
      interval: 10s
      timeout: 2s
      unhealthyThreshold: 3
      healthyThreshold: 1
      tcp: {}
      http:
        disabled: true

type: MeshHealthCheck
name: web-to-backend-check
mesh: default
spec:
  targetRef:
    kind: Dataplane
    labels:
      app: web
  to:
  - targetRef:
      kind: MeshService
      name: backend
      sectionName: http
    default:
      interval: 10s
      timeout: 2s
      unhealthyThreshold: 3
      healthyThreshold: 1
      tcp: {}
      http:
        disabled: true

gRPC health check from web to backend service

Kubernetes

apiVersion: kuma.io/v1alpha1
kind: MeshHealthCheck
metadata:
  name: web-to-backend-check
  namespace: kuma-demo
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: Dataplane
    labels:
      app: web
  to:
  - targetRef:
      kind: MeshService
      name: backend_kuma-demo_svc_3001
    default:
      interval: 15s
      timeout: 5s
      unhealthyThreshold: 3
      healthyThreshold: 2
      grpc:
        serviceName: grpc.health.v1.CustomHealth

apiVersion: kuma.io/v1alpha1
kind: MeshHealthCheck
metadata:
  name: web-to-backend-check
  namespace: kuma-demo
  labels:
    kuma.io/mesh: default
spec:
  targetRef:
    kind: Dataplane
    labels:
      app: web
  to:
  - targetRef:
      kind: MeshService
      name: backend
      namespace: kuma-demo
      sectionName: http
    default:
      interval: 15s
      timeout: 5s
      unhealthyThreshold: 3
      healthyThreshold: 2
      grpc:
        serviceName: grpc.health.v1.CustomHealth

Universal

type: MeshHealthCheck
name: web-to-backend-check
mesh: default
spec:
  targetRef:
    kind: Dataplane
    labels:
      app: web
  to:
  - targetRef:
      kind: MeshService
      name: backend
    default:
      interval: 15s
      timeout: 5s
      unhealthyThreshold: 3
      healthyThreshold: 2
      grpc:
        serviceName: grpc.health.v1.CustomHealth

type: MeshHealthCheck
name: web-to-backend-check
mesh: default
spec:
  targetRef:
    kind: Dataplane
    labels:
      app: web
  to:
  - targetRef:
      kind: MeshService
      name: backend
      sectionName: http
    default:
      interval: 15s
      timeout: 5s
      unhealthyThreshold: 3
      healthyThreshold: 2
      grpc:
        serviceName: grpc.health.v1.CustomHealth

Common configuration

interval - (optional) interval between consecutive health checks. Defaults to 1m
timeout - (optional) maximum time to wait for a health check response. Defaults to 15s
unhealthyThreshold - (optional) number of consecutive unhealthy checks before a host is considered unhealthy. Defaults to 5
healthyThreshold - (optional) number of consecutive healthy checks before a host is considered healthy. Defaults to 1
initialJitter - (optional) if specified, Envoy will start health checking after a random time in milliseconds between 0 and initialJitter. This only applies to the first health check
intervalJitter - (optional) if specified, during every interval Envoy will add intervalJitter to the wait time
intervalJitterPercent - (optional) if specified, during every interval Envoy will add intervalJitter * intervalJitterPercent / 100 to the wait time. If intervalJitter and intervalJitterPercent are both set, both of them will be used to increase the wait time.
healthyPanicThreshold - (optional) configures the panic threshold for Envoy clusters. If not specified, the default is 50%. To disable panic mode, set it to 0%. ⚠️ This option is deprecated as of version 2.10.x and has been moved to MeshCircuitBreaker. ⚠️
failTrafficOnPanic - (optional) if set to true, Envoy will not consider any hosts when the cluster is in ‘panic mode’. Instead, the cluster will fail all requests as if all hosts are unhealthy. This can help avoid potentially overwhelming a failing service.
noTrafficInterval - (optional) a special health check interval that is used when a cluster has never had traffic routed to it. This lower interval allows cluster information to be kept up to date, without sending a potentially large amount of active health checking traffic for no reason. Once a cluster has been used for traffic routing, Envoy will shift back to using the standard health check interval that is defined. Note that this interval takes precedence over any other. The default value for “no traffic interval” is 60 seconds.
eventLogPath - (optional) specifies the path to the file where Envoy can log health check events. If empty, no event log will be written.
alwaysLogHealthCheckFailures - (optional) if set to true, health check failure events will always be logged. If set to false, only the initial health check failure event will be logged. The default value is false.
reuseConnection - (optional) reuse health check connection between health checks. Default is true.
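
As an illustration of how the common options combine, here is a minimal sketch in Universal format. The policy name, the log path, and the chosen values are hypothetical, and the sketch assumes that initialJitter and noTrafficInterval accept the same duration strings as interval:

```yaml
# Hypothetical policy: the name, log path, and values are illustrative only.
type: MeshHealthCheck
name: web-to-backend-tuned-check
mesh: default
spec:
  targetRef:
    kind: Dataplane
    labels:
      app: web
  to:
  - targetRef:
      kind: MeshService
      name: backend
    default:
      interval: 10s
      timeout: 2s
      unhealthyThreshold: 3
      healthyThreshold: 1
      # Assumed duration format; delays only the first health check
      # by a random time between 0 and this value.
      initialJitter: 5s
      # Adds a percentage-based jitter to the wait time on every interval.
      intervalJitterPercent: 10
      # Assumed duration format; applies while the cluster has never had traffic.
      noTrafficInterval: 60s
      # Hypothetical path; leave unset to disable the event log.
      eventLogPath: /tmp/health-check-events.log
      alwaysLogHealthCheckFailures: true
      reuseConnection: true
      # Connect-only TCP check.
      tcp: {}
```

Setting alwaysLogHealthCheckFailures together with eventLogPath is useful while tuning thresholds, because every failed check is recorded rather than only the first one.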

Protocol specific configuration

HTTP

HTTP health checks are executed using HTTP/2.

disabled - (optional) if true, the HTTP health check is disabled
path - (optional) HTTP path to be used during the health checks. Defaults to “/”
expectedStatuses - (optional) list of HTTP response statuses which are considered healthy
- only statuses in the range [100, 600) are allowed
- by default, when this property is not provided, only responses with status code 200 are considered healthy
requestHeadersToAdd - (optional) HeaderModifier list of HTTP headers which should be added to each health check request

HeaderModifier

set - (optional) list of headers to set. Overrides the value if the header exists.
- name - header’s name
- value - header’s value
add - (optional) list of headers to add. Appends the value if the header exists.
- name - header’s name
- value - header’s value
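
The HTTP options and the HeaderModifier structure described above could be combined as in the following sketch (Universal format). The policy name, header names, and header values are hypothetical and chosen only to show where set and add go:

```yaml
# Hypothetical policy: header names and values are illustrative only.
type: MeshHealthCheck
name: web-to-backend-http-check
mesh: default
spec:
  targetRef:
    kind: Dataplane
    labels:
      app: web
  to:
  - targetRef:
      kind: MeshService
      name: backend
    default:
      interval: 10s
      timeout: 2s
      http:
        path: "/health"
        # Statuses must fall in the range [100, 600).
        expectedStatuses:
        - 200
        - 204
        requestHeadersToAdd:
          # set overrides the value if the header already exists.
          set:
          - name: x-health-check
            value: "true"
          # add appends the value if the header already exists.
          add:
          - name: x-probe-source
            value: mesh
```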

TCP

disabled - (optional) if true, the TCP health check is disabled
send - (optional) Base64-encoded content of the message which should be sent during the health checks
receive - (optional) list of Base64-encoded blocks which must be found in the response for the target to be considered healthy
- when checking the response, “fuzzy” matching is performed: each block must be found, in the order specified, but not necessarily contiguously;
- if the receive section is omitted or empty, the check is performed as “connect only” and is marked as successful as soon as the TCP connection is established.
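
A sketch of a TCP check that sends a payload and expects a reply, in Universal format. The ping/pong exchange is a hypothetical protocol used for illustration: cGluZw== is the Base64 encoding of "ping" and cG9uZw== is the Base64 encoding of "pong".

```yaml
# Hypothetical policy: the ping/pong exchange is illustrative only.
type: MeshHealthCheck
name: web-to-backend-tcp-check
mesh: default
spec:
  targetRef:
    kind: Dataplane
    labels:
      app: web
  to:
  - targetRef:
      kind: MeshService
      name: backend
    default:
      interval: 10s
      timeout: 2s
      unhealthyThreshold: 3
      healthyThreshold: 1
      tcp:
        # Base64 for "ping"
        send: cGluZw==
        # Each block must appear in the response, in order.
        receive:
        - cG9uZw==   # Base64 for "pong"
```

Omitting receive turns this into a connect-only check, as in the protocol fallback example where tcp is set to {}.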

gRPC

disabled - (optional) if true, the gRPC health check is disabled
serviceName - (optional) service name parameter which will be sent to the gRPC service
authority - (optional) value of the :authority header in the gRPC health check request. Defaults to the name of the cluster this health check is associated with
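
The gRPC options could be set as in the sketch below (Universal format). The authority value is a hypothetical host name; the serviceName is reused from the example earlier on this page.

```yaml
# Hypothetical policy: the authority value is illustrative only.
type: MeshHealthCheck
name: web-to-backend-grpc-check
mesh: default
spec:
  targetRef:
    kind: Dataplane
    labels:
      app: web
  to:
  - targetRef:
      kind: MeshService
      name: backend
    default:
      interval: 15s
      timeout: 5s
      grpc:
        serviceName: grpc.health.v1.CustomHealth
        # Overrides the default :authority header (the cluster name).
        authority: backend.internal.example
```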

All policy options

spec.targetRef - TargetRef is a reference to the resource the policy takes an effect on. The resource could be either...
- kind - Kind of the referenced resource. Values: Mesh | Dataplane
- name - Name of the referenced resource. Can only be used with kinds: `MeshService`, `MeshServiceSubset` and...
- namespace - Namespace specifies the namespace of target resource. If empty only resources in policy namespace wi...
- labels - Labels are used to select group of MeshServices that match labels. Either Labels or Name and Namespa...
- sectionName - SectionName is used to target specific section of resource. For example, you can target port from Me...

spec.to - To list makes a match between the consumed services and corresponding configurations

spec.to[].targetRef - TargetRef is a reference to the resource that represents a group of destinations.
- kind - Kind of the referenced resource. Values: Mesh | MeshService | MeshMultiZoneService
- name - Name of the referenced resource. Can only be used with kinds: `MeshService`, `MeshServiceSubset` and...
- namespace - Namespace specifies the namespace of target resource. If empty only resources in policy namespace wi...
- labels - Labels are used to select group of MeshServices that match labels. Either Labels or Name and Namespa...
- sectionName - SectionName is used to target specific section of resource. For example, you can target port from Me...

spec.to[].default - Default is a configuration specific to the group of destinations referenced in 'targetRef'
- alwaysLogHealthCheckFailures - If set to true, health check failure events will always be logged. If set to false, only the initial...
- eventLogPath - Specifies the path to the file where Envoy can log health check events. If empty, no event log will ...
- failTrafficOnPanic - If set to true, Envoy will not consider any hosts when the cluster is in 'panic mode'. Instead, the ...
- grpc - GrpcHealthCheck defines gRPC configuration which will instruct the service the health check will be ...
- grpc.authority - The value of the :authority header in the gRPC health check request, by default name of the cluster ...
- grpc.disabled - If true the GrpcHealthCheck is disabled
- grpc.serviceName - Service name parameter which will be sent to gRPC service
- healthyPanicThreshold - Allows to configure panic threshold for Envoy cluster. If not specified, the default is 50%. To disa...
- healthyThreshold - Number of consecutive healthy checks before considering a host healthy. If not specified then the de...
- http - HttpHealthCheck defines HTTP configuration which will instruct the service the health check will be ...
- http.disabled - If true the HttpHealthCheck is disabled
- http.expectedStatuses - List of HTTP response statuses which are considered healthy
- http.path - The HTTP path which will be requested during the health check (ie. /health) If not specified then th...
- http.requestHeadersToAdd - The list of HTTP headers which should be added to each health check request
- initialJitter - If specified, Envoy will start health checking after a random time in ms between 0 and initialJitter...
- interval - Interval between consecutive health checks. If not specified then the default value is 1m
- intervalJitter - If specified, during every interval Envoy will add IntervalJitter to the wait time.
- intervalJitterPercent - If specified, during every interval Envoy will add IntervalJitter * IntervalJitterPercent / 100 to t...
- noTrafficInterval - The "no traffic interval" is a special health check interval that is used when a cluster has never h...
- reuseConnection - Reuse health check connection between health checks. Default is true.
- tcp - TcpHealthCheck defines configuration for specifying bytes to send and expected response during the h...
- tcp.disabled - If true the TcpHealthCheck is disabled
- tcp.receive - List of Base64 encoded blocks of strings expected as a response. When checking the response, "fuzzy"...
- tcp.send - Base64 encoded content of the message which will be sent during the health check to the target
- timeout - Maximum time to wait for a health check response. If not specified then the default value is 15s
- unhealthyThreshold - Number of consecutive unhealthy checks before considering a host unhealthy. If not specified then th...
