Migration to the new policies

Kuma provides two sets of policies for configuring proxies. The original source/destination policies, while feature-rich, haven’t met users’ expectations in terms of flexibility and transparency. The new targetRef policies were designed to preserve what already worked well while improving the matching functionality and the overall UX.

In this guide, we’re going to set up a demo with the old policies and then migrate it to the new policies.

Prerequisites

  • Helm - a package manager for Kubernetes
  • Kind - a tool for running local Kubernetes clusters
  • jq - a command-line JSON processor
  • jd - a command-line utility for visualising JSON Patch

Start Kubernetes cluster

Start a new Kubernetes cluster on your local machine by executing:

kind create cluster --name=mesh-zone

You can skip this step if you already have a Kubernetes cluster running. It can be a cluster running locally or in a public cloud like AWS EKS, GCP GKE, etc.

Install Kuma

Install the Kuma control plane with skipMeshCreation set to true by executing:

helm repo add kuma https://kumahq.github.io/charts
helm repo update
helm install --create-namespace --namespace kuma-system kuma kuma/kuma --set "controlPlane.defaults.skipMeshCreation=true"

Make sure the list of meshes is empty:

kubectl get meshes

Expected output:

No resources found

Set up demo with old policies

In the first half of this guide we’re going to deploy a demo app in the default mesh and configure it using old policies.

Create default mesh

echo 'apiVersion: kuma.io/v1alpha1
kind: Mesh
metadata:
  name: default
spec:
  # for the purpose of this guide we want to set up the mesh with old policies first,
  # which is why we are skipping the creation of the default policies
  skipCreatingInitialPolicies: ["*"] ' | kubectl apply -f-

Deploy demo application

  1. Deploy the application
    kubectl apply -f https://raw.githubusercontent.com/kumahq/kuma-counter-demo/master/demo.yaml
    kubectl wait -n kuma-demo --for=condition=ready pod --selector=app=demo-app --timeout=90s
    
  2. Port-forward the demo-app service in the kuma-demo namespace to local port 5000:

    kubectl port-forward svc/demo-app -n kuma-demo 5000:5000
    
  3. In a browser, go to 127.0.0.1:5000 and increment the counter.

Enable Mutual TLS and Traffic Permissions

echo 'apiVersion: kuma.io/v1alpha1
kind: Mesh
metadata:
  name: default
spec:
  skipCreatingInitialPolicies: ["*"]
  mtls:
    enabledBackend: ca-1
    backends:
      - name: ca-1
        type: builtin' | kubectl apply -f-
echo 'apiVersion: kuma.io/v1alpha1
kind: TrafficPermission
mesh: default
metadata:
  name: app-to-redis
spec:
  sources:
    - match:
        kuma.io/service: demo-app_kuma-demo_svc_5000
  destinations:
    - match:
        kuma.io/service: redis_kuma-demo_svc_6379' | kubectl apply -f -

Deploy TrafficRoute

echo 'apiVersion: kuma.io/v1alpha1
kind: TrafficRoute
mesh: default
metadata:
  name: route-all-default
spec:
  sources:
    - match:
        kuma.io/service: "*"
  destinations:
    - match:
        kuma.io/service: "*"
  conf:
    destination:
      kuma.io/service: "*"' | kubectl apply -f-

Deploy Timeouts

echo 'apiVersion: kuma.io/v1alpha1
kind: Timeout
mesh: default
metadata:
  name: timeout-global
spec:
  sources:
    - match:
        kuma.io/service: "*"
  destinations:
    - match:
        kuma.io/service: "*"
  conf:
    connectTimeout: 21s
    tcp: 
      idleTimeout: 22s
    http:
      idleTimeout: 22s
      requestTimeout: 23s
      streamIdleTimeout: 25s
      maxStreamDuration: 26s' | kubectl apply -f-

Deploy CircuitBreaker

echo 'apiVersion: kuma.io/v1alpha1
kind: CircuitBreaker
mesh: default
metadata:
  name: cb-global
spec:
  sources:
  - match:
      kuma.io/service: "*"
  destinations:
  - match:
      kuma.io/service: "*"
  conf:
    interval: 21s
    baseEjectionTime: 22s
    maxEjectionPercent: 23
    splitExternalAndLocalErrors: false
    thresholds:
      maxConnections: 24
      maxPendingRequests: 25
      maxRequests: 26
      maxRetries: 27
    detectors:
      totalErrors: 
        consecutive: 28
      gatewayErrors: 
        consecutive: 29
      localErrors: 
        consecutive: 30
      standardDeviation:
        requestVolume: 31
        minimumHosts: 32
        factor: 1.33
      failure:
        requestVolume: 34
        minimumHosts: 35
        threshold: 36' | kubectl apply -f-

Migration to the new policies

It’s time to migrate the demo app to the new policies.

Each type of policy can be migrated separately; for example, once we have completely finished with the Timeouts, we will proceed to the next policy type, CircuitBreakers. It’s possible to migrate all policies at once, but small portions are preferable as they’re easily reversible.

The generalized migration process consists of roughly four steps:

  1. Create a new targetRef policy as a replacement for the existing source/destination policy. The corresponding new policy type can be found in the table. Deploy the policy in shadow mode to avoid any traffic disruption.
  2. Using the Inspect API, review the list of changes that the new policy is going to make.
  3. Remove the kuma.io/effect: shadow label so that the policy is applied in normal mode.
  4. Observe metrics, traces and logs. If something goes wrong, change the policy’s mode back to shadow and return to step 2. If everything is fine, remove the old policies.
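Switching a policy between shadow and normal mode in steps 3 and 4 comes down to a single label. As a minimal sketch, assuming the new policy is stored in the kuma-system namespace as in this guide, the label can be toggled with kubectl label. The function names are hypothetical helpers for illustration, not part of Kuma or kubectl:

```shell
# Hypothetical helpers; arguments: <policy kind> <policy name>.
# Assumption: the new policies live in the kuma-system namespace.

# Step 3: drop the label so the policy is applied in normal mode.
# A trailing "-" after the key tells kubectl to remove that label.
disable_shadow() {
  kubectl label "$1" "$2" --namespace kuma-system kuma.io/effect-
}

# Step 4 rollback: put the policy back into shadow mode.
enable_shadow() {
  kubectl label "$1" "$2" --namespace kuma-system --overwrite kuma.io/effect=shadow
}
```

For example, `enable_shadow meshtimeout timeout-global` would return a MeshTimeout named timeout-global to shadow mode. Re-applying the manifest with or without the label, as done elsewhere in this guide, achieves the same result.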

The order of migrating policies generally doesn’t matter, except for the TrafficRoute policy, which should be the last one deleted when removing old policies. This is because many old policies, like Timeout and CircuitBreaker, depend on TrafficRoutes to function correctly.

TrafficPermission -> MeshTrafficPermission

  1. Create a replacement policy for the app-to-redis TrafficPermission and apply it with the kuma.io/effect: shadow label:

    echo 'apiVersion: kuma.io/v1alpha1
    kind: MeshTrafficPermission
    metadata:
      namespace: kuma-system
      name: app-to-redis
      labels:
        kuma.io/mesh: default
        kuma.io/effect: shadow
    spec:
      targetRef:
        kind: MeshService
        name: redis_kuma-demo_svc_6379
      from:
        - targetRef:
            kind: MeshSubset
            tags: 
              kuma.io/service: demo-app_kuma-demo_svc_5000
          default:
            action: Allow' | kubectl apply -f -
    
  2. Check the list of changes for the redis_kuma-demo_svc_6379 pod in the Envoy configuration using kumactl, jq and jd (replace the pod name with the one from your cluster):

    kumactl inspect dataplane redis-8fcbfc795-twlst.kuma-demo --type=config --shadow --include=diff | jq '.diff' | jd -t patch2jd
    

    Expected output:

    @ ["type.googleapis.com/envoy.config.listener.v3.Listener","inbound:10.42.0.13:6379","filterChains","0","filters","0","typedConfig","rules","policies","allow-all-default"]
    - {"permissions":[{"any":true}],"principals":[{"authenticated":{"principalName":{"exact":"spiffe://default/demo-app_kuma-demo_svc_5000"}}}]}
    @ ["type.googleapis.com/envoy.config.listener.v3.Listener","inbound:10.42.0.13:6379","filterChains","0","filters","0","typedConfig","rules","policies","MeshTrafficPermission"]
    + {"permissions":[{"any":true}],"principals":[{"authenticated":{"principalName":{"exact":"spiffe://default/demo-app_kuma-demo_svc_5000"}}}]}
    

    As we can see, the only difference is the policy name: “MeshTrafficPermission” instead of “allow-all-default”. The value of the policy is the same.

  3. Remove the kuma.io/effect: shadow label:

    echo 'apiVersion: kuma.io/v1alpha1
    kind: MeshTrafficPermission
    metadata:
      namespace: kuma-system
      name: app-to-redis
      labels:
        kuma.io/mesh: default
    spec:
      targetRef:
        kind: MeshService
        name: redis_kuma-demo_svc_6379
      from:
        - targetRef:
            kind: MeshSubset
            tags: 
              kuma.io/service: demo-app_kuma-demo_svc_5000
          default:
            action: Allow' | kubectl apply -f -
    

    Even though the old TrafficPermission and the new MeshTrafficPermission are both in use, the new policy takes precedence, making the old one ineffective.

  4. Verify that the demo app behaves as expected. If everything goes well, we can safely remove the TrafficPermission and conclude the migration.
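The pod name in the inspect command of step 2 is specific to one cluster. As a rough sketch, the dataplane name can be looked up instead of hard-coded. It assumes the demo pods carry an app=<name> label (the guide relies on app=demo-app in the earlier wait command; app=redis is an assumption), and dataplane_for is a hypothetical helper name:

```shell
# Hypothetical helper: prints "<pod>.<namespace>" for the first pod in the
# kuma-demo namespace that matches the given app label.
# Assumption: the pods are labelled app=<name>.
dataplane_for() {
  pod=$(kubectl get pods --namespace kuma-demo --selector "app=$1" \
    --output jsonpath='{.items[0].metadata.name}') || return 1
  echo "${pod}.kuma-demo"
}

# Usage: the same inspect pipeline as in step 2, without a fixed pod name.
# kumactl inspect dataplane "$(dataplane_for redis)" --type=config --shadow --include=diff | jq '.diff' | jd -t patch2jd
```

If several replicas are running, this inspects only the first pod returned, which is normally enough because all replicas of a service receive the same policies.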

Timeout -> MeshTimeout

  1. Create a replacement policy for the timeout-global Timeout and apply it with the kuma.io/effect: shadow label:

    echo 'apiVersion: kuma.io/v1alpha1
    kind: MeshTimeout
    metadata:
      namespace: kuma-system
      name: timeout-global
      labels:
        kuma.io/mesh: default
        kuma.io/effect: shadow
    spec:
      targetRef:
        kind: Mesh
      to:
      - targetRef:
          kind: Mesh
        default:
          connectionTimeout: 21s
          idleTimeout: 22s
          http:
            requestTimeout: 23s
            streamIdleTimeout: 25s
            maxStreamDuration: 26s
      from:
      - targetRef:
          kind: Mesh
        default:
          connectionTimeout: 10s
          idleTimeout: 2h
          http:
            requestTimeout: 0s
            streamIdleTimeout: 2h' | kubectl apply -f-
    
  2. Check the list of changes for the redis_kuma-demo_svc_6379 pod in the Envoy configuration using kumactl, jq and jd:

    kumactl inspect dataplane redis-8fcbfc795-twlst.kuma-demo --type=config --shadow --include=diff | jq '.diff' | jd -t patch2jd
    

    Expected output:

    @ ["type.googleapis.com/envoy.config.cluster.v3.Cluster","demo-app_kuma-demo_svc_5000","typedExtensionProtocolOptions","envoy.extensions.upstreams.http.v3.HttpProtocolOptions","commonHttpProtocolOptions","maxConnectionDuration"]
    + "0s"
    @ ["type.googleapis.com/envoy.config.listener.v3.Listener","outbound:10.43.146.6:5000","filterChains","0","filters","0","typedConfig","commonHttpProtocolOptions","idleTimeout"]
    - "22s"
    @ ["type.googleapis.com/envoy.config.listener.v3.Listener","outbound:10.43.146.6:5000","filterChains","0","filters","0","typedConfig","commonHttpProtocolOptions","idleTimeout"]
    + "0s"
    @ ["type.googleapis.com/envoy.config.listener.v3.Listener","outbound:10.43.146.6:5000","filterChains","0","filters","0","typedConfig","routeConfig","virtualHosts","0","routes","0","route","idleTimeout"]
    + "25s"
    @ ["type.googleapis.com/envoy.config.listener.v3.Listener","outbound:10.43.146.6:5000","filterChains","0","filters","0","typedConfig","requestHeadersTimeout"]
    + "0s"
    

    Review the list and make sure the new MeshTimeout policy won’t change any important settings. The key differences between the old and new timeout policies are:

    • Previously, there was no way to specify requestHeadersTimeout, maxConnectionDuration and maxStreamDuration (on inbound), so these timeouts were left unset. The new MeshTimeout policy explicitly sets them to 0s by default.
    • idleTimeout used to be configured on both the cluster and the listener. MeshTimeout configures it only on the cluster.
    • route/idleTimeout duplicates the value of streamIdleTimeout, but per route. Previously it was set only per listener.

    These three differences account for the list of changes we’re observing.

  3. Remove the kuma.io/effect: shadow label. Even though the old Timeout and the new MeshTimeout are both in use, the new policy takes precedence, making the old one ineffective.

  4. Verify that the demo app behaves as expected. If everything goes well, we can safely remove the Timeout and conclude the migration.
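Before deleting the old Timeout in step 4, it can be worth confirming that the MeshTimeout really left shadow mode. The following is a sketch under the assumptions of this guide (policy names timeout-global, new policy in kuma-system); retire_old_timeout is a hypothetical helper name:

```shell
# Hypothetical helper: deletes the old Timeout only if the new MeshTimeout
# no longer carries the kuma.io/effect label.
retire_old_timeout() {
  # The dot in the label key has to be escaped inside a kubectl jsonpath.
  effect=$(kubectl get meshtimeout timeout-global --namespace kuma-system \
    --output jsonpath='{.metadata.labels.kuma\.io/effect}') || return 1
  if [ -n "$effect" ]; then
    echo "MeshTimeout is still in '$effect' mode; keeping the old Timeout" >&2
    return 1
  fi
  kubectl delete timeout timeout-global
}
```

The guard is a convenience rather than a requirement: it only prevents removing the old policy while the new one is not yet in effect.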

CircuitBreaker -> MeshCircuitBreaker

  1. Create a replacement policy for the cb-global CircuitBreaker and apply it with the kuma.io/effect: shadow label:

    echo 'apiVersion: kuma.io/v1alpha1
    kind: MeshCircuitBreaker
    metadata:
      namespace: kuma-system
      name: cb-global
      labels:
        kuma.io/mesh: default
        kuma.io/effect: shadow
    spec:
      targetRef:
        kind: Mesh
      to:
      - targetRef:
          kind: Mesh
        default:
          connectionLimits:
            maxConnections: 24
            maxPendingRequests: 25
            maxRequests: 26
            maxRetries: 27
          outlierDetection:
            interval: 21s
            baseEjectionTime: 22s
            maxEjectionPercent: 23
            splitExternalAndLocalErrors: false
            detectors:
              totalFailures:
                consecutive: 28
              gatewayFailures:
                consecutive: 29
              localOriginFailures:
                consecutive: 30
              successRate:
                requestVolume: 31
                minimumHosts: 32
                standardDeviationFactor: "1.33"
              failurePercentage:
                requestVolume: 34
                minimumHosts: 35
                threshold: 36' | kubectl apply -f-
    
  2. Check the list of changes for the demo-app_kuma-demo_svc_5000 pod in the Envoy configuration using kumactl, jq and jd:

    kumactl inspect dataplane demo-app-b4f98898-zxrqj.kuma-demo --type=config --shadow --include=diff | jq '.diff' | jd -t patch2jd
    

    The expected output is empty. CircuitBreaker and MeshCircuitBreaker configure Envoy in exactly the same way.

  3. Remove the kuma.io/effect: shadow label. Even though the old CircuitBreaker and the new MeshCircuitBreaker are both in use, the new policy takes precedence, making the old one ineffective.

  4. Verify that the demo app behaves as expected. If everything goes well, we can safely remove the CircuitBreaker and conclude the migration.
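Because the expected diff in step 2 is empty, the check lends itself to a small script. This is only a sketch: it reuses the inspect pipeline from step 2 unchanged, takes the dataplane name as an argument, and shadow_diff_is_empty is a hypothetical helper name:

```shell
# Hypothetical helper: succeeds (exit code 0) when the shadow policy would
# not change the Envoy configuration of the given dataplane.
# Argument: dataplane name, for example <pod name>.kuma-demo
shadow_diff_is_empty() {
  diff_output=$(kumactl inspect dataplane "$1" --type=config --shadow --include=diff \
    | jq '.diff' | jd -t patch2jd) || return 2
  # An empty string means the pipeline reported no changes.
  [ -z "$diff_output" ]
}
```

A call such as `shadow_diff_is_empty demo-app-b4f98898-zxrqj.kuma-demo && echo "no changes"` could then gate the removal of the shadow label.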

TrafficRoute -> MeshTCPRoute

It’s safe to simply remove route-all-default TrafficRoute. Traffic will flow through the system even if there are neither TrafficRoutes nor MeshTCPRoutes/MeshHTTPRoutes.
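As noted earlier in this guide, the TrafficRoute should be the last old policy to go, because Timeout and CircuitBreaker depend on it. A possible cleanup sequence, assuming the policy names used in this guide and that all three migrations above are complete, is sketched here; remove_old_policies is a hypothetical helper name:

```shell
# Hypothetical helper: deletes the old source/destination policies created
# in the first half of this guide, keeping the TrafficRoute for last.
# --ignore-not-found makes the sequence safe to re-run if some of the
# policies were already removed during the individual migrations.
remove_old_policies() {
  kubectl delete trafficpermission app-to-redis --ignore-not-found &&
  kubectl delete timeout timeout-global --ignore-not-found &&
  kubectl delete circuitbreaker cb-global --ignore-not-found &&
  kubectl delete trafficroute route-all-default --ignore-not-found
}
```

Chaining the commands with && stops the sequence at the first failure, so the TrafficRoute is never deleted while an old policy that depends on it is still present.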

Next steps