Nomad's first-class integration with Consul allows operators to design jobs that
natively leverage Consul Connect. However, in Consul clusters that are ACL-enabled,
there are a few extra steps required to ensure that your Nomad servers and clients
have Consul ACL tokens with sufficient privileges to create the additional services
required for the sidecar proxies. This tutorial walks through those steps, has you run a
sample Connect workload, and introduces the allow_unauthenticated value, so that
you can configure your own Nomad cluster to run Connect jobs against
your own ACL-enabled Consul cluster.
»Prerequisites
- Nomad v0.10.4 or greater
- a Nomad environment with Nomad and Consul installed. You can use this Terraform environment to provision a sandbox environment. This guide assumes a cluster with one node running both Consul and Nomad in server mode and one or more nodes running Nomad and Consul in client mode.
- a Consul cluster that is ACL-enabled and bootstrapped
- a management token for that cluster
You can use the "Secure Consul with ACLs" tutorial to configure a Consul cluster for this guide.
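If you still need to enable ACLs on that cluster, a minimal sketch of the Consul agent configuration and bootstrap step is shown below; the file path and default_policy shown are assumptions for illustration, and the linked tutorial covers the complete procedure.
# /etc/consul.d/acl.hcl (path is an assumption)
acl {
  enabled                  = true
  default_policy           = "deny"
  enable_token_persistence = true
}
After restarting Consul with this configuration on every agent, bootstrap the ACL system once and save the management token for the later steps in this guide.
$ consul acl bootstrap | tee consul.bootstrap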
If your Consul cluster is TLS-enabled for agent communication and you are using Nomad version 0.10, you will need to provide some Consul configuration as environment variables for your Nomad process. This can be done by modifying your init scripts or systemd system units. This will be discussed later in the guide.
Note: This tutorial is for demo purposes and only uses a single Nomad server with a Consul server configured alongside it. In a production cluster, 3 or 5 Nomad server nodes are recommended along with a separate Consul cluster. Consult the Consul Reference Architecture to learn how to securely deploy a Consul cluster.
NOTE: A similar, interactive lab is also available if you do not have a Nomad environment to perform the steps described in this guide. Click the Show Tutorial button to launch the lab experience.
»Generate Consul ACL tokens for Nomad
»Create a Nomad server policy
Define the Nomad server policy by making a file named nomad-server-policy.hcl
with this content.
agent_prefix "" {
  policy = "read"
}
node_prefix "" {
  policy = "read"
}
service_prefix "" {
  policy = "write"
}
acl = "write"
Create the Nomad server policy by uploading this file.
$ consul acl policy create \
-name "nomad-server" \
-description "Nomad Server Policy" \
-rules @nomad-server-policy.hcl
The command outputs information about the newly created policy and its rules.
$ consul acl policy create \
> -name "nomad-server" \
> -description "Nomad Server Policy" \
> -rules @nomad-server-policy.hcl
ID: 4ca519e1-d480-5fd2-160e-8a84cc22eefa
Name: nomad-server
Description: Nomad Server Policy
Datacenters:
Rules:
agent_prefix "" {
  policy = "read"
}
node_prefix "" {
  policy = "read"
}
service_prefix "" {
  policy = "write"
}
acl = "write"
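You can re-read the policy later at any time to confirm its rules were stored as expected.
$ consul acl policy read -name "nomad-server"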
»Create a Nomad client policy
Define the Nomad client policy by making a file named nomad-client-policy.hcl
with this content.
agent_prefix "" {
  policy = "read"
}
node_prefix "" {
  policy = "read"
}
service_prefix "" {
  policy = "write"
}
# uncomment if using Consul KV with Consul Template
# key_prefix "" {
#   policy = "read"
# }
Create the Nomad client policy by uploading this file.
$ consul acl policy create \
-name "nomad-client" \
-description "Nomad Client Policy" \
-rules @nomad-client-policy.hcl
The command outputs information about the newly created policy and its rules.
$ consul acl policy create \
> -name "nomad-client" \
> -description "Nomad Client Policy" \
> -rules @nomad-client-policy.hcl
ID: b093d1c5-a800-7973-4d73-6c7ac2c8ec01
Name: nomad-client
Description: Nomad Client Policy
Datacenters:
Rules:
agent_prefix "" {
  policy = "read"
}
node_prefix "" {
  policy = "read"
}
service_prefix "" {
  policy = "write"
}
# uncomment if using Consul KV with Consul Template
# key_prefix "" {
#   policy = "read"
# }
»Create a token for Nomad
Generate a token associated with these policies and save it to a file named nomad-agent.token. Because this tutorial is written for a node that is both a client and a server, apply both policies to the token. Typically, you would generate tokens with the nomad-server policy for your Nomad server nodes and tokens with the nomad-client policy for your Nomad client nodes.
Consider applying roles instead of rotating tokens
If your Nomad node already has a Consul token, it is better to add the required policies to that token or to its roles rather than switching to a new token.
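As a sketch of that role-based alternative (not used in the rest of this tutorial; the nomad-agent role name is an assumption), you could attach both policies to a role and issue the agent token from the role, so that future permission changes only touch the role.
$ consul acl role create \
  -name "nomad-agent" \
  -description "Role for Nomad agents" \
  -policy-name "nomad-server" \
  -policy-name "nomad-client"

$ consul acl token create \
  -description "Nomad Demo Agent Token" \
  -role-name "nomad-agent" | tee nomad-agent.token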
$ consul acl token create \
-description "Nomad Demo Agent Token" \
-policy-name "nomad-server" \
-policy-name "nomad-client" | tee nomad-agent.token
The command will return a new Consul token for use in your Nomad configuration.
$ consul acl token create \
> -description "Nomad Demo Agent Token" \
> -policy-name "nomad-server" \
> -policy-name "nomad-client" | tee nomad-agent.token
AccessorID: 98eb9a6a-5823-6138-93b4-bf9958e6d16c
SecretID: 4ca3820e-1bc4-2980-94ef-e6a421eddd7d
Description: Nomad Demo Agent Token
Local: false
Create Time: 2020-03-31 19:04:03.734810397 +0000 UTC
Policies:
4ca519e1-d480-5fd2-160e-8a84cc22eefa - nomad-server
b093d1c5-a800-7973-4d73-6c7ac2c8ec01 - nomad-client
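The SecretID in this output is the value you will place in the Nomad configuration in the next step. Because the output was piped to nomad-agent.token, you can extract it with the same awk pattern used later in this guide.
$ awk '/SecretID/ {print $2}' nomad-agent.token
4ca3820e-1bc4-2980-94ef-e6a421eddd7d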
»Update Nomad's Consul configuration
Open your Nomad configuration file on all of your nodes and add a consul stanza with your token.
consul {
  token = "«your nomad agent token»"
}
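If you would rather not write the token into the configuration file, Nomad can also read it from the CONSUL_HTTP_TOKEN environment variable. For a systemd-managed Nomad, a sketch of a unit drop-in would look like the following; the token value shown is the demo SecretID from above.
[Service]
Environment="CONSUL_HTTP_TOKEN=4ca3820e-1bc4-2980-94ef-e6a421eddd7d"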
»Provide environment variables for TLS-enabled Consul
If you are using Nomad version 0.10 and your Consul cluster is TLS-enabled, you
will need to provide additional Consul configurations as environment
variables to the Nomad process. This works around a known issue in
Nomad (hashicorp/nomad#6594). Refer to the TLS-enabled Consul
environment section in the "Advanced considerations" appendix of this tutorial
for details. You can return here after you read that material.
»Alternative architectures (non-x86/amd64)
If you are running on ARM or another non-x86/amd64 architecture, jump to the Alternative architectures section in the "Advanced considerations" appendix of this tutorial for details. You can return here after you read that material.
»Restart Nomad to load new configuration
Run systemctl restart nomad to restart Nomad and load these changes.
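To confirm that Nomad reconnected to Consul with the new token, you can list the Consul catalog; the Nomad agents register themselves as the nomad and nomad-client services by default, so you should see entries similar to the following.
$ consul catalog services
consul
nomad
nomad-client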
»Run a Connect-enabled job
»Create the job specification
Create the "countdash" job by copying this job specification into a file named
countdash.nomad
.
job "countdash" {
datacenters = ["dc1"]
group "api" {
network {
mode = "bridge"
}
service {
name = "count-api"
port = "9001"
connect {
sidecar_service {}
}
}
task "web" {
driver = "docker"
config {
image = "hashicorpnomad/counter-api:v1"
}
}
}
group "dashboard" {
network {
mode ="bridge"
port "http" {
static = 9002
to = 9002
}
}
service {
name = "count-dashboard"
port = "9002"
connect {
sidecar_service {
proxy {
upstreams {
destination_name = "count-api"
local_bind_port = 8080
}
}
}
}
}
task "dashboard" {
driver = "docker"
env {
COUNTING_SERVICE_URL = "http://${NOMAD_UPSTREAM_ADDR_count_api}"
}
config {
image = "hashicorpnomad/counter-dashboard:v1"
}
}
}
}
»Create an intention
In Consul, the default intention behavior is defined by the default ACL policy. If the default ACL policy is "allow all", then all service mesh connections are allowed by default. If the default ACL policy is "deny all", then all service mesh connections are denied by default.
To avoid unexpected behavior around this, it is better to create an explicit intention. Create an intention to allow traffic from the count-dashboard service to the count-api service.
Run consul intention create count-dashboard count-api to create the intention.
The command will output that it created an intention to allow traffic from the "count-dashboard" service to the "count-api" service.
$ consul intention create count-dashboard count-api
Created: count-dashboard => count-api (allow)
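If you want to confirm the effect of the intention, consul intention check reports whether a connection from the source service to the destination would be allowed.
$ consul intention check count-dashboard count-api
Allowed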
»Run the job
Run the job by calling nomad run countdash.nomad.
The command will output the result of running the job and show the allocation IDs of the two new allocations that are created.
$ nomad run countdash.nomad
==> Monitoring evaluation "3e7ebb57"
Evaluation triggered by job "countdash"
Evaluation within deployment: "9eaf6878"
Allocation "012eb94f" created: node "c0e8c600", group "api"
Allocation "02c3a696" created: node "c0e8c600", group "dashboard"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "3e7ebb57" finished with status "complete"
Once you are done, run nomad stop countdash to prepare for the next step.
The command will output evaluation information about the stop request and stop the allocations in the background.
$ nomad stop countdash
==> Monitoring evaluation "d4796df1"
Evaluation triggered by job "countdash"
Evaluation within deployment: "18b25bb6"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "d4796df1" finished with status "complete"
»Use Consul authentication on jobs
By default, Nomad does not require the job submitter to authenticate to Consul and will create the Connect-related Consul objects with whatever privileges the Nomad server's token allows. In some scenarios, this can allow an operator to escalate their privileges to those of the Nomad server.
To prevent this, you can set the allow_unauthenticated option to false.
»Update Nomad configuration
Open your Nomad configuration file on all of your nodes and add the allow_unauthenticated value inside of the consul configuration block.
consul {
  # ...
  allow_unauthenticated = false
}
Run the systemctl restart nomad command to restart Nomad and load these changes.
»Submit the job with a Consul token
Start by unsetting the Consul token in your shell session.
$ unset CONSUL_HTTP_TOKEN
Now, try running countdash.nomad again. This time you will receive an error indicating that you need to supply a Consul ACL token in order to run the job.
$ nomad run countdash.nomad
Error submitting job: Unexpected response code: 500 (operator token denied: missing consul token)
Nomad will not allow you to submit a job to the cluster without providing a Consul token that has write access to the Consul service that the job defines.
You can supply the token in a few ways:
- the CONSUL_HTTP_TOKEN environment variable
- the -consul-token flag on the command line
- the X-Consul-Token header on API calls
Reload your management token into the CONSUL_HTTP_TOKEN
environment variable.
$ export CONSUL_HTTP_TOKEN=$(awk '/SecretID/ {print $2}' consul.bootstrap)
Now, try running countdash.nomad again. This time it will succeed.
$ nomad run countdash.nomad
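Alternatively, you could have left CONSUL_HTTP_TOKEN unset and passed the token with the command-line flag instead; a sketch using the management token saved earlier in consul.bootstrap follows.
$ nomad run -consul-token="$(awk '/SecretID/ {print $2}' consul.bootstrap)" countdash.nomad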
»Advanced considerations
»TLS-enabled Consul environment
To use the Nomad Connect integration with Nomad version 0.10 and a TLS-enabled Consul cluster, you will need to provide a few Consul environment variables to the Nomad process through the init script or systemd unit. This applies to all Nomad nodes configured as a client, even when the node is both a client and a server.
For a systemd unit, you can provide these environment variables using either an EnvironmentFile setting or Environment settings in the [Service] section of the service unit.
Environment="CONSUL_HTTP_SSL=true"
Environment="CONSUL_CACERT=/opt/consul/tls/consul-agent-ca.pem"
Environment="CONSUL_CLIENT_CERT=/opt/consul/tls/dc1-server-consul-0.pem"
Environment="CONSUL_CLIENT_KEY=/opt/consul/tls/dc1-server-consul-0-key.pem"
These values should be set to agree with the Nomad agent's consul
stanza.
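For reference, a sketch of a Nomad consul stanza that agrees with the variables above; the certificate paths are the same assumptions used in the example, and the token placeholder matches the earlier configuration step.
consul {
  ssl       = true
  ca_file   = "/opt/consul/tls/consul-agent-ca.pem"
  cert_file = "/opt/consul/tls/dc1-server-consul-0.pem"
  key_file  = "/opt/consul/tls/dc1-server-consul-0-key.pem"
  token     = "«your nomad agent token»"
}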
Once set, you will need to stop and start the Nomad service to pick up these new variables. For a systemd service defined in nomad.service, you would run the following.
$ systemctl restart nomad.service
If you came here from "Provide environment variables for TLS-enabled Consul", return there now.
»Alternative architectures
Nomad provides a default link to a pause image. This image, however, is architecture specific and is only provided for the amd64 architecture. In order to use Consul service mesh on non-x86/amd64 hardware, you will need to configure Nomad to use a different pause container. If Nomad is trying to use a version of Envoy earlier than 1.16, you will need to specify a different version of Envoy as well. Read through the section on airgapped networks below; it explains the same configuration elements that you will need to set to use alternative containers for service mesh.
Special thanks to @GusPS, who reported this working configuration.
Envoy 1.16 now has ARM64 support. Configure it as your sidecar image by setting the connect.sidecar_image meta variable on each of your ARM64 clients.
meta {
  "connect.sidecar_image" = "envoyproxy/envoy:v1.16.0"
}
The rancher/pause
container has versions for several different architectures
as well. Override the default pause container and use it instead. In your client
configuration, add an infra_image
to your docker plugin configuration
overriding the default with the rancher version.
plugin "docker" {
config {
infra_image = "rancher/pause:3.2"
}
}
If you came here from the "Alternative architectures (non-x86/amd64)" note above, return there now.
»Airgapped networks or proxied environments
If you are in an airgapped network or need to access Docker Hub via a proxy, you will have to perform some additional configuration on your Nomad clients to enable Nomad's Consul Connect integration.
»Set the "infra_image" path
Set the infra_image
configuration option for the Docker driver plugin on
your Nomad clients to a path that is accessible in your environment. For
example,
plugin "docker" {
config {
infra_image = "dockerhub.myproxy.com:8080/google_containers/pause-amd64:3.0"
}
}
Changing this value will require a restart of Nomad.
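In a fully airgapped environment, you would typically mirror the upstream pause image into that internal registry first. A sketch using the Docker CLI follows; the source path is assumed to match the default image and the destination matches the example registry above.
$ docker pull gcr.io/google_containers/pause-amd64:3.0
$ docker tag gcr.io/google_containers/pause-amd64:3.0 dockerhub.myproxy.com:8080/google_containers/pause-amd64:3.0
$ docker push dockerhub.myproxy.com:8080/google_containers/pause-amd64:3.0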
»Set the "sidecar_image" path
You will also need the Envoy proxy image used for Consul service mesh networking. Override the default container path on your Nomad clients by adding a "connect.sidecar_image" value to the client.meta stanza of your Nomad client configuration. If you do not have a meta stanza inside of your top-level client stanza, add one as follows.
client {
  # ...
  meta {
    # Set this value to a proxy or internal registry that can provide an
    # appropriate envoy image.
    "connect.sidecar_image" = "dockerhub.myproxy.com:8080/envoyproxy/envoy:v1.11.2@sha256:a7769160c9c1a55bb8d07a3b71ce5d64f72b1f665f10d81aa1581bc3cf850d09"
  }
  # ...
}
Changing this value will require a restart of Nomad.
»Next steps
Now that you have completed this guide, you have:
- configured your cluster with a Consul token
- run a non-validated job using the Nomad server's Consul token
- run a validated job using the permissions of a user-provided Consul token
Now that you have run both a non-validated job and a user-token-validated job, which is right for your environment? All of these steps can also be done using the Nomad API directly; which path might you use for your use case?
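As a sketch of the API path, you could convert the job specification to JSON and submit it with the X-Consul-Token header; the localhost address assumes you are running the commands on a Nomad server, and depending on your Nomad version you may need to adjust the JSON envelope.
$ nomad job run -output countdash.nomad > countdash.json
$ curl --request POST \
    --header "X-Consul-Token: ${CONSUL_HTTP_TOKEN}" \
    --data @countdash.json \
    http://localhost:4646/v1/jobs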