Getting started with Nomad – Part 1

Nomad is a distributed, highly available, datacenter-aware scheduler.

If you’re not familiar with application schedulers, I encourage you to watch the very accessible talk by @armon at Ricon 2015: Nomad: A Distributed, Optimistically Concurrent Scheduler.

Slides

The official getting started guide does a good job of helping you set up Nomad locally. I thought it could benefit from a concrete example, hence this post.

Note: remember that Nomad is still a very young project and you should probably wait a little bit before running it in a production environment.

Our action plan:

  1. Write a simple HTTP service and install it system-wide.
  2. Write a Nomad job specification to run our HTTP service.
  3. Update our HTTP service to read environment variables configured by Nomad.

1. Writing a simple HTTP service

Feel free to choose any technology you are familiar with to write this simple HTTP service. I’ve chosen Go because it makes it easy to build self-contained binaries. Building self-contained binaries isn’t strictly necessary, but it does help keep our job specification simple.

Note: if you plan to write this in Java, I’d suggest building an uberjar so you don’t have to worry about the classpath. You will also have to update your job specification to use the Java driver.

Here is an example of a simple http service in Go:

// main.go
package main

import (
    "fmt"
    "log"
    "net/http"
)

func endpoint(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "%s\n", "hello world")
}

func main() {
    addr := ":8888"
    http.HandleFunc("/", endpoint)
    log.Println("Listening on:", addr)
    err := http.ListenAndServe(addr, nil)
    if err != nil {
        log.Fatal(err)
    }
}

Running go build -o nomad-example from the project directory will generate an executable. You could also run go install if you want it to be available system-wide.

Let’s make note of where the executable is located:

$ pwd
/Users/tim/codebin/go/src/github.com/pims/nomad-example/

It wouldn’t hurt to verify that your HTTP service works when you run the binary directly:

$ /Users/tim/codebin/go/src/github.com/pims/nomad-example/nomad-example
2016/02/28 14:05:23 Listening on: :8888

2. Writing a Nomad job specification

Before we start writing our Nomad job specification, let’s keep in mind that it is only intended to run locally, to help us familiarize ourselves with Nomad. You should not run this in a production environment.

The official getting started guide provides a reference job specification which might look a little overwhelming at first. We’ll strip it down to its bare minimum to start.

Note: I assume that you are running Nomad v0.3.0, since a few things have changed from the v0.2.3 release. If you’re on OS X, brew install nomad will install v0.3.0.

Let’s create an example.nomad file with the following job spec:

# Define a job called nomad-example
job "nomad-example" {

    region = "us"
    datacenters = ["dc1"]
    type = "service"

    group "webservice" {
        count = 1

        task "app" {
            driver = "raw_exec"
            config {
                command = "/Users/tim/codebin/go/src/github.com/pims/nomad-example/nomad-example"
            }

            resources {
                network {
                    mbits = 1
                }
            }
        }
    }
}

Important: You should be extremely careful when using the raw_exec driver. We have to use it here because the regular exec driver is only available on Linux and I’m running Nomad on OS X.

At this point, it is safe to ignore everything but the task "app" {} part. The driver key tells Nomad which driver to use to execute a task and provide resource isolation. The command key represents the path of the executable that Nomad should run. Our simple HTTP service doesn’t expect any arguments yet, so we’ll temporarily ignore how to specify arguments to run our executable.

As for the resources {} block, network { mbits = 1 } is required since v0.3.0, so it needs to be present in our job spec.

By mostly ignoring the resources {} definition, we’re not making use of one of the core features of a scheduler: resource utilization. We’ll revisit this part later.

The raw_exec driver documentation states that:

The raw_exec driver is used to execute a command for a task without any isolation. Further, the task is started as the same user as the Nomad process. As such, it should be used with extreme care and is disabled by default.

Let’s enable it by creating a client.hcl configuration file with the following content:

client {
  options = {
    "driver.raw_exec.enable" = "1"
  }
}

Assuming you have installed Nomad previously – brew update && brew install nomad – we can now start our Nomad agent in development mode, which runs Nomad as both a server and a client. We’ll pass the config file we created above to enable the raw_exec driver.

$ sudo nomad agent -dev -config client.hcl

    Loaded configuration from client.hcl
==> Starting Nomad agent...

With the server and client running, we are now ready to submit our first job, using the example.nomad file we created earlier:

$ nomad run example.nomad

==> Monitoring evaluation "39cc6db8"
    Evaluation triggered by job "nomad-example"
    Allocation "93c5cadb" created: node "7a8732b4", group "webservice"
    Allocation "93c5cadb" status changed: "pending" -> "running" ()
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "39cc6db8" finished with status "complete"

To make sure that everything is running properly, we can use the status command to inspect our job:

$ nomad status nomad-example

ID          = nomad-example
Name        = nomad-example
Type        = service
Priority    = 50
Datacenters = dc1
Status      = running
Periodic    = false

==> Evaluations
ID        Priority  Triggered By  Status
39cc6db8  50        job-register  complete

==> Allocations
ID        Eval ID   Node ID   Task Group  Desired  Status
93c5cadb  39cc6db8  7a8732b4  webservice  run      running

The Status = running line indicates that everything seems to be working. Let’s query our HTTP service to verify that it is indeed running.

$ curl http://localhost:8888/ -i

HTTP/1.1 200 OK
Date: Sun, 28 Feb 2016 23:57:45 GMT
Content-Length: 12
Content-Type: text/plain; charset=utf-8

hello world

Great, everything is running and we’re able to query our HTTP service. To stop it, we need to tell Nomad to stop running our job:

$ nomad stop nomad-example
==> Monitoring evaluation "4055f880"
    Evaluation triggered by job "nomad-example"
    Evaluation status changed: "pending" -> "complete"

We can double check that our service isn’t running anymore by attempting to connect to our HTTP service:

$ curl http://localhost:8888/ -i

curl: (7) Failed to connect to localhost port 8888: Connection refused

There are several legitimate reasons to run multiple instances of our HTTP service on a single machine, and luckily for us, Nomad makes this pretty easy. In theory, we should only have to update our job spec, replacing count = 1 with count = 3, and Nomad will attempt to launch two additional processes.
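The change to example.nomad is just the count inside the group stanza; everything else stays the same:

```hcl
group "webservice" {
    count = 3   # was 1; Nomad will attempt to launch two more instances
    # the task "app" stanza below stays unchanged
}
```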

$ nomad run example.nomad

==> Monitoring evaluation "886b8b99"
    Evaluation triggered by job "nomad-example"
    Allocation "008c5c09" created: node "7a8732b4", group "webservice"
    Allocation "be224140" created: node "7a8732b4", group "webservice"
    Allocation "cb95db91" created: node "7a8732b4", group "webservice"
    Allocation "be224140" status changed: "pending" -> "running" ()
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "886b8b99" finished with status "complete"

Let’s check the status of our job:

$ nomad status nomad-example
ID          = nomad-example
Name        = nomad-example
Type        = service
Priority    = 50
Datacenters = dc1
Status      = running
Periodic    = false

==> Evaluations
ID        Priority  Triggered By    Status
d3c7389b  50        job-register    complete

==> Allocations
ID        Eval ID   Node ID   Task Group  Desired  Status
534d5d0f  d3c7389b  7a8732b4  webservice  run      pending
e2ae5fa8  d3c7389b  7a8732b4  webservice  run      running
ef741f99  d3c7389b  7a8732b4  webservice  run      pending

The Allocations output indicates that three processes are now allocated, but if you look carefully, you’ll notice that two of them are in a pending state. They will likely cycle between running and pending for a while. But why?

This is where things can get a little bit complicated. It is not obvious how to debug a task that isn’t running as we’d expect. To keep this intro as short as possible, we won’t cover how to debug the above issue, which is caused by attempting to bind multiple processes to the same hardcoded port.

If you look at our current HTTP service implementation, we hardcoded the listening address to :8888, so only one task can successfully bind the port. Luckily, Nomad lets you assign dynamic ports to your application.

3. Updating our HTTP service to read environment variables

We need to update our job spec to tell Nomad to dynamically assign a host + port combo to our http service. Let’s replace our resources declaration with the following:

resources {
    network {
        mbits = 1
        port "http" {}
    }
}

Notice the new port "http" {} in the resources declaration. http is the name we give to the host + port pair that Nomad will generate for us. You could pick a different name, but we’ll stick with http in this example since it describes what the address will be used for.
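Putting it together, here is the full example.nomad at this point – the original spec with count = 3 and the dynamic port:

```hcl
# Define a job called nomad-example
job "nomad-example" {

    region = "us"
    datacenters = ["dc1"]
    type = "service"

    group "webservice" {
        count = 3

        task "app" {
            driver = "raw_exec"
            config {
                command = "/Users/tim/codebin/go/src/github.com/pims/nomad-example/nomad-example"
            }

            resources {
                network {
                    mbits = 1
                    port "http" {}
                }
            }
        }
    }
}
```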

The next step is to have our http service use this variable to configure itself. There are two ways of passing this info to our application:

  • program arguments
  • environment variables

To keep our job specification as minimal as possible, we are going to use the environment variables approach.
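Though we’ll use environment variables, for comparison here is a sketch of what the program-arguments route could look like on the service side. The -addr flag is my own invention, not something Nomad requires; the job spec would then pass the address through the task’s args.

```go
package main

import (
	"flag"
	"fmt"
	"log"
	"net/http"
	"os"
)

// parseAddr reads a hypothetical -addr flag from the given arguments,
// falling back to our original hardcoded address when the flag is absent.
func parseAddr(args []string) string {
	fs := flag.NewFlagSet("nomad-example", flag.ContinueOnError)
	addr := fs.String("addr", ":8888", "address to listen on")
	if err := fs.Parse(args); err != nil {
		log.Fatal(err)
	}
	return *addr
}

func endpoint(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintf(w, "%s\n", "hello world")
}

func main() {
	addr := parseAddr(os.Args[1:])
	http.HandleFunc("/", endpoint)
	log.Println("Listening on:", addr)
	log.Fatal(http.ListenAndServe(addr, nil))
}
```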

Nomad exposes the host + port as an environment variable named NOMAD_ADDR_http where http represents the name declared in our job specification.

With minimal changes, we now have an HTTP service which gets its address set by Nomad:

import (
    "os"
)

// in main(), replace the hardcoded address with:
addr := os.Getenv("NOMAD_ADDR_http")

To test that our program behaves as expected, we can manually set the NOMAD_ADDR_http variable before starting our HTTP service:

$ NOMAD_ADDR_http=127.0.0.1:9999 ./nomad-example

Listening on: 127.0.0.1:9999

Let’s re-run our job specification with nomad run example.nomad – Nomad will figure out whether it needs to start a new job or update the existing one – and then check the status:

$ nomad status nomad-example

ID          = nomad-example
Name        = nomad-example
Type        = service
Priority    = 50
Datacenters = dc1
Status      = running
Periodic    = false

==> Evaluations
ID        Priority  Triggered By    Status
682d4258  50        job-register    complete

==> Allocations
ID        Eval ID   Node ID   Task Group  Desired  Status
4ba66088  682d4258  7a8732b4  webservice  run      running
c7ece42a  682d4258  7a8732b4  webservice  run      running
68e5a98a  682d4258  7a8732b4  webservice  run      running

Success, we now have three instances of our HTTP service running through Nomad. But how do we connect to them? One way to find out is to list open ports and search for our nomad-example processes:

$ sudo lsof -iTCP -sTCP:LISTEN -P -n

// details omitted for brevity
nomad-exa 77258 root    ...    TCP 127.0.0.1:54156 (LISTEN)
nomad-exa 77259 root    ...    TCP 127.0.0.1:45295 (LISTEN)
nomad-exa 77260 root    ...    TCP 127.0.0.1:47585 (LISTEN)

Let’s try connecting to one of them:

$ curl http://127.0.0.1:47585 -i

HTTP/1.1 200 OK
Date: Mon, 29 Feb 2016 01:41:45 GMT
Content-Length: 12
Content-Type: text/plain; charset=utf-8

hello world

This is quite an inconvenient way to find out how to connect to our HTTP service. In Part 2, we’ll explore options for service discovery.

Source for the Nomad job specification and http service is available on github.