Rundeck Nomad Plugin
This is early work. Use with extreme caution!
Purpose
This is a Workflow Step plugin for submitting jobs to a Nomad cluster via Rundeck UI. The plugin interacts with a Nomad server via HTTP API.
Rundeck is a popular and well-established automation tool. Among other things, it features a rich customizable UI, role-based access control, scheduling, logging, alerts, CLI and API support, and an already extensive plugin ecosystem. It fits well into CI/CD pipelines (for instance, there is a Jenkins Rundeck integration plugin). However, Rundeck does not lend itself easily to running in HA mode or scaling worker nodes. Therefore, it seems a good idea to put a distributed scheduler such as Nomad behind it to offload resource-intensive jobs.
Installation
- Download and start Rundeck. It will automatically create the necessary directories.
- Clone this repository. Test and build using the gradle wrapper:
    ./gradlew test
    ./gradlew build
- Drop rundeck-nomad-plugin-<version>.jar into libext/ under the Rundeck installation directory.
- Restart Rundeck.
Usage
Download Nomad and start an agent in server mode. For evaluation you may use development mode for zero-configuration start.
In the Rundeck UI, create a new project and a new job in that project. Under the "Add a Step" section, switch to the "Workflow Steps" tab. If the plugin was recognized successfully, you should see "Run Docker container on Nomad" in the list of available workflow step plugins. Click on the plugin entry to bring up the input form, then fill in the Nomad agent URL, the Docker image name and any other available fields. Save and run the job.
What is in scope
Currently the scope is limited to batch and service jobs of simple structure (1 job, 1 task group, 1 task). The reason is such jobs fit well into the Rundeck operating model and map onto the available UI configuration in a straightforward way. It is possible to set the task count within the task group thereby increasing parallelism where that matters.
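To make the "1 job, 1 task group, 1 task" shape concrete, here is a minimal sketch of such a job as a nested map keyed like Nomad's JSON job format. It is an illustration only, not the plugin's code: the class name SimpleJobSketch, the helper dockerJob, the generated group and task names, and the "dc1" datacenter are all assumptions made for this example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class SimpleJobSketch {

    /**
     * Builds a one-job, one-group, one-task structure.
     * Keys follow Nomad's JSON job format; all values are illustrative.
     */
    static Map<String, Object> dockerJob(String id, String type, String image, int count) {
        // Driver-specific settings: for the Docker driver, the image to run.
        Map<String, Object> config = new LinkedHashMap<>();
        config.put("image", image);

        // The single task.
        Map<String, Object> task = new LinkedHashMap<>();
        task.put("Name", id + "-task");   // hypothetical naming scheme
        task.put("Driver", "docker");
        task.put("Config", config);

        // The single task group; Count controls how many instances are placed.
        Map<String, Object> group = new LinkedHashMap<>();
        group.put("Name", id + "-group"); // hypothetical naming scheme
        group.put("Count", count);
        group.put("Tasks", Collections.singletonList(task));

        // The job itself: "batch" or "service".
        Map<String, Object> job = new LinkedHashMap<>();
        job.put("ID", id);
        job.put("Name", id);
        job.put("Type", type);
        job.put("Datacenters", Collections.singletonList("dc1")); // assumed datacenter
        job.put("TaskGroups", Collections.singletonList(group));
        return job;
    }

    public static void main(String[] args) {
        // A batch job running three instances of the same container.
        System.out.println(dockerJob("hello", "batch", "alpine:3.6", 3));
    }
}
```

Raising the count argument is the only change needed to run more instances of the same task in parallel.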
Nomad supports a range of Drivers to execute tasks. At the moment, the plugin supports task configuration for the Docker driver only. However, best effort has been made to isolate driver-specific code and keep the extension process simple.
Job lifecycle
Running jobs are monitored in several stages, and the outcome of each stage is reported in the log output. Please consult the Nomad documentation for the relevant terminology. First, the plugin checks that the job has been successfully submitted to the scheduler. Then it verifies that the job passed evaluation (the evaluation ID is reported). Depending on the desired task count, Nomad places the corresponding number of allocations. Some or all of the allocations may fail for various reasons (resource limitations, driver errors, etc.). However, the job as a whole can only have pending, running or dead status, which may not be representative of the success or failure of the outcome. Hence, to allow for some flexibility, we poll the status of the individual allocations and raise an error if more than a configurable percentage of them end up in a failed status.
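The final check can be sketched as follows. This is a simplified illustration under stated assumptions, not the plugin's actual implementation: the class AllocationCheckSketch and its methods are hypothetical, and the allocation statuses are passed in as plain strings rather than fetched from Nomad.

```java
import java.util.Arrays;
import java.util.List;

public class AllocationCheckSketch {

    /** Returns the percentage (0 to 100) of allocations whose status is "failed". */
    static double failedPercentage(List<String> statuses) {
        if (statuses.isEmpty()) {
            return 0.0; // nothing placed yet, so nothing has failed
        }
        long failed = statuses.stream().filter("failed"::equals).count();
        return 100.0 * failed / statuses.size();
    }

    /** True when strictly more than maxFailedPercent of the allocations failed. */
    static boolean exceedsThreshold(List<String> statuses, double maxFailedPercent) {
        return failedPercentage(statuses) > maxFailedPercent;
    }

    public static void main(String[] args) {
        // Hypothetical snapshot of four allocations after polling has finished.
        List<String> statuses = Arrays.asList("complete", "complete", "failed", "complete");
        double maxFailedPercent = 20.0; // assumed value of the configurable limit

        if (exceedsThreshold(statuses, maxFailedPercent)) {
            System.out.println("Step fails: too many allocations ended up failed");
        } else {
            System.out.println("Step succeeds: failures are within the limit");
        }
    }
}
```

The comparison is strict, so a limit of 25 tolerates one failed allocation out of four, while a limit of 20 does not.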
Note that logs from individual tasks are not streamed here. Given the arbitrary number of task instances that can be deployed it could be challenging to read all of their streams into Rundeck output. Some support for that may be added in future.
Nomad supports scheduling of periodic jobs and defining restart policies, and the Nomad SDK implements timeouts and a back-off strategy for all API calls. However, all of these settings also belong to the core functionality of Rundeck. Therefore, to avoid confusion, it was decided to delegate them to Rundeck job-level configuration. That is why API calls are configured to wait indefinitely and the periodic stanza of the Nomad job specification is not supported. It may be implemented in future if this plugin is enhanced to deploy long-running services.
Minimal version requirements
- Java 1.8
- Rundeck 2.9.x
- Nomad 0.6.0
Similar projects
- cvandal/nomad-ui - UI for Nomad by HashiCorp.
- jippi/hashi-ui - UI for Nomad and Consul.
- FRosner/cluster-broccoli - templated UI for configuring jobs.
- Verizon/nelson - container deployment manager, supporting Nomad as a backend.
Thanks
- rundeck-mesos-plugin provided a lot of useful examples and ideas for the initial structure
TODO
- Better test coverage
- Driver support
- More detailed logging
- TLS support
- Constraints configuration