19 Aug 2016 • on devops vagrant docker ansible monitoring

OpsDev

Production First

There have been a number of articles recently discussing the concept of OpsDev.

The idea is that important architectural decisions should be part of the design from the beginning rather than left to the last minute. Topics such as High Availability, Fault Tolerance, Service Discovery, Data Persistence, Centralised Logging and Monitoring are essential in modern applications, and they become much more complicated to implement if they are left until the end.

With that in mind, I wanted to build a demo system that emulates a production environment. It should cover all the topics mentioned above in the simplest way possible to demonstrate each concept, and it should provide a framework for experimenting with new tools and technologies.

Project Overview

https://github.com/jamesdmorgan/vagrant-ansible-docker-swarm

The project is available on GitHub and will be extended as new components are added. The name may change, as there is more to it than just an automated swarm.

Everything will be automated so it should be trivial to get working. At the moment it uses around 12 GB of RAM, which can be reduced by using a smaller number of virtual machines. I wanted to use at least 3 managers to demonstrate clustering of both Docker Swarm and Consul.
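To illustrate why three is the smallest useful number of managers: Swarm managers and Consul servers both rely on Raft consensus, which needs a majority of members to stay available. The sketch below only does that arithmetic; the function names are my own and are not part of the project.

```python
def quorum(managers: int) -> int:
    """Smallest majority of a cluster with the given number of managers."""
    if managers < 1:
        raise ValueError("a cluster needs at least one manager")
    return managers // 2 + 1


def tolerated_failures(managers: int) -> int:
    """How many managers can be lost while a majority still remains."""
    return managers - quorum(managers)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"{n} managers: quorum {quorum(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

With two managers no failure can be tolerated, so three is the first size at which losing a node does not stop the cluster.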

Current Components

Vagrant - Management of VirtualBox VMs
Ansible - Provisioning of boxes
Docker 1.12 - Swarm creation on manager and worker boxes
Consul - External DNS, K/V store and dashboard
Registrator - Intended to register services but doesn’t currently work with Docker 1.12
Consul-notifier - Temporary replacement for Registrator for registering Docker Swarm services
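For a feel of what registering a service with Consul involves, here is a minimal sketch that builds the JSON body accepted by the Consul agent's `/v1/agent/service/register` endpoint. It is not how Registrator or Consul-notifier are implemented; the ID scheme, service name and IP address are hypothetical values chosen for illustration.

```python
import json


def consul_service_payload(name, address, port, tags=None):
    """Build the JSON body for Consul's agent service registration endpoint."""
    return {
        # Hypothetical ID scheme: one unique ID per service instance.
        "ID": f"{name}-{address}-{port}",
        "Name": name,
        "Address": address,
        "Port": port,
        "Tags": list(tags or []),
    }


if __name__ == "__main__":
    # The address below is a made-up example, not one from the project.
    payload = consul_service_payload("web", "192.168.77.21", 8080, ["swarm"])
    # This body would be sent with an HTTP PUT to
    # http://<consul-agent>:8500/v1/agent/service/register
    print(json.dumps(payload, indent=2))
```

The sketch stops short of making the HTTP call, so it can be run without a Consul agent.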

Roadmap

rsyslog - Log shipping for container logs to Logstash / Filebeat
Logstash - Log file filtering
Elasticsearch - Log file indexing
Kibana - Log file UI
collectd - System metrics
cAdvisor - Docker metrics
Riemann - Metric filtering and processing
InfluxDB - Metric storage
Grafana - Metric dashboard
Flocker - Volume management

Data persistence

Docker Swarm brings amazing potential and new problems. Having containers rescheduled causes problems when they rely on volumes or other persistent data. Cattle containers work well in this environment, but more complicated containers running databases or k/v stores need to be treated carefully. In my example I have a couple of containers that fall into this category. Mongo can currently move about the swarm, which would not work in a production environment, where data loss is totally unacceptable.

The options seem to be: don’t run these containers in the Swarm, limit them to specific hosts using constraints, or use a volume manager like Flocker. I will investigate this after I have centralised logging and monitoring.
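As a rough illustration of the second option, the sketch below assembles a `docker service create` command that pins a single-replica service to one node with a `node.hostname` constraint and mounts a named volume. The helper name, the hostname `worker1` and the volume name `mongo-data` are assumptions for the example and are not taken from the project.

```python
import shlex


def pinned_service_command(name, image, hostname, volume, target):
    """Build a `docker service create` argv that pins a service to one node."""
    return [
        "docker", "service", "create",
        "--name", name,
        "--replicas", "1",
        # Only schedule the task on the node with this hostname.
        "--constraint", f"node.hostname=={hostname}",
        # Named volume on that node, mounted at the given path in the container.
        "--mount", f"type=volume,source={volume},target={target}",
        image,
    ]


if __name__ == "__main__":
    # "worker1" and "mongo-data" are hypothetical names.
    argv = pinned_service_command("mongo", "mongo", "worker1", "mongo-data", "/data/db")
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Pinning keeps the data and the container on the same host, but it also means the service cannot be rescheduled elsewhere if that host fails, which is the trade-off a volume manager is meant to address.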

I will write an individual post for each stage of the project, starting with building the virtual machines with Vagrant and Ansible, and moving on through provisioning the Docker Swarm, service discovery, applications, logging and system monitoring.


James Morgan