Using GlusterFS with a Docker Swarm cluster

In this blog post I will create a 3-node Docker Swarm cluster and use GlusterFS to share volume storage across the swarm nodes.

Introduction

Using Swarm mode in Docker creates a cluster of Docker hosts to run containers on. The problem at hand is that if container “A” runs on “node1” with a named volume “voldata”, all data changes applied to “voldata” are saved locally on “node1”. If container A is shut down and happens to start again on a different node, say “node3”, and mounts the named volume “voldata” there, that volume will be empty and will not contain the changes made to it while it was mounted on “node1”.

In this example I will not use named volumes; instead I will use storage mounted and shared among the cluster nodes. The same approach can of course be applied to share the storage behind a named volume’s folder.

For this exercise I’m using 3 EC2 instances on AWS, each with one attached EBS volume.

How to get around this?

One way to solve this is to use GlusterFS to replicate the volumes across the swarm nodes and make the data available to all nodes at any time. Each Docker host still works against what looks like local storage, while GlusterFS takes care of the replication behind the scenes.

Preparation on each server

I will use Ubuntu 16.04 for this exercise.

First we put friendly names in /etc/hosts:

XX.XX.XX.XX    node1
XX.XX.XX.XX    node2
XX.XX.XX.XX    node3

Then we update the system:

$ sudo apt update
$ sudo apt upgrade

Finally, we reboot the servers. Then install the necessary packages on all nodes:

$ sudo apt install -y docker.io
$ sudo apt install -y glusterfs-server

Then start the services:

$ sudo systemctl start glusterfs-server
$ sudo systemctl start docker

Create the directory for the GlusterFS brick and the mount point for the shared volumes:

$ sudo mkdir -p /gluster/data /swarm/volumes

GlusterFS setup

First we prepare the filesystem for the Gluster brick storage on all nodes:

$ sudo mkfs.xfs /dev/xvdb 
$ sudo mount /dev/xvdb /gluster/data/
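
To have the brick filesystem mounted again after a reboot, an /etc/fstab entry along these lines can be added (a sketch using the device name as mounted above; a UUID obtained from blkid is more robust):

/dev/xvdb    /gluster/data    xfs    defaults    0 0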

From node1:

$ sudo gluster peer probe node2
peer probe: success. 
$ sudo gluster peer probe node3
peer probe: success.
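
The peering can be verified with gluster peer status, which should show the other two nodes as connected:

$ sudo gluster peer status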

Create the volume as a 3-way mirror (the force option is needed because the bricks sit directly on their mount points):

$ sudo gluster volume create swarm-vols replica 3 node1:/gluster/data node2:/gluster/data node3:/gluster/data force
volume create: swarm-vols: success: please start the volume to access data

Allow mount connections only from localhost:

$ sudo gluster volume set swarm-vols auth.allow 127.0.0.1
volume set: success

Then start the volume:

$ sudo gluster volume start swarm-vols
volume start: swarm-vols: success

Then, on each Gluster node, we mount the shared mirrored GlusterFS volume locally:

$ sudo mount.glusterfs localhost:/swarm-vols /swarm/volumes
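
To make this mount persistent as well, an /etc/fstab entry like the following could be used (a sketch; _netdev marks it as a network filesystem so it is only mounted once networking is up):

localhost:/swarm-vols    /swarm/volumes    glusterfs    defaults,_netdev    0 0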

Docker swarm setup

Here I will create 1 manager node and 2 worker nodes.

$ sudo docker swarm init
Swarm initialized: current node (82f5ud4z97q7q74bz9ycwclnd) is now a manager.
 
To add a worker to this swarm, run the following command:
 
    docker swarm join \
    --token SWMTKN-1-697xeeiei6wsnsr29ult7num899o5febad143ellqx7mt8avwn-1m7wlh59vunohq45x3g075r2h \
    172.31.24.234:2377
 
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

Get the token for worker nodes:

$ sudo docker swarm join-token worker
To add a worker to this swarm, run the following command:
 
    docker swarm join \
    --token SWMTKN-1-697xeeiei6wsnsr29ult7num899o5febad143ellqx7mt8avwn-1m7wlh59vunohq45x3g075r2h \
    172.31.24.234:2377

Then on both worker nodes:

$ sudo docker swarm join --token SWMTKN-1-697xeeiei6wsnsr29ult7num899o5febad143ellqx7mt8avwn-1m7wlh59vunohq45x3g075r2h 172.31.24.234:2377
This node joined a swarm as a worker.

Verify the swarm cluster:

$ sudo docker node ls
ID                           HOSTNAME          STATUS  AVAILABILITY  MANAGER STATUS
6he3dgbanee20h7lul705q196    ip-172-31-27-191  Ready   Active        
82f5ud4z97q7q74bz9ycwclnd *  ip-172-31-24-234  Ready   Active        Leader
c7daeowfoyfua2hy0ueiznbjo    ip-172-31-26-52   Ready   Active

Testing

To test, I will add labels to node1 and node3, create a service on node1, shut it down, and then create it again on node3 with the same volume mount; we should then see that the files created by both containers end up in the same shared storage.

Label swarm nodes:

$ sudo docker node update --label-add nodename=node1 ip-172-31-24-234
ip-172-31-24-234
$ sudo docker node update --label-add nodename=node3 ip-172-31-26-52
ip-172-31-26-52

Check the labels:

$ sudo docker node inspect --pretty ip-172-31-26-52
ID:			c7daeowfoyfua2hy0ueiznbjo
Labels:
 - nodename = node3
Hostname:		ip-172-31-26-52
Joined at:		2017-01-06 22:44:17.323236832 +0000 utc
Status:
 State:			Ready
 Availability:		Active
Platform:
 Operating System:	linux
 Architecture:		x86_64
Resources:
 CPUs:			1
 Memory:		1.952 GiB
Plugins:
  Network:		bridge, host, null, overlay
  Volume:		local
Engine Version:		1.12.1
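
Since the service will bind-mount /swarm/volumes/testvol, that directory has to exist on the shared GlusterFS mount first; creating it on one node is enough, as GlusterFS replicates it to the others:

$ sudo mkdir -p /swarm/volumes/testvol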

Create a Docker service on node1 that creates a file in the shared volume:

$ sudo docker service create --name testcon --constraint 'node.labels.nodename == node1' --mount type=bind,source=/swarm/volumes/testvol,target=/mnt/testvol ubuntu:latest /bin/touch /mnt/testvol/testfile1.txt
duvqo3btdrrlwf61g3bu5uaom

Verify service creation:

$ sudo docker service ls
ID            NAME     REPLICAS  IMAGE    COMMAND
duvqo3btdrrl  testcon  0/1       busybox  /bin/bash

Check that it’s running on node1:

$ sudo docker service ps testcon
ID                         NAME           IMAGE          NODE              DESIRED STATE  CURRENT STATE           ERROR
6nw6sm8sak512x24bty7fwxwz  testcon.1      ubuntu:latest  ip-172-31-24-234  Ready          Ready 1 seconds ago     
6ctzew4b3rmpkf4barkp1idhx   \_ testcon.1  ubuntu:latest  ip-172-31-24-234  Shutdown       Complete 1 seconds ago

Also check the volume mounts:

$ sudo docker inspect testcon
[
    {
        "ID": "8lnpmwcv56xwmwavu3gc2aay8",
        "Version": {
            "Index": 26
        },
        "CreatedAt": "2017-01-06T23:03:01.93363267Z",
        "UpdatedAt": "2017-01-06T23:03:01.935557744Z",
        "Spec": {
            "ContainerSpec": {
                "Image": "busybox",
                "Args": [
                    "/bin/bash"
                ],
                "Mounts": [
                    {
                        "Type": "bind",
                        "Source": "/swarm/volumes/testvol",
                        "Target": "/mnt/testvol"
                    }
                ]
            },
            "Resources": {
                "Limits": {},
                "Reservations": {}
            },
            "RestartPolicy": {
                "Condition": "any",
                "MaxAttempts": 0
            },
            "Placement": {
                "Constraints": [
                    "nodename == node1"
                ]
            }
        },
        "ServiceID": "duvqo3btdrrlwf61g3bu5uaom",
        "Slot": 1,
        "Status": {
            "Timestamp": "2017-01-06T23:03:01.935553276Z",
            "State": "allocated",
            "Message": "allocated",
            "ContainerStatus": {}
        },
        "DesiredState": "running"
    }
]
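
Since the same service name is reused, the first service has to be removed before the new one can be created:

$ sudo docker service rm testcon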

Now create the service again, this time constrained to node3:

$ sudo docker service create --name testcon --constraint 'node.labels.nodename == node3' --mount type=bind,source=/swarm/volumes/testvol,target=/mnt/testvol ubuntu:latest /bin/touch /mnt/testvol/testfile3.txt
5y99c0bfmc2fywor3lcsvmm9q

Verify it ran on node3:

$ sudo docker service ps testcon
ID                         NAME           IMAGE          NODE             DESIRED STATE  CURRENT STATE           ERROR
5p57xyottput3w34r7fclamd9  testcon.1      ubuntu:latest  ip-172-31-26-52  Ready          Ready 1 seconds ago     
aniesakdmrdyuq8m2ddn3ga9b   \_ testcon.1  ubuntu:latest  ip-172-31-26-52  Shutdown       Complete 2 seconds ago

Now check that the files created by both containers exist in the same shared volume:

$ ls -l /swarm/volumes/testvol/
total 0
-rw-r--r-- 1 root root 0 Jan  6 23:59 testfile3.txt
-rw-r--r-- 1 root root 0 Jan  6 23:58 testfile1.txt

High Availability WordPress with GlusterFS

We decided to run a WordPress website in high-availability mode on Amazon Web Services (AWS). I created 3 AWS instances and a Multi-AZ RDS instance running MySQL, and moved the existing database over; the only thing still missing was sharing the WordPress files across all machines (for uploads and WP upgrades). NFS was not an option for me, as I had bad experiences with stale connections in the past, so I decided to go with GlusterFS.

What is GlusterFS?

As per Wikipedia: GlusterFS is a scale-out network-attached storage file system. It has found applications including cloud computing, streaming media services, and content delivery networks. GlusterFS was developed originally by Gluster, Inc., then by Red Hat, Inc., after their purchase of Gluster in 2011.

Volumes shared with GlusterFS can work in several modes, such as distributed, mirrored (multi-way replication), striped, or combinations of those.

Gluster Volumes

Gluster works in server/client mode: servers take care of the shared volumes, and clients mount and use them. In my scenario the servers and the clients are the same machines.

Preparation of Gluster nodes

1- Nodes will communicate over the internal AWS network, so the following must go in each node’s /etc/hosts file:

XXX.XXX.XXX.XXX node-gfs1 # us-east-1b
XXX.XXX.XXX.XXX node-gfs2 # us-east-1d
XXX.XXX.XXX.XXX node-gfs3 # us-east-1d

2- Create an AWS EBS volume to attach to each instance. Note that each volume must be created in the same availability zone as its instance.

3- Open firewall ports on the local network.
Note: even when mounting locally (client on the same machine as the server), the proper ports below must be open or the filesystem might be mounted read-only. The guidelines (a firewalld sketch follows the list):

– 24007 TCP for the Gluster daemon
– 24008 TCP for InfiniBand management (optional unless you are using IB)
– One TCP port for each brick in a volume. For example, with 4 bricks in a volume, ports 24009 – 24012 would be used on GlusterFS 3.3 and below, and 49152 – 49155 on GlusterFS 3.4 and later.
– 38465, 38466 and 38467 TCP for the inline Gluster NFS server.
– Additionally, port 111 TCP and UDP (all versions) and port 2049 TCP-only (GlusterFS 3.4 and later) are used for the port mapper and should be open.
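
As a sketch of how to open these ports on the nodes themselves, assuming firewalld is in use (the default on CentOS 7); 49152 – 49154 covers one port per brick for the 3 bricks used here. If you rely on AWS security groups instead, allow the same ports on the internal network there:

# firewall-cmd --permanent --add-port=24007-24008/tcp
# firewall-cmd --permanent --add-port=49152-49154/tcp
# firewall-cmd --permanent --add-port=38465-38467/tcp
# firewall-cmd --permanent --add-port=111/tcp --add-port=111/udp --add-port=2049/tcp
# firewall-cmd --reload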

Installation steps

On each machine, install GlusterFS (server and client):

# yum install centos-release-gluster37
# yum install glusterfs-server

Then start the Gluster server process and enable it on boot:

# systemctl start glusterd
# systemctl enable glusterd
Created symlink from /etc/systemd/system/multi-user.target.wants/glusterd.service to /usr/lib/systemd/system/glusterd.service.
#

From the first node, establish the trust relationship between the Gluster cluster nodes:

# gluster peer probe node-gfs2
peer probe: success. 
# gluster peer probe node-gfs3
peer probe: success.

Now check the peer status:

# gluster peer status
Number of Peers: 2
 
Hostname: node-gfs2
Uuid: 2a7ea8f6-0832-42ba-a98e-6fe7d67fcfe9
State: Peer in Cluster (Connected)
 
Hostname: node-gfs3
Uuid: 55b0ce72-0c34-441f-ab3c-88414885e32d
State: Peer in Cluster (Connected)
#

On each server, prepare the brick storage:

# mkdir -p /glusterfs/bricks/brick1
# mkfs.xfs /dev/xvdf
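
The UUID for the fstab entry below can be read with blkid:

# blkid /dev/xvdf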

Add to /etc/fstab:

UUID=8f808cef-c7c6-4c2a-bf15-0e32ef71e97c /glusterfs/bricks/brick1 xfs    defaults        0 0

Then mount it (mount resolves the new fstab entry by its mount point):
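
# mount /glusterfs/bricks/brick1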

Note: if you use the mount point itself as the brick, you get the following error:

# gluster volume create wp replica 3 node-gfs1:/glusterfs/bricks/brick1 node-gfs2:/glusterfs/bricks/brick1 node-gfs3:/glusterfs/bricks/brick1
volume create: wp: failed: The brick node-gfs1:/glusterfs/bricks/brick1 is a mount point. Please create a sub-directory under the mount point and use that as the brick directory. Or use 'force' at the end of the command if you want to override this behavior.

So under each /glusterfs/bricks/brick1 mount point, create a directory to be used for the GlusterFS volume; in my case I created /glusterfs/bricks/brick1/gv:
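
# mkdir /glusterfs/bricks/brick1/gv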

From server 1: Create a 3-way mirror volume:

# gluster volume create wp replica 3 node-gfs1:/glusterfs/bricks/brick1/gv node-gfs2:/glusterfs/bricks/brick1/gv node-gfs3:/glusterfs/bricks/brick1/gv
volume create: wp: success: please start the volume to access data
#

Check the status:

# gluster volume info
 
Volume Name: wp
Type: Replicate
Volume ID: 34dbacba-344e-4c89-875f-4c91812f01be
Status: Created
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node-gfs1:/glusterfs/bricks/brick1/gv
Brick2: node-gfs2:/glusterfs/bricks/brick1/gv
Brick3: node-gfs3:/glusterfs/bricks/brick1/gv
Options Reconfigured:
performance.readdir-ahead: on
#

Now start the volume:

# gluster volume start wp
volume start: wp: success
#

Check the status of the volume:

# gluster volume status
Status of volume: wp
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node-gfs1:/glusterfs/bricks/brick1/
gv                                          49152     0          Y       10229
Brick node-gfs2:/glusterfs/bricks/brick1/
gv                                          49152     0          Y       9323 
Brick node-gfs3:/glusterfs/bricks/brick1/
gv                                          49152     0          Y       9171 
NFS Server on localhost                     2049      0          Y       10249
Self-heal Daemon on localhost               N/A       N/A        Y       10257
NFS Server on node-gfs2                     2049      0          Y       9343 
Self-heal Daemon on node-gfs2               N/A       N/A        Y       9351 
NFS Server on node-gfs3                     2049      0          Y       9191 
Self-heal Daemon on node-gfs3               N/A       N/A        Y       9199 
 
Task Status of Volume wp
------------------------------------------------------------------------------
There are no active volume tasks
 
#

This is a healthy volume. If one of the servers goes offline, it will disappear from the table above and reappear when it’s back online; its peer state will also show as disconnected in the gluster peer status output.
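
To see whether any files still need to be re-synced after a node rejoins, the self-heal status can be checked, for example:

# gluster volume heal wp info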

Using the volume (Gluster clients)

Mount the volume on each machine: on server 1 I will mount from server 2, on server 2 from server 3, and on server 3 from server 1.

Using this syntax in /etc/fstab (shown here for server 1, mounting from server 2):

node-gfs2:/wp        /var/www/html      glusterfs     defaults,_netdev  0  0

Repeat this on each server, changing the source host as per the note above.
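
After adding the entry, mount it:

# mount /var/www/html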

Now /var/www/html is shared on each machine in read/write mode.
