Ceph, AWS S3, and Multipart uploads using Python

Summary

In this article the following will be demonstrated:

  • Ceph Nano – as the back-end storage and S3 interface
  • A Python script that uses the S3 API to multipart-upload a file to Ceph Nano, using Python multi-threading

Introduction

Ceph Nano is a Docker container providing basic Ceph services: mainly a Ceph Monitor, a Ceph MGR, a Ceph OSD for managing the container storage, and a RADOS Gateway to provide the S3 API interface. It also provides a web UI to view and manage buckets.

Multipart transfer relies on the HTTP/1.1 ability to download/upload a range of bytes of a file. For example, a 200 MB file can be downloaded in 2 rounds: the first round fetches the first 50% of the file (bytes 0 to 104857599), and the second round downloads the remaining 50% starting from byte 104857600.
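
To make the byte-range arithmetic concrete, here is a minimal Python sketch (an illustration only, not part of the article's script) that computes the inclusive ranges for such a split:

# Illustration only: compute inclusive HTTP byte ranges for a 200 MB object split into 2 parts.
FILE_SIZE = 200 * 1024 * 1024    # 209715200 bytes
NR_PARTS = 2

part_size = FILE_SIZE // NR_PARTS
for i in range(NR_PARTS):
    start = i * part_size
    # The last part absorbs any remainder; Range headers use inclusive ends.
    end = FILE_SIZE - 1 if (i + 1) == NR_PARTS else start + part_size - 1
    print("Part %d -> Range: bytes=%d-%d" % (i + 1, start, end))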

The Details

First, Docker must be installed on the local system; then download the Ceph Nano CLI using:

$ curl -L https://github.com/ceph/cn/releases/download/v2.3.1/cn-v2.3.1-linux-amd64 -o cn && chmod +x cn

This downloads the cn binary (version 2.3.1) into the local folder and makes it executable.

To start the Ceph Nano cluster (container), run the following command:

$ ./cn cluster start ceph
2019/12/03 11:59:12 Starting cluster ceph...

Endpoint: http://166.87.163.10:8000
Dashboard: http://166.87.163.10:5000
Access key: 90WFLFQNZQ452XXI6851
Secret key: ISmL6Ru3I3MDiFwZITPCu8b1tL3BWyPDAmLoF0ZP
Working directory: /usr/share/ceph-nano

This will download the Ceph Nano image and run it as a Docker container. The web UI can be accessed at http://166.87.163.10:5000, and the S3 API endpoint is at http://166.87.163.10:8000.
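
As a quick sanity check, a boto3 client can be pointed at that endpoint; a minimal sketch, assuming the endpoint and keys printed by cn above:

import boto3

# Assumes the endpoint and access/secret keys printed by "cn cluster start" above.
s3 = boto3.client(service_name='s3',
                  endpoint_url='http://166.87.163.10:8000',
                  aws_access_key_id='90WFLFQNZQ452XXI6851',
                  aws_secret_access_key='ISmL6Ru3I3MDiFwZITPCu8b1tL3BWyPDAmLoF0ZP')
print(s3.list_buckets().get('Buckets', []))   # empty list until a bucket is created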

We can verify that using:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                                            NAMES
0ba17ec716d3        ceph/daemon         "/opt/ceph-contain..."   4 weeks ago         Up 26 seconds       0.0.0.0:5000->5000/tcp, 0.0.0.0:8000->8000/tcp   ceph-nano-ceph

Of course this is for demonstration purposes; the container shown here was created 4 weeks ago.

The container can be accessed by its name, ceph-nano-ceph, using the command:

$ docker exec -it ceph-nano-ceph /bin/bash

which drops me into a Bash shell inside the Ceph Nano container.

To examine the running processes inside the container:

[root@ceph-nano-ceph-faa32aebf00b /]# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 08:59 ?        00:00:00 /bin/bash /opt/ceph-container/bin/entrypoint.sh
ceph       113     1  0 08:59 ?        00:00:43 /usr/bin/ceph-mon --cluster ceph --default-log-to-file=false --default-mon-cluster-log-to-file=false --setuser ceph --setgroup ceph -i ceph-nano-ceph-faa32aebf00b --mon-data /var/lib/ceph/mo
ceph       194     1  1 08:59 ?        00:02:08 ceph-mgr --cluster ceph --default-log-to-file=false --default-mon-cluster-log-to-file=false --setuser ceph --setgroup ceph -i ceph-nano-ceph-faa32aebf00b
ceph       240     1  0 08:59 ?        00:00:29 ceph-osd --cluster ceph --default-log-to-file=false --default-mon-cluster-log-to-file=false --setuser ceph --setgroup ceph -i 0
ceph       451     1  0 08:59 ?        00:00:17 radosgw --cluster ceph --default-log-to-file=false --default-mon-cluster-log-to-file=false --setuser ceph --setgroup ceph -n client.rgw.ceph-nano-ceph-faa32aebf00b -k /var/lib/ceph/radosgw/c
root       457     1  0 08:59 ?        00:00:02 python app.py
root       461     1  0 08:59 ?        00:00:00 /usr/bin/python2.7 /usr/bin/ceph --cluster ceph -w
root      1093     0  0 11:02 ?        00:00:00 /bin/bash
root      1111  1093  0 11:03 ?        00:00:00 ps -ef

The first thing I need to do is create a bucket, so from inside the Ceph Nano container I use the following command:

# s3cmd mb s3://nano

which creates a bucket called nano.

Next, create a user on the Ceph Nano cluster to access the S3 buckets. Here I created a user called test, with both the access and secret keys set to test.

$ radosgw-admin user create --uid=test --access-key=test --secret=test --display-name test

To create and upload a test file:

# dd if=/dev/zero of=./zeros bs=15M count=1
# s3cmd put ./zeros s3://nano

And list the file in the bucket:

# s3cmd ls s3://nano
2019-10-29 11:58  15728640   s3://nano/zeros
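
The same listing can be done from Python with boto3; a minimal sketch, assuming the test user created above and the RADOS Gateway reachable on localhost, as used later in the script:

import boto3

# Assumes the "test" user created above and the RADOS Gateway on localhost:8000.
s3 = boto3.client(service_name='s3',
                  endpoint_url='http://127.0.0.1:8000',
                  aws_access_key_id='test',
                  aws_secret_access_key='test')
for obj in s3.list_objects(Bucket='nano').get('Contents', []):
    print("%s  %d bytes" % (obj['Key'], obj['Size']))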

The Python code

#!/usr/bin/env python
#
# Copyright (c) 2019 Tamer Embaby <tamer@redhat.com>
# All rights reserved.
#
# Main reference is: https://stackoverflow.com/questions/34303775/complete-a-multipart-upload-with-boto3
# Good code, but it will take too much time to complete especially for thread synchronization. (DONE)
#
# TODO: 
#       - Check return code of function calls everywhere.
#       - Use logging instead of print's everywhere.
#       - Address the XXX and FIXME's in the code
#

import boto3
import sys, os
import threading
import logging

b3_client = None
b3_s3 = None
mpu = None              # Multipart upload handle

#
# Thread (safe) function responsible of uploading a part of the file
#
def upload_part_r(partid, part_start, part_end, thr_args):
        filename = thr_args['FileName']
        bucket = thr_args['BucketName']
        upload_id = thr_args['UploadId']

        logging.info("%d: >> Uploading part %d", partid, partid)
        logging.info("%d: --> Upload starts at byte %d", partid, part_start)
        logging.info("%d: --> Upload ends at byte %d", partid, part_end)

        f = open(filename, "rb")
        logging.info("%d: DEBUG: Seeking offset: %d", partid, part_start)
        logging.info("%d: DEBUG: Reading size: %d", partid, part_end - part_start)
        f.seek(part_start, 0)
        # XXX: Would the next read fail if the portion is too large?
        data = f.read(part_end - part_start + 1)

        # DO WORK HERE
        # TODO:
        # - Variables like mpu, Bucket, Key should be passed from caller -- DONE
        # - We should collect part['ETag'] from this part into array/list, so we must synchronize access
        #   to that list, this list is then used to construct part_info array to call .complete_multipart_upload(...)
        # TODO.
        #
        # NOTES:
        # - Since part id is zero based (from handle_mp_file function), we add 1 to it here as HTTP parts should start
        #   from 1
        part = b3_client.upload_part(Bucket=bucket, Key=filename, PartNumber=partid+1, UploadId=upload_id, Body=data)

        # Thread-critical list holding the ETag information for all parts; access to it
        # must be synchronized.
        lock = thr_args['Lock']
        with lock:
                thr_args['PartInfo']['Parts'].append({'PartNumber': partid+1, 'ETag': part['ETag']})

        f.close()
        logging.info("%d: -><- Part ID %d is ending", partid, partid)
        return

#
# Part size calculations.
# Thread dispatcher
#
def handle_mp_file(bucket, filename, nrparts):

        print ">> Uploading file: " + filename + ", nr_parts = " + str(nrparts)

        fsize = os.path.getsize(filename)
        print "+ %s file size = %d " % (filename, fsize)

        # do the part size calculations
        part_size = fsize / nrparts
        print "+ standard part size = " + str(part_size) + " bytes"

        # Initiate multipart uploads for the file under the bucket
        mpu = b3_client.create_multipart_upload(Bucket=bucket, Key=filename)

        threads = list()
        thr_lock = threading.Lock()
        thr_args = { 'PartInfo': { 'Parts': [] } , 'UploadId': mpu['UploadId'], 'BucketName': bucket, 'FileName': filename,
                'Lock': thr_lock }

        for i in range(nrparts):
                print "++ Part ID: " + str(i)

                part_start = i * part_size
                part_end = (part_start + part_size) - 1

                if (i+1) == nrparts:
                        # Last part: extend the (inclusive) end to the last byte of the file.
                        print "DEBUG: last chunk, adjusting part_end from %d to %d" % (part_end, fsize - 1)
                        part_end = fsize - 1

                print "DEBUG: part_start=%d/part_end=%d" % (part_start, part_end)

                thr = threading.Thread(target=upload_part_r, args=(i, part_start, part_end, thr_args, ) )
                threads.append(thr)
                thr.start()

        # Wait for all threads to complete
        for index, thr in enumerate(threads):
                thr.join()
                print "%d thread finished" % (index)

        part_info = thr_args['PartInfo']
        for p in part_info['Parts']:
                print "DEBUG: PartNumber=%d" % (p['PartNumber'])
                print "DEBUG: ETag=%s" % (p['ETag'])

        print "+ Finishing up multi-part uploads"
        b3_client.complete_multipart_upload(Bucket=bucket, Key=filename, UploadId=mpu['UploadId'], MultipartUpload=thr_args['PartInfo'])
        return True

### MAIN ###

if __name__ == "__main__":
        bucket = 'test'                 # XXX FIXME: Pass in arguments

        if len(sys.argv) != 3:
                print "usage: %s <filename to upload> <number of threads/parts>" % (sys.argv[0])
                sys.exit(1)

        # Filename: File to upload
        # NR Parts: Number of parts to divide the file to, which is the number of threads to use
        filename = sys.argv[1]
        nrparts = int(sys.argv[2])

        format = "%(asctime)s: %(message)s"
        logging.basicConfig(format=format, level=logging.INFO, datefmt="%H:%M:%S")

        # Initialize the connection with Ceph RADOS GW
        b3_client = boto3.client(service_name = 's3', endpoint_url = 'http://127.0.0.1:8000', aws_access_key_id = 'test', aws_secret_access_key = 'test')
        b3_s3 = boto3.resource(service_name = 's3', endpoint_url = 'http://127.0.0.1:8000', aws_access_key_id = 'test', aws_secret_access_key = 'test')

        handle_mp_file(bucket, filename, nrparts)

### END ###

This code uses Python multi-threading to upload multiple parts of the file simultaneously, just as any modern download manager does using HTTP/1.1 range features.

To use this Python script, save the above code to a file called boto3-upload-mp.py and run it as:

$ ./boto3-upload-mp.py mp_file_original.bin 6

Here 6 means the script will divide the file into 6 parts and create 6 threads to upload these parts simultaneously.

The uploaded file can then be re-downloaded and checksummed against the original file to verify it was uploaded successfully.
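
A minimal verification sketch (not part of the script; it assumes the same local endpoint and test credentials, plus the bucket name hard-coded in boto3-upload-mp.py) could look like this:

import hashlib
import boto3

# Assumes the local RADOS Gateway, the "test" credentials, and the bucket
# name ('test') hard-coded in boto3-upload-mp.py.
s3 = boto3.client(service_name='s3',
                  endpoint_url='http://127.0.0.1:8000',
                  aws_access_key_id='test',
                  aws_secret_access_key='test')

with open('mp_file_original.bin', 'rb') as f:
    local_digest = hashlib.sha256(f.read()).hexdigest()

body = s3.get_object(Bucket='test', Key='mp_file_original.bin')['Body'].read()
remote_digest = hashlib.sha256(body).hexdigest()

print("local  sha256: " + local_digest)
print("remote sha256: " + remote_digest)
print("OK" if local_digest == remote_digest else "MISMATCH")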

Deploying OpenBSD 6.3 on AWS

Here are the steps for deploying OpenBSD 6.3 on Amazon Web Services (AWS). I use it as an SMTP/IMAP server, and it can also be used as a secure jump server.

Roadmap

  • Create a VM on VirtualBox (VBox) running OpenBSD 6.3
  • Prepare the OpenBSD VBox VM to be deployed on AWS
  • Upload the OpenBSD VBox VM to AWS as volume
  • Snapshot and create AMI from the uploaded volume

Steps

Create a VM on VirtualBox (VBox)

I use the /vbox directory as back-end storage for VBox disk images, so first I create a disk image for OpenBSD:

$ vboxmanage createhd --format VHD --filename /vbox/openbsd/obsd-disk0 --size 8196
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Medium created. UUID: 75a6caa8-c6ea-4e36-9768-002944d846e7

Create the VBox VM:

$ vboxmanage createvm --name "openbsd-6.3" --ostype OpenBSD_64 --register
Virtual machine 'openbsd-6.3' is created and registered.
UUID: 2760b9d6-1c35-4783-9090-c0cb5f3b35f4
Settings file: '/home/te/VirtualBox VMs/openbsd-6.3/openbsd-6.3.vbox'

Create SATA controller and attach OpenBSD VM virtual disk to it:

$ vboxmanage storagectl openbsd-6.3 --name "SATA Controller" --add sata --controller IntelAHCI
$ vboxmanage storageattach openbsd-6.3 --storagectl "SATA Controller" --port 0 --device 0 --type hdd --medium /vbox/openbsd/obsd-disk0.vdi

Create IDE controller and attach OpenBSD installation ISO to it:

$ vboxmanage storagectl openbsd-6.3 --name "IDE Controller" --add ide
$ vboxmanage storageattach openbsd-6.3 --storagectl "IDE Controller" --port 0 --device 0 --type dvddrive --medium /vbox/ISO/openbsd/6.3/amd64/install63.iso

Now set some configuration for the VM to work:

$ vboxmanage modifyvm openbsd-6.3 --ioapic on
$ vboxmanage modifyvm openbsd-6.3 --boot1 dvd --boot2 disk --boot3 none
$ vboxmanage modifyvm openbsd-6.3 --memory 768
$ vboxmanage modifyvm openbsd-6.3 --vram 128
$ vboxmanage modifyvm openbsd-6.3 --cpus 2
$ vboxmanage modifyvm openbsd-6.3 --uart1 0x3F8 4

Notes:

  • It’s important to set the CPU count to 2 for the OpenBSD installer to install the SMP kernel
  • It’s important to set COM1 (UART1) to be able to view the console messages

Review:

$ vboxmanage showvminfo openbsd-6.3
Name: OpenBSD 6.3
Groups: /
Guest OS: OpenBSD (64-bit)
UUID: 2760b9d6-1c35-4783-9090-c0cb5f3b35f4
Config file: /home/te/VirtualBox VMs/openbsd-6.3/openbsd-6.3.vbox
Snapshot folder: /home/te/VirtualBox VMs/openbsd-6.3/Snapshots
Log folder: /home/te/VirtualBox VMs/openbsd-6.3/Logs
Hardware UUID: 2760b9d6-1c35-4783-9090-c0cb5f3b35f4
Memory size: 768MB
Page Fusion: off
VRAM size: 8MB
CPU exec cap: 100%
HPET: off
Chipset: piix3
Firmware: BIOS
Number of CPUs: 1
...
IOAPIC: on
BIOS APIC mode: APIC
Time offset: 0ms
RTC: local time
Hardw. virt.ext: on
Nested Paging: on
Large Pages: off
VT-x VPID: on
...
Storage Controller Name (0): SATA Controller
Storage Controller Type (0): IntelAhci
Storage Controller Instance Number (0): 0
Storage Controller Max Port Count (0): 30
Storage Controller Port Count (0): 30
Storage Controller Bootable (0): on
Storage Controller Name (1): IDE Controller
Storage Controller Type (1): PIIX4
Storage Controller Instance Number (1): 0
Storage Controller Max Port Count (1): 2
Storage Controller Port Count (1): 2
Storage Controller Bootable (1): on
SATA Controller (0, 0): /vbox/openbsd/obsd-disk0.vdi (UUID: 75a6caa8-c6ea-4e36-9768-002944d846e7)
IDE Controller (0, 0): /vbox/ISO/openbsd/6.3/amd64/install63.iso (UUID: bef3fcaf-31c1-47e4-96bc-6596ce0dc07c)
NIC 1: MAC: 0800274874D9, Attachment: NAT, Cable connected: on, Trace: off (file: none), Type: 82540EM, Reported speed: 0 Mbps, Boot priority: 0, Promisc Policy: deny, Bandwidth group: none
NIC 1 Settings: MTU: 0, Socket (send: 64, receive: 64), TCP Window (send:64, receive: 64)
NIC 2: disabled
...
Pointing Device: PS/2 Mouse
Keyboard Device: PS/2 Keyboard
UART 1: I/O base: 0x03f8, IRQ: 4, disconnected
UART 2: disabled
UART 3: disabled
UART 4: disabled
LPT 1: disabled
LPT 2: disabled
...
...

Now start the VM and then follow the OpenBSD installation:

$ vboxmanage startvm openbsd-6.3

Inside the OpenBSD VBox VM

Create ec2-user and add it to /etc/doas.conf to be able to use the doas tool:

permit nopass keepenv ec2-user as root

Download the file ec2-init.sh from the following URL: https://raw.githubusercontent.com/ajacoutot/aws-openbsd/master/ec2-init.sh

Install ec2-init.sh to the path /usr/local/libexec/ec2-init and set the necessary ownership and permissions:

# chmod 0555 /usr/local/libexec/ec2-init
# chown root.bin /usr/local/libexec/ec2-init

In the file /etc/ttys, replace the line that reads:

#tty0 ...

With:

tty00   "/usr/libexec/getty std.9600"   vt220   on secure

Add the following to /etc/boot.conf:

stty com0 9600
set tty com0

Create the network configuration file /etc/hostname.xnf0 with mode 0640 that reads:

dhcp
!/usr/local/libexec/ec2-init

/usr/local/libexec/ec2-init is a cloud-init helper for OpenBSD responsible for passing instance information to the AWS OpenBSD instance and setting the hostname, instance-id, SSH public key, etc.
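
As a rough illustration of what such a helper does (ec2-init itself is a shell script; this Python 3 sketch is only an analogy), it queries the EC2 instance metadata service and consumes the returned values:

# Python 3 sketch: fetch the same instance metadata that ec2-init consumes (IMDSv1-style paths).
from urllib.request import urlopen

META = "http://169.254.169.254/latest/meta-data/"

def md(path):
    return urlopen(META + path, timeout=5).read().decode().strip()

print("instance-id : " + md("instance-id"))
print("hostname    : " + md("local-hostname"))
print("ssh pub key : " + md("public-keys/0/openssh-key"))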

Disallow root login and password authentication in /etc/ssh/sshd_config:

PermitRootLogin no
PasswordAuthentication no

Finally, do any necessary package installation and configuration in the OpenBSD VBox VM; this will be our default image for OpenBSD instances created in AWS.

Uploading OpenBSD image to AWS

I use Ubuntu 18.04 on my personal laptop; to upload the OpenBSD VBox disk image to AWS the following software is needed:

$ sudo apt install ec2-api-tools ec2-ami-tools

Then execute the following command to upload the image to AWS:

$ export AWS_KEY="YOUR_AWS_KEY"
$ export AWS_SEC="YOUR_AWS_KEY_SECRET"
$ ec2-import-volume --format vhd --volume-size 12 --region \
   us-east-1 --availability-zone us-east-1c \
   --bucket openbsd-tmp-folder --owner-akid $AWS_KEY \
   --owner-sak $AWS_SEC --aws-access-key $AWS_KEY \
   --aws-secret-key $AWS_SEC /vbox/openbsd/obsd-disk0.vhd

Here “us-east-1” and “us-east-1c” are the desired region and availability zone.

The above command uploads the OpenBSD disk image in chunks to the S3 bucket “openbsd-tmp-folder” and then converts it to an AWS volume of size 12 GB. The conversion process can be monitored with the command:

$ ec2-describe-conversion-tasks --aws-access-key $AWS_KEY \
   --aws-secret-key $AWS_SEC

Then, depending on preference, we can log in to the AWS console, create a snapshot from the OpenBSD volume, and choose to make an AMI from that snapshot, or use the following commands to create them:

$ ec2-create-snapshot \
   --aws-access-key $AWS_KEY" \
   --aws-secret-key $AWS_SEC \
   --region us-east-1 \
   <VOLUME-NAME>

$ ec2-register \
   --name "OpenBSD 6.3 AMI" \
   --aws-access-key $AWS_KEY \
   --aws-secret-key $AWS_SEC \
   --region us-east-1 \
   --architecture x86_64 \
   --root-device-name /dev/sda1 \
   --virtualization-type hvm \
   --snapshot <SNAPSHOT-NAME>

Then launch an instance in AWS from that AMI and log in with the ec2-user keys; here is my OpenBSD dmesg:

ip-172-30-2-198$ dmesg
OpenBSD 6.3 (GENERIC.MP) #107: Sat Mar 24 14:21:59 MDT 2018
deraadt@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 1056964608 (1008MB)
avail mem = 1017905152 (970MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xeb01f (11 entries)
bios0: vendor Xen version "4.2.amazon" date 08/24/2006
bios0: Xen HVM domU
acpi0 at bios0: rev 2
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP APIC HPET WAET SSDT SSDT
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimadt0 at acpi0 addr 0xfee00000: PC-AT compat
ioapic0 at mainbus0: apid 1 pa 0xfec00000, version 11, 48 pins
, remapped to apid 1
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz, 2399.73 MHz
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,RDTSCP,LONG,LAHF,ABM,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 100MHz
acpihpet0 at acpi0: 62500000 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpicpu0 at acpi0: C1(@1 halt!)
"ACPI0007" at acpi0 not configured
pvbus0 at mainbus0: Xen 4.2
xen0 at pvbus0: features 0x705, 32 grant table frames, event channel 3
xbf0 at xen0 backend 0 channel 5: disk
scsibus1 at xbf0: 2 targets
sd0 at scsibus1 targ 0 lun 0: <Xen, phy hda 768, 0000> SCSI3 0/direct fixed
sd0: 12288MB, 512 bytes/sector, 25165824 sectors
xnf0 at xen0 backend 0 channel 6: address 0e:ac:b7:ee:8a:2a
"console" at xen0: device/console/0 not configured
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
pciide0: channel 0 disabled (no drives)
pciide0: channel 1 disabled (no drives)
piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x01: SMBus disabled
vga1 at pci0 dev 2 function 0 "Cirrus Logic CL-GD5446" rev 0x00
wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
xspd0 at pci0 dev 3 function 0 "XenSource Platform Device" rev 0x01
isa0 at pcib0
isadma0 at isa0
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
com0: console
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
vscsi0 at root
scsibus2 at vscsi0: 256 targets
softraid0 at root
scsibus3 at softraid0: 256 targets
root on sd0a (02307a84259f2d52.a) swap on sd0b dump on sd0b
fd0 at fdc0 drive 0: density unknown
fd1 at fdc0 drive 1: density unknown


High Availability WordPress with GlusterFS

We decided to run a WordPress website in high-availability mode on Amazon Web Services (AWS). I created 3 AWS instances with a Multi-AZ RDS instance running MySQL and moved the existing database over; the only missing piece was sharing the WordPress files on all machines (for uploads and WP upgrades). NFS was not an option for me, as I had bad experiences with stale connections in the past, so I decided to go with GlusterFS.

What is GlusterFS?

As per Wikipedia: GlusterFS is a scale-out network-attached storage file system. It has found applications including cloud computing, streaming media services, and content delivery networks. GlusterFS was developed originally by Gluster, Inc., then by Red Hat, Inc., after their purchase of Gluster in 2011.

Volumes shared with GlusterFS can work in multiple modes, such as distributed, mirrored (multi-way replicated), striped, or combinations of those.

Gluster Volumes

Gluster works in server/client mode: servers take care of the shared volumes, and clients mount the volumes and use them. In my scenario the servers and the clients are the same machines.

Preparation of Gluster nodes

1- Nodes will communicate on the internal AWS network, so the following must go in each node’s /etc/hosts file:

XXX.XXX.XXX.XXX node-gfs1 # us-east-1b
XXX.XXX.XXX.XXX node-gfs2 # us-east-1d
XXX.XXX.XXX.XXX node-gfs3 # us-east-1d

2- Create AWS EBS volumes to be attached to each instance. Note that each volume must be created in the same availability zone as its instance.

3- Open firewall ports on the local network.
Note: even when mounting locally (client on the same server machine), the ports below must be open or else the filesystem might be mounted read-only. The guidelines are as follows (a quick reachability check is sketched after this list):

– 24007 TCP for the Gluster daemon
– 24008 TCP for Infiniband management (optional unless you are using IB)
– One TCP port for each brick in a volume. So, for example, if you have 4 bricks in a volume, ports 24009–24012 would be used in GlusterFS 3.3 and below, and 49152–49155 in GlusterFS 3.4 and later.
– 38465, 38466 and 38467 TCP for the inline Gluster NFS server.
– Additionally, port 111 TCP and UDP, and port 2049 TCP-only (from GlusterFS 3.4 and later), are used for the port mapper and should be open.
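
A quick way to verify the ports are reachable between nodes is a minimal Python sketch like the following (not part of the original setup; the hostnames and the exact port list are assumptions based on the guidelines above):

import socket

NODES = ["node-gfs1", "node-gfs2", "node-gfs3"]
PORTS = [111, 2049, 24007, 49152, 49153, 49154]   # adjust to your Gluster version / brick count

for host in NODES:
    for port in PORTS:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(2)
        status = "open" if s.connect_ex((host, port)) == 0 else "CLOSED"
        s.close()
        print("%s:%-5d %s" % (host, port, status))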

Installation steps

On each machine: install GlusterFS (server and client)

# yum install centos-release-gluster37
# yum install glusterfs-server

Then start the Gluster server process and enable it on boot:

# systemctl start glusterd
# systemctl enable glusterd
Created symlink from /etc/systemd/system/multi-user.target.wants/glusterd.service to /usr/lib/systemd/system/glusterd.service.
#

From the first node, establish the trust relationship between the Gluster cluster nodes:

# gluster peer probe node-gfs2
peer probe: success. 
# gluster peer probe node-gfs3
peer probe: success.

Now check the peer status:

# gluster peer status
Number of Peers: 2
 
Hostname: node-gfs2
Uuid: 2a7ea8f6-0832-42ba-a98e-6fe7d67fcfe9
State: Peer in Cluster (Connected)
 
Hostname: node-gfs3
Uuid: 55b0ce72-0c34-441f-ab3c-88414885e32d
State: Peer in Cluster (Connected)
#

On each server, prepare the brick storage:

# mkdir -p /glusterfs/bricks/brick1
# mkfs.xfs /dev/xvdf

Add to /etc/fstab:

UUID=8f808cef-c7c6-4c2a-bf15-0e32ef71e97c /glusterfs/bricks/brick1 xfs    defaults        0 0

Then mount it.

Note: if you use the mount point directly as the brick, you get the following error:

# gluster volume create wp replica 3 node-gfs1:/glusterfs/bricks/brick1 node-gfs2:/glusterfs/bricks/brick1 node-gfs3:/glusterfs/bricks/brick1
volume create: wp: failed: The brick node-gfs1:/glusterfs/bricks/brick1 is a mount point. Please create a sub-directory under the mount point and use that as the brick directory. Or use 'force' at the end of the command if you want to override this behavior.

So create a directory under each /glusterfs/bricks/brick1 mount point to be used for the GlusterFS volume; in my case I created /glusterfs/bricks/brick1/gv.

From server 1, create a 3-way replicated (mirrored) volume:

# gluster volume create wp replica 3 node-gfs1:/glusterfs/bricks/brick1/gv node-gfs2:/glusterfs/bricks/brick1/gv node-gfs3:/glusterfs/bricks/brick1/gv
volume create: wp: success: please start the volume to access data
#

Check the status:

# gluster volume info
 
Volume Name: wp
Type: Replicate
Volume ID: 34dbacba-344e-4c89-875f-4c91812f01be
Status: Created
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node-gfs1:/glusterfs/bricks/brick1/gv
Brick2: node-gfs2:/glusterfs/bricks/brick1/gv
Brick3: node-gfs3:/glusterfs/bricks/brick1/gv
Options Reconfigured:
performance.readdir-ahead: on
#

Now start the volume:

# gluster volume start wp
volume start: wp: success
#

Check the status of the volume:

# gluster volume status
Status of volume: wp
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node-gfs1:/glusterfs/bricks/brick1/
gv                                          49152     0          Y       10229
Brick node-gfs2:/glusterfs/bricks/brick1/
gv                                          49152     0          Y       9323 
Brick node-gfs3:/glusterfs/bricks/brick1/
gv                                          49152     0          Y       9171 
NFS Server on localhost                     2049      0          Y       10249
Self-heal Daemon on localhost               N/A       N/A        Y       10257
NFS Server on node-gfs2                     2049      0          Y       9343 
Self-heal Daemon on node-gfs2               N/A       N/A        Y       9351 
NFS Server on node-gfs3                     2049      0          Y       9191 
Self-heal Daemon on node-gfs3               N/A       N/A        Y       9199 
 
Task Status of Volume wp
------------------------------------------------------------------------------
There are no active volume tasks
 
#

This is a healthy volume; if one of the servers goes offline, it disappears from the table above and reappears when it’s back online. Its peer state will also show as disconnected in the gluster peer status output.

Using the volume (Gluster clients)

Mount on each machine: on server 1 I mount from server 2, on server 2 I mount from server 3, and on server 3 I mount from server 1.

Using the following syntax in /etc/fstab:

node-gfs2:/wp        /var/www/html      glusterfs     defaults,_netdev  0  0

Repeat it on each server as per my above note.

Now /var/www/html is shared on each machine in read/write mode.

References

  • http://severalnines.com/blog/scaling-wordpress-and-mysql-multiple-servers-performance
  • http://www.slashroot.in/gfs-gluster-file-system-complete-tutorial-guide-for-an-administrator
  • https://wiki.centos.org/HowTos/GlusterFSonCentOS