Setting up Docker Swarm in VMware NSX

Setting up Docker Swarm is pretty simple. BUT VMWare NSX is a little annoying, in that it blocks the VXLAN transport port (TCP Port 4789) at the hypervisor level. I’m sure this seemed GREAT for security, but it majorly messes up any application USING VXLAN inside the transport zone. Suck as Docker Swarm inside a cloud provider who uses VMWare NSX. As long as you know about this, you can work around it, however, as you can specify an alternate VXLAN port when you initialize your swarm! So let’s do that!

We will be bringing up a swarm on a cluster today with one manager and four nodes – each host has two network interfaces – we’ll be using ens160 in for our transport network. we use the –data-path-port parameter to set the VXLAN port.

Note: Our manager, and all nodes, already need Docker installed, incase this isn’t obvious ūüėÄ

root@prod-swarm-manager-1:~# docker swarm init --data-path-port 4788 --advertise-addr
 Swarm initialized: current node (p9ojg9edmipi7saldcbrcnhyt) is now a manager.
To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-42rg6zgs3onagtyamztitzgqb21z9hmwnwfdqoabmew4ppk2i5-2r0upkukt2asdfsdf3234512ad

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions. 

And we now have a swarm (with one node) up. Time to add more nodes!

root@prod-swarm-node-1:~# docker swarm join --token SWMTKN-1-42rg6zgs3onagtyamztitzgqb21z9hmwnwfdqoabmew4ppk2i5-2r0upkukt2asdfsdf3234512ad
This node joined a swarm as a worker. 

root@prod-swarm-node-2:~# docker swarm join --token SWMTKN-1-42rg6zgs3onagtyamztitzgqb21z9hmwnwfdqoabmew4ppk2i5-2r0upkukt2asdfsdf3234512ad
This node joined a swarm as a worker. 

root@prod-swarm-node-3:~# docker swarm join --token SWMTKN-1-42rg6zgs3onagtyamztitzgqb21z9hmwnwfdqoabmew4ppk2i5-2r0upkukt2asdfsdf3234512ad
This node joined a swarm as a worker. 

root@prod-swarm-node-3:~# docker swarm join --token SWMTKN-1-42rg6zgs3onagtyamztitzgqb21z9hmwnwfdqoabmew4ppk2i5-2r0upkukt2asdfsdf3234512ad
This node joined a swarm as a worker. 

We should now have our swarm up and running – run docker node list, to see!

root@prod-swarm-manager-1:~# docker node list
 ID                            HOSTNAME                 STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
 p9ojg9edmipi7saldcbrcnhyt *   prod-swarm-manager-1     Ready               Active              Leader              19.03.1
 lnp2b2ijurmtamp0if4aner7y     prod-swarm-node-1        Ready               Active                                  19.03.1
 caxka5zdq0nb9lilcvss1fv82     prod-swarm-node-2        Ready               Active                                  19.03.1
 k0ar3rgjzoz1jjncfpr5xd9t1     prod-swarm-node-3        Ready               Active                                  19.03.1
 oa0ym3ytsgf5svbs2rz205jwr     prod-swarm-node-4        Ready               Active                                  19.03.1

We do, perfect! We now want to manage the swarm with a nice web interface, so lets bring up swarmpit.

root@prod-swarm-manager-1:~# docker run -it --rm \
>   --name swarmpit-installer \
>   --volume /var/run/docker.sock:/var/run/docker.sock \
>   swarmpit/install:1.7
Unable to find image 'swarmpit/install:1.7' locally
1.7: Pulling from swarmpit/install
e7c96db7181b: Pull complete 
5297bd381816: Pull complete 
3a664477889c: Pull complete 
a9b893dcc701: Pull complete 
48bf7c1cb0dd: Pull complete 
555b6ea27ad2: Pull complete 
7e8a5ec7012a: Pull complete 
6adc20046ac5: Pull complete 
42a1f54aa48c: Pull complete 
717a4f34e541: Pull complete 
f95ad45cac17: Pull complete 
f963bb249c55: Pull complete 
Digest: sha256:04e47b8533e5b4f9198d4cbdfea009acac56417227ce17a9f1df549ab66a8520
Status: Downloaded newer image for swarmpit/install:1.7
                                        _ _   
 _____      ____ _ _ __ _ __ ___  _ __ (_) |_ 
/ __\ \ /\ / / _` | '__| '_ ` _ \| '_ \| | __|
\__ \\ V  V / (_| | |  | | | | | | |_) | | |_ 
|___/ \_/\_/ \__,_|_|  |_| |_| |_| .__/|_|\__|
Welcome to Swarmpit
Version: 1.7
Branch: 1.7

Preparing dependencies
latest: Pulling from byrnedo/alpine-curl
8e3ba11ec2a2: Pull complete 
6522ab4c8603: Pull complete 
Digest: sha256:e8cf497b3005c2f66c8411f814f3818ecd683dfea45267ebfb4918088a26a18c
Status: Downloaded newer image for byrnedo/alpine-curl:latest

Preparing installation
Cloning into 'swarmpit'...
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 17028 (delta 1), reused 1 (delta 0), pack-reused 17022
Receiving objects: 100% (17028/17028), 4.39 MiB | 3.05 MiB/s, done.
Resolving deltas: 100% (10146/10146), done.

Application setup
Enter stack name [swarmpit]: prod-swarmpit
Enter application port [888]: 
Enter database volume driver [local]: 
Enter admin username [admin]: 
Enter admin password (min 8 characters long): SYJpt6FQ@*j2ztPZ53^yF@!q5VRkZRyr*h$ydWGYE67$RWaHWat5Q$g6#zQtA3q^8QgQeSAMBEPT2^z8t2y#GKb5^X%e

Application deployment
Creating network prod-swarmpit_net
Creating service prod-swarmpit_db
Creating service prod-swarmpit_agent
Creating service prod-swarmpit_app

Starting swarmpit............DONE.
Initializing swarmpit...DONE.

Username: admin
Password: SYJpt6FQ@*j2ztPZ53^yF@!q5VRkZRyr*h$ydWGYE67$RWaHWat5Q$g6#zQtA3q^8QgQeSAMBEPT2^z8t2y#GKb5^X%e
Swarmpit is running on port :888

Enjoy ūüôā

And bingo! If I hit up the manager host on port 888, I can login and view the swarm state!

Setting up docker on a new Ubuntu 18.04 server

This is actually fairly simple ūüôā

In fact, REALLY simple ūüėÄ Just do the following:

curl -fsSL | sudo apt-key add -
add-apt-repository "deb [arch=amd64] $(lsb_release -cs) stable"
apt update
apt-get install -y docker-ce
systemctl start docker
systemctl enable docker 

If you’re running CSF, you’ll want a couple of extra CSF modules installed, namely and

But other than that? Yep, all done ūüôā

Changing hostname on Ubuntu 18.04

So we have finally started rolling out 18.04 VM’s for corporate use at work (R1soft finally started rolling ‘non-beta’ modules, which was our main blocker), occasionally I’ll go to run up a bunch of VM’s and munge the hostname of one machine.. With cloud-init, it’s not quite a simple as editing /etc/hostname and rebooting anymore.. But it’s not too bad ūüôā

First, edit /etc/cloud/cloud.cfg, and look for the preseve_hostname: field – you want this set to true.

#This will cause the set+update hostname module to not operate (if true)
preserve_hostname: true

Once done, run ‘hostnamectl’

root@prod-docker-manager-1:~# hostnamectl
   Static hostname: prod-docker-manager-1
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 71431bc67882462ab8752997212223e8
           Boot ID: 2523f4f9cc3142b5bf56ad73f93da02e
    Virtualization: vmware
  Operating System: Ubuntu 18.04.2 LTS
            Kernel: Linux 4.15.0-45-generic
      Architecture: x86-64

This will show your current hostname. I guess you don’t need to really show this, it’s handy to know that it IS set as static. If it’s not, you’ll want to go google something ūüėČ

You can now just run ‘hostnamectl set-hostname <newhostname>’

root@prod-docker-manager-1:~# hostnamectl set-hostname prod-swarm-manager-1
root@prod-docker-manager-1:~# hostnamectl
   Static hostname: prod-swarm-manager-1
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 71431bc67882462ab8752997212223e8
           Boot ID: 2523f4f9cc3142b5bf56ad73f93da02e
    Virtualization: vmware
  Operating System: Ubuntu 18.04.2 LTS
            Kernel: Linux 4.15.0-45-generic
      Architecture: x86-64

And you’re good to go. Reboot if you have processes running which depend on the hostname. Or not if this is a brand new host (which is my usual case, where I’ve munged something during the VM install, and am now SSH’d in as the temporary local user to do the actual provisioning).

SSD Caching – Actually doing it

So, caching is built right into LVM these days. ¬†It’s quite neat. ¬†I’m testing this on my OTHER caching box – a shallow 1RU box with space for two SSD’s and that’s about it.

First step is to mount an ISCSI target. ¬†I’m just mounting a target I created on my fileserver, to save some latency (I can mount the main SAN from the DC, but there’s 15ms latency due to the EOIP tunnel over the ADSL here). There’s a much more detailed writeup of this Here

root@isci-cache01:~# iscsiadm -m discovery -t st -p,1,1

root@isci-cache01:~# iscsiadm -m node --targetname "" --portal "" --login
Logging in to [iface: default, target:, portal:,3260] (multiple)
Login to [iface: default, target:, portal:,3260] successful.

[ 193.182145] scsi6 : iSCSI Initiator over TCP/IP
[ 193.446401] scsi 6:0:0:0: Direct-Access IET VIRTUAL-DISK 0 PQ: 0 ANSI: 4
[ 193.456619] sd 6:0:0:0: Attached scsi generic sg1 type 0
[ 193.466849] sd 6:0:0:0: [sdb] 1048576000 512-byte logical blocks: (536 GB/500 GiB)
[ 193.469692] sd 6:0:0:0: [sdb] Write Protect is off
[ 193.469697] sd 6:0:0:0: [sdb] Mode Sense: 77 00 00 08
[ 193.476918] sd 6:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 193.514882] sdb: unknown partition table
[ 193.538467] sd 6:0:0:0: [sdb] Attached SCSI disk

root@isci-cache01:~# pvcreate /dev/sdb
root@isci-cache01:~# vgcreate vg_iscsi /dev/sdb

root@isci-cache01:~# pvdisplay
--- Physical volume ---
PV Name               /dev/sdb
VG Name               vg_iscsi
PV Size               500.00 GiB / not usable 4.00 MiB
Allocatable           yes
PE Size               4.00 MiB
Total PE              127999
Free PE               127999
Allocated PE          0
PV UUID               0v8SWY-2SSA-E2oL-iAdE-yeb4-owyG-gHXPQK
--- Physical volume ---
PV Name               /dev/sda5
VG Name               isci-cache01-vg
PV Size               238.24 GiB / not usable 0
Allocatable           yes
PE Size               4.00 MiB
Total PE              60988
Free PE               50784
Allocated PE          10204

PV UUID               Y3O48a-tep7-nYjx-gEck-bcwk-tJzP-2Sc2pP

root@isci-cache01:~# lvcreate -L 499G -n testiscsilv vg_iscsi
Logical volume "testiscsilv" created
root@isci-cache01:~# mkfs -t ext4 /dev/mapper/vg_iscsi-testiscsilv
mke2fs 1.42.12 (29-Aug-2014)
Creating filesystem with 130809856 4k blocks and 32702464 inodes
Filesystem UUID: 9aa5f499-902a-4935-bc67-61dd8930e014
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
Allocating group tables: done
Writing inode tables: done

Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

Now things get a little tricky, as I’d already installed my system with the ssd in one volume group.. ¬†I’ll be using a RAID array for the production box for PiCloud.
For now, we’ll just create an LV from the SSD VG, and then add it to the iscsi VG.

root@isci-cache01:~# lvcreate -L 150G -n iscsicaching isci-cache01-vg
Logical volume "iscsicaching" created
root@isci-cache01:~# vgextend vg_iscsi  /dev/mapper/isci--cache01--vg-iscsicaching
  Physical volume "/dev/isci-cache01-vg/iscsicaching" successfully created
  Volume group "vg_iscsi" successfully extended
root@isci-cache01:~# lvcreate -L 1G -n cache_meta_lv vg_iscsi /dev/isci-cache01-vg/iscsicaching
Logical volume "cache_meta_lv" created
root@isci-cache01:~# lvcreate -L 148G -n cache_lv vg_iscsi /dev/isci-cache01-vg/iscsicaching
  Logical volume "cache_lv" created
root@isci-cache01:~# lvs
LV            VG              Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
iscsicaching  isci-cache01-vg -wi-ao---- 150.00g
root          isci-cache01-vg -wi-ao----  30.18g
swap_1        isci-cache01-vg -wi-ao----   9.68g
cache_lv      vg_iscsi        -wi-a----- 148.00g
cache_meta_lv vg_iscsi        -wi-a-----   1.00g
  testiscsilv   vg_iscsi        -wi-a----- 499.00g
root@isci-cache01:~# pvs
PV                                VG              Fmt  Attr PSize   PFree
/dev/isci-cache01-vg/iscsicaching vg_iscsi        lvm2 a--  150.00g 1020.00m
/dev/sda5                         isci-cache01-vg lvm2 a--  238.23g   48.38g
  /dev/sdb                          vg_iscsi        lvm2 a--  500.00g 1020.00m
Now we want to convert these two new LV's into a 'cache pool'
root@isci-cache01:~# lvconvert --type cache-pool --poolmetadata vg_iscsi/cache_meta_lv vg_iscsi/cache_lv
WARNING: Converting logical volume vg_iscsi/cache_lv and vg_iscsi/cache_meta_lv to pool's data and metadata volumes.
Do you really want to convert vg_iscsi/cache_lv and vg_iscsi/cache_meta_lv? [y/n]: y
Logical volume "lvol0" created
  Converted vg_iscsi/cache_lv to cache pool.

And now we want to attach this cache pool to our iscsi LV.

root@isci-cache01:~# lvconvert --type cache --cachepool vg_iscsi/cache_lv vg_iscsi/testiscsilv
  Logical volume vg_iscsi/testiscsilv is now cached.
root@isci-cache01:~# dd if=/dev/zero of=/export/test1 bs=1024k count=60
60+0 records in
60+0 records out
62914560 bytes (63 MB) copied, 0.0401375 s, 1.6 GB/s
root@isci-cache01:~# dd if=/dev/zero of=/export/test1 bs=1024k count=5000
^C2512+0 records in
2512+0 records out
2634022912 bytes (2.6 GB) copied, 7.321 s, 360 MB/sroot@isci-cache01:~# ls -l
total 0
root@isci-cache01:~# dd if=/export/test1 of=/dev/null
5144576+0 records in
5144576+0 records out
2634022912 bytes (2.6 GB) copied, 1.82355 s, 1.4 GB/s

Oh yeah!  Over a 15mbps network too!

Now we want to setup XFS quotas so we can have a quota per directory.

root@isci-cache01:/# echo "100001:/export/mounts/pi-01" >> /etc/projects
root@isci-cache01:/# echo "pi-01:10001" >> /etc/projid
root@isci-cache01:/# xfs_quota -x -c 'project -s pi-01' /export
root@isci-cache01:/# xfs_quota -x -c 'limit -p bhard=5g pi-01' /export

root@isci-cache01:/# xfs_quota -x -c report /export
Project quota on /export (/dev/mapper/vg_iscsi-testiscsilv)
Project ID       Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
pi-01         2473752          0    5242880     00 [--------]

Note: Need the thin-provisioning-tools package, and to ensure that your initramfs gets built with the proper modules included.

Sweet, so we CAN do this ūüôā

SSD Caching Server

The next issue I have been facing is that while I CAN netboot 16 pi’s off one NFS server (That’s not a problem at all), it is when my ‘large’ available storage is my iscsi SAN. ¬†I tried having a VM which mounted an iSCSI target and then NFS-shared it, but that’s just not working fast enough. ¬†Time to investigate SSD caching.. the Kernel DMCache driver looks pretty good..

I have this old Lacie ‘Gigabit Disk’ NAS – it’s has an 800MHz VIA CPU with 4 x 250GB IDE drives in it. ¬†It became too slow years ago, but I’ve recently picked up some H61M-ITX mini-ITX boards which work were selling off. ¬†Think it’ll make a nice upgrade for the NAS ūüôā


It’s a nice little unit. ¬†Think it’ll fit what I want to put in it pretty well.IMG_3580IMG_3589

Sweet, that fits! ¬†H61M-ITX, 16GB ram, and an i7-2600. ¬†And 8 x 240GB M500 SSD’s for the caching. ¬†They’ll run on the 3ware 9650se-8lmpl raid card.