Provisioning a Galera Cluster on Ubuntu 18.04

So, we want to bring up a Galera cluster, and do some basic testing of how to bring it back online should things go pear-shaped.

First, install MariaDB on all three nodes

# apt-get install software-properties-common
# apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8
# add-apt-repository "deb [arch=amd64,arm64,ppc64el] http://mariadb.mirror.liquidtelecom.com/repo/10.4/ubuntu $(lsb_release -cs) main"
# apt update
# apt -y install mariadb-server mariadb-client
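
A quick sanity check that the 10.4 packages from the MariaDB repo actually landed, rather than Ubuntu's stock version:

# mysql --version    # should report a 10.4.x MariaDB build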

At this point MariaDB is installed, but has no root password configured.

# mysql_secure_installation
 NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MariaDB
       SERVERS IN PRODUCTION USE!  PLEASE READ EACH STEP CAREFULLY!
 In order to log into MariaDB to secure it, we'll need the current
 password for the root user. If you've just installed MariaDB, and
 haven't set the root password yet, you should just press enter here.
 Enter current password for root (enter for none): 
 OK, successfully used password, moving on…
 Setting the root password or using the unix_socket ensures that nobody
 can log into the MariaDB root user without the proper authorisation.
 You already have your root account protected, so you can safely answer 'n'.
 Switch to unix_socket authentication [Y/n] 
 Enabled successfully!
 Reloading privilege tables..
  … Success!
 You already have your root account protected, so you can safely answer 'n'.
 Change the root password? [Y/n] y
 New password: 
 Re-enter new password: 
 Password updated successfully!
 Reloading privilege tables..
  … Success!
 By default, a MariaDB installation has an anonymous user, allowing anyone
 to log into MariaDB without having to have a user account created for
 them.  This is intended only for testing, and to make the installation
 go a bit smoother.  You should remove them before moving into a
 production environment.
 Remove anonymous users? [Y/n] y
  … Success!
 Normally, root should only be allowed to connect from 'localhost'.  This
 ensures that someone cannot guess at the root password from the network.
 Disallow root login remotely? [Y/n] y
  … Success!
 By default, MariaDB comes with a database named 'test' that anyone can
 access.  This is also intended only for testing, and should be removed
 before moving into a production environment.
 Remove test database and access to it? [Y/n] y
 Dropping test database…
 … Success!
 Removing privileges on test database…
 … Success! 
 Reloading the privilege tables will ensure that all changes made so far
 will take effect immediately.
 Reload privilege tables now? [Y/n] y
  … Success!
 Cleaning up…
 All done!  If you've completed all of the above steps, your MariaDB
 installation should now be secure.
 Thanks for using MariaDB!

Do this on all three servers, and you’re now ready to configure the Galera Cluster portion! On each node, you want to create a /etc/mysql/mariadb.conf.d/galera.cnf file:

# cat /etc/mysql/mariadb.conf.d/galera.cnf
[mysqld]
character-set-server = utf8
bind-address=0.0.0.0
port=3306
default_storage_engine=InnoDB
binlog_format=row
innodb_autoinc_lock_mode=2
# Galera cluster configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://192.168.1.201,192.168.1.202,192.168.1.203"
wsrep_cluster_name="galera-cluster-1"
wsrep_sst_method=rsync
# Cluster node configuration
wsrep_node_address="192.168.1.201"
wsrep_node_name="galera-host-01"


The only difference between this file on each node is the last two lines: wsrep_node_address and wsrep_node_name. Note: you’ll probably want hosts file entries on your nodes to map IPs to names and vice versa, unless you have reliable DNS configured internally, as name resolution affects your cluster status displays.
Your wsrep_cluster_address line will list the IPs of all your cluster nodes.
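
Something along these lines in /etc/hosts on each node does the job (matching the addresses used for this cluster):

192.168.1.201 galera-host-01
192.168.1.202 galera-host-02
192.168.1.203 galera-host-03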

Stop MariaDB on all nodes, and bootstrap the cluster from the first node:

node2# systemctl stop mariadb
node3# systemctl stop mariadb
node1# systemctl stop mariadb
node1# galera_new_cluster

Your cluster should now have started. Let’s check the cluster state.

root@galera-host-01:~# mysql -e "show status like 'wsrep_%'"
 
+-------------------------------+-------------------------------------------------+
| Variable_name                 | Value                                           |
+-------------------------------+-------------------------------------------------+
| wsrep_local_state_uuid        | fd6dbdcc-c95e-11e9-ac52-570534ceb766            |
| wsrep_protocol_version        | 10                                              |
| wsrep_last_committed          | 1                                               |
| wsrep_replicated              | 0                                               |
| wsrep_replicated_bytes        | 0                                               |
| wsrep_repl_keys               | 0                                               |
| wsrep_repl_keys_bytes         | 0                                               |
| wsrep_repl_data_bytes         | 0                                               |
| wsrep_repl_other_bytes        | 0                                               |
| wsrep_received                | 2                                               |
| wsrep_received_bytes          | 144                                             |
| wsrep_local_commits           | 0                                               |
| wsrep_local_cert_failures     | 0                                               |
| wsrep_local_replays           | 0                                               |
| wsrep_local_send_queue        | 0                                               |
| wsrep_local_send_queue_max    | 1                                               |
| wsrep_local_send_queue_min    | 0                                               |
| wsrep_local_send_queue_avg    | 0                                               |
| wsrep_local_recv_queue        | 0                                               |
| wsrep_local_recv_queue_max    | 1                                               |
| wsrep_local_recv_queue_min    | 0                                               |
| wsrep_local_recv_queue_avg    | 0                                               |
| wsrep_local_cached_downto     | 1                                               |
| wsrep_flow_control_paused_ns  | 0                                               |
| wsrep_flow_control_paused     | 0                                               |
| wsrep_flow_control_sent       | 0                                               |
| wsrep_flow_control_recv       | 0                                               |
| wsrep_cert_deps_distance      | 0                                               |
| wsrep_apply_oooe              | 0                                               |
| wsrep_apply_oool              | 0                                               |
| wsrep_apply_window            | 0                                               |
| wsrep_commit_oooe             | 0                                               |
| wsrep_commit_oool             | 0                                               |
| wsrep_commit_window           | 0                                               |
| wsrep_local_state             | 4                                               |
| wsrep_local_state_comment     | Synced                                          |
| wsrep_cert_index_size         | 0                                               |
| wsrep_causal_reads            | 0                                               |
| wsrep_cert_interval           | 0                                               |
| wsrep_open_transactions       | 0                                               |
| wsrep_open_connections        | 0                                               |
| wsrep_incoming_addresses      | AUTO                                            |
| wsrep_cluster_weight          | 1                                               |
| wsrep_desync_count            | 0                                               |
| wsrep_evs_delayed             |                                                 |
| wsrep_evs_evict_list          |                                                 |
| wsrep_evs_repl_latency        | 0/0/0/0/0                                       |
| wsrep_evs_state               | OPERATIONAL                                     |
| wsrep_gcomm_uuid              | fd6cf1bd-c95e-11e9-98ab-d2e5733d21d0            |
| wsrep_applier_thread_count    | 1                                               |
| wsrep_cluster_capabilities    |                                                 |
| wsrep_cluster_conf_id         | 18446744073709551615                            |
| wsrep_cluster_size            | 1                                               |
| wsrep_cluster_state_uuid      | fd6dbdcc-c95e-11e9-ac52-570534ceb766            |
| wsrep_cluster_status          | Primary                                         |
| wsrep_connected               | ON                                              |
| wsrep_local_bf_aborts         | 0                                               |
| wsrep_local_index             | 0                                               |
| wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:T |
| wsrep_provider_name           | Galera                                          |
| wsrep_provider_vendor         | Codership Oy <info@codership.com>               |
| wsrep_provider_version        | 26.4.2(r4498)                                   |
| wsrep_ready                   | ON                                              |
| wsrep_rollbacker_thread_count | 2                                               |
| wsrep_thread_count            | 3                                               |
+-------------------------------+-------------------------------------------------+

Check the cluster size to make sure the cluster came up. It should be a cluster of one right now:

root@galera-host-01:~# mysql -e "show status like 'wsrep_cluster_size'"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 1     |
+--------------------+-------+

This looks good. Time to boot the second server:

galera-host-02# systemctl start mariadb

Check the cluster size again; it should now be 2.

root@galera-host-01:~# mysql -u root -p -e "show status like 'wsrep_cluster_size'"
Enter password: 
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 2     |
+--------------------+-------+

Start node3, and you’ll see the cluster size go to 3. Nice.

Cluster monitoring for HAProxy

Having a cluster is no good if your client machines aren’t load balancing across it, and only talking to ‘up’ servers.

For this you need a way for HAProxy to know the state of each server. I like the ‘clustercheck’ script from obissick (https://github.com/obissick/Galera-ClusterCheck).

First, we add a cluster check user to MariaDB:

CREATE USER 'clustercheckuser'@'localhost' IDENTIFIED BY 'h3fUU3373Pb17Vjt&^C39hFHelA';
GRANT PROCESS ON *.* TO 'clustercheckuser'@'localhost';

Then install the clustercheck script, and edit it:

curl https://raw.githubusercontent.com/obissick/Galera-ClusterCheck/master/clustercheck.sh > /usr/bin/clustercheck
chmod +x /usr/bin/clustercheck
vi /usr/bin/clustercheck
edit the MYSQL_PASSWORD line to put the password in, i.e.
MYSQL_PASSWORD="${2-h3fUU3373Pb17Vjt&^C39hFHelA}"
apt-get install -y xinetd
echo "mysqlchk 9200/tcp #mysql check script" >> /etc/services
cat > /etc/xinetd.d/mysqlchk << __END__
# default: on
# description: mysqlchk
service mysqlchk
{
        disable         = no
        flags           = REUSE
        socket_type     = stream
        port            = 9200
        wait            = no
        user            = nobody
        server          = /usr/bin/clustercheck
        log_on_failure  += USERID
        only_from       = 192.168.1.0/24
        per_source      = UNLIMITED
}
__END__
service xinetd restart

Note: you will want to change only_from to match whichever IPs will be running HAProxy. Deploy the above to all members of your Galera cluster.

Now, on each client machine, we can configure a block in HAProxy similar to the below:

frontend mysql-dev-front
    bind 127.0.0.1:3307
    mode tcp
    default_backend mysql-dev-back

backend mysql-dev-back
    mode tcp
    balance leastconn
    option tcpka
    option httpchk
    default-server port 9200 inter 2s downinter 5s rise 3 fall 2 slowstart 60s weight 100
    server node1 192.168.1.201:3306 check
    server node2 192.168.1.202:3306 check
    server node3 192.168.1.203:3306 check

NOTE: Make sure that you can connect to port 9200 on all of your Galera nodes from your client server BEFORE enabling this config in HAProxy!

curl http://192.168.1.201:9200
Galera Cluster Node is synced.

If you instead get "curl: (56) Recv failure: Connection reset by peer", the node is either down or your client’s IP isn’t covered by only_from in the xinetd config.

Once this is complete, any applications on your client machine can connect to localhost:3307.
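
Remember we disallowed remote root logins earlier, so your applications need a user that’s allowed to connect from the client’s address. ‘appuser’ and its password here are placeholders (a sketch; scope the grant to what your app actually needs):

CREATE USER 'appuser'@'192.168.1.%' IDENTIFIED BY 'Pick4Strong1Password!';
GRANT ALL PRIVILEGES ON ninjas.* TO 'appuser'@'192.168.1.%';

Then, from the client machine, test the path through HAProxy:

mysql -h 127.0.0.1 -P 3307 -u appuser -p -e "show status like 'wsrep_cluster_size'"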

Failure Modes

OK, so this is one of the most important things we need to think about. There are a few failure modes we need to be able to handle in the cluster.

One node is shut down

Let’s insert some rows into a test database, and then shut down node 2.

root@galera-host-01:~# mysql
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 50
Server version: 10.4.7-MariaDB-1:10.4.7+maria~bionic-log mariadb.org binary distribution
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> create database ninjas;
Query OK, 1 row affected (0.006 sec)
MariaDB [(none)]> use ninjas;
Database changed
MariaDB [ninjas]> create table table1 (row1 integer not null);
Query OK, 0 rows affected (0.013 sec)
MariaDB [ninjas]> insert into table1 values (1), (2), (3), (4), (5), (6);
Query OK, 6 rows affected (0.006 sec)
Records: 6  Duplicates: 0  Warnings: 0

root@galera-host-02:~# mysql ninjas -e "select * from table1;"
 +------+
 | row1 |
 +------+
 |    1 |
 |    2 |
 |    3 |
 |    4 |
 |    5 |
 |    6 |
 +------+
root@galera-host-02:~# systemctl stop mariadb

root@galera-host-01:~# mysql ninjas -e "insert into table1 values (7), (8), (9);"

OK, so we now have data in the DB that arrived after host-02 was shut down. We’re now going to bring host-02 back up and check that it comes back into the cluster cleanly.

root@galera-host-02:~# systemctl start mariadb
root@galera-host-02:~# mysql ninjas -e "select * from table1;"
+------+
| row1 |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
|    5 |
|    6 |
|    7 |
|    8 |
|    9 |
+------+

Perfect.

All the nodes are shut down

And re-started in the correct order

How about when we have to shut down the whole cluster, and bring it back online? Let’s shut down the nodes. First we’ll do it in reverse order (03, 02, 01), then bring them up in order (01, 02, 03). (Yes, that’s the way you’re meant to do it. We’re going to do it wrong soon, and see how to recover from that.)

root@galera-host-03:~# systemctl stop mariadb
root@galera-host-02:~# systemctl stop mariadb
root@galera-host-01:~# systemctl stop mariadb

If we have a look in /var/lib/mysql/grastate.dat on each host, we’ll see that galera-host-01 is indeed the host we should be booting the cluster from:

root@galera-host-01:~# cat /var/lib/mysql/grastate.dat 
GALERA saved state
version: 2.1
uuid:    a10015aa-cd62-11e9-a80b-87dacc3a89c3
seqno:   11
safe_to_bootstrap: 1

root@galera-host-02:~# cat /var/lib/mysql/grastate.dat 
GALERA saved state
version: 2.1
uuid:    a10015aa-cd62-11e9-a80b-87dacc3a89c3
seqno:   10
safe_to_bootstrap: 0

root@galera-host-03:~# cat /var/lib/mysql/grastate.dat 
GALERA saved state
version: 2.1
uuid:    a10015aa-cd62-11e9-a80b-87dacc3a89c3
seqno:   9
safe_to_bootstrap: 0
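
If you have root SSH between the nodes, a quick way to eyeball all three state files at once (just a convenience sketch; assumes the hostnames resolve and keys are in place):

for h in galera-host-01 galera-host-02 galera-host-03; do
  echo "== $h =="
  ssh "$h" cat /var/lib/mysql/grastate.dat
done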

OK, so we should simply be able to boot the cluster by running galera_new_cluster on galera-host-01, and then starting the other hosts:

root@galera-host-01:~# galera_new_cluster
root@galera-host-01:~# mysql -e "show status like 'wsrep_cluster_size'"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 1     |
+--------------------+-------+

root@galera-host-02:~# systemctl start mariadb

root@galera-host-01:~# mysql -e "show status like 'wsrep_cluster_size'"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 2     |
+--------------------+-------+

root@galera-host-03:~# systemctl start mariadb

root@galera-host-01:~# mysql -e "show status like 'wsrep_cluster_size'"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+

Restarting the cluster in the wrong order

OK, now let’s shut it down with host-01 first, and then boot it from host-01 first. So, we’re going to shut down the nodes, and then check our Galera replication state files:

root@galera-host-01:~# systemctl stop mariadb
root@galera-host-02:~# systemctl stop mariadb
root@galera-host-03:~# systemctl stop mariadb

root@galera-host-01:~# cat /var/lib/mysql/grastate.dat 
GALERA saved state
version: 2.1
uuid:    a10015aa-cd62-11e9-a80b-87dacc3a89c3
seqno:   15
safe_to_bootstrap: 0

root@galera-host-02:~# cat /var/lib/mysql/grastate.dat 
GALERA saved state
version: 2.1
uuid:    a10015aa-cd62-11e9-a80b-87dacc3a89c3
seqno:   16
safe_to_bootstrap: 0

root@galera-host-03:~# cat /var/lib/mysql/grastate.dat 
GALERA saved state
version: 2.1
uuid:    a10015aa-cd62-11e9-a80b-87dacc3a89c3
seqno:   17
safe_to_bootstrap: 1

Yep, so host-03 would be the correct host to start. But we’re going to start host-01.

root@galera-host-01:~# galera_new_cluster 
 Job for mariadb.service failed because the control process exited with error code.
 See "systemctl status mariadb.service" and "journalctl -xe" for details.

Well, that’s promising. It won’t let us do it. If we check the log (journalctl -u mariadb will show it), we see:

2019-09-02  9:58:05 0 [Note] WSREP: Start replication
2019-09-02  9:58:05 0 [Note] WSREP: Connecting with bootstrap option: 1
2019-09-02  9:58:05 0 [Note] WSREP: Setting GCS initial position to a10015aa-cd62-11e9-a80b-87dacc3a89c3:15
2019-09-02  9:58:05 0 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .
2019-09-02  9:58:05 0 [ERROR] WSREP: wsrep::connect(gcomm://192.168.1.201,192.168.1.202,192.168.1.203) failed: 7
2019-09-02  9:58:05 0 [ERROR] Aborting

OK, let’s make it really bad: let’s edit grastate.dat on host-01 and set safe_to_bootstrap to 1 😀
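
That’s a one-line change to /var/lib/mysql/grastate.dat; a quick sed does the same thing as editing it by hand:

root@galera-host-01:~# sed -i 's/^safe_to_bootstrap: 0$/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat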

root@galera-host-01:~# galera_new_cluster 
root@galera-host-01:~# mysql -e "show status like 'wsrep_cluster_size'"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 1     |
+--------------------+-------+

OK, promising. Let’s start host-02:

root@galera-host-02:~# systemctl start mariadb
root@galera-host-01:~# mysql -e "show status like 'wsrep_cluster_size'"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 2     |
+--------------------+-------+

Well, that’s beautiful! OK, let’s see what happens when we start host-03, which still has safe_to_bootstrap: 1 in its grastate file.

root@galera-host-03:~# systemctl start mariadb
root@galera-host-01:~# mysql -e "show status like 'wsrep_cluster_size'"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+

Well, colour me impressed! Are they REALLY in sync?

root@galera-host-01:~# mysql ninjas -e "insert into table1 values (14), (15), (16);"

root@galera-host-02:~# mysql ninjas -e "select * from table1;"
+------+
| row1 |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
|    5 |
|    6 |
|    7 |
|    8 |
|    9 |
|   10 |
|   11 |
|   12 |
|   14 |
|   15 |
|   16 |
+------+

root@galera-host-03:~# mysql ninjas -e "select * from table1;"
+------+
| row1 |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
|    5 |
|    6 |
|    7 |
|    8 |
|    9 |
|   10 |
|   11 |
|   12 |
|   14 |
|   15 |
|   16 |
+------+

Well bugger me, that looks good!

Rebuilding a failed node

SHOULD you happen to have a node get completely buggered (say you have enough data that the systemd 90-second startup timeout screws you during SST and leaves your node broken), you may need to do a clean reinstall of MariaDB to get Galera going again.

galera-host-02# apt-get purge mariadb-server-10.4
 Reading package lists… Done
 Building dependency tree       
 Reading state information… Done
 The following packages were automatically installed and are no longer required:
   galera-4 libaio1 libcgi-fast-perl libcgi-pm-perl libfcgi-perl libhtml-template-perl mariadb-server-core-10.4 socat
 Use 'apt autoremove' to remove them.
 The following packages will be REMOVED:
   mariadb-server* mariadb-server-10.4*
 0 upgraded, 0 newly installed, 2 to remove and 0 not upgraded.
 After this operation, 77.7 MB disk space will be freed.
 Do you want to continue? [Y/n] y
 (Reading database … 68199 files and directories currently installed.)
 Removing mariadb-server (1:10.4.7+maria~bionic) …
 Removing mariadb-server-10.4 (1:10.4.7+maria~bionic) …
 Processing triggers for man-db (2.8.3-2ubuntu0.1) …
 (Reading database … 68069 files and directories currently installed.)
 Purging configuration files for mariadb-server-10.4 (1:10.4.7+maria~bionic) …

galera-host-02:~# rm -rf /var/lib/mysql/*
galera-host-02:~# mv /etc/mysql/mariadb.conf.d/galera.cnf /root/galera.cnf

galera-host-02:~# apt -y install mariadb-server mariadb-client
 Reading package lists… Done
 Building dependency tree       
 Reading state information… Done
 mariadb-client is already the newest version (1:10.4.7+maria~bionic).
 Suggested packages:
   mailx mariadb-test tinyca
 The following NEW packages will be installed:
   mariadb-server mariadb-server-10.4
 0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
 Need to get 4,627 kB of archives.
 After this operation, 77.7 MB of additional disk space will be used.
 Get:1 http://mariadb.mirror.liquidtelecom.com/repo/10.4/ubuntu bionic/main amd64 mariadb-server-10.4 amd64 1:10.4.7+maria~bionic [4,624 kB]
 Get:2 http://mariadb.mirror.liquidtelecom.com/repo/10.4/ubuntu bionic/main amd64 mariadb-server all 1:10.4.7+maria~bionic [3,180 B]
 Fetched 4,627 kB in 6s (824 kB/s)          
 Preconfiguring packages …
 Selecting previously unselected package mariadb-server-10.4.
 (Reading database … 68059 files and directories currently installed.)
 Preparing to unpack …/mariadb-server-10.4_1%3a10.4.7+maria~bionic_amd64.deb …
 Unpacking mariadb-server-10.4 (1:10.4.7+maria~bionic) …
 Selecting previously unselected package mariadb-server.
 Preparing to unpack …/mariadb-server_1%3a10.4.7+maria~bionic_all.deb …
 Unpacking mariadb-server (1:10.4.7+maria~bionic) …
 Setting up mariadb-server-10.4 (1:10.4.7+maria~bionic) …
 Failed to stop mysql.service: Unit mysql.service not loaded.
 Created symlink /etc/systemd/system/mysql.service → /lib/systemd/system/mariadb.service.
 Created symlink /etc/systemd/system/mysqld.service → /lib/systemd/system/mariadb.service.
 Created symlink /etc/systemd/system/multi-user.target.wants/mariadb.service → /lib/systemd/system/mariadb.service.
 Setting up mariadb-server (1:10.4.7+maria~bionic) …
 Processing triggers for man-db (2.8.3-2ubuntu0.1) …

galera-host-02:~# mysql_secure_installation
......
<snipped>

galera-host-02:~# systemctl stop mariadb
galera-host-02:~# cp /root/galera.cnf /etc/mysql/mariadb.conf.d/
galera-host-02:~# echo 'TimeoutSec=infinity' >> /etc/systemd/system/mysqld.service
galera-host-02:~# systemctl daemon-reload
galera-host-02:~# systemctl start mariadb

And bingo, we’re back in the cluster 🙂
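
As an aside, appending to the unit file like that gets lost if a package upgrade replaces it, and relies on landing in the right section. A systemd drop-in override is the more durable way to express the same thing (standard systemd mechanism, nothing MariaDB-specific):

galera-host-02:~# mkdir -p /etc/systemd/system/mariadb.service.d
galera-host-02:~# cat > /etc/systemd/system/mariadb.service.d/timeout.conf << __END__
[Service]
TimeoutStartSec=infinity
__END__
galera-host-02:~# systemctl daemon-reload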

Setting up Docker Swarm in VMware NSX

Setting up Docker Swarm is pretty simple. BUT VMware NSX is a little annoying, in that it blocks the VXLAN transport port (UDP port 4789) at the hypervisor level. I’m sure this seemed GREAT for security, but it majorly messes up any application USING VXLAN inside the transport zone, such as Docker Swarm running at a cloud provider who uses VMware NSX. As long as you know about this, you can work around it, however, as you can specify an alternate VXLAN port when you initialize your swarm! So let’s do that!

We will be bringing up a swarm on a cluster today with one manager and four nodes. Each host has two network interfaces; we’ll be using ens160 in 10.129.2.0/24 for our transport network, and the --data-path-port parameter to set the VXLAN port.

Note: Our manager, and all nodes, already need Docker installed, in case this isn’t obvious 😀

root@prod-swarm-manager-1:~# docker swarm init --data-path-port 4788 --advertise-addr 10.129.2.21
 Swarm initialized: current node (p9ojg9edmipi7saldcbrcnhyt) is now a manager.
 
To add a worker to this swarm, run the following command:

    docker swarm join --token SWMTKN-1-42rg6zgs3onagtyamztitzgqb21z9hmwnwfdqoabmew4ppk2i5-2r0upkukt2asdfsdf3234512ad 10.129.2.21:2377

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions. 

And we now have a swarm (with one node) up. Time to add more nodes!

root@prod-swarm-node-1:~# docker swarm join --token SWMTKN-1-42rg6zgs3onagtyamztitzgqb21z9hmwnwfdqoabmew4ppk2i5-2r0upkukt2asdfsdf3234512ad 10.129.2.21:2377
This node joined a swarm as a worker. 

root@prod-swarm-node-2:~# docker swarm join --token SWMTKN-1-42rg6zgs3onagtyamztitzgqb21z9hmwnwfdqoabmew4ppk2i5-2r0upkukt2asdfsdf3234512ad 10.129.2.21:2377
This node joined a swarm as a worker. 

root@prod-swarm-node-3:~# docker swarm join --token SWMTKN-1-42rg6zgs3onagtyamztitzgqb21z9hmwnwfdqoabmew4ppk2i5-2r0upkukt2asdfsdf3234512ad 10.129.2.21:2377
This node joined a swarm as a worker. 

root@prod-swarm-node-4:~# docker swarm join --token SWMTKN-1-42rg6zgs3onagtyamztitzgqb21z9hmwnwfdqoabmew4ppk2i5-2r0upkukt2asdfsdf3234512ad 10.129.2.21:2377
This node joined a swarm as a worker. 

We should now have our swarm up and running. Run docker node list to see!

root@prod-swarm-manager-1:~# docker node list
 ID                            HOSTNAME                 STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
 p9ojg9edmipi7saldcbrcnhyt *   prod-swarm-manager-1     Ready               Active              Leader              19.03.1
 lnp2b2ijurmtamp0if4aner7y     prod-swarm-node-1        Ready               Active                                  19.03.1
 caxka5zdq0nb9lilcvss1fv82     prod-swarm-node-2        Ready               Active                                  19.03.1
 k0ar3rgjzoz1jjncfpr5xd9t1     prod-swarm-node-3        Ready               Active                                  19.03.1
 oa0ym3ytsgf5svbs2rz205jwr     prod-swarm-node-4        Ready               Active                                  19.03.1

We do, perfect! We now want to manage the swarm with a nice web interface, so let’s bring up Swarmpit.

root@prod-swarm-manager-1:~# docker run -it --rm \
>   --name swarmpit-installer \
>   --volume /var/run/docker.sock:/var/run/docker.sock \
>   swarmpit/install:1.7
Unable to find image 'swarmpit/install:1.7' locally
1.7: Pulling from swarmpit/install
e7c96db7181b: Pull complete 
5297bd381816: Pull complete 
3a664477889c: Pull complete 
a9b893dcc701: Pull complete 
48bf7c1cb0dd: Pull complete 
555b6ea27ad2: Pull complete 
7e8a5ec7012a: Pull complete 
6adc20046ac5: Pull complete 
42a1f54aa48c: Pull complete 
717a4f34e541: Pull complete 
f95ad45cac17: Pull complete 
f963bb249c55: Pull complete 
Digest: sha256:04e47b8533e5b4f9198d4cbdfea009acac56417227ce17a9f1df549ab66a8520
Status: Downloaded newer image for swarmpit/install:1.7
                                        _ _   
 _____      ____ _ _ __ _ __ ___  _ __ (_) |_ 
/ __\ \ /\ / / _` | '__| '_ ` _ \| '_ \| | __|
\__ \\ V  V / (_| | |  | | | | | | |_) | | |_ 
|___/ \_/\_/ \__,_|_|  |_| |_| |_| .__/|_|\__|
                                 |_|          
Welcome to Swarmpit
Version: 1.7
Branch: 1.7

Preparing dependencies
latest: Pulling from byrnedo/alpine-curl
8e3ba11ec2a2: Pull complete 
6522ab4c8603: Pull complete 
Digest: sha256:e8cf497b3005c2f66c8411f814f3818ecd683dfea45267ebfb4918088a26a18c
Status: Downloaded newer image for byrnedo/alpine-curl:latest
DONE.

Preparing installation
Cloning into 'swarmpit'...
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 17028 (delta 1), reused 1 (delta 0), pack-reused 17022
Receiving objects: 100% (17028/17028), 4.39 MiB | 3.05 MiB/s, done.
Resolving deltas: 100% (10146/10146), done.
DONE.

Application setup
Enter stack name [swarmpit]: prod-swarmpit
Enter application port [888]: 
Enter database volume driver [local]: 
Enter admin username [admin]: 
Enter admin password (min 8 characters long): SYJpt6FQ@*j2ztPZ53^yF@!q5VRkZRyr*h$ydWGYE67$RWaHWat5Q$g6#zQtA3q^8QgQeSAMBEPT2^z8t2y#GKb5^X%e
DONE.

Application deployment
Creating network prod-swarmpit_net
Creating service prod-swarmpit_db
Creating service prod-swarmpit_agent
Creating service prod-swarmpit_app
DONE.

Starting swarmpit............DONE.
Initializing swarmpit...DONE.

Summary
Username: admin
Password: SYJpt6FQ@*j2ztPZ53^yF@!q5VRkZRyr*h$ydWGYE67$RWaHWat5Q$g6#zQtA3q^8QgQeSAMBEPT2^z8t2y#GKb5^X%e
Swarmpit is running on port :888

Enjoy :)

And bingo! If I hit up the manager host on port 888, I can log in and view the swarm state!

Setting up Docker on a new Ubuntu 18.04 server

This is actually fairly simple 🙂

In fact, REALLY simple 😀 Just do the following:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
apt update
apt-get install -y docker-ce
systemctl start docker
systemctl enable docker 
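
A quick smoke test doesn’t hurt before moving on; hello-world is the usual sanity check:

docker run --rm hello-world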

If you’re running CSF, you’ll want a couple of extra CSF modules installed, namely https://github.com/juliengk/csf-pre_post_sh and https://github.com/juliengk/csf-post-docker

But other than that? Yep, all done 🙂

Changing hostname on Ubuntu 18.04

So we have finally started rolling out 18.04 VMs for corporate use at work (R1soft finally started shipping ‘non-beta’ modules, which was our main blocker). Occasionally I’ll go to run up a bunch of VMs and munge the hostname of one machine. With cloud-init, it’s not quite as simple as editing /etc/hostname and rebooting anymore. But it’s not too bad 🙂

First, edit /etc/cloud/cloud.cfg, and look for the preserve_hostname: field; you want this set to true.

# This will cause the set+update hostname module to not operate (if true)
preserve_hostname: true

Once done, run ‘hostnamectl’

root@prod-docker-manager-1:~# hostnamectl
   Static hostname: prod-docker-manager-1
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 71431bc67882462ab8752997212223e8
           Boot ID: 2523f4f9cc3142b5bf56ad73f93da02e
    Virtualization: vmware
  Operating System: Ubuntu 18.04.2 LTS
            Kernel: Linux 4.15.0-45-generic
      Architecture: x86-64

This will show your current hostname. You don’t strictly need to check, but it’s handy to confirm that it IS set as static. If it’s not, you’ll want to go google something 😉

You can now just run ‘hostnamectl set-hostname <newhostname>’

root@prod-docker-manager-1:~# hostnamectl set-hostname prod-swarm-manager-1
root@prod-docker-manager-1:~# hostnamectl
   Static hostname: prod-swarm-manager-1
         Icon name: computer-vm
           Chassis: vm
        Machine ID: 71431bc67882462ab8752997212223e8
           Boot ID: 2523f4f9cc3142b5bf56ad73f93da02e
    Virtualization: vmware
  Operating System: Ubuntu 18.04.2 LTS
            Kernel: Linux 4.15.0-45-generic
      Architecture: x86-64

And you’re good to go. Reboot if you have processes running which depend on the hostname, or don’t bother if this is a brand new host (which is my usual case, where I’ve munged something during the VM install, and am now SSH’d in as the temporary local user to do the actual provisioning).

SSD Caching – Actually doing it

So, caching is built right into LVM these days. It’s quite neat. I’m testing this on my OTHER caching box: a shallow 1RU box with space for two SSDs, and that’s about it.

First step is to mount an iSCSI target. I’m just mounting a target I created on my fileserver, to save some latency (I can mount the main SAN from the DC, but there’s 15ms latency due to the EoIP tunnel over the ADSL here). There’s a much more detailed writeup of this here.

root@isci-cache01:~# iscsiadm -m discovery -t st -p 192.168.102.245
192.168.102.245:3260,1 iqn.2012-01.net.rendrag.fileserver:dedipi0
192.168.11.245:3260,1 iqn.2012-01.net.rendrag.fileserver:dedipi0

root@isci-cache01:~# iscsiadm -m node --targetname "iqn.2012-01.net.rendrag.fileserver:dedipi0" --portal "192.168.102.245:3260" --login
Logging in to [iface: default, target: iqn.2012-01.net.rendrag.fileserver:dedipi0, portal: 192.168.102.245,3260] (multiple)
Login to [iface: default, target: iqn.2012-01.net.rendrag.fileserver:dedipi0, portal: 192.168.102.245,3260] successful.

[ 193.182145] scsi6 : iSCSI Initiator over TCP/IP
[ 193.446401] scsi 6:0:0:0: Direct-Access IET VIRTUAL-DISK 0 PQ: 0 ANSI: 4
[ 193.456619] sd 6:0:0:0: Attached scsi generic sg1 type 0
[ 193.466849] sd 6:0:0:0: [sdb] 1048576000 512-byte logical blocks: (536 GB/500 GiB)
[ 193.469692] sd 6:0:0:0: [sdb] Write Protect is off
[ 193.469697] sd 6:0:0:0: [sdb] Mode Sense: 77 00 00 08
[ 193.476918] sd 6:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 193.514882] sdb: unknown partition table
[ 193.538467] sd 6:0:0:0: [sdb] Attached SCSI disk

root@isci-cache01:~# pvcreate /dev/sdb
root@isci-cache01:~# vgcreate vg_iscsi /dev/sdb

root@isci-cache01:~# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sdb
  VG Name               vg_iscsi
  PV Size               500.00 GiB / not usable 4.00 MiB
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              127999
  Free PE               127999
  Allocated PE          0
  PV UUID               0v8SWY-2SSA-E2oL-iAdE-yeb4-owyG-gHXPQK

  --- Physical volume ---
  PV Name               /dev/sda5
  VG Name               isci-cache01-vg
  PV Size               238.24 GiB / not usable 0
  Allocatable           yes
  PE Size               4.00 MiB
  Total PE              60988
  Free PE               50784
  Allocated PE          10204
  PV UUID               Y3O48a-tep7-nYjx-gEck-bcwk-tJzP-2Sc2pP

root@isci-cache01:~# lvcreate -L 499G -n testiscsilv vg_iscsi
Logical volume "testiscsilv" created
root@isci-cache01:~# mkfs -t ext4 /dev/mapper/vg_iscsi-testiscsilv
mke2fs 1.42.12 (29-Aug-2014)
Creating filesystem with 130809856 4k blocks and 32702464 inodes
Filesystem UUID: 9aa5f499-902a-4935-bc67-61dd8930e014
Superblock backups stored on blocks:
	32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
	4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
	102400000

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

Now things get a little tricky, as I’d already installed my system with the SSD in one volume group. I’ll be using a RAID array for the production box for PiCloud.
For now, we’ll just create an LV from the SSD VG, and then add it to the iSCSI VG.

root@isci-cache01:~# lvcreate -L 150G -n iscsicaching isci-cache01-vg
  Logical volume "iscsicaching" created
root@isci-cache01:~# vgextend vg_iscsi /dev/mapper/isci--cache01--vg-iscsicaching
  Physical volume "/dev/isci-cache01-vg/iscsicaching" successfully created
  Volume group "vg_iscsi" successfully extended

root@isci-cache01:~# lvcreate -L 1G -n cache_meta_lv vg_iscsi /dev/isci-cache01-vg/iscsicaching
  Logical volume "cache_meta_lv" created
root@isci-cache01:~# lvcreate -L 148G -n cache_lv vg_iscsi /dev/isci-cache01-vg/iscsicaching
  Logical volume "cache_lv" created
root@isci-cache01:~# lvs
  LV            VG              Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  iscsicaching  isci-cache01-vg -wi-ao---- 150.00g
  root          isci-cache01-vg -wi-ao----  30.18g
  swap_1        isci-cache01-vg -wi-ao----   9.68g
  cache_lv      vg_iscsi        -wi-a----- 148.00g
  cache_meta_lv vg_iscsi        -wi-a-----   1.00g
  testiscsilv   vg_iscsi        -wi-a----- 499.00g

root@isci-cache01:~# pvs
  PV                                VG              Fmt  Attr PSize   PFree
  /dev/isci-cache01-vg/iscsicaching vg_iscsi        lvm2 a--  150.00g 1020.00m
  /dev/sda5                         isci-cache01-vg lvm2 a--  238.23g   48.38g
  /dev/sdb                          vg_iscsi        lvm2 a--  500.00g 1020.00m
 
Now we want to convert these two new LVs into a ‘cache pool’:

root@isci-cache01:~# lvconvert --type cache-pool --poolmetadata vg_iscsi/cache_meta_lv vg_iscsi/cache_lv
  WARNING: Converting logical volume vg_iscsi/cache_lv and vg_iscsi/cache_meta_lv to pool's data and metadata volumes.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Do you really want to convert vg_iscsi/cache_lv and vg_iscsi/cache_meta_lv? [y/n]: y
  Logical volume "lvol0" created
  Converted vg_iscsi/cache_lv to cache pool.

And now we want to attach this cache pool to our iscsi LV.

root@isci-cache01:~# lvconvert --type cache --cachepool vg_iscsi/cache_lv vg_iscsi/testiscsilv
  Logical volume vg_iscsi/testiscsilv is now cached.
 
 
root@isci-cache01:~# dd if=/dev/zero of=/export/test1 bs=1024k count=60
60+0 records in
60+0 records out
62914560 bytes (63 MB) copied, 0.0401375 s, 1.6 GB/s
root@isci-cache01:~# dd if=/dev/zero of=/export/test1 bs=1024k count=5000
^C2512+0 records in
2512+0 records out
2634022912 bytes (2.6 GB) copied, 7.321 s, 360 MB/s
root@isci-cache01:~# ls -l
total 0
root@isci-cache01:~# dd if=/export/test1 of=/dev/null
5144576+0 records in
5144576+0 records out
2634022912 bytes (2.6 GB) copied, 1.82355 s, 1.4 GB/s

Oh yeah!  Over a 15Mbps network, too!
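
If you want to confirm the cache is actually being hit, newer lvm2 releases expose dm-cache counters through lvs (field names can vary a little between versions, so check lvs -o help if these aren’t recognised):

lvs -a -o name,size,pool_lv,cache_read_hits,cache_read_misses,cache_dirty_blocks vg_iscsi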

Now we want to set up XFS quotas so we can have a quota per directory. (Note: project quotas like this are an XFS feature, and need the filesystem mounted with the prjquota option; for the ext4 LV we created above you’d have used mkfs -t xfs instead.)

root@isci-cache01:/# echo "10001:/export/mounts/pi-01" >> /etc/projects
root@isci-cache01:/# echo "pi-01:10001" >> /etc/projid
root@isci-cache01:/# xfs_quota -x -c 'project -s pi-01' /export
root@isci-cache01:/# xfs_quota -x -c 'limit -p bhard=5g pi-01' /export

root@isci-cache01:/# xfs_quota -x -c report /export
Project quota on /export (/dev/mapper/vg_iscsi-testiscsilv)
                               Blocks
Project ID       Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
pi-01         2473752          0    5242880     00 [--------]

Note: you’ll need the thin-provisioning-tools package for the cache-pool conversion, and you should ensure that your initramfs gets rebuilt with the proper dm-cache modules included, so the cached LV activates cleanly at boot.
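
On Debian/Ubuntu that boils down to something like:

apt-get install -y thin-provisioning-tools
update-initramfs -u -k all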

Sweet, so we CAN do this 🙂