So, we want to bring up a Galera cluster and do some basic testing of how to bring it back online should things go pear-shaped.
First, install MariaDB on all three nodes:
# apt-get install software-properties-common
# apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xF1656F24C74CD1D8
# add-apt-repository "deb [arch=amd64,arm64,ppc64el] http://mariadb.mirror.liquidtelecom.com/repo/10.4/ubuntu $(lsb_release -cs) main"
# apt update
# apt -y install mariadb-server mariadb-client
At this point MariaDB is installed, but has no root password configured.
# mysql_secure_installation

NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MariaDB SERVERS IN PRODUCTION USE! PLEASE READ EACH STEP CAREFULLY!

In order to log into MariaDB to secure it, we'll need the current password for the root user. If you've just installed MariaDB, and haven't set the root password yet, you should just press enter here.

Enter current password for root (enter for none):
OK, successfully used password, moving on…

Setting the root password or using the unix_socket ensures that nobody can log into the MariaDB root user without the proper authorisation.

You already have your root account protected, so you can safely answer 'n'.

Switch to unix_socket authentication [Y/n]
Enabled successfully!
Reloading privilege tables..
… Success!

You already have your root account protected, so you can safely answer 'n'.

Change the root password? [Y/n] y
New password:
Re-enter new password:
Password updated successfully!
Reloading privilege tables..
… Success!

By default, a MariaDB installation has an anonymous user, allowing anyone to log into MariaDB without having to have a user account created for them. This is intended only for testing, and to make the installation go a bit smoother. You should remove them before moving into a production environment.

Remove anonymous users? [Y/n] y
… Success!

Normally, root should only be allowed to connect from 'localhost'. This ensures that someone cannot guess at the root password from the network.

Disallow root login remotely? [Y/n] y
… Success!

By default, MariaDB comes with a database named 'test' that anyone can access. This is also intended only for testing, and should be removed before moving into a production environment.

Remove test database and access to it? [Y/n] y
Dropping test database…
… Success!
Removing privileges on test database…
… Success!

Reloading the privilege tables will ensure that all changes made so far will take effect immediately.

Reload privilege tables now? [Y/n] y
… Success!

Cleaning up…

All done! If you've completed all of the above steps, your MariaDB installation should now be secure.

Thanks for using MariaDB!
Do this on all three servers, and you’re now ready to configure the Galera cluster portion! On each node, create a /etc/mysql/mariadb.conf.d/galera.cnf file:
# cat /etc/mysql/mariadb.conf.d/galera.cnf
[mysqld]
character-set-server = utf8
character_set_server = utf8
bind-address=0.0.0.0
port=3306
default_storage_engine=InnoDB
binlog_format=row
innodb_autoinc_lock_mode=2

# Galera cluster configuration
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_cluster_address="gcomm://192.168.1.201,192.168.1.202,192.168.1.203"
wsrep_cluster_name="galera-cluster-1"
wsrep_sst_method=rsync

# Cluster node configuration
wsrep_node_address="192.168.1.201"
wsrep_node_name="galera-host-01"
#
The only difference in this file between nodes is the last two lines: wsrep_node_address and wsrep_node_name. Note: you’ll probably want hosts file entries on your nodes to map IPs to names and vice versa, unless you have reliable DNS configured internally, as name resolution affects your cluster status displays.
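For example, something along these lines in /etc/hosts on every node will do (the names and addresses here just mirror the example config above; adjust for your own network):

192.168.1.201 galera-host-01
192.168.1.202 galera-host-02
192.168.1.203 galera-host-03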
Your wsrep_cluster_address line will list the IPs of all your cluster nodes.
Stop MariaDB on all nodes, and bootstrap Galera on the first node:
node2# systemctl stop mariadb
node3# systemctl stop mariadb
node1# systemctl stop mariadb
node1# galera_new_cluster
Your cluster should now have started. Let’s check the cluster state.
root@galera-host-01:~# mysql -e "show status like 'wsrep_%'"
+-------------------------------+------------------------------------------------
| Variable_name                 | Value
+-------------------------------+------------------------------------------------
| wsrep_local_state_uuid        | fd6dbdcc-c95e-11e9-ac52-570534ceb766
| wsrep_protocol_version        | 10
| wsrep_last_committed          | 1
| wsrep_replicated              | 0
| wsrep_replicated_bytes        | 0
| wsrep_repl_keys               | 0
| wsrep_repl_keys_bytes         | 0
| wsrep_repl_data_bytes         | 0
| wsrep_repl_other_bytes        | 0
| wsrep_received                | 2
| wsrep_received_bytes          | 144
| wsrep_local_commits           | 0
| wsrep_local_cert_failures     | 0
| wsrep_local_replays           | 0
| wsrep_local_send_queue        | 0
| wsrep_local_send_queue_max    | 1
| wsrep_local_send_queue_min    | 0
| wsrep_local_send_queue_avg    | 0
| wsrep_local_recv_queue        | 0
| wsrep_local_recv_queue_max    | 1
| wsrep_local_recv_queue_min    | 0
| wsrep_local_recv_queue_avg    | 0
| wsrep_local_cached_downto     | 1
| wsrep_flow_control_paused_ns  | 0
| wsrep_flow_control_paused     | 0
| wsrep_flow_control_sent       | 0
| wsrep_flow_control_recv       | 0
| wsrep_cert_deps_distance      | 0
| wsrep_apply_oooe              | 0
| wsrep_apply_oool              | 0
| wsrep_apply_window            | 0
| wsrep_commit_oooe             | 0
| wsrep_commit_oool             | 0
| wsrep_commit_window           | 0
| wsrep_local_state             | 4
| wsrep_local_state_comment     | Synced
| wsrep_cert_index_size         | 0
| wsrep_causal_reads            | 0
| wsrep_cert_interval           | 0
| wsrep_open_transactions       | 0
| wsrep_open_connections        | 0
| wsrep_incoming_addresses      | AUTO
| wsrep_cluster_weight          | 1
| wsrep_desync_count            | 0
| wsrep_evs_delayed             |
| wsrep_evs_evict_list          |
| wsrep_evs_repl_latency        | 0/0/0/0/0
| wsrep_evs_state               | OPERATIONAL
| wsrep_gcomm_uuid              | fd6cf1bd-c95e-11e9-98ab-d2e5733d21d0
| wsrep_applier_thread_count    | 1
| wsrep_cluster_capabilities    |
| wsrep_cluster_conf_id         | 18446744073709551615
| wsrep_cluster_size            | 1
| wsrep_cluster_state_uuid      | fd6dbdcc-c95e-11e9-ac52-570534ceb766
| wsrep_cluster_status          | Primary
| wsrep_connected               | ON
| wsrep_local_bf_aborts         | 0
| wsrep_local_index             | 0
| wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:T
| wsrep_provider_name           | Galera
| wsrep_provider_vendor         | Codership Oy <info@codership.com>
| wsrep_provider_version        | 26.4.2(r4498)
| wsrep_ready                   | ON
| wsrep_rollbacker_thread_count | 2
| wsrep_thread_count            | 3
+-------------------------------+------------------------------------------------
Check the cluster size to make sure the cluster came up. It should be a cluster of one right now:
root@galera-host-01:~# mysql -e "show status like 'wsrep_cluster_size'"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 1     |
+--------------------+-------+
This looks good. Time to boot the second server:
galera-host-02# systemctl start mariadb
Check the cluster size again. It should now be 2:
root@galera-host-01:~# mysql -u root -p -e "show status like 'wsrep_cluster_size'"
Enter password:
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 2     |
+--------------------+-------+
Start node3, and you’ll see the cluster size go to 3. Nice.
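If you want to confirm that all three members really joined (rather than just counting them), wsrep_incoming_addresses is a handy check; it should list the address of every node:

root@galera-host-01:~# mysql -e "show status like 'wsrep_incoming_addresses'"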
Cluster monitoring for HAProxy
Having a cluster is no good if your client machines aren’t load balancing across it, and only talking to ‘up’ servers.
For this you need a way for HAProxy to know the state of each server. I like the ‘clustercheck’ script from obissick (https://github.com/obissick/Galera-ClusterCheck).
First we add a cluster check user to MySQL.
create user clustercheckuser@'localhost' IDENTIFIED BY 'h3fUU3373Pb17Vjt&^C39hFHelA';
GRANT PROCESS ON *.* TO 'clustercheckuser'@'localhost';
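A quick sanity check that the new user can actually log in and read the wsrep state (this is the sort of status variable the check script looks at; you should get back ‘Synced’):

mysql -u clustercheckuser -p'h3fUU3373Pb17Vjt&^C39hFHelA' -e "show status like 'wsrep_local_state_comment'"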
Then install the clustercheck script and edit it:
curl https://raw.githubusercontent.com/obissick/Galera-ClusterCheck/master/clustercheck.sh > /usr/bin/clustercheck
chmod +x /usr/bin/clustercheck
vi /usr/bin/clustercheck

Edit the MYSQL_PASSWORD line to put the password in, i.e.:

MYSQL_PASSWORD="${2-h3fUU3373Pb17Vjt&^C39hFHelA}"

apt-get install -y xinetd
echo "mysqlchk 9200/tcp #mysql check script" >> /etc/services
cat > /etc/xinetd.d/mysqlchk << __END__
# default: on
# description: mysqlchk
service mysqlchk
{
    disable = no
    flags = REUSE
    socket_type = stream
    port = 9200
    wait = no
    user = nobody
    server = /usr/bin/clustercheck
    log_on_failure += USERID
    only_from = 192.168.1.0/24
    per_source = UNLIMITED
}
__END__
service xinetd restart
Note: you will want to change only_from to match any IPs which will be running HAProxy. Deploy the above to all members of your Galera cluster.
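Before wiring HAProxy up, it’s worth running the check by hand on each node; the script writes an HTTP response to stdout, and the body should carry the same “Galera Cluster Node is synced.” message you’ll see over the wire shortly:

# /usr/bin/clustercheck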
Now, on each client machine, we can configure a block in HAProxy similar to the below:
frontend mysql-dev-front
bind 127.0.0.1:3307
mode tcp
default_backend mysql-dev-back

backend mysql-dev-back
mode tcp
balance leastconn
option tcpka
option httpchk
default-server port 9200 inter 2s downinter 5s rise 3 fall 2 slowstart 60s weight 100
server node1 192.168.1.201:3306 check
server node2 192.168.1.202:3306 check
server node3 192.168.1.203:3306 check
NOTE: Make sure that you can connect to port 9200 on all of your MySQL servers from your client server BEFORE enabling this config in HAProxy!
curl http://192.168.1.201:9200
Galera Cluster Node is synced.
curl: (56) Recv failure: Connection reset by peer
Once this is complete, any applications on your client machine can connect to localhost:3307.
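For example (appuser here is just a placeholder for whatever application account you’ve created on the cluster; it needs to be allowed to connect from the client machine’s address, since root is restricted to localhost):

client$ mysql -h 127.0.0.1 -P 3307 -u appuser -p -e "show status like 'wsrep_cluster_size'"

HAProxy will hand the connection to whichever backend node is currently passing the port 9200 health check.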
Failure Modes
OK, so this is one of the most important things we need to think about. There are a few failure modes we need to be able to handle in the cluster.
One node is shut down
Let’s insert some rows into a test database, and then shut down node 2.
root@galera-host-01:~# mysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 50
Server version: 10.4.7-MariaDB-1:10.4.7+maria~bionic-log mariadb.org binary distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> create database ninjas;
Query OK, 1 row affected (0.006 sec)

MariaDB [(none)]> use ninjas;
Database changed

MariaDB [ninjas]> create table table1 (row1 integer not null);
Query OK, 0 rows affected (0.013 sec)

MariaDB [ninjas]> insert into table1 values (1), (2), (3), (4), (5), (6);
Query OK, 6 rows affected (0.006 sec)
Records: 6  Duplicates: 0  Warnings: 0

root@galera-host-02:~# mysql ninjas -e "select * from table1;"
+------+
| row1 |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
+------+

root@galera-host-02:~# systemctl stop mariadb

root@galera-host-01:~# mysql ninjas -e "insert into table1 values (7), (8), (9);"
OK, so we now have data in the DB that was inserted while host-02 was shut down. We’re now going to bring host-02 back up and check that it comes back into the cluster cleanly.
root@galera-host-02:~# systemctl start mariadb
root@galera-host-02:~# mysql ninjas -e "select * from table1;"
+------+
| row1 |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+------+
Perfect.
All the nodes are shut down
And re-started in the correct order
How about when we have to shut down the whole cluster and bring it back online? Let’s shut down the nodes in reverse order (03, 02, 01), then bring them up in order (01, 02, 03). (Yes, that’s the way you’re meant to do it: bootstrap from the last node to stop. We’re going to do it wrong shortly, and see how to recover from that.)
root@galera-host-03:~# systemctl stop mariadb
root@galera-host-02:~# systemctl stop mariadb
root@galera-host-01:~# systemctl stop mariadb
If we have a look in /var/lib/mysql/grastate.dat on each host, we’ll see that galera-host-01 is indeed the host we should be bootstrapping the cluster from:
root@galera-host-01:~# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid:    a10015aa-cd62-11e9-a80b-87dacc3a89c3
seqno:   11
safe_to_bootstrap: 1

root@galera-host-02:~# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid:    a10015aa-cd62-11e9-a80b-87dacc3a89c3
seqno:   10
safe_to_bootstrap: 0

root@galera-host-03:~# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid:    a10015aa-cd62-11e9-a80b-87dacc3a89c3
seqno:   9
safe_to_bootstrap: 0
OK, so we should simply be able to bring the cluster back by running galera_new_cluster on galera-host-01, and then starting the other hosts:
root@galera-host-01:~# galera_new_cluster
root@galera-host-01:~# mysql -e "show status like 'wsrep_cluster_size'"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 1     |
+--------------------+-------+

root@galera-host-02:~# systemctl start mariadb

root@galera-host-01:~# mysql -e "show status like 'wsrep_cluster_size'"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 2     |
+--------------------+-------+

root@galera-host-03:~# systemctl start mariadb

root@galera-host-01:~# mysql -e "show status like 'wsrep_cluster_size'"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
Restarting the cluster in the wrong order
OK, now let’s shut it down with host-01 first, and then try to bootstrap from host-01 first as well. We’ll shut down the nodes, and then check our Galera replication state files:
root@galera-host-01:~# systemctl stop mariadb
root@galera-host-02:~# systemctl stop mariadb
root@galera-host-03:~# systemctl stop mariadb

root@galera-host-01:~# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid:    a10015aa-cd62-11e9-a80b-87dacc3a89c3
seqno:   15
safe_to_bootstrap: 0

root@galera-host-02:~# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid:    a10015aa-cd62-11e9-a80b-87dacc3a89c3
seqno:   16
safe_to_bootstrap: 0

root@galera-host-03:~# cat /var/lib/mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid:    a10015aa-cd62-11e9-a80b-87dacc3a89c3
seqno:   17
safe_to_bootstrap: 1
Yep, so host-03 would be the correct host to bootstrap from. But we’re going to start host-01.
root@galera-host-01:~# galera_new_cluster
Job for mariadb.service failed because the control process exited with error code.
See "systemctl status mariadb.service" and "journalctl -xe" for details.
Well, that’s promising. It won’t let us do it. If we check the log, we see:
2019-09-02 9:58:05 0 [Note] WSREP: Start replication
2019-09-02 9:58:05 0 [Note] WSREP: Connecting with bootstrap option: 1
2019-09-02 9:58:05 0 [Note] WSREP: Setting GCS initial position to a10015aa-cd62-11e9-a80b-87dacc3a89c3:15
2019-09-02 9:58:05 0 [ERROR] WSREP: It may not be safe to bootstrap the cluster from this node. It was not the last one to leave the cluster and may not contain all the updates. To force cluster bootstrap with this node, edit the grastate.dat file manually and set safe_to_bootstrap to 1 .
2019-09-02 9:58:05 0 [ERROR] WSREP: wsrep::connect(gcomm://192.168.13.201,192.168.13.202,192.168.13.203) failed: 7
2019-09-02 9:58:05 0 [ERROR] Aborting
OK, let’s make it really bad: let’s edit grastate.dat on host-01 and set safe_to_bootstrap to 1 😀
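Since MariaDB is stopped, that edit is just flipping the flag in /var/lib/mysql/grastate.dat on host-01; a one-liner with sed does the same job as opening it in an editor:

root@galera-host-01:~# sed -i 's/^safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat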
root@galera-host-01:~# galera_new_cluster
root@galera-host-01:~# mysql -e "show status like 'wsrep_cluster_size'"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 1     |
+--------------------+-------+
OK, promising. Let’s start host-02:
root@galera-host-02:~# systemctl start mariadb

root@galera-host-01:~# mysql -e "show status like 'wsrep_cluster_size'"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 2     |
+--------------------+-------+
Well, that’s beautiful! OK, let’s see what happens when we start host-03, which still has safe_to_bootstrap set to 1 in its grastate file:
root@galera-host-03:~# systemctl start mariadb

root@galera-host-01:~# mysql -e "show status like 'wsrep_cluster_size'"
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
Well, colour me impressed! Are they REALLY in sync?
root@galera-host-01:~# mysql ninjas -e "insert into table1 values (14), (15), (16);"

root@galera-host-02:~# mysql ninjas -e "select * from table1;"
+------+
| row1 |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
| 11 |
| 12 |
| 14 |
| 15 |
| 16 |
+------+

root@galera-host-03:~# mysql ninjas -e "select * from table1;"
+------+
| row1 |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
| 11 |
| 12 |
| 14 |
| 15 |
| 16 |
+------+
Well bugger me, that looks good!
Rebuilding a failed node
SHOULD you happen to have a node get completely buggered (say you have enough data that the systemd 90 second startup timeout bites during SST and leaves your node broken), you may need to do a clean reinstall of MariaDB on that node to get Galera going again.
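As an aside, that 90 second limit can be raised ahead of time with a systemd drop-in; this is a sketch only, and it amounts to the same thing as the TimeoutSec line appended straight to the unit file at the end of the rebuild below:

# mkdir -p /etc/systemd/system/mariadb.service.d
# cat > /etc/systemd/system/mariadb.service.d/timeout.conf << __END__
[Service]
TimeoutStartSec=infinity
__END__
# systemctl daemon-reload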
galera-host-02# apt-get purge mariadb-server-10.4
Reading package lists… Done
Building dependency tree
Reading state information… Done
The following packages were automatically installed and are no longer required:
  galera-4 libaio1 libcgi-fast-perl libcgi-pm-perl libfcgi-perl libhtml-template-perl mariadb-server-core-10.4 socat
Use 'apt autoremove' to remove them.
The following packages will be REMOVED:
  mariadb-server* mariadb-server-10.4*
0 upgraded, 0 newly installed, 2 to remove and 0 not upgraded.
After this operation, 77.7 MB disk space will be freed.
Do you want to continue? [Y/n] y
(Reading database … 68199 files and directories currently installed.)
Removing mariadb-server (1:10.4.7+maria~bionic) …
Removing mariadb-server-10.4 (1:10.4.7+maria~bionic) …
Processing triggers for man-db (2.8.3-2ubuntu0.1) …
(Reading database … 68069 files and directories currently installed.)
Purging configuration files for mariadb-server-10.4 (1:10.4.7+maria~bionic) …

galera-host-02:~# rm -rf /var/lib/mysql/*
galera-host-02:~# mv /etc/mysql/mariadb.conf.d/galera.cnf /root/galera.cnf
galera-host-02:~# apt -y install mariadb-server mariadb-client
Reading package lists… Done
Building dependency tree
Reading state information… Done
mariadb-client is already the newest version (1:10.4.7+maria~bionic).
Suggested packages:
  mailx mariadb-test tinyca
The following NEW packages will be installed:
  mariadb-server mariadb-server-10.4
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 4,627 kB of archives.
After this operation, 77.7 MB of additional disk space will be used.
Get:1 http://mariadb.mirror.liquidtelecom.com/repo/10.4/ubuntu bionic/main amd64 mariadb-server-10.4 amd64 1:10.4.7+maria~bionic [4,624 kB]
Get:2 http://mariadb.mirror.liquidtelecom.com/repo/10.4/ubuntu bionic/main amd64 mariadb-server all 1:10.4.7+maria~bionic [3,180 B]
Fetched 4,627 kB in 6s (824 kB/s)
Preconfiguring packages …
Selecting previously unselected package mariadb-server-10.4.
(Reading database … 68059 files and directories currently installed.)
Preparing to unpack …/mariadb-server-10.4_1%3a10.4.7+maria~bionic_amd64.deb …
Unpacking mariadb-server-10.4 (1:10.4.7+maria~bionic) …
Selecting previously unselected package mariadb-server.
Preparing to unpack …/mariadb-server_1%3a10.4.7+maria~bionic_all.deb …
Unpacking mariadb-server (1:10.4.7+maria~bionic) …
Setting up mariadb-server-10.4 (1:10.4.7+maria~bionic) …
Failed to stop mysql.service: Unit mysql.service not loaded.
Created symlink /etc/systemd/system/mysql.service → /lib/systemd/system/mariadb.service.
Created symlink /etc/systemd/system/mysqld.service → /lib/systemd/system/mariadb.service.
Created symlink /etc/systemd/system/multi-user.target.wants/mariadb.service → /lib/systemd/system/mariadb.service.
Setting up mariadb-server (1:10.4.7+maria~bionic) …
Processing triggers for man-db (2.8.3-2ubuntu0.1) …

galera-host-02:~# mysql_secure_installation
...... <snipped>

galera-host-02:~# systemctl stop mariadb
galera-host-02:~# cp /root/galera.cnf /etc/mysql/mariadb.conf.d/
galera-host-02:~# echo 'TimeoutSec=infinity' >> /etc/systemd/system/mysqld.service
galera-host-02:~# systemctl daemon-reload
galera-host-02:~# systemctl start mariadb
And bingo, we’re back in the cluster 🙂
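A final sanity check from any node confirms the rebuilt host really did rejoin and finish its SST:

root@galera-host-01:~# mysql -e "show status like 'wsrep_cluster_size'"

It should report 3 again, and a quick select from ninjas.table1 on galera-host-02 should show the same rows as the other nodes.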