Category Archives: Hosting Infrastructure

SSD Caching – Actually doing it

So, caching is built right into LVM these days. It’s quite neat. I’m testing this on my OTHER caching box – a shallow 1RU box with space for two SSDs, and that’s about it.

First step is to mount an iSCSI target. I’m just mounting a target I created on my fileserver, to save some latency (I could mount the main SAN from the DC, but there’s 15ms of latency due to the EoIP tunnel over the ADSL here). There’s a much more detailed writeup of this here.

root@isci-cache01:~# iscsiadm -m discovery -t st -p 192.168.102.245
192.168.102.245:3260,1 iqn.2012-01.net.rendrag.fileserver:dedipi0
192.168.11.245:3260,1 iqn.2012-01.net.rendrag.fileserver:dedipi0

root@isci-cache01:~# iscsiadm -m node --targetname "iqn.2012-01.net.rendrag.fileserver:dedipi0" --portal "192.168.102.245:3260" --login
Logging in to [iface: default, target: iqn.2012-01.net.rendrag.fileserver:dedipi0, portal: 192.168.102.245,3260] (multiple)
Login to [iface: default, target: iqn.2012-01.net.rendrag.fileserver:dedipi0, portal: 192.168.102.245,3260] successful.

[ 193.182145] scsi6 : iSCSI Initiator over TCP/IP
[ 193.446401] scsi 6:0:0:0: Direct-Access IET VIRTUAL-DISK 0 PQ: 0 ANSI: 4
[ 193.456619] sd 6:0:0:0: Attached scsi generic sg1 type 0
[ 193.466849] sd 6:0:0:0: [sdb] 1048576000 512-byte logical blocks: (536 GB/500 GiB)
[ 193.469692] sd 6:0:0:0: [sdb] Write Protect is off
[ 193.469697] sd 6:0:0:0: [sdb] Mode Sense: 77 00 00 08
[ 193.476918] sd 6:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 193.514882] sdb: unknown partition table
[ 193.538467] sd 6:0:0:0: [sdb] Attached SCSI disk

root@isci-cache01:~# pvcreate /dev/sdb
root@isci-cache01:~# vgcreate vg_iscsi /dev/sdb

root@isci-cache01:~# pvdisplay
--- Physical volume ---
PV Name               /dev/sdb
VG Name               vg_iscsi
PV Size               500.00 GiB / not usable 4.00 MiB
Allocatable           yes
PE Size               4.00 MiB
Total PE              127999
Free PE               127999
Allocated PE          0
PV UUID               0v8SWY-2SSA-E2oL-iAdE-yeb4-owyG-gHXPQK
--- Physical volume ---
PV Name               /dev/sda5
VG Name               isci-cache01-vg
PV Size               238.24 GiB / not usable 0
Allocatable           yes
PE Size               4.00 MiB
Total PE              60988
Free PE               50784
Allocated PE          10204
PV UUID               Y3O48a-tep7-nYjx-gEck-bcwk-tJzP-2Sc2pP

root@isci-cache01:~# lvcreate -L 499G -n testiscsilv vg_iscsi
Logical volume "testiscsilv" created
root@isci-cache01:~# mkfs -t ext4 /dev/mapper/vg_iscsi-testiscsilv
mke2fs 1.42.12 (29-Aug-2014)
Creating filesystem with 130809856 4k blocks and 32702464 inodes
Filesystem UUID: 9aa5f499-902a-4935-bc67-61dd8930e014
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000
Allocating group tables: done
Writing inode tables: done

Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

Now things get a little tricky, as I’d already installed my system with the SSD in one volume group. (I’ll be using a RAID array for the production box for PiCloud.)
For now, we’ll just create an LV from the SSD VG, and then add it to the iSCSI VG.

root@isci-cache01:~# lvcreate -L 150G -n iscsicaching isci-cache01-vg
Logical volume "iscsicaching" created
root@isci-cache01:~# vgextend vg_iscsi  /dev/mapper/isci--cache01--vg-iscsicaching
  Physical volume "/dev/isci-cache01-vg/iscsicaching" successfully created
  Volume group "vg_iscsi" successfully extended
 
root@isci-cache01:~# lvcreate -L 1G -n cache_meta_lv vg_iscsi /dev/isci-cache01-vg/iscsicaching
Logical volume "cache_meta_lv" created
root@isci-cache01:~# lvcreate -L 148G -n cache_lv vg_iscsi /dev/isci-cache01-vg/iscsicaching
  Logical volume "cache_lv" created
root@isci-cache01:~# lvs
LV            VG              Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
iscsicaching  isci-cache01-vg -wi-ao---- 150.00g
root          isci-cache01-vg -wi-ao----  30.18g
swap_1        isci-cache01-vg -wi-ao----   9.68g
cache_lv      vg_iscsi        -wi-a----- 148.00g
cache_meta_lv vg_iscsi        -wi-a-----   1.00g
testiscsilv   vg_iscsi        -wi-a----- 499.00g
 
root@isci-cache01:~# pvs
PV                                VG              Fmt  Attr PSize   PFree
/dev/isci-cache01-vg/iscsicaching vg_iscsi        lvm2 a--  150.00g 1020.00m
/dev/sda5                         isci-cache01-vg lvm2 a--  238.23g   48.38g
/dev/sdb                          vg_iscsi        lvm2 a--  500.00g 1020.00m
 
Now we want to convert these two new LVs into a ‘cache pool’:
root@isci-cache01:~# lvconvert --type cache-pool --poolmetadata vg_iscsi/cache_meta_lv vg_iscsi/cache_lv
WARNING: Converting logical volume vg_iscsi/cache_lv and vg_iscsi/cache_meta_lv to pool's data and metadata volumes.
THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Do you really want to convert vg_iscsi/cache_lv and vg_iscsi/cache_meta_lv? [y/n]: y
Logical volume "lvol0" created
  Converted vg_iscsi/cache_lv to cache pool.

And now we want to attach this cache pool to our iscsi LV.

root@isci-cache01:~# lvconvert --type cache --cachepool vg_iscsi/cache_lv vg_iscsi/testiscsilv
  Logical volume vg_iscsi/testiscsilv is now cached.
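Before benchmarking, it’s worth a quick sanity check that the cache is actually wired up. These two are standard LVM/device-mapper commands (the output will vary with your LVM version):

# lvs -a shows the hidden cache-pool internals; testiscsilv should now list cache_lv as its pool
root@isci-cache01:~# lvs -a vg_iscsi
# dmsetup dumps the raw dm-cache stats (read hits/misses, dirty blocks, etc.)
root@isci-cache01:~# dmsetup status vg_iscsi-testiscsilv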
 
 
root@isci-cache01:~# dd if=/dev/zero of=/export/test1 bs=1024k count=60
60+0 records in
60+0 records out
62914560 bytes (63 MB) copied, 0.0401375 s, 1.6 GB/s
root@isci-cache01:~# dd if=/dev/zero of=/export/test1 bs=1024k count=5000
^C2512+0 records in
2512+0 records out
2634022912 bytes (2.6 GB) copied, 7.321 s, 360 MB/s
root@isci-cache01:~# ls -l
total 0
root@isci-cache01:~# dd if=/export/test1 of=/dev/null
5144576+0 records in
5144576+0 records out
2634022912 bytes (2.6 GB) copied, 1.82355 s, 1.4 GB/s

Oh yeah! And that’s over a network link with only 15mbps of bandwidth – those reads are clearly coming straight off the local SSD cache, not the iSCSI target.

Now we want to set up XFS quotas so we can have a quota per directory. (Project quotas are an XFS feature, so for this the LV was reformatted with XFS rather than the ext4 used above.)

root@isci-cache01:/# echo "100001:/export/mounts/pi-01" >> /etc/projects
root@isci-cache01:/# echo "pi-01:10001" >> /etc/projid
root@isci-cache01:/# xfs_quota -x -c 'project -s pi-01' /export
root@isci-cache01:/# xfs_quota -x -c 'limit -p bhard=5g pi-01' /export

root@isci-cache01:/# xfs_quota -x -c report /export
Project quota on /export (/dev/mapper/vg_iscsi-testiscsilv)
                               Blocks
Project ID       Used       Soft       Hard    Warn/Grace
---------- --------------------------------------------------
pi-01         2473752          0    5242880     00 [--------]
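One gotcha: project quotas are only enforced when the filesystem is mounted with the prjquota (a.k.a. pquota) mount option, so make sure it’s in your fstab – something like:

# /etc/fstab – prjquota turns on XFS project (directory) quota enforcement
/dev/mapper/vg_iscsi-testiscsilv  /export  xfs  defaults,prjquota  0  0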

Note: you’ll need the thin-provisioning-tools package, and to ensure that your initramfs gets built with the proper modules included, otherwise the cached LV may not activate at boot.
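On Debian that boils down to something like:

apt-get install thin-provisioning-tools
# rebuild the initramfs so the dm-cache modules and cache_check are available at boot
update-initramfs -u -k all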

Sweet, so we CAN do this 🙂

Setting up ZFS on Debian in 10 minutes

We run a small Citrix XenServer cluster at work for our internal servers, and we had been running just a simple RAID-1 array on the backend storage server. However, the idea of an SSD cache piqued my interest, so I backed up our storage repo one weekend and reinstalled the server.

Here’s how I installed it:

apt-get install build-essential gawk alien fakeroot linux-headers-$(uname -r) zlib1g-dev uuid-dev libblkid-dev libselinux-dev parted lsscsi

# Install SPL
wget http://github.com/downloads/zfsonlinux/spl/spl-0.6.0-rc11.tar.gz
tar -xzvf spl-0.6.0-rc11.tar.gz
cd spl-0.6.0-rc11/
./configure
make deb
dpkg -i *.deb

cd ..
# Install ZFS
wget http://github.com/downloads/zfsonlinux/zfs/zfs-0.6.0-rc11.tar.gz
tar -xzvf zfs-0.6.0-rc11.tar.gz
cd zfs-0.6.0-rc11/
./configure
make deb
dpkg -i *.deb

# have a look at /dev/disk/by-id, to get physical location mappings to drive names
ls -l /dev/disk/by-id/

# and shove them in here:
#e.g.

vi /etc/zfs/zdev.conf
1tb_1 pci-0000:03:06.0-scsi-0:0:0:0
1tb_2 pci-0000:03:06.0-scsi-1:0:0:0
1tb_3 pci-0000:03:06.0-scsi-2:0:0:0
ssd_1 pci-0000:00:11.0-scsi-2:0:0:0

zpool create storagepool raidz 1tb_1 1tb_2 1tb_3
zpool add storagepool cache ssd_1

zfs create storagepool/pool
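At this point it’s worth eyeballing the pool to confirm the layout – the SSD should show up under its own ‘cache’ heading, separate from the raidz vdev:

# the raidz vdev and the cache device are listed separately
zpool status storagepool
zfs list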

# Install iscsitarget to point our xenserver cluster at
apt-get install iscsitarget iscsitarget-dkms

# and create a 500gb backing volume
zfs create -V 500G storagepool/iscsivol01

vi /etc/iet/ietd.conf
Target iqn.2012-01.local.icongroup.icon-szfs01:iscsivol01
    Alias iscsivol01
    Lun 0 Path=/dev/storagepool/iscsivol01,Type=fileio,ScsiId=2012110201,ScsiSN=2012110201

vi /etc/default/iscsitarget
ISCSITARGET_ENABLE=true

# also remember to set targets.allow and initiators.allow as needed

/etc/init.d/iscsitarget restart

# all good to go!
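For completeness, pointing the XenServer cluster at the new target goes roughly like this – treat it as a sketch, not gospel: fill in your own target IP, and the SCSIid comes back from the probe:

# probe the target to discover the LUN's SCSIid
xe sr-probe type=lvmoiscsi device-config:target=<zfs-box-ip> \
   device-config:targetIQN=iqn.2012-01.local.icongroup.icon-szfs01:iscsivol01
# then create the shared SR against that LUN
xe sr-create name-label=zfs-iscsi01 shared=true type=lvmoiscsi \
   device-config:target=<zfs-box-ip> \
   device-config:targetIQN=iqn.2012-01.local.icongroup.icon-szfs01:iscsivol01 \
   device-config:SCSIid=<scsiid-from-probe>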

MySQL Multi-Master Replication Setup

So we have a bunch of websites for different markets, running WordPress, which we would ideally like hosted in their home market. BUT we want to be able to fail them over to a different country, should the servers in their home country go down. Failover in a MySQL master-slave relationship is always a bit of a pain (as it is with any DB engine) – once you’ve failed over, you really can’t ‘go back’ to the original master until you’ve re-synced it all. Which isn’t overly easy when you only have a one-hour maintenance window per 24-hour period, across all the markets your company operates in.

Enter MySQL Multi-Master replication. Make a change on one server? It appears on the other. Make a change on the other server? It appears on the first!

The way this works is that each MySQL server can be both a master AND a slave. So take two servers, A and B. Any changes made on A are replayed via the binary logs to B. Similarly, any changes on B are pushed to A. Well, actually it’s a little more than that: with log-slave-updates enabled, Server A also writes the updates it receives from Server B into its own binary log, so they get sent back towards Server B. Why does it do this?

Well, we might have six masters! Going Master A -> Master B -> Master C -> Master D -> Master E -> Master F, with Master F feeding Master A. All one nice big circle. So when you make a change on Master B, it propagates to C, D, E, F, then to A, AND back to B. But B sees its own server-id on those events, knows not to apply its own changes again, and they stop there.

It’s easiest to set this all up with fresh, clean, servers.

I installed mysql-server on two clean Debian VMs, one in Australia, one in Ireland.

Configuration

Server A – /etc/my.cnf
Add the following:

[mysqld]
# ... other configuration, tuning, etc ...
server-id = 10
# Make sure this partition has space to log bin, relay and whatever else!
log-bin = /var/lib/mysql/bin.log
relay-log = /var/lib/mysql/slave-relay.log
relay-log-index = /var/lib/mysql/slave-relay-log.index
# Creating some room between pk ids, we can always manually insert if need be.
auto_increment_increment = 10
auto_increment_offset = 1
# This is the default, but let's be explicit: never re-apply events carrying our own server-id
replicate-same-server-id = FALSE
# Want more slaves in the future with writes going to both masters?
log-slave-updates = TRUE
# If there's a reboot, don't auto-start replication - we need to be sure of where we are, and start it manually.
skip-slave-start = TRUE

Server B – /etc/my.cnf

[mysqld]
# ... other configuration, tuning, etc ...
server-id = 11
log-bin = /var/lib/mysql/bin.log
relay-log = /var/lib/mysql/slave-relay.log
relay-log-index = /var/lib/mysql/slave-relay-log.index
auto_increment_increment = 10
auto_increment_offset = 2
replicate-same-server-id = FALSE
log-slave-updates = TRUE
skip-slave-start = TRUE

You could add more servers here; just increment the server-id and the auto_increment_offset for each.
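To see why the offsets matter: with auto_increment_increment = 10, Server A hands out ids 1, 11, 21, … while Server B hands out 2, 12, 22, … so simultaneous inserts on both masters can never collide on an auto-increment primary key. A quick sketch, using a hypothetical test table:

-- on Server A (offset 1): these rows get ids 1 and 11
CREATE TABLE repl_test (id INT AUTO_INCREMENT PRIMARY KEY, note VARCHAR(32));
INSERT INTO repl_test (note) VALUES ('from A'), ('from A again');

-- on Server B (offset 2): this row gets id 2 (or 12, etc.), never clashing with A's ids
INSERT INTO repl_test (note) VALUES ('from B');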

Starting Replication

To start replication, we first need to create a replication user on both servers, then set up the replication attributes.

First, create the replication user on both servers:
Server A

# mysql -u root -p
mysql> GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO repluser@'serverb.ip.address' IDENTIFIED BY 'replpassword';

Server B

# mysql -u root -p
mysql> GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO repluser@'servera.ip.address' IDENTIFIED BY 'replpassword';

Find the master info on Server B:

mysql> show master status;
+------------+----------+--------------+------------------+
| File       | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------+----------+--------------+------------------+
| bin.000001 |      294 |              |                  |
+------------+----------+--------------+------------------+
1 row in set (0.00 sec)

Now we want to start the replication on Server A, using the info from Server B:

mysql> CHANGE MASTER TO
         MASTER_HOST='92.1.1.1',
         MASTER_USER='repluser',
         MASTER_PASSWORD='replpassword',
         MASTER_LOG_FILE='bin.000001',
         MASTER_LOG_POS=294;
mysql> start slave;
mysql> show slave status\G

You may need to run show slave status\G a few times before the slave drops into the standard ‘Waiting for master to send event’ state.

Once this is done, you can then work on repeating this process to start Server B slaving from Server A.

Find the master info on Server A:

mysql> show master status;
+------------+----------+--------------+------------------+
| File       | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------+----------+--------------+------------------+
| bin.000001 |      293 |              |                  |
+------------+----------+--------------+------------------+
1 row in set (0.00 sec)

Now we want to start the replication on Server B, using the info from Server A:

mysql> CHANGE MASTER TO
         MASTER_HOST='202.62.1.1',
         MASTER_USER='repluser',
         MASTER_PASSWORD='replpassword',
         MASTER_LOG_FILE='bin.000001',
         MASTER_LOG_POS=293;
mysql> start slave;
mysql> show slave status\G

Exporting/Importing the Data

Now you want to create any databases, users, and grants, and then import any data you want. Keep an eye on show slave status\G on the server opposite to where you’re doing all this, to make sure it is replicating correctly 🙂
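If you’re pulling in existing sites, a plain mysqldump restored on one master is enough – the statements flow through the binlog to the other side. Something like this (the database name is just an example):

# dump the existing site's DB on the old host
mysqldump -u root -p --databases wordpress_au > wordpress_au.sql
# import it on Server A; with the slave threads running it replicates through to Server B
mysql -u root -p < wordpress_au.sql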

And you’re done!

XenServer – Creating a local ISO repository

Starting a new job, I’m essentially starting from scratch as far as development infrastructure goes. The live sites are hosted on a cPanel server with FTP access only – so debugging is basically a moot point – and there have been no development sites for quite a while. It’s a simple case of edit in Dreamweaver/Eclipse/Zend Studio/etc., FTP it up, and ‘see how it goes’ – so it’s either sync up the dev sites from the current codebase, or start them from scratch with something a little more industry-standard. Granted, these are much smaller projects than at my previous jobs, but they have the potential to still need multiple developers committing source changes in the somewhat near future, so the current ftp-the-changes-up-to-the-server approach just won’t cut it. My first step is running up a quick XenServer install and bringing up a few VMs for Subversion, LAMP, Jira, etc. Yes, they could all run on one box, but if there’s one thing I’ve learnt over the years, it’s to keep your environments separated. Especially source control from development!

The environment here isn’t big enough to justify anything like I have in my small hosting business, with a SAN-backed XenServer cluster, HA licensing, etc. So a standalone XS server it is. Fifteen minutes after finding a spare monitor, I had XenServer running. And that’s about where I went ‘oh, bugger – ISO repository!’ That’s right: there’s no fileserver in this office at present, so a-googling I went!

As it turned out, it wasn’t too bad at all 🙂 We basically create a small logical volume in LVM, mount it, and add it as an ISO Storage Repository.

First up, get the name of the Volume Group

# pvscan
  PV /dev/sda3   VG VG_XenStorage-c0972b3b-ef4a-346f-42d2-8ddae19499da   lvm2 [690.62 GB / 690.61 GB free]
  Total: 1 [690.62 GB] / in use: 1 [690.62 GB] / in no VG: 0 [0   ]

OK, so in this case our VG is VG_XenStorage-c0972b3b-ef4a-346f-42d2-8ddae19499da. Now we want to create a Logical Volume. I made it 40GB – if I were running small SCSI/SAS disks I’d probably have made it smaller, but this box has a 750GB SATA disk.

[root@xenserver01 ~]# lvcreate -L 40G -n ISO VG_XenStorage-c0972b3b-ef4a-346f-42d2-8ddae19499da
  Logical volume "ISO" created

And of course, format it.  I use ext2 for anything ‘basic’, as I don’t need journalling, and I like my performance, especially in a SAN environment.  Not that that’s an issue here with a local disk, but it’s a good habit to form 😉

[root@xenserver01 ~]# mkfs.ext2 /dev/VG_XenStorage-c0972b3b-ef4a-346f-42d2-8ddae19499da/ISO 
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
5242880 inodes, 10485760 blocks
524288 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=0
320 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks: 
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
4096000, 7962624
Writing inode tables: done                            
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 39 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

Well, that was easy. Let’s make a new directory to mount the LV.

[root@xenserver01 ~]# mkdir /mnt/iso_import

Now make sure that all Volume Groups are active, mount the LV, and attach it as an ISO storage repository.

[root@xenserver01 ~]# vgchange -a y
  3 logical volume(s) in volume group "VG_XenStorage-c0972b3b-ef4a-346f-42d2-8ddae19499da" now active
[root@xenserver01 ~]# mount /dev/VG_XenStorage-c0972b3b-ef4a-346f-42d2-8ddae19499da/ISO  /mnt/iso_import/
[root@xenserver01 xen]# xe-mount-iso-sr /mnt/iso_import -o bind

Sweet, and make it come up at boot!

[root@xenserver01 ~]# cat >> /etc/rc.local << __END__
vgchange -a y
mount /dev/VG_XenStorage-c0972b3b-ef4a-346f-42d2-8ddae19499da/ISO  /mnt/iso_import/
xe-mount-iso-sr /mnt/iso_import -o bind
__END__

And we’re done!

Oh, and obviously, scp some ISOs over to it! 🙂
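Something like this (the ISO name is just an example), then rescan the SR so XenCenter picks up the new images:

scp debian-6.0.3-amd64-netinst.iso root@xenserver01:/mnt/iso_import/
# find the ISO SR's uuid, then rescan it
xe sr-list content-type=iso
xe sr-scan uuid=<iso-sr-uuid>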