Deploying a Spider MariaDB/MySQL cluster with PaQu

Pre-Introduction

In this post I describe how to deploy a cluster (i.e. multiple servers) of dedicated MariaDB/MySQL database machines, how we set them up at the Leibniz Institute for Astrophysics, and how they are used together with the Spider storage engine and PaQu. The cluster will be used to server hundreds of terra bytes of data from cosmological simulations and observational campaigns to scientists all around the world. The cluster will be accessible through our data analysis webpages that we built on Daiquiri.

Introduction

The shiny new database cluster at the Leibniz Institute for Astrophysics consists of one head node which will be the Spider host and 10 database nodes. The head node has the following hardware configuration:

Supermicro Chassis SC825TQ-R740LPB Supermicro Motherboard MBD-X9DRI-F-B

2x 8 Core Intel Xeon E5-2650 v2 @ 2.60 GHz CPU 128 GB RAM (8 x 16GB DDR3, reg ECC 1866 MHz) 2x Gigabit Ethernet on board 1x Ethernet for IPMI Mellanox 2 Port QDR MCX354A-QCBT QDR InfiniBand and 10G, ConnectX-3

For system disks: RAID1 Controller: LSI 9211-4i 2x Hitachi HTE725050A7E630 500GB

For storage: 1x RAID Controller LSI 9271-8i 1x Cache Vault 8x 4 TB SATA3 disks, HGST, 64MB Cache. ULTRASTAR 7K4000 HUS724040ALA640

and the individual nodes are equal to the head node except the following:

2x 4 Core Intel Xeon E5-2609 v2 @ 2.50 GHz CPU 32 GB RAM (8 x 4GB DDR3, reg ECC 1866 MHz)

We will have the system on the small mirrored hardware RAID and the database data on the large hardware RAID. You might have a different setup and you would need to adjust the configuration accordingly.

And to mention some unscientific measurements about I/O performance per Node in this setup:

hdparm states read performance of around 960 MB/sec.
MariaDB 10.11: 2 threads count full table scan: around 600 MB/sec
MariaDB 10.11: 1 thread count full table scan: around 450 MB/sec
MariaDB 10.11: 1 thread where conditioned full table scan: around 370 MB/sec

First we will describe shortly how we set up the individual machines, then how to install MariaDB/MySQL with all the required extensions and then how this setup is deployed onto each individual node. Further we will describe how the distributed database is setup using the Spider engine.

OS prerequisites

We are using CentOS 6.5 with a standard “server” installation. Additionally to the standard installation, the following packages were installed:

emacs-nox
screen
lsscsi
gcc gcc-c++
group “X Window System”
cmake
cmake-gui
ncurses-libs ncurses-devel
libxml2 libxml2-devel
boost boost-devel
bison bison-devel
unixODBC unixODBC-devel
git
php php-pear-MDB2-Driver-mysqli
Group: Infiniband Support

We also installed mysql from the repository, even though we will compile our own version later on. This is to ensure that we obtain a reasonable init.d script for starting and stopping mysql, have the mysql user set up, and obtain the my.cnf file at reasonable paths so that we don’t need to copy anything not covered by “make install”.

yum install mysql mysql-server mysql-libs

Further we created a default (usergroup 1000) user with the fance name of “spider” that the admins have to use to log into the server and which has sudo rights. This user has a home directory on the head node, that will be exported to all the individual nodes through nfs.

Compiling MariaDB

Next we will compile MariaDB 10.0.11 which we obtain from https://www.mariadb.org. We cannot use a precompiled binary package, since we need to compile various add ons ourself. So we need the source and should be familiar with compiling anyways. OK, you could get a binary and then use the source, but honestly, I have no clue whether the MariaDB binaries already come with the Spider engined enabled. Anyways:

Logged into as the shared “spider” user we run:

cd ~
mkdir installs
cd installs
wget -O mariadb-10.0.11.tar.gz "https://downloads.mariadb.org/interstitial/mariadb-10.0.11/source/mariadb-10.0.11.tar.gz/from/http:/mirror2.hs-esslingen.de/mariadb?serve?&serve?"
tar -xf mariadb-10.0.11.tar.gz
cd mariadb-10.0.11.tar.gz

Next we compile MariaDB using cmake (more precisely using cmake-gui, since this provides a more intuitive environment. Make sure you “ssh -XY”-ed into the head node and it has X11 running).

mkdir build
cd build
cmake-gui ..

In the cmake-gui window, click on “Configure”. The cmake-configure script will be run and setup the build environment that we will need to change next.

We need to change the following configuration entries:

CMAKE_INSTALL_PREFIX: /usr/local/mariadb-10.0.11 We will have each version of mysql installed in different directories and have a dynamic link from there to /usr/local/mysql for easy updates.
MYSQL_DATADIR: /store/spider/data or wherever you want to store the actual database files (make sure this directory exists and the “mysql” user has read/write permissions)
TMPDIR: /store/spider/tmp or wherever you want to store temporary files (make sure this directory exists and the “mysql” user has read/write permissions)
WITH_ARCHIVE_STORAGE_ENGINE
WITH_CONNECT_STORAGE_ENGINE
WITH_FEDERATEDX_STORAGE_ENGINE
WITH_METADATA_LOCK_INFO
WITH_QUERY_CACHE_INFO
WITH_QUERY_RESPONSE_TIME
WITH_SEQUENCE_STORAGE_ENGINE

And we are going to add the following additional settings for enabling TokuDB (it might come handy once) and the Spider engine (use Add Entry and create boolean types):

WITH_SPIDER_STORAGE_ENGINE

Once you are done, press “Configure” again and then “Generate”. Close cmake-gui and build MariaDB:

make -j32

or if you have less CPU cores, reduce the number after -j. Next, install the built binaries and clean the compilation directory (we will need to retain the MariaDB sources to compile all plugins).

sudo make install
make clean

Next we will link MariaDB in /usr/local

cd /usr/local/
sudo ln -s /usr/local/mariadb-10.0.11 /usr/local/mysql
sudo chown -R mysql /usr/local/mariadb-10.0.11
sudo chown -R mysql /usr/local/mysql

And we need to edit the init.d file accordingly to point to the newly installed MariaDB binaries. With your editor of “believe”, edit all paths in /etc/init.d/mysqld. I’m using emacs:

sudo emacs /etc/init.d/mysqld
M+x replace-string /usr /usr/local/mysql
to the part where mysqld_safe is called, add the language option:
--language=/usr/local/mysql/share/english \
C-x C-c

Next we setup the /etc/my.cnf configuration file. At this stage, we only set the most crucial parameters. Again with the editor of choice:

emacs /etc/my.cnf
set datadir to /store/spider/data or whatever your choice is
C-x C-c

Then we need to initialize the MariaDB data directories with the procedure given in: https://mariadb.com/kb/en/generic-build-instructions/

cd /usr/local/mysql
sudo scripts/mysql_install --user=mysql

Now we are ready to start MariaDB for the first time:

sudo /etc/init.d/mysqld start

And hopefully everything worked out well.

Installing PaQu and other useful Plugins

As a side note: PaQu itself is a stand-alone PHP tool, that solely produces query plans that need to be submitted to MariaDB for execution. A simple tool is provided with PaQu, but the main purpose was to implement PaQu into other tools. What we will install here, is the “PaQu Kill daemon” that runs on the Spider nodes. The kill daemon queries the head node in given intervals to see if any query that is executed on the Spider node has been killed on the head node (for example killed by our query-queue). The actual PaQu command line query tool is described at the end of this post.

PaQu Kill daemon

We are now going to compile and install the PaQu kill daemon plugin for MariaDB/MySQL. The procedure in compiling plugins that we developed is always the same, so if you manage to compile and install on of our plugins once, you will always succeed (at least in theory) with the others.

cd ~/installs
git clone https://github.com/adrpar/paqu.git
cd paqu/mysql_node_jobMon
mkdir build
cd build

We now need to adjust the cmake Makefile to point to the path of the MariaDB/MySQL source code and binaries. Again with your devine editor:

emacs ../CMakeLists.txt
adjust MYSQL_PATH and MYSQL_SOURCES_PATH
uncomment MARIADB
C-x C-c

Now we start with the compilation:

cmake ..
make
sudo make install

Once we have our Spider nodes installed, runn the following to register the plugin with MariaDB/MySQL run:

mysql --user=root -p < ../install_jobMon.sql

At the moment however, we are setting up the head node, so we are not going to run the kill daemon on the head. We just compiled it.

In order to support distributed standard deviation calcuations, we need to apply the same procedure to:

cd ../../mysql_udf
mkdir build
emacs ../CMakeLists.txt
cmake ..
make
sudo make install
mysql --user=root -P < ../install_udfs.sql  #run this also on the head node

We will further continue with the following packages in our specific setup:

Other nice plugins

Query queue: https://github.com/adrpar/mysql_query_queue SPRNG distributed random numbers: https://github.com/adrpar/mysql_sprng Our collection of usefull UDFs: https://github.com/adrpar/mysql_udf Validate SQL without executing query plugin: https://github.com/adrpar/mysql_validateSQL MySQL sphere (trigonometry on a sphere): https://github.com/adrpar/mysql_sphere

Cloning the system to the Spider nodes

Since we decided to install a clean system on the head node that will be identical to the system on each Spider node. The Spider nodes will only differ in which plugins are enabled in MariaDB/MySQL and that the Spider nodes have a NFS shared home directory with the head node.

The cloning is done using Clonezilla. Clonezilla allows the creation of a verbatim copy of a disk. In our case this would be the system disk of the head node. We use Clonezilla in a PXE boot environment, creating and restoring images from a common NFS mount point in our cluster.

There are many good tutorials available on how to use Clonezilla. Like this one here. It is very straight forward to use.

Adjustments on the cloned Spider nodes

After all the Spider nodes have been initialised with the system on the head node, we need to make some adjustments to each node separately (again assuming CentOS).

First set new host name:
emacs /etc/sysconfig/network
set:
HOSTNAME=<new fancy host name>
NOZEROCONF=yes #not necessarily needed
Get rid of network rules:
rm /etc/udev/rules.d/70-persistent-net.rules
Adjust the fstab accordingly (in our case):
emacs /etc/fstab
set:
LABEL=spider<XX add node number here> /store/spider xfs defaults 0 0
Create the file system on the large RAID partition:
mkfs.xfs -L spider<XX add node number here> /dev/<approperiate device (here sda)>
Recreate all machine ssh keys:
rm /etc/ssh/ssh_host_*
service sshd restart
If you are using infiniband, set the IP addresses accordingly:
emacs /etc/sysconfig/network-scripts/ifcfg-ib0
set:
DEVICE=ib0
TYPE=InfiniBand
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none
IPADDR=<here your fancy IB ip address>
if needed, also update the ifcfg-ib1 and ifcfg-eth* scripts in /etc/sysconfig/network-scripts

Setup the MariaDB/MySQL configuration files and initialise the databases on the Spider nodes

In principle we are now done. The only thing that is missing, is an updated my.cnf file and a copy of the bare initialised MariaDB/MySQL data directory (if the data is not on the system disk). We will first start by copying the data over to each node and then proceed with the configuration file on the head node and on the Spider nodes. But before we start, we are going to setup “cssh – cluster ssh” to make our live easier.

Setting up cluster ssh on the head node

ssh -XY into your head node. There install cluster ssh (we need to include the EPEL repository for this):

wget http://download.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
sudo rpm -ivh epel-release-6-8.noarch.rpm

check that EPEL is active:

yum repolist

and now we can install cluster ssh:

sudo yum install clusterssh

Next we need to configure cssh:

sudo emacs /etc/clusters

set accordingly:

clusters = spider
spider 192.168.55.70 192.168.55.71 ... add other nodes here ...

Next we are going to setup a shared ssh-key for each individual head node users (in our case root and spider), so that we can cssh password-less into all Spider nodes:

ssh-keygen -t rsa -b 1024

start cssh and login into each node:

cssh spider

then for each node (also the head node if we are going to NFS mount the home directory with the Spider nodes):

ssh <USER>@<spider_node_ips_here>
scp <USER>@<head_node_ip_here>:~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
exit

Now test if everything worked by cssh-ing into all nodes (no password should be needed now):

cssh spider

Setting up the head node my.cnf file

On the head node, our my.cnf file looks as follows (you might need to adjust certain parameters):

# MySQL config file for Spider head node

# The following options will be passed to all MySQL clients
[client]
#password		= your_password
port			= 3306
socket			= /tmp/mysql.sock

# Here follows entries for some specific programs

# The MySQL server
[mysqld]
user			= mysql
default-storage-engine	= MyISAM
datadir			= /store/spider/data
port            	= 3306
socket          	= /tmp/mysql.sock
tmpdir			= /store/spider/tmp

#Server settings
performance_schema
federated		= ON

max_allowed_packet	= 64M

#Global buffer settings:
table_open_cache	= 64
query_cache_size	= 128M

#Global MyISAM buffer settings:
key_buffer_size         = 256M

#per connection settings:
sort_buffer_size	= 64K
thread_stack		= 256K

#per connection MyISAM settings:
read_buffer_size        = 8M
read_rnd_buffer_size 	= 8M

#Temporary tables:
tmp-table-size		= 2G
max_heap_table_size	= 2G

#Time out settings:
connect_timeout		= 280000
interactive_timeout	= 280000
net_read_timeout	= 360000
net_write_timeout	= 360000

#Spider engine settings:
spider_connect_timeout	= 360000
spider_net_read_timeout	= 36000
spider_bulk_update_size	= 128000000
spider_internal_sql_log_off = 0
spider_remote_sql_log_off   = 0
spider_remote_autocommit    = 1
spider_connect_timeout	= 28000
spider_net_read_timeout	= 28000
spider_bgs_mode		= 3
spider_bka_mode		= 1
spider_multi_split_read = 1
spider_quick_mode	= 3
spider_semi_split_read	= 4
spider_semi_split_read_limit = 4
spider_conn_recycle_mode     = 1

[mysqld_safe]
log-error		= /var/log/mysqld.log
pid-file		= /var/run/mysqld/mysqld.pid

open_files_limit	= 128000
event_scheduler		= ON

# binary logging format - mixed recommended
binlog_format	 	= mixed

server-id		= 1

[mysqldump]
quick
max_allowed_packet	= 16M

[mysql]
no-auto-rehash

[myisamchk]
key_buffer_size		= 20M
sort_buffer_size 	= 20M
read_buffer 		= 2M
write_buffer 		= 2M

[mysqlhotcopy]
interactive-timeout

Setup access for Spider node on head node

The Spider nodes need to be able to access the head node. Therefore we are going to create a user for the Spider nodes to access the head node. On the head node, enter MariaDB/MySQL and run the following (we are creating a user that has access from any point in the network – if you don’t want this, you need to create an individual user for each Spider node):

CREATE USER 'spider'@'192.168.XX.%' IDENTIFIED BY 'some_secret_password';
GRANT ALL PRIVILEGES ON *.* TO 'spider'@'192.168.XX.%' WITH GRANT OPTION;

The Spider node my.cnf file

On the head node create a file named /etc/my.cnf.nodes and set it up accordingly (we are using the setup below):

# MySQL config file for Spider nodes

# The following options will be passed to all MySQL clients
[client]
#password		= your_password
port 			= 3306
socket 			= /tmp/mysql.sock

# Here follows entries for some specific programs

# The MySQL server
[mysqld]
user			= mysql
default-storage-engine	= MyISAM
datadir 		= /store/spider/data
port			= 3306
socket			= /tmp/mysql.sock
tmpdir			= /store/spider/tmp

#Server settings
performance_schema
federated		= ON

max_allowed_packet	= 64M

#Global buffer settings:
table_open_cache	= 512
thread_cache_size	= 16
query_cache_size	= 128M

#Global MyISAM buffer settings:
key_buffer_size		= 256M
myisam_sort_buffer_size	= 256M

#per connection settings:
sort_buffer_size	= 2M
thread_stack		= 256K

#per connection MyISAM settings:
read_buffer_size	= 16M
read_rnd_buffer_size   	= 16M

#Temporary tables:
tmp-table-size		= 8M

#Time out settings:
connect_timeout		= 600
interactive_timeout	= 600
net_read_timeout	= 600
net_write_timeout	= 600

#Configuration for PaQu kill daemon
# (uncomment after the installation of the PaQu kill daemon)
#paqu_headNodeConnection = mysql://:@<head_node_ip_here>:3306

#Configuration for mysql_SPRNG
# (uncomment after the installation of mysql_SPRNG)
#sprng_numNodes=<num_spider_nodes>
#sprng_myId=<curr_spider_node_num>

Then copy this file over to all Spider nodes using cssh:

cssh spider
sudo scp root@<head_node_ip_here>:/etc/my.cnf.node /etc/my.cnf

And if needed, alter node specific information in /etc/my.cnf on each Spider node accordingly (such as the mysql_SPRNG settings, if you are using SPRNG).

Setting up the NFS mount of the “spider” user directory on each Spider node

We are going to configure the head node to export the home directory to the Spider node, so that it is easy to exchange SQL scripts during table creation.

On the head node run the following:

Install nfs if needed:

sudo yum install nfs-utils

Turn nfs on:

sudo chkconfig nfs on
sudo service nfs start

Setup the export configuration. In NFS4 you bind the directories you want to export to a specific base directory (in our case /nfs4).

sudo emacs /etc/exports

add the following

# basedir is nfs4
/nfs4/                                192.168.<your_ip_range_here>XX.0/24(rw,sync,insecure,no_root_squash,no_subtree_check,fsid=0)
/nfs4/home                            192.168.<your_ip_range_here>XX.0/24(rw,sync,insecure,no_root_squash,no_subtree_check,nohide)

Now we need to create the appropriate directories:

mkdir -p /nfs4/home

and allow NFS4 in the firewall settings:

sudo system-config-firewall-tui
--> allow nfs4

and finally restart NFS4:

sudo service nfs restart

And on the Spider nodes you register the NFS mount to the home directory (don’t forget to unmount the existing home partition if applicable):

sudo emacs /etc/fstab

comment the home directory entry if applicable and add:

<head_node_ip_here>:/home /home nfs rw 0 0

and mount:

sudo mount -a

Configuring Spider nodes

If you happened to have cloned your MariaDB/MySQL data directory from the head node, you might have the mysql_qqueue running on each node. Uninstall it according to the instructions of the queue for each Spider node. Usually this is done with:

mysql --user=root -p < ~/installs/mysql_query_queue/uninstall_qqueue.sql

Further, you need to install the PaQu kill daemon by running:

mysql --user=root -p < ~/installs/paqu/mysql_node_jobMon/install_jobMon.sql

Setup access for Spider head node on the Spider node

The head node needs to be able to access each Spider node. Therefore we are going to create a user for the head node to access the individual nodes. On each Spider node, enter MariaDB/MySQL and run the following (we are creating a user that has access from any point in the network – if you don’t want this, you need to create an specific user for the head node):

CREATE USER 'spider'@'192.168.XX.%' IDENTIFIED BY 'some_secret_password';
GRANT ALL PRIVILEGES ON *.* TO 'spider'@'192.168.XX.%' WITH GRANT OPTION;

Installing PaQu

Since PaQu is a standalone PHP tool, this is pretty straight forward. We are going to put PaQu into the /opt directory. Since we already have it cloned from the git repository, we just need to copy the sources. On the head node execute:

sudo mkdir /opt/paqu
sudo cp -r ~spider/installs/paqu/src/* /opt/paqu/.

Next, edit the paqu.php file accordingly, to reflect the server settings (basically you just need to give the MySQL connection string to the head node and the user name and password with which the head node connects to the Spider nodes):

#!/usr/bin/php                                                                                                                          
<?php

require_once 'parallelQuery.php';
require_once 'mysqlii.php';

##############################################                                                                                          
############ SERVER CONFIGURATION ############                                                                                          
##############################################                                                                                          

$cmdlineToHeadConnectionString	= "mysql://USERNAME:PASSWORD@127.0.0.1:3306";
$nodesToHeadConnectionString	= "mysql://USERNAME:PASSWORD@HEAD_IP_ADDRESS:3306";
$database 			= "spider_tmp_shard";
$user 				= "spider";
$password 			= "spider";

$addRowNumbersToFinalTable 	= false;

$engine 			= "MyISAM";
$federatedEngine 		= "federated";

##############################################  
...

Now you are able to execute PaQu by issuing:

/opt/paqu/paqu.php

In order to run a query, you need to specify the query, the result database and table where the results will be saved into, and a log file, into which details of the query will be written to (things like the actual query, what PaQu executed on each node and any other output that shard-query, the base of PaQu, produced).

/opt/paqu/paqu.php "SELECT COUNT(*) FROM shardedDatabase.bigTable" resultDB.resultTable ~/query.log

If everything worked, you should now have a new table in resultDB, with the results of your query. Further, your query.log file should contain information about what PaQu actually did to achieve the result.