In this post I describe how to deploy a cluster (i.e. multiple servers) of dedicated MariaDB/MySQL database machines, how we set them up at the Leibniz Institute for Astrophysics, and how they are used together with the Spider storage engine and PaQu. The cluster will be used to server hundreds of terra bytes of data from cosmological simulations and observational campaigns to scientists all around the world. The cluster will be accessible through our data analysis webpages that we built on Daiquiri.
The shiny new database cluster at the Leibniz Institute for Astrophysics consists of one head node which will be the Spider host and 10 database nodes. The head node has the following hardware configuration:
Supermicro Chassis SC825TQ-R740LPB Supermicro Motherboard MBD-X9DRI-F-B
2x 8 Core Intel Xeon E5-2650 v2 @ 2.60 GHz CPU 128 GB RAM (8 x 16GB DDR3, reg ECC 1866 MHz) 2x Gigabit Ethernet on board 1x Ethernet for IPMI Mellanox 2 Port QDR MCX354A-QCBT QDR InfiniBand and 10G, ConnectX-3
For system disks: RAID1 Controller: LSI 9211-4i 2x Hitachi HTE725050A7E630 500GB
For storage: 1x RAID Controller LSI 9271-8i 1x Cache Vault 8x 4 TB SATA3 disks, HGST, 64MB Cache. ULTRASTAR 7K4000 HUS724040ALA640
and the individual nodes are equal to the head node except the following:
2x 4 Core Intel Xeon E5-2609 v2 @ 2.50 GHz CPU 32 GB RAM (8 x 4GB DDR3, reg ECC 1866 MHz)
We will have the system on the small mirrored hardware RAID and the database data on the large hardware RAID. You might have a different setup and you would need to adjust the configuration accordingly.
And to mention some unscientific measurements about I/O performance per Node in this setup:
- hdparm states read performance of around 960 MB/sec.
- MariaDB 10.11: 2 threads count full table scan: around 600 MB/sec
- MariaDB 10.11: 1 thread count full table scan: around 450 MB/sec
- MariaDB 10.11: 1 thread where conditioned full table scan: around 370 MB/sec
First we will describe shortly how we set up the individual machines, then how to install MariaDB/MySQL with all the required extensions and then how this setup is deployed onto each individual node. Further we will describe how the distributed database is setup using the Spider engine.
We are using CentOS 6.5 with a standard “server” installation. Additionally to the standard installation, the following packages were installed:
- gcc gcc-c++
- group “X Window System”
- ncurses-libs ncurses-devel
- libxml2 libxml2-devel
- boost boost-devel
- bison bison-devel
- unixODBC unixODBC-devel
- php php-pear-MDB2-Driver-mysqli
- Group: Infiniband Support
We also installed mysql from the repository, even though we will compile our own version later on. This is to ensure that we obtain a reasonable init.d script for starting and stopping mysql, have the mysql user set up, and obtain the my.cnf file at reasonable paths so that we don’t need to copy anything not covered by “make install”.
Further we created a default (usergroup 1000) user with the fance name of “spider” that the admins have to use to log into the server and which has sudo rights. This user has a home directory on the head node, that will be exported to all the individual nodes through nfs.
Next we will compile MariaDB 10.0.11 which we obtain from https://www.mariadb.org. We cannot use a precompiled binary package, since we need to compile various add ons ourself. So we need the source and should be familiar with compiling anyways. OK, you could get a binary and then use the source, but honestly, I have no clue whether the MariaDB binaries already come with the Spider engined enabled. Anyways:
Logged into as the shared “spider” user we run:
Next we compile MariaDB using cmake (more precisely using cmake-gui, since this provides a more intuitive environment. Make sure you “ssh -XY”-ed into the head node and it has X11 running).
In the cmake-gui window, click on “Configure”. The cmake-configure script will be run and setup the build environment that we will need to change next.
We need to change the following configuration entries:
- CMAKE_INSTALL_PREFIX: /usr/local/mariadb-10.0.11 We will have each version of mysql installed in different directories and have a dynamic link from there to /usr/local/mysql for easy updates.
- MYSQL_DATADIR: /store/spider/data or wherever you want to store the actual database files (make sure this directory exists and the “mysql” user has read/write permissions)
- TMPDIR: /store/spider/tmp or wherever you want to store temporary files (make sure this directory exists and the “mysql” user has read/write permissions)
And we are going to add the following additional settings for enabling TokuDB (it might come handy once) and the Spider engine (use Add Entry and create boolean types):
Once you are done, press “Configure” again and then “Generate”. Close cmake-gui and build MariaDB:
or if you have less CPU cores, reduce the number after -j. Next, install the built binaries and clean the compilation directory (we will need to retain the MariaDB sources to compile all plugins).
Next we will link MariaDB in /usr/local
And we need to edit the init.d file accordingly to point to the newly installed MariaDB binaries. With your editor of “believe”, edit all paths in /etc/init.d/mysqld. I’m using emacs:
Next we setup the /etc/my.cnf configuration file. At this stage, we only set the most crucial parameters. Again with the editor of choice:
Then we need to initialize the MariaDB data directories with the procedure given in: https://mariadb.com/kb/en/generic-build-instructions/
Now we are ready to start MariaDB for the first time:
And hopefully everything worked out well.
Installing PaQu and other useful Plugins
As a side note: PaQu itself is a stand-alone PHP tool, that solely produces query plans that need to be submitted to MariaDB for execution. A simple tool is provided with PaQu, but the main purpose was to implement PaQu into other tools. What we will install here, is the “PaQu Kill daemon” that runs on the Spider nodes. The kill daemon queries the head node in given intervals to see if any query that is executed on the Spider node has been killed on the head node (for example killed by our query-queue). The actual PaQu command line query tool is described at the end of this post.
PaQu Kill daemon
We are now going to compile and install the PaQu kill daemon plugin for MariaDB/MySQL. The procedure in compiling plugins that we developed is always the same, so if you manage to compile and install on of our plugins once, you will always succeed (at least in theory) with the others.
We now need to adjust the cmake Makefile to point to the path of the MariaDB/MySQL source code and binaries. Again with your devine editor:
Now we start with the compilation:
Once we have our Spider nodes installed, runn the following to register the plugin with MariaDB/MySQL run:
At the moment however, we are setting up the head node, so we are not going to run the kill daemon on the head. We just compiled it.
In order to support distributed standard deviation calcuations, we need to apply the same procedure to:
We will further continue with the following packages in our specific setup:
Other nice plugins
Query queue: https://github.com/adrpar/mysql_query_queue SPRNG distributed random numbers: https://github.com/adrpar/mysql_sprng Our collection of usefull UDFs: https://github.com/adrpar/mysql_udf Validate SQL without executing query plugin: https://github.com/adrpar/mysql_validateSQL MySQL sphere (trigonometry on a sphere): https://github.com/adrpar/mysql_sphere
Cloning the system to the Spider nodes
Since we decided to install a clean system on the head node that will be identical to the system on each Spider node. The Spider nodes will only differ in which plugins are enabled in MariaDB/MySQL and that the Spider nodes have a NFS shared home directory with the head node.
The cloning is done using Clonezilla. Clonezilla allows the creation of a verbatim copy of a disk. In our case this would be the system disk of the head node. We use Clonezilla in a PXE boot environment, creating and restoring images from a common NFS mount point in our cluster.
There are many good tutorials available on how to use Clonezilla. Like this one here. It is very straight forward to use.
Adjustments on the cloned Spider nodes
After all the Spider nodes have been initialised with the system on the head node, we need to make some adjustments to each node separately (again assuming CentOS).
First set new host name:
HOSTNAME=<new fancy host name>
NOZEROCONF=yes #not necessarily needed
Get rid of network rules:
Adjust the fstab accordingly (in our case):
LABEL=spider<XX add node number here> /store/spider xfs defaults 0 0
Create the file system on the large RAID partition:
mkfs.xfs -L spider<XX add node number here> /dev/<approperiate device (here sda)>
Recreate all machine ssh keys:
service sshd restart
- If you are using infiniband, set the IP addresses accordingly:
IPADDR=<here your fancy IB ip address>
- if needed, also update the
Setup the MariaDB/MySQL configuration files and initialise the databases on the Spider nodes
In principle we are now done. The only thing that is missing, is an updated my.cnf file and a copy of the bare initialised MariaDB/MySQL data directory (if the data is not on the system disk). We will first start by copying the data over to each node and then proceed with the configuration file on the head node and on the Spider nodes. But before we start, we are going to setup “cssh – cluster ssh” to make our live easier.
Setting up cluster ssh on the head node
ssh -XY into your head node. There install cluster ssh (we need to include the EPEL repository for this):
check that EPEL is active:
and now we can install cluster ssh:
Next we need to configure cssh:
Next we are going to setup a shared ssh-key for each individual head node users (in our case root and spider), so that we can cssh password-less into all Spider nodes:
start cssh and login into each node:
then for each node (also the head node if we are going to NFS mount the home directory with the Spider nodes):
Now test if everything worked by cssh-ing into all nodes (no password should be needed now):
Setting up the head node my.cnf file
On the head node, our
my.cnf file looks as follows (you might need to adjust certain parameters):
Setup access for Spider node on head node
The Spider nodes need to be able to access the head node. Therefore we are going to create a user for the Spider nodes to access the head node. On the head node, enter MariaDB/MySQL and run the following (we are creating a user that has access from any point in the network – if you don’t want this, you need to create an individual user for each Spider node):
The Spider node my.cnf file
On the head node create a file named
/etc/my.cnf.nodes and set it up accordingly (we are using the setup below):
Then copy this file over to all Spider nodes using cssh:
And if needed, alter node specific information in
/etc/my.cnf on each Spider node accordingly (such as the mysql_SPRNG settings, if you are using SPRNG).
Setting up the NFS mount of the “spider” user directory on each Spider node
We are going to configure the head node to export the home directory to the Spider node, so that it is easy to exchange SQL scripts during table creation.
On the head node run the following:
Install nfs if needed:
Turn nfs on:
Setup the export configuration. In NFS4 you bind the directories you want to export to a specific base directory (in our case /nfs4).
add the following
Now we need to create the appropriate directories:
and allow NFS4 in the firewall settings:
and finally restart NFS4:
And on the Spider nodes you register the NFS mount to the home directory (don’t forget to unmount the existing home partition if applicable):
comment the home directory entry if applicable and add:
Configuring Spider nodes
If you happened to have cloned your MariaDB/MySQL data directory from the head node, you might have the mysql_qqueue running on each node. Uninstall it according to the instructions of the queue for each Spider node. Usually this is done with:
Further, you need to install the PaQu kill daemon by running:
Setup access for Spider head node on the Spider node
The head node needs to be able to access each Spider node. Therefore we are going to create a user for the head node to access the individual nodes. On each Spider node, enter MariaDB/MySQL and run the following (we are creating a user that has access from any point in the network – if you don’t want this, you need to create an specific user for the head node):
Since PaQu is a standalone PHP tool, this is pretty straight forward. We are going to put PaQu into the
/opt directory. Since we already have it cloned from the git repository, we just need to copy the sources. On the head node execute:
Next, edit the paqu.php file accordingly, to reflect the server settings (basically you just need to give the MySQL connection string to the head node and the user name and password with which the head node connects to the Spider nodes):
Now you are able to execute PaQu by issuing:
In order to run a query, you need to specify the query, the result database and table where the results will be saved into, and a log file, into which details of the query will be written to (things like the actual query, what PaQu executed on each node and any other output that shard-query, the base of PaQu, produced).
If everything worked, you should now have a new table in resultDB, with the results of your query. Further, your query.log file should contain information about what PaQu actually did to achieve the result.