Many websites and projects experiencing rapid growth run into distributed data storage issues. GlusterFS is a unified, poly-protocol, scale-out filesystem capable of serving petabytes of data at lightning speed, turning commodity hardware into a high-performance, scalable storage solution.
In this tutorial, we will review a basic replication setup between two nodes, which allows instant synchronization of a specific directory, including its content and permission changes. If any of the terms used here are unfamiliar, please consult the
GlusterFS documentation.
Disk Partitioning
For a replication setup, GlusterFS requires an identical disk partition present on each node. We will use
apollo and
chronos as nodes, with one GlusterFS volume and brick replicated across the nodes.
In my experience, the most important part of a GlusterFS setup is planning your disk partitioning ahead of time. With a proper layout, it is very easy to create a dedicated GlusterFS volume group and logical volume on each node.
As an example, we will use the following partitioning, present on both nodes:
# df -ah /mnt/gvol0
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_gluster-lv_gvol0
10G 151M 9.2G 2% /mnt/gvol0
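The /dev/mapper/vg_gluster-lv_gvol0 device above is an LVM logical volume. As a rough sketch only, assuming a spare /dev/sdb disk and an ext4 filesystem on each node (your disks and sizes will differ), such a layout could be created with:
# pvcreate /dev/sdb
# vgcreate vg_gluster /dev/sdb
# lvcreate -L 10G -n lv_gvol0 vg_gluster
# mkfs.ext4 /dev/vg_gluster/lv_gvol0
# install -d -m 0755 /mnt/gvol0
# echo '/dev/mapper/vg_gluster-lv_gvol0 /mnt/gvol0 ext4 defaults 0 0' >> /etc/fstab
# mount /mnt/gvol0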
The GlusterFS volume naming convention is:
- /<base directory>/<volume name>/<brick name>/brick
For SELinux compatibility, we will use /mnt as the base directory, gvol0 as the volume name and brick0 as the brick name. Create the volume path on each node:
# ls -lah /mnt/gvol0
total 28K
drwxr-xr-x. 4 root root 4.0K Sep 9 18:52 .
drwxr-xr-x. 3 root root 4.0K Sep 9 18:52 ..
drwx------. 2 root root 16K Sep 9 17:33 lost+found
# install -d -m 0755 /mnt/gvol0/brick0/brick
We are done with the initial setup; let's proceed to the GlusterFS configuration.
GlusterFS Setup
Install GlusterFS on each node by running the following commands:
# yum --enablerepo=axivo install glusterfs-server
# service rpcbind start
Enterprise Linux 6:
# chkconfig glusterd on
# service glusterd start
Enterprise Linux 7:
# systemctl enable glusterd.service
# systemctl start glusterd.service
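Before moving on, you may want to confirm that glusterd is actually running on each node:
Enterprise Linux 6:
# service glusterd status
Enterprise Linux 7:
# systemctl status glusterd.service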
Before probing the nodes, open the required firewall ports. GlusterFS uses the following ports:
- 111 (tcp and udp) - rpcbind
- 2049 (tcp) - nfs
- 24007 (tcp) - server daemon
- 38465:38469 (tcp) - nfs related services
- 49152 (tcp) - brick
We used the following iptables configuration file on each node:
# cat /etc/sysconfig/iptables-glusterfs
-A INPUT -m state --state NEW -m tcp -p tcp -s 192.168.1.0/24 --dport 111 -j ACCEPT
-A INPUT -m state --state NEW -m udp -p udp -s 192.168.1.0/24 --dport 111 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp -s 192.168.1.0/24 --dport 2049 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp -s 192.168.1.0/24 --dport 24007 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp -s 192.168.1.0/24 --dport 38465:38469 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp -s 192.168.1.0/24 --dport 49152 -j ACCEPT
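The file above contains only rule fragments and is not loaded automatically by the iptables service; one possible way to apply it (an assumption on our part, your setup may differ) is to copy the six -A INPUT lines into /etc/sysconfig/iptables, above the final REJECT and COMMIT lines, then restart the service:
# vi /etc/sysconfig/iptables
# service iptables restart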
If you use firewalld, add the following rules on each node:
# firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="111" protocol="tcp" accept'
# firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="111" protocol="udp" accept'
# firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="2049" protocol="tcp" accept'
# firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="24007" protocol="tcp" accept'
# firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="38465-38469" protocol="tcp" accept'
# firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="49152" protocol="tcp" accept'
You will need to add a port for each additional brick. Since we use only one brick, 49152 is sufficient.
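Keep in mind that rules added with --permanent only take effect after a reload. As an illustration only, opening the port for a hypothetical second brick and reloading would look like:
# firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="192.168.1.0/24" port port="49153" protocol="tcp" accept'
# firewall-cmd --reload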
Make sure each node name resolves properly in your DNS setup and probe each node:
[root@apollo ~]# gluster peer probe chronos
peer probe: success.
[root@chronos ~]# gluster peer probe apollo
peer probe: success.
On
apollo only, run the following command to create the replication volume:
[root@apollo ~]# gluster volume create gvol0 replica 2 {apollo,chronos}:/mnt/gvol0/brick0/brick
volume create: gvol0: success: please start the volume to access data
Breaking down the above command: we told GlusterFS to create a replica volume, which keeps a copy of the data on two bricks at any given time. Since we only have two bricks, each server houses a full copy of the data. Lastly, we specified which nodes and bricks to use.
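For comparison only, with four nodes (the two extra host names below are purely hypothetical) the same syntax would create a distributed-replicate volume, where each pair of bricks holds one copy of a subset of the data:
[root@apollo ~]# gluster volume create gvol0 replica 2 {apollo,chronos,castor,pollux}:/mnt/gvol0/brick0/brick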
Verify the volume information and start the volume:
[root@apollo ~]# gluster volume info
Volume Name: gvol0
Type: Replicate
Volume ID: 2b9c2607-9569-48c3-9138-08fb5d8a213f
Status: Created
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: apollo:/mnt/gvol0/brick0/brick
Brick2: chronos:/mnt/gvol0/brick0/brick
[root@apollo ~]# gluster volume start gvol0
volume start: gvol0: success
We are done with the server setup; let's proceed to the GlusterFS replication.
Replication Setup
We will use /var/www/html as the replicated directory across the two nodes. Please make sure the directory does not contain any files. Once the directory is mounted as a GlusterFS volume, any files previously present in it will no longer be available.
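If the directory already holds data you want to keep, a simple workaround (a sketch, using /root/html-backup as an arbitrary temporary location) is to copy the files aside first:
[root@apollo ~]# cp -a /var/www/html /root/html-backup
Once the GlusterFS mount below is in place, copy the saved files back with cp -a /root/html-backup/. /var/www/html/ so they get replicated to both nodes.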
Execute the following commands to mount /var/www/html as a GlusterFS volume:
[root@apollo ~]# install -d -m 0755 /var/www/html
[root@apollo ~]# cat >> /etc/fstab << EOF
apollo:/gvol0 /var/www/html glusterfs defaults 0 0
EOF
[root@apollo ~]# mount -a
[root@apollo ~]# mount -l -t fuse.glusterfs
apollo:/gvol0 on /var/www/html type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
[root@chronos ~]# install -d -m 0755 /var/www/html
[root@chronos ~]# cat >> /etc/fstab << EOF
chronos:/gvol0 /var/www/html glusterfs defaults 0 0
EOF
[root@chronos ~]# mount -a
[root@chronos ~]# mount -l -t fuse.glusterfs
chronos:/gvol0 on /var/www/html type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)
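Each node mounts the volume from itself, so a node can still mount its own copy if the other node is down. If you ever mount the volume from a remote node instead, you may want to look at the backup volfile server mount option (backupvolfile-server on older releases, backup-volfile-servers on newer ones); a hypothetical fstab entry would look like:
apollo:/gvol0 /var/www/html glusterfs defaults,backupvolfile-server=chronos 0 0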
List the GlusterFS pool to check for any node connectivity issues:
[root@apollo ~]# gluster pool list
UUID Hostname State
5cb470ae-3c88-46fa-8cb9-2dd29d26e104 chronos Connected
188975c7-6d69-472a-b421-641286551d28 localhost Connected
To test the GlusterFS replication, we installed Nginx on both nodes, then created a file on apollo and changed its permissions from chronos:
[root@apollo ~]# ls -lah /var/www/html
total 8K
drwxr-xr-x. 3 root root 4.0K Sep 9 19:38 .
drwxr-xr-x. 3 root root 4.0K Sep 9 19:38 ..
[root@apollo ~]# yum --enablerepo=axivo install nginx
[root@chronos ~]# yum --enablerepo=axivo install nginx
[root@apollo ~]# ls -lah /var/www/html
total 12K
drwxr-xr-x. 3 root root 4.0K Sep 9 20:01 .
drwxr-xr-x. 3 root root 4.0K Sep 9 19:38 ..
-rw-r--r--. 1 root root 535 Oct 11 2009 404.html
-rw-r--r--. 1 root root 543 Oct 11 2009 50x.html
-rw-r--r--. 1 root root 198 May 6 2006 favicon.ico
-rw-r--r--. 1 root root 528 Oct 11 2009 index.html
-rw-r--r--. 1 root root 377 May 6 2006 nginx.gif
[root@apollo ~]# vi /var/www/html/info.php
[root@chronos ~]# chown nginx /var/www/html/info.php
[root@apollo ~]# cat /var/www/html/info.php
<?php
phpinfo();
The file was instantly replicated on
chronos, with identical content and permissions on both nodes:
[root@apollo ~]# ls -lah /var/www/html
total 13K
drwxr-xr-x. 3 root root 4.0K Sep 9 20:06 .
drwxr-xr-x. 3 root root 4.0K Sep 9 19:38 ..
-rw-r--r--. 1 root root 535 Oct 11 2009 404.html
-rw-r--r--. 1 root root 543 Oct 11 2009 50x.html
-rw-r--r--. 1 root root 198 May 6 2006 favicon.ico
-rw-r--r--. 1 root root 528 Oct 11 2009 index.html
-rw-r--r--. 1 nginx root 18 Sep 9 20:06 info.php
-rw-r--r--. 1 root root 377 May 6 2006 nginx.gif
[root@chronos ~]# cat /var/www/html/info.php
<?php
phpinfo();
You are currently running a high-performance, scalable replication system.
Troubleshooting
The logs are the best place to start your troubleshooting; examine any errors reported in the /var/log/glusterfs/var-www-html.log file. Please don't turn off your firewall just because you think it is blocking your setup; the firewall adds an important security layer and should never be disabled. Instead, study the logs and find the source of the problem.
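Two commands that usually help pinpoint replication problems (the log file name mirrors the mount point, with slashes replaced by dashes) are:
# tail -n 50 /var/log/glusterfs/var-www-html.log
# gluster volume heal gvol0 info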
An easy way to determine which ports GlusterFS uses is to run a volume status check:
# gluster volume status
Status of volume: gvol0
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick apollo:/mnt/gvol0/brick0/brick 49152 Y 1220
Brick chronos:/mnt/gvol0/brick0/brick 49152 Y 1363
NFS Server on localhost 2049 Y 1230
Self-heal Daemon on localhost N/A Y 1235
NFS Server on chronos 2049 Y 1371
Self-heal Daemon on chronos N/A Y 1375
Task Status of Volume gvol0
------------------------------------------------------------------------------
There are no active volume tasks
You can also run netstat -tulpn | grep gluster to further examine the ports in use.
This tutorial covered only a small part of GlusterFS's capabilities. Rushing through it without proper understanding, or without reading the
documentation,
will result in failure. Once you understand how GlusterFS works, you are welcome to ask any related questions in our support forums.