In this article, we explain how to use some of the advanced features in NGINX Plus and NGINX Open Source to load balance MySQL™ database servers in a Galera cluster. We explore one approach to eliminating replication problems in a cluster, the advantages of splitting reads and writes across load‑balanced database servers, and how to configure application health checks to confirm that a database server is actually responding to queries.
(Both NGINX and NGINX Plus support the features we’re discussing, but for ease of reading we refer to NGINX Plus throughout.)
Setting Up a Galera Cluster of MySQL Database Servers
This blog builds on our previous blog about MySQL load balancing with NGINX Plus and assumes you have a working knowledge of MySQL, Galera Cluster, and TCP load balancing with NGINX Plus.
We followed the DigitalOcean guide to set up three MariaDB database servers as a Galera cluster, but you can set up an equivalent cluster however you choose, and substitute MySQL for MariaDB. (MariaDB is an open source database which the vendor states is “a binary drop‑in replacement for MySQL”. For our purposes they are completely equivalent, but if you’re interested you can read about the differences between them.)
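For reference, the Galera‑related settings on each node look roughly like this (a sketch of a DigitalOcean‑style MariaDB setup; the file path, names, and addresses are illustrative and will differ in your environment):

# /etc/mysql/conf.d/galera.cnf (illustrative)
[mysqld]
# settings Galera requires of MySQL/MariaDB itself
binlog_format            = ROW
default-storage-engine   = innodb
innodb_autoinc_lock_mode = 2
bind-address             = 0.0.0.0

# Galera provider and cluster membership
wsrep_on              = ON
wsrep_provider        = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name    = "galera_cluster"
wsrep_cluster_address = "gcomm://db-node01.example.com,db-node02.example.com,db-node03.example.com"
wsrep_sst_method      = rsync

# this node's identity (varies per node)
wsrep_node_name    = "db-node01"
wsrep_node_address = "db-node01.example.com"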
We also configured NGINX Plus to proxy incoming requests from the application to the database cluster.
Configuring Simple Load Balancing
For a simple kind of load balancing, we have NGINX Plus forward all queries to the designated primary in the cluster. We’re doing TCP load balancing, so we put the directives in the stream context:
stream {
    upstream galera {
        # default to the primary; fall back to the second node only when the primary is unavailable
        server db-node01.example.com:3306;
        server db-node02.example.com:3306 backup;
        server db-node03.example.com:3306 down;  # kept out of rotation entirely
    }

    server {
        listen 3306;
        proxy_pass galera;
        proxy_connect_timeout 1s;
    }
}
Here we’re using a three‑server cluster with one designated as the primary, but the basic technique can be expanded to very large clusters.
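With this configuration in place, the application connects to the NGINX Plus host exactly as it would to a single MySQL server; for example, from the application host (the hostname and credentials here are hypothetical):

user@app-01:$ mysql -h nginx-lb.example.com -P 3306 -u appuser -p appdb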
This configuration confers a number of advantages when working with the cluster.
- We can conceive of the cluster as a single database for the purpose of development, while still taking advantage of the replication benefits of having a cluster.
- We no longer have to worry about parallel writes and collisions happening between the multiple primaries in the cluster. In the previous blog, all servers acted as primary and we had an issue with parallel updates to different servers leading to potentially erroneous data. Even though the cluster was behaving as directed, when a client tried to insert multiple values at once on multiple servers the results were counterintuitive.
- We have eliminated replication lag, where a high volume of traffic relative to the network speed between servers causes a noticeable delay between writing data on one server and being able to read it on a replicating server. In other words, if you read data from one server immediately after it’s written to the database by another server, you might not see the update. We avoid this in our setup because the load balancer directs all traffic to the single primary MySQL server.
Splitting Reads and Writes
While writes are best directed at the primary server in a cluster, it’s often desirable to direct read queries to the secondary members of the cluster, which aren’t doing as much of the heavy lifting. Setting this up is a bit more involved – the NGINX Plus load balancer doesn’t parse the incoming SQL queries, so the reads and writes have to be distinguished in the application tier instead.
The easiest way to split reads and writes is to set up separate load balancers for them, each listening on a different port. The application directs all transactions containing just SELECT queries to the read load balancer and all the transactions with INSERT, UPDATE, or DELETE queries to the write load balancer. Here’s a sample config file for this setup.
stream {
    # Separate TCP load balancers for reads and writes

    upstream galera_read {
        # the shared memory zone is required for the health checks we configure later
        zone backend 64k;
        server db-node02.example.com:3306;
        server db-node03.example.com:3306;
    }

    upstream galera_write {
        zone backend;   # shares the zone defined above
        server db-node01.example.com:3306;
        server db-node02.example.com:3306 backup;
        server db-node03.example.com:3306 down;
    }

    server {
        listen 3308;
        status_zone tcp_server;
        proxy_pass galera_read;
        proxy_connect_timeout 1s;
    }

    server {
        listen 3309;
        status_zone tcp_server;
        proxy_pass galera_write;
        proxy_connect_timeout 1s;
    }
}
We then have to rework our application code to direct requests to the read port (here, 3308) or write port (3309) as appropriate. This can be a fair amount of work, but many common database libraries (such as Django’s and Hibernate) provide facilities for making your application more database‑aware in the context of multiple databases.
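As one concrete illustration, a minimal Django setup might define one database alias per load‑balancer port and a router that sends reads to one and writes to the other (a sketch only; the hostnames, credentials, and module path are hypothetical):

# settings.py: two aliases for the same logical database, one per load-balancer port
DATABASES = {
    'default': {                                 # writes go to the write load balancer
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'appdb',
        'HOST': 'nginx-lb.example.com',
        'PORT': '3309',
        'USER': 'appuser',
        'PASSWORD': 'secret',
    },
    'replica': {                                 # reads go to the read load balancer
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'appdb',
        'HOST': 'nginx-lb.example.com',
        'PORT': '3308',
        'USER': 'appuser',
        'PASSWORD': 'secret',
    },
}
DATABASE_ROUTERS = ['myapp.routers.ReadWriteRouter']

# myapp/routers.py: Django consults the router to pick a database alias for each query
class ReadWriteRouter:
    def db_for_read(self, model, **hints):
        return 'replica'     # SELECTs go through port 3308

    def db_for_write(self, model, **hints):
        return 'default'     # INSERT/UPDATE/DELETE go through port 3309

    def allow_relation(self, obj1, obj2, **hints):
        return True          # both aliases point at the same cluster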
It’s important to note that while the intent of the application architecture is for reads to go through the read load balancer and writes through the write load balancer, nothing prevents reading on the write port and vice versa. To prevent that, we could separate the load balancers on different hosts and grant them different permissions so that the read load balancer would be restricted by the database to just reads, but we’re not showing that configuration here.
Configuring Application Health Checks
When it comes to application health checks with SQL server load balancing, you have a lot of options. The simplest is to verify that an upstream database server accepts the connection when NGINX Plus forwards one to it; this is the default behavior in both NGINX Plus and NGINX. The proxy can’t speak the database protocol natively, but this quick check at least tells you the servers are still there.
This simple kind of health check is sufficient in some cases, but it has an awkward drawback – the NGINX Plus load balancer continues to send requests to an upstream database server that is accepting new connections even if it can’t actually service requests (for example, when a replication issue between the database servers prevents the server from performing writes).
In production, to make fuller use of NGINX Plus’s health‑check capabilities, you need a way to turn the database’s response into something meaningful. This is usually done by running a process on the database server, on an open port, that checks the database’s health and responds to HTTP requests from the load balancer with the database’s status. You can either write your own script that runs a simple SELECT query (or any database health check you choose) whenever a health‑check request comes in, or use this existing health‑check script, which performs a simple show databases query on the database nodes.
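If you do write your own, a minimal sketch might look like the following: a tiny HTTP service on the database node that runs a cheap query and answers 200 or 503 accordingly. (This is not the script linked above; the credentials and the PyMySQL dependency are assumptions.)

#!/usr/bin/env python3
# Minimal health-check responder: answers HTTP requests with 200 if a test
# query against the local database succeeds, 503 otherwise.
from http.server import BaseHTTPRequestHandler, HTTPServer
import pymysql   # assumed MySQL client library

def db_is_healthy():
    try:
        conn = pymysql.connect(host='127.0.0.1', user='health',
                               password='secret', connect_timeout=2)
        with conn.cursor() as cur:
            cur.execute('SELECT 1')   # any cheap query will do
            cur.fetchone()
        conn.close()
        return True
    except Exception:
        return False

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200 if db_is_healthy() else 503)
        self.end_headers()

if __name__ == '__main__':
    HTTPServer(('0.0.0.0', 9200), HealthHandler).serve_forever()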
For the test cluster, we installed the linked health‑check script from GitHub and set it up as a service running on port 9200 (its default port). Since the script responds to health‑check pings over HTTP, we confirm that it’s working correctly by checking the head of its HTTP response when the node is healthy and again when it is down (we’re showing just the first line of the curl output, with the status code):
user@dbnode-03:# curl -i http://localhost:9200
HTTP/1.1 200 OK
...
user@dbnode-03:# sudo service mysql stop
* Stopping MariaDB database server mysqld [ OK ]
user@dbnode-03:# curl -i http://localhost:9200
HTTP/1.1 503 Service Unavailable
...
The final step is to set up NGINX Plus to use port 9200 for the application health check. (The ability to direct health checks to a specific port was introduced in NGINX Plus Release 8.) It’s as simple as adding the port parameter to the health_check directive in the server block:
stream {
    upstream galera {
        # ...
    }

    server {
        listen 3306;
        proxy_pass galera;
        proxy_connect_timeout 1s;
        health_check port=9200;
    }
}
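Note that on its own, health_check port=9200 simply verifies that the health‑check service accepts a TCP connection. If you also want NGINX Plus to look for the 200 status code the script returns, you can add a match block, roughly as follows (a sketch; the match name is arbitrary, and remember the upstream needs a zone directive for active health checks):

stream {
    # ...

    match db_ok {
        send "GET / HTTP/1.0\r\nHost: localhost\r\n\r\n";
        expect ~ "200 OK";
    }

    server {
        listen 3306;
        proxy_pass galera;
        proxy_connect_timeout 1s;
        health_check port=9200 match=db_ok;
    }
}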
If you’re using NGINX rather than NGINX Plus, you can use a clever workaround to close the MySQL port to outside traffic whenever the server is unavailable. This allows you to get the same in‑depth health check as with NGINX Plus, but in this case the health check is sent to the same port as regular requests to the service.
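We don’t reproduce that workaround here, but the general idea can be illustrated with a small script on each database node that opens or closes the MySQL port based on a local health check (an illustrative sketch only, not the specific workaround linked above; run it periodically, for example from cron):

#!/bin/sh
# If the local health-check service reports failure, reject external
# connections to MySQL so the load balancer marks this node as failed;
# remove the rule again once the node is healthy.
RULE="INPUT -p tcp --dport 3306 ! -s 127.0.0.1 -j REJECT"
if curl -sf http://localhost:9200 > /dev/null; then
    iptables -D $RULE 2>/dev/null || true
else
    iptables -C $RULE 2>/dev/null || iptables -I $RULE
fi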
Whichever method you use, when the health check returns an error response, the NGINX Plus load balancer automatically removes the database server from the pool until the server recovers. A common cause of database downtime is unwittingly giving the server too much work, for example an unexpectedly large join or a query that suddenly becomes slow. In this case the server usually comes back on its own, gradually passing a greater percentage of health checks, but it returns to full capacity sooner if it isn’t immediately hit with its full share of traffic again. For this scenario, NGINX Plus offers a way to slowly ramp up traffic to a server over a period of time after it recovers from failed health checks.
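This is the slow_start parameter to the server directive. For example, to give the read servers from the earlier configuration a 60‑second ramp‑up after they return to health (the duration here is just an illustration):

upstream galera_read {
    zone backend 64k;
    server db-node02.example.com:3306 slow_start=60s;
    server db-node03.example.com:3306 slow_start=60s;
}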
Conclusion
We’ve reviewed and expanded on some concepts in MySQL load balancing using a simple Galera cluster setup. Using NGINX Plus as the load balancer, we distributed load differently for reads and writes to make the best use of the cluster’s resources. We also looked at a few ways to do in‑depth health checks for the cluster so NGINX Plus can better manage the upstream resources it’s load balancing. The combination of NGINX Plus and Galera allows you to maximize your hardware resources while maintaining reliability.
To try out NGINX Plus for MySQL load balancing in your own environment, start your free 30-day trial today or contact us to discuss your use cases.