MySQL Replication - Percona

MySQL Replication By Stephane Combaudon, Aurimas Mikalauskas and Muhammad Irfan

MySQL's built-in replication is the foundation for building large, high-performance applications on top of MySQL using the "scale-out" architecture. MySQL replication enables DBAs to configure one or more server as replicas of another server - keeping the data synced with the master copy. This Percona eBook answers the questions: * "How does MySQL Replication really work?" * "How to identify and cure MySQL replication slave lag" * "GTIDs in MySQL 5.6: New replication protocol; new ways to break replication"

Copyright © 2006-2014 Percona LLC

MySQL Replication

Table of Contents Chapter 1: How does MySQL Replication really work? Chapter 2: How to identify and cure MySQL replication slave lag Chapter 3: GTIDs in MySQL 5.6: New replication protocol; new ways to break replication

3 6 11

About Percona Percona was founded in August 2006 and now employs a global network of experts with a staff of more than 120 people. Our customer list is large and diverse, including Fortune 50 corporations, popular websites, and small startups. We have over 2,000 customers and, although we do not reveal all of their names, chances are we're working with nearly every large MySQL user you've heard about. To put Percona's MySQL expertise to work for you, please contact us.

Is this an emergency? Get immediate assistance from Percona Support 24/7. Click here Skype: oncall.percona GTalk: [email protected] AIM (AOL Instant Messenger): oncallpercona Telephone direct-to-engineer: +1-877-862-4316 or UK Toll Free: +44-800-088-5561 Telephone to live operator: +1-888-488-8556 Customer portal: https://customers.percona.com/

Copyright © 2006-2014 Percona LLC

MySQL Replication Chapter 1: How does MySQL Replication

How does MySQL Replication really work? By Aurimas Mikalauskas While Percona experts have written in-depth on MySQL replication on our MySQL Performance Blog, such as on replication being single-threaded, on semi-synchronous replication or on estimating replication capacity, I don’t think we’ve ever covered the very basics of how MySQL replication really works. That’s one of the main reasons for publishing this eBook on the subject. Of course, there are many aspects of MySQL replication, but my main focus will be the logistics – how replication events are written on the master, how they are transferred to the replication slave and then how they are applied there. Note that this is NOT a HOWTO setup replication, but rather a type of thing.

Replication events I say replication events in this article because I want to avoid discussion about different replication formats. These are covered pretty well . Put simply, the events can be one of two types: Statement based – in which case these are write queries Row based – in this case these are changes to records, sort of row diffs if you will But other than that, I won’t be going back to differences in replication with different replication formats, mostly because there’s very little that’s different when it comes to transporting the data changes.

On the master So now let me start with what is happening on the master. For replication to work, first of all master needs to be writing replication events to a special log called binary log. This is usually very lightweight activity (assuming events are not synchronized to disk), because writes are buffered and because they are sequential. The binary log file stores data that replication slave will be reading later. Whenever a replication slave connects to a master, master creates a new thread for the connection (similar to one that’s used for just about any other server client) and then it does whatever the client – replication slave in this case – asks. Most of that is going to be (a) feeding replication slave with events from the binary log and (b) notifying slave about newly written events to its binary log. Slaves that are up to date will mostly be reading events that are still cached in OS cache on the master, so there is not going to be any physical disk reads on the master in order to feed binary log

3


events to slave(s). However, when you connect a replication slave that is few hours or even days behind, it will initially start reading binary logs that were written hours or days ago – master may no longer have these cached, so disk reads will occur. If master does not have free IO resources, you may feel a bump at that point.

On the replica Now let’s see what is happening on the slave. When you start replication, two threads are started on the slave: 1. IO thread This process called IO thread connects to a master, reads binary log events from the master as they come in and just copies them over to a local log file called relay log. That’s all. Even though there’s only one thread reading binary log from the master and one writing relay log on the slave, very rarely copying of replication events is a slower element of the replication. There could be a network delay, causing a steady delay of few hundred milliseconds, but that’s about it. If you want to see where IO thread currently is, check the following in “show slave statusG”: Master_Log_File – last file copied from the master (most of the time it would be the same as last binary log written by a master) Read_Master_Log_Pos – binary log from master is copied over to the relay log on the slave up until this position. And then you can compare it to the output of “show master statusG” from the master. 2. SQL thread The second process – SQL thread – reads events from a relay log stored locally on the replication slave (the file that was written by IO thread) and then applies them as fast as possible. This thread is what people often blame for being single-threaded. Going back to “show slave statusG”, you can get the current status of SQL thread from the following variables: Relay_Master_Log_File – binary log from master, that SQL thread is “working on” (in reality it is working on relay log, so it’s just a convenient way to display information) Exec_Master_Log_Pos – which position from master binary log is being executed by SQL thread.

4


Replication lag Now I want to briefly touch the subject of replication lag in this context. When you are dealing with replication lag, first thing you want to know is which of the two replication threads is behind. Most of the time it will be the SQL thread, still it makes sense to double check. You can do that by comparing the replication status variables mentioned above to the master binary log status from the output of “show master statusG” from the master. If it happens to be IO thread, which, as I mentioned many times already, is very rare, one thing you may want to try to get that fixed is . Otherwise, if you are sure it is SQL thread, then you want to understand what is the reason and that you can usually observe by vmstat. Monitor server activity over time and see if it is “r” or “b” column that is “scoring” most of the time. If it is “r”, replication is CPU-bound, otherwise – IO. If it is not conclusive, mpstat will give you better visibility by CPU thread. Note this assumes that there is no other activity happening on the server. If there is some activity, then you may also want to look at diskstats or even do a query review for SQL thread to get a good picture. If you find that replication is CPU bound, this maybe very helpful. If it is IO bound, then fixing it may not be as easy (or rather, as cheap). Let me explain. If replication is IO bound, most of the time that means that SQL thread is unable to read fast enough because reads are single threaded. Yes, you got that right – it is reads that are limiting replication performance, not writes. Let me explain this further. Assume you have a RAID10 with a bunch of disks and write-back cache. Writes, even though they are serialized, will be fast because they are buffered in the controller cache and because internally RAID card can parallelize writes to disks. Hence replication slave with similar hardware can write just as fast as master can. Now Reads. When your workset does not fit in memory, then the data that is about to get modified is going to have to be read from disk first and this is where it is limited by the single-threaded nature of the replication, because one thread will only ever read from one disk at a time. That being said, one solution to fix IO-bound replication is to increase the amount of memory so working set fits in memory. Another – get IO device that can do much more IO operations per second even with a single thread – fastest traditional disks can do up to 250 iops, SSDs – in the order of 10,000 iops. In Chapter Two, Percona support engineer Muhammad Irfan will talk about identifying and curing MySQL replication slave lag. 5

MySQL Replication Chapter 2: How to identify and cure MySQL replication slave lag

How to identify and cure MySQL replication slave lag By Muhammad Irfan

Here on the Percona MySQL Support team, we often see issues where a customer is complaining about replication delays – and many times the problem ends up being tied to MySQL replication slave lag. This of course is nothing new for MySQL users and we’ve had a few posts here on the MySQL Performance Blog on this topic over the years (two particularly popular post in the past were: “Reasons for MySQL Replication Lag” and “Managing Slave Lag with MySQL Replication,” both by Percona CEO Peter Zaitsev). In this chapter, however, I will share some new ways of identifying delays in replication – including possible causes of lagging slaves – and how to cure this problem. How to identify Replication Delay MySQL replication works with two threads, IO_THREAD & SQL_THREAD. IO_THREAD connects to a master, reads binary log events from the master as they come in and just copies them over to a local log file called relay log. On the other hand, SQL_THREAD reads events from a relay log stored locally on the replication slave (the file that was written by IO thread) and then applies them as fast as possible. Whenever replication delays, it’s important to discover first whether it’s delaying on slave IO_THREAD or slave SQL_THREAD. Normally, I/O thread would not cause a huge replication delay as it is just reading the binary logs from the master. However, It depends on the network connectivity, network latency… how fast is that between the servers. The Slave I/O thread could be slow because of high bandwidth usage. Usually, when the slave IO_THREAD is able to read binary logs quickly enough it copies and piles up the relay logs on the slave – which is one indication that the slave IO_THREAD is not the culprit of slave lag. On the other hand, when the slave SQL_THREAD is the source of replication delays it is probably because of queries coming from the replication stream are taking too long to execute on the slave. This is sometimes because of different hardware between master/slave, different schema indexes, workload. Moreover, the slave OLTP workload sometimes causes replication delays because of locking. For instance, if a long-running read against a MyISAM table blocks the SQL thread, or any transaction against an InnoDB table creates an IX lock and blocks DDL in the SQL thread. Also, take into account that slave is single threaded prior to MySQL 5.6, which would be another reason for delays on the slave SQL_THREAD. Let me show you via master status/slave status example to identify either slave is lagging on slave IO_THREAD or slave SQL_THREAD on the following page.

6


7


This clearly suggests that the slave IO_THREAD is lagging and obviously because of that the slave SQL_THREAD is lagging, too, and it yields replication delays. As you can see the Master log file is mysql-bin.018196 (File parameter from master status) and slave IO_THREAD is on mysql-bin.018192 (Master_Log_File from slave status) which indicates slave IO_THREAD is reading from that file, while on master it’s writing on mysql-bin.018196, so the slave IO_THREAD is behind by 4 binlogs. Meanwhile, the slave SQL_THREAD is reading from same file i.e. mysql-bin.018192 (Relay_Master_Log_File from slave status) This indicates that the slave SQL_THREAD is applying events fast enough, but it’s lagging too, which can be observed from the difference between Read_Master_Log_Pos & Exec_Master_Log_Pos from show slave status output. You can calculate slave SQL_THREAD lag from Read_Master_Log_Pos – Exec_Master_Log_Pos in general as long as Master_Log_File parameter output from show slave status and Relay_Master_Log_File parameter from show slave status output are the same. This will give you rough idea how fast slave SQL_THREAD is applying events. As I mentioned above, the slave IO_THREAD is lagging as in this example then off course slave SQL_THREAD is behind too. You can read detailed description of show slave status output fields Also, the Seconds_Behind_Master parameter shows a huge delay in seconds. However, this can be misleading, because it only measures the difference between the timestamps of the relay log most recently executed, versus the relay log entry most recently downloaded by the IO_THREAD. If there are more binlogs on the master, the slave doesn’t figure them into the calculation of Seconds_behind_master. You can get a more accurate measure of slave lag using pt-heartbeat from Percona Toolkit. So, we learned how to check replication delays – either it’s slave IO_THREAD or slave SQL_THREAD. Now, let me provide some tips and suggestions for what exactly causing this delay. Tips and Suggestions What Causing Replication Delay & Possible Fixes Usually, the slave IO_THREAD is behind because of slow network between master/slave. Most of the time, enabling helps to mitigate slave IO_THREAD lag. One other suggestion is to disable binary logging on slave as it’s IO intensive too unless you required it for point in time recovery. To minimize slave SQL_THREAD lag, focus on query optimization. My recommendation is to enable the configuration option so that the queries executed by slave that take more than will be logged to the slow log. To gather more information about query performance, I would also recommend setting the configuration option log_slow_verbosity to “full”. This way we can see if there are queries executed by slave SQL_thread that are taking long time to complete. You can follow my previous post about how to enable slow query log for specific time period with mentioned options here. And as a reminder, log_slow_slave_statements as variable were first introduced in Percona Server 5.1 which is now part of vanilla MySQL from version 5.6.11 In upstream version of MySQL Server log_slow_slave_statements were introduced as command line option. Details can be found here while log_slow_verbosity is Percona Server specific feature. 8


One other reason for delay on the slave SQL_THREAD if you use row-based binlog format is that if any database table is missing a primary key or unique key then it will scan all rows of the table for DML on slave and causes replication delays. Make sure all your tables have a primary key or unique key. Check this bug report for details. You can use the query below on the slave to identify which database tables are missing primary or unique keys.

One improvement is made for this case in MySQL 5.6, where in memory hash is used comes to the rescue. Note that Seconds_Behind_Master is not updated while we read huge RBR event, So, “lagging” may be related to just that – we had not completed reading of the event. For example, in row based replication huge transactions may cause delay on slave side e.g. if you have 10 million rows table and you do “DELETE FROM table WHERE id 5M rows will be sent to slave, each row separately which will be painfully slow. So, if you have to delete oldest rows time to time from huge table using Partitioning might be good alternative for this for some kind of workloads where instead using DELETE use DROP old partition may be good and only statement is replicated because it will be DDL operation. To explain it better, let suppose you have partition1 holding rows of ID’s from 1 to 1000000, partition2 – ID’s from 1000001 to 2000000 and so on so instead of deleting via statement “DELETE FROM table WHERE ID /dev/null mysqld_alive=$? if [[ $mysqld_alive == 0 ]] then seconds_behind_master=$(mysql $EXT_ARGV -e "show slave status" --vertical | grep Seconds_Behind_Master | awk '{print $2}') echo $seconds_behind_master else echo 1 fi } # Uncomment below to test that trg_plugin function works as expected #trg_plugin ------- -- That's the pt-plug.sh file you would need to create and then use it as below with pt-stalk: $ /usr/bin/pt-stalk --function=/root/pt-plug.sh --variable=seconds_behind_master --threshold=300 --cycles=60 [email protected] --log=/root/pt-stalk.log --pid=/root/pt-stalk.pid --daemonize You can adjust the threshold, currently its 300 seconds, combining that with –cycles, it means that if seconds_behind_master value is >= 300 for 60 seconds or more then pt-stalk will start capturing data. Adding –notify-by-email option will notify via email when pt-stalk captures data. You can adjust the pt-stalk thresholds accordingly so that’s how it triggers to collect diagnostic data during problem.

9


Conclusion A lagging slave is a tricky problem but a common issue in MySQL replication. I’ve tried to cover most aspects of replication delays in this post. Please share in the comments section if you know of any other reasons for replication delay.

10

MySQL Replication Chapter 3: GTIDs in MySQL 5.6: New replication protocol; new ways to break replication

GTIDs in MySQL 5.6: New replication protocol; new ways to break replication By Stephane Combaudon One of the MySQL 5.6 features many people are interested in is Global Transactions IDs (. This is for a good reason: Reconnecting a slave to a new master has always been a challenge while it is so trivial when GTIDs are enabled. However, using GTIDs is not only about replacing good old binlog file/position with unique identifiers, it is also using a new replication protocol. And if you are not aware of it, it can bite.

Replication protocols: old vs new The old protocol is pretty straightforward: the slave connects to a given binary log file at a specific offset, and the master sends all the transactions from there. The new protocol is slightly different: the slave first sends the range of GTIDs it has executed, and then the master sends every missing transaction. It also guarantees that a transaction with a given GTID can only be executed once on a specific slave. In practice, does it change anything? Well, it may change a lot of things. Imagine the following situation: you want to start replicating from trx 4, but trx 2 is missing on the slave for some reason.

With the old replication protocol, trx 2 will never be executed while with the new replication protocol, it WILL be executed automatically. Here are 2 common situations where you can see the new replication protocol in action.

Skipping transactions

11


Skipping transactions It is well known that the good old SET GLOBAL sql_slave_skip_counter = N is no longer supported when you want to skip a transaction and GTIDs are enabled. Instead, to skip the transaction with GTID XXX:N, you have to inject an empty transaction:

Why can’t we use sql_slave_skip_counter? Because of the new replication protocol!

Imagine that we have 3 servers like the picture below:

Let’s assume that sql_slave_skip_counter is allowed and has been used on S2 to skip trx 2. What happens if you make S2 a slave of S1?

12


Both servers will exchange the range of executed GTIDs, and S1 will realize that it has to send trx 2 to S2. Two options then: If trx 2 is still in the binary logs of S1, it will be sent to S2, and the transaction is no longer skipped. If trx 2 no longer exists in the binary logs of S1, you will get a replication error. This is clearly not safe, that’s why sql_slave_skip_counter is not allowed with GTIDs. The only safe option to skip a transaction is to execute a fake transaction instead of the real one.

Errant transactions If you execute a transaction locally on a slave (called errant transaction in the MySQL documentation), what will happen if you promote this slave to be the new master? With the old replication protocol, basically nothing (to be accurate, data will be inconsistent between the new master and its slaves, but that can probably be fixed later). With the new protocol, the errant transaction will be identified as missing everywhere and will be automatically executed on failover, which has the potential to break replication. Let’s say you have a master (M), and 2 slaves (S1 and S2). Here are 2 simple scenarios where reconnecting slaves to the new master will fail (with different replication errors):

# Scenario 1

13


# Scenario 2

As you can understand, errant transactions should be avoided with GTID-based replication. If you need to run a local transaction, your best option is to disable binary logging for that specific statement:

Conclusion GTIDs are a great step forward in the way we are able to reconnect replicas to other servers. But they also come with new operational challenges. If you plan to use GTIDs, make sure you correctly understand the new replication protocol, otherwise you may end up breaking replication in new and unexpected ways. I’ll do more exploration about errant transactions in a future post on Percona's MySQL Performance Blog. I invite you to join me there for a conversation on the topic.

14


About the authors During his career, both as a system administrator and a performance engineer, Percona architect Aurimas Mikalauskas has become familiar with many different technologies and how to apply them. He always knows the right tools for the task. In addition to MySQL performance optimization, his key areas of expertise include: MySQL High Availability, full text search, web and content cache servers, and MySQL data recovery. Muhammad Irfan is vastly experienced in the LAMP Stack. He joined Percona in December 2012 as Support Engineer. Prior to joining Percona, he worked in the role of MySQL DBA & LAMP Administrator, maintained high traffic websites, and worked as a Consultant. His professional interests focus on MySQL scalability and on performance optimization. In his spare time, he normally spends time with family and friends and loves to play and watch cricket. Percona architect Stéphane Combaudon joined Percona in July 2012 after working as a MySQL DBA for leading French companies. In real life, he lives in Paris with his wife and their twin daughters. When not in front of a computer or not spending time with his family, he likes playing chess and hiking.

Visit Percona's ever-growing MySQL eBook library for more great content at http://www.percona.com/resources/mysql-ebooks

Copyright © 2006-2014 Percona LLC Powered by TCPDF (www.tcpdf.org)