Newsfeeds

Planet MySQL - https://planet.mysql.com
  • MySQL @ FOSSASIA, 2018
    The FOSSASIA Summit is an annual conference on open technologies and society. The event offers lectures, workshops and various events on a multitude of topics. MySQL, the world's most popular open source database, has been registering its presence at FOSSASIA since 2014. Developers (like me), who work daily with the MySQL source code, go there and share their knowledge about new features and various other topics related to MySQL.

    Last year, in 2017, we had a full day dedicated to MySQL talks and workshops, attended by very serious users and customers of MySQL. We also had a 'birds-of-a-feather' session in which customers and users got a chance to have in-depth discussions directly with developers on various topics of interest, and it was very much appreciated and enjoyed by customers, users and developers alike.

    This year we are again ready to register our presence at the FOSSASIA 2018 summit, which is scheduled from 22nd March to 25th March in Singapore. We have a number of MySQL talks lined up (times tentative):
      1. Improved Error Logging in MySQL 8.0 (Praveenkumar H, 9:59 am)
      2. MySQL: Improving Connection Security (Harin, 10:29 am)
      3. MySQL Replication Performance Tuning (Venkatesh, 11 am)
      4. Enhanced XA Support for Replication in MySQL-5.7 (Nisha PG, 1:00 pm)
      5. The State of the Art on MySQL Group Replication (Hemant Dangi, 1:30 pm)
      6. What's new in MySQL Optimizer 8.0 (Chaithra Gopalareddy, 2:00 pm)
      7. Atomic DDL in MySQL 8.0 (Shipra Jain, 2:30 pm)
      8. MySQL for Distributed transaction and Usage of JSON as a fusion between SQL & NOSQL (Ajo Robert, 2:59 pm)
      9. MySQL Performance Schema - A great insight of running MySQL Server (Mayank Prasad, 3:30 pm)
      10. Histograms (Amit, 4:00 pm)

    All of the above talks are planned (tentatively) for 25th March, so be there and get the most out of MySQL @ FOSSASIA, 2018. See you there. :-)
    Conference details: https://2018.fossasia.org/ Complete schedule/Venue: TBD
    [PS: Keep watching this space for more details.]

  • Using dbdeployer in CI tests
    I was very pleased when Giuseppe Maxia (aka datacharmer) unveiled dbdeployer in his talk at the pre-FOSDEM MySQL day. The announcement came just at the right time. I wish to briefly describe how we use dbdeployer (work in progress).

    The case for gh-ost: A user opened an issue on gh-ost, and the user was using MySQL 5.5. gh-ost is being tested on 5.7, where the problem does not reproduce. A discussion with Gillian Gunson raised the concern of not testing on all versions. Can we run gh-ost tests for all MySQL/Percona/MariaDB versions? Should we? How easy would it be?

    gh-ost tests: gh-ost has three different test types:
    - Unit tests: plain golang logic tests which are very easy and quick to run.
    - Integration tests: the topic of this post, see below. Today these do not run as part of automated CI testing.
    - System tests: putting our production tables to the test, continuously migrating our production data on dedicated replicas and verifying that checksums are identical and data is intact.

    Unit tests already run as part of automated CI (every PR is subjected to those tests). System tests are clearly tied to our production servers. What's the deal with the integration tests?

    gh-ost integration tests: The gh-ost integration tests are a suite of scenarios which verify that gh-ost's operation is sound. These scenarios are mostly concerned with data types, special ALTER statements, etc. Is converting DATETIME to TIMESTAMP working properly? Are latin1 columns being updated correctly? How about renaming a column? Changing a PRIMARY KEY? Column reorder? 5.7 JSON values? And so on. Each test recreates the table, runs the migration, stops replication, checks the result, then resumes replication. The environment for these tests is a master-replica setup, where gh-ost migrates a table on the replica and can then checksum or compare both the original and the altered ghost table.

    We develop gh-ost internally at GitHub, but it's also an open source project. We have our own internal CI environment, but we also wish the public to have visibility into test failures (so that a user can submit a PR and get reliable automated feedback). We use Travis CI for the public facing tests. To run gh-ost's integration tests as described above as part of our CI tests, we should be able to:
    - Create a master/replica setup in CI.
    - Actually, create a master/replica setup in any CI, and namely in Travis CI.
    - Actually, create multiple master/replica setups, of varying versions and vendors, in any CI, including both our internal CI and Travis CI.

    I was about to embark on a MySQL Sandbox setup, which I was not keen on. But FOSDEM was around the corner and I had other things to complete beforehand. Lucky me, dbdeployer stepped in.

    dbdeployer: dbdeployer is a rewrite of, and a replacement for, MySQL Sandbox. I've been using MySQL Sandbox for many years, and my laptop is running two sandboxes at this very moment. But MySQL Sandbox has a few limitations or complications:
    - Perl. Versions of Perl. Dependencies of packages of Perl. I mean, it's fine, we can automate that.
    - Command line flag complexity: I always get lost in the complexity of the flags.
    - Get it right or prepare for battle: if you deployed something, but not the way you wanted, there are sometimes limbo situations where you cannot re-deploy the same sandbox again, or you have to start deleting files everywhere.
    - Deploy, not remove: adding a sandbox is one thing. How about removing it?

    dbdeployer is a golang rewrite, which solves the dependency problem.
    It ships as a single binary and nothing more is needed. It is simple to use. While it generates the equivalent of a MySQL Sandbox, it does so with fewer command line flags and less confusion. There's first class handling of the MySQL binaries: you unpack MySQL tarballs, and you can list what's available. You can then create sandbox environments: replication, standalone, etc. You can then delete those. It's pretty simple and I have not much more to add -- which is the best thing about it.

    So, with dbdeployer it is easy to create a master/replica. Something like:

      dbdeployer unpack path/to/5.7.21.tar.gz --unpack-version=5.7.21 --sandbox-binary ${PWD}/sandbox/binary
      dbdeployer replication 5.7.21 --nodes 2 --sandbox-binary ${PWD}/sandbox/binary --sandbox-home ${PWD}/sandboxes --gtid --my-cnf-options log_slave_updates --my-cnf-options log_bin --my-cnf-options binlog_format=ROW

    Where does it all fit in, and what about the MySQL binaries, though? Should dbdeployer be part of the gh-ost repo? And where does one get those MySQL binaries from? Are they to be part of the gh-ost repo? Aren't they a few GB to extract?

    Neither dbdeployer nor the MySQL binaries should be added to the gh-ost repo. And fortunately, Giuseppe also solved the MySQL binaries problem. The scheme I'm looking at right now is as follows:
    - A new public repo, gh-ost-ci-env, is created. This repo includes:
      - dbdeployer compiled binaries
      - Minimal MySQL tarballs for selected versions. Those tarballs are reasonably small: between 14MB and 44MB at this time.
    - gh-ost's CI runs git clone https://github.com/github/gh-ost-ci-env.git
    - gh-ost's CI sets up a master/replica sandbox.
    - Kick the tests.

    The above is a work in progress:
    - At this time it only runs a single MySQL version.
    - There is a known issue where, after a test, replication may take time to resume. Currently, on slower boxes (such as the Travis CI containers) this leads to failures.
    - Another concern I have at this time is build time. For a single MySQL version, it takes some 5-7 minutes on my local laptop to run all integration tests. It will be faster on our internal CI. It will be considerably slower on Travis CI; I can expect between 10m - 15m. Add multiple versions and we're looking at a 1hr build. Such long build times will affect our development and delivery times, so we will split them off the main build. I need to consider what the best approach is.

    That's all for now. I'm pretty excited about the potential of dbdeployer and will be looking into incorporating the same for orchestrator CI tests.
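
    To make the CI flow above more concrete, here is a minimal sketch of what such a build step could look like as a shell script. It only reuses the dbdeployer commands and the gh-ost-ci-env repository mentioned above; the directory layout inside that repository and the test entry point are my own illustrative assumptions, not the actual gh-ost CI configuration.

      #!/bin/bash
      # Hypothetical CI step: build a master/replica sandbox with dbdeployer and run
      # the integration tests against it. Paths inside gh-ost-ci-env and the test
      # entry point are illustrative assumptions.
      set -e
      MYSQL_VERSION=5.7.21

      # Fetch prebuilt dbdeployer binaries and minimal MySQL tarballs.
      git clone https://github.com/github/gh-ost-ci-env.git

      # Unpack the server binaries (same command as shown above).
      gh-ost-ci-env/bin/dbdeployer unpack gh-ost-ci-env/mysql-tarballs/${MYSQL_VERSION}.tar.gz \
        --unpack-version=${MYSQL_VERSION} --sandbox-binary ${PWD}/sandbox/binary

      # Create a one-master/one-replica sandbox with row-based binary logging.
      gh-ost-ci-env/bin/dbdeployer replication ${MYSQL_VERSION} --nodes 2 \
        --sandbox-binary ${PWD}/sandbox/binary --sandbox-home ${PWD}/sandboxes \
        --gtid --my-cnf-options log_slave_updates --my-cnf-options log_bin \
        --my-cnf-options binlog_format=ROW

      # Kick the integration tests against the sandbox (placeholder script name),
      # then tear the sandbox down (see the dbdeployer docs for the delete command).
      ./localtests/test.sh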

  • Exploring Amazon RDS Aurora: replica writes and cache chilling
    Our clients operate on a variety of platforms, and RDS (Amazon Relational Database Service) Aurora has received quite a bit of attention in recent times. On behalf of our clients, we look beyond the marketing and see what the technical architecture actually delivers. We will address specific topics in individual posts; this time we check out what the Aurora architecture means for write and caching behaviour (and thus performance).

    What is RDS Aurora?
    First of all, let's declare the baseline. MySQL Aurora is not a completely new RDBMS. It comprises a set of Amazon modifications on top of stock Oracle MySQL 5.6 and 5.7, implementing a different replication mechanism and some other changes/additions. While we have some information (for instance from the "deep dive" by AWS VP Anurag Gupta), the source code of the Aurora modifications is not published, so unfortunately it is not immediately clear how things are implemented. Any architecture requires choices to be made, trade-offs, and naturally these have consequences. Because we don't get to look inside the "black box" directly, we need to explore indirectly. We know how stock MySQL is architected, so by observing Aurora's behaviour we can try to derive how it is different and what it might be doing. Mind that this is equivalent to looking at a distant star, seeing a wobble, and deducing from the pattern that there must be one or more planets orbiting. It's an educated guess.

    For the sake of brevity, I have to skip past some aspects that can be regarded as "obvious" to someone with insight into MySQL's architecture. I might also defer explaining a particular issue in depth to a dedicated post on that topic. Nevertheless, please do feel free to ask "so why does this work in this way", or other similar questions – that'll help me check my logic trail and tune to the reader audience, as well as help create a clearer picture of the Aurora architecture.

    Instead of using the binary log, Aurora replication ties into the storage layer. It only supports InnoDB, and instead of doing disk reads/writes, the InnoDB I/O system talks to an Amazon storage API which delivers a shared/distributed storage that can work across multiple availability zones (AZs). Thus, a write on the master will appear on the storage system (which may or may not really be a filesystem). Communication between AZs is fairly fast (only 2-3 ms extra overhead, relative to another server in the same AZ), so clustering databases or filesystems across AZs is entirely feasible, depending on the commit mechanism (a two-phase commit architecture would still be relatively slow). We do multi-AZ clustering with Galera Cluster (Percona XtraDB Cluster or MariaDB Galera Cluster). Going multi-AZ is a good idea that provides resilience beyond a single data centre.

    So, imagine an individual instance in an Aurora setup as an EC2 (Amazon Elastic Computing) instance with MySQL using an SSD EBS (Amazon Elastic Block Storage) volume, where the InnoDB I/O threads interface more directly with the EBS API. The actual architecture might be slightly different still (more on that in a later post), but this rough description helps set up a basic idea of what a node might look like.
    Writes in MySQL
    In a regular MySQL setup, a few things happen on commit: the InnoDB log is written to and flushed, the binary log is written to (and possibly flushed), and the changed pages (data and indexes) in the InnoDB buffer pool are marked dirty, so a background thread knows they need to be written back to disk (this does not need to happen immediately). When a page is written to disk, it normally uses a "double-write" mechanism where first the original page is read and written to a scratch space, and then the new page is put in the original position. Depending on the filesystem and underlying storage (spinning disk, or other storage with a different block size from the InnoDB page size), this may be required to be able to recover from write failures.

    This does not translate into as many IOPS because, in practice, transaction commits are put together (for instance with MariaDB's group commit) and thus many commits that happen in a short space of time effectively only use a few I/Os for their log writes. With Galera Cluster, the local logs are written but not flushed, because the guaranteed durability is provided by other nodes in the cluster rather than by local persistence of the logfile.

    In Aurora, a commit has to send either the InnoDB log entries or the changed data pages to the storage layer; which one it is doesn't particularly matter. The storage layer has a "quorum set" mechanism to ensure that multiple nodes accept the new data. This is similar to Galera's "certification" mechanism that provides the "virtual synchrony". The Aurora "deep dive" talk claims that it requires many fewer IOPS for a commit; however, it appears they are comparing a worst-case plain MySQL scenario with an optimal Aurora environment. Very marketing.

    Aurora does not use the binary log, which does make one wonder about point-in-time recovery options. Of course, it is possible to recover to any point in time from an InnoDB snapshot + InnoDB transaction logs – this would require adding timestamps to the InnoDB transaction log format. While it is noted that the InnoDB transaction log is also backed up to S3, it doesn't appear to be used directly (so, only for recovery purposes then). After all, any changed page needs to be communicated to the other instances, so essentially all pages are always flushed (no dirty pages). When we look at the InnoDB stats in GLOBAL STATUS, we sometimes do see up to a couple of dozen dirty pages with Aurora, but their existence or non-existence doesn't appear to have any correlation with user-created tables and data.

    Where InnoDB gets its Speed
    [Figure: InnoDB rows and indexing]
    We all know that disk access is slow. In order for InnoDB to be fast, it is dependent on most active data being in the buffer pool. InnoDB does not care for local filesystem buffers – something is either in persistent storage, or in the buffer pool. In configurations, we prefer direct I/O so the system calls that do the filesystem I/O bypass the filesystem buffers and any related overhead. When a query is executed, any required page that's not yet in the buffer pool is requested to be loaded in the background. Naturally, this does slow down queries, which is why we preferably want all necessary pages to already be in memory. This applies to any type of query. In InnoDB, all data/indexes are structured in B+trees, so an INSERT has to be merged into a page and possibly causes pages to be split and other items shuffled so as to "re-balance" the tree. Similarly, a delete may cause page merges and a re-balancing operation.
    This way the depth of the tree is controlled, so that even for a billion rows you would generally see a depth of no more than 6-8 pages. That is, retrieving any row would only require a maximum of 6-8 page reads (potentially from disk). I'm telling you all this because, while most replication and clustering mechanisms essentially work with the buffer pool, Aurora replication appears to work against it. As I mentioned: choices have consequences (trade-offs). So, what happens?

    Aurora Replication
    When you do a write in MySQL which gets replicated through classic asynchronous replication, the slaves or replica nodes apply the row changes in memory. This means that all the data (which is stored with the PRIMARY KEY, in InnoDB) as well as any other indexes are updated, the InnoDB log is written, and the pages marked as dirty. It's very similar to what happens on the writer/master system, and thus the end result in memory is virtually identical. While Galera's cluster replication operates differently from the asynchronous mechanism shown in the diagram, the resulting caching (which pages are in memory) ends up similar.
    [Figure: MySQL Replication architecture]
    Not so with Aurora. Aurora replicates in the storage layer, so all pages are updated in the storage system but not in the in-memory InnoDB buffer pool. A secondary notification system between the instances ensures that cached InnoDB pages are invalidated. When you next run a query that needs any of those no-longer-valid cached pages, they will have to be re-read from the storage system. You can see a representation of this in the diagram below, indicating invalidated cache pages in different indexes; as shown, for INSERT operations you're likely to have pages higher up in the tree invalidated, and one sideways page change as well, because of the B+tree re-balancing.
    [Figure: Aurora replicated insert]

    The Chilling Effect
    We can tell the replica is reading from storage, because the same query is much slower than before we did the insert from the master instance. Note: this wasn't a matter of timing. Even if we waited slightly longer (to enable a possible background thread to refresh the pages), the post-insert query was just as slow. Interestingly, the invalidation process does not actually remove the pages from the buffer pool (that is, the number of pages in the buffer pool does not go down); however, the number of page reads does not go up either when a page is clearly re-read. Remember though that a status variable is just that: it has to be updated to be visible, and this simply means that the new functions Amazon implemented don't bother updating these status variables. Accidental omission or purposeful obscurity? Can't say. I will say that it's very annoying when server statistics don't reflect what's actually going on, as it makes the stats (and their analysis) meaningless. In this case, the picture looks better than it is.

    With each Aurora write (insert/update/delete), the in-memory buffer pool on replicas is "chilled". Unfortunately, it's not even just the one query on the replica that gets affected after a write. The primary key as well as the secondary indexes get chilled. If the initial query uses one particular secondary index, that index and the primary key will get warmed up again (at the cost of multiple storage system read operations), however the other secondary indexes are still chattering their teeth.
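
    If you want to observe this behaviour on your own cluster, a rough and purely illustrative way to do it is to time the same indexed query on a replica before and after a write on the writer instance. The hostnames, schema, query and credentials below are placeholders, not the setup used for this post:

      #!/bin/bash
      # Illustrative sketch: time an indexed query on an Aurora replica before and
      # after a write on the writer. All names and credentials are placeholders.
      REPLICA="mysql -h aurora-replica.example.com -u app -psecret appdb"
      WRITER="mysql -h aurora-writer.example.com -u app -psecret appdb"
      QUERY="SELECT SQL_NO_CACHE COUNT(*) FROM orders WHERE customer_id = 1234"

      # Warm the replica's buffer pool, then time the query.
      $REPLICA -e "$QUERY" > /dev/null
      echo "before write:"; time $REPLICA -e "$QUERY"

      # Write on the writer so that it touches the same index pages.
      $WRITER -e "INSERT INTO orders (customer_id, amount) VALUES (1234, 9.99)"
      sleep 5   # allow the invalidation notification to reach the replica

      # The same query should now be noticeably slower, since the invalidated pages
      # have to be re-read from the storage layer.
      echo "after write:"; time $REPLICA -e "$QUERY"

      # Buffer pool counters before/after; as noted above, on Aurora these do not
      # necessarily reflect the re-reads.
      $REPLICA -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'"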
    Being Fast on the Web
    In web applications (whether websites or web services for mobile apps), typically the most recently added data is the most likely to be read again soon. This is why InnoDB's buffer pool is normally very effective: frequently accessed pages remain in memory, while lesser used ones "age" and eventually get tossed out to make way for new pages.

    Having caches cleared due to a write slows things down. In the MySQL space, the fairly simple query cache is a good example. Whenever you write to table A, any cached SELECT that accesses table A is cleared out of the cache. Regardless of whether the application is read-intensive, having regular writes makes the query cache useless and we turn it off in those cases. Oracle has already deprecated the "good old" query cache (which was introduced in MySQL 4.0 in the early 2000s) and soon its code will be completely removed.

    Conclusion
    With InnoDB, you'd generally have an AUTO_INCREMENT PRIMARY KEY, and thus newly inserted rows are sequenced to the outer end of the B+tree. This also means that the next inserted row often ends up in the same page, again invalidating that recently written page on the replicas and slowing down reads of any of the rows it contained. For secondary indexes, the effect is obviously scattered, although if the indexed column is temporal (time-based) it will be affected similarly to the PRIMARY KEY. How much all of this slows things down will very much depend on your application's DB access profile. The read/write ratio will matter little; what matters is whether individual tables are written to fairly frequently. If they are, SELECT queries on those tables made on replicas will suffer from the chill.

    Aurora uses SSD EBS, so of course the storage access is pretty fast. However, memory is always faster, and we know that that's important for web application performance. And we can use similarly fast SSD storage on EC2 or another hosting provider, with mature scaling technologies such as Galera (or even regular asynchronous multi-threaded replication) that don't give your caches the chills.

  • Archiving MySQL Tables in ClickHouse
    In this blog post, I will talk about archiving MySQL tables in ClickHouse for storage and analytics.

    Why Archive?
    Hard drives are cheap nowadays, but storing lots of data in MySQL is not practical and can cause all sorts of performance bottlenecks. To name just a few issues:
    - The larger the table and index, the slower the performance of all operations (both writes and reads)
    - Backup and restore for terabytes of data is more challenging, and if we need redundancy (replication slave, clustering, etc.) we will have to store all the data N times

    The answer is archiving old data. Archiving does not necessarily mean that the data will be permanently removed. Instead, the archived data can be placed into long-term storage (i.e., AWS S3) or loaded into a special purpose database that is optimized for storage (with compression) and reporting. The data is then still available. Actually, there are multiple use cases:
    - Sometimes the data just needs to be stored (i.e., for regulatory purposes) but does not have to be readily available (it's not "customer facing" data)
    - The data might be useful for debugging or investigation (i.e., application or access logs)
    - In some cases, the data needs to be available for the customer (i.e., historical reports or bank transactions for the last six years)

    In all of those cases, we can move the older data away from MySQL and load it into a "big data" solution. Even if the data needs to be available, we can still move it from the main MySQL server to another system. In this blog post, I will look at archiving MySQL tables in ClickHouse for long-term storage and real-time queries.

    How To Archive?
    Let's say we have a 650G table that stores the history of all transactions, and we want to start archiving it. How can we approach this?

    First, we will need to split this table into "old" and "new". I assume that the table is not partitioned (partitioned tables are much easier to deal with). For example, if we have data from 2008 (ten years' worth) but only need to store data from the last two months in the main MySQL environment, then deleting the old data would be challenging. So instead of deleting 99% of the data from a huge table, we can create a new table, load the newer data into it, and then rename (swap) the tables. The process might look like this:

      CREATE TABLE transactions_new LIKE transactions;
      INSERT INTO transactions_new SELECT * FROM transactions WHERE trx_date > now() - INTERVAL 2 MONTH;
      RENAME TABLE transactions TO transactions_old, transactions_new TO transactions;

    Second, we need to move transactions_old into ClickHouse. This is straightforward: we can pipe data from MySQL to ClickHouse directly. To demonstrate, I will use the Wikipedia:Statistics project (a real log of all requests to Wikipedia pages). Create a table in ClickHouse:

      CREATE TABLE wikistat (
        id bigint,
        dt DateTime,
        project String,
        subproject String,
        path String,
        hits UInt64,
        size UInt64
      ) ENGINE = MergeTree
      PARTITION BY toYYYYMMDD(dt)
      ORDER BY dt

      Ok. 0 rows in set. Elapsed: 0.010 sec.

    Please note that I'm using the new ClickHouse custom partitioning.
    It does not require that you create a separate date column to map the table in MySQL to the same table structure in ClickHouse. Now I can "pipe" data directly from MySQL to ClickHouse:

      mysql --quick -h localhost wikistats -NBe "SELECT concat(id,',"',dt,'","',project,'","',subproject,'","', path,'",',hits,',',size) FROM wikistats" | clickhouse-client -d wikistats --query="INSERT INTO wikistats FORMAT CSV"

    Third, we need to set up a constant archiving process so that the data is removed from MySQL and transferred to ClickHouse (a combined sketch of such a job appears at the end of this post). To do that we can use the "pt-archiver" tool (part of Percona Toolkit). In this case, we can first archive to a file and then load that file into ClickHouse. Here is the example.

    Remove data from MySQL and load it to a file (tsv):

      pt-archiver --source h=localhost,D=wikistats,t=wikistats,i=dt --where "dt <= '2018-01-01 0:00:00'" --file load_to_clickhouse.txt --bulk-delete --limit 100000 --progress=100000
      TIME                ELAPSED   COUNT
      2018-01-25T18:19:59       0       0
      2018-01-25T18:20:08       8  100000
      2018-01-25T18:20:17      18  200000
      2018-01-25T18:20:26      27  300000
      2018-01-25T18:20:36      36  400000
      2018-01-25T18:20:45      45  500000
      2018-01-25T18:20:54      54  600000
      2018-01-25T18:21:03      64  700000
      2018-01-25T18:21:13      73  800000
      2018-01-25T18:21:23      83  900000
      2018-01-25T18:21:32      93 1000000
      2018-01-25T18:21:42     102 1100000
      ...

    Load the file into ClickHouse:

      cat load_to_clickhouse.txt | clickhouse-client -d wikistats --query="INSERT INTO wikistats FORMAT TSV"

    The newer version of pt-archiver can use a CSV format as well:

      pt-archiver --source h=localhost,D=wikitest,t=wikistats,i=dt --where "dt <= '2018-01-01 0:00:00'" --file load_to_clickhouse.csv --output-format csv --bulk-delete --limit 10000 --progress=10000

    How Much Faster Is It?
    Actually, it is much faster in ClickHouse. Even queries that are based on index scans can be much slower in MySQL compared to ClickHouse. For example, in MySQL just counting the number of rows for one year can take 34 seconds (index scan):

      mysql> select count(*) from wikistats where dt between '2017-01-01 00:00:00' and '2017-12-31 00:00:00';
      +-----------+
      | count(*)  |
      +-----------+
      | 103161991 |
      +-----------+
      1 row in set (34.82 sec)

      mysql> explain select count(*) from wikistats where dt between '2017-01-01 00:00:00' and '2017-12-31 00:00:00'\G
      *************************** 1. row ***************************
                 id: 1
        select_type: SIMPLE
              table: wikistats
         partitions: NULL
               type: range
      possible_keys: dt
                key: dt
            key_len: 6
                ref: NULL
               rows: 227206802
           filtered: 100.00
              Extra: Using where; Using index
      1 row in set, 1 warning (0.00 sec)

    In ClickHouse, it only takes 0.062 sec:

      :) select count(*) from wikistats where dt between toDateTime('2017-01-01 00:00:00') and toDateTime('2017-12-31 00:00:00');

      SELECT count(*)
      FROM wikistats
      WHERE (dt >= toDateTime('2017-01-01 00:00:00')) AND (dt <= toDateTime('2017-12-31 00:00:00'))

      ┌───count()─┐
      │ 103161991 │
      └───────────┘

      1 rows in set. Elapsed: 0.062 sec. Processed 103.16 million rows, 412.65 MB (1.67 billion rows/s., 6.68 GB/s.)

    Size on Disk
    In my previous blog post comparing ClickHouse to Apache Spark and MariaDB, I also compared disk size. Usually, we can expect a 5x to 10x decrease in disk size in ClickHouse due to compression. Wikipedia:Statistics, for example, contains actual URIs, which can be quite large due to the article name/search phrase. These compress very well. If we use only integers, or use MD5/SHA1 hashes instead of storing actual URIs, we can expect much weaker compression (i.e., 3x).
    Even with a 3x compression ratio, it is still pretty good for long-term storage.

    Conclusion
    As the data in MySQL keeps growing, the performance of all queries will keep decreasing. Typically, queries that originally took milliseconds can now take seconds (or more), and it takes a lot of changes (code, MySQL configuration, etc.) to make them fast again. The main goal of archiving the data is to increase performance ("make MySQL fast again"), decrease costs and improve ease of maintenance (backup/restore, cloning the replication slave, etc.). Archiving to ClickHouse allows you to preserve old data and make it available for reports.
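
    As a wrap-up of the "constant archiving process" step, here is a minimal sketch of how the pt-archiver and clickhouse-client commands shown above could be combined into a single recurring job (run from cron, for example). The cutoff calculation, file name and clean-up step are my own illustrative additions, not part of the original walkthrough:

      #!/bin/bash
      # Sketch of a recurring archive job: move rows older than two months from
      # MySQL to ClickHouse using the commands shown above. Cutoff calculation and
      # file names are illustrative; adjust for your schema and retention policy.
      set -e
      CUTOFF=$(date -d '2 months ago' '+%Y-%m-%d 00:00:00')   # GNU date assumed
      OUTFILE=/tmp/wikistats_$(date +%Y%m%d).csv

      # 1. Delete matching rows from MySQL while writing them to a CSV file.
      pt-archiver --source h=localhost,D=wikistats,t=wikistats,i=dt \
        --where "dt <= '${CUTOFF}'" \
        --file "${OUTFILE}" --output-format csv \
        --bulk-delete --limit 10000 --progress=10000

      # 2. Load the file into ClickHouse.
      clickhouse-client -d wikistats --query="INSERT INTO wikistats FORMAT CSV" < "${OUTFILE}"

      # 3. Clean up the intermediate file once the load has succeeded.
      rm -f "${OUTFILE}"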

  • ProxySQL Series: MySQL Replication Read-write Split up.
    At Mydbops we always strive to provide the best MySQL solutions. We are exploring the modern SQL load balancers, and we have planned to write a series of blogs on ProxySQL. The first blog in this series covers how to set up ProxySQL for a MySQL replication topology, including read/write split, along with some background on ProxySQL.

    What is ProxySQL?
    ProxySQL is an open-source, high-performance, SQL-aware proxy. It runs as a daemon watched by a monitoring process. ProxySQL sits between the application and the database servers. The daemon accepts incoming traffic from MySQL clients and forwards it to backend MySQL servers. A few of the most commonly used features are:
    - Read/write split
    - On-the-fly query rewrite
    - Query caching
    - Real-time statistics on a web console
    - Seamless replication switchover
    - Query mirroring
    - Monitoring

    So the main advantage of using ProxySQL is that it is designed to run continuously without needing to be restarted. Most configuration can be done at runtime using queries similar to SQL statements, and it is very lightweight. Let us explore basic query routing (read/write split) for effective load sharing. We have set up 4 nodes for this architecture:
    - node1 (172.17.0.1), Master
    - node2 (172.17.0.2), Slave
    - node3 (172.17.0.3), Slave
    - node4 (172.17.0.4), ProxySQL
    [Figure: ProxySQL on a single master and two slaves]

    Note: By default, ProxySQL binds to two ports, 6032 and 6033. 6032 is the admin port and 6033 is the one which accepts incoming connections from clients.

    MySQL Replication setup:
    Configuring MySQL's master-slave replication is outside the scope of this tutorial; we already have nodes with replication running. Before entering the admin interface of ProxySQL, create one application user with all the privileges required by your application, and one monitoring user, on every MySQL DB server.

      mysql> CREATE USER 'sysbench'@'172.17.0.%' IDENTIFIED BY 'sysbench';
      mysql> GRANT ALL PRIVILEGES on *.* TO 'sysbench'@'172.17.0.%';
      mysql> CREATE USER 'monitor'@'172.17.0.%' IDENTIFIED BY 'monitor';
      mysql> GRANT USAGE,REPLICATION CLIENT on *.* TO 'monitor'@'172.17.0.%';
      mysql> FLUSH PRIVILEGES;

    ProxySQL Setup:
    Install and start ProxySQL. For installation kindly refer to: https://github.com/sysown/proxysql/wiki

      $ service proxysql start
      Starting ProxySQL: Main init phase0 completed in 0.000491 secs.
      Main init global variables completed in 0.000675 secs.
      Main daemonize phase1 completed in 0.00015 secs.
      DONE!

    Now connect to the ProxySQL admin interface to start with the configuration:

      $ mysql -u admin -padmin -h 127.0.0.1 -P6032

    Configure Backends:
    ProxySQL uses the concept of a hostgroup. A hostgroup is a group of hosts with a logical functionality. In this setup, we need just 2 hostgroups:
    - hostgroup 0 for the master [used for write queries]
    - hostgroup 1 for the slaves [used for read queries]

    Apart from this, we can also have an analytical server as a slave of the same master, assign a new hostgroup id to that server, and redirect all analytics-related (long-running) queries to that host (a sketch of this appears at the end of this post).
      Admin> INSERT INTO mysql_servers(hostgroup_id,hostname,port) VALUES (0,'172.17.0.1',3306);
      Admin> INSERT INTO mysql_servers(hostgroup_id,hostname,port) VALUES (1,'172.17.0.2',3306);
      Admin> INSERT INTO mysql_servers(hostgroup_id,hostname,port) VALUES (1,'172.17.0.3',3306);
      Admin> INSERT INTO mysql_replication_hostgroups VALUES (0,1,'production');

      Admin> SELECT * FROM mysql_replication_hostgroups;
      +------------------+------------------+------------+
      | writer_hostgroup | reader_hostgroup | comment    |
      +------------------+------------------+------------+
      | 0                | 1                | production |
      +------------------+------------------+------------+

      Admin> LOAD MYSQL SERVERS TO RUNTIME;
      Admin> SAVE MYSQL SERVERS TO DISK;

    Note: When we load the MySQL servers, our writer host also gets configured in the reader hostgroup automatically by ProxySQL, to handle those queries which are redirected to the reader hostgroup in case no slaves are online. A bonus point here is that we can decrease the weight assigned to the master server inside the mysql_servers table for the reader hostgroup, so that most read queries go to the servers with the higher weight.

      Admin> UPDATE mysql_servers SET weight=200 WHERE hostgroup_id=1 AND hostname='172.17.0.1';
      Admin> SELECT hostgroup_id,hostname,port,status,weight FROM mysql_servers;
      +--------------+------------+------+--------+--------+
      | hostgroup_id | hostname   | port | status | weight |
      +--------------+------------+------+--------+--------+
      | 0            | 172.17.0.1 | 3306 | ONLINE | 1000   |
      | 1            | 172.17.0.2 | 3306 | ONLINE | 1000   |
      | 1            | 172.17.0.3 | 3306 | ONLINE | 1000   |
      | 1            | 172.17.0.1 | 3306 | ONLINE | 200    |
      +--------------+------------+------+--------+--------+

    Configure Users:
    The monitor user will continuously check the status of the backends at the specified interval. sysbench is the user created for the application.

      Admin> UPDATE global_variables SET variable_value='monitor' WHERE variable_name='mysql-monitor_password';
      Admin> LOAD MYSQL VARIABLES TO RUNTIME;
      Admin> SAVE MYSQL VARIABLES TO DISK;

      Admin> INSERT INTO mysql_users(username,password,default_hostgroup) VALUES ('sysbench','sysbench',0);
      Admin> SELECT username,password,active,default_hostgroup,default_schema,max_connections FROM mysql_users;
      +----------+----------+--------+-------------------+----------------+-----------------+
      | username | password | active | default_hostgroup | default_schema | max_connections |
      +----------+----------+--------+-------------------+----------------+-----------------+
      | sysbench | sysbench | 1      | 0                 | NULL           | 10000           |
      +----------+----------+--------+-------------------+----------------+-----------------+

      Admin> LOAD MYSQL USERS TO RUNTIME;
      Admin> SAVE MYSQL USERS TO DISK;

    Configure Monitoring:
    ProxySQL constantly monitors the servers it has configured. To do so, it is important to configure some interval and timeout variables (in milliseconds).
      Admin> UPDATE global_variables SET variable_value=2000 WHERE variable_name IN ('mysql-monitor_connect_interval','mysql-monitor_ping_interval','mysql-monitor_read_only_interval');
      Admin> UPDATE global_variables SET variable_value = 1000 WHERE variable_name = 'mysql-monitor_connect_timeout';
      Admin> UPDATE global_variables SET variable_value = 500 WHERE variable_name = 'mysql-monitor_ping_timeout';
      Admin> LOAD MYSQL VARIABLES TO RUNTIME;
      Admin> SAVE MYSQL VARIABLES TO DISK;

    The monitor module regularly checks replication lag (using Seconds_Behind_Master) if a server has max_replication_lag set to a non-zero value. With the configuration below, servers will only be shunned in case the replication delay exceeds 60 seconds (1 min) behind the master.

      Admin> UPDATE mysql_servers SET max_replication_lag=60;
      Query OK, 1 row affected (0.00 sec)

    If you want to be more accurate while calculating slave lag, kindly refer to: http://proxysql.com/blog/proxysql-and-ptheartbeat

    There are also other important things in the monitoring module which we can configure as per our needs. I would prefer to cover them in a separate blog in this series.

    Configure Query Rules:
    To send all SELECT queries to the slaves (based on a regex):

      Admin> INSERT INTO mysql_query_rules (active, match_digest, destination_hostgroup, apply) VALUES (1, '^SELECT.*', 1, 0);
      Admin> INSERT INTO mysql_query_rules (active, match_digest, destination_hostgroup, apply) VALUES (1, '^SELECT.*FOR UPDATE', 0, 1);
      Admin> SELECT active, match_pattern, destination_hostgroup, apply FROM mysql_query_rules;
      Admin> SELECT rule_id, match_digest, destination_hostgroup hg_id, apply FROM mysql_query_rules WHERE active=1;
      +---------+----------------------+-------+-------+
      | rule_id | match_digest         | hg_id | apply |
      +---------+----------------------+-------+-------+
      | 1       | ^SELECT.*            | 1     | 0     |
      | 2       | ^SELECT.*FOR UPDATE$ | 0     | 1     |
      +---------+----------------------+-------+-------+

      Admin> LOAD MYSQL QUERY RULES TO RUNTIME;
      Admin> SAVE MYSQL QUERY RULES TO DISK;

    When the Query Processor scans the query rules, tries to find a match without success and reaches the end, it applies the default_hostgroup for the specific user according to the mysql_users entry. In our case, user sysbench has default_hostgroup=0, therefore any query not matching the above rules [e.g. all writes] will be sent to hostgroup 0 [Master]. The stats tables below can be used to validate whether your query rules are being hit by incoming traffic.

      Admin> SELECT rule_id, hits, destination_hostgroup hg FROM mysql_query_rules NATURAL JOIN stats_mysql_query_rules;
      +---------+-------+----+
      | rule_id | hits  | hg |
      +---------+-------+----+
      | 1       | 17389 | 1  |
      | 2       | 234   | 0  |
      +---------+-------+----+

    We can also redirect queries matching specific patterns by using the digests in stats_mysql_query_digest.

    Validate the DB Connection:
    The application will connect to port 6033 on host 172.17.0.4 (ProxySQL) to send its DB traffic.

      ProxySQL-Host$ mysql -u sysbench -psysbench -h 127.0.0.1 -P6033 -e "SELECT @@server_id"
      +-------------+
      | @@server_id |
      +-------------+
      |           2 |
      +-------------+

    Check Backend Status:
    This shows ProxySQL is able to successfully connect to all backends.
      mysql> select * from monitor.mysql_server_ping_log order by time_start_us desc limit 3;
      +------------+------+------------------+----------------------+------------+
      | hostname   | port | time_start_us    | ping_success_time_us | ping_error |
      +------------+------+------------------+----------------------+------------+
      | 172.17.0.1 | 3306 | 1516795814170574 | 220                  | NULL       |
      | 172.17.0.2 | 3306 | 1516795814167894 | 255                  | NULL       |
      | 172.17.0.3 | 3306 | 1516795804170751 | 259                  | NULL       |
      +------------+------+------------------+----------------------+------------+

    I have executed some sysbench tests on the cluster to check the query distribution. The ProxySQL table below shows the number of queries executed per host.

      Admin> select hostgroup,srv_host,status,Queries,Bytes_data_sent,Latency_us from stats_mysql_connection_pool where hostgroup in (0,1);
      +-----------+------------+--------+---------+-----------------+------------+
      | hostgroup | srv_host   | status | Queries | Bytes_data_sent | Latency_us |
      +-----------+------------+--------+---------+-----------------+------------+
      | 0         | 172.17.0.1 | ONLINE | 12349   | 76543232        | 144        |
      | 1         | 172.17.0.2 | ONLINE | 22135   | 87654356        | 190        |
      | 1         | 172.17.0.3 | ONLINE | 22969   | 85344235        | 110        |
      | 1         | 172.17.0.1 | ONLINE | 1672    | 4534332         | 144        |
      +-----------+------------+--------+---------+-----------------+------------+

    If any of your servers becomes unreachable from any hostgroup, its status changes from ONLINE to SHUNNED. This means ProxySQL won't send any queries to that host until it comes back ONLINE. We can also take any server offline for maintenance. To disable a backend server, change its status to OFFLINE_SOFT (gracefully disabling a backend server) or OFFLINE_HARD (immediately disabling a backend server). In either case no new traffic will be sent to the node.

      Admin> UPDATE mysql_servers SET status='OFFLINE_SOFT' WHERE hostname='172.17.0.2';
      Query OK, 1 row affected (0.00 sec)

    Still worried about reading stale data from a slave? Then do not worry: ProxySQL is coming up with a new feature to make sure your application gets the latest updated data. Currently this feature is available only with row-based replication with GTID enabled.

    For a more detailed description of every module: https://github.com/sysown/proxysql/wiki/ProxySQL-Configuration
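
    To close, here is a sketch of the analytical hostgroup idea mentioned at the beginning of this post: a dedicated reporting slave placed in its own hostgroup, with a query rule routing long-running analytical queries to it. The host 172.17.0.5 and the query pattern are illustrative placeholders; only the tables and the LOAD/SAVE commands come from the configuration shown above.

      #!/bin/bash
      # Illustrative only: add a reporting replica as hostgroup 2 and route a known
      # analytical query pattern to it. Host address and regex are placeholders.
      ADMIN="mysql -u admin -padmin -h 127.0.0.1 -P6032"

      $ADMIN -e "INSERT INTO mysql_servers(hostgroup_id,hostname,port) VALUES (2,'172.17.0.5',3306)"
      $ADMIN -e "LOAD MYSQL SERVERS TO RUNTIME"
      $ADMIN -e "SAVE MYSQL SERVERS TO DISK"

      # Route matching SELECTs to the analytics hostgroup; with apply=1 no further
      # rules are evaluated for those queries.
      $ADMIN -e "INSERT INTO mysql_query_rules (active, match_digest, destination_hostgroup, apply) VALUES (1, '^SELECT.*FROM report_', 2, 1)"
      $ADMIN -e "LOAD MYSQL QUERY RULES TO RUNTIME"
      $ADMIN -e "SAVE MYSQL QUERY RULES TO DISK"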
