Just about anytime Bitlancer reviews configurations for MySQL instances for a Boston DevOps client, we find that they are generally under-tuned and could be improved without extensive effort. This is true even when we don’t have physical access to the systems for peak traffic analysis and/or a complete understanding of the database schema on each MySQL server.
If you’re looking to boost the stability, scalability and speed of your MySQL implementation, this post outlines some places to start.
Do your current VMs have sufficient resources for your MySQL workloads? We usually gather statistics by reviewing load averages and utilization (e.g., by checking sar output). We check I/O utilization, too.
We also check disk configuration and utilization for each MySQL host. Ideally, you want consistent disk configuration and partitioning across all MySQL servers so that automation with Puppet or a comparable configuration management tool is simplified.
Another place to check for resource consumption issues is the processlist for each MySQL system. Also check whether any MySQL instances are using swap space. If it isn’t immediately obvious why swapping would occur, this could point to a problem.
It’s also good practice to isolate MySQL, so that 100% of a host's resources are available to it when it runs. MySQL offers only limited ways to cap itself at a ‘maximum available’ amount of memory, so shared hosts force a balancing act around how much MySQL will use on average, even though its potential utilization is much higher. Since other applications running on a host compete with MySQL for available memory, it’s better to run them in separate VMs.
As part of tuning your MySQL hosts, I often recommend making configuration parameters as uniform as possible given server hardware configurations.
As part of optimizing MySQL, we review the output of ‘show global variables’ and ‘show global status’ on all hosts. Experience shows that the following configuration settings have the greatest impact on the performance of MySQL instances:
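For reference, the raw data comes from statements like the following, run on each host (standard MySQL syntax):

```sql
-- Configuration as the running server sees it
SHOW GLOBAL VARIABLES LIKE 'innodb%';

-- Counters accumulated since the last restart (or FLUSH STATUS)
SHOW GLOBAL STATUS;
```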
expire_logs_days

This should be set consistently across hosts. I recommend a value of 7 if available disk allows, so that MySQL rotates binary logs after seven days everywhere and behavior is uniform. Note that MySQL will never expire binary logs if log_bin is enabled but expire_logs_days is left at its default of 0.
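As a sketch, on MySQL 5.x this can be applied at runtime (and should also be persisted in my.cnf so it survives a restart):

```sql
-- Dynamic variable; takes effect without a restart
SET GLOBAL expire_logs_days = 7;

-- Confirm the change
SHOW GLOBAL VARIABLES LIKE 'expire_logs_days';
```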
innodb_buffer_pool_instances

Often we find this set to 1. For servers with more than 1GB of memory allocated to the buffer pool, increasing this value can improve performance by allowing the MySQL server to operate with greater concurrency.
innodb_buffer_pool_size

This is one of the simplest mechanisms for increasing performance and should be one of the first things you tune. In general, you want to set this value to at least 60-70% of system memory, provided that only the InnoDB storage engine is in use. We often find it set too low, which means available system resources go underutilized. For example, a safe setting with 16 GB of RAM and 1,000 maximum possible connections would be 8 or 9 GB.
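One way to sanity-check the current size is the buffer pool miss ratio: innodb_buffer_pool_reads counts reads that had to hit disk, while innodb_buffer_pool_read_requests counts all logical reads. A sketch using the information_schema.GLOBAL_STATUS table available in MySQL 5.1-5.6 (in 5.7 these counters moved to performance_schema):

```sql
-- A ratio near zero means the buffer pool is absorbing almost all reads
SELECT d.VARIABLE_VALUE / l.VARIABLE_VALUE AS buffer_pool_miss_ratio
FROM information_schema.GLOBAL_STATUS AS d,
     information_schema.GLOBAL_STATUS AS l
WHERE d.VARIABLE_NAME = 'Innodb_buffer_pool_reads'
  AND l.VARIABLE_NAME = 'Innodb_buffer_pool_read_requests';
```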
innodb_file_per_table

If you have large numbers of InnoDB tables, it can make sense to set this value to 0 so that all tables share a common table space. However, it’s important to note that InnoDB data files cannot be shrunk once expanded. Therefore, it often makes more sense to give each table its own InnoDB table space (by setting this to 1) so you can easily reclaim space from tables whose data has been deleted or purged.
innodb_flush_log_at_trx_commit

If you don’t require 100% ACID compliance, setting this value to 2 on all servers is safe and effective; leaving it at the default of 1 is the stricter, fully durable option. With a value of 2 there is the small potential for loss of up to one second of transactions should the operating system crash; however, InnoDB crash recovery can usually be trusted to work correctly.
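If you do opt for the relaxed setting, it can be changed on a running server; a sketch:

```sql
-- 1 = flush the log to disk at every commit (full durability, the default)
-- 2 = write at commit, flush to disk roughly once per second
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
```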
innodb_log_file_size

You want this value to be big enough for InnoDB to be able to optimize its I/O, but not so big that it slows crash recovery. Many teams set this too high, perhaps because changing it requires a maintenance window and one or potentially two MySQL restarts. I recommend tuning this setting based on data from ‘show engine innodb status’. The default value (48 MB as of MySQL 5.6) is big enough for many servers.
key_buffer_size

To set this value, review the ratio of key_read_requests to key_reads on all hosts from ‘show global status’. The fewer requested reads that require disk access, the better for performance. Note that this buffer only affects MyISAM tables.
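The relevant counters can be pulled directly; a quick check:

```sql
-- Key_read_requests = reads served from the MyISAM key cache
-- Key_reads         = reads that missed the cache and hit disk
SHOW GLOBAL STATUS LIKE 'Key_read%';
```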
log_bin

Binary logging is generally only recommended where replication or point-in-time backup and recovery is needed, due to the additional I/O it requires. However, on hosts you’re replicating from, log_bin must be turned on, and the overhead is usually acceptable when write traffic is low.
log_slave_updates

Setting this value to ‘on’ is generally unnecessary in a traditional master-slave setup, and in that context would likely cause additional I/O. It is usually only appropriate in multi-master replication scenarios.
max_connect_errors

Make sure this isn’t set too low, especially if you’re utilizing monitoring tools that connect and disconnect without authenticating. A safe value is something like 10,000.
I also recommend reviewing aborted_connects to see if any hosts have larger connect error counts than expected. This can generally be attributed to faulty code that is not properly authenticating to MySQL, or to monitoring software attempting to connect too frequently. It should be investigated, as repeated failures can eventually cause MySQL to block connections from the offending host.
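A sketch of raising the limit and checking the counter (FLUSH HOSTS clears any hosts MySQL has already blocked after repeated errors):

```sql
SET GLOBAL max_connect_errors = 10000;

-- Failed connection attempts since startup
SHOW GLOBAL STATUS LIKE 'Aborted_connects';

-- If a host has already been blocked:
FLUSH HOSTS;
```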
max_connections

The default value of 151 is often reasonable. Check max_used_connections for guidance. In general, make sure that any increases in max_connections take memory usage into consideration.
max_heap_table_size and tmp_table_size
It’s important to set these variables to identical values across multiple hosts. Additionally, it’s important to note that if the values for these settings are too low, it can lead to unnecessary disk I/O during query execution.
I suggest setting these values based on the output of ‘show global status’ on each host, looking mostly at created_tmp_tables and created_tmp_disk_tables. However, the actual impact of these settings can only be determined through a targeted review, as some queries will use disk-based temporary tables regardless of the setting.
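The two counters can be pulled in one statement:

```sql
-- Created_tmp_disk_tables rising relative to Created_tmp_tables
-- suggests the in-memory limits are too low (or queries need work)
SHOW GLOBAL STATUS LIKE 'Created_tmp%';
```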
max_user_connections

We often find this set to 0 (unlimited), meaning a single user can consume every connection allowed by max_connections, which makes it harder for someone trying to debug an issue to access the MySQL host. I recommend setting this to 10 less than max_connections, ensuring that a single user cannot prevent an administrator from accessing MySQL when necessary. For example, on a host with max_connections set to 1000, set this to 990.
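This is a dynamic variable, so the change can be made without a restart; a sketch:

```sql
-- Leave 10 connections of headroom below max_connections = 1000
SET GLOBAL max_user_connections = 990;
```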
query_cache_limit and query_cache_size
The query cache has two aspects: it caches query results, but it also invalidates cached results anytime a table involved in a query is updated. If updates happen often, you’ll find yourself invalidating results so regularly that the cache isn’t worth it.
Based on the output of ‘show global status’, particularly the qcache_* values, you may determine that the query cache can be disabled on some hosts. Look at the ratio of qcache_inserts to qcache_hits.
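A sketch of the check, and of disabling the cache where the numbers don’t justify it:

```sql
-- Qcache_hits vs. Qcache_inserts shows whether cached results
-- are actually being reused before they are invalidated
SHOW GLOBAL STATUS LIKE 'Qcache%';

-- Disable the query cache on hosts where it isn't paying off
SET GLOBAL query_cache_size = 0;
```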
read_only

If implementing replication (where needed), it’s important to remember to set all slaves to read_only. This ensures that write queries are always performed against the master. Note, however, that a user with SUPER privileges can still override this protection, so it is worth keeping a close eye on the activities of all users with SUPER access.
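A sketch for a slave (requires SUPER privileges, and should also be set in my.cnf):

```sql
SET GLOBAL read_only = ON;

-- Write attempts by non-SUPER users will now fail with
-- "The MySQL server is running with the --read-only option"
SHOW GLOBAL VARIABLES LIKE 'read_only';
```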
skip_name_resolve

Usually this is turned off. You might get some performance benefit from enabling it in instances where all users are created with IP addresses instead of hostnames (e.g., foouser@10.%.%), in which case enabling it removes the need for a DNS lookup on every connection.
slow_query_log and log_slow_queries
Turning these off is potentially beneficial in terms of log size and, to a lesser degree, server performance; turning them on can prove useful when attempting to debug application slow-downs. Either way, I recommend consistency across all servers in the same class (e.g., test servers) to facilitate automated management of MySQL’s configuration files.
thread_cache_size

Review this based on the output of ‘show global status’, particularly the threads_created value. Is threads_created increasing steadily or staying relatively consistent, for example?
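The counter check is a one-liner:

```sql
-- If Threads_created climbs steadily, consider raising thread_cache_size
SHOW GLOBAL STATUS LIKE 'Threads_%';
```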
wait_timeout

Setting this can be tricky. Check whether some threads are sleeping for long periods; this is often due to faulty application code that is not properly closing connections, and in such cases it can be helpful to lower wait_timeout. Also check whether aborted_clients is generally high on any hosts, which indicates wait_timeout is being hit relatively often, again likely due to code not properly closing database connections.
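Both the timeout and the symptom can be inspected directly; a sketch (the value here is illustrative, not a recommendation):

```sql
-- Applies to new connections; existing sessions keep their old value
SET GLOBAL wait_timeout = 300;

-- Connections dropped without a proper close, including idle
-- sessions killed after exceeding wait_timeout, show up here
SHOW GLOBAL STATUS LIKE 'Aborted_clients';
```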
At Bitlancer, we specialize in helping Boston-based startups and SMBs reduce DevOps pain and get the most out of their investments in the cloud. If you’re looking to optimize your database environment, increase configuration management automation and control cloud costs or solve your unique DevOps puzzle, contact Bitlancer - your Boston DevOps and cloud advisors.