By Abraham Duenas

For questions and comments, write to the dev@lists.clearlinux.org project mailing list. Visit https://lists.clearlinux.org/ to subscribe. 

As the Clear Linux* Project for Intel® Architecture continues to redefine the boundaries for what is possible in a cloud-based Linux distribution running on Intel silicon, both power and performance play an increasingly important role. The goal of this newsletter is to highlight some of the improvements made by the Power and Performance (PnP) quad. 

This week’s newsletter looks at: Profile-Guided Optimization with MariaDB Benchmarks

MariaDB is a community-developed fork of the MySQL* relational database management system (RDBMS) intended to remain free under the GNU GPL. Relational databases have been a popular choice for financial records, manufacturing and logistical information, personnel data, and other applications since the 1980s, and it is hard to imagine the cloud world without one. MySQL, MariaDB, and Percona Server are the de facto standard choices for the internal database in OpenStack*.

There are many different ways to optimize how well MariaDB performs. You can, for example, optimize a wide variety of configuration options or analyze and optimize your queries. Yet another option is to optimize the actual binaries of MariaDB at compilation time.

As we have seen in previous newsletters, GCC supports Profile-Guided Optimization (PGO). The underlying idea behind PGO is to use data collected from test runs to determine which areas of the code are used most often, and to optimize the output binary accordingly. This week the team looked at how much PGO improves MariaDB. MariaDB binaries were built with this approach and compared with binaries built without it. The experiment is based on MariaDB's official article.
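The PGO workflow described above has three steps: build an instrumented binary, run it on a training workload, then rebuild using the recorded profile. The following is a minimal sketch of those steps, not the build procedure used for MariaDB. The toy C program, the file names, and the helper functions `pgo_commands` and `run_pgo_demo` are hypothetical; the only GCC options used are `-O2`, `-fprofile-generate`, and `-fprofile-use`.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Toy workload standing in for a real program (hypothetical, not MariaDB code).
C_SOURCE = r"""
#include <stdio.h>
int main(void) {
    unsigned long hits = 0;
    for (unsigned long i = 0; i < 5000000UL; i++) {
        if (i % 100 != 0)   /* hot branch: taken 99% of the time */
            hits++;
    }
    printf("%lu\n", hits);
    return 0;
}
"""


def pgo_commands(src, exe):
    """Return the three PGO steps as argument lists."""
    instrumented = ["gcc", "-O2", "-fprofile-generate", src, "-o", exe]
    training_run = [exe]  # running the instrumented binary writes profile data
    optimized = ["gcc", "-O2", "-fprofile-use", src, "-o", exe]
    return [instrumented, training_run, optimized]


def run_pgo_demo():
    """Run the three steps in a scratch directory; returns False if gcc is absent."""
    if shutil.which("gcc") is None:
        print("gcc not found; skipping the PGO demo")
        return False
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "hot.c"
        src.write_text(C_SOURCE)
        exe = str(Path(tmp) / "hot")
        for cmd in pgo_commands(str(src), exe):
            subprocess.run(cmd, check=True, cwd=tmp)
    return True


# Show the commands without executing them.
for cmd in pgo_commands("hot.c", "./hot"):
    print(" ".join(cmd))
```

Calling `run_pgo_demo()` executes the steps on a machine with GCC installed. In a real project the training step would be a representative workload, which is the role sysbench plays in this experiment.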

The benchmark used in this test (sysbench) runs fairly short transactions on well-indexed tables to mimic the workload of a production system. It measures both the throughput (in transactions per second) and the average response time.

After enabling PGO for the MariaDB code, all the benchmarks showed significant improvements (up to 20%). Table 1 shows the results of reading from the database with multiple threads when the datadir lives on a RAM disk.

| Number of threads | Baseline, 2 socket, RAM disk | PGO, 2 socket, RAM disk | Improvement (%) |
| --- | --- | --- | --- |
| 1 | 515.26 | 662.42 | 22.21551282 |
| 9 | 1130.6 | 1387.7 | 18.52705916 |
| 18 | 1352.8 | 1648.8 | 17.95245027 |
| 36 | 1304.6 | 1583.4 | 17.60767968 |
| 72 | 1304.2 | 1546.3 | 15.65672897 |
| Average | 1121.492 | 1365.724 | 17.88296903 |

Table 1. Transactions per second in read-only mode
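As a worked check of the improvement column in Table 1, the sketch below recomputes the percentages from the two throughput columns. The published figures appear to match the gain expressed as a share of the PGO value, (PGO - baseline) / PGO; this is an inference from the numbers, not something the article states. The function names are hypothetical, and the second function shows the same gain measured against the baseline for comparison.

```python
# Threads -> (baseline TPS, PGO TPS), copied from Table 1.
TABLE1 = {
    1: (515.26, 662.42),
    9: (1130.6, 1387.7),
    18: (1352.8, 1648.8),
    36: (1304.6, 1583.4),
    72: (1304.2, 1546.3),
}


def gain_vs_pgo(baseline, pgo):
    """Gain as a percentage of the PGO value (appears to match the table)."""
    return (pgo - baseline) / pgo * 100.0


def gain_vs_baseline(baseline, pgo):
    """Gain as a percentage of the baseline value."""
    return (pgo - baseline) / baseline * 100.0


for threads, (baseline, pgo) in TABLE1.items():
    print(
        f"{threads:>2} threads: "
        f"{gain_vs_pgo(baseline, pgo):.2f}% of the PGO value, "
        f"{gain_vs_baseline(baseline, pgo):.2f}% over baseline"
    )
```

For the single-thread row, the first formula reproduces the published 22.2 percent, while the same throughput numbers measured against the baseline give roughly 28.6 percent.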

 

Table 2 shows the same experiment, but with a test that both reads from and writes to the same database.

 

| Number of threads | Baseline, 2 socket, RAM disk | PGO, 2 socket, RAM disk | Improvement (%) |
| --- | --- | --- | --- |
| 1 | 461.53 | 586.81 | 21.34932943 |
| 9 | 888.88 | 1061.2 | 16.23822088 |
| 18 | 958.46 | 1094.3 | 12.41341497 |
| 36 | 890.8 | 1076.7 | 17.26571933 |
| 72 | 819.46 | 1030.5 | 20.47937894 |
| Average | 803.826 | 969.902 | 17.12296706 |

Table 2. Transactions per second in read/write mode

The same experiment was run for the “Response time” test. Tables 3 and 4 show the response times for the read-only and read/write tests. As we can see, the best response time decreased by more than 20 percent.

| Number of threads | Baseline, 2 socket, RAM disk | PGO, 2 socket, RAM disk | Improvement (%) |
| --- | --- | --- | --- |
| 1 | 1.94 | 1.51 | 22.16494845 |
| 9 | 7.96 | 6.48 | 18.59296482 |
| 18 | 13.3 | 10.91 | 17.96992481 |
| 36 | 27.59 | 22.73 | 17.61507793 |
| 72 | 55.18 | 46.56 | 15.62160203 |
| Average | 21.194 | 17.638 | 16.77833349 |

Table 3. Response time in read-only mode

 

 

 

| Number of threads | Baseline, 2 socket, RAM disk | PGO, 2 socket, RAM disk | Improvement (%) |
| --- | --- | --- | --- |
| 1 | 2.17 | 1.7 | 21.65898618 |
| 9 | 10.12 | 8.48 | 16.2055336 |
| 18 | 18.78 | 16.45 | 12.40681576 |
| 36 | 40.41 | 33.43 | 17.27295224 |
| 72 | 87.84 | 69.86 | 20.46903461 |
| Average | 31.864 | 25.984 | 18.45342707 |

Table 4. Response time in read/write mode

As you can see, the performance improvement from PGO is almost 20 percent, whether measured in transactions per second or in response time.
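The throughput and response-time tables track each other closely, which is why the gain is similar in both. A minimal sketch of one way to see this: if every thread issues its next transaction as soon as the previous one finishes, the average response time is approximately threads / throughput. The code below applies that relation to the read-only numbers from Tables 1 and 3. It assumes the response times are in milliseconds, which the tables do not state, and the function names are hypothetical.

```python
# (threads, transactions per second, reported response time) from Tables 1 and 3.
READ_ONLY_BASELINE = [
    (1, 515.26, 1.94),
    (9, 1130.6, 7.96),
    (18, 1352.8, 13.3),
    (36, 1304.6, 27.59),
    (72, 1304.2, 55.18),
]
READ_ONLY_PGO = [
    (1, 662.42, 1.51),
    (9, 1387.7, 6.48),
    (18, 1648.8, 10.91),
    (36, 1583.4, 22.73),
    (72, 1546.3, 46.56),
]


def implied_response_ms(threads, tps):
    """Closed-loop estimate: threads / throughput, converted to milliseconds (assumed unit)."""
    return threads / tps * 1000.0


def max_relative_gap(rows):
    """Largest relative difference between the estimate and the reported value."""
    return max(abs(implied_response_ms(t, tps) - rt) / rt for t, tps, rt in rows)


for label, rows in (("baseline", READ_ONLY_BASELINE), ("PGO", READ_ONLY_PGO)):
    for threads, tps, reported in rows:
        estimate = implied_response_ms(threads, tps)
        print(f"{label:>8} {threads:>2} threads: estimated {estimate:.2f}, reported {reported}")
```

Under that assumption the estimates agree with the reported response times to well within one percent, so the two sets of tables can be read as two views of the same measurements.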

At this point we wanted to go further and answer the question: "How might this performance improvement benefit a typical cloud system?" To answer it, we needed to set up a cloud environment and run cloud-specific tests to measure the effect of PGO.

The cloud test suite we decided to use is Rally. Rally is a benchmarking tool that automates and unifies multi-node OpenStack deployment, cloud verification, benchmarking, and profiling. It can serve as a basic tool for continuously improving the performance and stability of an OpenStack system.

The test we decided to run is “Keystone Create User”. We saw up to 8 percent improvement with this test. PGO's test runs helped optimize the most commonly used areas of code. In this experiment, we found that the best trainer for MariaDB is not the Rally test itself but the sysbench test, because training with the Rally benchmark gave slightly lower performance overall.
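For readers unfamiliar with Rally, a task is described in a small JSON or YAML file. The sketch below generates such a file for a user-creation test. It assumes that “Keystone Create User” corresponds to the Rally scenario named `KeystoneBasic.create_user`, and it uses a `constant` runner. The `times` and `concurrency` values and the helper function name are hypothetical; they are not the settings used in this experiment.

```python
import json


def keystone_create_user_task(times, concurrency):
    """Build a Rally task description for the (assumed) create-user scenario."""
    return {
        "KeystoneBasic.create_user": [
            {
                "args": {},
                "runner": {
                    "type": "constant",
                    "times": times,              # total iterations (hypothetical value)
                    "concurrency": concurrency,  # parallel iterations (hypothetical value)
                },
            }
        ]
    }


# Print the task as JSON; the output could be saved to a file for Rally to read.
print(json.dumps(keystone_create_user_task(times=100, concurrency=10), indent=2))
```

Saved to a file, a task like this would typically be launched with `rally task start` against an existing deployment.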

In conclusion, the numbers above show a solid performance improvement of up to 20% with the PGO capabilities of the GCC compiler. None of the sysbench test cases showed a negative performance impact from compiling with profile-guided optimization, and, even better, PGO improved performance of the OpenStack test by up to 8 percent. With this, the Clear Linux Project for Intel Architecture continues to redefine the performance boundaries for what is possible in a cloud-based Linux* distribution running on Intel silicon.