Wednesday, September 23, 2015

Followup to Fun with SQL

I increased the size of my test:

In the first graph it showed that the improvement had stopped at about 80 processes.  I'm going with that, since on one of the runs with 80 processes, Mariadb started kicking out "No connections available" error.  It only happened on one run.  The 2nd chart shows that it is still improving, I will admit that I was having trouble getting accurate numbers with that many processes running. 

75 processes seems ok, anything more and I think I was not making much additional progress.

Saturday, September 19, 2015

Fun with SQL

MySQL/MariaDB is supposed to be one of the fastest SQL servers on the market.  But is that accurate? They do very fast selects (extracting data), but inserts and updates are considerably slower. 

As a standard benchmark, I ran a program that inserted about 50,000 records into a table.  This was done on an i5 desktop running MariaDB under Linux Mint.  I broke it down into approximate 5000 record segments.  The input file of random data was created in Free Pascal and written to a Pascal flat file.

The chart is pretty much flat over this group of data. My experience suggests, though, that if I put more data through, the inserts would slow down as the file got bigger.

[Notes on Pascal: The CLI Pascal program could not read from the Pascal Flat File Format and then write to the database.  I suspect that there is a memory leak in the standard Pascal File Calls that was damaging the MySQL data structures.  I ended up just reading the data and writing it out to Standard Output.  I then piped this to MariaDB's command prompt.  Neither process used very much CPU: 

0.7   2 19670 myuser    /usr/bin/mysql
0.0   3 19692 myuser    ./LoadData

Why would I even write data in the Pascal Format?  It's an obsolete file format that has been around since at least 1980.  But, it is very fast.  On my i5 laptop, it can write data out as fast as 11 million records per minute.  So, if I was in a situation where I had to capture a lot of data, I would consider it.]

In truth, writing to MariaDB is not very fast. If you had a lot of data to import, it would be like watching molasses in January.  I have worked with MySQL and it's performance isn't any better.

Both of these database servers are supposed to be multi-threaded.  In theory it should be possible to increase throughput by running multiple inserts at the same time. But is that the case?

This show the average of how many records per second were being added when the table had 50,000 records in it.  The X axis is the number of processes and the Y axis is the records being added per second.

This graph shows how long it takes in seconds to reach 20,000 records.  Again the X axis is number of processes and the Y axis is the number of seconds required.  These two graphs are basically mirror images of each other. 

The conclusion:  Running multiple processes to increase throughput does work.  I have increased throughput by more than a factor of 10.  I have not hit the point of diminishing returns. Even on a home computer, MariaDB could handle 25 proccesses hitting the same table.  What's more, it's not taking that big of a hit on the CPU.

0.0   3 20744 myuser    ./LoadData2 PersonFile01      0
0.0   1 20748 myuser    ./LoadData2 PersonFile02      0
0.0   2 20754 myuser    ./LoadData2 PersonFile03      0
0.0   0 20759 myuser    ./LoadData2 PersonFile04      0
0.0   2 20765 myuser    ./LoadData2 PersonFile05      0

0.1   1 20745 myuser    /usr/bin/mysql --binary-mod   0
0.1   0 20749 myuser    /usr/bin/mysql --binary-mod   0
0.1   0 20755 myuser    /usr/bin/mysql --binary-mod   0
0.1   1 20761 myuser    /usr/bin/mysql --binary-mod   0
0.2   0 20766 myuser    /usr/bin/mysql --binary-mod   0

The above shows the load these were adding to my system while running five processes.  The first column shows the cpu load for that specific process, and the second column shows what cpu the process is running on.  Total for all 10 processes isn't much more than 0.6% spread across four CPU's.  Even running 25  processes, I was only pulling 2.3% load spread across four processors. That is only about 0.6% per CPU.