Wednesday, September 23, 2015
I increased the size of my test:
The first graph showed that the improvement leveled off at about 80 processes. I'm going with that number, since on one of the runs with 80 processes MariaDB started returning "No connections available" errors. It only happened on one run. The second chart shows throughput still improving, though I'll admit I had trouble getting accurate numbers with that many processes running.
75 processes seems OK; beyond that I don't think I was making much additional progress.
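The sweep behind these charts can be sketched in Python (a hypothetical stand-in; the original tests used Free Pascal loaders piped into the mysql client). The `fake_insert` callable is an assumption, standing in for a real batch of INSERTs against MariaDB:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def measure_throughput(worker_count, total_records, insert_segment):
    """Split total_records across worker_count parallel workers and
    return the overall records-per-second rate."""
    per_worker = total_records // worker_count
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        list(pool.map(insert_segment, [per_worker] * worker_count))
    return total_records / (time.perf_counter() - start)

# Hypothetical stand-in: pretend each record costs a fixed amount of time.
def fake_insert(n):
    time.sleep(n * 0.00001)

# Sweeping the worker count, like the charts above, shows where gains level off.
rates = {w: measure_throughput(w, 1000, fake_insert) for w in (1, 5, 10)}
```

Against a real server, the point of diminishing returns shows up as the rate flattening while connection errors start to appear.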
Saturday, September 19, 2015
Fun with SQL
MySQL/MariaDB is supposed to be one of the fastest SQL servers on the market. But is that accurate? They do very fast selects (extracting data), but inserts and updates are considerably slower.
As a standard benchmark, I ran a program that inserted about 50,000 records into a table. This was done on an i5 desktop running MariaDB under Linux Mint. I broke the load down into segments of approximately 5,000 records each. The input file of random data was created in Free Pascal and written to a Pascal flat file.
The chart is pretty much flat over this group of data. My experience suggests, though, that if I pushed more data through, the inserts would slow down as the file grew.
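The segment-by-segment timing can be sketched like this (my own illustration, not the original Pascal program; `insert_batch` is a placeholder for whatever actually issues the INSERTs):

```python
import time

def chunked(records, size=5000):
    """Yield successive segments of roughly `size` records."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

def time_segments(records, insert_batch, size=5000):
    """Insert records one segment at a time, returning (segment_len, seconds)
    pairs -- enough to chart whether inserts slow down as the table grows."""
    timings = []
    for segment in chunked(records, size):
        start = time.perf_counter()
        insert_batch(segment)  # e.g. cursor.executemany("INSERT ...", segment)
        timings.append((len(segment), time.perf_counter() - start))
    return timings

# Dummy insert function so the harness itself can be exercised without a server.
timings = time_segments(list(range(50000)), lambda seg: None)
```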
[Notes on Pascal: The CLI Pascal program could not read from the Pascal flat file format and then write to the database. I suspect there is a memory leak in the standard Pascal file calls that was damaging the MySQL data structures. I ended up just reading the data and writing it to standard output, then piping that into MariaDB's command prompt. Neither process used very much CPU:
0.7 2 19670 myuser /usr/bin/mysql
0.0 3 19692 myuser ./LoadData
Why would I even write data in the Pascal format? It's an obsolete file format that has been around since at least 1980. But it is very fast: on my i5 laptop, it can write data out at up to 11 million records per minute. So if I were in a situation where I had to capture a lot of data, I would consider it.]
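A Pascal file of records is essentially a stream of fixed-length binary records, which is where the speed comes from. A rough Python analogue (my own sketch with an assumed record layout, not the original Pascal code):

```python
import io
import struct

# Hypothetical record layout: a 30-byte name field plus a 32-bit integer id.
REC = struct.Struct("30si")

def write_records(stream, records):
    """Append fixed-length binary records; no parsing or escaping,
    which is what makes flat record files so fast."""
    for name, ident in records:
        stream.write(REC.pack(name.encode("ascii")[:30], ident))

def read_records(stream):
    data = stream.getvalue()
    return [
        (name.rstrip(b"\x00").decode(), ident)
        for name, ident in (REC.unpack_from(data, off)
                            for off in range(0, len(data), REC.size))
    ]

buf = io.BytesIO()
write_records(buf, [("Alice", 1), ("Bob", 2)])
```

Every record is the same size, so reading record N is a single seek; that is the property the old Pascal format shares.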
In truth, writing to MariaDB is not very fast. If you had a lot of data to import, it would be like watching molasses in January. I have worked with MySQL, and its performance isn't any better.
Both of these database servers are supposed to be multi-threaded. In theory it should be possible to increase throughput by running multiple inserts at the same time. But is that the case?
This chart shows the average number of records per second being added when the table had 50,000 records in it. The X axis is the number of processes and the Y axis is records added per second.
This graph shows how long it takes, in seconds, to reach 20,000 records. Again, the X axis is the number of processes and the Y axis is the number of seconds required. These two graphs are basically mirror images of each other.
The conclusion: running multiple processes to increase throughput does work. I increased throughput by more than a factor of 10 and have not yet hit the point of diminishing returns. Even on a home computer, MariaDB could handle 25 processes hitting the same table. What's more, it's not taking that big of a hit on the CPU.
0.0 3 20744 myuser ./LoadData2 PersonFile01 0
0.0 1 20748 myuser ./LoadData2 PersonFile02 0
0.0 2 20754 myuser ./LoadData2 PersonFile03 0
0.0 0 20759 myuser ./LoadData2 PersonFile04 0
0.0 2 20765 myuser ./LoadData2 PersonFile05 0
0.1 1 20745 myuser /usr/bin/mysql --binary-mod 0
0.1 0 20749 myuser /usr/bin/mysql --binary-mod 0
0.1 0 20755 myuser /usr/bin/mysql --binary-mod 0
0.1 1 20761 myuser /usr/bin/mysql --binary-mod 0
0.2 0 20766 myuser /usr/bin/mysql --binary-mod 0
The above shows the load these were adding to my system while running five processes. The first column shows the CPU load for that specific process, and the second column shows which CPU the process is running on. The total for all 10 processes isn't much more than 0.6% spread across four CPUs. Even running 25 processes, I was only pulling 2.3% spread across four processors, which is only about 0.6% per CPU.
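Totaling that load by hand gets tedious; summing the first (CPU%) column of ps-style lines like those above is a one-liner:

```python
def total_cpu(ps_lines):
    """Sum the first column (per-process CPU%) of ps-style output lines."""
    return sum(float(line.split()[0]) for line in ps_lines if line.strip())

# A few of the sample lines shown above.
sample = [
    "0.0 3 20744 myuser ./LoadData2 PersonFile01 0",
    "0.1 1 20745 myuser /usr/bin/mysql --binary-mod 0",
    "0.2 0 20766 myuser /usr/bin/mysql --binary-mod 0",
]
load = total_cpu(sample)
```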
Sunday, April 05, 2015
MySQL Access Comparison
Just for fun, I compared MySQL access times from several different languages: Python, which is really popular these days, Free Pascal (an open-source Object Pascal compiler), and C (gcc).
Here are the results:
And from the point of view of Transactions per second
I was really kind of surprised how much better C did than Free Pascal. I did a comparison several months ago that showed C and Free Pascal performing much closer to each other, so I assume the Free Pascal library needs some work. Also, I was surprised that changing MariaDB from my-small to my-medium produced no improvement. Below is a chart from the sort comparison:
Friday, January 09, 2015
How to Write your own Monitoring App
First, the problem:
Some Linux users like to run servers and/or services. You could be running Samba, MySQL, web servers, and maybe even a chat server. You could run these all on your desktop, but that wastes electricity and ties up your desktop when you may want to use it for other purposes. A primary desktop would need to be up all the time, and many desktops have a high-end video card that uses as much electricity as the rest of the computer combined. Why not run them all on one or more headless servers? This could be one big processor running multiple virtual machines, or single less powerful computers (maybe even Raspberry Pis) each supporting a specific service. A headless i3 might come in between 60-70 watts, while a gaming rig can use 400 watts or more.
But how do you monitor them? You can download and install Nagios or something like that, but it could be more fun to write your own. You can design the GUI app to fit your needs, not what someone else says you need. Maybe you want the app to sit in a little corner and not use up much space, or maybe you want it large on one computer all by itself.
Second, what language (compiler or interpreter) are you going to use?
There are many choices, and in reality you can make almost any of them work, so it comes down to personal choice. Okay, I like Lazarus, which is a GUI front end to the Free Pascal Compiler. It is a true compiler and generates relatively fast code. Lazarus will run circles around Python performance-wise, but in truth C++ will outperform Lazarus. Pascal is easier to write than C++, and you can have a working app in much less time. In performance tests I have seen, Free Pascal generated programs that run on average 2.5 times longer than an equivalent program written in C++, but Python will run about 6 times longer than an equivalent Free Pascal program. Your results may vary: in the charts below, Python took 50 times longer to run than Free Pascal, and the difference between Free Pascal and C was marginal. On the learning curve, Python is the easiest to learn (but not by much), followed by Pascal, and the hardest to learn is C++.
You might think these charts are irrelevant to computing today, but sorting is a fairly good way to compare the performance of compilers or interpreters, and the source code is freely available on the internet. And lastly, I had an application with a procedure that updated a MySQL table with a sort value that changed on a daily basis and then sorted the output. It had a performance issue because MySQL updates data slowly but can select it very quickly. I rewrote the application to load the table into RAM, update it there, and then run an internal sort. That procedure went from running in minutes to milliseconds. There may come a point where it becomes necessary to run a sort on data inside your own program.
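That load-into-RAM rework can be sketched as follows (a hypothetical illustration, not the original application): load the rows once, recompute the sort value in memory, and use the language's built-in sort instead of issuing per-row UPDATEs:

```python
def refresh_and_sort(rows, new_score):
    """rows: list of dicts loaded once from the database.
    new_score: callable computing the daily sort value for a row.
    Updating in RAM and sorting in-process avoids MySQL's slow updates."""
    for row in rows:
        row["sort_value"] = new_score(row)
    return sorted(rows, key=lambda r: r["sort_value"])

# Hypothetical rows; negate 'hits' to sort descending by activity.
rows = [{"id": 1, "hits": 5}, {"id": 2, "hits": 9}, {"id": 3, "hits": 1}]
ordered = refresh_and_sort(rows, lambda r: -r["hits"])
```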
The Algorithm.
I knew that I wanted to start ssh processes into each server and then process the returned data. When you put a command right after the address to connect to, the session terminates after the command is run. I have written previous programs that started background processes and just returned the output, which could then be processed by the main application. This is relatively easy to do; you need to add Process to the uses clause and declare a TProcess and a TStringList variable.
uses
  Classes, SysUtils, Process;

Var
  MyProcess: TProcess;
  MyStrList: TStringList;
And then the following needs to be added at the point where you want to start the process:
MyStrList := TStringList.Create;
MyProcess := TProcess.Create(nil);
MyProcess.CommandLine := 'ValidLinuxCommand';
MyProcess.Options := MyProcess.Options + [poWaitOnExit, poUsePipes];
MyProcess.Execute;
MyStrList.LoadFromStream(MyProcess.Output);
{ Do something with the String List here }
MyStrList.Free;
MyProcess.Free;
That isn't really all that hard, and it will work as long as the output is smaller than about 2K. If it is going to be larger, we need to add more complexity. This is explained in detail at:
http://wiki.lazarus.freepascal.org/Executing_External_Programs
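The same pitfall exists in any language: if you wait for the child to exit before reading its pipe, the child blocks once the pipe buffer fills. In Python terms (an analogy to the Lazarus wiki's fix, not a translation of it):

```python
import subprocess
import sys

def run_and_capture(cmd):
    """communicate() reads stdout while the child runs, so output larger
    than the pipe buffer cannot deadlock the way wait-then-read can."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    out, _ = proc.communicate()  # drains the pipe while the child runs
    return out

# Use the Python interpreter itself as a portable child process.
lines = run_and_capture([sys.executable, "-c", "print('a'); print('b')"]).splitlines()
```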
Initially I was thinking it would need to be a multi-threaded app, and I started researching that option; I was afraid one stuck process could freeze the whole application. The more I looked at that option, however, the more complex it sounded. Each thread could not update graphical components within the main app (that can crash the window manager), so the threads would have to update basic objects (integers, strings, floats, and such), the main thread would then have to update the component objects, and there is the added complexity of needing critical sections (to prevent more than one thread from accessing a shared object at the same time).
I then thought: maybe I can start each subprocess without waiting on it to finish, and just create a loop that tests each subprocess's state. If a subprocess has finished, process its output and start a replacement subprocess. No messing around with all of the multi-threading stuff.
So first of all I declared a simple record:

TProcessRec = Record
  Process: TProcess;
  Addr, User, Password: String;
  Pct: Double;
  StrList: TStringList;
End;

And then declared an array of that record:

ProcessTable: Array[1..KKArraySize] of TProcessRec;
Now we want to set up the main loop, but first we want to prime the pump and start all the ssh processes needed:

For i := 1 to KKArraySize Do
Begin
  If Length(ProcessTable[i].Addr) > 1 Then
  Begin
    ProcessTable[i].Process := TProcess.Create(nil);
    ProcessTable[i].StrList := TStringList.Create;
    ProcessTable[i].Process.CommandLine := CmdLine(i);
    ProcessTable[i].Process.Options := ProcessTable[i].Process.Options + [poUsePipes];
    ProcessTable[i].Process.Execute;
  end;
end;
I am just looping through my array and checking whether an address is defined; if so, I start an ssh. Note that I removed poWaitOnExit from the process options because we don't want to stop and wait for each process to finish. Instead, we will test the subprocess in the next step. Also note that I am calling a function, CmdLine, to build the Linux command I want to run. Below is that function:
Function CmdLine(i: Integer): String;
Begin
  CmdLine := 'sshpass -p "' + ProcessTable[i].Password +
    '" ssh ' + ProcessTable[i].User + '@' +
    ProcessTable[i].Addr + ' sar -u 1 1;free';
end;
This is pretty simple; it just uses the array to fill in the user, password, and address. As you can also see, I am using sshpass to pass the password to ssh, and I am using the sar utility. sshpass can be found in most repositories, and sar is part of the sysstat monitoring utilities, which can also be found in most repositories. I found that sar gave CPU utilization numbers very close to what top gives, which is why I ended up using it. sar, or whatever utility you end up using, will have to be installed on all of the servers you are monitoring.
Now finally the main loop:

EndTime := Now + EncodeTime(0, 0, 5, 0); // 5 Seconds
While Now < EndTime do
Begin
  Application.ProcessMessages;
  Sleep(10);
End;
For i := 1 to KKArraySize Do
Begin
  If Length(ProcessTable[i].Addr) > 1 Then
  Begin
    If ProcessTable[i].Process.Running = False then
    Begin
      ProcessTable[i].StrList.LoadFromStream(ProcessTable[i].Process.Output);
      For j := 0 to ProcessTable[i].StrList.Count - 1 Do
      Begin
        If copy(ProcessTable[i].StrList[j], 1, 7) = 'Average' then
        Begin
          Sum := SumProcessorStats(ProcessTable[i].StrList[j]);
          PctStr := Format('%5.2f', [Sum]);
          ProcessTable[i].Pct := Sum;
        End;
      End;
      BladeSG.Cells[i, 2] := Format('%5.2f', [ProcessTable[i].Pct]);
      ProcessTable[i].StrList.Clear;
      ProcessTable[i].Process.CommandLine := CmdLine(i);
      ProcessTable[i].Process.Options := ProcessTable[i].Process.Options + [poUsePipes];
      ProcessTable[i].Process.Execute;
    End;
  End;
End;
Well, that is a mouthful! This is all in a Repeat ... Until loop; almost a forever loop, though I do have a way to break out of it. The first While loop is just a means to get the app to wait 5 seconds before checking on the status of all of our background processes. The first two If statements look silly and seem like they should be combined, but since I only initialized a Process in the array elements where the Addr field had actual data, running the Process.Running test on an uninitialized Process variable would cause a memory error and crash the app.
ProcessTable[i].StrList.LoadFromStream(ProcessTable[i].Process.Output);

just loads the output from our ssh into the TStringList declared in our array. A TStringList is essentially an array of strings, and the output of the sar utility contains the word Average. So the following block of code just sums up the output from the sar utility. I summed the user, nice, and system values listed by sar; you may decide on a different strategy.
For j := 0 to ProcessTable[i].StrList.Count - 1 Do
Begin
  If copy(ProcessTable[i].StrList[j], 1, 7) = 'Average' then
  Begin
    Sum := SumProcessorStats(ProcessTable[i].StrList[j]);
    PctStr := Format('%5.2f', [Sum]);
    ProcessTable[i].Pct := Sum;
  End;
End;
SumProcessorStats is a function that adds the relevant fields of the sar output together.
Function SumProcessorStats(s: String): Double;
Var
  UsrStr, NiceStr, SysStr: String;
  Sum, Dbl: Double;
begin
  UsrStr := copy(s, 24, 6);
  NiceStr := copy(s, 34, 6);
  SysStr := copy(s, 44, 6);
  Sum := 0.0;
  Dbl := StrToFloat(UsrStr);
  Sum := Sum + Dbl;
  Dbl := StrToFloat(NiceStr);
  Sum := Sum + Dbl;
  Dbl := StrToFloat(SysStr);
  Sum := Sum + Dbl;
  SumProcessorStats := Sum;
end;
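One caveat with fixed offsets like copy(s, 24, 6): they break if sar's column widths ever change. A more robust approach splits the line on whitespace instead; here is a Python sketch (the sample line's exact layout is my assumption, based on typical `sar -u` output):

```python
def sum_processor_stats(line):
    """Sum %user, %nice and %system from a 'sar -u' Average line.
    Assumes the usual column order: label, CPU, %user, %nice, %system, ..."""
    fields = line.split()
    user, nice, system = (float(f) for f in fields[2:5])
    return user + nice + system

sample = "Average:        all      1.23      0.00      0.45      0.10      0.00     98.22"
busy = sum_processor_stats(sample)
```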
Finally, since the previous subprocess has finished, this last piece starts a new one:

ProcessTable[i].StrList.Clear;
ProcessTable[i].Process.CommandLine := CmdLine(i);
ProcessTable[i].Process.Options := ProcessTable[i].Process.Options + [poUsePipes];
ProcessTable[i].Process.Execute;
Security is an issue here. The addresses, users, and passwords could be hard-coded into your app, but if somebody broke into your system, they could get all of your passwords by looking at your source code. An improvement would be to keep all that info in a MySQL table and prompt the operator for the password to access that table. Even better would be to encrypt the data and prompt the operator for a decryption key.
At some point, we may want to start sending e-mails when certain conditions occur, like one of the servers being maxed out for several minutes, or a server becoming unresponsive. Sending e-mail isn't that hard to add. There is a library that can be downloaded for free called Synapse (http://wiki.freepascal.org/Synapse). You would need to download the library and add the following units to the uses clause:

synautil, synacode, blcksock, pop3send, smtpsend
I copied the whole Synapse library into the directory containing my app, and Lazarus compiled whatever it needed. The content of the e-mail would need to be added to a TStringList, and then, to send the e-mail, use something similar to the following:

If Not SendTo('ToEmailAddress', 'FromEmailAddress', 'Subject',
    'EmailServer', MsgStringList) Then
  ShowMessage('E-Mail Didn''t Work');
ToEmailAddress could be your Gmail address on your phone, FromEmailAddress would be your regular e-mail address, and EmailServer will be your e-mail server (probably something like mail.myservername.com). There is more information on the wiki site listed above.
Performance
On an Intel i5, the main thread uses about 6% of the CPU on one core. Very acceptable.