You may already be familiar with Linux load averages. The term “load average” usually refers to
three numbers that represent the load on the system's CPU.
In this article I’ll try to make these three numbers clearer and
easier to understand.
The Linux Load Average is driven by three factors, namely the processes that:
- Run on, or are waiting for, the CPU
- Perform disk I/O
- Perform network I/O
But how does one interpret a Load Average that seems to be too
high? The first step is to look at the CPU utilization. If this isn't
100% and the Load Average is above the number of CPUs in the system,
the Load Average is primarily driven by processes performing disk I/O,
network I/O, or a combination of both. Finding the processes
responsible for most of the I/O isn't straightforward, but there
are many tools available to assist you in doing so.
If the CPU utilization is 100% and the Load Average is above the
number of CPUs in the system, the Load Average is either completely
driven by processes running on, or waiting for, the CPU or driven by a
combination of processes running on, or waiting for, the CPU and
processes performing I/O (which could be in turn a combination of disk
and network I/O).
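The two cases above can be written down as a small decision function. This is only a sketch of the reasoning in this article: the function name `diagnose_load` and the `saturated_at` threshold are assumptions made for illustration (measured CPU utilization rarely reads exactly 100%), and the returned strings are just labels.

```python
def diagnose_load(load_avg: float, cpu_util_percent: float, n_cpus: int,
                  saturated_at: float = 99.0) -> str:
    """Classify a load average following the two cases described above.

    saturated_at is an assumed cut-off for treating the CPU as
    "100% utilized"; adjust it to taste.
    """
    if n_cpus < 1:
        raise ValueError("n_cpus must be at least 1")

    # Load at or below the number of CPUs: nothing to explain.
    if load_avg <= n_cpus:
        return "not above the number of CPUs"

    # High load but the CPU is not saturated: I/O is the main driver.
    if cpu_util_percent < saturated_at:
        return "primarily driven by I/O (disk, network, or both)"

    # High load and a saturated CPU: CPU-bound work, possibly plus I/O.
    return "driven by the CPU, possibly combined with I/O"


print(diagnose_load(load_avg=6.0, cpu_util_percent=40.0, n_cpus=4))
print(diagnose_load(load_avg=6.0, cpu_util_percent=100.0, n_cpus=4))
```

The function only narrows down where to look; finding the individual processes responsible still requires other tools.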
The easiest way to see the “load average” of your system is by running uptime. It also appears in top and can be graphed in the console by tload. In all three cases the load average refers to a group of three numbers. For example, in the following output of uptime
10:41:47 up 9 days, 48 min, 1 user, load average: 0.82, 0.71, 0.66
the last three numbers are the “load average”. Each number represents
the system's load as a moving average over the last 1, 5 and 15 minutes
respectively. Now, the important thing is to understand what is being
averaged: the load metric.
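If you want to work with these three numbers in a script, they can be pulled out of a line of uptime output. The following is a minimal sketch; the helper name `parse_load_average` and the regular expression are my own assumptions about the line format shown above, not something uptime itself provides.

```python
import re


def parse_load_average(uptime_line: str) -> tuple:
    """Return the (1 min, 5 min, 15 min) load averages from an uptime line."""
    # Assumed format: "... load average: 0.82, 0.71, 0.66"
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: " + uptime_line)
    one, five, fifteen = (float(group) for group in match.groups())
    return one, five, fifteen


line = "10:41:47 up 9 days, 48 min, 1 user, load average: 0.82, 0.71, 0.66"
print(parse_load_average(line))  # (0.82, 0.71, 0.66)
```

On Unix-like systems Python can also return the same three values directly through `os.getloadavg()`, which avoids parsing text altogether.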
The metric that represents the load at a given point in time is how
many processes are queued to run at that moment (including the
process that is currently running). Generally speaking, on a single-core
machine, this can be read as the CPU utilization percentage when
multiplied by 100. For example, if I had a load average of 0.50 over the
last minute, it means that for half of that minute the
CPU was idle because it had no process to run. On the other hand, if I had
a load average of 2.50, it means that over the last minute an average of
1.5 processes were waiting for their turn to run, so the CPU was overloaded
by 150%.
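The single-core arithmetic from the two examples above can be sketched in a few lines. The function name `single_core_summary` and the dictionary keys are made up for this illustration, and the calculation is the same rule of thumb as in the text, not an exact measurement.

```python
def single_core_summary(load_avg: float) -> dict:
    """Interpret a load average on a single-core machine (rule of thumb)."""
    # At most one process can be running on a single core.
    busy = min(load_avg, 1.0)
    # Anything above 1.0 is, on average, waiting for its turn.
    waiting = max(load_avg - 1.0, 0.0)
    return {
        "cpu_busy_percent": busy * 100,
        "cpu_idle_percent": (1.0 - busy) * 100,
        "avg_waiting_processes": waiting,
        "overload_percent": waiting * 100,
    }


print(single_core_summary(0.50))  # half busy, half idle, nothing waiting
print(single_core_summary(2.50))  # fully busy, 1.5 processes waiting
```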
On multi-core systems things are a bit different, but in order to
avoid unnecessary complications one can usually divide the load average
by the number of cores and treat the result as the load average of a single-core
machine. For example, let’s say the load average of a two-core
machine was 3.00 2.00 0.50. This means that over the last minute we had
an average of three runnable processes, so one process, on
average, was queued, as the two cores in the machine can run only two
processes at a time. So the machine had a load of 150% of its
capacity. Over the last 5 minutes the load average of 2.00 means that
we had roughly 2 processes running at any time, so the machine was fully
utilized but wasn’t overloaded. On the other hand, over the last 15
minutes the load average of 0.50 means that we could have handled 4 times that
load without overloading the CPU; we only had (0.50/2)*100=25% CPU
utilization in those 15 minutes.
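The two-core example can be reproduced with a short sketch that divides each load average by the number of cores. The helper name `per_core_load_percent` is an assumption for illustration; the numbers are the ones used in the example above.

```python
def per_core_load_percent(load_avg: float, n_cores: int) -> float:
    """Load as a percentage of what the machine's cores can run at once."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return load_avg / n_cores * 100


# The two-core machine from the example, with load averages 3.00 2.00 0.50.
for label, load in zip(("1 min", "5 min", "15 min"), (3.00, 2.00, 0.50)):
    print(label, per_core_load_percent(load, n_cores=2))
# 1 min 150.0
# 5 min 100.0
# 15 min 25.0
```

On a real machine the core count could be taken from `os.cpu_count()` instead of being hard-coded.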
I hope the above example made the load average a bit clearer.
The load average is an important metric for measuring a system's
performance, and a good understanding of it is beneficial.
Note that this document comes without warranty of any kind, but every effort
has been made to make the information as accurate as possible. I welcome
emails from any readers with comments, suggestions, and corrections at
webmaster_at admin@linuxhowto.in
Copyright © 2012 LINUXHOWTO.IN