642 lines
27 KiB
Plaintext
Executable File
642 lines
27 KiB
Plaintext
Executable File
This is sysstat's Frequently Asked Questions!
|
|
Be sure to read this carefully before asking for help...
|
|
If you don't find the solution to your problem here then send me an email
|
|
(please remember to include sysstat's version number, a sample output
|
|
showing the bug, and the contents of the /proc/stat file for your system.
|
|
Also tell me what version your kernel is).
|
|
|
|
|
|
1. GENERAL QUESTIONS
|
|
|
|
1.1. When I compile sysstat, it fails with the following message:
|
|
"make: msgfmt: Command not found".
|
|
1.2. When I try to compile sysstat, it fails and says it cannot find some
|
|
include files.
|
|
1.3. I don't understand why sysstat displays the time sometimes as HH:MM:SS
|
|
and sometimes as HH:MM:SS AM/PM...
|
|
|
|
2. QUESTIONS RELATING TO SAR/SADC/SADF
|
|
|
|
2.1. The sar command complains with the following message:
|
|
"Invalid system activity file: ...".
|
|
2.2. The sar command complains with the following message:
|
|
"Cannot append data to that file (...)".
|
|
2.3. The sar command complains with the following message:
|
|
"Invalid data format".
|
|
2.4. I get the following error message when I try to run sar:
|
|
"Cannot open /var/log/sa/sa30: No such file or directory".
|
|
2.5. Are sar daily data files fully compatible with Sun Solaris format
|
|
sar files?
|
|
2.6. I have some trouble running sar on my SMP box. My server crashes
|
|
with a kernel oops.
|
|
2.7. The "Average:" results from the sar command are just rubbish...
|
|
2.8. My database (e.g. MySQL) doesn't appear to understand the time zone
|
|
displayed by 'sadf -d'...
|
|
2.9. I tried to use the -p option of the sadf command, together with the
|
|
options -s and -e. Unfortunately, I have nothing displayed at all.
|
|
2.10. I cannot see all my disks when I use the sar -d command...
|
|
2.11. Do you know a tool which can graphically plot the data collected by sar?
|
|
2.12. When I launch sadc, I get the error message:
|
|
"flock: Resource temporarily unavailable".
|
|
2.13. How should I run sysstat / sar so that I get a reading for 00:00:00?
|
|
2.14. The sar command complains with the following message:
|
|
"Requested activities not available in file ...".
|
|
2.15. Does sar need a lot of resources to run?
|
|
2.16. Are the measurements gathered by sadc cumulative or instantaneous?
|
|
2.17. Some fields are always displayed as 0.00 when I use the sar -d
|
|
command.
|
|
2.18. The sar command complains with the following message:
|
|
"Requested activities not available".
|
|
2.19. How can I keep sar data for more than one month?
|
|
2.20. How can I load sar data into an Oracle database for performance
|
|
analysis and capacity planning?
|
|
2.21. The sar command displays some weird output values...
|
|
2.22. What happened to sar's options -h, -H, -x and -X?
|
|
|
|
3. QUESTIONS RELATING TO IOSTAT
|
|
|
|
3.1. I can't see all my disks when I use the iostat command...
|
|
3.2. iostat -x doesn't report disk I/O statistics...
|
|
3.3. Why can't iostat display extended statistics for partitions with
|
|
2.6.x kernels?
|
|
3.4. I don't understand the output of iostat. It doesn't match what I expect
|
|
it to be...
|
|
3.5. Why values displayed by iostat are so different in the first report
|
|
from those displayed in subsequent ones?
|
|
3.6. iostat -x displays huge numbers for some fields...
|
|
|
|
4. QUESTIONS RELATING TO PIDSTAT
|
|
|
|
4.1. pidstat -d doesn't report task I/O statistics...
|
|
4.2. The pidstat command complains with the following message:
|
|
"Requested activities not available".
|
|
4.3. pidstat doesn't display statistics for process (task) xyz...
|
|
|
|
|
|
1. GENERAL QUESTIONS
|
|
####################
|
|
|
|
1.1. When I compile sysstat, it fails with the following message:
|
|
make: msgfmt: Command not found
|
|
make: ***[locales] Error 127
|
|
|
|
The msgfmt command belongs to the GNU gettext package.
|
|
If you don't have it on your system, just configure sysstat with
|
|
NLS disabled like this:
|
|
|
|
$ ./configure --disable-nls
|
|
|
|
or answer 'y' (for "yes") to the question
|
|
"Disable National Language Support (NLS)? (y/n) [--disable-nls]"
|
|
if you use the Interactive Configuration script (iconfig),
|
|
then compile sysstat as usual (make ; make install).
|
|
Please read the README-nls file included in sysstat source package to learn
|
|
some more about National Language Support.
|
|
|
|
~~~
|
|
|
|
1.2. When I try to compile sysstat, it fails and says it cannot find some
|
|
include files:
|
|
In file included from /usr/include/bits/errno.h:25,
|
|
from /usr/include/errno.h:36,
|
|
from common.c:26:
|
|
/usr/include/linux/errno.h:4: asm/errno.h: No such file or directory
|
|
<SNIP>
|
|
common.c: In function `get_kb_shift':
|
|
common.c:180: `PAGE_SIZE' undeclared (first use in this function)
|
|
common.c:178: warning: `size' might be used uninitialized in this function
|
|
make: *** [common.o] Error 1
|
|
|
|
Make sure that you have the Linux kernel sources installed in
|
|
/usr/src/linux. Also make sure that the symbolic link exists in the
|
|
/usr/src/linux/include directory and points to the right architecture, e.g.:
|
|
|
|
# ll /usr/src/linux/include/asm
|
|
lrwxrwxrwx 1 root root 8 May 5 18:31 /usr/src/linux/include/asm -> asm-i386
|
|
|
|
In fact, only the Linux kernel headers should be necessary to be able
|
|
to compile sysstat.
|
|
|
|
~~~
|
|
|
|
1.3. I don't understand why sysstat displays the time sometimes as HH:MM:SS
|
|
and sometimes as HH:MM:SS AM/PM...
|
|
|
|
The time format used by sysstat tools depends on the locale of your system.
|
|
The locale is defined by several environment variables, among which the LANG
|
|
variable is perhaps the most widely used. See the following example:
|
|
|
|
$ export LANG=en_US
|
|
$ sar
|
|
Linux 2.4.9 (brooks.seringas.fr) 07/20/04
|
|
|
|
04:32:11 PM LINUX RESTART
|
|
|
|
05:00:00 PM CPU %user %nice %system %iowait %idle
|
|
05:10:00 PM all 0.24 0.00 89.64 0.00 10.12
|
|
Average: all 0.24 0.00 89.64 0.00 10.12
|
|
|
|
$ export LANG=fr_FR
|
|
$ sar
|
|
Linux 2.4.9 (brooks.seringas.fr) 20.07.2004
|
|
|
|
16:32:11 LINUX RESTART
|
|
|
|
17:00:00 CPU %user %nice %system %iowait %idle
|
|
17:10:00 all 0,24 0,00 89,64 0,00 10,12
|
|
Moyenne: all 0,24 0,00 89,64 0,00 10,12
|
|
|
|
As you can notice, the time format but also the date, the decimal point, and
|
|
even some words (like "Average") have changed according to the specified
|
|
locale.
|
|
|
|
|
|
2. QUESTIONS RELATING TO SAR/SADC/SADF
|
|
######################################
|
|
|
|
2.1. The sar command complains with the following message:
|
|
Invalid system activity file: ...
|
|
|
|
You are trying to use a file which is not a system activity file, or
|
|
whose format is no longer compatible with that of files created by
|
|
current version of sar.
|
|
If you were trying to use the standard system activity files located in the
|
|
/var/log/sa directory then the solution is easy: just log in as root and
|
|
remove by hand all the files located in the /var/log/sa directory:
|
|
|
|
# rm /var/log/sa/sa??
|
|
|
|
With recent versions of sysstat (8.1.1 and later), it is now possible to
|
|
know which version of sar or sadc has been used to create a data file.
|
|
Just enter the following command:
|
|
|
|
$ sadf -H /your/datafile | grep sysstat
|
|
File created using sar/sadc from sysstat version 8.1.7
|
|
|
|
~~~
|
|
|
|
2.2. The sar command complains with the following message:
|
|
Cannot append data to that file (...)
|
|
|
|
The internal structure of the data file does not allow sar to append
|
|
data to it. The data file may come from another machine, or the components
|
|
of the current box, such as the number of processors, may have changed.
|
|
Use another data file, or delete the current daily data file, and try again.
|
|
|
|
~~~
|
|
|
|
2.3. The sar command complains with the following message:
|
|
Invalid data format
|
|
|
|
This error message means that sadc (the system activity data collector that
|
|
sar is using) is not consistent with the sar command. In most cases this is
|
|
because the sar and sadc commands do not belong to the same release of the
|
|
sysstat package. Remember that sar may search for sadc in predefined
|
|
directories (/usr/local/lib/sa, /usr/lib/sa, ...) before looking in the
|
|
current directory!
|
|
|
|
~~~
|
|
|
|
2.4. I get the following error message when I try to run sar:
|
|
Cannot open /var/log/sa/sa30: No such file or directory
|
|
|
|
Please read the sar(1) manual page! Daily data files are created in the
|
|
/var/log/sa directory using the data collector (sadc) or using option
|
|
-o with sar. Once they are created, sar can display statistics saved
|
|
in those files.
|
|
But sar can also display statistics collected "on the fly": just enter
|
|
the proper option on the command line to indicate which statistics are
|
|
to be displayed, and also specify an <interval> and <count> number.
|
|
E.g.:
|
|
|
|
# sar 2 5 --> will report CPU utilization every two seconds, five times.
|
|
# sar -n DEV 3 0 --> will report network device utilization every
|
|
3 seconds, in an infinite loop.
|
|
|
|
~~~
|
|
|
|
2.5. Are sar daily data files fully compatible with Sun Solaris format
|
|
sar files?
|
|
|
|
No, the format of the binary data files created by sysstat's sar command
|
|
is not compatible with formats from other Unixes, because it contains
|
|
data which are closely linked to Linux.
|
|
For the same reason, sysstat cannot work on platforms other than Linux...
|
|
|
|
~~~
|
|
|
|
2.6. I have some trouble running sar on my SMP box. My server crashes
|
|
with a kernel oops:
|
|
Feb 17 04:05:00 bolums1 kernel: Unable to handle kernel paging request
|
|
at virtual address fffffc1c
|
|
Feb 17 04:05:00 bolums1 kernel: current->tss.cr3 = 19293000, %cr3 = 19293000
|
|
Feb 17 04:05:00 bolums1 kernel: *pde = 0026b067
|
|
Feb 17 04:05:00 bolums1 kernel: *pte = 00000000
|
|
Feb 17 04:05:00 bolums1 kernel: Oops: 0000
|
|
Feb 17 04:05:00 bolums1 kernel: CPU: 0
|
|
Feb 17 04:05:00 bolums1 kernel: EIP:
|
|
<...>
|
|
|
|
The trouble you have is triggered by a *Linux* kernel bug, not a sysstat
|
|
one... The best solution is to upgrade your kernel to the latest stable
|
|
release.
|
|
Also, if you cannot upgrade your box, try to configure sysstat with the
|
|
SMP race workaround:
|
|
|
|
$ ./configure --enable-smp-race
|
|
|
|
or answer 'y' to the question:
|
|
"Linux SMP race in serial driver workaround? (y/n) [--enable-smp-race]"
|
|
if you use the Interactive Configuration script (iconfig).
|
|
Indeed, we found that 2.2.x kernels (with x <= 15) have an SMP race
|
|
condition, which the sar command may trigger when it reads the
|
|
/proc/tty/driver/serial file.
|
|
|
|
~~~
|
|
|
|
2.7. The "Average:" results from the sar command are just rubbish...
|
|
E.g.:
|
|
11:00:00 AM CPU %user %nice %system %idle
|
|
11:10:00 AM all 0.54 0.00 0.89 98.57
|
|
11:20:01 AM all 3.02 8.05 22.85 66.08
|
|
11:30:01 AM all 8.15 0.00 2.31 89.54
|
|
11:40:01 AM all 8.03 0.00 2.42 89.55
|
|
11:50:01 AM all 16.04 0.00 2.81 81.16
|
|
12:00:00 PM all 21.11 0.00 3.23 75.66
|
|
03:40:01 PM all 100.01 100.01 100.01 0.00
|
|
04:40:00 PM all 100.00 0.00 100.00 0.00
|
|
04:50:00 PM all 5.87 0.00 1.26 92.87
|
|
05:00:00 PM all 4.70 0.00 1.48 93.82
|
|
05:10:00 PM all 4.93 0.00 1.29 93.78
|
|
Average: all 100.22 100.20 100.13 0.00
|
|
|
|
Your sar command was not installed properly. Whenever your computer
|
|
is restarted (as it is the case here between 12:00:00 PM and 03:40:01 PM),
|
|
the 'sysstat' shell script must be called by the system, so that the
|
|
LINUX RESTART message can be inserted into the daily data file, indicating
|
|
that the relevant kernel counters have been reinitialized...
|
|
You can install the 'sysstat' script by hand in the relevant startup
|
|
directory, or you can ask sysstat to do it for you during configuration
|
|
stage by entering:
|
|
|
|
$ ./configure --enable-install-cron
|
|
|
|
Or you can answer 'y' to the question:
|
|
"Set crontab to start sar automatically? (y/n) [--enable-install-cron]"
|
|
if you use the Interactive Configuration script (iconfig).
|
|
Then compile sysstat as usual and run 'make install' as the last stage.
|
|
|
|
~~~
|
|
|
|
2.8. My database (e.g. MySQL) doesn't appear to understand the time zone
|
|
displayed by 'sadf -d'...
|
|
|
|
The format includes the timezone detail in the output. This is to make
|
|
sure it is communicated clearly that UTC is how the data is always
|
|
converted and printed. Moreover we don't depend on the TZ environment
|
|
variable and we don't have some data converted to a different timezone
|
|
for any reason, known or unknown.
|
|
When you deal with raw accounting data you always want it in UTC.
|
|
Of course, you want it to all be the same when loading into a database.
|
|
If your database can't deal with timezones, you should write a short script
|
|
to strip the "UTC" characters from the data being loaded into the database.
|
|
|
|
~~~
|
|
|
|
2.9. I tried to use the -p option of the sadf command, together with the
|
|
options -s and -e. Unfortunately, I have nothing displayed at all.
|
|
|
|
This is because no data belong to the specified time interval!
|
|
With versions older than 7.0.1, the time specified by options -s or -e
|
|
may be considered as given in UTC (Coordinated Universal Time) or local
|
|
time depending on whether sadf displays its output in UTC or local time.
|
|
Option -p makes sadf display its timestamp in UTC, as indicated in the
|
|
manual page. The UTC value may be different from the value that sar
|
|
(or sadf without option -p) displays. The same remark applies to the
|
|
use of -d option.
|
|
|
|
~~~
|
|
|
|
2.10. I cannot see all my disks when I use the sar -d command...
|
|
|
|
See question 3.1 below.
|
|
|
|
~~~
|
|
|
|
2.11. Do you know a tool which can graphically plot the data collected by sar?
|
|
|
|
Several such tools are lying around on the internet. I haven't tested all of
|
|
them and there must still be some way for improvement...
|
|
You can find among others: isag (a Perl script included in the sysstat
|
|
package), kSar, sarvant, sar2gp, loadgraph, SysStat Charts, sarplot...
|
|
rrd.cgi (http://haroon.sis.utoronto.ca/rrd/scripts/) is a perl front-end for
|
|
rrdtool and can be used to make some graphs (see a demo at
|
|
http://haroon.sis.utoronto.ca/perl/rrd.cgi/sar_stats/).
|
|
I've also heard of commercial tools which use sysstat: PerfMan comes to mind,
|
|
among others.
|
|
If you find others which you think are of real interest, please let me know
|
|
so that I can update this list.
|
|
|
|
~~~
|
|
|
|
2.12. When I launch sadc, I get the error message:
|
|
flock: Resource temporarily unavailable
|
|
|
|
You are launching sadc using -L option. With this option, sadc tries to
|
|
get an exclusive lock on the output file. The above error message indicates
|
|
that another sadc process was running and had already locked the same output
|
|
file. Stop all sadc instances and try again.
|
|
|
|
~~~
|
|
|
|
2.13. I have sysstat setup to run via cron:
|
|
0 * * * * /usr/local/lib/sa/sa1 600 6 &
|
|
so that I get an activity report every 10 minutes.
|
|
When I use sar to get my output, there is no reading for 00:00:00. This
|
|
means that at midnight every night there is a spike, or dip, in the graphs.
|
|
How should I run sysstat / sar so that I get a reading for 00:00:00?
|
|
|
|
Sysstat does get its data at midnight, but two data samples are needed to
|
|
display the values.
|
|
When there is a "file rotation" (beginning of a new day), sadc writes its data
|
|
at the end of the previous daily data file (/var/log/sa/sa<DD>) *and* at the
|
|
beginning of the new one (/var/log/sa/sa<DD+1>). Please note that '-' must be
|
|
used to specify the output file for sadc to be able to detect such a file
|
|
rotation. So a crontab like the following one should enable you to get the
|
|
data for midnight at the end of each daily data file:
|
|
|
|
# Activity reports every 10 minutes from 01:00:00 to 22:50:00
|
|
0 1-22 * * * /usr/local/lib/sa/sa1 600 6 &
|
|
# Activity reports every 10 minutes from 23:00:00 to 00:00:00
|
|
# Reporting until 00:00:00 ensures that a file rotation will be detected
|
|
# by sadc
|
|
0 23 * * * /usr/local/lib/sa/sa1 600 7 &
|
|
# Activity reports every 10 minutes from 00:10:00 to 00:50:00
|
|
10 0 * * * /usr/local/lib/sa/sa1 600 5 &
|
|
|
|
Another possible crontab would be:
|
|
|
|
*/10 1-22 * * * /usr/lib/sa/sa1 -d 1 1
|
|
0,10,20,30,40 23 * * * /usr/lib/sa/sa1 -d 1 1
|
|
50 23 * * * /usr/lib/sa/sa1 -d 600 2
|
|
10,20,30,40,50 0 * * * /usr/lib/sa/sa1 -d 1 1
|
|
|
|
~~~
|
|
|
|
2.14. The sar command complains with the following message:
|
|
Requested activities not available in file ...
|
|
|
|
This error message means that you are trying to extract non-existent activities
|
|
from the data file. Usually sadc reads all the available activities from the
|
|
system and stores them in the data file. However, to prevent data files from
|
|
taking too much space on disk, some activities must be explicitly set on the
|
|
command line to be read by sadc.
|
|
To tell sadc that an optional activity should be collected, use switch -S
|
|
followed by the keyword corresponding to that activity (see sadc(8) manual page).
|
|
As of this writing, optional activities are: interrupts, disks, SNMP, IPv6 and
|
|
power management statistics.
|
|
IMPORTANT NOTE: The list of activities that are saved in a file can no longer
|
|
be modified once the file has been created. So it is important to use the proper
|
|
options the first time sadc is called (whether via a crontab, a script like
|
|
sa1 or even the script used to insert a RESTART message when the machine is
|
|
rebooted).
|
|
NB: If the sar command complains with the error message:
|
|
"Requested activities not available" (without mentioning "in file"),
|
|
then see question 2.18 below.
|
|
|
|
~~~
|
|
|
|
2.15. Does sar need a lot of resources to run?
|
|
|
|
No, sar doesn't need a lot of CPU to run, nor does it make your system slow,
|
|
contrary to what some people think. In the first place, it only runs every ten
|
|
minutes by default. Secondly, when it does run, it is over and done very
|
|
quickly. Try:
|
|
|
|
$ time /usr/lib/sa/sa1
|
|
|
|
to verify that for yourself.
|
|
Nor do you have to be concerned about using up all your disk space.
|
|
sar will use a few hundred kilobytes for a whole day's worth of data, and it
|
|
normally only stores one week worth (this can be configured via the HISTORY
|
|
variable in the /etc/sysconfig/sysstat file). It is entirely self limiting.
|
|
Moreover, you can ask sar to compress its datafiles older than a certain
|
|
number of days: see the COMPRESSAFTER parameter in the /etc/sysconfig/sysstat
|
|
configuration file.
|
|
|
|
~~~
|
|
|
|
2.16. Are the measurements gathered by sadc cumulative or instantaneous values?
|
|
|
|
Each counter maintained by the kernel is cumulative since system boot. As a
|
|
consequence the measurements gathered by sadc are cumulative values.
|
|
Moreover all per-second statistics displayed by sar are average values on the
|
|
given time interval. So the value for counter foo at time T is calculated as:
|
|
foo/s = [foo(T) - foo(T-dt)] / dt
|
|
where dt is the interval given on the command line.
|
|
|
|
~~~
|
|
|
|
2.17. Some fields are always displayed as 0.00 when I use the sar -d
|
|
command.
|
|
|
|
See question 3.2 below.
|
|
|
|
~~~
|
|
|
|
2.18. The sar command complains with the following message:
|
|
Requested activities not available
|
|
|
|
This error message means that you are trying to display activities that the
|
|
kernel itself is unable to provide.
|
|
When this error message is displayed while trying to save the data into an
|
|
existing file ("sar -o datafile ..."), this may also be because the target
|
|
file cannot accept the requested activities. In this case, just try to use
|
|
another file or create a new one. See also question 2.14 above.
|
|
|
|
~~~
|
|
|
|
2.19. How can I keep sar data for more than one month?
|
|
|
|
By default sar saves its data in the standard system activity data file,
|
|
the /var/log/sa/sa<DD> file, where <DD> is the current day in the month.
|
|
To prevent sar from overwriting any existing files, just set the variable
|
|
HISTORY in /etc/sysconfig/sysstat to the number of days during which data
|
|
must be kept. When this variable has a value greater than 28, sa1 script
|
|
uses a month-by-month directory structure; datafiles are named YYYYMM/saDD
|
|
and the script maintains links to these datafiles to mimic the standard
|
|
sar datafile structure.
|
|
However please note that pre-existing datafiles will be deleted as links
|
|
will be created and replace them.
|
|
|
|
~~~
|
|
|
|
2.20. How can I load sar data into an Oracle database for performance
|
|
analysis and capacity planning?
|
|
|
|
The simplest way for that is to use sadf (a command included in sysstat
|
|
package) with its option -d. It displays sar data in a format that can
|
|
easily be ingested by a relational database system (fields are separated
|
|
by a semicolon). It should then be easy for a tool like SQL*Loader to
|
|
load the data into the Oracle database.
|
|
|
|
~~~
|
|
|
|
2.21. The sar command displays some weird CPU values...
|
|
E.g.:
|
|
10:50:01 AM CPU %user %nice %system %iowait %idle
|
|
11:00:01 AM all 90.90 0.00 5.17 3.93 0.00
|
|
11:00:01 AM 0 39.40 0.00 2.37 2.07 56.17
|
|
11:00:01 AM 1 29.71 0.00 1.73 1.17 67.39
|
|
11:00:01 AM 2 42.69 0.00 2.34 1.11 53.85
|
|
11:00:01 AM 3 26.24 0.00 1.41 1.61 70.74
|
|
...
|
|
|
|
Sysstat may have met an overflow condition while reading CPU usage from
|
|
your /proc/stat file. This condition is all the more likely to happen as
|
|
your machine uptime is high and/or there are many processors.
|
|
Sysstat up to version 5.0.6 uses 32-bit integer variables to store CPU usage.
|
|
Then, beginning with version 5.1.1, sysstat has shifted to 64-bit variables,
|
|
which has fixed the problem. So try to upgrade your version of sysstat to
|
|
the latest stable release and check that the problem has gone.
|
|
|
|
~~
|
|
|
|
2.22. What happened to sar's options -h, -H, -x and -X?
|
|
|
|
These old options have been removed from sar because new commands have been
|
|
made available. You should now use the sadf command instead of sar -h or
|
|
sar -H, and the pidstat command instead of sar -x or sar -X. Please read
|
|
their manual page to learn some more about their respective options.
|
|
|
|
|
|
3. QUESTIONS RELATING TO IOSTAT
|
|
###############################
|
|
|
|
3.1. I can't see all my disks when I use the iostat command...
|
|
|
|
Yes. This is a kernel limit. Old kernels (2.2.x for instance) used to
|
|
maintain stats for the first four devices.
|
|
The accounting code has changed in 2.4 kernels, and the result may (or
|
|
may not) be better for your system. Indeed, Linux 2.4 maintains the stats
|
|
in a two dimensional array, with a maximum of 16 devices (DK_MAX_DISK
|
|
in the kernel sources). Moreover, if the device major number exceeds
|
|
DK_MAX_MAJOR (whose value also defaults to 16 in the kernel sources),
|
|
then stats for this device will not be collected.
|
|
So, a solution may be simply to change the values of these limits in
|
|
linux/include/linux/kernel_stat.h and recompile your kernel.
|
|
You should no longer have any problem with post 2.5 kernels, since
|
|
statistics are maintained by the kernel for all the devices.
|
|
In the particular case of iostat, also be sure to use the ALL keyword
|
|
on the command line to display statistical information relating to
|
|
every device, including those that are defined but have never been used
|
|
by the system.
|
|
|
|
~~~
|
|
|
|
3.2. iostat -x doesn't report disk I/O statistics...
|
|
|
|
For 'iostat -x' to be able to report extended disk I/O statistics,
|
|
it is better to use a recent version of the Linux kernel (2.6.x).
|
|
Indeed, iostat tries to read data from the /proc/diskstats file or
|
|
from the sysfs filesystem for that.
|
|
But iostat may also be able to display extended statistics with
|
|
older kernels (e.g. 2.4.x) providing that all the necessary
|
|
statistical information is available in the /proc/partitions file,
|
|
which requires that a patch be applied to the kernel (this is
|
|
often done on kernels included in various distros). In recent 2.4.x
|
|
kernels, the /proc/partitions file has all the necessary data
|
|
providing that the kernel has been compiled with CONFIG_BLK_STATS=y.
|
|
|
|
~~~
|
|
|
|
3.3. Why can't iostat display extended statistics for partitions with
|
|
some 2.6.x kernels?
|
|
|
|
Because the kernel maintains these stats only for devices, and not for
|
|
partitions! Here is an excerpt from the document iostats.txt written by
|
|
Rick Lindsley (ricklind@us.ibm.com) and included in the kernel source
|
|
documentation:
|
|
"There were significant changes between 2.4 and 2.6 in the I/O subsystem.
|
|
As a result, some statistic information disappeared. The translation from
|
|
a disk address relative to a partition to the disk address relative to
|
|
the host disk happens much earlier. All merges and timings now happen
|
|
at the disk level rather than at both the disk and partition level as
|
|
in 2.4. Consequently, you'll see a different statistics output on 2.6 for
|
|
partitions from that for disks."
|
|
Extended I/O statistics for partitions are available again with kernels
|
|
2.6.25 and later.
|
|
|
|
~~~
|
|
|
|
3.4. I don't understand the output of iostat. It doesn't match what I expect it
|
|
to be...
|
|
|
|
By default iostat displays I/O activity in blocks per second. With old
|
|
kernels (i.e. older than 2.4.x) a block is of indeterminate size and therefore
|
|
the displayed values are not useful.
|
|
With recent kernels (kernels 2.4 and later), iostat is now able to get disk
|
|
activities from the kernel expressed in a number of sectors. If you take a
|
|
look at the kernel code, the sector size is actually allowed to vary although
|
|
I have never seen anything other than 512 bytes.
|
|
|
|
~~~
|
|
|
|
3.5. Why values displayed by iostat are so different in the first report
|
|
from those displayed in subsequent ones?
|
|
|
|
Probably because, as written in the manual page, the first report generated
|
|
by iostat concerns the time since system startup, whereas subsequent ones
|
|
cover only the time since the previous report (that is to say, the interval
|
|
of time entered on the command line).
|
|
|
|
~~~
|
|
|
|
3.6. iostat -x displays huge numbers for some fields...
|
|
|
|
Because of a Linux kernel bug, iostat -x may display huge I/O response times
|
|
(svctm) and a bandwidth utilization (%util) of 100% for some devices. Indeed
|
|
these devices have a value for the field #9 (beginning after the device name)
|
|
in /proc/{partitions,diskstats} which is always different from 0, and even
|
|
negative sometimes. Yet this field should go to zero, since it gives the
|
|
number of I/Os currently in progress (it is incremented as requests are
|
|
submitted, and decremented as they finish).
|
|
To (temporarily) fix the problem, you should reboot your system to reset the
|
|
counters in /proc/{partitions,diskstats}.
|
|
|
|
|
|
4. QUESTIONS RELATING TO PIDSTAT
|
|
################################
|
|
|
|
4.1. pidstat -d doesn't report task I/O statistics...
|
|
|
|
For pidstat -d to be able to report I/O statistics for tasks, you need
|
|
a recent Linux kernel (2.6.20 or later) with the option
|
|
CONFIG_TASK_IO_ACCOUNTING compiled in.
|
|
|
|
~~~
|
|
|
|
4.2. The pidstat command complains with the following message:
|
|
"Requested activities not available".
|
|
|
|
This message is displayed when the pidstat command is unable to display
|
|
the statistics you have requested. This may happen when you try to display
|
|
statistics for child processes (option -T CHILD) because all options of
|
|
pidstat don't necessarily work for child processes. Read the manual page
|
|
to know which statistics are available for child processes.
|
|
|
|
~~~
|
|
|
|
4.3. pidstat doesn't display statistics for process (task) xyz...
|
|
|
|
This must be because pidstat only displays statistics for active tasks
|
|
by default. If you don't use option -p on the command line, then pidstat
|
|
will display only tasks with non-zero statistics. For example, "pidstat -u"
|
|
will display only tasks that have actually used some CPU resources since
|
|
system startup. You should enter "pidstat -u -p ALL" to make sure that all
|
|
the processes are listed in the report.
|
|
|
|
--
|
|
Sebastien Godard (sysstat <at> orange.fr) is the author and the current
|
|
maintainer of this package.
|