Filesystem Wait I/O Problems

| February 19th, 2016

Wait I/O issues are caused by various issues with the environment and identifying first what  is the leading cause of the problem is the first step to isolating the primary problem.  You’ll probably identify other issues and uncover some major problems elsewhere as well.

Where to Start!?

Checking resources where typical background processes scheduled to run via cron or system processes (like Journaling) are key here.  Below is how to help determine high I/O issues.

Detection of Problem

When you log into a Linux box, if the WA is present and with very high percentage, you will feel the login process will take much longer time than the normal. Then any operation will also take much longer than they usually do.

Determination of Problem

To determine if the problem is caused by WAit IO is relatively easy, let’s use vmstat:

# vmstat 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  1   1088 260620 1584364 13291744    0    0     5    93    0    0  1  4 86  9
 0  1   1088 260604 1584364 13291744    0    0     0     0 1274 1016  0  1 75 24
 0  0   1088 260480 1584364 13291744    0    0     0   262 1753 1720  0  1 95  4
 0  1   1088 260240 1584364 13291744    0    0     0   352 2863 2434  0  2 84 13
 0  0   1088 266192 1584364 13291744    0    0     0   252 1246 1095  0  1 83 16
 0  0   1088 266360 1584364 13291744    0    0     0     2 1274 1387  2  2 97  0
 0  2   1088 266236 1584364 13291744    0    0     0   356  889  679  0  1 82 17
 0  1   1088 266236 1584364 13291744    0    0     0   260 1082  889  0  1 72 26

The last column is the “wa” (CPU is waiting), and if it’s constantly > 0, then your system is WAiting (x%) on I/O.

If the sum of columns id (Idle) and wa (Wait IO) is almost 100, this means there are definitely I/O issues because the system is primarily IDle+WAiting on I/O.  The USer and SYstem are not an issue here seem the system is not doing any thing but disk IO operations. It’s typically is caused by a disk configuration problem, like journal for ext4.

The vmstat results don’t tell you if it’s the result of a foreground or background process.  If a user is downloading a large file or backing up the system, then the WAit time will obviously be high and is only showing that “some” process is causing higher IO for the whole system.

Isolating the Cause

As we will see below, the primary reason is the the “Journal Flushing Operation”. It periodically flushes journal commits and other modifications to disk. To determine if this is the cause, use this command for a few minutes and see what the top I/O processes are:

 # iotop -o -a

Total DISK READ:       0.00 B/s | Total DISK WRITE:      23.68 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND  
 1747 be/4 root          0.00 B    166.30 M  0.00 % 57.11 % [kjournald]
 1749 be/4 root          0.00 B      2.77 M  0.00 % 14.90 % [kjournald]
  366 be/3 root          0.00 B     16.42 M  0.00 %  6.73 % [jbd2/dm-0-8]
 2797 be/4 root         32.01 M    118.80 M  0.00 %  4.91 % [nfsd]
 2798 be/4 root         29.18 M    109.29 M  0.00 %  4.59 % [nfsd]
 2799 be/4 root         27.73 M     91.23 M  0.00 %  4.49 % [nfsd]

Now, let’s check for kjournal and nfsd to just see how much time since the last system reboot, resources were consumed:

# ps auxf | egrep "kjournal|nfsd|COMMAND"
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1747  4.2  0.0      0     0 ?        D     2015 8022:45  \_ [kjournald]
root      1748  0.0  0.0      0     0 ?        S     2015  87:24  \_ [kjournald]
root      1749  0.1  0.0      0     0 ?        S     2015 305:30  \_ [kjournald]
root      2795  0.0  0.0      0     0 ?        S<    2015   0:04  \_ [nfsd4]
root      2796  0.0  0.0      0     0 ?        S<    2015   0:00  \_ [nfsd4_callbacks]
root      2797  0.9  0.0      0     0 ?        S     2015 1798:46  \_ [nfsd]
root      2798  0.9  0.0      0     0 ?        S     2015 1801:16  \_ [nfsd]
root      2799  0.9  0.0      0     0 ?        S     2015 1814:44  \_ [nfsd]
root      2800  0.9  0.0      0     0 ?        S     2015 1745:46  \_ [nfsd]
root      2801  0.9  0.0      0     0 ?        S     2015 1782:23  \_ [nfsd]
root      2802  0.9  0.0      0     0 ?        S     2015 1770:46  \_ [nfsd]
root      2803  0.9  0.0      0     0 ?        S     2015 1753:19  \_ [nfsd]
root      2804  0.9  0.0      0     0 ?        S     2015 1765:07  \_ [nfsd]
 ...

Wait IO problem can be caused by several reasons and in this case journaling to resources using our NFS partitions.  The “ps auxf” command shows us a tree-like structure to isolate which processes are consuming the most CPU.

The “STAT” column will show processes current state.  Those background processes with “D” status code are “Uninterruptiable sleep”. The processes with “D+” means “Uninterruptible sleep foreground process” and are typically not your problem processes as they are running in the foreground and more than likely something that you’re aware of (like a backup/tar/cpio, etc.). In this example, the cause of Wait IO is the File System Journaling.

The STAT, which means what STATe the process is in is defined as:

D Uninterruptible sleep (usually due to I/O)
R Running or runnable (on run queue)
S Interruptible sleep (waiting for an event to complete)
T Stopped, either by a job control signal or because it is being traced.
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z Defunct ("zombie") process, terminated but not reclaimed by its parent.

So, just to reiterate, if a process with a STAT of “D”, this means it’s actually taking ALL CPU resource with NO possible interruption. That’s why the your performance is at issue.  It’s going to wait on the I/O and will not response to other commands during this time.

To nail down the process taking up your valuable CPU time, you can use this command to sample any process that has the STATe of “D”:

# while true; do date; ps auxf | awk '{if($8=="D") print $0;}'; sleep 1; done
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
Fri Feb 19 11:59:08 CST 2016
root 1747 4.2 0.0 0 0 ? D 2015 8023:21 \_ [kjournald]
root 1749 0.1 0.0 0 0 ? D 2015 305:33 \_ [kjournald]
Fri Feb 19 11:59:09 CST 2016
root 1747 4.2 0.0 0 0 ? D 2015 8023:21 \_ [kjournald]
root 1749 0.1 0.0 0 0 ? D 2015 305:33 \_ [kjournald]
Fri Feb 19 11:59:10 CST 2016
root 366 0.1 0.0 0 0 ? D 2015 315:20 \_ [jbd2/dm-0-8]
root 1747 4.2 0.0 0 0 ? D 2015 8023:21 \_ [kjournald]
root 1749 0.1 0.0 0 0 ? D 2015 305:33 \_ [kjournald]

From the result above, you see there are two process which are consuming your CPU with Wait I/O is primarily kjournald with 4.2%

Note: if only date/time are displayed on the screen, it means there is no any serious Wait I/O issues.

Also you can use the following command to realize a monitoring on these two processes:

So how to Fix?  Here’s a possible Solution

High I/O and WAit time will never the same in any situation, but looking at what we’re working with, we’re focused on the STATe “D” where background processes are consuming our resources.   If the server is for development and not critical, you can disable Journaling. Just make sure you do regular backups of course… If the server is a product server with some sort of RAID configured to protect the failure of a disk with cache buffering, etc.. then you can probably disable it there as well but each situation is unique to your own environment.

Here’s how to disable ext4 journaling