Reese Knowledgebase

High Load Average and processes in D state


Article ID: 101
by: Reese K.
Posted: 25 Apr, 2013
Last updated: 24 Sep, 2013
Views: 2681


This article discusses things you can do to help troubleshoot servers that have become sluggish due to several processes stuck in "D" (uninterruptible sleep) state.  Though the focus is on CentOS running Virtuozzo 4.6.0 with kernel 2.6.18-028stab101.1 and /vz mounted over NFS, you may glean some useful information even if this is not your setup.

If you are running Virtuozzo 4.7, this may help.

History:

Working towards a solution has been a long, drawn-out process.  This isn't something that came overnight, nor did it come exclusively from working with Parallels support.  I spent days with their support personnel collecting process stack information and analyzing it to no end.  Below I share my experience in dealing with this problem, pieced together from working with Parallels support and from things I've discovered along the way.

When our systems began experiencing this issue, we were running Virtuozzo kernel 2.6.18-028stab089.1 and frequently saw high load averages, latent load times for container-hosted content, and several processes in D state.  Taking the following series of steps dramatically improved performance and, for a long time, curbed the D state processes and high load averages:

Hardware specs:

  • 96 GB RAM
  • Dual Quad core Intel(R) Xeon(R) CPU E5530  @ 2.40GHz
  • 150 containers per node

Steps taken:

  1. Set dcachesize of the hardware node appropriately:
    1. ~# vzctl set 0 --dcachesize 5G --save
  2. Disable fsync calls inside Containers (pick your preferred method -> step 2.1 or 2.2)
    1. ~# echo 0 > /proc/sys/fs/fsync-enable
    2. ~# sysctl fs.fsync-enable=0
    3. Add the following two lines to /etc/sysctl.conf to make permanent across reboots
      #disable fsync() calls inside Containers
      fs.fsync-enable=0
  3. Change kernel.pid_max and fs.file-max
    1. ~# sysctl fs.file-max=1048576 kernel.pid_max=262144
    2. Change values in /etc/sysctl.conf to make permanent across reboots
      kernel.pid_max=262144
      fs.file-max=1048576
  4. Update your Virtuozzo installation using the Parallels update utility
    1. ~# vzup2date -m batch install --loader-autoconfig
  5. Limit the number of containers started simultaneously at boot
    1. set PARALLEL=16 in /etc/sysconfig/vz
  6. Reboot the hardware node
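
After the reboot, it may be worth confirming that the changes actually took effect.  A minimal check (the values are simply the ones set in the steps above; the grep lists the dcachesize row for each beancounter, the hardware node included):

~# sysctl fs.fsync-enable fs.file-max kernel.pid_max
~# grep dcachesize /proc/user_beancounters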

At the time, the vzup2date step updated the server to kernel 2.6.18-028stab101.1, and these changes had a positive impact on performance.  For a long time the issue didn't come back... until recently.  With that said, if the above solution has helped in your environment, you may want to stop here.  The information below doesn't include additional configuration changes; it is more a means of identifying problem containers and getting them under control in order to bring the load average down and stop the state D processes.

Examining D state processes
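
A quick way to get a node-wide snapshot (plain ps, not Virtuozzo-specific) is to list every process currently in uninterruptible sleep along with the kernel function it is waiting in:

~# ps axo state,pid,ppid,wchan:20,cmd | awk '$1 ~ /^D/'

The per-container breakdown in the Solutions section below builds on the same idea using vzps.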


 

At times, high load averages will ensue and processes will enter D state even after applying the latest updates and making the changes noted above.  So what can you do in this situation?

Solutions:

I devised the following method to identify the top 5 containers with the most D state processes, and I have found that simply migrating the top offender returns the server to a normal load, whereby D state processes virtually cease to exist:

i=1
while :
do
     vzps -o veid,uid,pid,ppid,vsz,rsz,state,wchan:20,cmd axfww | awk '$7~/[D]/' | awk '{ print $1 }' | sort -n | uniq -c | sort -n | tail -5 >> top5
     sleep 5
     echo"`expr $i \*5` sec of collection has surpassed"
     i=`expr $i + 1`
done

In one command:

i=1; while :; do vzps -o veid,uid,pid,ppid,vsz,rsz,state,wchan:20,cmd axfww | awk '$7~/[D]/' | awk '{ print $1 }' | sort -n | uniq -c | sort -n | tail -5 >> top5; sleep 5; echo "`expr $i \* 5` sec of collection has elapsed"; i=`expr $i + 1`; done

After letting that run for 300 seconds or so, I look for the top 5; so far, migrating the container with the highest number of encountered D state processes has brought things under control:

cat top5 | awk '{ print $2 }' | sort -n | uniq -c | sort -n | tail -5
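
Once a clear top offender emerges from that output, migrating it off the node is what has worked for me.  A hedged sketch (dest-node.example.com and CTID 101 are placeholders for your destination hardware node and the offending container):

~# vzmigrate --online dest-node.example.com 101

The --online flag asks for a live migration; without it the container is stopped during the move.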

Have a look at the External links section below for other tips on examining state D processes and finding the kernel function that caused the D process state (good luck with that!).
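
If you do want to dig into the kernel side yourself, the wchan column in the vzps output above already names the function each D state process is sleeping in.  For full stack traces of every task, one generic option (assuming the magic SysRq key is available; the output is very verbose on a node with 150 containers) is:

~# echo 1 > /proc/sys/kernel/sysrq
~# echo t > /proc/sysrq-trigger
~# dmesg

The "t" trigger dumps the state and stack of all tasks to the kernel ring buffer, where the D state entries show exactly where they are blocked.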


Alternative Solution

Another thing that has worked for me recently is to look for containers with high load averages:

[root@vps01 ~]# vzlist -o veid,laverage

Examine the output and reboot all containers that exhibit a high load average.
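
A minimal sketch of that idea (the 5.0 threshold on the one-minute average is an arbitrary example, and restarting containers is disruptive, so review the list before letting anything act on it automatically):

# restart running containers whose 1-minute load average exceeds 5.0
vzlist -H -o veid,laverage | while read veid la; do
     one_min=${la%%/*}                  # laverage prints as 1min/5min/15min
     if awk -v l="$one_min" 'BEGIN { exit !(l > 5.0) }'; then
          echo "CT $veid load average is $la - restarting"
          vzctl restart "$veid"
     fi
done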

END

External links
http://kb.kristianreese.com/index.php?View=entry&EntryID=99
http://kb.parallels.com/en/114831
http://download.swsoft.com/virtuozzo/virtuozzo4.0/docs/en/lin/VzLinuxUG/353.htm

