Archive for the Performance Category

An issue with php70-php-fpm and PCRE (Perl Compatible Regular Expressions) has been causing occasional segfaults.

Current pcre/jit details from phpinfo:

# php -r "phpinfo();" |egrep -i "pcre|jit"
auto_globals_jit => On => On
pcre
PCRE (Perl Compatible Regular Expressions) Support => enabled
PCRE Library Version => 8.32 2012-11-30
PCRE JIT Support => enabled
pcre.backtrack_limit => 1000000 => 1000000
pcre.jit => 0 => 0
pcre.recursion_limit => 100000 => 100000
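
Note that `php -r` reads the CLI configuration, which can differ from what the FPM pool loads. As a sanity check, you can query the FPM binary directly (the Remi php70 path on this system, as seen in the audit log further down):

```shell
# php -r reads the CLI ini; the FPM binary may load a different configuration.
/opt/remi/php70/root/usr/sbin/php-fpm -i | egrep -i "pcre|jit"
```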

Here’s the php.ini section (/etc/opt/remi/php70/php.ini)

 930 [Pcre]
 931 ;PCRE library backtracking limit.
 932 ; http://php.net/pcre.backtrack-limit
 933 ;pcre.backtrack_limit=100000
 934 pcre.backtrack_limit=2000000
 935 ; Increased from 1,000,000 (default) to 2,000,000
 936 
 937 ;PCRE library recursion limit.
 938 ;Please note that if you set this value to a high number you may consume all
 939 ;the available process stack and eventually crash PHP (due to reaching the
 940 ;stack size limit imposed by the Operating System).
 941 ; http://php.net/pcre.recursion-limit
 942 ;pcre.recursion_limit=100000
 943 pcre.recursion_limit=1000000
 944 ; Increased from 100,000 to 1,000,000
 945 
 946 ;Enables or disables JIT compilation of patterns. This requires the PCRE
 947 ;library to be compiled with JIT support.
 948 ;pcre.jit=0
 949 pcre.jit=1
 950 ; STK 051717 Enabled pcre.jit.  It's compiled in and looks like it should help perf.
 951 ; php -r "phpinfo();" |egrep -i "pcre|jit"

After making those changes, restart php70-php-fpm and the new values should be in effect:

# php -r "phpinfo();" |egrep -i "pcre|jit"
auto_globals_jit => On => On
pcre
PCRE (Perl Compatible Regular Expressions) Support => enabled
PCRE Library Version => 8.32 2012-11-30
PCRE JIT Support => enabled
pcre.backtrack_limit => 2000000 => 2000000
pcre.jit => 1 => 1
pcre.recursion_limit => 1000000 => 1000000

Still need to determine what is causing PCRE to overflow the stack.

EDIT: Need to double-check SELinux

If you want to allow httpd to execmem, then you must tell SELinux about this by enabling the ‘httpd_execmem’ boolean.

# setsebool -P httpd_execmem 1

You can generate a local policy module to allow this access.  You can allow this access for now by executing:

# ausearch -c 'php-fpm' --raw | audit2allow -M my-phpfpm
# semodule -i my-phpfpm.pp
Additional Information:
Source Context system_u:system_r:httpd_t:s0
Target Context system_u:system_r:httpd_t:s0
Source php-fpm

Raw Audit Messages:
type=AVC msg=audit(1495071733.385:311998): avc: denied { execmem } for pid=989 comm="php-fpm" scontext=system_u:system_r:httpd_t:s0 tcontext=system_u:system_r:httpd_t:s0 tclass=process

type=SYSCALL msg=audit(1495071733.385:311998): arch=x86_64 syscall=mmap success=no exit=EACCES a0=0 a1=10000 a2=7 a3=22 items=0 ppid=984 pid=989 auid=4294967295 uid=99 gid=99 euid=99 suid=99 fsuid=99 egid=99 sgid=99 fsgid=99 tty=(none) ses=4294967295 comm=php-fpm exe=/opt/remi/php70/root/usr/sbin/php-fpm subj=system_u:system_r:httpd_t:s0 key=(null)

Hash: php-fpm,httpd_t,httpd_t,process,execmem

 

System Architecture for Scaling Virtual Environment

All traffic should have SSL certs installed for each domain at the nginx level. Nginx is configured as an SSL proxy only. Certs are provided by Let’s Encrypt unless otherwise needed for other purposes.

(non-secure) Varnish (80) –> Apache (8080) –> Redis –> MariaDB

(secure) Nginx (as ssl proxy) (443) –> Varnish (80) –> Apache (8080) –> Redis –> MariaDB
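
As a sketch of the secure path, a minimal nginx SSL-proxy server block might look like the following. The domain and certificate paths are placeholders; your Let’s Encrypt paths will differ:

```nginx
server {
    listen 443 ssl;
    server_name example.com;    # placeholder domain

    ssl_certificate     /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;

    location / {
        # Hand the request to Varnish on port 80 and tell the backend
        # chain that the original request was HTTPS
        proxy_pass http://127.0.0.1:80;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

The X-Forwarded-Proto header set here is what the WordPress wp-config.php snippet further down keys off of.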

 

Add Link here to Varnish CMS / WordPress, Joomla, Drupal config

Apache / PHP-FPM –> Redis –> MariaDB

 

WordPress Special Notes:

You have to tell WordPress that it is behind SSL so it will function properly. To accomplish this, I use the following code in wp-config.php:

// Set by the SSL-terminating proxy; the isset() guard avoids an
// undefined-index notice on direct (non-proxied) requests
if (isset($_SERVER['HTTP_X_FORWARDED_PROTO']) && $_SERVER['HTTP_X_FORWARDED_PROTO'] == 'https') {
    $_SERVER['HTTPS'] = 'on';
}

Be sure to restart everything once you make your change:

# systemctl restart varnish; systemctl restart nginx; systemctl restart php70-php-fpm; systemctl restart httpd 

You may find yourself needing a WP plugin to help clean up any remaining mixed-content (http://) references.

Here are a couple to try:

  • https://mattgadient.com/remove-protocol/
  • https://wordpress.org/plugins/remove-http/

Misc Notes — saved for possible Varnish VCL changes




We run Varnish in between an F5 and Apache as well as use Nginx for ssl and load
> balancing in development, in conjunction with WordPress backends. You have to
> tell WordPress that you are behind SSL and it will function properly. To
> accomplish this I’d use the following code in wp-config.php
> 
> if ($_SERVER['HTTP_X_FORWARDED_PROTO'] == 'https') {
>        $_SERVER['HTTPS']='on';
> }
> 
> You can then also set FORCE_SSL_ADMIN and FORCE_SSL_LOGIN however you see fit
> and it should work. I saw some updates not that long ago to support proxy
> headers but don’t believe they are fully supported yet.
> 
> Jason
> 
> 
>> On Nov 2, 2015, at 12:37 PM, Carlos M. Fernández <cfernand at sju.edu> wrote:
>> 
>> Hi, Phil,
>> 
>> We don't use Nginx but do SSL termination at a hardware load balancer,
>> with most of the work to support that setup done in the VCL, and something
>> similar could possibly apply to your scenario.
>> 
>> Our load balancer can use different backend ports depending on which
>> protocol the client requests; e.g., if the client connects to port 80 for
>> HTTP, then the load balancer proxies that to Varnish on port 80, while if
>> the client connects to 443 for HTTPS the load balancer proxies to Varnish
>> on port 8008. The choice of Varnish port numbers doesn't matter, just the
>> fact that Varnish listens on both ports and that the load balancer uses
>> one or the other based on the SSL status with the client (using the
>> command line option "-a :80,8008" in this case).
>> 
>> Then, in vcl_recv, we have the following to inform the backend when an SSL
>> request has arrived:
>> 
>> if ( std.port( server.ip ) == 8008 ) {
>>    set req.http.X-Forwarded-Proto = "https";
>> }
>> 
>> We also have the following in vcl_hash to cache HTTP and HTTPS requests
>> separately and avoid redirection loops:
>> 
>> if ( req.http.X-Forwarded-Proto ) {
>>    hash_data( req.http.X-Forwarded-Proto );
>> }
>> 
>> The backend then can look for that header and respond accordingly. For
>> example, in Apache we set the HTTPS environment variable to "on":
>> 
>> SetEnvIf X_FORWARDED_PROTO https HTTPS=on
>> 
>> I have no knowledge of Nginx, but if it can be configured to use different
>> backend ports then you should be able to use the above.
>> 
>> Best regards,
>> --
>> Carlos.

Setup

Create a mountpoint for the disk:

mkdir /mnt/ramdisk

Secondly, add this line to /etc/fstab to mount the drive at boot time.

tmpfs /mnt/ramdisk tmpfs defaults,size=2g,noexec,nosuid,uid=65534,gid=65534,mode=1755 0 0

Change the size option in the line above to accommodate the amount of files you’ll have. Don’t worry, it doesn’t allocate all of that space immediately, only as it’s used. It’s safe to use up to half of your RAM, perhaps more if your system has a lot of RAM that’s not being used.

Mount the new filesystem

mount /mnt/ramdisk

Check to see that it’s mounted

mount
df -h

You should see entries like these in the mount and df output (the size shown reflects the size= option from your fstab line; here size=2g):

tmpfs on /mnt/ramdisk type tmpfs (rw,relatime,size=2097152k)

tmpfs                 2.0G     0  2.0G   0% /mnt/ramdisk

Create directory for Backups

Next we need to create a directory to store the backup copies of the files in. You can put it wherever you like, so long as you change the script we create below to reflect the new location.

mkdir /var/ramdisk-backup

Init Script

Create a script at /etc/init.d/ramdisk with the following contents:

#! /bin/sh 
# /etc/init.d/ramdisk
### BEGIN INIT INFO
# Provides: ramdisk
# Required-Start: $local_fs $remote_fs $syslog $named $network $time
# Required-Stop: $local_fs $remote_fs $syslog $named $network
# Should-Start:
# Should-Stop:
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: ramdisk sync for files
# Description: ramdisk syncing of files
### END INIT INFO

case "$1" in
 start)
        echo "Copying files to RAM disk"
        rsync -av /var/ramdisk-backup/ /mnt/ramdisk/
        echo "$(date +"%Y-%m-%d %H:%M") Ramdisk synced from HD" >> /var/log/ramdisk_sync.log
        ;;
 sync)
        echo "Syncing files from RAM disk to Hard Disk"
        echo "$(date +"%Y-%m-%d %H:%M") Ramdisk synced to HD" >> /var/log/ramdisk_sync.log
        rsync -av --delete --recursive --force /mnt/ramdisk/ /var/ramdisk-backup/
        ;;
 stop)
        echo "Syncing log files from RAM disk to Hard Disk"
        echo "$(date +"%Y-%m-%d %H:%M") Ramdisk synced to HD" >> /var/log/ramdisk_sync.log
        rsync -av --delete --recursive --force /mnt/ramdisk/ /var/ramdisk-backup/
        ;;
 *)
        echo "Usage: /etc/init.d/ramdisk {start|stop|sync}"
        exit 1
        ;;
esac

exit 0

Now set this up to run at startup:

update-rc.d ramdisk defaults 00 99

Example For Apache

Configure and add the disk_cache module to Apache, or enable it from your /etc/apache2/mods-available/ directory and symlink it into the mods-enabled directory.

Our /etc/apache2/mods-available/disk_cache.conf file looks like this:

<IfModule mod_disk_cache.c>
# cache cleaning is done by htcacheclean, which can be configured in
# /etc/default/apache2
#
# For further information, see the comments in that file, 
# /usr/share/doc/apache2.2-common/README.Debian, and the htcacheclean(8)
# man page.

# This path must be the same as the one in /etc/default/apache2
#CacheRoot /var/cache/apache2/mod_disk_cache
CacheRoot /mnt/ramdisk

# This will also cache local documents. It usually makes more sense to
# put this into the configuration for just one virtual host.

CacheEnable disk /

# The result of CacheDirLevels * CacheDirLength must not be higher than
# 20. Moreover, pay attention to file system limits. Some file systems
# do not support more than a certain number of subdirectories in a
# single directory (e.g. 32000 for ext3)
CacheDirLevels 5
CacheDirLength 3

# CacheLock on
# CacheLockPath /tmp/mod_cache-lock
# CacheLockMaxAge 5

</IfModule>

Inspect htcacheclean Parameters

Now, review your /etc/default/apache2 file for htcacheclean changes:

### htcacheclean settings ###

## run htcacheclean: yes, no, auto
## auto means run if /etc/apache2/mods-enabled/disk_cache.load exists
## default: auto
HTCACHECLEAN_RUN=auto

## run mode: cron, daemon
## run in daemon mode or as daily cron job
## default: daemon
HTCACHECLEAN_MODE=daemon

## cache size 
##HTCACHECLEAN_SIZE=300M
HTCACHECLEAN_SIZE=2000M

## interval: if in daemon mode, clean cache every x minutes
HTCACHECLEAN_DAEMON_INTERVAL=360

## path to cache
## must be the same as in CacheRoot directive
##HTCACHECLEAN_PATH=/var/cache/apache2/mod_disk_cache
HTCACHECLEAN_PATH=/mnt/ramdisk

## additional options:
## -n : be nice
## -t : remove empty directories
HTCACHECLEAN_OPTIONS="-n -t"

Modify the /etc/init.d/apache2 init file. Add the following near the top to check whether /mnt/ramdisk is already mounted:

#-------------------------------------------------------------------------------------------------#
# Added by Shane 02/23/16 for local caching of files to keep the I/O down
if [ "$(df -k | grep -c ramdisk)" -eq 1 ]
then
  echo "We already have /mnt/ramdisk mounted. Not remounting."
else
  if [ ! -d /mnt/ramdisk ]
  then
    echo "/mnt/ramdisk does not exist. Let's create it."
    mkdir /mnt/ramdisk 2>/dev/null
  fi
  echo "Mounting tmpfs /mnt/ramdisk"
  mount -o defaults,size=2g,noexec,nosuid,uid=65534,gid=65534,mode=1755 -t tmpfs tmpfs /mnt/ramdisk
  if [ $? -gt 0 ]
  then
    echo "There's an error and it probably did not mount."
  else
    echo "tmpfs /mnt/ramdisk mount successful."
  fi
fi
#edit /etc/default/apache2 to change this.
HTCACHECLEAN_RUN=auto
HTCACHECLEAN_MODE=daemon
HTCACHECLEAN_SIZE=2000M
HTCACHECLEAN_DAEMON_INTERVAL=120
#HTCACHECLEAN_PATH=/var/cache/apache2$DIR_SUFFIX/mnt/ramdisk
HTCACHECLEAN_PATH=/mnt/ramdisk
HTCACHECLEAN_OPTIONS=""
#-------------------------------------------------------------------------------------------------#

 

Example for RRD Files in Observium – Move or Copy Files to Prime the Ramdisk

If you’re doing this for RRD files, either move your RRDs to /var/ramdisk-backup/observium_rrd and then load them into the ram disk:

mv /opt/observium/rrd /var/ramdisk-backup/observium_rrd
/etc/init.d/ramdisk start

Or move your RRDs to the ram disk itself and then sync them out to the backup:

mv /opt/observium/rrd /mnt/ramdisk/rrd
/etc/init.d/ramdisk sync

Create Symlink

Now either symlink /mnt/ramdisk/rrd to /opt/observium/rrd, or change the configuration so the rrds are loaded from the ramdisk path.
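
For the symlink route, the command is simply (using the paths from the steps above):

```shell
# Point Observium's expected rrd path at the ramdisk copy
ln -s /mnt/ramdisk/rrd /opt/observium/rrd
```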

You can put ramdisk sync into your crontab to periodically sync your ram disk back to the hard disk:

2 * * * * root        /etc/init.d/ramdisk sync >> /dev/null 2>&1

Filesystem Wait I/O Problems

February 19th, 2016

Wait I/O issues have a variety of environmental causes, and identifying the leading cause is the first step to isolating the primary problem. Along the way you’ll probably identify other issues and uncover some major problems elsewhere as well.

Where to Start!?

Checking the resources consumed by typical background processes, whether scheduled via cron or kernel/system tasks (like journaling), is key here. Below is how to track down high I/O issues.

Detection of Problem

When you log into a Linux box, if wa (wait I/O) is present at a very high percentage, you will notice the login takes much longer than normal, and any subsequent operation will also take much longer than usual.

Determination of Problem

Determining whether the problem is caused by wait I/O is relatively easy; let’s use vmstat:

# vmstat 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  1   1088 260620 1584364 13291744    0    0     5    93    0    0  1  4 86  9
 0  1   1088 260604 1584364 13291744    0    0     0     0 1274 1016  0  1 75 24
 0  0   1088 260480 1584364 13291744    0    0     0   262 1753 1720  0  1 95  4
 0  1   1088 260240 1584364 13291744    0    0     0   352 2863 2434  0  2 84 13
 0  0   1088 266192 1584364 13291744    0    0     0   252 1246 1095  0  1 83 16
 0  0   1088 266360 1584364 13291744    0    0     0     2 1274 1387  2  2 97  0
 0  2   1088 266236 1584364 13291744    0    0     0   356  889  679  0  1 82 17
 0  1   1088 266236 1584364 13291744    0    0     0   260 1082  889  0  1 72 26

The last column is “wa” (the percentage of CPU time spent waiting on I/O); if it’s constantly > 0, your system is waiting on I/O.

If the sum of the id (idle) and wa (wait I/O) columns is almost 100, there are definitely I/O issues, because the system is primarily idle plus waiting on I/O. The us (user) and sy (system) columns are not the issue here, since the system is not doing anything but disk I/O operations. This is typically caused by a disk configuration problem, such as journaling on ext4.

The vmstat results don’t tell you whether a foreground or background process is responsible. If a user is downloading a large file or backing up the system, the wait time will obviously be high; vmstat only shows that “some” process is causing higher I/O for the whole system.
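
If you want to script a quick check, here is a small sketch that flags lines from a vmstat sample whose wa column (the 16th field in the layout above) exceeds a threshold. The two data lines are copied from the vmstat output above:

```shell
# Flag sample vmstat lines whose "wa" column (16th field) exceeds 10%
printf '%s\n' \
  ' 0  1   1088 260604 1584364 13291744    0    0     0     0 1274 1016  0  1 75 24' \
  ' 0  0   1088 260480 1584364 13291744    0    0     0   262 1753 1720  0  1 95  4' \
| awk '$16 > 10 {print "high wa:", $16}'
# prints: high wa: 24
```

For live use, feed real output through the same filter, e.g. `vmstat 2 | awk 'NR>2 && $16 > 10'` (NR>2 skips the two header lines).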

Isolating the Cause

As we will see below, the primary reason is the “journal flushing operation”, which periodically flushes journal commits and other modifications to disk. To determine if this is the cause, run this command for a few minutes and see what the top I/O processes are:

 # iotop -o -a

Total DISK READ:       0.00 B/s | Total DISK WRITE:      23.68 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND  
 1747 be/4 root          0.00 B    166.30 M  0.00 % 57.11 % [kjournald]
 1749 be/4 root          0.00 B      2.77 M  0.00 % 14.90 % [kjournald]
  366 be/3 root          0.00 B     16.42 M  0.00 %  6.73 % [jbd2/dm-0-8]
 2797 be/4 root         32.01 M    118.80 M  0.00 %  4.91 % [nfsd]
 2798 be/4 root         29.18 M    109.29 M  0.00 %  4.59 % [nfsd]
 2799 be/4 root         27.73 M     91.23 M  0.00 %  4.49 % [nfsd]

Now, let’s check kjournald and nfsd to see how much CPU time they have consumed since the last system reboot:

# ps auxf | egrep "kjournal|nfsd|COMMAND"
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1747  4.2  0.0      0     0 ?        D     2015 8022:45  \_ [kjournald]
root      1748  0.0  0.0      0     0 ?        S     2015  87:24  \_ [kjournald]
root      1749  0.1  0.0      0     0 ?        S     2015 305:30  \_ [kjournald]
root      2795  0.0  0.0      0     0 ?        S<    2015   0:04  \_ [nfsd4]
root      2796  0.0  0.0      0     0 ?        S<    2015   0:00  \_ [nfsd4_callbacks]
root      2797  0.9  0.0      0     0 ?        S     2015 1798:46  \_ [nfsd]
root      2798  0.9  0.0      0     0 ?        S     2015 1801:16  \_ [nfsd]
root      2799  0.9  0.0      0     0 ?        S     2015 1814:44  \_ [nfsd]
root      2800  0.9  0.0      0     0 ?        S     2015 1745:46  \_ [nfsd]
root      2801  0.9  0.0      0     0 ?        S     2015 1782:23  \_ [nfsd]
root      2802  0.9  0.0      0     0 ?        S     2015 1770:46  \_ [nfsd]
root      2803  0.9  0.0      0     0 ?        S     2015 1753:19  \_ [nfsd]
root      2804  0.9  0.0      0     0 ?        S     2015 1765:07  \_ [nfsd]
 ...

Wait I/O problems can have several causes; in this case, it is journaling on the filesystems behind our NFS exports. The “ps auxf” command shows a tree-like structure that helps isolate which processes are consuming the most CPU.

The “STAT” column shows each process’s current state. Background processes with a “D” status code are in “uninterruptible sleep”. Processes marked “D+” are uninterruptible sleep foreground processes and are typically not your problem, since they are running in the foreground and are more than likely something you’re aware of (like a backup/tar/cpio, etc.). In this example, the cause of the wait I/O is filesystem journaling.

The STAT column, which shows what state the process is in, is defined as:

D Uninterruptible sleep (usually due to I/O)
R Running or runnable (on run queue)
S Interruptible sleep (waiting for an event to complete)
T Stopped, either by a job control signal or because it is being traced.
W paging (not valid since the 2.6.xx kernel)
X dead (should never be seen)
Z Defunct ("zombie") process, terminated but not reclaimed by its parent.

So, just to reiterate: a process with a STAT of “D” is in uninterruptible sleep, almost always blocked on I/O, and it will not respond to signals or other commands until that I/O completes. That’s why your performance suffers while these processes are stuck waiting on the disk.

To nail down the process taking up your valuable CPU time, you can use this command to sample any process that has the STATe of “D”:

# while true; do date; ps auxf | awk '{if($8=="D") print $0;}'; sleep 1; done
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
Fri Feb 19 11:59:08 CST 2016
root 1747 4.2 0.0 0 0 ? D 2015 8023:21 \_ [kjournald]
root 1749 0.1 0.0 0 0 ? D 2015 305:33 \_ [kjournald]
Fri Feb 19 11:59:09 CST 2016
root 1747 4.2 0.0 0 0 ? D 2015 8023:21 \_ [kjournald]
root 1749 0.1 0.0 0 0 ? D 2015 305:33 \_ [kjournald]
Fri Feb 19 11:59:10 CST 2016
root 366 0.1 0.0 0 0 ? D 2015 315:20 \_ [jbd2/dm-0-8]
root 1747 4.2 0.0 0 0 ? D 2015 8023:21 \_ [kjournald]
root 1749 0.1 0.0 0 0 ? D 2015 305:33 \_ [kjournald]

From the result above, you can see the two processes consuming your CPU with wait I/O; the primary one is kjournald at 4.2%.

Note: if only the date/time lines are displayed, there are no processes in state “D” and no serious wait I/O issues.

You can also set up ongoing monitoring of these specific processes.
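
One way to do that, as a sketch, is with pidstat from the sysstat package (assuming it is installed; the PIDs are the kjournald threads from the example above):

```shell
# Report per-process disk I/O for the two kjournald threads every 2 seconds
pidstat -d -p 1747,1749 2
```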

So how do we fix it? Here’s a possible solution.

High I/O and wait time are never the same in any two situations, but given what we’re working with, we’re focused on state “D”, where background processes are consuming our resources. If the server is for development and not critical, you can disable journaling; just make sure you do regular backups, of course. If the server is a production server with some sort of RAID configured to protect against the failure of a disk, with cache buffering, etc., then you can probably disable it there as well, but each situation is unique to your own environment.

Here’s how to disable ext4 journaling
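
The archived post stops here, so for reference, a typical procedure is sketched below. This is a sketch only: /dev/sdXN is a placeholder device, the filesystem must be unmounted first, and you should have a current backup.

```shell
# Unmount, drop the has_journal feature, force a check, then remount
umount /dev/sdXN
tune2fs -O ^has_journal /dev/sdXN
e2fsck -f /dev/sdXN
mount /dev/sdXN /your/mountpoint
```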

Sticking <script></script> tags that refer to an external resource in the middle of your HTML will block the loading of your page while the browser fetches the script.

In our case this was a problem with the loading of the AddToAny sharing script and a Quantcast tag (which has since been removed). Check to make sure there are no other custom scripts slowing down your site.

The solution is generic and simple. Use this code to load your JavaScript asynchronously:

<script type="text/javascript">

(function() {
    var s = document.createElement('script');
    s.type = 'text/javascript';
    s.async = true;
    s.src = 'http://yourdomain.com/script.js';
    var x = document.getElementsByTagName('script')[0];
    x.parentNode.insertBefore(s, x);
})();

</script>

Instead of loading it synchronously:

<script type="text/javascript" src="http://yourdomain.com/script.js"></script>

One thing to consider: asynchronously loaded JavaScript will not block jQuery’s document.ready handlers, while synchronous scripts will. If your jQuery code depends on a plugin (e.g., a simple modal dialog), it may be better to load that script synchronously instead.