Sunday, March 07, 2010

Optimizing Hindupedia load times -- (MediaWiki on GoDaddy)

Optimizing Hindupedia seems like an uphill task. Not a lot works on GoDaddy shared hosting, and GoDaddy performance varies by time of day because they run thousands of domains on a single server.



Webpagetest.org is your best friend. It points out commonly missed optimizations: enable browser caching, use minify to compress and combine your CSS and JS files, enable gzip compression to reduce transfer times, and so on.
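
For a quick check between Webpagetest runs, the same caching and compression headers can be inspected from a shell. This is only a rough sketch, assuming curl is installed on your own machine; the function names fetch_headers and caching_headers are made up for illustration.

```shell
# Sketch: show the response headers that matter for compression and caching.

# Fetch only the response headers for a URL while advertising gzip support.
# A normal GET is used (body discarded) so the headers match a real page view.
fetch_headers() {
    curl -s -D - -o /dev/null -H 'Accept-Encoding: gzip' "$1"
}

# Keep only the compression/caching headers from a header dump on stdin.
caching_headers() {
    grep -i -E '^(content-encoding|cache-control|expires|etag|last-modified):'
}

# Example (needs network access):
#   fetch_headers http://www.hindupedia.com/en/Main_Page | caching_headers
```

If no Content-Encoding line shows up, the page is being sent uncompressed; if Cache-Control and Expires are both missing, the browser has nothing telling it how long to cache the file.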

Here is the original test result I received before I began this process:



Note: unless you want to rewrite MediaWiki, you can't combine all the CSS/JS files. I did, however, make some optimizations (i.e., combining CSS files) for the skin we are using, GuMax (into 3 bundles of 2 skins each). I also didn't want to spend a lot of money on a CDN, since this is a non-profit site with no money at this point.

I did all of this, but still saw poor performance. Then I started digging some more.


After all of this, I got the following results:


Modifying LocalSettings.php

The next step was to look at LocalSettings.php for MediaWiki to see if anything could be done there. One change was to add the Minify extension so that all JS/CSS gets minified.

#Add Minify--which strips extra stuff from css & js files to make them smaller
require_once("$IP/extensions/Minify/Minify.php");

#use ETags to facilitate caching at intermediary layers & the browser
$wgUseETag=true; /* default: false */
#store file-cache pages gzipped and serve them compressed to browsers that accept gzip
$wgUseGzip=false; /* default: false */
#cache sidebar ...
#$wgSidebarCacheExpiry = 86400; /* default: 86400 seconds */
$wgEnableSidebarCache = true; /* default: false */
#enable client side caching
$wgCachePages = true; /* default: true */

#ParserCache
#$wgParserCacheType = CACHE_ANYTHING;
$wgEnableParserCache = true; /* default: true */
#$wgMainCacheType = CACHE_ANYTHING; /* default: CACHE_NONE */

# Enable the basic file cache for static pages for non-logged-in visitors
$wgUseFileCache = true; /* default: false */
$wgFileCacheDirectory = "$IP/cache";
#$wgFileCacheDirectory = "/home/.author/leedh/lee.org/[my temp folder]";

#default is 1, changing to a higher number will be a little bit
# nicer to the database
$wgHitcounterUpdateFreq = 100;

I also took a close look at the extensions I had installed and commented out those that exist purely for administration, in order to reduce load times. They aren't useful to anyone but me (I am the only admin). Keep in mind that every enabled extension is run every time a page is loaded.

Also, keep in mind that $wgUseFileCache turns on file caching, and cached pages are not regenerated just because time has passed. All pages on Hindupedia are dynamic (at least the random page list in article footers), which was designed to help SEO by making Google and other search engines think the page had changed. The file cache works against this, so it must be deleted every day or two for the pages to change.
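
The periodic cache cleanup could be scripted along these lines. This is a minimal sketch, not what I actually ran: the function name purge_file_cache and the example path are assumptions, and it relies on a find that supports -mmin (GNU and BSD find both do).

```shell
# Sketch: delete file-cache entries older than a given number of minutes.
# The cache path passed in is an assumption; use whatever
# $wgFileCacheDirectory points to on your install.
purge_file_cache() {
    cache_dir="$1"
    max_age_min="$2"
    # Refuse to run against a missing directory rather than guessing.
    [ -d "$cache_dir" ] || { echo "no such directory: $cache_dir" >&2; return 1; }
    # -mmin +N matches files last modified more than N minutes ago.
    find "$cache_dir" -type f -mmin "+$max_age_min" -exec rm -f {} +
}

# Example (hypothetical path), e.g. from a daily cron job:
#   purge_file_cache "$HOME/html/cache" 1440
```

Deleting only old entries, rather than wiping the whole directory, means recently cached pages keep being served from disk while stale ones are rebuilt on the next visit.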

PHP/MySQL Optimization
PHP optimizers like APC can't be installed (no make/gcc). There are no details on how to effectively use the Zend Optimizer that GoDaddy provides. Also, I found ways to optimize MySQL, but GoDaddy refused to implement any of those requests.

GoDaddy tech support is useless
I also spent an hour on the phone with GoDaddy to see if they would be willing to do anything to help. This was after failing to get any sort of coherent answer from their email tech support, where they got stuck asking me for a traceroute and then asking for a "standard" traceroute, since the first one I sent was formatted output.

On the phone, I escalated beyond first-level tech support, which was completely useless. I emailed the second-level support person the latency graph from Google Webmaster Tools:


and the second-level person promised to have the site monitored for 24 hours. Hindupedia performed really well during that 24-hour period, then slowed down once again.

.htaccess optimization
One of the things I noticed in the webpage performance graph is that the time-to-first-byte for http://www.hindupedia.com/ was exceptionally long. This was surprising, since the only thing that happens there is that .htaccess is read and the user is forwarded to http://www.hindupedia.com/en/Main_Page. So the next step was to see what I could do to optimize my already simple .htaccess file:

AddHandler x-httpd-php5 .php
AddHandler x-httpd-php .php4

Options -MultiViews

RewriteEngine On

rewritecond %{http_host} ^hindupedia.com [nc]
rewriterule ^(.*)$ http://www.hindupedia.com/en/Main_Page [r,nc]

RewriteBase /
Rewritecond %{QUERY_STRING} title=(.*)
RewriteRule ^hindupedia/en/index.php(.*)$ eng/index.php?title=%1 [R]
RewriteRule ^hindupedia/en/(.*)$ eng/index.php?title=$1 [L]
RewriteRule ^hindupedia/$ eng/index.php?title=$1

RewriteRule ^hindupedia/urllist.txt$ yahoo_sitemap.php

RewriteBase /
RewriteRule ^hindupedia/(.*\.(css|js))$ min/index.php?f=$1&debug=0 [L]

Header set Cache-Control "max-age=2592000"


I had a really hard time figuring out why the above .htaccess file was slow. I decided to reduce the code as much as I could and hope for the best. In the process, I came across two really good reference sites for optimizing .htaccess (or really, for learning what the various components mean).
Using these, I ended up with the following as my .htaccess:

AddHandler x-httpd-php5 .php
AddHandler x-httpd-php .php4

#AddHandler application/x-httpd-php .css
#php_value auto_prepend_file gzip-css.php
#php_flag zlib.output_compression On


Options -MultiViews
RewriteEngine On
RewriteBase /

# redirect the bare domain to www for SEO (a plain [r] flag sends a 302; [R=301] would make it permanent)
rewritecond %{http_host} ^hindupedia.com [nc]
rewriterule ^(.*)$ http://www.hindupedia.com/en/Main_Page [r,nc]

# redirect a base address to the main page
RewriteRule ^hindupedia/$ en/Main_Page [R=301]
#RewriteRule ^hindupedia/$ eng/index.php?title=Main_Page [R]
#RewriteRule ^hindupedia/$ eng/index.php?title=Main_Page

#redirect from base to en/ type of framework
#RewriteBase /
#Rewritecond %{QUERY_STRING} title=(.*)
#RewriteRule ^hindupedia/en/index.php(.*)$ eng/index.php?title=%1 [R]
RewriteRule ^hindupedia/en/(.*)$ eng/index.php?title=$1


#Rewritecond %{QUERY_STRING} .
#RewriteRule hindupedia/en/index.php(.*)$ /eng/index.php?title=%1 [R]

#created symlink to file instead
#RewriteRule ^hindupedia/urllist.txt$ yahoo_sitemap.php

#rewrite css/js files to send them through minify
#RewriteBase /
RewriteRule ^hindupedia/(.*\.(css|js))$ min/index.php?f=$1&debug=0 [L]


Header set Cache-Control "max-age=2592000"

I left most of the old code commented out, as I wasn't sure how this would work out.
Also, instead of using .htaccess to map urllist.txt to yahoo_sitemap.php, I SSHed into my GoDaddy account and created a symbolic link: ln -s yahoo_sitemap.php urllist.txt

That way, I didn't need to do it in my .htaccess.

I don't know why these things were causing .htaccess processing to be so slow, but I saw a dramatic improvement after these changes.


You will notice that time-to-first-byte for http://www.hindupedia.com/ has significantly decreased. I believe that the overall latency increase is due to GoDaddy's inconsistent performance.
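
Time-to-first-byte can also be spot-checked from the command line, which makes it easy to repeat the measurement at different times of day. A rough sketch, assuming curl is available; the function name ttfb is made up, while time_starttransfer and time_total are curl's own --write-out variables.

```shell
# Sketch: print time-to-first-byte and total time (in seconds) for a URL.
# Redirects are not followed, so a redirecting URL is timed on its own.
ttfb() {
    curl -s -o /dev/null -w '%{time_starttransfer} %{time_total}\n' "$1"
}

# Example (needs network access):
#   ttfb http://www.hindupedia.com/
#   ttfb http://www.hindupedia.com/en/Main_Page
```

Because redirects are not followed, the first call times only the .htaccess redirect and the second times the wiki page itself, so the two costs can be compared separately.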

MediaWiki Maintenance

Lastly, I SSHed into my GoDaddy account (allowed even on shared hosting; it just needs to be turned on). Under the maintenance directory, there are a bunch of PHP scripts. I ran two of them:

/usr/local/php5/bin/php rebuildall.php

Somewhere during the execution of the above script, php gets killed (probably because it takes too much CPU time). For me, it died while running refreshLinks, so I ran that maintenance script directly and had it pick up where the first script got killed:

/usr/local/php5/bin/php refreshLinks.php 300
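
To confirm that a script was killed rather than exiting on its own, a small wrapper can inspect the exit status. This is a sketch under the usual shell convention that a child killed by a signal is reported with status 128 plus the signal number; the function name run_and_report is made up for illustration.

```shell
# Sketch: run a command and report whether it finished or was killed.
run_and_report() {
    status=0
    # Capture the exit status without aborting, even under "set -e".
    "$@" || status=$?
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    elif [ "$status" -eq 0 ]; then
        echo "finished normally"
    else
        echo "exited with status $status"
    fi
    return "$status"
}

# Example:
#   run_and_report /usr/local/php5/bin/php rebuildall.php
```

On Linux, signal 9 is SIGKILL and signal 24 is SIGXCPU (CPU time limit exceeded), so either one would be consistent with the host cutting off a long-running process.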

The result is promising (but not absolute since there is too much variation due to GoDaddy's hosting)...



Conclusion

All of these optimizations seem to have helped performance. However, they are minor in relation to GoDaddy's inconsistent performance. In the middle of these optimizations, Hindupedia simply stopped loading (i.e., timed out) and later had load times of over a minute. Then, an hour later, it came back down to reasonable load times.

Moving MediaWiki to another host is something I don't have the time or patience to do right now, so that will be for another day. However, for those looking for a host for MediaWiki, I would recommend that you don't do it at GoDaddy.