Varnish is one of my favorite things for building scalable web services these days - it can dramatically improve just about any site’s performance, and with a few configuration tweaks it can even provide per-session caching and other niceties.
It’s used as this site’s primary front-end, and has served me well so far.
Capturing varnishncsa output for R
I’ve been playing around with R, and HTTP access logs are always an interesting dataset - easy to understand, but hard to get meaningful results out of.
So I built the following script to capture varnishncsa output and build a table of timestamped requests listing the host, wiki namespace, HTTP result code and (crucially for me) cache misses and back-end response times (Varnish can log the time from request to first byte from the back-end):
    import csv, time
    from subprocess import Popen, PIPE

    # the fields we want
    fields = ['timestamp','hostname','namespace','page','result','size','responsetime','cache']
    # universal_newlines=True makes the pipe yield text rather than bytes
    child = Popen(["varnishncsa","-F","%r,%s,%b,%{Varnish:time_firstbyte}x,%{Varnish:handling}x"], stdout=PIPE, universal_newlines=True)
    o = csv.writer(open('output.csv','w',newline=''))
    # output the header
    o.writerow(fields)
    line = child.stdout.readline()
    while line:
        # split from the right, since the request URL may itself contain commas
        (req,result,size,responsetime,cache) = line.strip().rsplit(',',4)
        (method,url,protocol) = req.split(' ')
        # try to split the GET url HTTP/1.0 stuff into components of interest
        try:
            (dummy,dummy,hostname,namespace,page) = url.split('/',4)
        except ValueError:
            (dummy,dummy,hostname,dummy) = url.split('/',3)
            namespace = page = ''
        # 12 significant digits, i.e. hundredths of a second for current epoch values
        timestamp = '%.12g' % time.time()
        # varnishncsa logs some requests with a null size
        if size == '-':
            size = ''
        # same order as fields
        row = [timestamp,hostname,namespace,page,result,size,responsetime,cache]
        print(','.join(row))
        o.writerow(row)
        line = child.stdout.readline()
This outputs lines in the following format:
    timestamp,hostname,namespace,page,result,size,responsetime,cache
    1336745775.29,taoofmac.com,space,HOWTO/Setup/daapd,200,9076,0.038134813,miss
    1336745775.39,taoofmac.com,themes,serif/css/serif-min.css,200,5997,0.000071287,hit
    1336745775.49,taoofmac.com,themes,serif/js/site-min.js,200,41993,0.000078678,hit
    1336745775.84,taoofmac.com,themes,serif/img/noise.png,200,8431,0.000062704,hit
    1336745775.84,taoofmac.com,themes,serif/img/sitelogo_2011.png,200,16346,0.000038862,hit
    1336745775.94,taoofmac.com,themes,serif/img/error.png,200,666,0.000059605,hit
    1336745784.61,the.taoofmac.com,space,RecentChanges?format=rss,302,167,0.000052452,hit
    1336745787.52,planet.taoofmac.com,,,404,358,0.000077963,hit
    1336745797.29,the.taoofmac.com,,,200,129190,0.094822168,miss
    1336745802.3,the.taoofmac.com,space,HOWTO/Merge%20Folders,304,0,0.006964445,pass
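
Before loading a file like this into R, it can be worth a quick sanity check that the cache column and the response times look plausible. What follows is only a rough Python sketch of such a check, not part of the capture script: `summarize` is a hypothetical helper name, and the input is a handful of the sample rows held in a string (in practice you would presumably open output.csv instead).

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the sample output; stands in for output.csv here.
SAMPLE = """\
timestamp,hostname,namespace,page,result,size,responsetime,cache
1336745775.29,taoofmac.com,space,HOWTO/Setup/daapd,200,9076,0.038134813,miss
1336745775.39,taoofmac.com,themes,serif/css/serif-min.css,200,5997,0.000071287,hit
1336745797.29,the.taoofmac.com,,,200,129190,0.094822168,miss
1336745802.3,the.taoofmac.com,space,HOWTO/Merge%20Folders,304,0,0.006964445,pass
"""

def summarize(stream):
    """Return {cache_state: (request_count, mean_responsetime)}.

    Hypothetical helper: groups rows by the 'cache' column and averages
    the 'responsetime' column (seconds to first byte) within each group.
    """
    times = defaultdict(list)
    for row in csv.DictReader(stream):
        times[row['cache']].append(float(row['responsetime']))
    return {state: (len(values), sum(values) / len(values))
            for state, values in times.items()}

summary = summarize(io.StringIO(SAMPLE))
for state, (count, mean) in sorted(summary.items()):
    print('%-5s %d requests, mean time to first byte %.6f s' % (state, count, mean))
```

Grouping by the cache column keeps misses and passes, which are the requests that actually reach the back-end, separate from hits, whose times are near zero and would otherwise swamp the average.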