In my previous post about web application proxies, I compared HAProxy and Nginx performance when proxying a simple Rails application. While HAProxy was able to serve pages faster and more consistently, the benchmark also uncovered an apparent design flaw in HAProxy that caused some connections to hang around in the queue for a long time. HAProxy’s author, Willy Tarreau, quickly stepped in to attack the problem, and soon provided a new point release:
My first analysis was that this problem was caused by “direct” requests (those with a server cookie) always being considered before the load balanced ones. But while fixing this design idiocy, I discovered a real problem : it was perfectly possible for a fresh new request to be served immediately without passing through the queue, causing requests in the queue to be delayed for at least as long as the queue timeout, until they might eventually expire. Now *that* explains the horrible peaks on Alexander’s graphs. My problem was that it was a real misdesign, which could not be fixed by a 3-liner patch. So I spent the whole week reworking the queue management logic in a saner manner and running regression tests.
The fix has further repercussions:
[T]he good news is that not only this fixes a number of 503 errors and long response times when running with a low maxconn, but as an added bonus, the “redispatch” option is now naturally considered when a server’s maxqueue is reached, so that it will now not be necessary anymore to trade between large queues and the risk of returning 503 errors.
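To make the trade-off concrete, here is a hypothetical haproxy.cfg fragment (the backend name, addresses, and limits are all made up for illustration) showing where the maxconn, maxqueue, and redispatch settings discussed above would live:

```
# Hypothetical fragment -- names and numbers are illustrative only.
backend rails
    balance roundrobin
    # With the fix, a request that hits a server whose per-server queue
    # (maxqueue) is full is redispatched to another server instead of
    # being answered with a 503.
    option redispatch
    # Mongrel handles one request at a time, so cap in-flight requests
    # at one per server and queue the rest.
    server app1 127.0.0.1:8000 maxconn 1 maxqueue 60 check
    server app2 127.0.0.1:8001 maxconn 1 maxqueue 60 check
```

Before the fix, you had to choose between a large maxqueue (risking long waits) and a small one (risking 503s); with redispatch honored on a full queue, the small-queue setting becomes much less punishing.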
Willy also realized that his redesign work would lead the way to priority-based request scheduling in the future, which is great news.
With the new release in hand, I have finally found the time to sit down and do a rematch. The conclusion? In short, the patch works as intended: It eliminates the odd spikes while still providing smoother performance than Nginx. The spikes that remain are present with Nginx as well, and their regularity implies some kind of periodic activity, possibly on the box itself, although a much more likely culprit is Ruby’s garbage collection. Damn you, curiously slow and old-fashioned interpreter implementation!
Finally, some people requested CPU usage data from vmstat. For this new benchmark I updated my scripts to run vmstat concurrently with ab, hoping there would be some meaty differences for charting, but it turns out that there is no significant difference between HAProxy and Nginx — at best, CPU usage looks a trifle smoother with HAProxy, but this could be a fluke. I suspect you have to amp up the load considerably to achieve a sensible comparison. Still, I have included the vmstat data in the raw data tarball for anyone who is interested.
Anyway, enjoy the graphs. Many thanks to Willy for working out a solution so promptly and expertly.
Nginx vs HAProxy at 3 concurrent connections
Nginx vs HAProxy at 10 concurrent connections
Nginx vs HAProxy at 30 concurrent connections