Over the last few days I have been comparing Nginx to HAProxy, with surprising results.
First, a bit of background. For a long time we at Bengler have been using Nginx as the main web server for our projects (1, 2), as well as to proxy Rails running under Mongrel. Nginx is a superb little open-source web server with a small footprint, sensible configuration language, modern feature set and buckets of speed. However, we quickly realized that the load balancing features of the proxy are not up to scratch.
The core problem is the proxy load balancing algorithm. Nginx only comes with a round-robin balancer and a hash-based balancer. Only the former is of interest to us since our object is to distribute the load evenly across a pack of Mongrel back ends. The round-robin algorithm is often an acceptable tool: if every request finishes within a few milliseconds, there’s no problem.
But if a page takes a while to load, Nginx will keep routing requests to backends that are already processing requests — as a result, some backends queue up requests while others sit idle. You get an uneven load distribution, and the unevenness increases with the load the balancer is subjected to.
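To make the failure mode concrete, here is a minimal sketch of the kind of setup we are talking about (the ports and names are hypothetical, not our production config):

```
# nginx.conf (sketch): the stock round-robin balancer hands
# requests to each Mongrel in turn, regardless of whether it
# is still busy with an earlier, slow request.
upstream mongrels {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}

server {
    listen 80;
    location / {
        proxy_pass http://mongrels;
    }
}
```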
So when Grzegorz Nosek, backed by EngineYard, announced his fair load balancer module, we naturally pounced on it. Grzegorz’s module routes requests to the backend with the fewest outstanding requests, and this improved performance a lot.
Unfortunately, Grzegorz’s patch is not completely stable, and it turned out to be the main source of our recent stability problems. Sometimes it sits down chewing the carpet while backends go idle and requests pile up; or worse, it goes into a tailspin and refuses to serve requests, for which the only remedy is a cold restart of Nginx. Even in normal operation, however, it will often send multiple connections to a single backend while others are idle, since there is no limit on the number of connections each backend can receive.
After reading about HAProxy (there’s a nice Rails-oriented blog post about it here), I felt the itch to try it out myself. HAProxy has a handsome feature set:
- It’s a proxy — and only a proxy. It can’t serve files, for example: proxying is all it does.
- It can proxy anything TCP-based — not just HTTP.
- Plenty of load-balancing algorithms, including a “least connections” strategy that picks the backend with the fewest pending connections. Which happens to be just what we want.
- Backends can be sanity- and health-checked by URL to avoid routing requests to brain-damaged backends. (It can even stagger these checks to avoid spikes; see the sketch after this list.)
- A dedicated status page gives you backend status, uptime and lots of yummy metrics. There’s also a way to read metrics from a Unix domain socket.
- Requests can be routed based on all sorts of things: cookies, URL substrings, client IP, etc.
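To make the health-check and status-page items concrete, here is a minimal sketch (the URLs, names and intervals are invented for illustration):

```
# haproxy.cfg (sketch): health checks plus the status page
listen mongrels 0.0.0.0:80
    mode http
    # poll a cheap URL on each backend; a failing backend is
    # taken out of the rotation until it recovers
    option httpchk GET /heartbeat
    # built-in status page with per-backend state and metrics
    stats enable
    stats uri /haproxy-status
    server app0 127.0.0.1:8000 check inter 2000
    server app1 127.0.0.1:8001 check inter 2000
```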
I like the fact that HAProxy is so single-minded in its approach. Experience tells me that simple, specialized, single-purpose applications are preferable to complex, flexible, one-size-fits-all applications, Varnish and Memcached being two relevant examples.
To determine if HAProxy is up to par, I have done a few simple benchmarks. They’re not awesomely scientific, but I think they are good enough.
The setup: a dedicated test machine (quad-core AMD64, 2.4GHz, 4GB RAM) with three Mongrels running an actual Rails 1.2 app. I use Apache’s ab benchmarking tool for the testing (many people prefer httperf, but we have never quite seen eye to eye), and I run 1,000 requests at various levels of concurrency. The page being tested is a minimal controller action that makes one database call and one Memcached lookup, then renders an empty page; it takes about 20ms to render.
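For the record, a typical run looked something like this (the URL is a placeholder):

```
# 1,000 requests at 30 concurrent connections; -e writes a CSV of
# the time needed to serve each percentile of requests
ab -n 1000 -c 30 -e c30.csv http://testhost/bench
```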
I have configured Nginx with Grzegorz’s fair load-balancing patch. The configuration does nothing except set up a proxy against Mongrel.
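With the patch compiled in, the only change to a stock round-robin upstream block is a single directive (again, ports are hypothetical):

```
upstream mongrels {
    # from the upstream_fair module: send each request to the
    # backend with the fewest outstanding requests
    fair;
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}
```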
I have configured HAProxy with the “leastconn” algorithm and “maxconn 1” for each Mongrel. This is intentionally unfair — but the object is not a comparison of HAProxy and Nginx when each is configured identically; rather, I would like to observe what kind of performance profile can be achieved with HAProxy’s superior gadgetry.
The “maxconn” setting is significant — since only a single request is handed to Mongrel at a time, it means that when all backends are busy, pending client requests will wait inside HAProxy rather than inside Mongrel. Then, when a backend becomes available, the next request in line is routed to that backend. Without this restriction, of course, requests would end up in busy Mongrels and sit there even though other backends might be available.
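The relevant part of my HAProxy configuration looks roughly like this (a sketch with made-up names and addresses; the full config is linked in the comments below):

```
# haproxy.cfg (sketch): queue requests in the proxy, not in Mongrel
listen mongrels 0.0.0.0:80
    mode http
    balance leastconn
    # maxconn 1: each Mongrel gets a single request at a time;
    # everything else waits in HAProxy's queue until a backend frees up
    server app0 127.0.0.1:8000 maxconn 1 check
    server app1 127.0.0.1:8001 maxconn 1 check
    server app2 127.0.0.1:8002 maxconn 1 check
```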
Nginx with the fair load-balancing patch behaves similarly, but will occasionally suffer overlapping requests, since it has no limit on the number of connections each backend can receive.
So, the data. The following graphs show the response times of each request.
Nginx — 3 concurrent connections
HAProxy — 3 concurrent connections
Nginx — 10 concurrent connections
HAProxy — 10 concurrent connections
Nginx — 30 concurrent connections
HAProxy — 30 concurrent connections
HAProxy comes out on top with regard to requests per second — at 30 concurrent connections we get 218 req/s, compared to 162 req/s for Nginx — but the real difference here is in the distribution of response times.
At 3 concurrent connections, Nginx already begins to serve every request a bit more slowly, whereas HAProxy at 10 concurrent connections still manages to deliver 95% of the requests within the time of the fastest request. At the same time, Nginx’s performance is all over the map while HAProxy remains fairly consistent. Unfortunately, this evenness comes at the expense of returning a small number of connections extremely slowly.
I’m uncertain if HAProxy imposes an absolute ordering on the request queue; since backends tend to be full, perhaps some connections simply sit around for a long time without being scheduled. That would explain the blips on the graph; in one test session I had a single request taking 47 seconds.
In a real-world situation, some of these requests would simply time out, hopefully to be rescued by a friendly “sorry, we’re overloaded” error page. Is this an acceptable compromise between performance and responsiveness? I think it is, given that they should only occur during exceptional load; in such situations I prefer serving really fast pages to most users and possibly disappointing an extremely small number of users, rather than letting everyone suffer.
I think these results show that HAProxy is the better choice for us. The additional features and fine-grained proxy control are also extremely welcome. HAProxy’s inability to serve static files means that we will also put Nginx behind HAProxy and route requests accordingly.
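That split could be expressed with HAProxy’s ACLs along these lines (a sketch; the paths and ports are invented for illustration):

```
# haproxy.cfg (sketch): static assets go to Nginx, everything
# else goes to the Mongrel pack
frontend www
    bind 0.0.0.0:80
    mode http
    acl static path_beg /images /stylesheets /javascripts
    use_backend nginx_static if static
    default_backend rails

backend nginx_static
    mode http
    server nginx0 127.0.0.1:8080

backend rails
    mode http
    balance leastconn
    server app0 127.0.0.1:8000 maxconn 1 check
```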
You can download the raw data here.
15 Comments
Hi,
could you please post the configurations and exact versions you used on the load-balancers, especially haproxy’s, since the response times are absolutely awful in my opinion?
Also, you might want to try again with a slightly higher maxconn (e.g. 2), because I suspect that even though mongrel serves one request at a time, it might improve its latency when another request is already pending in the system.
Last, you may want to enable logging on haproxy. You will see the response times split into request/queue/connect/headers/data, and this will tell you exactly where the time is spent. It is not acceptable to have response times that high. Even 1 second would be too high in my opinion.
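Something like this in your config would be enough (the syslog address is just an example):

```
# haproxy.cfg: send per-request logs to syslog
global
    log 127.0.0.1:514 local0

defaults
    mode http
    log global
    # httplog records the request/queue/connect/response-header/total
    # timers for every request
    option httplog
```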
Willy
Here’s my config: HAProxy, Nginx.
In retrospect, I should have turned off the file checks (the configuration comes from a live site), which may be giving Nginx a handicap.
I agree with you about maxconn and latency, but the idea here is to avoid having requests waiting in Mongrel at all. Once they’re sent to Mongrel, they are stuck there until Rails becomes idle.
The response times look awful when compressed into a graph like this, but if you take a look at the data, you will notice that the distribution is predominantly fast requests. For example, at 30 concurrent connections the arithmetic mean is 91ms, and the 3rd quartile is 6ms.
I will re-run with logging to see what’s going on with the lagging responses.
OK thanks for the confs.
There’s nothing outstanding there. To reply to your question, haproxy processes the queues in arrival order. The only case where there is a risk of delayed processing is when you use persistence cookies or a hash-based algorithm, because in this case the server processes its own queue before processing the global one (which I now know how to fix). But in your case, you run leastconn and no cookie, so that cannot be your problem.
Have you monitored CPU and disk I/O usage on the mongrel servers during the benchmark? I find it surprising that there is so much of a difference between nginx and haproxy in terms of requests/s, considering that you should always saturate on mongrel at such a low load for either haproxy or nginx (200 req/s should eat less than 1% CPU on either of them). So one possible explanation (which the logs may indicate) is that if you can really stress mongrel harder when using haproxy, then it sometimes has some hiccups (e.g. less time for garbage collecting, etc.).
Also, the data report surprisingly repetitive patterns in both haproxy and nginx, which I attribute to an unstable processing time on the servers. That may corroborate the theory above. Once again, logs will tell us.
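Even something as simple as this, running on the machine during the bench, would tell us a lot:

```
# 1-second samples of CPU, memory and block I/O
vmstat 1
# extended per-device disk statistics
iostat -x 1
```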
Alexander,
don’t waste your time on another bench: I’ve just discovered an awful bug in the way haproxy handles its global queue. Sometimes a request can remain in the queue for as long as the queue timeout (contimeout in your case). I’m currently working on getting that fixed. When that’s done, I would really appreciate it if you could give it another try.
Regards,
Willy
That’s good news! Thanks for spending time on this.
For future benchmarks I will monitor CPU and I/O. I think you’re right that the periodic blips indicate garbage collection or something else building up steam.
I would also try benchmarking Nginx in round-robin mode to compare the performance with the fair load balancer patch.
Alexander, thanks a lot for the great writeup. Lots of juicy data, and a bug fix coming out of it! Looking forward to seeing the benchmarks after this is patched. (Go Willy go!)
haproxy is awesome. i haven’t tried nginx yet, but was looking to try it and stumbled across this comparison. definitely excited that willy found an issue that will make the haproxy code even better. great job willy – haproxy is a wonderful product. thanks for the benchmark alexander, it was definitely nice to see.
Thanks guys. BTW, the bug is now fixed in 1.3.15.2. I’m impatient to see the new benchmark under the same conditions. To be honest, I would not be surprised if the lower response times increased due to the number of unserved requests in the previous test.
Have you benched the patched version?
I’m very curious as to your config and how it fares.
For those who have not seen it yet, yes, Alexander has benched the new version here: https://affectioncode.wordpress.com/2008/06/28/another-comparison-of-haproxy-and-nginx/
And yes, it dramatically improves the results :-)
I’ve found that in our network, WAN accelerators have made a big difference.
Link to the raw data here is broken :(
The server is temporarily down, sorry. Will be back up on Monday.
Your Nginx config is making it check for the existence of .maintenance and .downtime on every single request. This not only destroys the credibility of these benchmarks but it’s also glaringly retarded.
@Ryan, I hope you realize that the point was not the average response times, but the fact that Nginx was spiking all over the place, which cannot be attributed to the use of “if” checks, and the fact that Nginx could not control per-Mongrel queueing, which is what makes HAProxy work so well in this case. Calling my post “retarded” is not just rude, it also shows you did not understand it.
Also, you are commenting on a three-year-old post.
One Trackback/Pingback
[…] nice articles comparing nginx and HAProxy are here and here. In a great display of the open source feedback loop, the first article turned up a flaw […]