The state of browser-based HTML editors

Or, are HTML editors designed by idiots?

For some time now I have been searching for a good embeddable, JavaScript-based HTML editor — an editor that I can use to let users write their posts and comments in the browser using “what-you-see-is-what-you-get” formatting: bold and italic type, bulleted lists, images and so on.

What do I expect of an HTML editor? Well, it has to be open-source, compatible with the major browsers (IE 6 and 7, Safari, Firefox and Opera) on their respective operating systems, and it has to be modular in order to embed it into the existing application framework and design. Those goals are not particularly ambitious in this day and age — so why does such a product not exist?

The main problem is, surprisingly, not browser compatibility or openness. I have evaluated several projects in depth, including TinyMCE and NicEdit, and looked at several others, including FCKeditor, DevEdit and KTML, and — their propensity to crash or produce loopy formatting aside — all of them provide the essentials, and some of them go pretty far with advanced editing features such as table drawing and floating objects. The problem is the lack of modularity and the extra baggage that comes with the “one size fits all” approach.

On looking at a package such as TinyMCE, it is immediately apparent that these projects have been designed to be swallowed whole — that is, they are monolithic frameworks and not simply toolkits. TinyMCE, FCKeditor and NicEdit, to cite the ones I am most familiar with, all provide subsystems for plugins, toolbars, localization, skinning, themes, dialogs, widgets and so on — that’s a lot of baggage when you just want an editor widget. But it’s the fundamental design of this conglomeration of components that’s screwed up.

To explain, let me step back for a second and illustrate how I think an editor should work. To start with, let’s review the traditional model-view-controller (MVC) design pattern:

For an editor, this translates to the following:

Why MVC? Such a design is needed first of all to separate concerns. The view interacts with the user; the controller controls the UI and translates user interactions into model modifications; the model tells listeners when it has changed. Thus each part of the system is isolated from unnecessary concerns, and is prevented from meddling with the stuff that isn’t any of its business.

A side effect of such clear division is that isolating the interactions means there can be any number of players in the game, all of them unaware of each other. For example, a well-designed MVC setup can let you attach multiple views to the same data model. And by view, I mean anything that has a visual representation: A live word count widget, for instance, is just an editor view that doesn’t edit anything, but merely shows a different representation of the same underlying data.

(A digression: MVC actually breaks down in the browser if you rely on the DOM as your data model, because the DOM is implicitly connected to the view. Thus you cannot have, say, a “raw HTML tags” view of the document alongside a full-fledged WYSIWYG editor. Unfortunately, this is probably unavoidable because the only practicable way of implementing HTML editing in a browser today is to use the browser’s own editing support. Maybe you could store an internal DOM document and synchronize the trees, but that sounds a bit icky. That said, I haven’t actually tried.)

To support the complete MVC cycle, the editor (what I’ve called the “editor object” in the diagram above, to distinguish it from the editor component as a whole) must be programmable and internally consistent with the DOM. To manipulate the document programmatically from JavaScript — for example, to set a subset of the document to a specific font — you grab a piece of the DOM, manipulate it and let the DOM event machinery trickle the changes up to the controller, which then updates the view. That’s basic MVC.
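
To make this concrete, here is a minimal sketch of that cycle. Every name in it is hypothetical; the point is only that the code making the change and the code reacting to it never need to know about each other:

// Sketch only: mutate the document model directly...
var firstParagraph = editor.document.getElementsByTagName("p")[0];
firstParagraph.style.fontWeight = "bold";

// ...and let the editor's change notifications, not the code above, bring
// any attached views up to date (a word count widget, a toolbar, whatever).
editor.document.observeChanges(function() {
  wordCountView.update(editor.document);
});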

Since the editor object also represents state other than HTML data — caret position, scroll position and selection being three — it has to be scriptable as well, so that you can manipulate and observe that state. And to simplify certain operations, it should provide enough of a veneer to make common editor operations easily decomposable: “select whole word”, “delete word”, “move three words to the right”, “set as bold”, that kind of thing.
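
In code, such a veneer might amount to something like the following. All the names are made up, and each line is an independent example rather than a sequence:

// A hypothetical convenience layer over raw selection and caret state:
editor.getSelection().expandToWord();                   // "select whole word"
editor.getSelection().deleteContents();                 // "delete word"
editor.getCaret().moveByWords(3);                       // "move three words to the right"
editor.getSelection().setStyle("font-weight", "bold");  // "set as bold"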

MVC, then, helps you build the required ecosystem of controllers and views — toolbars, word counters, spell checkers, colour pickers and the like — in which each actually extends the MVC pattern:

Take toolbars, for example: traditionally a strip of icons above the editor, some representing actions (indent text, copy, paste), a few representing both action and state (a “B” icon both lets you turn text bold and is highlighted when the current selection is set in boldface), and all of them enabled or disabled depending on whether their functionality is available in the current context.

To implement a toolbar, then, you create a toolbar controller and a toolbar view. The controller tells the editor that it wants to listen for notifications; these notifications are used to update the toolbar buttons. If the current text is bold, highlight the “B” icon; if there’s something in the clipboard, enable the “paste” button. Similarly, clicking a toolbar button sends a notification back to the controller, which then sends a message to the editor object.
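
In code, the button-to-editor half of that loop could be as thin as this (the names are hypothetical, in the spirit of the examples further down):

// The click handler translates a UI event into a message to the editor
// object; the button never reaches into the editable document itself.
boldButton.observe("click", function() {
  editor.getSelection().toggleStyle("font-weight", "bold");
});

The editor’s change notification then flows back and updates the button state, which is exactly what the first code example below does.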

All right. This is all exceedingly basic stuff, so why am I lecturing you about it and calling people names?

Because nobody has so far bothered to implement an MVC-based HTML editor.

At least, nobody as far as I can see. Among the projects I have looked at, NicEdit comes closest to resembling MVC, if you really squint and maybe turn the brightness of your screen way down. You can actually download cut-down versions of the code that strip away extraneous functionality like dialogs. Unfortunately, it is still a jumbled mess that commingles its concerns. For example, the core editor class knows about plugins, panels and buttons, and interacts with each of these directly instead of relying on notifications or well-defined APIs.

TinyMCE, for its part, doesn’t care at all — for instance, there is no way, that I can find, to create two editors on the same page that have different “themes”. It’s so screwed up that it thinks a theme should include localization, toolbars, graphics, dialogs, skins and — most insane of all — code. Each theme is like an application in itself — there’s a “simple” theme with just a few toolbar buttons, but to get a colour picker toolbar button you must use the “advanced” theme, which has tons of other features that might not interest you. This means that to add anything into their system in practice, you have to extend (or worse, copy) the advanced theme. Words like “decoupling” are not in these guys’ dictionary.

These editors are so focused on being easy to set up that when you want to tinker — remove stuff, add stuff, take things apart and see how they can fit together a little differently — you are bound to hit a wall. WordPress users will have noticed that even WordPress has decided to circumvent TinyMCE, rather than customize it, for their HTML editor toolbar. Their widgets are not integrated into the main toolbar, and they don’t attempt to use TinyMCE’s crummy dialog UI either; instead their own buttons are relegated to a separate “Add media” toolbar:

So if these frameworks are so awful, how should a good framework look — in cold, hard code? It’s actually not difficult:

var editor = new PerfectEditor();
editor.attach(document.getElementById("comment_textarea"));
editor.document.observeChanges(function() {
  var boldButton = document.getElementById("bold_button");
  if (editor.getSelection().getStyle("font-weight") == "bold") {
    boldButton.addClassName("active");
  } else {
    boldButton.removeClassName("active");
  }
}); 

Certainly some people will pooh-pooh such an approach and call it unnecessarily complex — “TinyMCE does this stuff for me automatically”. Well, sure — but do you really have the option of not doing it automatically? That’s the crux of my argument, that the monolithic approach eliminates choice and prevents customization.

But the above example just gives you the raw metal, which can then be glossed over with helper classes (or plugins, if you will):

var editor = new PerfectEditor();
editor.attach(document.getElementById("comment_textarea"));
var toolbar = new PerfectToolbar();
toolbar.setButtonSet(PerfectToolbar.ButtonSets.BASIC);
toolbar.createForEditor(editor, PerfectToolbar.Position.ABOVE); 

That’s nicely separated and layered. But let’s whittle away the boilerplate and move everything into a one-line helper to satisfy the “up and running in 15 minutes” PHP-in-21-days guys:

PerfectEditor.setup("comment_textarea", {
  toolbar: PerfectToolbar.ButtonSets.BASIC,
  toolbarPosition: PerfectToolbar.Position.ABOVE}); 

There, that’s pretty concise. And you still have the option of decoupling everything completely if, say, you want to scratch that standard toolbar and go crazy with your own design.

To demonstrate how these projects fail in practice, consider what it would take to implement an “insert image” button. Let’s imagine, blatantly ignoring reality, that we are Flickr, and we want the button to bring up a nice sidebar panel with a selection of your photos, a search field, and perhaps a way to upload new photos. All of this needs to build on Flickr’s existing internal templating system, CSS and so on. Flickr also already has a system for drawing buttons, so we want a Flickr toolbar, not a “TinyMCE-style” toolbar.

With NicEdit, TinyMCE and FCKeditor, you would create a plugin or theme or something. But in a perfect world, the JavaScript code would look something like this:

var photoPicker = new PhotoPicker();
photoPicker.setNSID("123456");
var toolbar = new PerfectToolbar();
toolbar.createForEditor(editor, PerfectToolbar.Position.ABOVE);
toolbar.addButton({
  label: "Insert Photo",
  icon: "/images/insert_photo.png"
}, function() {
  photoPicker.show();
});

I will let you imagine what the actual PhotoPicker class will look like, except for the code that inserts the image:

insertButton.observe("click", function() {
  var imageElement = document.createElement("img");
  imageElement.src = this.photoUrl;
  imageElement.observe("dblclick", this.editPhotoSettings.bind(this));
  editor.document.insertElement(editor.getSelection(), imageElement);
}.bind(this)); 

I think that looks swell.

Another comparison of HAProxy and Nginx

In my previous post about web application proxies, I compared HAProxy and Nginx performance when proxying a simple Rails application. While HAProxy was able to serve pages faster and more consistently, the benchmark also uncovered an apparent design flaw in HAProxy that caused some connections to hang around in the queue for a long time. HAProxy’s author, Willy Tarreau, quickly stepped in to attack the problem, and soon provided a new point release:

My first analysis was that this problem was caused by “direct” requests (those with a server cookie) always being considered before the load balanced ones. But while fixing this design idiocy, I discovered a real problem : it was perfectly possible for a fresh new request to be served immediately without passing through the queue, causing requests in the queue to be delayed for at least as long as the queue timeout, until they might eventually expire. Now *that* explains the horrible peaks on Alexander’s graphs. My problem was that it was a real misdesign, which could not be fixed by a 3-liner patch. So I spent the whole week reworking the queue management logic in a saner manner and running regression tests.

The fix has further repercussions:

[T]he good news is that not only this fixes a number of 503 errors and long response times when running with a low maxconn, but as an added bonus, the “redispatch” option is now naturally considered when a server’s maxqueue is reached, so that it will now not be necessary anymore to trade between large queues and the risk of returning 503 errors.

Willy also realized that his redesign work would lead the way to priority-based request scheduling in the future, which is great news.

With the new release in hand, I have finally found the time to sit down and do a rematch. The conclusion? In short, the patch works as intended: It eliminates the odd spikes while still providing smoother performance than Nginx. The spikes that remain are present with Nginx as well, and their regularity implies some kind of periodic activity, possibly on the box itself, although a much more likely culprit is Ruby’s garbage collection. Damn you, curiously slow and old-fashioned interpreter implementation!

Finally, some people requested CPU usage data from vmstat. For this new benchmark I updated my scripts to run vmstat concurrently with ab, hoping there would be some meaty differences for charting, but it turns out that there is no significant difference between HAProxy and Nginx — at best, CPU usage looks a trifle smoother with HAProxy, but this could be a fluke. I suspect you have to amp up the load considerably to achieve a sensible comparison. Still, I have included the vmstat data in the raw data tarball for anyone who is interested.

Anyway, enjoy the graphs. Many thanks to Willy for working out a solution so promptly and expertly.

Nginx vs HAProxy at 3 concurrent connections

Nginx vs HAProxy at 10 concurrent connections

Nginx vs HAProxy at 30 concurrent connections

Comparing Nginx and HAProxy for web applications

The last few days I have been comparing Nginx to HAProxy, with surprising results.

First, a bit of background. For a long time we at Bengler have been using Nginx as the main web server for our projects (1, 2), as well as to proxy Rails running under Mongrel. Nginx is a superb little open-source web server with a small footprint, sensible configuration language, modern feature set and buckets of speed. However, we quickly realized that the load balancing features of the proxy are not up to scratch.

The core problem is the proxy load balancing algorithm. Nginx only comes with a round-robin balancer and a hash-based balancer. Only the former is of interest to us since our object is to distribute the load evenly across a pack of Mongrel back ends. The round-robin algorithm is often an acceptable tool: if every request finishes within a few milliseconds, there’s no problem.

But if a page takes a while to load, Nginx will start routing requests to backends that are already processing requests — as a result, some backends will be queueing up requests while others remain idle. You will get an uneven load distribution, and the unevenness will increase with the amount of load the balancer is subjected to.

So when Grzegorz Nosek, backed by EngineYard, announced his fair load balancer module, we naturally pounced on it. Grzegorz’s module routes requests to the back end with the fewest outstanding requests, and this improved performance a lot.

Unfortunately, Grzegorz’s patch is not completely stable, and turned out to be the main source of our stability problems of late. Sometimes it sits down chewing the carpet while backends go idle and requests pile up, or worse, goes into a tailspin and refuses to serve requests, for which the only remedy is a cold restart of Nginx. Even in normal operation, however, it will often send multiple connections to a backend even when others are idle, since there is no limit on the number of connections each backend can receive.

After reading about HAProxy (there’s a nice Rails-oriented blog post about it here), I felt the itch to try out this product myself. HAProxy has a handsome feature set:

  • It’s a proxy — and only a proxy. It can’t serve files, for example: proxying is all it does.
  • It can proxy anything TCP-based — not just HTTP.
  • Plenty of load-balancing algorithms, including a “least connections” strategy that picks the backend with the fewest pending connections. Which happens to be just what we want.
  • Backends can be sanity- and health-checked by URL to avoid routing requests to brain-damaged backends. (It can even stagger these checks to avoid spikes.)
  • A dedicated status page gives you backend status, uptime and lots of yummy metrics. There’s also a way to read metrics from a Unix domain socket.
  • Requests can be routed based on all sorts of things: cookies, URL substrings, client IP, etc.

I like the fact that HAProxy is so single-minded in its approach. Experience tells me that simple, specialized, single-purpose applications are preferable to complex, flexible, one-size-fits-all applications, Varnish and Memcached being two relevant examples.

To determine if HAProxy is up to par, I have done a few simple benchmarks. They’re not awesomely scientific, but I think they are good enough.

The setup: Dedicated test machine (quad-core AMD64 2.4GHz, 4GB RAM), 3 mongrels running an actual Rails 1.2 app. I use Apache’s ab benchmarking tool for the testing (many people prefer httperf, but we have never quite seen eye to eye) and I run 1,000 requests at various levels of concurrency. The page being tested is a minimal controller action that makes one database call, one Memcached lookup and renders an empty page; it takes about 20ms to render.

I have configured Nginx with Grzegorz’s fair load-balancing patch. The configuration does nothing except set up a proxy against Mongrel.
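
For reference, the Nginx side boils down to something like the following fragment. It is an approximation rather than our actual file, and the backend ports are invented:

# Upstream block using Grzegorz's fair balancer; the rest is a plain proxy to it.
upstream mongrels {
    fair;
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
}

server {
    listen 80;
    location / {
        proxy_pass http://mongrels;
    }
}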

I have configured HAProxy with the “leastconn” algorithm and “maxconn 1” for each Mongrel. This is intentionally unfair — but the object is not a comparison of HAProxy and Nginx when each is configured identically; rather, I would like to observe what kind of performance profile can be achieved with HAProxy’s superior gadgetry.
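
The corresponding HAProxy fragment looks roughly like this, again an approximation (written in current HAProxy syntax, with invented ports) rather than the exact configuration used for the benchmark:

# One listener, least-connections balancing, at most one request per Mongrel.
listen rails 0.0.0.0:8080
    balance leastconn
    server mongrel0 127.0.0.1:8000 maxconn 1 check
    server mongrel1 127.0.0.1:8001 maxconn 1 check
    server mongrel2 127.0.0.1:8002 maxconn 1 check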

The “maxconn” setting is significant — since only a single request is handed to each Mongrel at a time, it means that when all backends are busy, pending client requests will idle inside HAProxy — rather than inside Mongrel. Then, when a backend becomes available, the next request in line will be routed to that backend. Without this restriction, of course, requests would end up in busy Mongrels and sit there even though other backends might be available.

Nginx, using the fair load-balancing patch, will behave similarly, but will occasionally suffer overlapping requests since it has no limit on the number of connections each backend can receive.

So, the data. The following graphs show the response times of each request.

Nginx — 3 concurrent connections

HAProxy — 3 concurrent connections

Nginx — 10 concurrent connections

HAProxy — 10 concurrent connections

Nginx — 30 concurrent connections

HAProxy — 30 concurrent connections

HAProxy comes out on top with regard to requests/second — at 30 concurrent connections, we get 218 req/s compared to 162 req/s for Nginx — but the real difference here is in the distribution of response times.

At 3 concurrent connections, Nginx begins to serve every request a bit more slowly, whereas HAProxy at 10 concurrent connections manages to deliver 95% of the requests within the time of the fastest request. At the same time, Nginx performance is all over the map while HAProxy remains fairly consistent. Unfortunately, this evenness comes at the expense of returning a small number of connections extremely slowly.

I’m uncertain if HAProxy imposes an absolute ordering on the request queue; since backends tend to be full, perhaps some connections simply sit around for a long time without being scheduled. That would explain the blips on the graph; in one test session I had a single request taking 47 seconds.

In a real-world situation, some of these requests would simply time out, hopefully to be rescued by a friendly “sorry, we’re overloaded” error page. Is this an acceptable compromise between performance and responsiveness? I think it is, given that they should only occur during exceptional load; in such situations I prefer serving really fast pages to most users and possibly disappointing an extremely small number of users, rather than letting everyone suffer.

I think these results show that HAProxy is a better choice for us. The additional features and fine-grained proxy control are also extremely welcome. HAProxy’s lack of support for serving static files means that we will also put Nginx behind HAProxy and route requests accordingly.

You can download the raw data here.
