Archive for July, 2007

Gaining Momentum

Tuesday, July 24th, 2007

I’ve contributed to Open Source projects in the past, but this is the first time I’ve stood behind a project. I must say, it’s a lot of work. But it’s worth it because we’re developing technology that’s on the cutting edge. We’re really pushing the limits of HTTP and making way for a new breed of web applications.

I’m finally starting to get all sorts of emails about the project. It’s gratifying to know that someone out there reads this stuff that I write, and even goes as far as to try out the tutorials.

In more quantitative terms,

  • I’ve been contacted by more than two dozen developers
  • There’s generally 3-4 people in the IRC channel at all times, and 1-2 conversations a day (not counting core developer conversations.)
  • There are three documentation contributors who aren’t me
  • Much wider testing against a diverse range of platforms
  • Tagged by 20 people on del.icio.us

I’ve spent so much time developing that I haven’t done enough evangelizing. I really ought to get our name out there to all the Web 2.0 developer sites. But We aren’t going to do that just yet.

We are at version 0.1.3, which sounds like a pretty low number. This is intentional because we want to scare away developers who might get a bad image of Orbited if it doesn’t work for them right away or they encounter a bug. I’m targeting a 0.2.0 release in September, and at that point I’ll start shouting from the rooftops. Not just online either. I want to publish a few papers and attend some conferences.

In short, we are up momentum. We’re still a ways from having a large user base, but we’ll get there. In a few months we’ll be unstoppable. There’s interest in Orbited and it’s here to stay.

Slimmer WebIRC = Orbited Live Help

Tuesday, July 24th, 2007

As Mario already mentioned, I find his current WebIRC interface a bit confusing. It’s non-obvious how to log in, and how to join channels, and completely unintuitive for a click on the current channel’s name to close the channel. Moreover, Orbited could use a good IRC demo, to log people into the #orbited channel, so that we can help them. So I’m working on cutting the feature set down to a minimum, slapping on some nicer-looking CSS, and putting the thing up on the main Orbited website, where non-IRC-savvy users can open it up to get Orbited help.

WebIRC screenshot

I shouldn’t have peeked at Mario’s JavaScript code for WebIRC. It works, but it’s confusing (what in the world is data[0][n][1]? Who named these variables anyway?), and it’s far more verbose than necessary. I’ll probably end up completely rewriting it. Fortunately the server-side shouldn’t need to be altered much if at all.

Orbited + Cometd

Wednesday, July 18th, 2007

I had a revelation: Cometd and Orbited are not at odds. Cometd is not competing with Orbited — each provides a unique set of features, and they solve problems in different domains. Moreover, a hybrid of the two technologies could result in a system more flexible than either alone.

I want to back up for a moment and reference the SQLAlchemy site. This is the very first time I saw a software system praised as being able to “scale down.” What the heck is that? We’re all in a rat race to scale up and it never occurred to me that there could be benefit to moving in the opposite direction. But then I thought about my past introduction to Python. The year was 2002, and I wanted a “web framework”, and someone so politely pointed me towards Zope. I was intrigued as it was the first whole-solution system I’d ever used for building web pages. But within six months I stopped drinking the kool-aid and despaired. It was so hard to make simple apps with Zope. There were endless repositories of great Zope packages and features that eventually just got in the way of development. I had to work around them to make something simple.

So here’s my pitch for Orbited. It scales down. It allows you to use simple APIs to create as lightweight a solution as you like. For all the small-scale, in-house, or prototype applications, you just want something easy and fast to develop. That’s Orbited.

Of course, SQLAlchemy also claims to “scale up”. Turns out that the same is true about orbited. It is designed around libevent for maximum speed on a single node. More importantly though, its simplicity allows you to scale laterally such that 5 orbited nodes support exactly 5 times as many users as 1 orbited node.

Something is missing though, and it smacks of the word Enterprise. I personally detest the beastly word, but there are lessons to be learned from it. An Enterprise solution has three requirements, generally. 1) Clear Scaling Path, 2) All-in-One solution, and 3) Java, as far as I can tell anyway. Orbited has a clear scaling path, check. Orbited also has a Java client library in the works (check… basically). It is missing the second requirement.

Enter Cometd and the Bayeux Protocol. The great thing about Bayeux and Cometd is that they handle every conceivable aspect of HTTP Push that you could ever hope for. There is event confirmation, room for authentication, pubsub, pluggable transports, complete dojo integration, and much more. I think Orbited does pretty well itself in the areas of authentication and pluggable transports, but the real issue is that we are missing the publish/subscribe support. This pattern of development is an overkill for many HTTP Push apps, but it is mighty helpful for writing anything that resembles a set of chat rooms, as I suspect many comet apps will.

Which brings me back to my revelation: Cometd and Orbited are not competitors. Orbited handles the transport layer of HTTP Push. Cometd can be built on top of Orbited without much difficulty. It’s a bit of work to implement the Bayeux protocol, but it’s clear how to go about it. The real issue I have with Cometd is that its current architecture is inherently hard to distribute across server nodes. Please note: this is not a theoretical shortcoming of Cometd or Bayeux; rather, the technology is scalabilty-bound by current implementation details. I am ready to jump on the Cometd bandwagon, and I bring Orbited to the table. From Orbited to Cometd: Let us handle the difficulties of scaling, both vertically (libevent) and laterally (share nothing), and we’ll let you take care of the protocol, Bayeux, and the browser-side development, dojo toolkit.

The last piece of the puzzle is as easy to find as your favorite IRC daemon. Or perhaps, for the truly hip, Jabber conferencing. Any distributed, channel-based system that scales laterally can effectively handle the subscription functionality necessary to implement the Bayeux protocol. I’ve started work on a Jabber-Orbited interface, and Mario Balibrera has already made substantial progress on a IRC-Orbited interface. The jury is still out on names for these two interfaces, so send me an email if you have any ideas.

I’ve identified all the pieces, so at this point I give this fledgling union of cutting edge technologies a name: TailSpin. That’s certainly how I feel when I try to wrap my head around the scope of operations necessary to power a modern web application. And it takes elements of both Orbited and Cometd.

Here’s some visualization of the proposed system. You can see how it compares to other Comet-style stacks at our Stack Comparison page.

TailSpin stack

In summary, I don’t think Cometd is a bad idea. I think that packing so many layers of functionality into a single project is a bad idea. From Cometd we should take Bayeux, but leave the rest of the stack to technologies better suited to solve lower-level problems. The result of decoupling these layers is that experts can tackle each field directly without the headache of interaction bugs. Furthermore, end-developers can use whichever parts of the stack they feel they need. If they want low-level but easy-to-use HTTP Push, then they can just build on Orbited directly. If they are creating limited-user-base applications, then they might be better served by something like a current Cometd implementation—a single process solution for Bayeux. If they want publish subscribe and don’t need the browser at all, then they can run an IRC server or run Jabber.

Nobody Does It Better (than Jacob)

Monday, July 16th, 2007

So Jacob loves WebIRC. Seriously, he can’t get enough. I think we’re going to have an intervention soon.

The server’s going strong without any major changes since I wrote it two long months ago. In fact, today is WebIRC’s two-month birthday, and someone didn’t send a card. Anyway, it’s about as stable as you can get. I had a computer connected for 52 hours before it crashed for unrelated reasons. Seriously, it does that all the time, I need to reformat. In fact, 52 hours might be the longest it’s gone without crashing lately.

Jacob’s really been doing some exciting things with the application, taking it in a direction I never really considered. He’s simplifying it, for starters, stripping away all the functionality that seemed cool 15 years ago but just confuses people used to Google Talk. He’s developing it into something that should debut in the next month or so on Orbited.org: a very small (in pixels), lightweight little marvel that you can just stick in the corner of any old website. That’s just plain awesome.

Ok, if you really miss it that much, the old WebIRC will run here for a while.

HaloD — Dynamic Load Balancing for Orbited

Tuesday, July 10th, 2007

HaloD is a daemon that performs dynamic load scaling for an orbited cluster.

Before you get too excited, note that this is a proposed project. It is in the planning stage. No code exists yet, and I am not going to tackle the project until I feel that Orbited is 100% ready for production use. (What’s the point of scaling Orbited if you can’t use it?) I welcome any contributions or developers who wish to join the project. This isn’t for the faint of heart though. Implementing HaloD requires writing low-level C code that interfaces directly with the TCP/IP stack in the kernel via the Netfilter project. Another side-effect of this is that HaloD will be platform specific: Linux. This is acceptable to me though because most web apps that use open source software tend to be running on Linux anyway.

With that said, I’ll get on with it.

Name

The name HaloD comes from a combination of the terms “halo orbit” and “daemon”. It is pronounced like the english word “hallowed” but “halo dee” is also an acceptable pronunciation.

“A halo orbit is an orbit around a Lagrange point between two larger bodies. Because the orbit tends to be unstable stationkeeping is required to keep an object such as a satellite in this orbit” — Wikipedia.

How it Works

HaloD accepts incoming connections from the orbited nodes. Whenever an orbited node starts, if configured to do so, it immediately opens a connection with the HaloD server. The HaloD server decides to either 1. Keep the node in reserve, or 2. Add the node to the active cluster.

Browsers and Orbited clients connect to the cluster by connecting to the HaloD machine. For every port that HaloD exposes to browsers, there will be a corresponding port exposed to Orbited Clients (web apps).

The key to understanding HaloD is to think of a typical commodity router. It has a single ip exposed to the outside world, yet allows access to a multitude of intranet computers. It does this by designating certain ports to certain machines and keeping track of which ports map to which machines. So if machine1 makes a request to google.com, the router sends the request out on port 23432, and remembers that incoming traffic on port 23432 maps to a local port on machine1.

Similarly, HaloD is a custom router. It listens on a range of ports, and then maps them to the Orbited cluster nodes behind it. The reason this is helpful is that the end destination of a request to port 500, let’s say, might change, but the browser or web app is none the wiser. Hopefully the following diagram will help you visualize HaloD.

HaloD Architecture

The diagram only shows how browsers connect to HaloD and are forwarded to Orbited nodes, but the same method is used for web applications connecting to HaloD.

Routing Algorithm

Assume for a moment that there is a max of N nodes; let N be 24. For the first node, no matter which of the 24 ports you connect to HaloD on (for either the browser or the client), the packets will be routed to the one (and only) orbited node. So there are 24 routes to a single orbited node. When a second node is added, 12 of the routes are moved over. They are now directed at the second node. When a third node is added, 1/3 of the routes are moved to the third node.

It wouldn’t do the choose any arbitrary routes to move upon adding a node, we need to do it systematically. The formula to determine which routes are moved is as follows:

Algorithm setup

Assign an id from 1 to N to each route.

Adding each new node (algorithm)

  1. The new node is given an id of M + 1 where M is the largest id of an existing node (if no nodes are present then M is 0)
  2. If no other nodes exist, all N routes are assigned to the new node. Finish.
  3. Sort the nodes by number of routes assigned (k), and secondly by node id
  4. Moving in descending order down the list, move assignment of the highest route id for the given node to the new node until floor(N / (M + 1)) routes have been reassigned.
  5. If this quota hasn’t been met but the list has been compeltely traversed, then start at the top and repeat the process (step 4)
  6. For all routes reassigned, send a ‘route moved’ message to each route’s former orbited node. The orbited node will issue a reconnect event to any users connected via that route if necessary. (It is necessary for the Iframe transport, but not for the XHR transport. For XHR just close all open connections and the browsers will re-connect.)

Progression of Routes as Nodes are added

The result of the above algorithm can most easily be understood by examining a picture. If we have 24 routes, the following diagram shows which routes are assigned to each of n nodes, for n between 1 and 8.

HaloD node routing algorithm

Failover

A major reason to use dynamic load balancing is to provide a failover mechanism. If one of your nodes fails, you don’t want to know about it right away. Rather, you want the entire system to continue functioning and you want as little downtime for any user as possible. Here is how we do this with HaloD:

If HaloD loses connection with an orbited node and is not able to immediately re-establish connection, then it reassigns all routes from that node to a failover node. This is the quickest and easiest way to keep service up at all times.

But sometimes you don’t have a failover node available. In this case, HaloD reassigns the node with the highest id (node M) to replace the failed node, then uses a reverse of the route reassignment process describe above to reassign all former node M routes to the appropriate orbited nodes given that we now have total nodes = N = M-1. Note that the client-side orbited implementation (javascript) needs to support automatic reconnection if the connection is lost. Because HaloD immediately recognizes an Orbited node failing, it can swap the routes in a matter of a few cpu cycles, so before the browsers even realize their Orbited Node is down, it’ll be back up.

Implementation Details

As I said before, this project requires per-packet processing, which is only possible at the kernel level, either directly or via the Netfilter project. That means that HaloD must be written entirely in low-level C. We also need to work at a level a bit higher than packet processing, as HaloD needs to accept incoming connections from the Orbited nodes for monitoring purposes. Assuming we want to put 1024 nodes in a single cluster, then we need to quickly process heartbeat signals from each of the nodes. This isn’t a lot of bandwith, but we want it to be snappy. I think the best solution here is to use either libevent or epoll directly. This should work well with our low latency, high concurrency requirements. And who knows, eventually someone may want to put 2048 nodes or more in a single cluster.

The routing table itself is just a hash table in memory. It may be a good idea to dump the routing table to a database or network device every time it’s updated. This allows the possibility of creating real-time failover solutions for the HaloD server itself.

Conclusion

I know this may seem an ambitious project at first glance, but the hardest part is testing it with a realistic setup. The routing algorithm is simple, as will be the heartbeat protocol. The project is necessary though to the future of real-time web applications. Maybe not HaloD in particular, but a similar solution will be needed, and sooner than you might imagine. I think that the hardest part will be attracting web-developers who understand the need for such a system, and are comfortable working in C. Most web developers today have little history with low-level languages — as it should be.

Rails — Ramdisk?

Tuesday, July 3rd, 2007

Now that we have a Ruby client API, we need a demo! So I downloaded Ruby on Rails, opened up the famous screencast, and started making my chat demo. But Python (and CherryChat) knows how to keep its state in a regular variable, whereas it wasn’t clear to me (newbie that I am) how to do that in Ruby and Rails. So I decided, as any self-respecting programmer likely would, to ask on IRC. Specifically, in the #rubyonrails channel on freenode. The discussion that resulted was a classic (long enough that I’m putting it below the fold)

(more…)