Author Archive

Orbited at AjaxWorld

Wednesday, August 29th, 2007

In line with my past post about moving forward, we are presenting Orbited at its first conference: AjaxWorld. The conference is located in Santa Clara, CA and will be held September 23 — 26. There will be two presentations: “Comet for Highly Scalable Applications” by me, and “Comet for Everyone: Building Simple, Scalable Comet Applications with Orbited” by Jacob. If you weren’t already planning on attending this conference, definitely don’t come on account of our presentations. Unless your company wants to pay. The conference fee is upwards of $1700 — much more than I would pay for such an event.

Strangely, my presentation is listed under the track “Security and Performance.” I suppose you could look at it as a lecture on performance, but it is definitely not about security. Jacob had the good fortune of being listed under “Advanced AJAX” which sounds a bit more like Comet. It seems to me that they need a “Comet” track because other presentations on this topic are listed under such tracks as “Hot Topic”, “Enterprise Ajax”, and “RIAs in Action”. Go figure. At least they have talks on Comet.

I believe they are recording the presentations and possibly even simulcasting both the slides and video feed. Don’t quote me on that. Regardless, we’ll post our presentation slides and speaking notes here on the blog, after the conference.

WordPress migration

Monday, August 27th, 2007

Please bear with us. We are attempting to migrate the blog to WordPress, but the script I wrote isn’t working quite as expected. Hopefully in a day or two the blog will be back to normal.

Why the migration? We have a whole slew of announcements to make and we wanted a nice blog for once.

Juggernaut is a Bad Idea

Monday, August 27th, 2007

I guess I’m a year out of date, but I’m just getting to reading various blogs about comet servers besides orbited. I just finished reading Dion Almaer’s post, Juggernaut: Comet for Rails? and I have a few bones to pick.

Some quick background: Juggernaut uses flash sockets to maintain communication between the browser and the push server. Comet has generally been described as an interaction model of bi-directional asynchronous communication… basically adding http-push to the traditional http-pull methods we all use. Comet has been more specifically defined in terms of long-polling and http streaming, methods that both use javascript and http to accomplish the goal.

Towards the end of the article, Dion lists the “advantages/disadvantages of using a flash socket over other methods,” but what he really does is give a factually incorrect listing of reasons he thinks flash is better than anything else without listing a single disadvantage. I’m not sure if this is something Dion wrote, or something Alex MacCaw, the author of Juggernaut, wrote. The exact same list appears on the Juggernaut homepage. I’ll assume for the time being that Dion originated this list.

Dion says, in regards to flash socket’s superiority, “It’s much less of a hack.” Yes, many implementations of comet are considered to be “hacks,” but what about Flash? We are basically sticking a flash runtime, an alien piece of software, directly into a browser and then providing crude interfaces for the two to communicate. I have had no end of trouble with memory leaks because of faulty interaction between the flash runtime and Firefox. Additionally, Flash is really a duplication of many features already present in http/browsers. Instead of building on these features, Flash circumvents them and then is crudely hacked wholesale on top of browsers. Perhaps we need sockets and vector graphics, but that should be defined by web standards rather than Adobe. But let’s assume for a moment that javascript comet is a hack, and Flash isn’t. What’s the problem? With good apis the end-developer doesn’t really care what happens behind the scenes, only that it works. It’s a hack that only a few people have to deal with once, and then our problem is solved forever. And those people are already dealing with it… just look at orbited and cometd.

Dion says, “It doesn’t crash your browser (Comet can do this after a while)”. I can only assume he is referring to what happens when novice javascript programmers keep creating additional XHR objects when trying to implement long-polling, often resulting in excessive memory usage and the occasional browser crash. This has nothing to do with the technology though. It’s like blaming C because some programmers forget to deallocate memory. Furthermore, HTTP streaming, another common implementation of comet, doesn’t suffer from this problem at all. I have never seen nor heard of HTTP streaming crashing a browser.

Dion says, “95% of browsers support it (flash 6).” But many processors do not, such as AMD64 and ARM. So that means that this technique fails for all iPhones. iPhone users, I imagine, will soon be a respectable chunk of the users who would be using fancy web applications in the first place.

Dion says, “It’s much easier to implement.” This is true… assuming you drop all of http, which is what makes comet hard to implement in the first place. If you create your own, non-standard protocol for communication between the browser and push server, then it’s easier to implement. But, of course, your solution only works with itself. What if you want to swap Juggernaut out for cometd? You can’t hold on to any of your browser code; it all has to go. Whereas, if you were using Bayeux (via cometd or an alternative), then you could swap out the backend at will and the browser code need not be changed in the slightest.

Dion says, “It can use a different port - unlike comet - so you don’t need any custom dispatch servlets for forwarding messages through rails to the push server - it can connect directly.” This is factually incorrect. You can use any port with comet. Orbited includes an enlightening example: a web-based irc chat that uses comet. It has static content served from port 80, and has comet served from port 8000. This works, no problem. Don’t take my word for it, you can even find a running instance of the example.

Additional Disadvantages

  1. Juggernaut internalizes its publish/subscribe architecture much in the same way that fdAjax or Cometd do. Refer to the sixth point in my previous post, The Failure of Cometd.

  2. Flash isn’t universal. I mentioned this in an earlier refutation, and it is important. There are many users who can use comet but cannot use flash.

  3. Flash sockets don’t use http. This is much more dangerous than the obvious lack of standards… it will fail for many “enterprise” firewalls that don’t just blindly look at port numbers. They examine packets and block traffic that doesn’t appear to be http. Users will visit the page and it just won’t work and there will be no explanation — not even an error in the javascript console. We had our network revolution and http on port 80 is all that survived, best to remember that.

  4. In soviet Russia, you own flash.

Future Direction

Saturday, August 25th, 2007

It seems like I just started Orbited yesterday, though in reality I started prototyping it about a year ago. A lot has happened since then and when I step back and look at the website and the project as a whole, I’m very pleased.

I’ve been sticking with low version numbers because I only want adventurous developers who are interested in pushing the boundaries of the web, but don’t mind tracking down a bug or two. In my mind it’s a small price to pay. Of course, 95% of developers aren’t interested in the least; they’ll adopt once the technology works flawlessly. At this point, I’m pleased to announce that we’ll soon be switching to 0.2.0 development in the repository. If there’s any new functionality you want, now’s the time to say so.

Before 0.2.0 work commences, I want to catch every type of error possible and provide a meaningful report in the error log. The catch-all functions fine, but it’s not nice to see cryptic tracebacks all the time. So hopefully over the next couple of weeks you’ll know exactly what goes wrong when something breaks.

I also think that Orbited is stable enough for real use. I’m not going to quite call it production ready, but I personally am confident enough to use it in production. The big problem now is opening Orbited up to a wider array of communities. At the moment we only have functioning Python and Ruby Orbited clients. We need to actively recruit developers from the following languages:

  • PHP
  • Perl
  • Java
  • C#/.net

If we have simple chat tutorials for all of these major languages, then we’d be ready to start getting some coverage on ajaxian and other similar sites. There’s only so much the core development team can do, particularly because our skill set is slanted towards C, python, and Java. There are so many application developers out there who use PHP that we cannot ignore them and consider ourselves a serious contender in the space of comet servers.

But Orbited itself is ready for the next set of developers, the slightly less adventurous kind. To that end, we are going to start publicizing Orbited on various developer mailing lists and irc channels. I recently submitted an abstract to the AjaxWorld Conference, as did Jacob. If accepted, I will talk on the share-nothing architecture of Orbited, and the difficulties of scaling pubsub Comet architectures. Jacob will present on Orbited from a more practical standpoint — how to use it to easily create real-time applications.

This is an exciting time, and it’s only going to get better. Twenty years from now all of these problems will be gone — boring old news. But right now we are poised to invent solutions and really push the envelope. That’s exciting.

Announcing Orbited 0.1.5

Saturday, August 25th, 2007

Announcing Orbited 0.1.5

  • New website: www.orbited.org (thanks to Jacob Rus!)
  • Live Demo! Get help from our IRC channel… in your web browser!
    • www.orbited.org/livehelp (thanks to Jacob Rus!)
  • Major stability fixes
    • No longer crashes with malformed input
    • Catchall error reporting (includes line number)
    • Proxy fix
      • somehow proxy lost HTTP/1.0 support (missing content-length header case)
      • Caused it to not work with Pylons/Paste. now it does
  • New Pylons Demo.. tutorial soon to come (thanks to Matthew/desmaj!)

Widespread Proxy Dependence

Sunday, August 19th, 2007

When I was experimenting with Orbited early on, before it was an Open Source project, I became very, very annoyed by browser security. Specifically, for some god-forsaken reason it’s just this side of impossible to do cross domain scripting even when you are running both servers, and on the same parent domain. Sure, at this point it seems straightforward to me, but that’s only because I spent a year tackling the problem.

If you are interested in cross-domain scripting, read what some of the experts, such as Abe Fettig, have done. For the other 99.995% of you who don’t give a rat’s ass, I included a proxy in Orbited.

The proxy, as you may well already know, overcomes the problem of cross-domain scripting by putting Orbited in front of your web application such that the browser thinks Orbited is both the Comet server and the web app.

The proxy is a testing and development tool though. In a production environment it is a pretty bad idea to use the proxy. Not only will it make it much harder for you to scale your application and achieve true redundancy at all tiers, it will also use at least twice as much CPU as you should be using.

Even so, it turns out that 90% of developers only want to use the proxy. From what I’ve gleaned, most people want to create prototypes or in-house apps. I think that the idea that anyone could use Comet is so new an idea that developers of large web applications are staying away. It’s going to take a new startup (maybe meebo) getting huge before all the other sites out there start integrating Comet.

In the meantime, as the prototypes roll in, I’m left holding the bag. I had absolutely no idea what I was getting into when I included the proxy.

Proxy v1

When I first started, I was using CherryPy as a backend for all of my test and demo apps. I shut off CherryPy’s multi-threading because I wasn’t interested in generating all sorts of thread synchronization bugs. As a result, I had to also disable keepalive. With a single thread and keepalive, one user would tie up the server.

Given that keepalive was off, it was very clear how to proxy incoming requests. Examine the url. If it looks like an Orbited url, then hold on to the connection. Otherwise, check the proxy rules and send it on its way. After I designated a connection as proxy, I never looked at it again.
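As a rough illustration of that first design, here is a minimal Python sketch of the per-connection decision. The `/orbited/` prefix, the rule table, and the function name are all hypothetical; they are not taken from Orbited's actual code or URL scheme.

```python
# Hypothetical prefix that marks a request as meant for the Comet server.
ORBITED_PREFIX = "/orbited/"

# Hypothetical proxy rules: URL prefix -> (host, port) of the backend web app.
PROXY_RULES = [
    ("/static/", ("127.0.0.1", 8081)),
    ("/", ("127.0.0.1", 8080)),
]


def route_connection(request_line):
    """Decide, once per connection, what to do based on the first request line."""
    method, url, version = request_line.split(" ", 2)
    if url.startswith(ORBITED_PREFIX):
        return ("hold", None)            # keep the connection open for push events
    for prefix, backend in PROXY_RULES:
        if url.startswith(prefix):
            return ("proxy", backend)    # hand everything to the web app
    return ("reject", None)
```

With keepalive off, every connection carries exactly one request, so making this decision once is enough. A malformed request line would raise `ValueError` here; the sketch leaves error handling out.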

Proxy v2

I wrote an app that used more than one thread, and I enabled keepalive. Whoops. Suddenly my orbited requests were sometimes going straight through to the webapp. What’s going on? Well, my proxy would look at the first request coming down the pipe and then never look again. So the second request, which was for Orbited, got proxied as well.

Some long hours later, and I had a new proxy. It would look at all the data as it flowed from the client to the server. At the end of each request, it turned the connection back over to Orbited. When the next request came down the pipe, Orbited ran its usual check to see if it was for Orbited or for the proxy.
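The key step in that design is finding where each request ends on a keepalive connection, so the routing check can run again for the next one. A minimal Python sketch of that boundary detection follows. It assumes request bodies are delimited by Content-Length only, and the function name is made up for illustration.

```python
def split_requests(buffer):
    """Split a keepalive byte stream into complete HTTP requests.

    Returns (requests, remainder), where remainder is any incomplete tail
    that should be kept until more data arrives.
    Assumption: bodies are delimited by Content-Length only.
    """
    requests = []
    while True:
        head_end = buffer.find(b"\r\n\r\n")
        if head_end == -1:
            break                                  # headers not complete yet
        head = buffer[:head_end].decode("latin-1")
        length = 0
        for line in head.split("\r\n")[1:]:        # skip the request line
            name, _, value = line.partition(":")
            if name.strip().lower() == "content-length":
                length = int(value.strip())
        total = head_end + 4 + length
        if len(buffer) < total:
            break                                  # body not complete yet
        requests.append(buffer[:total])
        buffer = buffer[total:]
    return requests, buffer
```

Each request returned this way can be inspected on its own, so a request for Orbited that arrives second on a connection is no longer forwarded to the web app by mistake.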

Proxy v3

So along comes a Django developer. He writes a simple version of the cherrychat app in Django, sets up the proxy, and nothing happens. No requests ever make it back from Django to the proxy. And get this: the error reported is “<type ‘exceptions.TypeError’>: exceptions must be classes, instances, or strings (deprecated), not type” with no useful line number.

Turns out the combination of pyevent 0.3 and Python 2.5 on Ubuntu 7.04 causes exceptions generated by Orbited to be reported incorrectly. So I patched pyevent 0.3 and tried again. After much grief, the problem became clear. Django uses a rarely seen feature of HTTP/1.1 known as Transfer-Encoding: chunked. It allows Django (or any other web framework) to render templates incrementally and send the results as they become available. The reason this wouldn’t be possible without chunked transfer is that, to keep an HTTP/1.1 connection alive, you otherwise need to specify the Content-Length header before any content. But Django can’t know how long the content will be until it’s rendered.
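For readers who haven't met chunked transfer encoding, here is a small Python sketch of the framing itself: each chunk is sent as a hexadecimal length, CRLF, the data, CRLF, and a zero-length chunk ends the body. The function names are made up for illustration, and the decoder ignores trailer headers.

```python
def encode_chunked(pieces):
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body."""
    out = b""
    for piece in pieces:
        if piece:  # a zero-length chunk would end the body early
            out += b"%x\r\n" % len(piece) + piece + b"\r\n"
    return out + b"0\r\n\r\n"   # terminating chunk, no trailers


def decode_chunked(data):
    """Reassemble the body from a complete chunked payload."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line may carry ";extension" parameters, which are ignored.
        size = int(data[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            break
        body += data[pos:pos + size]
        pos += size + 2          # skip the chunk data and its trailing CRLF
    return body
```

A proxy that wants to hand the connection back after each response has to follow these chunk boundaries, because there is no Content-Length to count against.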

So I rewrote the proxy yet again, because the previous code base wasn’t very conducive to this sort of interaction. This time though, the rewrite took about two hours because I had the old version to reference. After I finished, my Django tester disappeared never to return, apparently discouraged by the previous lack of Django support. I tried to do my own Django thing but I got lost. And I couldn’t make it use HTTP/1.1, much less chunked transfer encoding. So someone please test this when you get a chance.

Proxy v3.1

After coming all this way, I found I had stopped supporting a subset of HTTP/1.0. That is, if no Content-Length header is specified, the response body just ends when the connection ends. But I was assuming, as per HTTP/1.1, that it meant the length was 0. This wasn’t a very involved fix, but I’d like to thank Matthew (desmaj) for finding this bug.
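Put together, the proxy has three ways to tell where a response body ends. The Python sketch below shows that decision; the function name and the lower-cased header dictionary are assumptions for illustration, and it ignores responses that never carry a body (such as replies to HEAD requests).

```python
def response_body_framing(headers):
    """Return how the end of a response body is detected.

    headers: dict of response headers with lower-cased names.
    """
    if "chunked" in headers.get("transfer-encoding", "").lower():
        return ("chunked", None)
    if "content-length" in headers:
        return ("length", int(headers["content-length"]))
    # No framing information at all: the body runs until the server closes
    # the connection. Treating this case as length 0 was the bug.
    return ("until-close", None)
```

The last branch is the one that matters for backends that answer in plain HTTP/1.0 without a Content-Length header.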

So the whole ordeal was definitely worth it: I now know all about the HTTP protocol and how it affects comet, proxies, and comet proxies. The proxy is more stable, which is great. I think that it will continue to be the main method of deploying Orbited applications because it’s just so much easier. It makes the difference between a 10 minute tutorial and a 30 minute tutorial. Just because of the proxy, which isn’t a viable real-world tool, we probably have a 1000% higher retention rate of prospective Orbited users. It just goes to show you that ease of use is what makes something popular. No one cares about scalability until after they’ve chosen a framework.

Announcing Orbited 0.1.4

Wednesday, August 8th, 2007
  • Rewrote the Proxy
    • Django didn’t use to work. Does it now? Calling all Django Developers
    • Added Transfer-Encoding: chunked
    • New keepalive Engine
    • Arbitrary keepalive timeout
    • Fixed missing transport bug (doesn’t crash now)
  • New User Group @ Google Groups

Gaining Momentum

Tuesday, July 24th, 2007

I’ve contributed to Open Source projects in the past, but this is the first time I’ve stood behind a project. I must say, it’s a lot of work. But it’s worth it because we’re developing technology that’s on the cutting edge. We’re really pushing the limits of HTTP and making way for a new breed of web applications.

I’m finally starting to get all sorts of emails about the project. It’s gratifying to know that someone out there reads this stuff that I write, and even goes as far as to try out the tutorials.

In more quantitative terms,

  • I’ve been contacted by more than two dozen developers
  • There are generally 3-4 people in the IRC channel at all times, and 1-2 conversations a day (not counting core developer conversations).
  • There are three documentation contributors who aren’t me
  • Much wider testing against a diverse range of platforms
  • Tagged by 20 people on del.icio.us

I’ve spent so much time developing that I haven’t done enough evangelizing. I really ought to get our name out there to all the Web 2.0 developer sites. But we aren’t going to do that just yet.

We are at version 0.1.3, which sounds like a pretty low number. This is intentional because we want to scare away developers who might get a bad image of Orbited if it doesn’t work for them right away or they encounter a bug. I’m targeting a 0.2.0 release in September, and at that point I’ll start shouting from the rooftops. Not just online either. I want to publish a few papers and attend some conferences.

In short, we are gaining momentum. We’re still a ways from having a large user base, but we’ll get there. In a few months we’ll be unstoppable. There’s interest in Orbited and it’s here to stay.

Orbited + Cometd

Wednesday, July 18th, 2007

I had a revelation: Cometd and Orbited are not at odds. Cometd is not competing with Orbited — each provides a unique set of features, and they solve problems in different domains. Moreover, a hybrid of the two technologies could result in a system more flexible than either alone.

I want to back up for a moment and reference the SQLAlchemy site. This is the very first time I saw a software system praised as being able to “scale down.” What the heck is that? We’re all in a rat race to scale up and it never occurred to me that there could be benefit to moving in the opposite direction. But then I thought about my past introduction to Python. The year was 2002, and I wanted a “web framework”, and someone so politely pointed me towards Zope. I was intrigued as it was the first whole-solution system I’d ever used for building web pages. But within six months I stopped drinking the kool-aid and despaired. It was so hard to make simple apps with Zope. There were endless repositories of great Zope packages and features that eventually just got in the way of development. I had to work around them to make something simple.

So here’s my pitch for Orbited. It scales down. It allows you to use simple APIs to create as lightweight a solution as you like. For all the small-scale, in-house, or prototype applications, you just want something easy and fast to develop. That’s Orbited.

Of course, SQLAlchemy also claims to “scale up”. Turns out that the same is true about orbited. It is designed around libevent for maximum speed on a single node. More importantly though, its simplicity allows you to scale laterally such that 5 orbited nodes support exactly 5 times as many users as 1 orbited node.

Something is missing though, and it smacks of the word Enterprise. I personally detest the beastly word, but there are lessons to be learned from it. An Enterprise solution has three requirements, generally. 1) Clear Scaling Path, 2) All-in-One solution, and 3) Java, as far as I can tell anyway. Orbited has a clear scaling path, check. Orbited also has a Java client library in the works (check… basically). It is missing the second requirement.

Enter Cometd and the Bayeux Protocol. The great thing about Bayeux and Cometd is that they handle every conceivable aspect of HTTP Push that you could ever hope for. There is event confirmation, room for authentication, pubsub, pluggable transports, complete dojo integration, and much more. I think Orbited does pretty well itself in the areas of authentication and pluggable transports, but the real issue is that we are missing the publish/subscribe support. This pattern of development is an overkill for many HTTP Push apps, but it is mighty helpful for writing anything that resembles a set of chat rooms, as I suspect many comet apps will.

Which brings me back to my revelation: Cometd and Orbited are not competitors. Orbited handles the transport layer of HTTP Push. Cometd can be built on top of Orbited without much difficulty. It’s a bit of work to implement the Bayeux protocol, but it’s clear how to go about it. The real issue I have with Cometd is that its current architecture is inherently hard to distribute across server nodes. Please note: this is not a theoretical shortcoming of Cometd or Bayeux; rather, the technology is scalability-bound by current implementation details. I am ready to jump on the Cometd bandwagon, and I bring Orbited to the table. From Orbited to Cometd: Let us handle the difficulties of scaling, both vertically (libevent) and laterally (share nothing), and we’ll let you take care of the protocol, Bayeux, and the browser-side development, dojo toolkit.

The last piece of the puzzle is as easy to find as your favorite IRC daemon. Or perhaps, for the truly hip, Jabber conferencing. Any distributed, channel-based system that scales laterally can effectively handle the subscription functionality necessary to implement the Bayeux protocol. I’ve started work on a Jabber-Orbited interface, and Mario Balibrera has already made substantial progress on an IRC-Orbited interface. The jury is still out on names for these two interfaces, so send me an email if you have any ideas.

I’ve identified all the pieces, so at this point I give this fledgling union of cutting edge technologies a name: TailSpin. That’s certainly how I feel when I try to wrap my head around the scope of operations necessary to power a modern web application. And it takes elements of both Orbited and Cometd.

Here’s some visualization of the proposed system. You can see how it compares to other Comet-style stacks at our Stack Comparison page.

[Figure: TailSpin stack]

In summary, I don’t think Cometd is a bad idea. I think that packing so many layers of functionality into a single project is a bad idea. From Cometd we should take Bayeux, but leave the rest of the stack to technologies better suited to solve lower-level problems. The result of decoupling these layers is that experts can tackle each field directly without the headache of interaction bugs. Furthermore, end-developers can use whichever parts of the stack they feel they need. If they want low-level but easy-to-use HTTP Push, then they can just build on Orbited directly. If they are creating limited-user-base applications, then they might be better served by something like a current Cometd implementation—a single process solution for Bayeux. If they want publish subscribe and don’t need the browser at all, then they can run an IRC server or run Jabber.

HaloD — Dynamic Load Balancing for Orbited

Tuesday, July 10th, 2007

HaloD is a daemon that performs dynamic load scaling for an orbited cluster.

Before you get too excited, note that this is a proposed project. It is in the planning stage. No code exists yet, and I am not going to tackle the project until I feel that Orbited is 100% ready for production use. (What’s the point of scaling Orbited if you can’t use it?) I welcome any contributions or developers who wish to join the project. This isn’t for the faint of heart though. Implementing HaloD requires writing low-level C code that interfaces directly with the TCP/IP stack in the kernel via the Netfilter project. Another side-effect of this is that HaloD will be platform specific: Linux. This is acceptable to me though because most web apps that use open source software tend to be running on Linux anyway.

With that said, I’ll get on with it.

Name

The name HaloD comes from a combination of the terms “halo orbit” and “daemon”. It is pronounced like the English word “hallowed” but “halo dee” is also an acceptable pronunciation.

“A halo orbit is an orbit around a Lagrange point between two larger bodies. Because the orbit tends to be unstable stationkeeping is required to keep an object such as a satellite in this orbit” — Wikipedia.

How it Works

HaloD accepts incoming connections from the orbited nodes. Whenever an orbited node starts, if configured to do so, it immediately opens a connection with the HaloD server. The HaloD server decides to either 1. Keep the node in reserve, or 2. Add the node to the active cluster.

Browsers and Orbited clients connect to the cluster by connecting to the HaloD machine. For every port that HaloD exposes to browsers, there will be a corresponding port exposed to Orbited Clients (web apps).

The key to understanding HaloD is to think of a typical commodity router. It has a single ip exposed to the outside world, yet allows access to a multitude of intranet computers. It does this by designating certain ports to certain machines and keeping track of which ports map to which machines. So if machine1 makes a request to google.com, the router sends the request out on port 23432, and remembers that incoming traffic on port 23432 maps to a local port on machine1.

Similarly, HaloD is a custom router. It listens on a range of ports, and then maps them to the Orbited cluster nodes behind it. The reason this is helpful is that the end destination of a request to port 500, let’s say, might change, but the browser or web app is none the wiser. Hopefully the following diagram will help you visualize HaloD.

[Figure: HaloD Architecture]

The diagram only shows how browsers connect to HaloD and are forwarded to Orbited nodes, but the same method is used for web applications connecting to HaloD.

Routing Algorithm

Assume for a moment that there is a max of N nodes, and so N routes; let N be 24. For the first node, no matter which of the 24 ports you connect to HaloD on (for either the browser or the client), the packets will be routed to the one (and only) orbited node. So there are 24 routes to a single orbited node. When a second node is added, 12 of the routes are moved over. They are now directed at the second node. When a third node is added, 1/3 of the routes (8 of the 24) are moved to the third node.

It wouldn’t do to choose arbitrary routes to move upon adding a node; we need to do it systematically. The formula to determine which routes are moved is as follows:

Algorithm setup

Assign an id from 1 to N to each route.

Adding each new node (algorithm)

  1. The new node is given an id of M + 1 where M is the largest id of an existing node (if no nodes are present then M is 0)
  2. If no other nodes exist, all N routes are assigned to the new node. Finish.
  3. Sort the nodes by number of routes assigned (k), and secondly by node id
  4. Moving in descending order down the list, move assignment of the highest route id for the given node to the new node until floor(N / (M + 1)) routes have been reassigned.
  5. If this quota hasn’t been met but the list has been completely traversed, then start at the top and repeat the process (step 4)
  6. For all routes reassigned, send a ‘route moved’ message to each route’s former orbited node. The orbited node will issue a reconnect event to any users connected via that route if necessary. (It is necessary for the Iframe transport, but not for the XHR transport. For XHR just close all open connections and the browsers will re-connect.)
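The steps above can be sketched in Python as follows. This is one reading of the algorithm, not a reference implementation: it assumes the sort in step 3 puts the nodes with the most routes first and breaks ties by lowest node id, and that the list is sorted once per added node rather than after every move. The function and variable names are made up for illustration.

```python
def add_node(assignment, n_routes):
    """Add one node and move routes to it.

    assignment: dict mapping node id -> list of route ids (mutated in place).
    Returns (new_id, moved), where moved lists (route id, former node id)
    pairs, i.e. the nodes that need a 'route moved' message (step 6).
    """
    m = max(assignment, default=0)          # step 1: largest existing id
    new_id = m + 1
    if not assignment:                      # step 2: first node gets everything
        assignment[new_id] = list(range(1, n_routes + 1))
        return new_id, []
    quota = n_routes // new_id              # floor(N / (M + 1))
    # Step 3 (assumed order): most routes first, then lowest node id.
    order = sorted(assignment, key=lambda node: (-len(assignment[node]), node))
    assignment[new_id] = []
    moved = []
    while len(moved) < quota:               # step 5: repeat passes if needed
        progress = False
        for node in order:                  # step 4: one route per node per pass
            if len(moved) == quota:
                break
            if assignment[node]:
                route = max(assignment[node])
                assignment[node].remove(route)
                assignment[new_id].append(route)
                moved.append((route, node))
                progress = True
        if not progress:
            break                           # nothing left to move
    return new_id, moved
```

Starting from an empty dictionary and calling `add_node(routes, 24)` repeatedly gives 24 routes on one node, then 12 and 12, then 8 each, and so on. Under these assumptions the split stays balanced, though the exact route ids on each node may differ from the diagram if the original intent was a different tie-break.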

Progression of Routes as Nodes are added

The result of the above algorithm can most easily be understood by examining a picture. If we have 24 routes, the following diagram shows which routes are assigned to each of n nodes, for n between 1 and 8.

[Figure: HaloD node routing algorithm]

Failover

A major reason to use dynamic load balancing is to provide a failover mechanism. If one of your nodes fails, you don’t want to know about it right away. Rather, you want the entire system to continue functioning and you want as little downtime for any user as possible. Here is how we do this with HaloD:

If HaloD loses connection with an orbited node and is not able to immediately re-establish connection, then it reassigns all routes from that node to a failover node. This is the quickest and easiest way to keep service up at all times.

But sometimes you don’t have a failover node available. In this case, HaloD reassigns the node with the highest id (node M) to replace the failed node, then uses a reverse of the route reassignment process described above to reassign all former node M routes to the appropriate orbited nodes, given that the total number of nodes is now M-1. Note that the client-side orbited implementation (javascript) needs to support automatic reconnection if the connection is lost. Because HaloD immediately recognizes an Orbited node failing, it can swap the routes in a matter of a few cpu cycles, so before the browsers even realize their Orbited Node is down, it’ll be back up.
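Both failover paths can be sketched in Python. The "reverse" of the reassignment process is not spelled out above, so this sketch makes an assumption: the routes of the highest node are handed out one at a time to whichever remaining node currently has the fewest routes. The function name, the `machines` mapping, and the addresses are hypothetical.

```python
def handle_node_failure(routes, machines, failed_id, standby=None):
    """React to a failed node. Both dictionaries are mutated in place.

    routes:   node id -> list of route ids
    machines: node id -> address of the machine currently serving that id
    standby:  address of a reserve machine, or None if there is none
    """
    if standby is not None:
        # Easy case: the reserve machine takes over the failed node's
        # id and all of its routes.
        machines[failed_id] = standby
        return
    top = max(routes)
    # The machine behind the highest id takes over the failed node's slot.
    # If the failed node was the highest one, the slot simply disappears.
    if top != failed_id:
        machines[failed_id] = machines[top]
    del machines[top]
    orphaned = sorted(routes.pop(top))
    if not routes:
        raise RuntimeError("no nodes left to serve routes")
    # Assumed reverse of the add step: give each route to the node that
    # currently has the fewest routes (lowest id wins ties).
    for route in orphaned:
        target = min(routes, key=lambda node: (len(routes[node]), node))
        routes[target].append(route)
```

Either way the route ids that browsers connect to never change; only the machine behind them does.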

Implementation Details

As I said before, this project requires per-packet processing, which is only possible at the kernel level, either directly or via the Netfilter project. That means that HaloD must be written entirely in low-level C. We also need to work at a level a bit higher than packet processing, as HaloD needs to accept incoming connections from the Orbited nodes for monitoring purposes. Assuming we want to put 1024 nodes in a single cluster, then we need to quickly process heartbeat signals from each of the nodes. This isn’t a lot of bandwidth, but we want it to be snappy. I think the best solution here is to use either libevent or epoll directly. This should work well with our low latency, high concurrency requirements. And who knows, eventually someone may want to put 2048 nodes or more in a single cluster.
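The real thing would be C on top of libevent or epoll, but the bookkeeping behind heartbeat monitoring is simple enough to sketch in Python. The class name, method names, and the two-second timeout are assumptions for illustration; the heartbeat protocol itself has not been designed yet.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each Orbited node."""

    def __init__(self, timeout=2.0):
        self.timeout = timeout      # seconds of silence before a node is suspect
        self.last_seen = {}         # node id -> timestamp of its last heartbeat

    def beat(self, node_id, now=None):
        """Record a heartbeat; `now` can be injected for testing."""
        self.last_seen[node_id] = time.monotonic() if now is None else now

    def silent_nodes(self, now=None):
        """Return the ids of nodes whose last heartbeat is older than the timeout."""
        now = time.monotonic() if now is None else now
        return sorted(node for node, seen in self.last_seen.items()
                      if now - seen > self.timeout)
```

An event loop would call `beat` whenever a node's socket becomes readable and call `silent_nodes` on a timer to decide which nodes need failover.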

The routing table itself is just a hash table in memory. It may be a good idea to dump the routing table to a database or network device every time it’s updated. This allows the possibility of creating real-time failover solutions for the HaloD server itself.
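As a sketch of the dump-on-update idea, here is a small Python version that writes the table to a file every time a route changes. The class name, the JSON format, and the file-based storage are assumptions for illustration; a database or network device would slot in at the same point.

```python
import json
import os
import tempfile


class RoutingTable:
    """In-memory route table that is written to disk on every update."""

    def __init__(self, path):
        self.path = path
        self.routes = {}            # route (port) -> node id

    def assign(self, port, node_id):
        self.routes[port] = node_id
        self._dump()

    def _dump(self):
        # Write to a temporary file first, then rename it into place, so a
        # standby reader never sees a half-written table.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as handle:
            json.dump(self.routes, handle)
        os.replace(tmp_path, self.path)

    @classmethod
    def load(cls, path):
        """Rebuild a table from the last dump, e.g. on a standby HaloD server."""
        table = cls(path)
        with open(path) as handle:
            # JSON object keys are strings, so convert ports back to integers.
            table.routes = {int(port): node
                            for port, node in json.load(handle).items()}
        return table
```

A standby HaloD server could call `RoutingTable.load` on the same file to pick up where the failed one left off.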

Conclusion

I know this may seem an ambitious project at first glance, but the hardest part is testing it with a realistic setup. The routing algorithm is simple, as will be the heartbeat protocol. The project is necessary though to the future of real-time web applications. Maybe not HaloD in particular, but a similar solution will be needed, and sooner than you might imagine. I think that the hardest part will be attracting web-developers who understand the need for such a system, and are comfortable working in C. Most web developers today have little history with low-level languages — as it should be.