RESTful services as intelligent websites

March 27, 2012

Thinking of RESTful services as intelligent websites helps drive our architecture decisions towards simplicity, discoverability and maximum use of the HTTP protocol. That should help us design services that are easier to use, debug, evolve and maintain.

In this post I’ll try to address a few important aspects of RESTful services, considering them as intelligent websites. We might need more formal specifications in some cases, but this is hopefully a good start.

Discoverability and service homepage

Having a descriptive homepage for your service helps search engines and people discover it, and the base service URL should “never” change. Service-specific subdomains come to mind.

The service homepage includes all the necessary information such as service owner’s contact information, links to source code, etc.

News about service updates live right here on the service homepage, ideally as full content for the most recent news, but at least as links.

The key idea is that I shouldn’t have to tell you more than the service homepage’s URL for you to be able to use the service.

Even if your service is a company-internal one that’s not meant to become public, having a decent-looking homepage, or at least one that’s well organized and readable, won’t hurt.


Hypermedia as the engine of application state

In my pragmatic view, Hypermedia as the Engine of Application State basically means that links tell you where to go next.

In a website meant for humans, the meaning of a link is often expressed by logical groups: navigation links at the top left, “more info” links in a smaller font at the bottom of a page, etc.

For machines, adding rel attributes to <link> elements (in HTML, or their equivalents in other formats) tells us what we can do next. A client should be able to first try a link with non-destructive results, and get a response that supplies details about how the specified interaction is supposed to work. If those details are too involved to be expressed in a simple HTTP request/response (which should be a “maybe this is too complex” warning), links to explanations can be provided in the HTML content of our intelligent website.
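As a sketch of what “links tell you where to go next” means for a machine client, here’s a minimal Python example that collects rel links from a service homepage. The page content and the rel names are invented for illustration; they are not from any real service:

```python
from html.parser import HTMLParser

class RelLinkCollector(HTMLParser):
    """Collects rel -> href mappings from <link> and <a> elements."""
    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag in ("link", "a"):
            a = dict(attrs)
            if "rel" in a and "href" in a:
                self.links[a["rel"]] = a["href"]

# Hypothetical homepage of an "intelligent website" service:
homepage = """
<html><head>
  <link rel="service-docs" href="/docs"/>
  <link rel="jobs" href="/jobs"/>
</head></html>
"""

collector = RelLinkCollector()
collector.feed(homepage)
print(collector.links["jobs"])  # the client follows this link next
```

The client never hardcodes the /jobs path: it discovers it from the rel attribute, which is the whole point.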

Self-explaining services

The service documentation, if any is needed, is also provided on the service homepage, or linked from it.

Service interactions are designed to be as obvious as possible to minimize the need for documentation, and readable automated tests (hosted on the service website of course, or linked from it) can help document the details.

HTML forms describe service parameters

HTML forms are the best way to document service parameters: provide a form that humans can use to play with the service, with enough information, such as lists and ranges of values, for users to figure out by themselves what the service expects.

The idea is that a human user will play with your service from the HTML form, then move on to implementing a client in the language of their choice.

The action attribute of <form> elements also contributes to HATEOAS – intelligent clients can discover that if needed, and it’s obvious for human users.

And of course, speaking of parameters, be liberal in what you accept, and conservative in what you send.

Speaking in URLs

Like humans, RESTful services need to be precise when they speak about something important.

If your service response says invalid input format, for example, it’s not hard to include in that response a URL that points to a description of the correct input format. That makes all the difference between a frustrating and a useful error message, and is part of HATEOAS as well.
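To illustrate, a hypothetical error response might carry that documentation link in its body. The field names and the URL below are invented, not a standard:

```python
import json

def invalid_input_error(format_docs_url):
    """Build an error body that points at the input format docs
    (hypothetical field names, for illustration only)."""
    return json.dumps({
        "error": "invalid input format",
        "see": format_docs_url,
    })

body = invalid_input_error("https://service.example.com/docs/input-format")
print(body)
```

A client, human or machine, that hits this error knows exactly where to go next.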

Web-friendly responses

Readable response formats will help people make sense of your service. The HTTP protocol provides standard ways of compressing responses to avoid using more bandwidth than needed, so optimizing the response format for wire efficiency does not make much sense unless you’re really expecting huge traffic. And even if you need an optimized binary response format, there’s probably a way to make that optional.
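As a rough illustration of why readable formats rarely cost much on the wire, HTTP-level gzip (Content-Encoding: gzip) squeezes verbose, repetitive JSON very effectively. The payload here is invented:

```python
import gzip
import json

# A deliberately verbose, human-readable (and invented) response body:
body = json.dumps(
    {"downloads_per_minute": [12000 + i % 7 for i in range(500)]}
).encode("utf-8")

compressed = gzip.compress(body)
ratio = len(compressed) / len(body)
print(ratio < 0.5)  # readable but repetitive data compresses well
```

The compressed size is what actually travels when the client sends Accept-Encoding: gzip, so readability costs very little.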

HTTP status codes

Thou shalt not abuse HTTP status codes – or you might be suddenly transformed into a teapot.

Returning a 200 OK status code with content that describes an error is a no-no: if something went wrong, the HTTP status code needs to express that.
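A trivial sketch of that rule, using Python’s http.HTTPStatus for the codes; the outcome names are invented:

```python
from http import HTTPStatus

# Hypothetical mapping from service outcomes to status codes;
# the point is that errors never hide behind 200 OK.
STATUS_FOR_OUTCOME = {
    "ok": HTTPStatus.OK,                          # 200
    "invalid-input": HTTPStatus.BAD_REQUEST,      # 400
    "missing": HTTPStatus.NOT_FOUND,              # 404
    "crashed": HTTPStatus.INTERNAL_SERVER_ERROR,  # 500
}

def status_for(outcome):
    """Return the numeric HTTP status code for a service outcome."""
    return int(STATUS_FOR_OUTCOME[outcome])

print(status_for("invalid-input"))  # 400, not 200 with an error body
```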


Authentication and authorization

Website authentication and authorization mechanisms and secure connections work for machine clients as well, no need to reinvent that wheel.

HTTP sessions are a bad idea in a RESTful context, of course: state is driven by hypertext, as discussed above.

Character encodings

The issues are the same for human users as for machine clients: you need to play by the HTTP protocol rules when it comes to character encodings, and using UTF-8 as the default is usually the best option.
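A small sketch of playing by those rules: declare the charset in the Content-Type header and encode the body accordingly, so that non-ASCII content round-trips cleanly (the header value shown is the usual convention):

```python
# Non-ASCII content survives when the declared charset matches the
# actual encoding of the bytes on the wire.
body_text = "café – naïve – 日本語"
content_type = "text/html; charset=utf-8"
body_bytes = body_text.encode("utf-8")

# A well-behaved client decodes using the declared charset:
decoded = body_bytes.decode("utf-8")
print(decoded == body_text)  # True
```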

Stable service URLs

As with good websites, once a service has been published at a given URL, it should continue working in the same way “forever”.

A substantially different new version of the service should get its own different URL – at least a different path containing a version number, or maybe a new subdomain name.

Long-running jobs

Whether your clients are humans or machines, you usually don’t want HTTP requests to last more than a few seconds. Long-running jobs should instead start by creating a new resource that describes the job’s progress and lets the client know when the output is available. We’ve had an interesting discussion about this in Apache Stanbol recently, about long-running jobs for semantic reasoners.
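Here’s a minimal in-memory sketch of that pattern, with invented names; a real service would return 202 Accepted with the job resource’s path in a Location header, and the client would poll that path:

```python
import uuid

class JobStore:
    """In-memory sketch of the long-running-job pattern: creating a
    job returns a resource path that the client polls for progress."""

    def __init__(self):
        self.jobs = {}

    def create(self):
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = {"status": "running", "output": None}
        # this path would be sent back in a Location header:
        return "/jobs/" + job_id, job_id

    def finish(self, job_id, output_path):
        self.jobs[job_id] = {"status": "done", "output": output_path}

    def status(self, job_id):
        return self.jobs[job_id]

store = JobStore()
location, jid = store.create()
print(store.status(jid)["status"])  # running
store.finish(jid, location + "/output")
print(store.status(jid)["status"])  # done
```

The initial request returns immediately; the slow work happens behind the job resource, which also gives you a natural place to link to the output once it exists.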

Service metadata

Non-trivial services often have interesting metadata to provide, which can have both static and dynamic parts, like configuration values and usage statistics for example.

Here again, the website view helps: that metadata is just made accessible from the service homepage via links, ideally with both human (a href) and machine (link rel) variants.


Designing your RESTful service as an intelligent website should help people make good use of it, and will probably also help you make its interactions simpler and clearer.

If your service can’t be presented as a website, it might mean that your interaction design is not optimal, or not really RESTful.

I’m happy to see these opinions challenged, of course, if someone has any counter-examples.

Update: I should have mentioned that this post, and especially the “intelligent websites” concept, was inspired by conversations with Roy Fielding, in the early days of Apache Sling. I haven’t asked him to review it, so this doesn’t mean he would endorse the whole thing…I’m giving credit for inspiration but still liable for any mistakes ;-)

Update #2: See also my REST reading list, a regularly updated collection of links to articles that I’ve found useful in understanding and explaining REST.

Glow, smoke and mirrors, and RESTful design

March 22, 2011

When I was a kid, my aunt gave me a book called The Art of Engineering. The title sounded weird to me at first – isn’t engineering the opposite of art?

It’s not – artful design can be visible in the best pieces of software, and not only at the user interface level. I find glow, Mozilla’s realtime display of Firefox 4 downloads, fascinating, and being my curious self I wondered how the data is transferred.

Starting with the requirement of broadcasting real-time data to millions of clients simultaneously, many of us would end up with expensive message queuing systems, RPC, WebSockets, SOAP^H^H^H^H (not SOAP – don’t make me cry). Lots of fun ways to add some powers of ten to your budget.

Don’t believe anyone who tells you that software has to be complicated, or that engineering cannot be artful. Simplicity always wins, and glow is an excellent example of that.

The first thing that I noticed when looking at how glow gets its data (which was very easy thanks to the use of sane http/json requests) is that glow is not real-time.

I’d call it smoke and mirrors real-time: the client just requests a new batch of data points every minute, and the server can change this interval at any time, which can be very handy if traffic increases. Fetching slightly old data every minute is more than enough for a human user who doesn’t care if the data is a bit outdated, and it makes the system a bit simpler.

The first of these two regular data requests goes to a URL whose path already tells you a lot about what it is, which although not required is often a sign of good RESTful design.

The response contains an array of data points (number of downloads per minute), along with two very important items that control the data transfer: an interval and a next value.


The interval tells the client when to ask for data next, and the next item is the path to the next batch of data. At least that’s what I assume; I haven’t checked the client code in detail, but it seems obvious.
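If that reading is right, a client honoring the interval/next contract could look like this sketch. The fetch function, paths and values are faked so nothing touches the network, and a real client would also sleep for interval seconds between requests:

```python
def poll(fetch, first_path, max_batches=3):
    """Follow the server-controlled interval/next contract (sketch)."""
    path, batches = first_path, []
    for _ in range(max_batches):
        response = fetch(path)
        batches.append(response["data"])
        # a real client would time.sleep(response["interval"]) here
        path = response["next"]
    return batches

# Faked server responses; the field names mirror the interval/next
# items described above, but the paths and numbers are invented:
fake = {
    "/data/1": {"data": [10, 12], "interval": 60, "next": "/data/2"},
    "/data/2": {"data": [11], "interval": 60, "next": "/data/3"},
    "/data/3": {"data": [], "interval": 120, "next": "/data/4"},
}
print(poll(fake.__getitem__, "/data/1"))  # [[10, 12], [11], []]
```

Note how the server stays in charge: raising the interval or changing the next path throttles every client immediately, with no client-side logic.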

Using URLs and data that seem obvious is the essence of the Web, and of a good RESTful design. Using RPC, WebSockets or any other supposedly more sophisticated mechanism would bring nothing to the user, and would only make things more complicated. Being able to throttle data requests from the server-side using the interval and next items is very flexible, obvious, and does not require any complicated logic on the client side.

The second data URL, if my quick analysis is correct, returns geographic coordinates of the dots that represent geolocated downloads. It uses the same interval/next mechanism for throttling requests.

All in all, an excellent example of engineering smoke and mirrors applied in the right way, and of simple and clean RESTful design. No need for “sophisticated” tools when the use case doesn’t really require them. Kudos to whoever designed this!

Update: The Mozilla team has more details on their blog. Thanks to Alex Parvulescu for pointing that out.

On speaking in URLs

December 3, 2010

I’ve seen about five examples just today where speaking in URLs (I spoke about that before, slide 27) would have saved people from misunderstandings, and avoided wasting our collective time.

When writing about something that has an URL on a project’s mailing list, for example, pointing to it precisely makes a big difference. You will save people’s time, avoid misunderstandings and over time create a goldmine of linked information on the Web. It’s not a web without links, ok?

Writing the full issue URL (or at least SLING-931, if that’s the local convention) is so much clearer than writing about “the jcrinstall web console problem”. You might know what the latter is right now, but how about 6 months later, when someone finds your message in the mailing list archives?

Of course, all your important technical things have stable URLs, right?

Twitter is the new CB…but it’s missing the channels!

September 29, 2010

When I was a kid, Citizen Band Radio (aka “CB”) was all the rage if you could afford it.

Those small unlicensed two-way radios have a relatively short range, actually extremely short if you compare to the global range of Twitter today. And they don’t have that many channels, 40 in most cases if I remember correctly. That works as long as the density of CB users is not too high in a given area.

For general chat, CB etiquette requires you to start by calling on a common channel for whoever you want to talk to, and, once you find your partner(s), quickly agree on a different channel to move to, to avoid hogging the common channel.

That “agree on a different channel to move to” feature is key to sharing a limited medium efficiently. As the Twitter population grows, the timeline that I’m getting is more and more crowded, with more and more stuff that I’m not interested in, although I’m willing to follow the general flow of a lot of people.

The global reach of services like Twitter and ubiquitous Internet access makes CB mostly obsolete today.

Twitter is the new CB, in many ways.

What Twitter lacks, however, are the channels, as in:

Could you guys at SXSW move to the #c.sxsw channel and stop boring us with your conference chitchat? We’re jealous, ok? Thanks.

Direct messages don’t work for that, as they are limited to two users – a point-to-point channel, like the telephone, as opposed to the multipoint channels that CB provides.

Twitter channels can also be very useful for data, like weather stations or other continuous data sources that can benefit from hierarchically organized channels. But let’s keep that discussion for another post. Like my mom said: one topic, one post (not sure it was her, actually).

What does Twitter need to support channels?

I think the following rule is sufficient:

Any message that contains a hashtag starting with #c. is not shown in the general timeline, except to people who are explicitly mentioned with their @id in the message.

Such messages can then be retrieved by searching for channel hashtags, including partial hashtag values to support hierarchies.

Using hierarchical channel names by convention opens interesting possibilities. The ApacheCon conference’s general channel would be #c.apachecon, for example, the Java track #c.apachecon.j, etc.
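The proposed rule is simple enough to sketch in a few lines of Python; the regexes and sample messages are mine, not Twitter’s:

```python
import re

CHANNEL = re.compile(r"#c\.[\w.]+")
MENTION = re.compile(r"@(\w+)")

def visible_in_timeline(message, user):
    """The proposed rule: channel messages (#c.*) stay out of the
    general timeline unless the user is @-mentioned."""
    if not CHANNEL.search(message):
        return True
    return user in MENTION.findall(message)

def channel_search(messages, prefix):
    """Partial hashtag matching supports hierarchical channels."""
    return [m for m in messages
            if any(tag.startswith(prefix) for tag in CHANNEL.findall(m))]

msgs = [
    "keynote starting! #c.apachecon",
    "great talk @alice #c.apachecon.j",
    "no channel here",
]
print(visible_in_timeline(msgs[0], "bob"))        # False
print(visible_in_timeline(msgs[1], "alice"))      # True
print(len(channel_search(msgs, "#c.apachecon")))  # 2
```

Searching for the #c.apachecon prefix finds both the general channel and the Java track, which is exactly what the hierarchical naming convention buys you.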

This channel filtering could of course be implemented in Twitter clients (@stephtara, remember you said you were going to mention that to @loic?), but in my opinion implementing it on the server side makes more sense as it’s a generally useful feature.

Then again, I’m a server-side guy ;-)

Opinions welcome, of course.

Dear Oracle, can we have our nice javadoc URLs back?

July 21, 2010

If you support this request, please vote for it in the comments below and/or on twitter using the #E17476 hashtag!

Update (2010/07/24): it looks like the old URLs are back, thanks Oracle and especially @mreinhold!

Update (2010/07/27): see also Good Feedback and Happy Endings – The Ugly URLs.

Dear Oracle,

A while ago you bought Sun, and IIRC promised to do good things for Java. Or at least indicated you would. Or something like that.

Now, a bad thing happened a few days ago. Not a bad bad bad thing, just a tiny annoying change in the cool URLs that Sun used to publish the JDK’s javadocs. Not annoying annoying annoying but not nice.

Even Google remembers: today if I search for IndexOutOfBoundsException on Google it returns the following URL:

Which is a cool URL that shouldn’t change.

Now, requesting this URL today causes a redirect to:

Which is also somewhat cool, but not as much. Factor 10 down in coolness. It makes me assume that you’re serving javadocs from a CD, and that CD’s identifier is E17476_01. That’s useful info if you’re the filesystem driver who’s reading the CD, but I doubt filesystem drivers are searching for javadocs on Google. Also, I’m not looking at downloading anything. Just browsing, okay?

Cool URLs shouldn’t change.

Can we have the old one back? Ok, maybe with instead of – you bought them anyway. But please please please, leave the poor CD filesystem driver alone!


P.S. we’re having a little vote on Twitter about this, check it out at (URL updated in 2018 to account for Twitter API changes)

Can I haz web?

June 15, 2010

How many people today still think information is only valid or “serious” when represented on an A4 piece of paper?

Way too many, if you ask me.

I’m always disappointed when people push out important content as PDF documents (or much worse…I won’t even name that format) attached to web pages or email messages, instead of just including the content in those web pages or messages, as a first-class citizen.

For some reason, people seem to think that information presented in A4 format has more value than the same information presented as a simple and clean web page. It is quite the opposite actually: web pages can be linked to, easily indexed, reformatted for efficient reading (thanks readability), etc.

Ted Nelson, the inventor of hypertext, wrote in 1999 already [1]:

We must overthrow the paper model, with its four prison walls and peephole one-way links

And also, in the same paper:

WYSIWYG generally means “What You See Is What You Get” — meaning what you get when you print it out. In other words, paper is the flat heart of most of today’s software concepts.

Granted, we haven’t fully solved the two-way links problem yet, but I hope you get the idea. Who needs paper or A4 pages? This is 2010, and this is the Web.

Please think about it next time you publish an important piece of information. Does it really need to live in the prison walls of a “document”? In what ways is that more valid than a web page or plain text email message?

Most of the time, almost always, the answer is: it’s not more valid, it’s just less usable.

Can I haz web? kthxbye.


Cognition, do you understand me?

September 19, 2008

Searching for “Robert Moog invented it” on Cognition returns the Minimoog page as its first result, and the other results are very relevant to that story. Very impressive.



I love the Web

November 9, 2007

Especially when pmarca uses his blog to announce his (and his wife’s) $27.5 million donation to Stanford Hospital. Citizen media, really.

Now even I can understand HTTP error codes

September 20, 2007

Thanks to Adam Koford‘s excellent set of Illustrated HTTP errors.

Via Attila Szegedi.

Update: as Leo indicates, this post has been featured in today’s heute magazine. So let’s see: heute links to me who links to Attila who links to Adam…isn’t the web fun?

Crowbar: scrape javascript-generated pages via Gecko and REST!

February 24, 2007

Careful with buzzwords on that one…from the extremely clever hacks department, Stefano strikes again with crowbar, a RESTish page-scraping component based on the Gecko rendering engine.

As a first proof of concept, scraping crowbar’s test page, where some content is generated via javascript, works just fine.

Using a non-javascript scraper gets you the raw page, obviously (you lame crawler!):

$ curl -s
<script>
function init() {
  document.getElementById("message").innerHTML = "Hi Crowbar!";
}
</script>
<body onload="init()">
<h1 id="message">Hi lame crawler</h1>

Using crowbar as a proxy, the page is rendered using the Gecko engine (called via a simple XULRunner app), exactly as a client browser would do (of course – it is the same engine that Firefox uses):

$ curl -s --data "url="  | xml fo -s 2
<?xml version="1.0"?>
<!-- (stuff added by crowbar for its test page omitted)... -->
<script>
function init() {
  document.getElementById("message").innerHTML = "Hi Crowbar!";
}
</script>
<BODY onload="init()">
<H1 id="message">Hi Crowbar!</H1>

Note the use of XMLStarlet to format the resulting document: as it’s a DOM dump from Gecko, the output is well-formed in all cases.

The only thing missing seems to be the encoding declaration in the XML output: crawling one of my favorite references on the Web, for example, didn’t produce parsable XML, as the XML output is serialized using the iso-8859-1 encoding (at least here on my macosx system) but the XML declaration doesn’t mention this.

The code can be installed directly under XULRunner.

Very clever! This will take some polishing of course, but if you need to index or analyze javascript-generated content (automated testing anyone?), there’s no better way than getting it straight from the browser horse’s mouth!

Update: the brand new Crowbar web site has more info.