XSLT and XPath, without the pain!

November 21, 2007

Here are my slides from last week’s ApacheCon. That’s mostly an excuse to play with the cool slideshare embedding widget, as the slides aren’t really standalone: you might need me to make sense of them ;-)

The talk went well, including some challenging questions and feedback, which is cool.

Someone with a strong background in rules-based languages came up to talk, and made me realize that my talk is really targeted at people who come from a procedural language background. People who know Prolog or a related language have an easier time moving to XSLT than those who come from Java or C, so my “avoid procedural constructs” line doesn’t really apply to them.


DAX: XSLT-like transforms in Java

September 20, 2007

Intrigued by the title of Lars Trieloff’s Cocoon GT talk, “DAX – Where flowscript and XSLT meet”, I did some research and I like what I see!

See this excerpt from Mindquarry’s AnalysisTransform example:

/**
 * Match all romeo and juliet line elements and update speaker map
 * @param node line element
 */
@Path("//line[../speaker='Rom.' or ../speaker='Jul.']")
public void speaker(Node node) {
    // grab speaker node from preceding sibling axis
    Node speakerNode = speakerXPath.selectSingleNode(node);
    String speaker = speakerNode.getStringValue();
    updateStatistics(speaker);
}

Due to the @Path annotation, this method replaces an xsl:template that would have the same XPath.

Using it is as simple as:

public void testExecute() throws DocumentException {
    // create a dom4j doc
    SAXReader reader = new SAXReader();
    Document doc = reader.read(
        Thread.currentThread()
            .getContextClassLoader()
            .getResourceAsStream("dax/examples/randj.xml")
    );

    // create a transformer
    AnalysisTransform t = new AnalysisTransform();

    // perform the transform and check results
    t.execute(doc);
    assertEquals( t.lines.get("Rom."), new Integer(606));
    assertEquals( t.lines.get("Jul."), new Integer(542));
}

Great stuff! There’s more info on the Mindquarry wiki, and Lars will be at the GT to tell us more.

The Transformer code is also an interesting example of how to put Java annotations to good use.
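
To make the annotation idea concrete, here is a minimal sketch of how such a dispatcher could work, using only the JDK’s DOM and XPath APIs instead of dom4j. This is not DAX’s actual code: the @Match annotation, the LineCounter class and the tiny inline document are all made up for illustration. Unlike a real template engine, the sketch simply evaluates each annotated XPath against the whole document and calls the method once per matching node.

```java
import java.io.StringReader;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PathDispatchSketch {

    /** Hypothetical stand-in for DAX's @Path: the XPath whose matches trigger the method. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Match {
        String value();
    }

    /** Hypothetical handler: counts lines per speaker, like the excerpt above. */
    static class LineCounter {
        final Map<String, Integer> lines = new HashMap<>();
        private final XPath xpath = XPathFactory.newInstance().newXPath();

        @Match("//line[../speaker='Rom.' or ../speaker='Jul.']")
        public void line(Node node) throws XPathExpressionException {
            // the speaker element is a sibling of the matched line element
            String speaker = xpath.evaluate("../speaker", node);
            lines.merge(speaker, 1, Integer::sum);
        }
    }

    /** Calls every @Match method of the handler once per node selected by its XPath. */
    static void execute(Object handler, Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        for (Method m : handler.getClass().getDeclaredMethods()) {
            Match match = m.getAnnotation(Match.class);
            if (match == null) {
                continue; // not a "template" method
            }
            m.setAccessible(true);
            NodeList nodes = (NodeList) xpath.evaluate(
                match.value(), doc, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                m.invoke(handler, nodes.item(i));
            }
        }
    }

    /** Runs the counter on a tiny made-up document and returns the per-speaker counts. */
    static Map<String, Integer> demo() {
        String xml = "<play>"
            + "<speech><speaker>Rom.</speaker><line>a</line><line>b</line></speech>"
            + "<speech><speaker>Jul.</speaker><line>c</line></speech>"
            + "<speech><speaker>Nurse</speaker><line>d</line></speech>"
            + "</play>";
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            LineCounter counter = new LineCounter();
            execute(counter, doc);
            return counter.lines;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

On the made-up document this counts two lines for Rom. and one for Jul., and ignores the Nurse. The interesting part is how little plumbing is needed: reflection finds the annotated methods, and the annotation value carries the XPath.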


Victory in the schema wars

December 5, 2006

From the ROFL department, Rick Jelliffe’s post about XML schema wars is a must read. Of course…the other is the Middle East.

This comes in response to Elliotte Rusty Harold’s RELAX Wins and Tim Bray’s Choose RELAX Now posts. Go read them if you’re not yet convinced that XML Schema sucks.

Boring technology is rarely good technology: XML Schema proves this.


Solr: indexing XML with Lucene and REST

August 10, 2006

XML.com just published my article, Solr: indexing XML with Lucene and REST, written after evaluating Solr over the last few weeks.

You’ll certainly perceive my enthusiasm for Solr in the article: the “solid indexing component with a simple interface” philosophy of Solr fits my current projects perfectly, and it’s a simple enough layer on top of Lucene to be easy to understand. On top of that, its references are quite solid already.

Note that Solr won’t do any crawling or content extraction: it’s just an indexer with an HTTP/XML interface. But for integration with content or data management systems, it’s just what you need.
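
To give an idea of what that HTTP/XML interface looks like from the client side, here is a rough sketch that builds the kind of XML “add” message Solr accepts for indexing a document. The class name, field names and values are made up for illustration; the fields that actually exist depend on your Solr schema. The sketch only builds the message and leaves out the HTTP part: you would POST the result to Solr’s update handler and follow it with a commit message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SolrAddMessageSketch {

    /** Escapes the characters that are significant in XML text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Builds an add message holding a single document with the given fields. */
    static String addMessage(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("<add><doc>");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<field name=\"").append(escape(e.getKey())).append("\">")
              .append(escape(e.getValue()))
              .append("</field>");
        }
        return sb.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        // hypothetical field names and values; the real ones come from the schema
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "post-42");
        fields.put("title", "Ants & angle brackets");
        System.out.println(addMessage(fields));
    }
}
```

Anything that can produce such a message and speak HTTP can feed the index, which is what makes integration with a content management system so simple.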

The “REST” bit in the article title is not from me. I’d say the interface is “RESTish”, but I don’t think it’s pure REST. Actually, designing a pure REST interface for a search engine would be interesting, but Solr’s current interface is more than good enough.


Paloose – Cocoon sitemaps in PHP

August 4, 2006

Synchronicity, I guess: yesterday, talking to a colleague, we came to the conclusion that Cocoon pipelines are here to stay, and today comes an announcement for Paloose, a PHP5 implementation of Cocoon sitemaps that apparently includes many of our standard components.

I haven’t had time to test it yet; if you do, you’re welcome to add comments here.

The nice thing is that hosting PHP5 is much easier for small projects than hosting Java, so this might help make Cocoon pipelines a standard in wider circles.

Also, having a clean way of using the rich PHP-based templating system to process XML data (coming from Cocoon maybe) would be cool – I don’t know how good the integration can be with the current version of Paloose.

Of course this only does pipelines, it’s not a full implementation of Cocoon, but…pipelines are here to stay, so this is a Good Thing.

I shouldn’t end this without mentioning Popoon, another Cocoon pipelines system written in PHP. I’ll try to compare them in my Copious Free Time.


FOP 0.90 alpha 1 released, congratulations!

November 23, 2005

The Apache FOP team just announced the availability of FOP 0.90 alpha 1, see the official announcement on the FOP mailing list.

This is a very important announcement for FOP, because it is the first release of a redesign which started a long time ago. So I hope the FOP people will be able to throw the big party that they deserve after this very important milestone.

An extensive list of supported features (including comparison with the mainstream 0.20.5 release) is available on the FOP compliance page.

Kudos to the people who hung in there over the years to make this happen: it’s been a long and bumpy ride, but persistence pays! I’m thinking mostly of Jeremias because I know him personally, but the whole team has shown that they’re back on track, and this is great news!

On a side note, this release includes decent RTF support, comparable to or better than what was in jfor. So it looks like we’ll be able to declare the jfor project officially integrated into FOP, which is also great news! I’ve not been following closely as I have no need for RTF in my current projects, but it’s good to see that people are still working on and using this.

Congratulations to the FOP team, you guys rock!


Do ants like angle brackets?

April 2, 2004

James Duncan Davidson, who created Apache Ant, reflects on Ant’s use of XML to define builds.

As is often the case, what started as a simple configuration file for builds is increasingly being used as a scripting language.

James says he’d do it differently now: although well suited to the tree structure of build files, XML tends to get in the way when editing these files.

For me the lesson is: XML is without question the best format to exchange data between machines. But as soon as there is a human in the chain, using a friendlier structured format is often better. Python is a great example of human-friendly structured text.