Solr: indexing XML with Lucene and REST
You'll certainly notice my enthusiasm for Solr in the article: its "solid indexing component with a simple interface" philosophy fits my current projects perfectly, and it's a thin enough layer on top of Lucene to be easy to understand. On top of that, its references are quite solid already.
Note that Solr won't do any crawling or content extraction; it's just an indexer with an HTTP/XML interface. But for integration with content or data management systems, it's just what you need.
The "REST" bit in the article title is not mine: I'd call the interface "RESTish", but I don't think it's pure REST. Actually, designing a pure REST interface for a search engine would be interesting, but Solr's current interface is more than good enough.
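To give a feel for the HTTP/XML interface, here is a minimal Python sketch that builds a Solr `<add>` message and POSTs it to the update handler. The URL, the field names and the helper names (`build_add_message`, `post_xml`, `index_document`) are assumptions made for illustration; the actual host, port, path and fields depend on your Solr installation and schema.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed URL: adjust host, port and path to your own Solr installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_message(doc):
    """Build a Solr <add><doc>...</doc></add> message from a dict.

    Keys are field names (they must exist in your Solr schema),
    values are converted to strings. Returns UTF-8 encoded bytes.
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)  # ElementTree escapes &, < and > for us
    return ET.tostring(add, encoding="utf-8")


def post_xml(xml_bytes, url=SOLR_UPDATE_URL):
    """POST an XML message to Solr and return the response body as text."""
    request = urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def index_document(doc, url=SOLR_UPDATE_URL):
    """Add one document, then send <commit/> so it becomes searchable."""
    post_xml(build_add_message(doc), url)
    return post_xml(b"<commit/>", url)
```

With a Solr instance running at the assumed URL, calling `index_document({"id": "doc-1", "title": "Hello Solr"})` would add the document and commit it. Everything goes through a plain POST to a single update URL, which is part of why "RESTish" seems a fairer label than "REST".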