Recursion in XSLT is fun

Well, recursion is always fun and powerful, isn’t it?

If you’re into XSLT you might like this one – the problem was converting a flat structure (a wordML text run) in a nested structure as used for formatting in HTML.

<!--
Convert the wordML "siblings attributes and content" format
to a "nested styles" structure as used in HTML.

This is not meant to actually process wordML documents, it's
more a demonstration of recursive processing in XSLT.

With input like:

<word-textrun-example
xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
<w:r>
<w:rPr>
<w:i/>
<w:b/>
<w:lang w:val="EN-GB"/>
</w:rPr>
<w:t>Here's a textrun with attributss i and b</w:t>
</w:r>
</word-textrun-example>

where the i and b attributes are stored in siblings of the content,
this produces a nested structure:

<textrun>
<i>
<b>
<lang lang="EN-GB">
<content>Here's a textrun with attributes i and b</content>
</lang>
</b>
</i>
</textrun>

-->

<xsl:stylesheet
version="1.0"
xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- by default copy -->
<xsl:template match="*">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>

<!-- text run: recursively process style elements, ending with content -->
<xsl:template match="w:r">
<xsl:variable name="styles" select="w:rPr/*"/>
<textrun>
<xsl:call-template name="styleProcessor">
<xsl:with-param name="styles" select="$styles"/>
<xsl:with-param name="content" select="w:t"/>
</xsl:call-template>
</textrun>
</xsl:template>

<!--
Here's the beef - recursively generate nested elements based on
the "styles" list of nodes
-->
<xsl:template name="styleProcessor">
<xsl:param name="styles"/>
<xsl:param name="content"/>

<!-- Remove the first element from "styles" for the next recursion step -->
<xsl:variable name="nextStyles" select="$styles[position() > 1]"/>

<!-- Pluggable conversion of style element names -->
<xsl:variable name="styleElement">
<xsl:apply-templates select="$styles[1]" mode="styleElement"/>
</xsl:variable>

<!-- Generate a nested style element, with pluggable attributes -->
<xsl:element name="{$styleElement}">
<!-- This is meant to generate additional attributes for certain style elements -->
<xsl:apply-templates select="$styles[1]" mode="styleAttributes"/>

<xsl:choose>
<xsl:when test="$nextStyles">
<!-- There are more recursion steps -->
<xsl:call-template name="styleProcessor">
<xsl:with-param name="styles" select="$nextStyles"/>
<xsl:with-param name="content" select="$content"/>
</xsl:call-template>
</xsl:when>

<xsl:otherwise>
<!-- no more recursion steps, process content -->
<xsl:apply-templates select="$content"/>
</xsl:otherwise>
</xsl:choose>

</xsl:element>
</xsl:template>

<!-- styleElement mode: generate the element name to use for a given style element -->
<xsl:template match="*" mode="styleElement">
<xsl:value-of select="local-name()"/>
</xsl:template>

<!-- styleAttributes mode: can be used to add attributes for certain style elements -->
<xsl:template match="*" mode="styleAttributes"/>

<!-- add the language as an attribute for the w:lang element -->
<xsl:template match="w:lang" mode="styleAttributes">
<xsl:attribute name="lang">
<xsl:value-of select="@w:val"/>
</xsl:attribute>
</xsl:template>

<!-- process the content of the text run -->
<xsl:template match="w:t">
<content>
<xsl:apply-templates/>
</content>
</xsl:template>

</xsl:stylesheet>

5 Responses to Recursion in XSLT is fun

  1. Brian Ewins says:

    Hate to be pedantic, but that isn’t legal XSL-T 1.0 (though it may be in 1.1). You use this expression: “$styles[1]” etc, but the spec says:

    “Variables introduce an additional data-type into the expression language. This additional data type is called result tree fragment…In particular, it is not permitted to use the /, //, and [] operators on result tree fragments.” (sec 11.1)

    From a quick glance over, it looks like you can portably get the effect you intended by using following-sibling::*[1].

  2. Hmmm, looks like you’re right. I must admit that I hacked this without looking at the specs ;-)

    I won’t rewrite it now but I see the idea with following-sibling, looks more standard – thanks!

  3. Nico Verwer says:

    This is an intereting area of the XSLT spec, which has lead to a lot of confusion. In 11.2, it says that there are several ways to bind a variable:
    # If the variable-binding element has a select attribute, […] the value of the variable is the object that results from evaluating the expression.
    # If the variable-binding element does not have a select attribute and has non-empty content […] The content of the variable-binding element is a template, which is instantiated to give the value of the variable. The value is a result tree fragment

    If you use the select attribute, the variable is bound to a node-set (if that is the result of evaluating the expression), and only if has content, it becomes a result tree fragment.

    Both Xalan and Saxon implement variables in this way.

    This area has become much simpler in XSLT 2.0.

  4. So IIUC in my case, the variable-binding statement is

    <xsl:with-param name=”styles” select=”$styles”/>

    and then with

    <xsl:param name=”styles”/>

    $styles correctly contains a node-set object.

    This explains why my example works – cool!

  5. Carlos says:

    Ok and now how can I do the opposite? Transform HTML to WordML?
    Thank you.
    Carlos

%d bloggers like this: