} elements. \endlist Again the double slash means select all the \c{<recipe>} elements in the document. The single slash before the \c{<title>} element means select only those \c{<title>} elements that are \e{child} elements of a \c{<recipe>} element (i.e. not grandchildren, etc). The XQuery evaluates to a final result set containing the \c{<title>} element of each \c{<recipe>} element in the cookbook. \section2 Axis Steps The most common kind of path step is called an \e{axis step}, which tells the query engine which way to navigate from the context node, and which test to perform when it encounters nodes along the way. An axis step has two parts, an \e{axis specifier}, and a \e{node test}. Conceptually, evaluation of an axis step proceeds as follows: For each node in the focus set, the query engine navigates out from the node along the specified axis and applies the node test to each node it encounters. The nodes selected by the node test are collected in the result set, which becomes the focus set for the next step. In the example XQuery above, the second and third steps are both axis steps. Both apply the \c{element(name)} node test to nodes encountered while traversing along some axis. But in this example, the two axis steps are written in a \l{Shorthand Form} {shorthand form}, where the axis specifier and the node test are not written explicitly but are implied. XQueries are normally written in this shorthand form, but they can also be written in the longhand form. If we rewrite the XQuery in the longhand form, it looks like this: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 22 The two axis steps have been expanded. The first step (\c{//recipe}) has been rewritten as \c{/descendant-or-self::element(recipe)}, where \c{descendant-or-self::} is the axis specifier and \c{element(recipe)} is the node test. 
The second step (\c{title}) has been rewritten as \c{/child::element(title)}, where \c{child::} is the axis specifier and \c{element(title)} is the node test. The output of the expanded XQuery will be exactly the same as the output of the shorthand form.

To create an axis step, concatenate an axis specifier and a node test. The following sections list the axis specifiers and node tests that are available.

\section2 Axis Specifiers

An axis specifier defines the direction you want the query engine to take when it navigates away from the context node. QtXmlPatterns supports the following axes.

\table
\header
\o Axis Specifier
\o refers to the axis containing...
\row
\o \c{self::}
\o the context node itself
\row
\o \c{attribute::}
\o all attribute nodes of the context node
\row
\o \c{child::}
\o all child nodes of the context node (not attributes)
\row
\o \c{descendant::}
\o all descendants of the context node (children, grandchildren, etc)
\row
\o \c{descendant-or-self::}
\o all nodes in \c{descendant} + \c{self}
\row
\o \c{parent::}
\o the parent node of the context node, or empty if there is no parent
\row
\o \c{ancestor::}
\o all ancestors of the context node (parent, grandparent, etc)
\row
\o \c{ancestor-or-self::}
\o all nodes in \c{ancestor} + \c{self}
\row
\o \c{following::}
\o all nodes in the tree containing the context node, \e not including \c{descendant}, \e and that follow the context node in the document
\row
\o \c{preceding::}
\o all nodes in the tree containing the context node, \e not including \c{ancestor}, \e and that precede the context node in the document
\row
\o \c{following-sibling::}
\o all children of the context node's \c{parent} that follow the context node in the document
\row
\o \c{preceding-sibling::}
\o all children of the context node's \c{parent} that precede the context node in the document
\endtable

\section2 Node Tests

A node test is a conditional expression that must be true for a node if the node is to be selected by the axis
step. The conditional expression can test just the \e kind of node, or it can test the \e kind of node and the \e name of the node. The XQuery specification for \l{http://www.w3.org/TR/xquery/#node-tests} {node tests} also defines a third condition, the node's \e {Schema Type}, but schema type tests are not supported in QtXmlPatterns. QtXmlPatterns supports the following node tests. The tests that have a \c{name} parameter test the node's name in addition to its \e{kind} and are often called the \l{Name Tests}. \table \header \o Node Test \o matches all... \row \o \c{node()} \o nodes of any kind \row \o \c{text()} \o text nodes \row \o \c{comment()} \o comment nodes \row \o \c{element()} \o element nodes (same as star: *) \row \o \c{element(name)} \o element nodes named \c{name} \row \o \c{attribute()} \o attribute nodes \row \o \c{attribute(name)} \o attribute nodes named \c{name} \row \o \c{processing-instruction()} \o processing-instructions \row \o \c{processing-instruction(name)} \o processing-instructions named \c{name} \row \o \c{document-node()} \o document nodes (there is only one) \row \o \c{document-node(element(name))} \o document node with document element \c{name} \endtable \target Shorthand Form \section2 Shorthand Form Writing axis steps using the longhand form with axis specifiers and node tests is semantically clear but syntactically verbose. The shorthand form is easy to learn and, once you learn it, just as easy to read. In the shorthand form, the axis specifier and node test are implied by the syntax. XQueries are normally written in the shorthand form. Here is a table of some frequently used shorthand forms: \table \header \o Shorthand syntax \o Short for... \o matches all... 
\row
\o \c{name}
\o \c{child::element(name)}
\o child nodes that are \c{name} elements
\row
\o \c{*}
\o \c{child::element()}
\o child nodes that are elements (\c{node()} matches \e all child nodes)
\row
\o \c{..}
\o \c{parent::node()}
\o parent nodes (there is only one)
\row
\o \c{@*}
\o \c{attribute::attribute()}
\o attribute nodes
\row
\o \c{@name}
\o \c{attribute::attribute(name)}
\o \c{name} attributes
\row
\o \c{//}
\o \c{descendant-or-self::node()}
\o descendant nodes (when used instead of '/')
\endtable

The \l{http://www.w3.org/TR/xquery/}{XQuery language specification} has a more detailed section on the shorthand form, which it calls the \l{http://www.w3.org/TR/xquery/#abbrev} {abbreviated syntax}. More examples of path expressions written in the shorthand form are found there. There is also a section listing examples of path expressions written in the \l{http://www.w3.org/TR/xquery/#unabbrev} {longhand form}.

\target Name Tests
\section2 Name Tests

The name tests are the \l{Node Tests} that have the \c{name} parameter. A name test must match the node \e name in addition to the node \e kind. We have already seen name tests used:

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 19

In this path expression, both \c{recipe} and \c{title} are name tests written in the shorthand form. XQuery resolves these names (\l{http://www.w3.org/TR/xquery/#id-basics}{QNames}) to their expanded form using whatever \l{http://www.w3.org/TR/xquery/#dt-namespace-declaration} {namespace declarations} it knows about. Resolving a name to its expanded form means replacing its namespace prefix, if one is present (there aren't any present in the example), with a namespace URI. The expanded name then consists of the namespace URI and the local name.

But the names in the example above don't have namespace prefixes, because we didn't include a namespace declaration in our \c{cookbook.xml} file. However, we will often use XQuery to query XML documents that use namespaces.
Forgetting to declare the correct namespace(s) in an XQuery is a common cause of XQuery failures. Let's add a \e{default} namespace to \c{cookbook.xml} now. Change the \e{document element} in \c{cookbook.xml} from: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 23 to... \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 24 This is called a \e{default namespace} declaration because it doesn't include a namespace prefix. By including this default namespace declaration in the document element, we mean that all unprefixed \e{element} names in the document, including the document element itself (\c{cookbook}), are automatically in the default namespace \c{http://cookbook/namespace}. Note that unprefixed \e{attribute} names are not affected by the default namespace declaration. They are always considered to be in \e{no namespace}. Note also that the URL we choose as our namespace URI need not refer to an actual location, and doesn't refer to one in this case. But click on \l{http://www.w3.org/XML/1998/namespace}, for example, which is the namespace URI for elements and attributes prefixed with \c{xml:}. Now when we try to run the previous XQuery example, no output is produced! The path expression no longer matches anything in the cookbook file because our XQuery doesn't yet know about the namespace declaration we added to the cookbook document. There are two ways we can declare the namespace in the XQuery. We can give it a \e{namespace prefix} (e.g. 
\c{c} for cookbook) and prefix each name test with the namespace prefix: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 3 Or we can declare the namespace to be the \e{default element namespace}, and then we can still run the original XQuery: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 4 Both methods will work and produce the same output, all the \c{<title>} elements: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 5 But note how the output is slightly different from the output we saw before we added the default namespace declaration to the cookbook file. QtXmlPatterns automatically includes the correct namespace attribute in each \c{<title>} element in the output. When QtXmlPatterns loads a document and expands a QName, it creates an instance of QXmlName, which retains the namespace prefix along with the namespace URI and the local name. See QXmlName for further details. One thing to keep in mind from this namespace discussion, whether you run XQueries in a Qt program using QtXmlPatterns, or you run them from the command line using xmlpatterns, is that if you don't get the output you expect, it might be because the data you are querying uses namespaces, but you didn't declare those namespaces in your XQuery. \section3 Wildcards in Name Tests The wildcard \c{'*'} can be used in a name test. To find all the attributes in the cookbook but select only the ones in the \c{xml} namespace, use the \c{xml:} namespace prefix but replace the \e{local name} (the attribute name) with the wildcard: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 7 Oops! If you save this XQuery in \c{file.xq} and run it through \c{xmlpatterns}, it doesn't work. You get an error message instead, something like this: \e{Error SENR0001 in file:///...file.xq, at line 1, column 1: Attribute xml:id can't be serialized because it appears at the top level.} The XQuery actually ran correctly. It selected a bunch of \c{xml:id} attributes and put them in the result set. 
But then \c{xmlpatterns} sent the result set to a \l{QXmlSerializer} {serializer}, which tried to output it as well-formed XML. Since the result set contains only attributes and attributes alone are not well-formed XML, the \l{QXmlSerializer} {serializer} reports a \l{http://www.w3.org/TR/2005/WD-xslt-xquery-serialization-20050915/#id-errors} {serialization error}. Fear not. XQuery can do more than just find and select elements and attributes. It can \l{Constructing Elements} {construct new ones on the fly} as well, which is what we need to do here if we want \c{xmlpatterns} to let us see the attributes we selected. The example above and the ones below are revisited in the \l{Constructing Elements} section. You can jump ahead to see the modified examples now, and then come back, or you can press on from here. To find all the \c{name} attributes in the cookbook and select them all regardless of their namespace, replace the namespace prefix with the wildcard and write \c{name} (the attribute name) as the local name: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 8 To find and select all the attributes of the \e{document element} in the cookbook, replace the entire name test with the wildcard: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 9 \section1 Using Predicates In Path Expressions Predicates can be used to further filter the nodes selected by a path expression. A predicate is an expression in square brackets ('[' and ']') that either returns a boolean value or a number. A predicate can appear at the end of any path step in a path expression. The predicate is applied to each node in the focus set. If a node passes the filter, the node is included in the result set. The query below selects the recipe element that has the \c{<title>} element \c{"Hard-Boiled Eggs"}. \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 10 The dot expression ('.') can be used in predicates and path expressions to refer to the current context node. 
The following query uses the dot expression to refer to the current \c{<method>} element. The query selects the empty \c{<method>} elements from the cookbook. \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 11 Note that passing the dot expression to the \l{http://www.w3.org/TR/xpath-functions/#func-string-length} {string-length()} function is optional. When \l{http://www.w3.org/TR/xpath-functions/#func-string-length} {string-length()} is called with no parameter, the context node is assumed: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 12 Actually, selecting an empty \c{<method>} element might not be very useful by itself. It doesn't tell you which recipe has the empty method: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 31 \target Empty Method Not Robust What you probably want to see instead are the \c{<recipe>} elements that have empty \c{<method>} elements: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 32 The predicate uses the \l{http://www.w3.org/TR/xpath-functions/#func-string-length} {string-length()} function to test the length of each \c{<method>} element in each \c{<recipe>} element found by the node test. If a \c{<method>} contains no text, the predicate evaluates to \c{true} and the \c{<recipe>} element is selected. If the method contains some text, the predicate evaluates to \c{false}, and the \c{<recipe>} element is discarded. The output is the entire recipe that has no instructions for preparation: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 33 The astute reader will have noticed that this use of \c{string-length()} to find an empty element is unreliable. It works in this case, because the method element is written as \c{<method/>}, guaranteeing that its string length will be 0. It will still work if the method element is written as \c{<method></method>}, but it will fail if there is any whitespace between the opening and ending \c{<method>} tags. 
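
The whitespace problem can be seen without the cookbook. The following is a minimal sketch, not one of the cookbook snippets: it constructs two hypothetical \c{<method>} elements inline, one empty and one containing only three spaces, and passes each to \c{string-length()}. It assumes standard XQuery 1.0 behavior, in which the spaces supplied through an enclosed expression become a text node.

\code
(: $empty and $blank are illustrative names, not part of cookbook.xml :)
let $empty := <method/>
let $blank := <method>{"   "}</method>
return (string-length($empty), string-length($blank))
\endcode

The first length is 0 and the second is 3, so a predicate that tests for a length of 0 would not treat the second element as empty.
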
A more robust way to find the recipes with empty methods is presented in the section on \l{Boolean Predicates}. There are many more functions and operators defined for XQuery and XPath. They are all \l{http://www.w3.org/TR/xpath-functions} {documented here}. \section2 Positional Predicates Predicates are often used to filter items based on their position in a sequence. For path expressions processing items loaded from XML documents, the normal sequence is \l{http://www.w3.org/TR/xquery/#id-document-order} {document order}. This query returns the second \c{<recipe>} element in the \c{cookbook.xml} file: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 13 The other frequently used positional function is \l{http://www.w3.org/TR/xpath-functions/#func-last} {last()}, which returns the numeric position of the last item in the focus set. Stated another way, \l{http://www.w3.org/TR/xpath-functions/#func-last} {last()} returns the size of the focus set. This query returns the last recipe in the cookbook: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 16 And this query returns the next to last \c{<recipe>}: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 17 \section2 Boolean Predicates The other kind of predicate evaluates to \e true or \e false. A boolean predicate takes the value of its expression and determines its \e{effective boolean value} according to the following rules: \list \o An expression that evaluates to a single node is \c{true}. \o An expression that evaluates to a string is \c{false} if the string is empty and \c{true} if the string is not empty. \o An expression that evaluates to a boolean value (i.e. type \c{xs:boolean}) is that value. \o If the expression evaluates to anything else, it's an error (e.g. type \c{xs:date}). \endlist We have already seen some boolean predicates in use. Earlier, we saw a \e{not so robust} way to find the \l{Empty Method Not Robust} {recipes that have no instructions}. 
\c{[string-length(method) = 0]} is a boolean predicate that would fail in the example if the empty method element were written with both opening and closing tags and there were whitespace between the tags. Here is a more robust way that uses a different boolean predicate.

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 34

This one uses the \l{http://www.w3.org/TR/xpath-functions/#func-empty} {empty()} function to test whether the method contains any steps. If the method contains no steps, then \c{empty(step)} will return \c{true}, and hence the predicate will evaluate to \c{true}.

But even that version isn't foolproof. Suppose the method does contain steps, but all the steps themselves are empty. That's still a case of a recipe with no instructions that won't be detected. There is a better way:

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 35

This version uses the \l{http://www.w3.org/TR/xpath-functions/#func-not} {not()} and \l{http://www.w3.org/TR/xpath-functions/#func-normalize-space} {normalize-space()} functions. \c{normalize-space(method)} returns the contents of the method element as a string, but with all the whitespace normalized, i.e., the string values of all the \c{<step>} elements are concatenated, and then leading and trailing whitespace is removed and each internal run of whitespace is collapsed to a single space. If that string is empty, then \c{not()} returns \c{true} and the predicate is \c{true}.

We can also use the \l{http://www.w3.org/TR/xpath-functions/#func-position} {position()} function in a comparison to inspect positions with conditional logic. The \l{http://www.w3.org/TR/xpath-functions/#func-position} {position()} function returns the position index of the current context item in the sequence of items:

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 14

Note that the first position in the sequence is position 1, not 0.
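
As a small sketch of how these positional tests relate, the following query applies predicates to a literal sequence of strings rather than to the cookbook. The sequence and the variable name \c{$s} are made up for illustration. In XPath, a numeric predicate such as \c{[2]} is treated as the comparison \c{[position() = 2]}, so the first two expressions select the same item.

\code
(: $s is an illustrative sequence, not data from cookbook.xml :)
let $s := ("a", "b", "c")
return ($s[2], $s[position() = 2], $s[last()])
\endcode

The result is the sequence \c{b}, \c{b}, \c{c}.
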
We can also select \e{all} the recipes after the first one: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 15 \target Constructing Elements \section1 Constructing Elements In the section about \l{Wildcards in Name Tests} {using wildcards in name tests}, we saw three simple example XQueries, each of which selected a different list of XML attributes from the cookbook. We couldn't use \c{xmlpatterns} to run these queries, however, because \c{xmlpatterns} sends the XQuery results to a \l{QXmlSerializer} {serializer}, which expects to serialize the results as well-formed XML. Since a list of XML attributes by itself is not well-formed XML, the serializer reported an error for each XQuery. Since an attribute must appear in an element, for each attribute in the result set, we must create an XML element. We can do that using a \l{http://www.w3.org/TR/xquery/#id-for-let} {\e{for} clause} with a \l{http://www.w3.org/TR/xquery/#id-variables} {bound variable}, and a \l{http://www.w3.org/TR/xquery/#id-orderby-return} {\e{return} clause} with an element constructor: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 25 The \e{for} clause produces a sequence of attribute nodes from the result of the path expression. Each attribute node in the sequence is bound to the variable \c{$i}. The \e{return} clause then constructs a \c{} element around the attribute node. Here is the output: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 28 The output contains one \c{} element for each \c{xml:id} attribute in the cookbook. Note that XQuery puts each attribute in the right place in its \c{} element, despite the fact that in the \e{return} clause, the \c{$i} variable is positioned as if it is meant to become \c{} element content. The other two examples from the \l{Wildcards in Name Tests} {wildcard} section can be rewritten the same way. 
Here is the XQuery that selects all the \c{name} attributes, regardless of namespace:

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 26

And here is its output:

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 29

And here is the XQuery that selects all the attributes from the \e{document element}:

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 27

And here is its output:

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 30

\section2 Element Constructors are Expressions

Because node constructors are expressions, they can be used in XQueries wherever expressions are allowed.

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 40

If \c{cookbook.xml} is loaded without error, a \c{<resept>} element (Norwegian word for recipe) is constructed for each \c{<recipe>} element in the cookbook, and the child nodes of the \c{<recipe>} are copied into the \c{<resept>} element. But if the cookbook document doesn't exist or does not contain well-formed XML, a single \c{<resept>} element is constructed containing an error message.

\section1 Constructing Atomic Values

XQuery also has atomic values. An atomic value is a value in the value space of one of the built-in datatypes in the \l {http://www.w3.org/TR/xmlschema-2} {XML Schema language}. These \e{atomic types} have built-in operators for doing arithmetic, comparisons, and for converting values to other atomic types. See the \l {http://www.w3.org/TR/xmlschema-2/#built-in-datatypes} {Built-in Datatype Hierarchy} for the entire tree of built-in, primitive and derived atomic types.

\note Click on a data type in the tree for its detailed specification.
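
As a brief illustration of those built-in operators, the sketch below constructs a few atomic values and applies arithmetic, a comparison, and a conversion to them. The literal values are arbitrary examples, and the sketch assumes the standard XQuery 1.0 constructor functions for \c{xs:integer}, \c{xs:date}, and \c{xs:string}.

\code
(: arithmetic on an integer, a comparison of two dates, a decimal converted to a string :)
(xs:integer("40") + 2,
 xs:date("2009-01-31") lt xs:date("2009-02-01"),
 xs:string(3.5))
\endcode

The three items are the integer 42, the boolean \c{true}, and the string \c{"3.5"}.
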
To construct an atomic value as element content, enclose an expression in curly braces and embed it in the element constructor:

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 36

Sending this XQuery through xmlpatterns produces:

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 37

To compute the value of an attribute, enclose the expression in curly braces and embed it in the attribute value:

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 38

Sending this XQuery through xmlpatterns produces:

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 39

\section1 Running The Cookbook Examples

Most of the XQuery examples in this document refer to the cookbook written in XML shown below. Save it as \c{cookbook.xml}. In the same directory, save one of the cookbook XQuery examples in a \c{.xq} file (e.g. \c{file.xq}). Run the XQuery using Qt's command line utility:

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 6

\section2 cookbook.xml

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 100

\section1 Further Reading

There is much more to the XQuery language than we have presented in this short introduction. We will be adding more here in later releases. In the meantime, playing with the \c{xmlpatterns} utility and making modifications to the XQuery examples provided here will be quite informative. An XQuery textbook will be a good investment.

You can also ask questions on XQuery mailing lists:

\list
\o \l{http://lists.trolltech.com/qt-interest/}{qt-interest}
\o \l{http://www.x-query.com/mailman/listinfo/talk}{talk at x-query.com}.
\endlist \l{http://www.functx.com/functx/}{FunctX} has a collection of XQuery functions that can be both useful and educational. This introduction contains many links to the specifications, which, of course, are the ultimate source of information about XQuery. They can be a bit difficult, though, so consider investing in a textbook: \list \o \l{http://www.w3.org/TR/xquery/}{XQuery 1.0: An XML Query Language} - the main source for syntax and semantics. \o \l{http://www.w3.org/TR/xpath-functions/}{XQuery 1.0 and XPath 2.0 Functions and Operators} - the builtin functions and operators. \endlist \section1 FAQ The answers to these frequently asked questions explain the causes of several common mistakes that most beginners make. Reading through the answers ahead of time might save you a lot of head scratching. \section2 Why didn't my path expression match anything? The most common cause of this bug is failure to declare one or more namespaces in your XQuery. Consider the following query for selecting all the examples in an XHTML document: \quotefile snippets/patternist/simpleHTML.xq It won't match anything because \c{index.html} is an XHTML file, and all XHTML files declare the default namespace \c{"http://www.w3.org/1999/xhtml"} in their top (\c{<html>}) element. But the query doesn't declare this namespace, so the path expression expands \c{html} to \c{{}html} and tries to match that expanded name. But the actual expanded name is \c{{http://www.w3.org/1999/xhtml}html}. One possible fix is to declare the correct default namespace in the XQuery: \quotefile snippets/patternist/simpleXHTML.xq Another common cause of this bug is to confuse the \e{document node} with the top element node. They are different. This query won't match anything: \quotefile snippets/patternist/docPlainHTML.xq The \c{doc()} function returns the \e{document node}, not the top element node (\c{<html>}). 
Don't forget to match the top element node in the path expression: \quotefile snippets/patternist/docPlainHTML2.xq \section2 What if my input namespace is different from my output namespace? Just remember to declare both namespaces in your XQuery and use them properly. Consider the following query, which is meant to generate XHTML output from XML input: \quotefile snippets/patternist/embedDataInXHTML.xq We want the \c{<html>}, \c{<body>}, and \c{} nodes we create in the output to be in the standard XHTML namespace, so we declare the default namespace to be \c{http://www.w3.org/1999/xhtml}. That's correct for the output, but that same default namespace will also be applied to the node names in the path expression we're trying to match in the input (\c{/tests/test[@status = "failure"]}), which is wrong, because the namespace used in \c{testResult.xml} is perhaps in the empty namespace. So we must declare that namespace too, with a namespace prefix, and then use the prefix with the node names in the path expression. This one will probably work better: \quotefile snippets/patternist/embedDataInXHTML2.xq \section2 Why doesn't my return clause work? Recall that XQuery is an \e{expression-based} language, not \e{statement-based}. Because an XQuery is a lot of expressions, understanding XQuery expression precedence is very important. Consider the following query: \quotefile snippets/patternist/forClause2.xq It looks ok, but it isn't. It is supposed to be a FLWOR expression comprising a \e{for} clause and a \e{return} clause, but it isn't just that. It \e{has} a FLWOR expression, certainly (with the \e{for} and \e{return} clauses), but it \e{also} has an arithmetic expression (\e{+ $d}) dangling at the end because we didn't enclose the return expression in parentheses. Using parentheses to establish precedence is more important in XQuery than in other languages, because XQuery is \e{expression-based}. 
In this case, without parentheses enclosing \c{$i + $d}, the return clause only returns \c{$i}. The \c{+$d} will have the result of the FLWOR expression as its left operand. And, since the scope of variable \c{$d} ends at the end of the \e{return} clause, a variable out of scope error will be reported. Correct these problems by using parentheses.

\quotefile snippets/patternist/forClause.xq

\section2 Why didn't my expression get evaluated?

You probably misplaced some curly braces. When you want an expression evaluated inside an element constructor, enclose the expression in curly braces. Without the curly braces, the expression will be interpreted as text. Here is a \c{sum()} expression used in an \c{<e>} element. The table shows cases where the curly braces are missing, misplaced, and placed correctly:

\table
\header
\o element constructor with expression...
\o evaluates to...
\row
\o <e>sum((1, 2, 3))</e>
\o <e>sum((1, 2, 3))</e>
\row
\o <e>sum({(1, 2, 3)})</e>
\o <e>sum(1 2 3)</e>
\row
\o <e>{sum((1, 2, 3))}</e>
\o <e>6</e>
\endtable

\section2 My predicate is correct, so why doesn't it select the right stuff?

Either you put your predicate in the wrong place in your path expression, or you forgot to add some parentheses. Consider this input file \c{doc.txt}:

\quotefile snippets/patternist/doc.txt

Suppose you want the first \c{} element of every \c{} element.
Apply a position filter (\c{[1]}) to the \c{/span} path step: \quotefile snippets/patternist/filterOnStep.xq Applying the \c{[1]} filter to the \c{/span} step returns the first \c{} element of each \c{} element: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 41 \note: You can write the same query this way: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 44 Or you can reduce it right down to this: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 45 On the other hand, suppose you really want only one \c{} element, the first one in the document (i.e., you only want the first \c{} element in the first \c{} element). Then you have to do more filtering. There are two ways you can do it. You can apply the \c{[1]} filter in the same place as above but enclose the path expression in parentheses: \quotefile snippets/patternist/filterOnPath.xq Or you can apply a second position filter (\c{[1]} again) to the \c{/p} path step: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 43 Either way the query will return only the first \c{} element in the document: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 42 \section2 Why doesn't my FLWOR behave as expected? The quick answer is you probably expected your XQuery FLWOR to behave just like a C++ \e{for} loop. But they aren't the same. Consider a simple example: \quotefile snippets/patternist/letOrderBy.xq This query evaluates to \e{4 -4 -2 2 -8 8}. The \e{for} clause does set up a \e{for} loop style iteration, which does evaluate the rest of the FLWOR multiple times, one time for each value returned by the \e{in} expression. That much is similar to the C++ \e{for} loop. But consider the \e{return} clause. In C++ if you hit a \e{return} statement, you break out of the \e{for} loop and return from the function with one value. Not so in XQuery. The \e{return} clause is the last clause of the FLWOR, and it means: \e{Append the return value to the result list and then begin the next iteration of the FLWOR}. 
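
A minimal sketch of this accumulation, using a made-up sequence of integers rather than the example file:

\code
(: each iteration contributes one item to the result sequence :)
for $i in (1, 2, 3)
return $i * 10
\endcode

The \e{return} clause is evaluated three times, and the query evaluates to the sequence 10, 20, 30 rather than stopping at the first value.
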
When the \e{for} clause's \e{in} expression no longer returns a value, the entire result list is returned. Next, consider the \e{order by} clause. It doesn't do any sorting on each iteration of the FLWOR. It just evaluates its expression on each iteration (\c{$a} in this case) to get an ordering value to map to the result item from each iteration. These ordering values are kept in a parallel list. The result list is sorted at the end using the parallel list of ordering values. The last difference to note here is that the \e{let} clause does \e{not} set up an iteration through a sequence of values like the \e{for} clause does. The \e{let} clause isn't a sort of nested loop. It isn't a loop at all. It is just a variable binding. On each iteration, it binds the \e{entire} sequence of values on the right to the variable on the left. In the example above, it binds (4 -4) to \c{$b} on the first iteration, (-2 2) on the second iteration, and (-8 8) on the third iteration. So the following query doesn't iterate through anything, and doesn't do any ordering: \quotefile snippets/patternist/invalidLetOrderBy.xq It binds the entire sequence (2, 3, 1) to \c{$i} one time only; the \e{order by} clause only has one thing to order and hence does nothing, and the query evaluates to 2 3 1, the sequence assigned to \c{$i}. \note We didn't include a \e{where} clause in the example. The \e{where} clause is for filtering results. \section2 Why are my elements created in the wrong order? The short answer is your elements are \e{not} created in the wrong order, because when appearing as operands to a path expression, there is no correct order. Consider the following query, which again uses the input file \c{doc.txt}: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 46 The query finds all the \c{} elements in the file. For each \c{} element, it builds a \c{} element in the output containing the concatenated contents of all the \c{} element's child \c{} elements. 
Running the query through \c{xmlpatterns} might produce the following output, which is not sorted in the expected order. \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 47 You can use a \e{for} loop to ensure that the order of the result set corresponds to the order of the input sequence: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 48 This version produces the same result set but in the expected order: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 49 \section2 Why can't I use \c{true} and \c{false} in my XQuery? You can, but not by just using the names \c{true} and \c{false} directly, because they are \l{Name Tests} {name tests} although they look like boolean constants. The simple way to create the boolean values is to use the builtin functions \c{true()} and \c{false()} wherever you want to use \c{true} and \c{false}. The other way is to invoke the boolean constructor: \quotefile snippets/patternist/xsBooleanTrue.xq */

/**************************************************************************** ** ** Copyright (C) 2009 Nokia Corporation and/or its subsidiary(-ies). ** Contact: Qt Software Information (qt-info@nokia.com) ** ** This file is part of the documentation of the Qt Toolkit. ** ** $QT_BEGIN_LICENSE:LGPL$ ** Commercial Usage ** Licensees holding valid Qt Commercial licenses may use this file in ** accordance with the Qt Commercial License Agreement provided with the ** Software or, alternatively, in accordance with the terms contained in ** a written agreement between you and Nokia. ** ** GNU Lesser General Public License Usage ** Alternatively, this file may be used under the terms of the GNU Lesser ** General Public License version 2.1 as published by the Free Software ** Foundation and appearing in the file LICENSE.LGPL included in the ** packaging of this file. Please review the following information to ** ensure the GNU Lesser General Public License version 2.1 requirements ** will be met: http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html. ** ** In addition, as a special exception, Nokia gives you certain ** additional rights. These rights are described in the Nokia Qt LGPL ** Exception version 1.0, included in the file LGPL_EXCEPTION.txt in this ** package. ** ** GNU General Public License Usage ** Alternatively, this file may be used under the terms of the GNU ** General Public License version 3.0 as published by the Free Software ** Foundation and appearing in the file LICENSE.GPL included in the ** packaging of this file. Please review the following information to ** ensure the GNU General Public License version 3.0 requirements will be ** met: http://www.gnu.org/copyleft/gpl.html. ** ** If you are unsure which license is appropriate for your use, please ** contact the sales department at qt-sales@nokia.com. ** $QT_END_LICENSE$ ** ****************************************************************************/ /*! 
\page xquery-introduction.html \title A Short Path to XQuery \ingroup scripting \startpage index.html QtReference Documentation \target XQuery-introduction XQuery is a language for querying XML data or non-XML data that can be modeled as XML. XQuery is specified by the \l{http://www.w3.org}{W3C}. \tableofcontents \section1 Introduction Where Java and C++ are \e{statement-based} languages, the XQuery language is \e{expression-based}. The simplest XQuery expression is an XML element constructor: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 20 This element is an XQuery expression that forms a complete XQuery. In fact, this XQuery doesn't actually query anything. It just creates an empty element in the output. But \l{Constructing Elements} {constructing new elements in an XQuery} is often necessary. An XQuery expression can also be enclosed in curly braces and embedded in another XQuery expression. This XQuery has a document expression embedded in a node expression: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 21 It creates a new element in the output and sets its \c{id} attribute to be the \c{id} attribute from an element in the \c{other.html} file. \section1 Using Path Expressions To Match & Select Items In C++ and Java, we write nested \c{for} loops and recursive functions to traverse XML trees in search of elements of interest. In XQuery, we write these iterative and recursive algorithms with \e{path expressions}. A path expression looks somewhat like a typical \e{file pathname} for locating a file in a hierarchical file system. It is a sequence of one or more \e{steps} separated by slash '/' or double slash '//'. Path expressions are used for traversing XML trees, not file systems, but QtXmlPatterns can model a file system as an XML tree, so XQuery can be used to traverse a file system as well. See the \l {File System Example} {file system example}.
Think of a path expression as an algorithm for traversing an XML tree to find and collect items of interest. This algorithm is evaluated by evaluating each step moving from left to right through the sequence. A step is evaluated with a set of input items (nodes and atomic values), sometimes called the \e focus. The step is evaluated for each item in the focus. These evaluations produce a new set of items, called the \e result, which then becomes the focus that is passed to the next step. Evaluation of the final step produces the final result, which is the result of the XQuery. The items in the result set are presented in \l{http://www.w3.org/TR/xquery/#id-document-order} {document order} and without duplicates. With QtXmlPatterns, a standard way to present the initial focus to a query is to call QXmlQuery::setFocus(). Another common way is to let the XQuery itself create the initial focus by using the first step of the path expression to call the XQuery \c{doc()} function. The \c{doc()} function loads an XML document and returns the \e {document node}. Note that the document node is \e{not} the same as the \e{document element}. The \e{document node} is a node constructed in memory, when the document is loaded. It represents the entire XML document, not the document element. The \e{document element} is the single, top-level XML element in the file. The \c{doc()} function returns the document node, which becomes the singleton node in the initial focus set. The document node will have one child node, and that child node will represent the document element. Consider the following XQuery: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 18 The \c{doc()} function loads the file \l{cookbook.xml} and returns the document node. The document node then becomes the focus for the next step \c{//recipe}. Here the double slash means select all \c{<recipe>} elements found below the document node, regardless of where they appear in the document tree.
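The evaluation model described above can be sketched outside XQuery. The following Python fragment is only an illustration, not how QtXmlPatterns is implemented. It uses the standard \c{xml.etree.ElementTree} module, a made-up two-recipe cookbook, and a synthetic \c{#document} element to stand in for the document node, because ElementTree has no such node of its own. The helper names \c{load_document()} and \c{step()} are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up miniature cookbook for illustration; not the cookbook.xml used in this document.
COOKBOOK = """<cookbook>
  <recipe><title>Hard-Boiled Eggs</title></recipe>
  <section>
    <recipe><title>Example Soup</title></recipe>
  </section>
</cookbook>"""


def load_document(xml_text):
    """Rough stand-in for doc(): return a document node, not the document element."""
    document_node = ET.Element("#document")        # synthetic node, an assumption of this sketch
    document_node.append(ET.fromstring(xml_text))  # its single child is the document element
    return document_node


def step(focus, navigate, name):
    """Evaluate one step: for each item in the focus, navigate and keep nodes named `name`."""
    result = []
    for node in focus:
        for candidate in navigate(node):
            already_selected = any(candidate is kept for kept in result)
            if candidate.tag == name and not already_selected:
                result.append(candidate)           # the result holds no duplicates
    return result


document_node = load_document(COOKBOOK)
focus = [document_node]                            # initial focus: the document node alone

# //recipe: visit the node itself and everything below it, at any depth.
recipes = step(focus, lambda node: node.iter(), "recipe")

print(len(list(document_node)), document_node[0].tag)  # 1 cookbook
print([r.find("title").text for r in recipes])         # ['Hard-Boiled Eggs', 'Example Soup']
```

The list returned by \c{step()} plays the role of the result set. In a longer path expression it would be passed as the focus of the following step.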
The query selects all \c{<recipe>} elements in the cookbook. See \l{Running The Cookbook Examples} for instructions on how to run this query (and most of the ones that follow) from the command line. Conceptually, evaluation of the steps of a path expression is similar to iterating through the same number of nested \e{for} loops. Consider the following XQuery, which builds on the previous one: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 19 This XQuery is a single path expression composed of three steps. The first step creates the initial focus by calling the \c{doc()} function. We can paraphrase what the query engine does at each step: \list 1 \o for each node in the initial focus (the document node)... \o for each descendant node that is a \c{<recipe>} element... \o collect the child nodes that are \c{<title>} elements. \endlist Again the double slash means select all the \c{<recipe>} elements in the document. The single slash before the \c{<title>} element means select only those \c{<title>} elements that are \e{child} elements of a \c{<recipe>} element (i.e. not grandchildren, etc). The XQuery evaluates to a final result set containing the \c{<title>} element of each \c{<recipe>} element in the cookbook. \section2 Axis Steps The most common kind of path step is called an \e{axis step}, which tells the query engine which way to navigate from the context node, and which test to perform when it encounters nodes along the way. An axis step has two parts, an \e{axis specifier}, and a \e{node test}. Conceptually, evaluation of an axis step proceeds as follows: For each node in the focus set, the query engine navigates out from the node along the specified axis and applies the node test to each node it encounters. The nodes selected by the node test are collected in the result set, which becomes the focus set for the next step. In the example XQuery above, the second and third steps are both axis steps.
Both apply the \c{element(name)} node test to nodes encountered while traversing along some axis. But in this example, the two axis steps are written in a \l{Shorthand Form} {shorthand form}, where the axis specifier and the node test are not written explicitly but are implied. XQueries are normally written in this shorthand form, but they can also be written in the longhand form. If we rewrite the XQuery in the longhand form, it looks like this: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 22 The two axis steps have been expanded. The first step (\c{//recipe}) has been rewritten as \c{/descendant-or-self::element(recipe)}, where \c{descendant-or-self::} is the axis specifier and \c{element(recipe)} is the node test. The second step (\c{title}) has been rewritten as \c{/child::element(title)}, where \c{child::} is the axis specifier and \c{element(title)} is the node test. The output of the expanded XQuery will be exactly the same as the output of the shorthand form. To create an axis step, concatenate an axis specifier and a node test. The following sections list the axis specifiers and node tests that are available. \section2 Axis Specifiers An axis specifier defines the direction you want the query engine to take, when it navigates away from the context node. QtXmlPatterns supports the following axes. \table \header \o Axis Specifier \o refers to the axis containing... 
\row \o \c{self::} \o the context node itself \row \o \c{attribute::} \o all attribute nodes of the context node \row \o \c{child::} \o all child nodes of the context node (not attributes) \row \o \c{descendant::} \o all descendants of the context node (children, grandchildren, etc) \row \o \c{descendant-or-self::} \o all nodes in \c{descendant} + \c{self} \row \o \c{parent::} \o the parent node of the context node, or empty if there is no parent \row \o \c{ancestor::} \o all ancestors of the context node (parent, grandparent, etc) \row \o \c{ancestor-or-self::} \o all nodes in \c{ancestor} + \c{self} \row \o \c{following::} \o all nodes in the tree containing the context node, \e not including \c{descendant}, \e and that follow the context node in the document \row \o \c{preceding::} \o all nodes in the tree containing the context node, \e not including \c{ancestor}, \e and that precede the context node in the document \row \o \c{following-sibling::} \o all children of the context node's \c{parent} that follow the context node in the document \row \o \c{preceding-sibling::} \o all children of the context node's \c{parent} that precede the context node in the document \endtable \section2 Node Tests A node test is a conditional expression that must be true for a node if the node is to be selected by the axis step. The conditional expression can test just the \e kind of node, or it can test the \e kind of node and the \e name of the node. The XQuery specification for \l{http://www.w3.org/TR/xquery/#node-tests} {node tests} also defines a third condition, the node's \e {Schema Type}, but schema type tests are not supported in QtXmlPatterns. QtXmlPatterns supports the following node tests. The tests that have a \c{name} parameter test the node's name in addition to its \e{kind} and are often called the \l{Name Tests}. \table \header \o Node Test \o matches all...
\row \o \c{node()} \o nodes of any kind \row \o \c{text()} \o text nodes \row \o \c{comment()} \o comment nodes \row \o \c{element()} \o element nodes (same as star: *) \row \o \c{element(name)} \o element nodes named \c{name} \row \o \c{attribute()} \o attribute nodes \row \o \c{attribute(name)} \o attribute nodes named \c{name} \row \o \c{processing-instruction()} \o processing-instructions \row \o \c{processing-instruction(name)} \o processing-instructions named \c{name} \row \o \c{document-node()} \o document nodes (there is only one) \row \o \c{document-node(element(name))} \o document node with document element \c{name} \endtable \target Shorthand Form \section2 Shorthand Form Writing axis steps using the longhand form with axis specifiers and node tests is semantically clear but syntactically verbose. The shorthand form is easy to learn and, once you learn it, just as easy to read. In the shorthand form, the axis specifier and node test are implied by the syntax. XQueries are normally written in the shorthand form. Here is a table of some frequently used shorthand forms: \table \header \o Shorthand syntax \o Short for... \o matches all... \row \o \c{name} \o \c{child::element(name)} \o child nodes that are \c{name} elements \row \o \c{*} \o \c{child::element()} \o child nodes that are elements (\c{node()} matches \e all child nodes) \row \o \c{..} \o \c{parent::node()} \o parent nodes (there is only one) \row \o \c{@*} \o \c{attribute::attribute()} \o attribute nodes \row \o \c{@name} \o \c{attribute::attribute(name)} \o \c{name} attributes \row \o \c{//} \o \c{descendant-or-self::node()} \o descendant nodes (when used instead of '/') \endtable The \l{http://www.w3.org/TR/xquery/}{XQuery language specification} has a more detailed section on the shorthand form, which it calls the \l{http://www.w3.org/TR/xquery/#abbrev} {abbreviated syntax}. More examples of path expressions written in the shorthand form are found there.
There is also a section listing examples of path expressions written in the \l{http://www.w3.org/TR/xquery/#unabbrev} {longhand form}. \target Name Tests \section2 Name Tests The name tests are the \l{Node Tests} that have the \c{name} parameter. A name test must match the node \e name in addition to the node \e kind. We have already seen name tests used: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 19 In this path expression, both \c{recipe} and \c{title} are name tests written in the shorthand form. XQuery resolves these names (\l{http://www.w3.org/TR/xquery/#id-basics}{QNames}) to their expanded form using whatever \l{http://www.w3.org/TR/xquery/#dt-namespace-declaration} {namespace declarations} it knows about. Resolving a name to its expanded form means replacing its namespace prefix, if one is present (there aren't any present in the example), with a namespace URI. The expanded name then consists of the namespace URI and the local name. But the names in the example above don't have namespace prefixes, because we didn't include a namespace declaration in our \c{cookbook.xml} file. However, we will often use XQuery to query XML documents that use namespaces. Forgetting to declare the correct namespace(s) in an XQuery is a common cause of XQuery failures. Let's add a \e{default} namespace to \c{cookbook.xml} now. Change the \e{document element} in \c{cookbook.xml} from: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 23 to... \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 24 This is called a \e{default namespace} declaration because it doesn't include a namespace prefix. By including this default namespace declaration in the document element, we mean that all unprefixed \e{element} names in the document, including the document element itself (\c{cookbook}), are automatically in the default namespace \c{http://cookbook/namespace}. Note that unprefixed \e{attribute} names are not affected by the default namespace declaration.
They are always considered to be in \e{no namespace}. Note also that the URL we choose as our namespace URI need not refer to an actual location, and doesn't refer to one in this case. But click on \l{http://www.w3.org/XML/1998/namespace}, for example, which is the namespace URI for elements and attributes prefixed with \c{xml:}. Now when we try to run the previous XQuery example, no output is produced! The path expression no longer matches anything in the cookbook file because our XQuery doesn't yet know about the namespace declaration we added to the cookbook document. There are two ways we can declare the namespace in the XQuery. We can give it a \e{namespace prefix} (e.g. \c{c} for cookbook) and prefix each name test with the namespace prefix: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 3 Or we can declare the namespace to be the \e{default element namespace}, and then we can still run the original XQuery: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 4 Both methods will work and produce the same output, all the \c{<title>} elements: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 5 But note how the output is slightly different from the output we saw before we added the default namespace declaration to the cookbook file. QtXmlPatterns automatically includes the correct namespace attribute in each \c{<title>} element in the output. When QtXmlPatterns loads a document and expands a QName, it creates an instance of QXmlName, which retains the namespace prefix along with the namespace URI and the local name. See QXmlName for further details. One thing to keep in mind from this namespace discussion, whether you run XQueries in a Qt program using QtXmlPatterns, or you run them from the command line using xmlpatterns, is that if you don't get the output you expect, it might be because the data you are querying uses namespaces, but you didn't declare those namespaces in your XQuery. 
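The same pitfall shows up in other XML tools, which can make it easier to see what is going on. As an analogy only (this is Python's \c{xml.etree.ElementTree}, not QtXmlPatterns, and its path syntax is a small XPath subset rather than XQuery), the sketch below parses a made-up fragment that declares the default namespace \c{http://cookbook/namespace}. ElementTree stores each element under its expanded name, written \c{{uri}local}, so an unprefixed path matches nothing until a prefix is bound to the namespace URI. The \c{name} attribute and the recipe content are invented for the example.

```python
import xml.etree.ElementTree as ET

# Made-up fragment; only the namespace URI is taken from the text above.
COOKBOOK = """<cookbook xmlns="http://cookbook/namespace">
  <recipe name="eggs" xml:id="Example1"><title>Hard-Boiled Eggs</title></recipe>
</cookbook>"""

root = ET.fromstring(COOKBOOK)

# Unprefixed element names are expanded with the default namespace URI.
print(root.tag)                          # {http://cookbook/namespace}cookbook

# A path that ignores the namespace matches nothing.
print(root.findall("recipe/title"))      # []

# Binding a prefix (here 'c') to the URI makes the name tests match again.
ns = {"c": "http://cookbook/namespace"}
titles = root.findall("c:recipe/c:title", ns)
print([t.text for t in titles])          # ['Hard-Boiled Eggs']

# The unprefixed attribute stays in no namespace; xml:id is in the xml namespace.
recipe = root.find("c:recipe", ns)
print(sorted(recipe.attrib))
```

The empty list in the middle corresponds to the XQuery that produces no output: the names look right, but their expanded forms differ.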
\section3 Wildcards in Name Tests The wildcard \c{'*'} can be used in a name test. To find all the attributes in the cookbook but select only the ones in the \c{xml} namespace, use the \c{xml:} namespace prefix but replace the \e{local name} (the attribute name) with the wildcard: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 7 Oops! If you save this XQuery in \c{file.xq} and run it through \c{xmlpatterns}, it doesn't work. You get an error message instead, something like this: \e{Error SENR0001 in file:///...file.xq, at line 1, column 1: Attribute xml:id can't be serialized because it appears at the top level.} The XQuery actually ran correctly. It selected a bunch of \c{xml:id} attributes and put them in the result set. But then \c{xmlpatterns} sent the result set to a \l{QXmlSerializer} {serializer}, which tried to output it as well-formed XML. Since the result set contains only attributes and attributes alone are not well-formed XML, the \l{QXmlSerializer} {serializer} reports a \l{http://www.w3.org/TR/2005/WD-xslt-xquery-serialization-20050915/#id-errors} {serialization error}. Fear not. XQuery can do more than just find and select elements and attributes. It can \l{Constructing Elements} {construct new ones on the fly} as well, which is what we need to do here if we want \c{xmlpatterns} to let us see the attributes we selected. The example above and the ones below are revisited in the \l{Constructing Elements} section. You can jump ahead to see the modified examples now, and then come back, or you can press on from here. 
To find all the \c{name} attributes in the cookbook and select them all regardless of their namespace, replace the namespace prefix with the wildcard and write \c{name} (the attribute name) as the local name: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 8 To find and select all the attributes of the \e{document element} in the cookbook, replace the entire name test with the wildcard: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 9 \section1 Using Predicates In Path Expressions Predicates can be used to further filter the nodes selected by a path expression. A predicate is an expression in square brackets ('[' and ']') that either returns a boolean value or a number. A predicate can appear at the end of any path step in a path expression. The predicate is applied to each node in the focus set. If a node passes the filter, the node is included in the result set. The query below selects the recipe element that has the \c{<title>} element \c{"Hard-Boiled Eggs"}. \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 10 The dot expression ('.') can be used in predicates and path expressions to refer to the current context node. The following query uses the dot expression to refer to the current \c{<method>} element. The query selects the empty \c{<method>} elements from the cookbook. \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 11 Note that passing the dot expression to the \l{http://www.w3.org/TR/xpath-functions/#func-string-length} {string-length()} function is optional. When \l{http://www.w3.org/TR/xpath-functions/#func-string-length} {string-length()} is called with no parameter, the context node is assumed: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 12 Actually, selecting an empty \c{<method>} element might not be very useful by itself. 
It doesn't tell you which recipe has the empty method: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 31 \target Empty Method Not Robust What you probably want to see instead are the \c{<recipe>} elements that have empty \c{<method>} elements: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 32 The predicate uses the \l{http://www.w3.org/TR/xpath-functions/#func-string-length} {string-length()} function to test the length of each \c{<method>} element in each \c{<recipe>} element found by the node test. If a \c{<method>} contains no text, the predicate evaluates to \c{true} and the \c{<recipe>} element is selected. If the method contains some text, the predicate evaluates to \c{false}, and the \c{<recipe>} element is discarded. The output is the entire recipe that has no instructions for preparation: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 33 The astute reader will have noticed that this use of \c{string-length()} to find an empty element is unreliable. It works in this case, because the method element is written as \c{<method/>}, guaranteeing that its string length will be 0. It will still work if the method element is written as \c{<method></method>}, but it will fail if there is any whitespace between the opening and ending \c{<method>} tags. A more robust way to find the recipes with empty methods is presented in the section on \l{Boolean Predicates}. There are many more functions and operators defined for XQuery and XPath. They are all \l{http://www.w3.org/TR/xpath-functions} {documented here}. \section2 Positional Predicates Predicates are often used to filter items based on their position in a sequence. For path expressions processing items loaded from XML documents, the normal sequence is \l{http://www.w3.org/TR/xquery/#id-document-order} {document order}. 
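Positions in such a sequence are counted from 1, in document order. The following Python sketch illustrates this with the limited XPath subset of \c{xml.etree.ElementTree}. It is an analogy rather than QtXmlPatterns code, the three-recipe cookbook is made up, and \c{titles()} is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up cookbook with three recipes, listed in document order.
COOKBOOK = """<cookbook>
  <recipe><title>First</title></recipe>
  <recipe><title>Second</title></recipe>
  <recipe><title>Third</title></recipe>
</cookbook>"""

root = ET.fromstring(COOKBOOK)   # root is the <cookbook> document element


def titles(path):
    """Return the title text of every recipe selected by an ElementTree path."""
    return [recipe.find("title").text for recipe in root.findall(path)]


print(titles("recipe[1]"))          # ['First']  (positions start at 1, not 0)
print(titles("recipe[2]"))          # ['Second']
print(titles("recipe[last()]"))     # ['Third']
print(titles("recipe[last()-1]"))   # ['Second']
```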
This query returns the second \c{<recipe>} element in the \c{cookbook.xml} file: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 13 The other frequently used positional function is \l{http://www.w3.org/TR/xpath-functions/#func-last} {last()}, which returns the numeric position of the last item in the focus set. Stated another way, \l{http://www.w3.org/TR/xpath-functions/#func-last} {last()} returns the size of the focus set. This query returns the last recipe in the cookbook: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 16 And this query returns the next to last \c{<recipe>}: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 17 \section2 Boolean Predicates The other kind of predicate evaluates to \e true or \e false. A boolean predicate takes the value of its expression and determines its \e{effective boolean value} according to the following rules: \list \o An expression that evaluates to a single node is \c{true}. \o An expression that evaluates to a string is \c{false} if the string is empty and \c{true} if the string is not empty. \o An expression that evaluates to a boolean value (i.e. type \c{xs:boolean}) is that value. \o If the expression evaluates to anything else, it's an error (e.g. type \c{xs:date}). \endlist We have already seen some boolean predicates in use. Earlier, we saw a \e{not so robust} way to find the \l{Empty Method Not Robust} {recipes that have no instructions}. \c{[string-length(method) = 0]} is a boolean predicate that would fail in the example if the empty method element was written with both opening and closing tags and there was whitespace between the tags. Here is a more robust way that uses a different boolean predicate. \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 34 This one uses the \l{http://www.w3.org/TR/xpath-functions/#func-empty} {empty()} function to test whether the method contains any steps.
If the method contains no steps, then \c{empty(step)} will return \c{true}, and hence the predicate will evaluate to \c{true}. But even that version isn't foolproof. Suppose the method does contain steps, but all the steps themselves are empty. That's still a case of a recipe with no instructions that won't be detected. There is a better way: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 35 This version uses the \l{http://www.w3.org/TR/xpath-functions/#func-not} {not()} and \l{http://www.w3.org/TR/xpath-functions/#func-normalize-space} {normalize-space()} functions. \c{normalize-space(method)} returns the contents of the method element as a string, but with all the whitespace normalized, i.e., the string value of each \c{<step>} element will have its whitespace normalized, and then all the normalized step values will be concatenated. If that string is empty, then \c{not()} returns \c{true} and the predicate is \c{true}. We can also use the \l{http://www.w3.org/TR/xpath-functions/#func-position} {position()} function in a comparison to inspect positions with conditional logic. The \l{http://www.w3.org/TR/xpath-functions/#func-position} {position()} function returns the position index of the current context item in the sequence of items: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 14 Note that the first position in the sequence is position 1, not 0. We can also select \e{all} the recipes after the first one: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 15 \target Constructing Elements \section1 Constructing Elements In the section about \l{Wildcards in Name Tests} {using wildcards in name tests}, we saw three simple example XQueries, each of which selected a different list of XML attributes from the cookbook. We couldn't use \c{xmlpatterns} to run these queries, however, because \c{xmlpatterns} sends the XQuery results to a \l{QXmlSerializer} {serializer}, which expects to serialize the results as well-formed XML.
Since a list of XML attributes by itself is not well-formed XML, the serializer reported an error for each XQuery. Since an attribute must appear in an element, for each attribute in the result set, we must create an XML element. We can do that using a \l{http://www.w3.org/TR/xquery/#id-for-let} {\e{for} clause} with a \l{http://www.w3.org/TR/xquery/#id-variables} {bound variable}, and a \l{http://www.w3.org/TR/xquery/#id-orderby-return} {\e{return} clause} with an element constructor: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 25 The \e{for} clause produces a sequence of attribute nodes from the result of the path expression. Each attribute node in the sequence is bound to the variable \c{$i}. The \e{return} clause then constructs an element around the attribute node. Here is the output: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 28 The output contains one element for each \c{xml:id} attribute in the cookbook. Note that XQuery puts each attribute in the right place in its element, despite the fact that in the \e{return} clause, the \c{$i} variable is positioned as if it is meant to become element content. The other two examples from the \l{Wildcards in Name Tests} {wildcard} section can be rewritten the same way. Here is the XQuery that selects all the \c{name} attributes, regardless of namespace: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 26 And here is its output: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 29 And here is the XQuery that selects all the attributes from the \e{document element}: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 27 And here is its output: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 30 \section2 Element Constructors are Expressions Because node constructors are expressions, they can be used in XQueries wherever expressions are allowed.
\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 40 If \c{cookbook.xml} is loaded without error, a \c{<resept>} element (Norwegian word for recipe) is constructed for each \c{<recipe>} element in the cookbook, and the child nodes of the \c{<recipe>} are copied into the \c{<resept>} element. But if the cookbook document doesn't exist or does not contain well-formed XML, a single \c{<resept>} element is constructed containing an error message. \section1 Constructing Atomic Values XQuery also has atomic values. An atomic value is a value in the value space of one of the built-in datatypes in the \l {http://www.w3.org/TR/xmlschema-2} {XML Schema language}. These \e{atomic types} have built-in operators for doing arithmetic, comparisons, and for converting values to other atomic types. See the \l {http://www.w3.org/TR/xmlschema-2/#built-in-datatypes} {Built-in Datatype Hierarchy} for the entire tree of built-in, primitive and derived atomic types. \note Click on a data type in the tree for its detailed specification. To construct an atomic value as element content, enclose an expression in curly braces and embed it in the element constructor: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 36 Sending this XQuery through xmlpatterns produces: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 37 To compute the value of an attribute, enclose the expression in curly braces and embed it in the attribute value: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 38 Sending this XQuery through xmlpatterns produces: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 39
\section1 Running The Cookbook Examples Most of the XQuery examples in this document refer to the cookbook written in XML shown below. Save it as \c{cookbook.xml}. In the same directory, save one of the cookbook XQuery examples in a \c{.xq} file (e.g. \c{file.xq}). Run the XQuery using Qt's command line utility: \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 6 \section2 cookbook.xml \snippet snippets/code/doc_src_qtxmlpatterns.qdoc 100 \section1 Further Reading There is much more to the XQuery language than we have presented in this short introduction. We will be adding more here in later releases. In the meantime, playing with the \c{xmlpatterns} utility and making modifications to the XQuery examples provided here will be quite informative. An XQuery textbook will be a good investment. You can also ask questions on XQuery mailing lists: \list \o \l{http://lists.trolltech.com/qt-interest/}{qt-interest} \o \l{http://www.x-query.com/mailman/listinfo/talk}{talk at x-query.com}. \endlist \l{http://www.functx.com/functx/}{FunctX} has a collection of XQuery functions that can be both useful and educational. This introduction contains many links to the specifications, which, of course, are the ultimate source of information about XQuery. They can be a bit difficult, though, so consider investing in a textbook. The main specifications are: \list \o \l{http://www.w3.org/TR/xquery/}{XQuery 1.0: An XML Query Language} - the main source for syntax and semantics. \o \l{http://www.w3.org/TR/xpath-functions/}{XQuery 1.0 and XPath 2.0 Functions and Operators} - the built-in functions and operators. \endlist \section1 FAQ The answers to these frequently asked questions explain the causes of several common mistakes that most beginners make. Reading through the answers ahead of time might save you a lot of head scratching.
\section2 Why didn't my path expression match anything?

The most common cause of this bug is failure to declare one or more
namespaces in your XQuery. Consider the following query for selecting
all the examples in an XHTML document:

\quotefile snippets/patternist/simpleHTML.xq

It won't match anything because \c{index.html} is an XHTML file, and
all XHTML files declare the default namespace
\c{"http://www.w3.org/1999/xhtml"} in their top (\c{<html>}) element.
The query doesn't declare this namespace, so the path expression
expands \c{html} to \c{{}html} and tries to match that expanded name.
The actual expanded name, however, is
\c{{http://www.w3.org/1999/xhtml}html}.

One possible fix is to declare the correct default namespace in the
XQuery:

\quotefile snippets/patternist/simpleXHTML.xq

Another common cause of this bug is to confuse the \e{document node}
with the top element node. They are different. This query won't match
anything:

\quotefile snippets/patternist/docPlainHTML.xq

The \c{doc()} function returns the \e{document node}, not the top
element node (\c{<html>}). Don't forget to match the top element node
in the path expression:

\quotefile snippets/patternist/docPlainHTML2.xq

\section2 What if my input namespace is different from my output namespace?

Just remember to declare both namespaces in your XQuery and use them
properly. Consider the following query, which is meant to generate
XHTML output from XML input:

\quotefile snippets/patternist/embedDataInXHTML.xq

We want the \c{<html>}, \c{<body>}, and \c{<p>} nodes we create in the
output to be in the standard XHTML namespace, so we declare the
default namespace to be \c{http://www.w3.org/1999/xhtml}. That's
correct for the output, but that same default namespace will also be
applied to the node names in the path expression we're trying to match
in the input (\c{/tests/test[@status = "failure"]}), which is wrong,
because the elements in \c{testResult.xml} are perhaps in the empty
namespace.
So we must declare that namespace too, with a namespace prefix, and
then use the prefix with the node names in the path expression. This
one will probably work better:

\quotefile snippets/patternist/embedDataInXHTML2.xq

\section2 Why doesn't my return clause work?

Recall that XQuery is an \e{expression-based} language, not
\e{statement-based}. Because an XQuery is built entirely from
expressions, understanding XQuery expression precedence is very
important. Consider the following query:

\quotefile snippets/patternist/forClause2.xq

It looks ok, but it isn't. It is supposed to be a FLWOR expression
comprising a \e{for} clause and a \e{return} clause, but it isn't just
that. It \e{has} a FLWOR expression, certainly (with the \e{for} and
\e{return} clauses), but it \e{also} has an arithmetic expression
(\e{+ $d}) dangling at the end because we didn't enclose the return
expression in parentheses.

Using parentheses to establish precedence is more important in XQuery
than in other languages, because XQuery is \e{expression-based}. In
this case, without parentheses enclosing \c{$i + $d}, the return
clause only returns \c{$i}. The \c{+$d} will have the result of the
FLWOR expression as its left operand. And, since the scope of variable
\c{$d} ends at the end of the \e{return} clause, a variable out of
scope error will be reported. Correct these problems by using
parentheses.

\quotefile snippets/patternist/forClause.xq

\section2 Why didn't my expression get evaluated?

You probably misplaced some curly braces. When you want an expression
evaluated inside an element constructor, enclose the expression in
curly braces. Without the curly braces, the expression will be
interpreted as text. Here is a \c{sum()} expression used in an
\c{<e>} element. The table shows cases where the curly braces are
missing, misplaced, and placed correctly:

\table
\header
\o element constructor with expression...
\o evaluates to...
\row
\o <e>sum((1, 2, 3))</e>
\o <e>sum((1, 2, 3))</e>
\row
\o <e>sum({(1, 2, 3)})</e>
\o <e>sum(1 2 3)</e>
\row
\o <e>{sum((1, 2, 3))}</e>
\o <e>6</e>
\endtable

\section2 My predicate is correct, so why doesn't it select the right stuff?

Either you put your predicate in the wrong place in your path
expression, or you forgot to add some parentheses. Consider this
input file \c{doc.txt}:

\quotefile snippets/patternist/doc.txt

Suppose you want the first \c{<span>} element of every \c{<p>}
element. Apply a position filter (\c{[1]}) to the \c{/span} path step:

\quotefile snippets/patternist/filterOnStep.xq

Applying the \c{[1]} filter to the \c{/span} step returns the first
\c{<span>} element of each \c{<p>} element:

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 41

\note You can write the same query this way:

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 44

Or you can reduce it right down to this:

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 45

On the other hand, suppose you really want only one \c{<span>}
element, the first one in the document (i.e., you only want the first
\c{<span>} element in the first \c{<p>} element). Then you have to do
more filtering. There are two ways you can do it. You can apply the
\c{[1]} filter in the same place as above but enclose the path
expression in parentheses:

\quotefile snippets/patternist/filterOnPath.xq

Or you can apply a second position filter (\c{[1]} again) to the
\c{/p} path step:

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 43

Either way the query will return only the first \c{<span>} element in
the document:

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 42

\section2 Why doesn't my FLWOR behave as expected?

The quick answer is you probably expected your XQuery FLWOR to behave
just like a C++ \e{for} loop. But they aren't the same. Consider a
simple example:

\quotefile snippets/patternist/letOrderBy.xq

This query evaluates to \e{4 -4 -2 2 -8 8}.
The \e{for} clause does set up a \e{for} loop style iteration, which
does evaluate the rest of the FLWOR multiple times, one time for each
value returned by the \e{in} expression. That much is similar to the
C++ \e{for} loop.

But consider the \e{return} clause. In C++ if you hit a \e{return}
statement, you break out of the \e{for} loop and return from the
function with one value. Not so in XQuery. The \e{return} clause is
the last clause of the FLWOR, and it means: \e{Append the return value
to the result list and then begin the next iteration of the FLWOR}.
When the \e{for} clause's \e{in} expression no longer returns a value,
the entire result list is returned.

Next, consider the \e{order by} clause. It doesn't do any sorting on
each iteration of the FLWOR. It just evaluates its expression on each
iteration (\c{$a} in this case) to get an ordering value to map to the
result item from each iteration. These ordering values are kept in a
parallel list. The result list is sorted at the end using the parallel
list of ordering values.

The last difference to note here is that the \e{let} clause does
\e{not} set up an iteration through a sequence of values like the
\e{for} clause does. The \e{let} clause isn't a sort of nested loop.
It isn't a loop at all. It is just a variable binding. On each
iteration, it binds the \e{entire} sequence of values on the right to
the variable on the left. In the example above, it binds (4 -4) to
\c{$b} on the first iteration, (-2 2) on the second iteration, and
(-8 8) on the third iteration. So the following query doesn't iterate
through anything, and doesn't do any ordering:

\quotefile snippets/patternist/invalidLetOrderBy.xq

It binds the entire sequence (2, 3, 1) to \c{$i} one time only; the
\e{order by} clause only has one thing to order and hence does
nothing, and the query evaluates to 2 3 1, the sequence assigned to
\c{$i}.

\note We didn't include a \e{where} clause in the example. The
\e{where} clause is for filtering results.
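
As a rough illustration of how the clauses described above fit
together, here is a small hypothetical query. It is not one of the
snippet files used in this document, and the input sequence is made up
for the purpose. It adds a \e{where} clause to a FLWOR that also uses
\e{for}, \e{let}, \e{order by}, and \e{return}:

\code
(: Hypothetical example; the values (3, 1, 2) are made up. :)
for $a in (3, 1, 2)      (: iterates three times, once per item :)
let $b := ($a, -$a)      (: binds a whole two-item sequence; not a loop :)
where $a > 1             (: drops the iteration in which $a is 1 :)
order by $a              (: ordering value; sorting happens at the end :)
return $b                (: appends $b to the result list :)
\endcode

Assuming an XQuery 1.0 processor such as \c{xmlpatterns}, this should
evaluate to 2 -2 3 -3. The iteration for \c{$a = 1} is filtered out by
the \e{where} clause, and the two remaining results are sorted by
their ordering values (2 and 3) only after the last iteration.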
\section2 Why are my elements created in the wrong order?

The short answer is your elements are \e{not} created in the wrong
order, because when constructed elements appear as operands to a path
expression, there is no correct order. Consider the following query,
which again uses the input file \c{doc.txt}:

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 46

The query finds all the \c{<p>} elements in the file. For each
\c{<p>} element, it builds a \c{<p>} element in the output containing
the concatenated contents of all the \c{<p>} element's child
\c{<span>} elements. Running the query through \c{xmlpatterns} might
produce the following output, which is not sorted in the expected
order.

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 47

You can use a \e{for} loop to ensure that the order of the result set
corresponds to the order of the input sequence:

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 48

This version produces the same result set but in the expected order:

\snippet snippets/code/doc_src_qtxmlpatterns.qdoc 49

\section2 Why can't I use \c{true} and \c{false} in my XQuery?

You can, but not by just using the names \c{true} and \c{false}
directly, because they are \l{Name Tests} {name tests} although they
look like boolean constants. The simple way to create the boolean
values is to use the built-in functions \c{true()} and \c{false()}
wherever you want to use \c{true} and \c{false}. The other way is to
invoke the boolean constructor:

\quotefile snippets/patternist/xsBooleanTrue.xq

*/