XPath and Namespaces¶
Namespaces are the most common source of confusion with XPath. An element in a namespace will not be found by a bare name — you must register a prefix and use it in the query.
The Problem¶
Given this document:
<feed xmlns="http://www.w3.org/2005/Atom">
<title>My Feed</title>
<entry>
<title>First Post</title>
</entry>
</feed>
This XPath will find nothing:
The Solution: Register a Prefix¶
# Pass a namespace string — "prefix:uri"
doc.find('//atom:title', 'atom:http://www.w3.org/2005/Atom')
The prefix you choose (atom here) does not have to match the document — it's just a local binding for the XPath query. The URI must match exactly.
Multiple Namespaces¶
Array Form¶
doc.find('//atom:entry/dc:creator', [
'atom:http://www.w3.org/2005/Atom',
'dc:http://purl.org/dc/elements/1.1/'
])
Hash Form¶
doc.find('//atom:entry/dc:creator', {
'atom' => 'http://www.w3.org/2005/Atom',
'dc' => 'http://purl.org/dc/elements/1.1/'
})
Default Namespace Shortcut¶
If you query the same namespace-heavy document repeatedly, set a default prefix:
doc.root.namespaces.default_prefix = 'atom'
# Now queries automatically use the 'atom' prefix
doc.find('//atom:title')
doc.find('//atom:entry')
Complex Namespace Example¶
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<getResponse xmlns="http://services.example.com">
<IDList xmlns="http://services.example.com">
<ns1:Item xmlns:ns1="http://domain.example.com"/>
</IDList>
</getResponse>
</soap:Body>
</soap:Envelope>
# soap: is defined on the root, so it's automatically available
doc.find('/soap:Envelope')
# ns1: is defined deeper, register it explicitly
doc.find('//ns1:Item', 'ns1:http://domain.example.com')
# The default namespace on getResponse needs a prefix
doc.find('//svc:getResponse', 'svc:http://services.example.com')
# Combine multiple registrations
doc.find('/soap:Envelope/soap:Body/svc:getResponse/svc:IDList/ns1:Item', [
'svc:http://services.example.com',
'ns1:http://domain.example.com'
])