XPath
XPath is the primary way to find and extract data from XML documents in libxml-ruby. Unlike some other Ruby XML libraries, libxml-ruby does not support CSS selectors — XPath is the query language for all search operations.
Quick Reference
doc = XML::Parser.file('books.xml').parse
# Find all matching nodes — returns XML::XPath::Object
nodes = doc.find('//book')
# Find from a specific node
titles = doc.root.find('book/title')
# Find the first match
node = doc.find_first('//book[@id="1"]')
# XPath can return different types
doc.find('count(//book)') # => Float
doc.find('string(//title)') # => String
doc.find('1 = 1') # => true
XPath Crash Course
If you're new to XPath, here are the essentials.
Selecting Nodes
| Expression | Selects |
| --- | --- |
| /root | Root element named "root" |
| /root/child | Direct children named "child" |
| //book | All "book" elements anywhere in the document |
| . | Current node |
| .. | Parent node |
| @id | Attribute named "id" |
Predicates (Filters)
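| Expression | Selects |
| --- | --- |
| //book[1] | Every book that is the first "book" child of its parent; use (//book)[1] for the first book in the document |
| //book[last()] | Every book that is the last "book" child of its parent; use (//book)[last()] for the last book in the document |
| //book[@id] | Books with an "id" attribute |
| //book[@id="42"] | Books where id is "42" |
| //book[price>10] | Books with a "price" child whose numeric value is greater than 10 |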
Axes
| Expression | Selects |
| --- | --- |
| child::book | Child elements named "book" (same as book) |
| ancestor::catalog | Ancestor elements named "catalog" |
| following-sibling::* | All following siblings |
| preceding-sibling::* | All preceding siblings |
| descendant::* | All descendants |
| self::book | Current node if it's named "book" |
Functions
| Function | Returns |
| --- | --- |
| count(//book) | Number of matching nodes |
| string(//title) | Text content of first match |
| contains(@class, 'active') | True if attribute contains substring |
| starts-with(name, 'J') | True if string starts with prefix |
| not(@disabled) | Boolean negation |
| position() | Position of current node in set |
| normalize-space(text()) | Trimmed, collapsed whitespace |
Combining Expressions
# Union — combine multiple paths
doc.find('//title | //author')
# Boolean operators in predicates
doc.find('//book[@year > 2000 and @lang = "en"]')
For the full XPath 1.0 specification, see the W3C XPath Reference.
Practical Examples
doc = XML::Parser.file('feed.xml').parse
doc.find('//item').each do |item|
title = item.find_first('title').content
link = item.find_first('link').content
puts "#{title}: #{link}"
end
Find Elements by Attribute Value
# All books published after 2020
doc.find('//book[@year > 2020]').each do |book|
puts book.find_first('title').content
end
# Elements with a specific class
doc.find('//*[@class="highlight"]')
# Books that have a price child; print only those priced above 20
doc.find('//book[price]').each do |book|
title = book.find_first('title').content
price = book.find_first('price').content.to_f
puts "#{title}: $#{'%.2f' % price}" if price > 20
end
Count and Aggregate
# Count elements
total = doc.find('count(//book)') # => Float
# Get text content directly
first_title = doc.find('string(//book[1]/title)') # => String
Find First Match
# find_first is a convenience for find(...).first
node = doc.find_first('//book[@id="42"]')
if node
puts node.find_first('title').content
end
Navigate Relative to a Node
chapter = doc.find_first('//chapter[@id="3"]')
# All sections within this chapter
chapter.find('section').each do |section|
puts section['title']
end
# Paragraphs anywhere under this chapter
chapter.find('.//p').each do |p|
puts p.content
end