DOM Parser

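The DOM parser (XML::Parser) reads an entire XML document into memory and returns it as a tree of nodes. This is the most common way to work with XML in libxml-ruby. Because the whole tree is held in memory, memory use grows with the size of the document.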

Parsing

# From a file
doc = XML::Parser.file('books.xml').parse

# From a string
doc = XML::Parser.string('<root><item/></root>').parse

# From an IO (assign the block's return value so doc is available after the block)
doc = File.open('books.xml') do |io|
  XML::Parser.io(io).parse
end

Example: Parse and Extract Data

xml = <<~XML
  <library>
    <book id="1" available="true">
      <title>The Pragmatic Programmer</title>
      <author>Dave Thomas</author>
      <year>1999</year>
    </book>
    <book id="2" available="false">
      <title>Design Patterns</title>
      <author>Gang of Four</author>
      <year>1994</year>
    </book>
  </library>
XML

doc = XML::Parser.string(xml).parse

# Access the root
root = doc.root
puts root.name  # => "library"

# Iterate over children
root.each do |book|
  next unless book.element?
  puts book.find_first('title').content
end

# Use XPath
available = doc.find('//book[@available="true"]')
available.each do |book|
  title = book.find_first('title').content
  year = book.find_first('year').content
  puts "#{title} (#{year})"
end

Example: Parse a Configuration File

doc = XML::Parser.file('config.xml').parse

db_host = doc.find_first('//database/host').content
db_port = doc.find_first('//database/port').content.to_i
db_name = doc.find_first('//database/name').content

puts "Connecting to #{db_name} at #{db_host}:#{db_port}"

Example: Parse with Options

# Strip whitespace-only text nodes and disable network access
parser = XML::Parser.file('data.xml')
parser.options = XML::Parser::Options::NOBLANKS | XML::Parser::Options::NONET
doc = parser.parse

Example: Parse from a Web Response

require 'net/http'

uri = URI('https://example.com/api/data.xml')
xml = Net::HTTP.get(uri)

doc = XML::Parser.string(xml).parse
doc.find('//item').each do |item|
  puts item.find_first('name').content
end

Error Handling

begin
  doc = XML::Parser.string('<broken').parse
rescue XML::Error => e
  puts "Parse failed: #{e.message}"
end

See Error Handling for details on error properties and custom handlers.