Reader
XML::Reader provides a pull-based streaming API for reading XML. It acts as a cursor that moves forward through the document, stopping at each node. Because it does not hold the whole document tree in memory, it is more memory-efficient than DOM parsing for large documents.
Basic Usage
reader = XML::Reader.file('large.xml')
while reader.read
  if reader.node_type == XML::Reader::TYPE_ELEMENT
    puts reader.name
  end
end
Node Properties
At each position, the reader exposes the current node's properties:
reader.name # node name
reader.value # node value (for text, attributes)
reader.node_type # node type constant
reader.depth # nesting depth
reader.empty_element? # self-closing element?
reader.has_attributes? # has attributes?
reader.local_name # local name (without prefix)
reader.namespace_uri # namespace URI
reader.prefix # namespace prefix
Node Type Constants
XML::Reader::TYPE_ELEMENT # opening tag
XML::Reader::TYPE_END_ELEMENT # closing tag
XML::Reader::TYPE_TEXT # text content
XML::Reader::TYPE_CDATA # CDATA section
XML::Reader::TYPE_COMMENT # comment
XML::Reader::TYPE_SIGNIFICANT_WHITESPACE # whitespace-only text
Reading Attributes
reader = XML::Reader.string('<book id="1" title="Ruby"/>')
reader.read
reader['id'] # => "1"
reader.get_attribute('title') # => "Ruby"
reader.attribute_count # => 2
# Walk attributes
reader.move_to_first_attribute
puts "#{reader.name}=#{reader.value}"
while reader.move_to_next_attribute
  puts "#{reader.name}=#{reader.value}"
end
reader.move_to_element # move back to the element
Example: Extract Data from a Large File
reader = XML::Reader.file('products.xml')
products = []
while reader.read
  if reader.node_type == XML::Reader::TYPE_ELEMENT && reader.name == 'product'
    product = {}
    product['id'] = reader['id']
    # Read child elements until the closing </product> tag
    while reader.read
      break if reader.node_type == XML::Reader::TYPE_END_ELEMENT && reader.name == 'product'
      # Skip self-closing children: they have no text node to read
      if reader.node_type == XML::Reader::TYPE_ELEMENT && !reader.empty_element?
        name = reader.name
        reader.read # move to text content
        product[name] = reader.value if reader.has_value?
      end
    end
    products << product
  end
end
products.each { |p| puts "#{p['name']}: $#{p['price']}" }
Example: Count Elements
reader = XML::Reader.file('data.xml')
counts = Hash.new(0)
while reader.read
  if reader.node_type == XML::Reader::TYPE_ELEMENT
    counts[reader.name] += 1
  end
end
counts.sort_by { |_, v| -v }.each do |name, count|
  puts "#{name}: #{count}"
end
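Navigating with next
reader.read descends into child nodes. Use reader.next to skip the current node's subtree and move straight to the node that follows it. Because next already lands on that following node, the loop below calls either next or read on each pass, never both:
reader = XML::Reader.file('data.xml')
more = reader.read
while more
  if reader.node_type == XML::Reader::TYPE_ELEMENT && reader.name == 'skip_me'
    more = reader.next # skip this element and its children
  else
    # process the current node here
    more = reader.read
  end
end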
Expanding Nodes
You can expand the current node into a full DOM subtree for detailed inspection:
reader = XML::Reader.file('books.xml')
while reader.read
  if reader.name == 'book' && reader.node_type == XML::Reader::TYPE_ELEMENT
    node = reader.expand
    # Use XPath on the expanded node (requires reader.doc first)
    reader.doc
    title = node.find_first('title').content
    puts title
  end
end
Warning
Expanded nodes are only valid until the next reader.read call. Do not store references to them.
Validation While Reading
The reader can validate against a schema as it reads:
reader = XML::Reader.file('data.xml')
reader.schema_validate('schema.xsd')
while reader.read
  # reader.valid? returns the validation state
end
Or with RelaxNG: