Skip to content

SAX Parser

The SAX parser (XML::SaxParser) provides an event-driven, callback-based API for parsing XML. You define a handler class with callback methods that are invoked as the parser encounters elements, text, and other XML constructs.

Defining a Handler

Include XML::SaxParser::Callbacks and define the callbacks you need:

class MyHandler
  include XML::SaxParser::Callbacks

  def on_start_element_ns(name, attributes, prefix, uri, namespaces)
    puts "Start: #{name}"
  end

  def on_end_element_ns(name, prefix, uri)
    puts "End: #{name}"
  end

  def on_characters(chars)
    puts "Text: #{chars}" unless chars.strip.empty?
  end
end

Parsing

handler = MyHandler.new

parser = XML::SaxParser.file('data.xml')
parser.callbacks = handler
parser.parse

Available Callbacks

Callback Called When
on_start_document Document begins
on_end_document Document ends
on_start_element_ns Opening tag
on_end_element_ns Closing tag
on_characters Text content
on_cdata_block CDATA section
on_comment Comment
on_processing_instruction Processing instruction
on_error Parse error

Example: Extract All Text from a Document

class TextExtractor
  include XML::SaxParser::Callbacks

  attr_reader :text

  def initialize
    @text = []
  end

  def on_characters(chars)
    stripped = chars.strip
    @text << stripped unless stripped.empty?
  end
end

handler = TextExtractor.new
parser = XML::SaxParser.file('document.xml')
parser.callbacks = handler
parser.parse

puts handler.text.join(' ')

Example: Build a Hash from XML

class XMLToHash
  include XML::SaxParser::Callbacks

  attr_reader :result

  def initialize
    @result = {}
    @stack = []
    @current_text = ''
  end

  def on_start_element_ns(name, attributes, prefix, uri, namespaces)
    @stack.push(name)
    @current_text = ''
  end

  def on_characters(chars)
    @current_text << chars
  end

  def on_end_element_ns(name, prefix, uri)
    @stack.pop
    stripped = @current_text.strip
    @result[name] = stripped unless stripped.empty?
    @current_text = ''
  end
end

handler = XMLToHash.new
parser = XML::SaxParser.string('<user><name>Alice</name><email>alice@example.com</email></user>')
parser.callbacks = handler
parser.parse

puts handler.result  # => {"name"=>"Alice", "email"=>"alice@example.com"}

Example: Count Elements by Name

class ElementCounter
  include XML::SaxParser::Callbacks

  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  def on_start_element_ns(name, attributes, prefix, uri, namespaces)
    @counts[name] += 1
  end
end

handler = ElementCounter.new
parser = XML::SaxParser.file('large_document.xml')
parser.callbacks = handler
parser.parse

handler.counts.sort_by { |_, v| -v }.first(10).each do |name, count|
  puts "#{name}: #{count}"
end

When to Use SAX

SAX parsing is useful when:

  • The document is too large for DOM parsing
  • You only need to extract specific data
  • You want to process the document in a single pass
  • Memory usage is a concern

For most streaming use cases, the Reader provides a simpler API.