I love XML, not because it’s an inherently beautiful format (it’s inelegant in a lot of ways: why do we have both attributes and character data?) but because for once we have a sensible and widely supported standard in the computing world. The power of this shows when you want to parse an XML file in PHP. Support is built in by default through the XML Parser extension, which was originally powered by the Expat library and still exposes the same event-based API. For small files you can use SimpleXML, which builds an object tree from the whole document, but I need to parse large amounts of XML and didn’t want to keep all of that information in memory. Instead I’m hooking directly into the Expat-style event interface: the parser calls back to the client whenever it encounters a tag or a run of character data, and the caller is responsible for retaining and assembling any information it wants to extract.
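For comparison, here is roughly what the SimpleXML route looks like for a small document. This is only a sketch: the sample XML, its element names, and the `$titles` variable are made up for illustration, and the whole tree is held in memory, which is exactly what I want to avoid for large files.

```php
<?php
// Hypothetical sample document; a real script would usually load a file instead.
$xml = '<library>'
     . '<book id="1"><title>Dune</title></book>'
     . '<book id="2"><title>Emma</title></book>'
     . '</library>';

// simplexml_load_string() parses the entire document up front
// and returns false if it is not well-formed.
$library = simplexml_load_string($xml);
if ($library === false) {
    die("could not parse XML");
}

$titles = array();
foreach ($library->book as $book) {
    // Child elements are read as properties, attributes as array offsets.
    $titles[] = (string) $book['id'] . ": " . (string) $book->title;
}
echo implode("\n", $titles) . "\n";
```

Convenient, but every element stays in memory until `$library` goes away.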
I’ve included the code below, and here’s a zip file of the example code together with a test XML file. It’s an expanded version of the example from the PHP manual, with the addition of character data handling and the storage of some data during the parsing. It takes the input XML file and outputs an indented version of all tags, showing any character data associated with each tag.
<?php
$file = "example.xml";

// State is stored per parser so the same handlers could serve several parsers.
$depth = array();
$currenttagname = array();
$currenttagvalue = array();

// Array keys must be integers or strings, so derive one from the parser:
// an XMLParser object on PHP 8+, a resource on earlier versions.
function parserKey($parser)
{
    return is_object($parser) ? spl_object_id($parser) : (int) $parser;
}

// Called for each opening tag: print the name and start collecting its text.
function onStartElement($parser, $name, $attrs)
{
    global $depth, $currenttagname, $currenttagvalue;
    $key = parserKey($parser);
    if (!isset($depth[$key])) {
        $depth[$key] = 0;
        $currenttagname[$key] = array();
        $currenttagvalue[$key] = array();
    }
    echo str_repeat(" ", $depth[$key]) . "$name\n";
    $depth[$key]++;
    $currentdepth = $depth[$key];
    $currenttagname[$key][$currentdepth] = $name;
    $currenttagvalue[$key][$currentdepth] = ""; // no character data seen yet
}

// Called for each closing tag: print the name with any text collected for it.
function onEndElement($parser, $name)
{
    global $depth, $currenttagname, $currenttagvalue;
    $key = parserKey($parser);
    $currentdepth = $depth[$key];
    $storedname = $currenttagname[$key][$currentdepth];
    $storedvalue = $currenttagvalue[$key][$currentdepth];
    echo str_repeat(" ", $currentdepth) . $storedname;
    if ($storedvalue !== "") {
        // Escape the text because it is echoed into an HTML page.
        echo " = " . htmlspecialchars($storedvalue);
    }
    echo "\n";
    $depth[$key]--;
}

// Called with chunks of text; one text node may arrive in several calls.
function onCharacterData($parser, $data)
{
    global $depth, $currenttagvalue;
    $key = parserKey($parser);
    if (empty($depth[$key])) {
        return; // ignore character data outside of tags
    }
    // ignore new lines
    $data = str_replace(array("\n", "\r"), "", $data);
    $currenttagvalue[$key][$depth[$key]] .= $data;
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "onStartElement", "onEndElement");
xml_set_character_data_handler($xml_parser, "onCharacterData");
if (!($fp = fopen($file, "r"))) {
    die("could not open XML input");
}
?>
<html>
<head><title>PHP XML Parsing Example</title></head>
<body><pre>
<?php
while ($data = fread($fp, 4096)) {
    // The third argument tells the parser when the final chunk has arrived.
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}
xml_parser_free($xml_parser);
fclose($fp);
?>
</pre></body>
</html>
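The global arrays in the script above can be avoided by keeping the parsing state in an object. What follows is a minimal sketch of that idea rather than the code from the zip file: the class name `TagCollector`, its method names, and the `$lines` property are all hypothetical, it parses a string instead of reading a file in chunks, and it collects the output lines in an array instead of echoing them. The handlers are registered as `array($this, 'method')` callables with the same `xml_set_*` functions.

```php
<?php
// TagCollector is a hypothetical name used for illustration only.
class TagCollector
{
    private $depth = 0;
    private $values = array(); // text collected so far, indexed by depth
    public $lines = array();   // output lines, in document order

    // Parse a complete XML string; returns true if it was well-formed.
    public function parse($xml)
    {
        $parser = xml_parser_create();
        xml_set_element_handler($parser, array($this, 'onStart'), array($this, 'onEnd'));
        xml_set_character_data_handler($parser, array($this, 'onText'));
        // true marks this as the final (and only) chunk of input.
        return xml_parse($parser, $xml, true) === 1;
    }

    public function onStart($parser, $name, $attrs)
    {
        $this->lines[] = str_repeat(" ", $this->depth) . $name;
        $this->depth++;
        $this->values[$this->depth] = "";
    }

    public function onEnd($parser, $name)
    {
        $line = str_repeat(" ", $this->depth) . $name;
        if ($this->values[$this->depth] !== "") {
            $line .= " = " . $this->values[$this->depth];
        }
        $this->lines[] = $line;
        $this->depth--;
    }

    public function onText($parser, $data)
    {
        if ($this->depth === 0) {
            return; // ignore character data outside of tags
        }
        $this->values[$this->depth] .= str_replace(array("\n", "\r"), "", $data);
    }
}

// Usage: tag names come back uppercased because case folding is on by default.
$collector = new TagCollector();
$ok = $collector->parse("<note><to>Ann</to></note>");
echo implode("\n", $collector->lines) . "\n";
```

Because each `TagCollector` carries its own depth and values, several documents can be parsed side by side without the per-parser array bookkeeping.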