June 13, 2004, 5:41 p.m.
IT

DOM vs SAX

I always knew DOM and XPath were not as fast as SAX, but I had never realized exactly how slow they are when applied to a huge XML document.

By huge I mean 5MB+. I had a 32MB XML document consisting of a root element that contained about 148000 child elements, each with 8 attributes. I needed to iterate over all of these elements, retrieve the attribute values and write them to a database. I know the right tool for the job is SAX, but I insisted on using DOM and XPath because I am well acquainted with them. Needless to say, by extrapolation the job would have taken 46 days; for a task that needs to run once a day, that was obviously unacceptable. Here is an excerpt of that code:

int vCnt = Utils.getInt(pXML.valueOf("count(/" + XML_DOCUMENT_NAME + "/article)"));
for (int i = 1; i <= vCnt; i++) { // XPath positions start at 1
  String vArtNo = pXML.valueOf("/" + XML_DOCUMENT_NAME + "/article[" + i + "]/@article_no");
  String vArtUOM = pXML.valueOf("/" + XML_DOCUMENT_NAME + "/article[" + i + "]/@article_uom");
  String vDescription = pXML.valueOf("/" + XML_DOCUMENT_NAME + "/article[" + i + "]/@description");
  // ...
  // Do something with them
}

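What slows this process down so much is that the XPath expression passed to valueOf() is evaluated against the whole document for every single attribute. Each evaluation starts at the root and has to work through the 148000 article elements to find the right one, so the cost of one lookup grows roughly with the number of elements, and the total cost with its square. Selecting the article nodes once and querying each node directly reduced the time to 55 minutes: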

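// Evaluate the absolute XPath only once to get every article node.
// Node is assumed to be org.dom4j.Node, whose selectNodes() returns a java.util.List.
List vNodes = pXML.selectNodes("/" + XML_DOCUMENT_NAME + "/article");
for (Iterator it = vNodes.iterator(); it.hasNext();) {
  Node vNode = (Node) it.next();
  // Short relative expressions, evaluated against this one element only
  String vArtNo = vNode.valueOf("@article_no");
  String vArtUOM = vNode.valueOf("@article_uom");
  String vDescription = vNode.valueOf("@description");
  // ...
  // Do something with them
}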
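The difference between the two loops can be reproduced with nothing but the JDK. The sketch below is not the library used above: it swaps in the standard javax.xml.xpath and W3C DOM APIs, and the class name, the root element `articles` and the synthetic document are all hypothetical. The slow method mirrors the first excerpt (one absolute, positional XPath per value). The fast method selects the nodes once and then reads each attribute with `Element.getAttribute`, which is the plain-DOM counterpart of a relative `@article_no` lookup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathLoopDemo {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Slow pattern: every value re-runs an absolute, positional XPath from the
    // document root, so the engine has to locate the i-th article each time.
    static List<String> positional(Document doc, XPath xp) throws Exception {
        List<String> out = new ArrayList<>();
        int count = ((Double) xp.evaluate("count(/articles/article)", doc,
                XPathConstants.NUMBER)).intValue();
        for (int i = 1; i <= count; i++) { // XPath positions start at 1
            out.add(xp.evaluate("/articles/article[" + i + "]/@article_no", doc));
        }
        return out;
    }

    // Faster pattern: select the <article> elements once, then read each
    // attribute straight off the element node.
    static List<String> selectOnce(Document doc, XPath xp) throws Exception {
        List<String> out = new ArrayList<>();
        NodeList nodes = (NodeList) xp.evaluate("/articles/article", doc,
                XPathConstants.NODESET);
        for (int i = 0; i < nodes.getLength(); i++) { // NodeList indices start at 0
            Element article = (Element) nodes.item(i);
            out.add(article.getAttribute("article_no"));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical synthetic document; attribute names mirror the post.
        StringBuilder sb = new StringBuilder("<articles>");
        for (int i = 0; i < 2000; i++) {
            sb.append("<article article_no=\"").append(i).append("\" article_uom=\"EA\"/>");
        }
        sb.append("</articles>");
        Document doc = parse(sb.toString());
        XPath xp = XPathFactory.newInstance().newXPath();

        long t0 = System.nanoTime();
        List<String> slow = positional(doc, xp);
        long t1 = System.nanoTime();
        List<String> fast = selectOnce(doc, xp);
        long t2 = System.nanoTime();

        System.out.println("same values: " + slow.equals(fast));
        System.out.printf("positional: %d ms, select-once: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

Absolute timings depend on the XPath engine and machine, so treat the printed numbers as a relative comparison only; raising the element count makes the gap between the two methods widen quickly.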

It just shows what a difference a simple little optimization can make.
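SAX, which the post names as the right tool for this job, avoids building a tree at all: the parser streams through the file and reports each start tag with its attributes. Below is a minimal sketch using the JDK's SAX API, assuming the same `article` element and attribute names as the excerpts above. The `Article` holder, the `read` method and the `sink` callback are hypothetical; in the real job the sink would write each row to the database.

```java
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public final class ArticleSaxReader {

    /** Hypothetical holder for three of the attributes used in the post. */
    public static final class Article {
        public final String artNo;
        public final String artUom;
        public final String description;

        Article(String artNo, String artUom, String description) {
            this.artNo = artNo;
            this.artUom = artUom;
            this.description = description;
        }
    }

    /**
     * Streams the document and passes each <article> element's attributes to
     * the sink as soon as its start tag is parsed; no document tree is kept.
     */
    public static void read(InputSource source, Consumer<Article> sink) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(source, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) {
                // The factory is not namespace-aware by default, so the
                // element name arrives in qName.
                if ("article".equals(qName)) {
                    // Copy the values out now: the parser may reuse atts.
                    sink.accept(new Article(
                            atts.getValue("article_no"),
                            atts.getValue("article_uom"),
                            atts.getValue("description")));
                }
            }
        });
    }

    public static void main(String[] args) throws Exception {
        String xml = "<articles>"
                + "<article article_no=\"1001\" article_uom=\"EA\" description=\"Bolt\"/>"
                + "<article article_no=\"1002\" article_uom=\"BOX\" description=\"Nuts\"/>"
                + "</articles>";
        read(new InputSource(new StringReader(xml)),
             a -> System.out.println(a.artNo + " " + a.artUom + " " + a.description));
    }
}
```

Because each element is handled once in document order and then discarded, the cost grows linearly with file size and memory use stays flat. Note that this sketch matches `article` elements at any depth, not only direct children of the root.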