Linked by Eugenia Loli on Wed 26th May 2004 15:55 UTC
General Development
An application developer can choose any one of a number of strategies to read and use an XML document. In some very simple examples a script containing a number of regular expressions might do the job, but normally a more rigorous technique is required. The Simple API for XML (SAX) is one of the two key techniques for analysing and processing XML documents (the other is the more complicated Document Object Model (DOM)). Read the article here.
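For readers who have not seen SAX in practice, here is a minimal sketch of the event-driven style the summary refers to, using Python's standard `xml.sax` module (which uses expat as its default parser). It is not taken from the linked article; the `TitleCollector` class, the `collect_titles` helper, and the sample document are hypothetical names made up for illustration.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler: gathers the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._chunks = []

    def startElement(self, name, attrs):
        # Called once for each opening tag as the parser streams the input.
        if name == "title":
            self._in_title = True
            self._chunks = []

    def characters(self, content):
        # SAX may deliver one run of text in several pieces, so accumulate.
        if self._in_title:
            self._chunks.append(content)

    def endElement(self, name):
        # Called once for each closing tag.
        if name == "title":
            self.titles.append("".join(self._chunks))
            self._in_title = False


def collect_titles(xml_bytes):
    """Parse a bytes document and return the list of title strings."""
    handler = TitleCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles


# Made-up sample document, for illustration only.
SAMPLE = (
    b"<library>"
    b"<book><title>SAX Basics</title></book>"
    b"<book><title>DOM in Depth</title></book>"
    b"</library>"
)

if __name__ == "__main__":
    print(collect_titles(SAMPLE))
```

The parser never builds a tree here: the handler keeps only the state it needs, which is the main trade-off against DOM.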
Python's base XML support needs to be redone
by Mike on Wed 26th May 2004 17:32 UTC

I tried basing my site around Python XML as a test to get used to parsing XML when I was a college sophomore. The speeds I got were terrible. In my junior year I asked a professor why, and he asked me which parser it uses. I told him expat, and he said that was why: expat is slower than frozen dogshit on a flat surface.

It would be really nice if the Python team would switch over to using LibXML2 and LibXSLT. Since both of those libraries can be built on any of the Python-supported platforms, there's no reason for them to stick with such a slow C-based parser.

Re: Python's base XML support needs to be redone
by fuser on Wed 26th May 2004 18:05 UTC

Well, there are bindings for libxml2 and libxslt, IIRC.
To me this is a non-problem; usually I don't need to be blazing fast when working with XML.

But the reason to use an astoundingly simple API at the cost of slowness (REXML comes to mind) is the same as the reason to use "scripting languages" over C-ish stuff: humans solve problems, and machines are the ones that should make it fast.

And, in the end, are you sure that you can't get good-enough performance using a fast API like PULL?

RE: Python's base XML support needs to be redone
by Anonymous on Wed 26th May 2004 20:24 UTC

Expat can't be slower than python scripts ;)

sex processing!
by sexor on Wed 26th May 2004 21:54 UTC

@Mike
by Rayiner Hashem on Thu 27th May 2004 00:19 UTC

Expat is widely regarded as the fastest XML parser around. If there is a performance issue, it's in Python-land.

@Rayiner Hashem
by Pavel Penchev on Thu 27th May 2004 10:47 UTC

No flame here.

I've tried Expat/Sablotron/LibXML in real-life scenarios and I must say LibXML is the fastest. It also has useful features not found in the other parsers (e.g. passing an already parsed DOM tree as a parameter to the parser). My experience was with Perl and C; I haven't tried it with Python, but the case here is still the same: Expat was the slowest solution.