An application developer can choose from a number of strategies to read and use an XML document. In some very simple cases a script containing a few regular expressions might do the job, but normally a more rigorous technique is required. The Simple API for XML (SAX) is one of the two key techniques for analysing and processing XML documents (the other is the more complicated Document Object Model (DOM)). Read the article here.
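To make the SAX idea concrete, here is a minimal sketch using Python's standard `xml.sax` module. It is not taken from the linked article: the `TitleCollector` class, the `collect_titles` helper, and the sample document are all hypothetical names made up for illustration. The parser streams through the document and calls the handler's methods as it meets each start tag, text run, and end tag, so no tree is ever built in memory.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler that collects the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Called once for each opening tag.
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        # Called once for each closing tag.
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def collect_titles(xml_text):
    """Parse an XML string with SAX and return the list of title texts."""
    handler = TitleCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


# Made-up sample document, for illustration only.
SAMPLE = (
    "<library>"
    "<book><title>SAX Basics</title></book>"
    "<book><title>DOM in Depth</title></book>"
    "</library>"
)

if __name__ == "__main__":
    for title in collect_titles(SAMPLE):
        print(title)
```

A DOM approach would load the whole document first and then search the resulting tree; the event-driven style above trades that convenience for lower memory use on large inputs.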
SAX processing in Python
2004-05-26 General Development 5 Comments
I tried basing my site around Python XML as a test to get used to parsing XML when I was a college sophomore. The speeds I got were terrible. In my junior year I asked a professor why that was, and he asked me which parser it uses. I told him expat, and he said that was why: expat is slower than frozen dogshit on a flat surface.
It would be really nice if the Python team would switch over to using LibXML2 and LibXSLT. Since both of those libraries can be built on any of the Python-supported platforms, there’s no reason for them to stick with such a slow C-based parser.
Well, there are bindings for libxml2 and libxslt IIRC.
To me this is a non-problem; usually I don’t need to be blazing fast when working with XML.
But the reason to use an astoundingly simple API at the cost of slowness (REXML comes to mind) is the same as the reason to use “scripting languages” over C-ish stuff: humans solve problems, and machines are the ones that should make it fast.
And, in the end, are you sure that you can’t get good enough performance using a fast API like PULL?
Expat can’t be slower than Python scripts
Expat is widely regarded as the fastest XML parser around. If there is a performance issue, it’s in Python-land.
No flame here.
I’ve tried Expat/Sablotron/LibXML in real-life scenarios and I must say LibXML is the fastest. It also has useful features not found in the other parsers (e.g. passing an already parsed DOM tree as a parameter to the parser). My experience was with Perl and C; I haven’t tried it with Python, but the case here is still the same: Expat was the slowest solution.