XML Parsing: Using MINIDOM Vs Element Tree (etree) in Python

Author

Brief About XML- XML is Extensible Markup Language which stores information in Hierarchical structure. It is similar to HTML but allows you to make your own tags in a totally customized way.
The best part of XML is Self-Descriptive so easy to store data into XML & Awesome in exchanging between different sources

Now Let’s come to parsing

Parsing XML means by using a program we generate an internal representation of serialized XML tags. It forms a tree structure internally in memory which is easy to maintain.

minidom vs elementtree

Sample XML Code: sample.xml

<?xml version=”1.0” encoding=” UTF-8”?>
<person id=”001”>
<fname>Ram</fname>
<lname>kumar</lname>
<home resident=”local”>Delhi</home>
<salary>1000</salary>
</person>
<?xml>

Let’s see how MINIDOM parse XML:

It is a library provided by python which is a minimal implementation of a Document object model.
Step1: from xml.dom import minidom //Import minidom library
Step2: root=minidom.parse(‘sample.xml’) //take file as input and parse
Step3: print(root)

Output: < xml.dom.minidom.Document object at 0x03A5B308>
Once you execute this we will be able to devide XML file and retrieve the required data.

Finding Element of Interest using getElementByTagName:
Tag=root.getElementbyTagName(‘home’)[0] // I have put [0] for one output
print(Tag)

Output: < DOM Element: item at 0xc4bc00>
To access value of attribute
data=minidom.parse(‘sample.xml’)
tag=data.getElementbyTagName(‘home’)
print(tag[0].attributes(‘resident’).value)

Output: local
To retrieve data present in these tags
Print(tag[0].firstchild.data)

Output: Delhi
Now Let’s come across Element Tree implementation:
(with same xml code)

First using Parse() function:
Step1: import xml.etree.ElementTree as ET
Step2: tree=ET.parse(‘sample.xml’)
Step3: root=tree.getroot()
Step4: print(root)

Output: <Element ‘metadata’ at 0x03A5B308>
Fetching tags using loop
for x in root[0]
print (x.tag, x.attrib)

Output: fname {} lname {} home{‘resident’:’local’} salary {}
If want to print value stored in tags
for x in root[0]
print(x.text)

Output:
Ram Kumar Delhi 1000
Modifying XML File:
The elements present in xml file can be manipulated by using set() function. Let us see how?
for lname in root.iter(‘lname’):
New_lname=str(lname.text)+’nath’
lname.text=str(New_lname)
tree.write()
Output: The last name will be added with nath
Performance comparison b/w minidom and element tree

Pranshu Goyal, Director of Products at Mirekta, states: “We envision DSM to be used by every small to a medium-sized organization dealing with bad data and want to get rid of duplicates easily with no cost. We have faced issues dealing with duplicates in our organization. That inspired us to make a solution that is not only simple to use but can be used widely to make the organization’s data clean to make them more efficient and productive. We want DSM to be a solution for every organization looking for duplicate management capability better than the Salesforce out-of-the-box solution with no additional cost.”

Recent Posts

Categories

Featured by