Parsing XML using ElementTree Python


Parsing an XML file with ElementTree in Python.
Here is the file:
<?xml version='1.0' encoding='utf-8'?>
<Device fqdm="DESKTOP-4OB3072">
<IP>192.168.203.1</IP>
<MAC>00:00:00:00:00:00</MAC>
</Device>
I receive the error below when I try to parse the file and retrieve the value of the 'fqdm' attribute:
"xml.etree.ElementTree.ParseError: junk after document element: line 2, column 90"
Here is the parsing code (please ignore the stupid file handling, it will be changed):
with open('received_file.xml', 'a+') as f:
    while True:
        data = conn.recv(BUFFER_SIZE)
        print data
        if not data:
            f.close()
            break
        f.write(data)
    f.close()
g = open('received_file.xml', 'r+')
tree = ET.parse(g)
root = tree.getroot()
print root
test = root.find('./Device').attrib['fqdm']
print test
sock.close()

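Try this. getroot() already returns the <Device> element, so root.find('./Device') looks for a child named Device, finds none, and returns None. Read the attribute from the root itself:
with open('received_file.xml', 'a+') as f:
    while True:
        data = conn.recv(BUFFER_SIZE)
        print data
        if not data:
            f.close()
            break
        f.write(data)
    f.close()
g = open('received_file.xml', 'r+')
tree = ET.parse(g)
root = tree.getroot()
# root is the <Device> element, so its attributes are available directly
attributes = root.attrib
print root
test = attributes['fqdm']
print test
sock.close()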

Your parse error is at column 90, but the xml snippet you shared only has 32 columns. If this file is generated by your socket object, you probably have extra unprintable characters following the valid xml in line 2. The code that creates this file probably needs to be updated to properly terminate the strings in the lines it receives.
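To illustrate the point above: the parser raises this exact error whenever anything other than whitespace, comments, or processing instructions follows the closing tag of the root element. The sketch below uses a made-up trailing fragment (`<Extra/>`) as a stand-in for whatever the socket actually delivered. As a side note, if the sender writes the whole Device element on one line, that line is 90 characters long, so column 90 would be the position right after `</Device>`. That is a guess about the real file, not something the question confirms.

```python
import xml.etree.ElementTree as ET

# Shortened version of the document from the question, on a single line.
VALID = b'<Device fqdm="DESKTOP-4OB3072"><IP>192.168.203.1</IP></Device>'


def parse_status(payload):
    """Return the root tag, or the parser's error message if parsing fails."""
    try:
        return ET.fromstring(payload).tag
    except ET.ParseError as exc:
        return str(exc)


clean_result = parse_status(VALID)
# Trailing whitespace after the root element is allowed.
padded_result = parse_status(VALID + b"\n\n")
# Further markup after the root element is rejected.
# "<Extra/>" is a hypothetical stand-in for the unknown trailing bytes.
junk_result = parse_status(VALID + b"<Extra/>")

print(clean_result)
print(padded_result)
print(junk_result)
```

Printing repr() of the received data, or opening the file in a hex viewer, should show what actually follows the closing tag.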

yourTag.attrib.get("the_attribute")
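A short sketch of that one-liner applied to the document from the question. It assumes the file parses cleanly, and the attribute name "missing" is made up to show the fallback behaviour.

```python
import xml.etree.ElementTree as ET

XML_BYTES = b"""<?xml version='1.0' encoding='utf-8'?>
<Device fqdm="DESKTOP-4OB3072">
<IP>192.168.203.1</IP>
<MAC>00:00:00:00:00:00</MAC>
</Device>"""

root = ET.fromstring(XML_BYTES)

# The root element is <Device> itself, so there is no child called Device.
child_lookup = root.find('./Device')

# attrib is a plain dict: .get() returns None for a missing key,
# while attrib['missing'] would raise KeyError.
fqdm = root.attrib.get('fqdm')
missing = root.attrib.get('missing')

# Element.get() is a shortcut for attrib.get() and accepts a default.
with_default = root.get('missing', 'unknown')

print(root.tag, child_lookup, fqdm, missing, with_default)
```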

Related

Search and replace strings in XML using python

I am trying to replace certain words in my .xml file with others, but I am struggling a bit.
I have been using this code so far:
import xml.etree.ElementTree as ET
with open('Rom1.xml', encoding="utf8") as f:
    tree = ET.parse(f)
    #root = tree.find('ExportedObjects')
    root = tree.getroot()
    for elem in root.iter():
        try:
            elem.text = elem.text.replace('Rom1', 'Rom2')
        except AttributeError:
            pass
Rom1.xml: this is a snapshot from the XML file showing the structure.
The XML file is pretty big, but it contains the string 'Rom1' 41 times and I would like to replace all of them.
I know a simple search and replace in a text editor does the job, but I want to automate this since I will do it for several hundred files.
Any help is appreciated :)

If there is no possibility of ambiguity, then you could just do this:
with open('Rom1.xml', encoding='utf-8', mode='r+') as xml:
    content = xml.read().replace('Rom1', 'Rom2')
    xml.seek(0)
    xml.write(content)
    xml.truncate()
In this case the truncate() call is not necessary because both strings have the same length. However, if the second argument to replace() were shorter than the first, the file would keep stale bytes at the end without it. Just leave it there to account for all eventualities.
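If the replacement should stay inside the parsed tree instead, here is a rough sketch. The code in the question only changes the in-memory tree and never writes it back, and it only looks at elem.text. The helper below is hypothetical: it also covers tail text and attribute values. The sample document is made up, apart from the ExportedObjects name taken from the question's commented-out line.

```python
import xml.etree.ElementTree as ET


def replace_in_tree(root, old, new):
    """Replace old with new in the text, tail and attribute values of every element."""
    for elem in root.iter():
        if elem.text:
            elem.text = elem.text.replace(old, new)
        if elem.tail:
            elem.tail = elem.tail.replace(old, new)
        # Iterate over a copy so the attribute dict can be updated safely.
        for key, value in list(elem.attrib.items()):
            elem.set(key, value.replace(old, new))
    return root


# Made-up sample; the real Rom1.xml structure is not shown in the question.
SAMPLE = '<ExportedObjects><Object name="Rom1">Rom1 sensor</Object></ExportedObjects>'

root = replace_in_tree(ET.fromstring(SAMPLE), 'Rom1', 'Rom2')
result = ET.tostring(root, encoding='unicode')
print(result)
```

To save the result for a real file, call tree.write('Rom2.xml', encoding='utf-8', xml_declaration=True) on the ElementTree object after the replacement. Unlike the plain text approach, this leaves tag names and comments untouched.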

OK, so I tried something else with great success:
import xml.etree.ElementTree as ET
Rom2 = input('Number: ')
input_file = "Rom1.xml"
output_file = Rom2+".xml"
with open(input_file) as f:
    xml_content = f.readlines()
with open(output_file, 'w+') as f:
    for line in xml_content:
        f.write(line.replace('Rom1', Rom2))
But if I want to replace a second string, for example 'SQ4XXX' with 'SQ4050', then it replaces both and keeps the old as well? I'm confused.
import xml.etree.ElementTree as ET
Rom2 = input('Number: ')
sq = input('SQ: ')
input_file = "Rom1.xml"
output_file = Rom2+".xml"
with open(input_file) as f:
    xml_content = f.readlines()
with open(output_file, 'w+') as f:
    for line in xml_content:
        f.write(line.replace('Rom1', Rom2))
        f.write(line.replace('SQ4XXX', sq))
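The reason for the doubled output is that each f.write() call writes the whole line, so two calls per line produce two copies: one with only 'Rom1' replaced and one with only 'SQ4XXX' replaced. A minimal sketch of one way around it is to apply every replacement to the line first and write once. The helper name and the sample lines are made up for illustration.

```python
def apply_replacements(line, replacements):
    """Apply each (old, new) pair in turn and return the final line."""
    for old, new in replacements:
        line = line.replace(old, new)
    return line


# Made-up sample lines standing in for the content of Rom1.xml.
lines = ['<name>Rom1</name>\n', '<id>SQ4XXX</id>\n']
pairs = [('Rom1', 'Rom2'), ('SQ4XXX', 'SQ4050')]

# One output line per input line, with all replacements applied.
result = [apply_replacements(line, pairs) for line in lines]
print(''.join(result))
```

Inside the original loop this becomes a single f.write(apply_replacements(line, pairs)) call.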

OK, I got it working like I wanted, thanks for the help guys!
Here's the final code:
import xml.etree.ElementTree as ET
Rom2 = input('Number: ')
sq4 = input('SQ4: ')
sq5 = input('SQ5: ')
input_file = "Rom1.xml"
output_file = Rom2+".xml"
with open(input_file) as f:
    xml_content = f.readlines()
with open(output_file, 'w+') as f:
    for line in xml_content:
        f.write(line.replace('Rom1', Rom2))
with open(output_file, encoding='utf-8', mode='r+') as xml:
    content = xml.read().replace('SQ4XXX', sq4)
    xml.seek(0)
    xml.write(content)
    xml.truncate()
with open(output_file, encoding='utf-8', mode='r+') as xml:
    content = xml.read().replace('SQ5XXX', sq5)
    xml.seek(0)
    xml.write(content)
    xml.truncate()

create a condition that separates text.tag when parsing xml with python

I have an XML file, file.xml, that I download from a source. Inside its Details element there are two OrderDetail elements. I decode their content and write a new XML file, which I then parse to get some information.
<root>
<Details>
<OrderDetail ParentLineID="">H4sIAAAAAAAEAOy963LbyJbn+/
lMxLwDwtO7qnaMYeF+8d7VHZJolV0lWypRLu/u6g4HCC
QljClCmwTLdn+aFzkvd57k4EJSJAGIyJWQ8E+Ve3qqLd7
XSlxW/jLzl3//ty83E+UPNpvHyfTHZ/oL7dm//ev//B9/P
06m4/hqMQvS7==
</OrderDetail>
<OrderDetail ParentLineID="">H4sIAAAAAAAEAOy963LbyJbn+/
lMxLwDwtO7qnaMYeF+8d7VHZJolV0lWypRLu/u6g4HCC
QljClCmwTLdn+aFzkvd57k4EJSJAGIyJWQ8E+Ve3qqLd7
XSlxW/jLzl3//ty83E+UPNpvHyfTHZ/oL7dm//ev//B9/P
06m4/hqMQvS7==
</OrderDetail>
</Details>
</root>
import base64
import zlib
import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()
DEST_FILE_NAME = "XMLparser\\decompresed.xml"
def translate_to_file():
    for child in root.iter('OrderDetail'):
        child.get('ParentLineID')
        # 16 + zlib.MAX_WBITS makes zlib expect a gzip header
        result = zlib.decompress(base64.b64decode(child.text), 16 + zlib.MAX_WBITS).decode('utf-8')
        # mode "w" overwrites the same file on every pass through the loop
        with open(DEST_FILE_NAME, "w") as file:
            file.write(result)
def read_file():
    with open(DEST_FILE_NAME) as file:
        return file.readlines()
def clean_file(lines):
    with open(DEST_FILE_NAME, 'w') as file:
        # drop blank lines
        lines = filter(lambda x: x.strip(), lines)
        file.writelines(lines)
def main():
    translate_to_file()
    lines = read_file()
    clean_file(lines)
main()
When this file is decoded, it creates an XML file.
How can I create two separate XML files, one for each OrderDetail? That is, decompress the first base64 payload into one XML file and the second payload into another XML file.
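One possible approach, sketched under assumptions: number the OrderDetail elements with enumerate() and build a distinct file name for each, so no payload overwrites another. The function name and the order_detail_N.xml naming scheme are hypothetical. The decoding step is the same base64 plus gzip combination used in the question.

```python
import base64
import os
import xml.etree.ElementTree as ET
import zlib


def write_order_details(xml_text, out_dir):
    """Decode every OrderDetail payload and write each one to its own file."""
    root = ET.fromstring(xml_text)
    written = []
    for index, detail in enumerate(root.iter('OrderDetail'), start=1):
        raw = base64.b64decode(detail.text)
        # 16 + MAX_WBITS tells zlib to expect a gzip header, as in the question.
        text = zlib.decompress(raw, 16 + zlib.MAX_WBITS).decode('utf-8')
        # Drop blank lines, which is what clean_file() does in the question.
        lines = [line for line in text.splitlines() if line.strip()]
        # Hypothetical naming scheme: one numbered file per OrderDetail.
        path = os.path.join(out_dir, 'order_detail_%d.xml' % index)
        with open(path, 'w', encoding='utf-8') as handle:
            handle.write('\n'.join(lines) + '\n')
        written.append(path)
    return written
```

Calling write_order_details(open('file.xml').read(), 'XMLparser') would then produce order_detail_1.xml and order_detail_2.xml for the two elements shown above, assuming the XMLparser directory already exists.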

Python: Writing list to xml file

I'm trying to write the list elements to an xml file. I have written the below code. The xml file is created, but the data is repeated. I'm unable to figure out why is the data written twice in the xml file.
users_list = ['Group1User1', 'Group1User2', 'Group2User1', 'Group2User2']
def create_xml(self):
    usrconfig = Element("usrconfig")
    usrconfig = ET.SubElement(usrconfig,"usrconfig")
    for user in range(len( users_list)):
        usr = ET.SubElement(usrconfig,"usr")
        usr.text = str(users_list[user])
    usrconfig.extend(usrconfig)
    tree = ET.ElementTree(usrconfig)
    tree.write("details.xml",encoding='utf-8', xml_declaration=True)
Output File: details.xml
<usr>Group1User1</usr>
<usr>Group1User2</usr>
<usr>Group2User1</usr>
<usr>Group2User2</usr>
<usr>Group1User1</usr>
<usr>Group1User2</usr>
<usr>Group2User1</usr>
<usr>Group2User2</usr>

usrconfig.extend(usrconfig)
This line looks suspicious to me. If usrconfig were a list, this line would be equivalent to "duplicate every element in this list". I suspect that something similar happens for Elements, too. Try deleting that line.
import xml.etree.ElementTree as ET
users_list = ["Group1User1", "Group1User2", "Group2User1", "Group2User2"]
def create_xml():
    usrconfig = ET.Element("usrconfig")
    usrconfig = ET.SubElement(usrconfig, "usrconfig")
    for user in range(len(users_list)):
        usr = ET.SubElement(usrconfig, "usr")
        usr.text = str(users_list[user])
    tree = ET.ElementTree(usrconfig)
    tree.write("details.xml", encoding='utf-8', xml_declaration=True)
create_xml()
Result:
<?xml version='1.0' encoding='utf-8'?>
<usrconfig>
<usr>Group1User1</usr>
<usr>Group1User2</usr>
<usr>Group2User1</usr>
<usr>Group2User2</usr>
</usrconfig>
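For anyone curious why the removed extend() call doubled the output, here is a small sketch of the mechanism. It passes list(parent), a snapshot of the current children, rather than the element itself as the question did. That is a deliberate simplification so the example does not depend on how extend() handles being given its own element.

```python
import xml.etree.ElementTree as ET

parent = ET.Element('usrconfig')
for name in ['Group1User1', 'Group1User2']:
    ET.SubElement(parent, 'usr').text = name

before = len(parent)
# extend() appends every element of the given sequence as a child.
# Passing the element's own children appends each of them a second time.
parent.extend(list(parent))
after = len(parent)

doubled = ET.tostring(parent, encoding='unicode')
print(before, after)
print(doubled)
```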

For such a simple XML structure, we can write the file out directly. This technique might also be useful if one is not up to speed with the Python XML modules.
import os
users_list = ["Group1User1", "Group1User2", "Group2User1", "Group2User2"]
os.chdir("C:\\Users\\Mike\\Desktop")
# The with block closes the file even if a write fails.
with open("test.xml", 'wb') as xml_out_DD:
    xml_out_DD.write(bytes('<usrconfig>', 'utf-8'))
    for user in users_list:
        xml_out_DD.write(bytes('<usr>' + user + '</usr>', 'utf-8'))
    xml_out_DD.write(bytes('</usrconfig>', 'utf-8'))
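One caveat with writing XML by hand: characters such as & and < in the data must be escaped, otherwise the file is no longer well-formed. A minimal sketch using escape() from the standard library follows. The function name and the second user name are made up to show the effect.

```python
from xml.sax.saxutils import escape


def users_to_xml(users):
    """Build the usrconfig document as a string, escaping each user name."""
    parts = ['<usrconfig>']
    for user in users:
        # escape() converts &, < and > into their XML entities.
        parts.append('<usr>%s</usr>' % escape(user))
    parts.append('</usrconfig>')
    return ''.join(parts)


# 'R&D <admin>' is a made-up name containing characters that need escaping.
xml_text = users_to_xml(['Group1User1', 'R&D <admin>'])
print(xml_text)
```

ElementTree does this escaping automatically, which is one reason the answers above prefer it.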

Convert CSV document to XML

I know this question has been asked before, but I tried all the Python code that I found and adapted it to my file, and none of it worked. I need to find a way to convert my file myData.csv into an XML file that can be read by a web browser.
I just started to learn Python this month, so I'm a beginner. This is my code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
import csv, sys, os
from lxml import etree
csvFile = 'myData.csv' # variable holding the CSV file name
reader = csv.reader(open(csvFile), delimiter=';', quoting=csv.QUOTE_NONE) # reader that yields the rows of the CSV table
print "<data>"
for record in reader:
    if reader.line_num == 1:
        header = record
    else:
        innerXml = ""
        dontShow = False
        type = ""
        for i, field in enumerate(record):
            innerXml += "<%s>" % header[i].lower() + field + "</%s>" % header[i].lower()
            if i == 1 and field == "0":
                type = "Next"
            elif type == "" and i == 3 and field == "0":
                type = "Next"
            elif type == "" and i == 3 and field != "0":
                type = "film"
            if i == 1 and field == "X":
                dontShow = True
        if dontShow == False:
            xml = "<%s>" % type
            xml += innerXml
            xml += "</%s>" % type
            print xml
print "</data>"

Consider building your XML with dedicated DOM objects rather than by concatenating strings; the lxml module lets you do this. Using methods such as Element() and SubElement(), you can build the XML tree iteratively while reading the CSV data:
import csv
import lxml.etree as ET
headers = ['Titre', 'Realisateur', 'Date_Debut_Evenement', 'Date_Fin_Evenement', 'Cadre',
           'Lieu', 'Adresse', 'Arrondissement', 'Adresse_complète', 'Geo_Coordinates']
# INITIALIZING XML FILE
root = ET.Element('root')
# READING CSV FILE AND BUILD TREE
with open('myData.csv') as f:
    next(f) # SKIP HEADER
    csvreader = csv.reader(f)
    for row in csvreader:
        data = ET.SubElement(root, "data")
        for col in range(len(headers)):
            ET.SubElement(data, headers[col]).text = str(row[col])
# SERIALIZE XML TO BYTES
tree_out = ET.tostring(root, pretty_print=True, xml_declaration=True, encoding="UTF-8")
# OUTPUTTING XML CONTENT TO FILE
with open('Output.xml', 'wb') as f:
    f.write(tree_out)
Output
<?xml version='1.0' encoding='UTF-8'?>
<root>
<data>
<Titre>1</Titre>
<Realisateur>BUS PALLADIUM</Realisateur>
<Date_Debut_Evenement>CHRISTOPHER THOMPSON</Date_Debut_Evenement>
<Date_Fin_Evenement>21 mai 2009</Date_Fin_Evenement>
<Cadre>21 mai 2009</Cadre>
<Lieu>EXTERIEUR</Lieu>
<Adresse>PLACE</Adresse>
<Arrondissement>PIGALLE</Arrondissement>
<Adresse_complète>75018</Adresse_complète>
<Geo_Coordinates>PLACE PIGALLE 75018 Paris France</Geo_Coordinates>
</data>
<data>
<Titre>2</Titre>
<Realisateur>LES INVITES DE MON PERE</Realisateur>
<Date_Debut_Evenement>ANNE LE NY</Date_Debut_Evenement>
<Date_Fin_Evenement>20 mai 2009</Date_Fin_Evenement>
<Cadre>20 mai 2009</Cadre>
<Lieu>DOMAINE PUBLIC</Lieu>
<Adresse>SQUARE</Adresse>
<Arrondissement>DU CLIGNANCOURT</Arrondissement>
<Adresse_complète>75018</Adresse_complète>
<Geo_Coordinates>SQUARE DU CLIGNANCOURT 75018 Paris France</Geo_Coordinates>
</data>
<data>
<Titre>3</Titre>
<Realisateur>DEMAIN, A L'AUBE</Realisateur>
<Date_Debut_Evenement>GAEL CABOUAT</Date_Debut_Evenement>
<Date_Fin_Evenement>17 avril 2009</Date_Fin_Evenement>
<Cadre>17 avril 2009</Cadre>
<Lieu>EXTERIEUR</Lieu>
<Adresse>RUE</Adresse>
<Arrondissement>QUINCAMPOIX</Arrondissement>
<Adresse_complète>75004</Adresse_complète>
<Geo_Coordinates>RUE QUINCAMPOIX 75004 Paris France</Geo_Coordinates>
</data>
...
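A variation on the same idea using only the standard library, offered as a sketch rather than a drop-in replacement. It takes the tag names from the CSV header row through csv.DictReader and uses the ';' delimiter from the question. It assumes every header is a valid XML element name. The function name and the two-column sample are made up.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_file, delimiter=';', row_tag='data'):
    """Build an element tree with one row_tag element per CSV row."""
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    root = ET.Element('root')
    for row in reader:
        record = ET.SubElement(root, row_tag)
        for name, value in row.items():
            if name is None:
                # DictReader stores surplus fields under the key None; skip them.
                continue
            ET.SubElement(record, name).text = value
    return root


# Made-up two-column sample in the ';' separated layout from the question.
sample = io.StringIO("Titre;Realisateur\nBUS PALLADIUM;CHRISTOPHER THOMPSON\n")
xml_text = ET.tostring(csv_to_xml(sample), encoding='unicode')
print(xml_text)
```

For a real file, pass open('myData.csv', newline='', encoding='utf-8') instead of the StringIO object, and save the result with ET.ElementTree(root).write('Output.xml', encoding='utf-8', xml_declaration=True).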

(posted as an answer so I can show a code block)
There are a lot of picky details when writing XML. In Python, you should probably use some version of ElementTree to help with that. One good tutorial is Creating XML Documents. Quoting from there:
from xml.etree.ElementTree import Element, SubElement, Comment, tostring
top = Element('top')
comment = Comment('Generated for PyMOTW')
top.append(comment)
child = SubElement(top, 'child')
child.text = 'This child contains text.'
child_with_tail = SubElement(top, 'child_with_tail')
child_with_tail.text = 'This child has regular text.'
child_with_tail.tail = 'And "tail" text.'
child_with_entity_ref = SubElement(top, 'child_with_entity_ref')
child_with_entity_ref.text = 'This & that'
print(tostring(top))
If you use this as an example of how to create a tree of XML elements, you should be able to translate your code into the XML structure you need.

Import pandas and save the file name:
import pandas as pd
csvFile = 'myData.csv'
The following reads the CSV into a pandas data frame, then converts it to XML (DataFrame.to_xml requires pandas 1.3 or later):
df = pd.read_csv(csvFile)
df_xml = df.to_xml()
The code below creates a new file named "csv2xml.xml" and saves the XML data to it:
with open("csv2xml.xml", "w") as f:
    f.write(df_xml)

Problems with parsing xml

I have some code that parses an XML file and saves it as a CSV. I can do this in two ways: by manually downloading the XML file and then parsing it, or by taking the XML feed directly, using ET.fromstring, and then parsing it. When I go directly I get data errors, which appear to be an integrity issue. I am trying to include the XML download in the code, but I am not sure of the best way to approach this.
import xml.etree.ElementTree as ET
import csv
import urllib
url = 'http://www.capitalbikeshare.com/data/stations/bikeStations.xml'
connection = urllib.urlopen(url)
data = connection.read()
#I need code here!!!
tree = ET.parse('bikeStations.xml')
root = tree.getroot()
#for child in root:
    #print child.tag, child.attrib
locations = []
for station in root.findall('station'):
    name = station.find('name').text
    bikes = station.find('nbBikes').text
    docks = station.find('nbEmptyDocks').text
    time = station.find('latestUpdateTime').text
    sublist = [name, bikes, docks, time]
    locations.append(sublist)
    #print 'Station:', name, 'has', bikes, 'bikes and' ,docks, 'docks'
#print locations
s = open('statuslog.csv', 'wb')
w = csv.writer(s)
w.writerows(locations)
s.close()
f = open('filelog.csv', 'ab')
w = csv.writer(f)
w.writerows(locations)
f.close()

What you need is:
root = ET.fromstring(data)
and omit the line: tree = ET.parse('bikeStations.xml')
Because connection.read() returns the XML document as a string, you can parse it directly with the fromstring method. fromstring returns the root element itself, so the root = tree.getroot() line must be dropped as well.
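A Python 3 flavoured sketch of the same idea, under a few assumptions: urllib.request.urlopen replaces the Python 2 urllib.urlopen call, read() returns bytes (which fromstring accepts), and the sample document is made up in the shape the question's code expects. The download() helper is defined but not called here, so the example runs without network access.

```python
import csv
import io
import urllib.request
import xml.etree.ElementTree as ET

FIELDS = ('name', 'nbBikes', 'nbEmptyDocks', 'latestUpdateTime')


def download(url):
    """Fetch the feed and return the raw bytes of the response body."""
    with urllib.request.urlopen(url) as connection:
        return connection.read()


def stations_to_rows(xml_data):
    """Parse the feed and return one [name, bikes, docks, time] list per station."""
    root = ET.fromstring(xml_data)  # fromstring returns the root element
    return [[station.findtext(field) for field in FIELDS]
            for station in root.findall('station')]


# Made-up sample; the real feed is not reproduced in the question.
SAMPLE = b"""<stations>
  <station><name>Example St</name><nbBikes>5</nbBikes>
    <nbEmptyDocks>10</nbEmptyDocks><latestUpdateTime>1352230000000</latestUpdateTime></station>
</stations>"""

rows = stations_to_rows(SAMPLE)
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

In real use, rows = stations_to_rows(download(url)) would replace the sample, and in Python 3 the CSV files should be opened in text mode with newline='' rather than 'wb'.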
