Python XML: Getting unnecessary new lines when adding new element


I'm trying to add a new element and have it properly indented, but with this code I get unnecessary blank lines between elements. What is causing this, and how do I fix it? Thanks.
Example:
from xml.dom import minidom
import xml.etree.ElementTree as ET
def example(name, category):
    tree = ET.parse("example1.xml")
    root = tree.getroot()
    for i in root:
        if i.tag == category:
            ET.SubElement(i, name).text = name
    xmlStr = minidom.parseString(ET.tostring(root)).toprettyxml(indent=" ")
    with open("example1.xml", "w") as f:
        f.write(xmlStr)
example("test", 'FRUITS')
XML File:
<?xml version="1.0" ?>
<root>
    <FRUITS>
        <APPLE>apple</APPLE>
        <PEAR>pear</PEAR>
        <PLUM>plum</PLUM>
    </FRUITS>
    <VEGETABLES>
        <CARROT>carrot</CARROT>
        <POTATO>potato</POTATO>
    </VEGETABLES>
</root>

Related

Converting unusual XML file to CSV using Python

I'm having an issue with my XML file. I would like to achieve the same as in: https://www.delftstack.com/howto/python/xml-to-csv-python/
However, my XML file looks a bit different, for example:
<students>
    <student name="Rick Grimes" rollnumber="1" age="15"/>
    <student name="Lori Grimes" rollnumber="2" age="16"/>
    <student name="Judith Grimes" rollnumber="4" age="13"/>
</students>
The code specified in the link does not work with this formatting.
from xml.etree import ElementTree
tree = ElementTree.parse("input.xml")
root = tree.getroot()
for student in root:
    name = student.find("name").text
    roll_number = student.find("rollnumber").text
    age = student.find("age").text
    print(f"{name},{roll_number},{age}")
I have very little coding experience, so hoping someone on here can help me out.
Expected result:
Rick Grimes,1,15
Lori Grimes,2,16
Carl Grimes,3,14
Judith Grimes,4,13
Actual result:
AttributeError: 'NoneType' object has no attribute 'text'

text refers to the text content between an element's opening and closing tags. To make it clear:
<student> text here </student>
Your elements have no text because the tags are self-closing. What you are looking for is the element's attrib dictionary, which holds the tag's attributes.
Something like this should help you get what you're looking for:
for student in root:
    print(student.attrib)

You cannot get the text if there isn't any text to get.
Instead, use .attrib[key], since your values are stored as attributes.
I have modified your example so that it will work with your XML file.
from xml.etree import ElementTree
tree = ElementTree.parse("input.xml")
root = tree.getroot()
for student in root:
    name = student.attrib["name"]
    roll_number = student.attrib["rollnumber"]
    age = student.attrib["age"]
    print(f"{name},{roll_number},{age}")
I hope this will help you.
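If the goal is an actual CSV file rather than printed lines, the same attribute lookup can feed the standard csv module. This is a minimal sketch; the students_to_csv helper, the FIELDS list, and the inline sample are assumptions made for illustration.

```python
import csv
import io
from xml.etree import ElementTree

# Illustrative sample matching the XML from the question.
XML_TEXT = """<students>
    <student name="Rick Grimes" rollnumber="1" age="15"/>
    <student name="Lori Grimes" rollnumber="2" age="16"/>
    <student name="Judith Grimes" rollnumber="4" age="13"/>
</students>"""

FIELDS = ["name", "rollnumber", "age"]


def students_to_csv(xml_text, fields=FIELDS):
    """Return one CSV row per <student>, built from its attributes."""
    root = ElementTree.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for student in root.iter("student"):
        # .get() falls back to "" when an attribute is missing.
        writer.writerow([student.get(field, "") for field in fields])
    return buffer.getvalue()


print(students_to_csv(XML_TEXT), end="")
```

Using csv.writer rather than string formatting means values containing commas or quotes are quoted correctly.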

import io
from xml.etree import ElementTree
xml_string = """<students>
<student name="Rick Grimes" rollnumber="1" age="15"/>
<student name="Lori Grimes" rollnumber="2" age="16"/>
<student name="Judith Grimes" rollnumber="4" age="13"/>
</students>"""
file = io.StringIO(xml_string)
tree = ElementTree.parse(file)
root = tree.getroot()
result = ""
for student in root:
    result += f"{student.attrib['name']},{student.attrib['rollnumber']},{student.attrib['age']} "
print(result)
result
Rick Grimes,1,15 Lori Grimes,2,16 Judith Grimes,4,13

For such simply structured XML you can also use the built-in pandas function in two lines of code:
import pandas as pd
df = pd.read_xml('caroline.xml', xpath='.//student')
df.to_csv('caroline.csv', index=False)
# For visualization only
with open('caroline.csv', 'r') as f:
    lines = f.readlines()
for line in lines:
    print(line, end='')
Output:
name,rollnumber,age
Rick Grimes,1,15
Lori Grimes,2,16
Judith Grimes,4,13
Passing header=False to to_csv omits the header row from the CSV file.

How to avoid double escape using XML

I'm using Python to write a program that has to write data into an XML tag of a specific file.
The line of data I want to write is the following.
&lt;Stream&gt;XXXX-XXXX-XXXX-XXXX?p=0&lt;/Stream&gt;&lt;URL&gt;rtmp://a.rtmp.youtube.com/live2&lt;/URL&gt;
But what I get in my XML file after writing is quite different.
&amp;lt;Stream&amp;gt;XXXX-XXXX-XXXX-XXXX?p=0&amp;lt;/Stream&amp;gt;&amp;lt;URL&amp;gt;rtmp://a.rtmp.youtube.com/live2&amp;lt;/URL&amp;gt;
The &lt; and &gt; are there on purpose and are NOT < and >. I need to keep this formatting, but when I export the XML file, every & is replaced by &amp;.
I use this code to write data in the xml file:
from lxml import etree as ET
Name_with_single_quote= """IF [Calculation_1] = 'Day-1' THEN [begintime] + 1
ELSEIF[Calculation_1] < 'Day-2' THEN [begintime] + 2
ELSEIF [Calculation_1] > "Day-3" THEN [begintime] + 3
ELSE [begintime]
END"""
Name_with_single_quote = Name_with_single_quote.replace("\n", "&#10;").replace("<", "&lt;").replace("'", "&apos;").replace(">", "&gt;").replace("\"", "&quot;")
Name_with_single_quote = str(Name_with_single_quote)
xml = """<?xml version="1.0"?>
<column role="dimension" type="nominal" name="[Calculation_1]" datatype="boolean" caption="">
<calculation formula=""/>
</column>"""
tree = ET.fromstring(xml)
formula = tree.find('.//calculation')
formula.set('formula', Name_with_single_quote)
from xml.dom import minidom
xmlstr = minidom.parseString(ET.tostring(tree)).toprettyxml()
xmlstr = '\n'.join(list(filter(lambda x: len(x.strip()), xmlstr.split('\n'))))
with open('test_for_esc_result.xml', "w") as f:
    f.write(xmlstr)
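A sketch of why the double escape happens, assuming the value is escaped by hand before it is handed to the XML library: the serializer escapes the & again, turning &lt; into &amp;lt;. The example uses the standard library's xml.etree.ElementTree in place of lxml (an assumption made to keep it dependency-free), and the shortened formula text is illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative version of the formula from the question.
formula_text = "IF [Calculation_1] < 'Day-2' THEN [begintime] + 2\nELSE [begintime]\nEND"

column = ET.fromstring(
    '<column name="[Calculation_1]"><calculation formula=""/></column>'
)
calculation = column.find(".//calculation")

# Escaping by hand first: the serializer then escapes the "&" a second time.
calculation.set("formula", formula_text.replace("<", "&lt;"))
double_escaped = ET.tostring(column, encoding="unicode")

# Passing the raw text: each special character is escaped exactly once.
calculation.set("formula", formula_text)
escaped_once = ET.tostring(column, encoding="unicode")

print(double_escaped)
print(escaped_once)
```

Passing the raw text and letting the serializer do the escaping is usually enough; parsing the output again returns the original string.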

How can I parse the below XML data using Python?

Source XML
<?xml version='1.0' encoding='UTF-8'?>
<ProcessType xmlns:xmi="http://www.omg.org/XMI" xmi:version="2.0" defaultContext="Default">
    <node componentName="tRedshiftRow" componentVersion="0.102" offsetLabelX="0" offsetLabelY="0" posX="-32" posY="96">
        <elementParameter field="TECHNICAL" name="QUERYSTORE:QUERYSTORE_TYPE" value="BUILT_IN"/>
        <elementParameter field="TEXT" name="DBNAME" value="&quot;&quot;"/>
        <elementParameter field="TEXT" name="SCHEMA_DB" value="&quot;&quot;"/>
        <elementParameter field="MEMO_SQL" name="QUERY" value="&quot;DELETE FROM schema.tablename;&quot;"/>
    </node>
</ProcessType>
I want to get only the DELETE statement, from the element whose name attribute is "QUERY", and write it to a text file.
Expected output: DELETE FROM schema.tablename;
I tried the following, which didn't work:
from lxml import etree, objectify
import xml.etree.ElementTree as ET
def convert_xml_to_comp():
    metadata = 'source.xml'
    parser = etree.XMLParser(remove_blank_text=True)
    tree = etree.parse(metadata, parser)
    root = tree.getroot()
    for elem in root.getiterator():
        # print(elem)
        i = elem.tag.find('}')
        if i >= 0:
            elem.tag = elem.tag[i+1 :]
    objectify.deannotate(root, cleanup_namespaces=True)
    tree.write('done.xml', pretty_print=True, xml_declaration=True, encoding='UTF-8')
    tree = ET.parse('done.xml')
    root = tree.getroot()
    def get_sql_text():
        file = open( "newdelete.txt", "w")
        for root in tree.getroot():
            ### Get the elements' names ###
            for elementParameter in root.iterfind('elementParameter[@name="UNIQUE_NAME"]') :
                name=elementParameter.get('value')
            ### Get the elements' name and SQL ###
            for elementParameter in root.iterfind('elementParameter[@name="QUERY"]') :
                #print (root.attrib)
                val=elementParameter.get('value')
                print(root.find('val[@value="DELETE FROM schema.tablename;"]'))
        file.close()
    get_sql_text()
if __name__ == '__main__':
    convert_xml_to_comp()

You can do all of this in just a couple of statements using an XPath query. Something like:
>>> from lxml import etree
>>> doc = etree.parse(open('data.xml'))
>>> query = doc.xpath('//elementParameter[@name="QUERY"]')[0].get('value')
>>> print(query)
"DELETE FROM schema.tablename;"
This finds all the elementParameter elements with name="QUERY" and then returns the value of the value attribute of the first one.
To select just those elements that contain "DELETE" in their value attribute, use the contains() function:
>>> doc.xpath('//elementParameter[@name="QUERY" and contains(@value, "DELETE")]')
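For comparison, a minimal sketch using only the standard library, assuming the value attribute is wrapped in literal double quotes as in the source XML. xml.etree.ElementTree supports the [@name="QUERY"] predicate but not contains(), so the keyword check is done in Python. The extract_queries helper and the trimmed SOURCE_XML sample are illustrative.

```python
import xml.etree.ElementTree as ET

# Trimmed, illustrative copy of the source XML (XML declaration omitted).
SOURCE_XML = """<ProcessType xmlns:xmi="http://www.omg.org/XMI" xmi:version="2.0" defaultContext="Default">
    <node componentName="tRedshiftRow">
        <elementParameter field="TEXT" name="DBNAME" value="&quot;&quot;"/>
        <elementParameter field="MEMO_SQL" name="QUERY" value="&quot;DELETE FROM schema.tablename;&quot;"/>
    </node>
</ProcessType>"""


def extract_queries(xml_text, keyword="DELETE"):
    """Return the unquoted QUERY values that contain `keyword`."""
    root = ET.fromstring(xml_text)
    queries = []
    for param in root.iterfind('.//elementParameter[@name="QUERY"]'):
        # Drop the literal double quotes that surround the stored statement.
        value = param.get("value", "").strip('"')
        if keyword in value:
            queries.append(value)
    return queries


queries = extract_queries(SOURCE_XML)
print(queries)
```

To produce the text file, the returned list could be joined with newlines and written out with an ordinary open(..., "w") call.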

Python: Writing list to xml file

I'm trying to write the list elements to an XML file with the code below. The XML file is created, but the data is repeated. I can't figure out why the data is written twice in the XML file.
users_list = ['Group1User1', 'Group1User2', 'Group2User1', 'Group2User2']
def create_xml(self):
    usrconfig = Element("usrconfig")
    usrconfig = ET.SubElement(usrconfig,"usrconfig")
    for user in range(len( users_list)):
        usr = ET.SubElement(usrconfig,"usr")
        usr.text = str(users_list[user])
    usrconfig.extend(usrconfig)
    tree = ET.ElementTree(usrconfig)
    tree.write("details.xml",encoding='utf-8', xml_declaration=True)
Output File: details.xml
<usr>Group1User1</usr>
<usr>Group1User2</usr>
<usr>Group2User1</usr>
<usr>Group2User2</usr>
<usr>Group1User1</usr>
<usr>Group1User2</usr>
<usr>Group2User1</usr>
<usr>Group2User2</usr>

usrconfig.extend(usrconfig)
This line looks suspicious to me. If usrconfig were a list, this line would be equivalent to "duplicate every element in this list". I suspect that something similar happens for Elements, too. Try deleting that line.
import xml.etree.ElementTree as ET
users_list = ["Group1User1", "Group1User2", "Group2User1", "Group2User2"]
def create_xml():
    usrconfig = ET.Element("usrconfig")
    usrconfig = ET.SubElement(usrconfig,"usrconfig")
    for user in range(len( users_list)):
        usr = ET.SubElement(usrconfig,"usr")
        usr.text = str(users_list[user])
    tree = ET.ElementTree(usrconfig)
    tree.write("details.xml",encoding='utf-8', xml_declaration=True)
create_xml()
Result:
<?xml version='1.0' encoding='utf-8'?>
<usrconfig>
<usr>Group1User1</usr>
<usr>Group1User2</usr>
<usr>Group2User1</usr>
<usr>Group2User2</usr>
</usrconfig>
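The suspicion can be checked with a small sketch: extending an element with its own children appends the same child elements a second time. The build helper and the shortened users_list are illustrative, and the children are copied with list() first so the result does not depend on how a particular ElementTree implementation iterates while appending.

```python
import xml.etree.ElementTree as ET

users_list = ["Group1User1", "Group1User2"]


def build(duplicate):
    """Serialize <usrconfig>, optionally extending it with its own children."""
    usrconfig = ET.Element("usrconfig")
    for user in users_list:
        ET.SubElement(usrconfig, "usr").text = user
    if duplicate:
        # Comparable to usrconfig.extend(usrconfig) in the question,
        # but iterating over a snapshot of the children.
        usrconfig.extend(list(usrconfig))
    return ET.tostring(usrconfig, encoding="unicode")


print(build(duplicate=False))
print(build(duplicate=True))
```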

For such a simple XML structure, we can write the file out directly. This technique might also be useful if you are not yet up to speed with the Python XML modules.
import os
users_list = ["Group1User1", "Group1User2", "Group2User1", "Group2User2"]
os.chdir("C:\\Users\\Mike\\Desktop")
xml_out_DD = open("test.xml", 'wb')
xml_out_DD.write(bytes('<usrconfig>', 'utf-8'))
for i in range(0, len(users_list)):
    xml_out_DD.write(bytes('<usr>' + users_list[i] + '</usr>', 'utf-8'))
xml_out_DD.write(bytes('</usrconfig>', 'utf-8'))
xml_out_DD.close()
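One caveat with writing tags by hand is that characters such as & or < in a user name would produce invalid XML. A hedged sketch of the same idea with escaping added, using xml.sax.saxutils.escape from the standard library; the users_to_xml helper is an illustrative name, and it returns a string instead of writing to a file.

```python
from xml.sax.saxutils import escape


def users_to_xml(users):
    """Build the <usrconfig> document by hand, escaping each user name."""
    parts = ["<usrconfig>"]
    for user in users:
        # escape() replaces &, < and > with their XML entities.
        parts.append("<usr>" + escape(user) + "</usr>")
    parts.append("</usrconfig>")
    return "".join(parts)


print(users_to_xml(["Group1User1", "R&D <admin>"]))
```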

variable in XML subelement

I'm trying to write Python code that creates ElementTree SubElements dynamically.
I have a hierarchical header that describes a piece of a book, as follows:
<Books>
    <Booktype List= "Story > Fiction > Young">
    #here the rest of book text
    </Booktype>
    <Booktype List= "Science > Math > Young">
    #here the rest of book text
    </Booktype>
</Books>
How can I get a hierarchical XML structure like this:
<Books>
    <Booktype>
        <Story>
            <Fiction>
                <Young>
                #here the rest of book text
                </Young>
            </Fiction>
        </Story>
    </Booktype>
</Books>
This is my code:
import re
import xml.etree.ElementTree as ET
from xml.etree import ElementTree
List= "Story>Fiction>Young"
List = List.split('>')
root = ET.Element('Books')
Booktype =ET.SubElement(root,'Booktype')
for l in List:
    ND = ET.SubElement(Booktype,str(l))
    Booktype.append(ND)
tree = ET.ElementTree(root)
ElementTree.tostring(root,'utf-8')
I got this bad result:
'<Books><Booktype><Story /><Story /><Story /><Fiction /><Fiction /><Young /><Young /><Story /><Story /><Fiction /><Fiction /><Young /><Young /></Booktype></Books>'

If you want to nest the list elements, you have to keep a reference to the previous one so you can add the child element to it, and not to the Booktype element. See the variable current in the examples.
from xml.etree import ElementTree as ET
xml_string = '''<Books>
<Booktype List= "Story > Fiction > Young">
#here the rest of book text
</Booktype>
<Booktype List= "Science > Math > Young">
#here the rest of book text 2
</Booktype>
</Books>
'''
xml = ET.fromstring(xml_string)
for booktype in xml.findall('Booktype'):
    types = [t.strip() for t in booktype.get('List').split('>')]
    current = booktype
    for t in types:
        # Each new element becomes the parent of the next one.
        current = ET.SubElement(current, t)
    current.text = booktype.text
    booktype.text = ''
    del booktype.attrib['List']
print(ET.tostring(xml, encoding='unicode'))
Gives me the result:
<Books>
<Booktype><Story><Fiction><Young>
#here the rest of book text
</Young></Fiction></Story></Booktype>
<Booktype><Science><Math><Young>
#here the rest of book text 2
</Young></Math></Science></Booktype>
</Books>
And if you want to create a completely new structure you can do:
xml = ET.fromstring(xml_string)
root = ET.Element('Books')
for booktype in xml.findall('Booktype'):
    current = ET.SubElement(root, 'Booktype')
    for t in [t.strip() for t in booktype.get('List').split('>')]:
        current = ET.SubElement(current, t)
    current.text = booktype.text
print(ET.tostring(root, encoding='unicode'))
