Python parse and modify XML elements and subelements

I'm using ElementTree to parse and modify an XML file with the structure below. The actual file is much bigger (Platz_1 to Platz_250), but the structure is the same. I want to set the text of every element and subelement of Platz_X to "0" whenever the text of the "_Name" element of Platz_X is None, and then continue with the next Platz (Platz_X+1).
My problem is that when I loop through the file to check all the values, I don't know how to break out of the inner loops, set all the texts to "0", and continue with the next Platz.
import xml.etree.ElementTree as ET
tree = ET.parse(xml)
root = tree.getroot()
for sub_wkz in root:  # iterate directly; getchildren() was removed in Python 3.9
    for platz in sub_wkz:
        for child in platz:
            if child.text:
                if len(child.text.split()) > 0:
                    var = child.text
            for subchild in child:
                if subchild.text:
                    if len(subchild.text.split()) > 0:  # check subchild, not child
                        var_sub = subchild.text
<?xml version='1.0' encoding='utf-8'?>
<Maschine>
<INDUSTRIE_WKZ_1>
<Platz_1>
<_Name>6006003</_Name>
<_Duplo>1</_Duplo>
<_Zustand>131</_Zustand>
<Schneide_1>
<_Sollstandzeit>60,0</_Sollstandzeit>
<_Iststandzeit>50,58213424682617</_Iststandzeit>
<_Vorwarngrenze>10,0</_Vorwarngrenze>
<_Laenge_L1>237,89599609375</_Laenge_L1>
<_Laenge_L2>0</_Laenge_L2>
<_Radius>0</_Radius>
</Schneide_1>
<Schneide_2>
<_Sollstandzeit>0</_Sollstandzeit>
<_Iststandzeit>0</_Iststandzeit>
<_Vorwarngrenze>0</_Vorwarngrenze>
<_Laenge_L1>0</_Laenge_L1>
<_Laenge_L2>0</_Laenge_L2>
<_Radius>0</_Radius>
</Schneide_2>
<Schneide_3>
<_Sollstandzeit>0</_Sollstandzeit>
<_Iststandzeit>0</_Iststandzeit>
<_Vorwarngrenze>0</_Vorwarngrenze>
<_Laenge_L1>0</_Laenge_L1>
<_Laenge_L2>0</_Laenge_L2>
<_Radius>0</_Radius>
</Schneide_3>
<Schneide_4>
<_Sollstandzeit>0</_Sollstandzeit>
<_Iststandzeit>0</_Iststandzeit>
<_Vorwarngrenze>0</_Vorwarngrenze>
<_Laenge_L1>0</_Laenge_L1>
<_Laenge_L2>0</_Laenge_L2>
<_Radius>0</_Radius>
</Schneide_4>
</Platz_1>
<INDUSTRIE_WKZ_1>
<Maschine>

I changed the XML you provided a bit:
- added the missing slash (/) to the INDUSTRIE_WKZ_1 closing tag
- added the missing slash (/) to the Maschine closing tag
- removed Schneide_2 through Schneide_4 for brevity (the code works fine with them)
- added a Platz_2 whose _Name is empty (if that is what you mean by "is None") in an INDUSTRIE_WKZ_2 (so the code is shown to work with multiple "WKZ")
This is the input file I used:
<?xml version='1.0' encoding='utf-8'?>
<Maschine>
<INDUSTRIE_WKZ_1>
<Platz_1>
<_Name>6006003</_Name>
<_Duplo>1</_Duplo>
<_Zustand>131</_Zustand>
<Schneide_1>
<_Sollstandzeit>60,0</_Sollstandzeit>
<_Iststandzeit>50,58213424682617</_Iststandzeit>
<_Vorwarngrenze>10,0</_Vorwarngrenze>
<_Laenge_L1>237,89599609375</_Laenge_L1>
<_Laenge_L2>0</_Laenge_L2>
<_Radius>0</_Radius>
</Schneide_1>
</Platz_1>
</INDUSTRIE_WKZ_1>
<INDUSTRIE_WKZ_2>
<Platz_2>
<_Name></_Name>
<_Duplo>1</_Duplo>
<_Zustand>131</_Zustand>
<Schneide_1>
<_Sollstandzeit>60,0</_Sollstandzeit>
<_Iststandzeit>50,58213424682617</_Iststandzeit>
<_Vorwarngrenze>10,0</_Vorwarngrenze>
<_Laenge_L1>237,89599609375</_Laenge_L1>
<_Laenge_L2>0</_Laenge_L2>
<_Radius>0</_Radius>
</Schneide_1>
</Platz_2>
</INDUSTRIE_WKZ_2>
</Maschine>
I assume there is only one Maschine and that it only contains INDUSTRIE_WKZ_* which contains Platz_*.
Here is my code:
from itertools import islice
import xml.etree.ElementTree as ET
src_xmlfile_name = "68253543.xml"
dst_xmlfile_name = "68253543_post.xml"
tree = ET.parse(src_xmlfile_name)
root = tree.getroot()
for platz_elem in root.findall("*/*"):  # all "Platz" children of "WKZ" children of the root
    platz_name_elem = platz_elem.find("_Name")
    if platz_name_elem is not None and platz_name_elem.text is None:
        # set all values in this Platz's descendants to "0"
        for platz_descendant in islice(platz_elem.iter(), 1, None):  # skip the first one, which is the "Platz" elem itself
            if (platz_descendant.tag != "_Name"  # leave "_Name" untouched
                    and platz_descendant.text is not None  # skip elements that have no text
                    and platz_descendant.text.strip() != ""):  # skip whitespace-only containers such as Schneide_*
                platz_descendant.text = "0"
tree.write(dst_xmlfile_name, encoding="utf-8", xml_declaration=True)
which produces this output:
<?xml version='1.0' encoding='utf-8'?>
<Maschine>
<INDUSTRIE_WKZ_1>
<Platz_1>
<_Name>6006003</_Name>
<_Duplo>1</_Duplo>
<_Zustand>131</_Zustand>
<Schneide_1>
<_Sollstandzeit>60,0</_Sollstandzeit>
<_Iststandzeit>50,58213424682617</_Iststandzeit>
<_Vorwarngrenze>10,0</_Vorwarngrenze>
<_Laenge_L1>237,89599609375</_Laenge_L1>
<_Laenge_L2>0</_Laenge_L2>
<_Radius>0</_Radius>
</Schneide_1>
</Platz_1>
</INDUSTRIE_WKZ_1>
<INDUSTRIE_WKZ_2>
<Platz_2>
<_Name />
<_Duplo>0</_Duplo>
<_Zustand>0</_Zustand>
<Schneide_1>
<_Sollstandzeit>0</_Sollstandzeit>
<_Iststandzeit>0</_Iststandzeit>
<_Vorwarngrenze>0</_Vorwarngrenze>
<_Laenge_L1>0</_Laenge_L1>
<_Laenge_L2>0</_Laenge_L2>
<_Radius>0</_Radius>
</Schneide_1>
</Platz_2>
</INDUSTRIE_WKZ_2>
</Maschine>
(including the XML declaration in the output file is based on this answer)
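The question asks specifically how to stop checking and continue with the next Platz. As a minimal sketch of that control flow (not the answer's original code), the snippet below uses `continue` plus a small helper. The helper name `reset_platz` and the shortened inline sample are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample with one named and one unnamed Platz.
SAMPLE = """<Maschine>
  <INDUSTRIE_WKZ_1>
    <Platz_1>
      <_Name>6006003</_Name>
      <_Duplo>1</_Duplo>
      <Schneide_1><_Radius>2,5</_Radius></Schneide_1>
    </Platz_1>
    <Platz_2>
      <_Name></_Name>
      <_Duplo>1</_Duplo>
      <Schneide_1><_Radius>2,5</_Radius></Schneide_1>
    </Platz_2>
  </INDUSTRIE_WKZ_1>
</Maschine>"""


def reset_platz(platz):
    """Set the text of every value-carrying descendant of `platz` to "0"."""
    for elem in platz.iter():
        if elem is platz or elem.tag == "_Name":
            continue  # leave the Platz itself and its _Name alone
        if elem.text and elem.text.strip():
            elem.text = "0"


root = ET.fromstring(SAMPLE)
for platz in root.findall("*/*"):
    name = platz.findtext("_Name")  # None if _Name is missing, "" if it is empty
    if name and name.strip():
        continue  # this Platz has a name: skip it and move on to the next one
    reset_platz(platz)

print(ET.tostring(root, encoding="unicode"))
```

Moving the reset into its own function means the outer loop never has to break out of nested loops: it either skips the Platz or resets it as a whole.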

Related

Read XML with Python tree.getroot

I am new to Python, and I have this XML and this code. The XML is an invoice, where "SalesOrderRet" is the header and each "SalesOrderLineRet" is one line of the invoice. My problem is that I don't know how to read the SalesOrderLineRet elements individually for each header. The code I have here gives me all the "SalesOrderLineRet" elements from the entire XML, not just the ones belonging to the current header.
def read_xml():
    tree = ET.parse('LastResponse.xml')
    root = tree.getroot()
    form_data = {}
    collection = db["tracking"]
    for item in root.iter('SalesOrderRet'):
        WO = item.find('RefNumber').text
        TimeCreatedQB = item.find('TimeCreated').text
        Client = item.find('CustomerRef/FullName').text
        for items in root.iter('SalesOrderLineRet'):
            descrip = getattr(items.find('Desc'), 'text', None)

For an XML file like this,
<?xml version="1.0"?>
<data>
<SalesOrderRet>
<SalesOrderLineRet>
<RefNumber>1</RefNumber>
<TimeCreated>0:00</TimeCreated>
<CustomerRef>
<FullName>John Doe</FullName>
</CustomerRef>
</SalesOrderLineRet>
<SalesOrderLineRet>
<RefNumber>2</RefNumber>
<TimeCreated>0:00</TimeCreated>
<CustomerRef>
<FullName>Jack Doe</FullName>
</CustomerRef>
</SalesOrderLineRet>
</SalesOrderRet>
<SalesOrderRet>
<SalesOrderLineRet>
<RefNumber>3</RefNumber>
<TimeCreated>0:00</TimeCreated>
<CustomerRef>
<FullName>Mary Doe</FullName>
</CustomerRef>
</SalesOrderLineRet>
<SalesOrderLineRet>
<RefNumber>4</RefNumber>
<TimeCreated>0:00</TimeCreated>
<CustomerRef>
<FullName>Susan Doe</FullName>
</CustomerRef>
</SalesOrderLineRet>
</SalesOrderRet>
</data>
This function reads each order's lines separately: the inner loop searches from the current <SalesOrderRet> element rather than from the root, so it only sees that order's <SalesOrderLineRet> elements. If you need to keep the values, index each <SalesOrderRet> and store its lines under that index.
import xml.etree.ElementTree as ET
def get_xml(filename):
    tree = ET.parse(filename)
    root = tree.getroot()
    for SalesOrderRet in root:
        print(SalesOrderRet.tag, SalesOrderRet.attrib)
        # iterate from the current header, not from root
        for SalesOrderLineRet in SalesOrderRet.iter('SalesOrderLineRet'):
            print(' ', SalesOrderLineRet.tag, SalesOrderLineRet.attrib)
            WO = SalesOrderLineRet.find('RefNumber').text
            TimeCreatedQB = SalesOrderLineRet.find('TimeCreated').text
            Client = SalesOrderLineRet.find('CustomerRef/FullName').text
            print(' ', WO, TimeCreatedQB, Client)
This code is based on the docs.
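To store the values instead of printing them, one option is a dictionary keyed by the position of each header. This is only a sketch: the inline sample is a shortened stand-in for the XML above, and the dictionary layout (`orders`, with "RefNumber" and "Client" keys) is an assumption chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the same shape as the XML above.
SAMPLE = """<data>
  <SalesOrderRet>
    <SalesOrderLineRet><RefNumber>1</RefNumber><CustomerRef><FullName>John Doe</FullName></CustomerRef></SalesOrderLineRet>
    <SalesOrderLineRet><RefNumber>2</RefNumber><CustomerRef><FullName>Jack Doe</FullName></CustomerRef></SalesOrderLineRet>
  </SalesOrderRet>
  <SalesOrderRet>
    <SalesOrderLineRet><RefNumber>3</RefNumber><CustomerRef><FullName>Mary Doe</FullName></CustomerRef></SalesOrderLineRet>
  </SalesOrderRet>
</data>"""

root = ET.fromstring(SAMPLE)
orders = {}
for index, order in enumerate(root.findall("SalesOrderRet")):
    # Search from `order`, not from `root`, so only this order's lines match.
    orders[index] = [
        {
            "RefNumber": line.findtext("RefNumber"),
            "Client": line.findtext("CustomerRef/FullName"),
        }
        for line in order.findall("SalesOrderLineRet")
    ]

print(orders)
```

`findtext` returns None when the child is missing, which avoids the AttributeError that `find(...).text` raises in that case.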

Iterating through xml file

I am trying to get all surnames from an XML file, but when I use find, it throws an exception:
TypeError: 'NoneType' object is not iterable
This is my code:
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
for elem in root:
    for subelem in elem:
        for subsubelem in subelem.find('surname'):
            print(subsubelem.text)
When I remove the find('surname') from the code, it returns the text of all sub-subelements.
This is xml:
<?xml version="1.0" encoding="UTF-8"?>
<pp:card xmlns:pp="http://xmlns.page.com/path/subpath">
<pp:id>1</pp:id>
<pp:customers>
<pp:customer>
<pp:name>John</pp:name>
<pp:surname>Walker</pp:surname>
<pp:adress>
<pp:street>Walker street</pp:street>
<pp:number>1/1</pp:number>
<pp:state>England</pp:state>
</pp:adress>
<pp:created>2021-03-08Z</pp:created>
</pp:customer>
<pp:customer>
<pp:name>Michael</pp:name>
<pp:surname>Jordan</pp:surname>
<pp:adress>
<pp:street>Jordan street</pp:street>
<pp:number>28</pp:number>
<pp:state>USA</pp:state>
</pp:adress>
<pp:created>2021-03-09Z</pp:created>
</pp:customer>
</pp:customers>
</pp:card>
How should I fix it?

Not really a python person, but should the "find" statement include the "pp:" in its search, such as,
find('pp:surname')
Neither the opening nor closing tags actually match "surname".
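In ElementTree a prefix alone is not enough: `find` needs either a prefix-to-URI mapping or the fully qualified tag name. The sketch below shows the second form on a shortened, illustrative copy of the XML from the question, and also shows where the TypeError comes from.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative copy of the XML from the question.
SAMPLE = """<pp:card xmlns:pp="http://xmlns.page.com/path/subpath">
  <pp:customers>
    <pp:customer><pp:name>John</pp:name><pp:surname>Walker</pp:surname></pp:customer>
    <pp:customer><pp:name>Michael</pp:name><pp:surname>Jordan</pp:surname></pp:customer>
  </pp:customers>
</pp:card>"""

NS = "http://xmlns.page.com/path/subpath"
root = ET.fromstring(SAMPLE)

# A bare tag name matches nothing, so find() returns None; looping over
# that None is what raises the TypeError in the question.
missing = root.find(".//surname")

# The fully qualified name has the form "{namespace-uri}localname".
surnames = [elem.text for elem in root.iter("{%s}surname" % NS)]
print(missing, surnames)
```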

Use the namespace when you call findall
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8"?>
<pp:card xmlns:pp="http://xmlns.page.com/path/subpath">
<pp:id>1</pp:id>
<pp:customers>
<pp:customer>
<pp:name>John</pp:name>
<pp:surname>Walker</pp:surname>
<pp:adress>
<pp:street>Walker street</pp:street>
<pp:number>1/1</pp:number>
<pp:state>England</pp:state>
</pp:adress>
<pp:created>2021-03-08Z</pp:created>
</pp:customer>
<pp:customer>
<pp:name>Michael</pp:name>
<pp:surname>Jordan</pp:surname>
<pp:adress>
<pp:street>Jordan street</pp:street>
<pp:number>28</pp:number>
<pp:state>USA</pp:state>
</pp:adress>
<pp:created>2021-03-09Z</pp:created>
</pp:customer>
</pp:customers>
</pp:card>'''
ns = {'pp': 'http://xmlns.page.com/path/subpath'}
root = ET.fromstring(xml)
names = [sn.text for sn in root.findall('.//pp:surname', ns)]
print(names)
output
['Walker', 'Jordan']

Python XML findall is returning the wrong thing

I want to read data from an XML file, but it's not returning the right thing:
I get only the first of the child nodes instead of all of them.
The XML looks something like this:
<?xml version="1.0" encoding="UTF-8" ?>
<medicalData>
<pacijent> #patient1
<lbo>12345678901</lbo>
<ime>bob</ime>
<prezime>smith</prezime>
<datumRodj>13.10.1954.</datumRodj>
<pregledi>nema</pregledi>
</pacijent>
<pacijent> #patient2
<lbo>22345678901</lbo>
<ime>bobert</ime>
<prezime>smith</prezime>
<datumRodj>30.03.2003</datumRodj>
<pregledi>nema</pregledi>
</pacijent>
<lekar>
<id>111</id>
<ime>john</ime>
<prezime>doe</prezime>
<spacijalizacija>aaa</spacijalizacija>
</lekar>
</medicalData>
Here, if I search for a patient like this (excerpt from inside a function, hence the return):
d = etree.parse("pacijent.xml")
listaPodataka = d.getroot()
pacijenti = {}
p = []
for podatak in listaPodataka.findall('pacijent'):
    p.append(podatak)
for pacijent in p:
    lbo = pacijent[0].text
    ime = pacijent[1].text
    prezime = pacijent[2].text
    datumRodjenja = pacijent[3].text
    pregledi = pacijent[4].text
    pacijenti[lbo] = Pacijent(lbo, ime, prezime, datumRodjenja, pregledi)
return pacijenti
it returns patient 1 but not patient 2.
Any ideas what I am doing wrong? I have tried different solutions, but nothing I have tried seems to work.

Here (56605102.xml is the XML taken from your post):
import xml.etree.ElementTree as ET
root = ET.parse("56605102.xml")
for pacijent in root.findall('pacijent'):
    print(pacijent)
    for child in pacijent:
        print('\t' + child.tag + ':' + child.text)
output
<Element 'pacijent' at 0x108d70d68>
	lbo:12345678901
	ime:bob
	prezime:smith
	datumRodj:13.10.1954.
	pregledi:nema
<Element 'pacijent' at 0x108f50868>
	lbo:22345678901
	ime:bobert
	prezime:smith
	datumRodj:30.03.2003
	pregledi:nema
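Since `findall('pacijent')` does return both patients, one possible cause of the original problem is a `return` placed inside the loop. That is an assumption, because the indentation of the question's code is not preserved. The sketch below builds the dictionary keyed by `lbo` and returns only after the loop; it uses plain dicts in place of the question's `Pacijent` class, which is not shown, and the function name `read_patients` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative copy of the XML from the question.
SAMPLE = """<medicalData>
  <pacijent><lbo>12345678901</lbo><ime>bob</ime><prezime>smith</prezime></pacijent>
  <pacijent><lbo>22345678901</lbo><ime>bobert</ime><prezime>smith</prezime></pacijent>
  <lekar><id>111</id><ime>john</ime><prezime>doe</prezime></lekar>
</medicalData>"""


def read_patients(root):
    patients = {}
    for pacijent in root.findall("pacijent"):
        lbo = pacijent.findtext("lbo")
        # Look fields up by tag name instead of by position.
        patients[lbo] = {child.tag: child.text for child in pacijent}
    # Return only after the loop has finished; a return inside the loop
    # would hand back just the first patient.
    return patients


patients = read_patients(ET.fromstring(SAMPLE))
print(patients)
```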

Reshape xml using python?

I have an XML file like this
<data>
<B>Head1</B>
<I>Inter1</I>
<I>Inter2</I>
<I>Inter3</I>
<I>Inter4</I>
<I>Inter5</I>
<O>,</O>
<B>Head2</B>
<I>Inter6</I>
<I>Inter7</I>
<I>Inter8</I>
<I>Inter9</I>
<O>,</O>
<O> </O>
</data>
and I want the XML to look like
<data>
<combined>Head1 Inter1 Inter2 Inter3 Inter4 Inter5</combined>,
<combined>Head2 Inter6 Inter7 Inter8 Inter9</combined>
</data>
I tried to get all values of "B" and "I":
for value in mod.iter(tag='B'):  # getiterator() was removed in Python 3.9
    print(value.text)
Head1
Head2
for value in mod.iter(tag='I'):
    print(value.text)
Inter1
Inter2
Inter3
Inter4
Inter5
Inter6
Inter7
Inter8
Inter9
Now, how should I save the values from the first group in one tag and those from the second group in a different tag? That is, how do I make the iteration start at a "B" tag, collect all the "I" tags that follow it, and then start again with a new tag when the next "B" is found?
The "O" tag will always be present at the end.

You can use the ElementTree module from xml.etree:
from xml.etree import ElementTree
struct = """
<data>
{}
</data>
"""
def reformat(tree):
    root = tree.getroot()
    seen = []
    for neighbor in root.iter('data'):
        for child in neighbor:  # getchildren() was removed in Python 3.9
            tx = child.text
            if tx == ',':
                # a comma closes the current group
                yield "<combined>{}</combined>".format(' '.join(seen))
                seen = []
            else:
                seen.append(tx)
with open('test.xml') as f:
    tree = ElementTree.parse(f)
print(struct.format(',\n'.join(reformat(tree))))
result:
<data>
<combined>Head1 Inter1 Inter2 Inter3 Inter4 Inter5</combined>,
<combined>Head2 Inter6 Inter7 Inter8 Inter9</combined>
</data>
Note that if you're not sure all the blocks are separated by a comma, you can change the condition if tx == ',': to match your file format. Alternatively, check whether tx starts with 'Head': if it does and seen is not empty, yield the contents of seen and reset it before appending tx; otherwise just append tx and continue.
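A variant of that idea can be sketched by grouping on the tag name rather than on the text. It assumes every group starts with a <B> element, and it builds real elements instead of formatting strings. The function name `regroup` and the shortened sample are illustrative only, and the comma separators are dropped here rather than reproduced.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative copy of the XML from the question.
SAMPLE = """<data>
<B>Head1</B><I>Inter1</I><I>Inter2</I><O>,</O>
<B>Head2</B><I>Inter6</I><I>Inter7</I><O>,</O><O> </O>
</data>"""


def regroup(root):
    """Build a new <data> element with one <combined> child per <B> group."""
    new_root = ET.Element("data")
    current = None
    for child in root:
        if child.tag == "B":
            # A <B> element starts a new group.
            current = ET.SubElement(new_root, "combined")
            current.text = child.text or ""
        elif child.tag == "I" and current is not None:
            current.text += " " + (child.text or "")
        # <O> separators are ignored.
    return new_root


result = regroup(ET.fromstring(SAMPLE))
print(ET.tostring(result, encoding="unicode"))
```

Building elements lets ElementTree handle escaping of characters such as & or < in the text, which plain string formatting would not do.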

Remove XML node if childnode's childnode contains specific value

I need to filter an XML file for certain values: if a node contains a given value, the node should be removed.
<?xml version="1.0" encoding="utf-8" ?>
<ogr:FeatureCollection
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://ogr.maptools.org/ TZwards.xsd"
xmlns:ogr="http://ogr.maptools.org/"
xmlns:gml="http://www.opengis.net/gml">
<gml:boundedBy></gml:boundedBy>
<gml:featureMember>
<ogr:TZwards fid="F0">
<ogr:Region_Nam>TARGET</ogr:Region_Nam>
<ogr:District_N>Kondoa</ogr:District_N>
<ogr:Ward_Name>Bumbuta</ogr:Ward_Name>
</ogr:TZwards>
</gml:featureMember>
<gml:featureMember>
<ogr:TZwards fid="F1">
<ogr:Region_Nam>REMOVE</ogr:Region_Nam>
<ogr:District_N>Kondoa</ogr:District_N>
<ogr:Ward_Name>Pahi</ogr:Ward_Name>
</ogr:TZwards>
</gml:featureMember>
</ogr:FeatureCollection>
The Python script should keep the <gml:featureMember> node if the <ogr:Region_Nam> contains TARGET and remove all other nodes.
from xml.dom import minidom
import xml.etree.ElementTree as ET
tree = ET.parse('input.xml').getroot()
removeList = list()
for child in tree.iter('gml:featureMember'):
    if child.tag == 'ogr:TZwards':
        name = child.find('ogr:Region_Nam').text
        if (name == 'TARGET'):
            removeList.append(child)
for tag in removeList:
    parent = tree.find('ogr:TZwards')
    parent.remove(tag)
out = ET.ElementTree(tree)
out.write(outputfilepath)
Desired output:
<?xml version="1.0" encoding="utf-8" ?>
<ogr:FeatureCollection>
<gml:boundedBy></gml:boundedBy>
<gml:featureMember>
<ogr:TZwards fid="F0">
<ogr:Region_Nam>TARGET</ogr:Region_Nam>
<ogr:District_N>Kondoa</ogr:District_N>
<ogr:Ward_Name>Bumbuta</ogr:Ward_Name>
</ogr:TZwards>
</gml:featureMember>
</ogr:FeatureCollection>
My output still contains all nodes.

You need to declare the namespaces in the Python code:
import xml.etree.ElementTree as ET
tree = ET.parse('/tmp/input.xml').getroot()
namespaces = {'gml': 'http://www.opengis.net/gml', 'ogr': 'http://ogr.maptools.org/'}
# findall() returns a list, so removing children inside the loop is safe
for child in tree.findall('gml:featureMember', namespaces=namespaces):
    tzwards = child.find('ogr:TZwards', namespaces=namespaces)
    # find() returns None when there is no match; len() checks for child elements
    if tzwards is not None and len(tzwards):
        name = tzwards.find('ogr:Region_Nam', namespaces=namespaces).text
        if name != 'TARGET':
            tree.remove(child)
out = ET.ElementTree(tree)
out.write("/tmp/out.xml")
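When the tree is written back, ElementTree replaces prefixes it does not know with generated ones such as ns0 and ns1. A minimal sketch of keeping the original prefixes with `ET.register_namespace`, on a shortened inline copy of the XML (the sample and the variable names are illustrative):

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative copy of the XML from the question.
SAMPLE = """<ogr:FeatureCollection xmlns:ogr="http://ogr.maptools.org/" xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <ogr:TZwards fid="F0"><ogr:Region_Nam>TARGET</ogr:Region_Nam></ogr:TZwards>
  </gml:featureMember>
  <gml:featureMember>
    <ogr:TZwards fid="F1"><ogr:Region_Nam>REMOVE</ogr:Region_Nam></ogr:TZwards>
  </gml:featureMember>
</ogr:FeatureCollection>"""

namespaces = {"gml": "http://www.opengis.net/gml", "ogr": "http://ogr.maptools.org/"}
# Register the prefixes so the serializer reuses them on output.
for prefix, uri in namespaces.items():
    ET.register_namespace(prefix, uri)

root = ET.fromstring(SAMPLE)
for member in root.findall("gml:featureMember", namespaces):
    region = member.findtext("ogr:TZwards/ogr:Region_Nam", namespaces=namespaces)
    if region != "TARGET":
        root.remove(member)

output = ET.tostring(root, encoding="unicode")
print(output)
```

Note that `register_namespace` changes a global registry, so it affects every later serialization in the same process.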
