python ElementTree the text of element who has a child - python

When I try to read a text of a element who has a child, it gives None:
See the xml (say test.xml):
<?xml version="1.0"?>
<data>
<test><ref>MemoryRegion</ref> abcd</test>
</data>
and the python code that wants to read 'abcd':
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
print root.find("test").text
When I run this python, it gives None, rather than abcd.
How can I read abcd under this condition?

Use Element.tail attribute:
>>> import xml.etree.ElementTree as ET
>>> tree = ET.parse('test.xml')
>>> root = tree.getroot()
>>> print root.find(".//ref").tail
abcd

ElementTree has a rather different view of XML that is more suited for nested data. .text is the data right after a start tag. .tail is the data right after an end tag. so you want:
print root.find('test/ref').tail

Related

Change the key value in XML using Python

I need to modify the online_hostname key value in XML using python. I tried xml element tree but it does not work.
import xml.etree.ElementTree as ET
xml_tree = ET.parse('test.xml')
root = xml_tree.getroot()
root[0][0] = "requiredvalue"
test.xml file is as below:
<?xml version="1.0" encoding="UTF-8" ?>
<bzinfo>
<myidentity online_hostname="testdevice-air_2022_01_25"
bzlogin="me#abc.com" />
</bzinfo>
Error:
IndexError: child assignment index out of range
It is much better to explicitly search the required node (find will do in this case) instead of using indexes on the root node:
import xml.etree.ElementTree as ET
xml_tree = ET.parse('test.xml')
root = xml_tree.getroot()
myidentity_node = root.find('myidentity')
myidentity_node.attrib['online_hostname'] = 'required_value'
xml_tree.write('modified.xml')
modified.xml after running this code:
<bzinfo>
<myidentity bzlogin="me#abc.com" online_hostname="required_value" />
</bzinfo>

Parsing XML Attributes with Python

I am trying to parse out all the green highlighted attributes (some sensitive things have been blacked out), I have a bunch of XML files all with similar formats, I already know how to loop through all of them individually them I am having trouble parsing out the specific attributes though.
XML Document
I need the text in the attributes: name="text1"
from
project logLevel="verbose" version="2.0" mainModule="Main" name="text1">
destinationDir="/text2" from
put label="Put Files" destinationDir="/Trigger/FPDMMT_INBOUND">
destDir="/text3" from
copy disabled="false" version="1.0" label="Archive Files" destDir="/text3" suffix="">
I am using
import csv
import os
import re
import xml.etree.ElementTree as ET
tree = ET.parse(XMLfile_path)
item = tree.getroot()[0]
root = tree.getroot()
print (item.get("name"))
print (root.get("name"))
This outputs:
Main
text1
The item.get pulls the line at index [0] which is the first line root in the tree which is <module
The root.get pulls from the first line <project
I know there's a way to search for exactly the right part of the root/tree with something like:
test = root.find('./project/module/ftp/put')
print (test.get("destinationDir"))
I need to be able to jump directly to the thing I need and output the attributes I need.
Any help would be appreciated
Thanks.
Simplified copy of your XML:
xml = '''<project logLevel="verbose" version="2.0" mainModule="Main" name="hidden">
<module name="Main">
<createWorkspace version="1.0"/>
<ftp version="1.0" label="FTP connection to PRD">
<put label="Put Files" destinationDir="destination1">
</put>
</ftp>
<ftp version="1.0" label="FTP connection to PRD">
<put label="Put Files" destinationDir="destination2">
</put>
</ftp>
<copy disabled="false" destDir="destination3">
</copy>
</module>
</project>
'''
# solution using ETree
from xml.etree import ElementTree as ET
root = ET.fromstring(xml)
name = root.get('name')
ftp_destination_dir1 = root.findall('./module/ftp/put')[0].get('destinationDir')
ftp_destination_dir2 = root.findall('./module/ftp/put')[1].get('destinationDir')
copy_destination_dir = root.find('./module/copy').get('destDir')
print(name)
print(ftp_destination_dir1)
print(ftp_destination_dir2)
print(copy_destination_dir)
# solution using lxml
from lxml import etree as et
root = et.fromstring(xml)
name = root.get('name')
ftp_destination_dirs = root.xpath('./module/ftp/put/#destinationDir')
copy_destination_dir = root.xpath('./module/copy/#destDir')[0]
print(name)
print(ftp_destination_dirs[0])
print(ftp_destination_dirs[1])
print(copy_destination_dir)

python, xml: how to access the 3rd child by element' name

Would you help me, pleace, to get an access to elemnt with name 'id' by the following construction in Python (i have lxml and xml.etree.ElementTree libraries).
Desirable result: '0000000'
Desirable method:
Search in xml-document a child, where it's name is fcsProtocolEF3.
Search in fcsProtocolEF3 an element with name 'id'.
It is crucial to search by element name. Not by ordinal position.
I tried to use something like this: tree.findall('{http://zakupki.gov.ru/oos/export/1}fcsProtocolEF3')[0].findall('{http://zakupki.gov.ru/oos/types/1}id')[0].text
it works, but it requires to input namespaces. XML-document have different namespaces and I don't know how to define them beforehand.
Thank you.
That would be great to use something like XQuery in SQL:
value('(/*:export/*:fcsProtocolEF3/*:id)[1]', 'nvarchar(21)')) AS [id],
XML-document:
<?xml version="1.0" encoding="UTF-8" standalone="true"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>
lxml solution:
xml = '''<?xml version="1.0"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>'''
from lxml import etree as et
root = et.fromstring(xml)
text = root.xpath('//*[local-name()="export"]/*[local-name()="fcsProtocolEF3"]/*[local-name()="id"]/text()')[0]
print(text)
Below is ET based solution. NS are in use.
import xml.etree.ElementTree as ET
xml = '''<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:export xmlns:ns3="http://zakupki.gov.ru/oos/common/1" xmlns:ns4="http://zakupki.gov.ru/oos/base/1" xmlns:ns2="http://zakupki.gov.ru/oos/export/1" xmlns:ns10="http://zakupki.gov.ru/oos/printform/1" xmlns:ns11="http://zakupki.gov.ru/oos/control99/1" xmlns:ns9="http://zakupki.gov.ru/oos/SMTypes/1" xmlns:ns7="http://zakupki.gov.ru/oos/pprf615types/1" xmlns:ns8="http://zakupki.gov.ru/oos/EPtypes/1" xmlns:ns5="http://zakupki.gov.ru/oos/TPtypes/1" xmlns:ns6="http://zakupki.gov.ru/oos/CPtypes/1" xmlns="http://zakupki.gov.ru/oos/types/1">
<ns2:fcsProtocolEF3 schemeVersion="10.2">
<id>0000000</id>
<purchaseNumber>0000000000000000</purchaseNumber>
</ns2:fcsProtocolEF3>
</ns2:export>
'''
def get_id_text():
root = ET.fromstring(xml)
fcs = root.find('{http://zakupki.gov.ru/oos/export/1}fcsProtocolEF3')
# assuming there is one fcs element and one id under fcs
return fcs.find('{http://zakupki.gov.ru/oos/types/1}id').text
print(get_id_text())
output
0000000

Is there a way to return the value for a tag from a XML based on a specific path in python?

I have this XML
<Body>
<Batch_Number>2000</Batch_Number>
<Total_No_Of_Batches>12312</Total_No_Of_Batches>
<requestNo>1923</requestNo>
<Parent1>
<Parent2>
<Parent3>
<lastModifiedDateTime>2022-11-11T11:07:30.000</lastModifiedDateTime>
<purpose>NeverMore</purpose>
<endDate>9999-12-31T00:00:00.000</endDate>
<createdDateTime>2019-06-06T06:32:16.000</createdDateTime>
<createdOn>2019-06-06T08:32:16.000</createdOn>
<address2>Forever street 21</address2>
<externalCode>code123</externalCode>
<lastModifiedBy>user2.thisUser</lastModifiedBy>
<lastModifiedOn>2039-06-11T13:07:30.000</lastModifiedOn>
<lastModifiedBy>MG</lastModifiedBy>
<PS>1234431</PS>
</Parent3>
</Parent2>
</Parent1>
</Body>
Is there a way to return the value for lastModifiedBy for example where the path has this specific structure :
Body.Parent1.Parent2.Parent3.lastModifiedBy
Idealy, I would like to populate a dictionary with the child tag name and its value, for example :
dict[lastModifiedBy.tag] = lastModifiedBy.text
You can use xml from standart library for working with xml files.
from xml.etree import ElementTree as ET
tree = ET.parse("d.xml") # our xml file
root = tree.getroot()
And then you can access elements as indexes or you can use root like as a list:
for i in root:
print(i)
A XML element may have more than one child with same tag name (even you have two lastModifiedBy in the Parent3). This is why we use them like lists, they works like a list. So you shouldn't try to use them like dictionary.
I think you need to use XPath. Like so:
from xml.etree import ElementTree as ET
tree = ET.parse("d.xml") # our xml file
root = tree.getroot()
s = root.findall(".Parent1/Parent2/Parent3/lastModifiedBy")
for i in s:
print(i.text)
This gives you all lastModifiedBy elements in the Parent3 element. You can access to any index if you want too, like this:
from xml.etree import ElementTree as ET
tree = ET.parse("d.xml") # our xml file
root = tree.getroot()
s = root.find(".Parent1/Parent2/Parent3/lastModifiedBy[1]") # first element with "lastModifiedBy" tag
print(s.text)

Python xml ElementTree findall returns empty result

I would like to parse following XML file using the Python xml ElementTree API.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<foos>
<foo_table>
<!-- bar -->
<fooelem>
<fname>BBBB</fname>
<group>SOMEGROUP</group>
<module>some module</module>
</fooelem>
<fooelem>
<fname>AAAA</fname>
<group>other group</group>
<module>other module</module>
</fooelem>
<!-- bar -->
</foo_table>
</foos>
In this example code I try to find all the elements under /foos/foo_table/fooelem/fname but obviously findall doesn't find anything when running this code.
import xml.etree.cElementTree as ET
tree = ET.ElementTree(file="min.xml")
for i in tree.findall("./foos/foo_table/fooelem/fname"):
print i
root = tree.getroot()
for i in root.findall("./foos/foo_table/fooelem/fname"):
print i
I am not experienced with the ElementTree API, but I've used the example under https://docs.python.org/2/library/xml.etree.elementtree.html#example. Why is it not working in my case?
foos is your root, you would need to start findall below, e.g.
root = tree.getroot()
for i in root.findall("foo_table/fooelem/fname"):
print i.text
Output:
BBBB
AAAA
This is because the path you are using begins BEFORE the root element (foos).
Use this instead: foo_table/fooelem/fname
findall doesn't work, but this does:
e = xml.etree.ElementTree.parse(myfile3).getroot()
mylist=list(e.iter('checksum'))
print (len(mylist))
mylist will have the proper length.

Categories

Resources