How to find if there are empty attributes in XML?

Given an XML file like this one (located in /home/user/):
<?xml version="1.0" ?>
<DataClient xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:cnmc="http://www.example.com/Tipos_DataClient" xmlns="http://www.example.com/DataClient">
<PersonalData Operation="3" Date="2022-09-06">
<ExtendedData>
<Person Code="XXX" OtherCode="Y12354"/>
</ExtendedData>
<Home Type="Street" Num="10" Code="12003" Poblation="Imaginary street"/>
</PersonalData>
</DataClient>
How can I tell whether the "Num" attribute is empty, and then build a list of all the elements whose "Num" is empty?
I tried to count the elements that have None as the value, but the count is always 0:
#! /usr/bin/python3
import xml.etree.ElementTree as ET
tree = ET.parse('/home/user/file.xml')
root = tree.getroot()
b = None
a = sum(1 for s in root.findall('./DataClient/PersonalData/ExtendedData/Num') if s.b)
print (a)

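Python's etree API exposes attributes as a dictionary (Element.attrib), so dict.get can check for a specific attribute. You also need to pass a namespace map to findall, because the XML declares a default namespace. Two details matter: DataClient is the root element, so the path starts at its children, and an empty attribute comes back as "" while a missing one comes back as None, so test for both.
import xml.etree.ElementTree as ET
tree = ET.parse('/home/user/file.xml')
nmsp = {"doc": "http://www.example.com/DataClient"}
# Paths are relative to the root element (DataClient), so start at PersonalData.
xpath = "./doc:PersonalData/doc:Home"
# `not` is true for both a missing attribute (None) and an empty one ("").
a = sum(1 for node in tree.findall(xpath, nmsp) if not node.attrib.get("Num"))
print(a)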
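The question also asks for a list of the affected elements, not just a count. Below is a minimal sketch of one way to do that, assuming the Home elements sit directly under PersonalData as in the sample. The inline SAMPLE document, the extra PersonalData entries, and the helper name homes_with_empty_num are made up for illustration; they are not part of the original file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one Home with a value, one with Num="", one without Num.
SAMPLE = """<?xml version="1.0" ?>
<DataClient xmlns="http://www.example.com/DataClient">
  <PersonalData Operation="3" Date="2022-09-06">
    <Home Type="Street" Num="10" Code="12003"/>
  </PersonalData>
  <PersonalData Operation="3" Date="2022-09-07">
    <Home Type="Street" Num="" Code="12004"/>
  </PersonalData>
  <PersonalData Operation="3" Date="2022-09-08">
    <Home Type="Street" Code="12005"/>
  </PersonalData>
</DataClient>"""

NS = {"doc": "http://www.example.com/DataClient"}  # prefix "doc" is arbitrary


def homes_with_empty_num(root):
    """Return two lists: Home elements with an empty Num, and those with no Num."""
    empty, missing = [], []
    for home in root.iterfind("./doc:PersonalData/doc:Home", NS):
        num = home.get("Num")  # None when the attribute is absent
        if num is None:
            missing.append(home)
        elif num.strip() == "":
            empty.append(home)
    return empty, missing


root = ET.fromstring(SAMPLE)
empty, missing = homes_with_empty_num(root)
print([h.get("Code") for h in empty])
print([h.get("Code") for h in missing])
```

Keeping "empty" and "missing" apart is a design choice; if both should count as empty, a single list with the test `not home.get("Num")` would do.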

Related

Get children elements of multiple instances of the same name tag using ElementTree

I have an xml file looking like this:
<?xml version="1.0" encoding="UTF-8"?>
<data>
<boundary_conditions>
<rot>
<rot_instance>
<name>BC_1</name>
<rpm>200</rpm>
<parts>
<name>rim_FL</name>
<name>tire_FL</name>
<name>disk_FL</name>
<name>center_FL</name>
</parts>
</rot_instance>
<rot_instance>
<name>BC_2</name>
<rpm>100</rpm>
<parts>
<name>tire_FR</name>
<name>disk_FR</name>
</parts>
</rot_instance>
</rot>
</boundary_conditions>
</data>
I actually know how to extract data corresponding to each instance. So I can do this for the names tag as follows:
import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()
names = tree.findall('.//boundary_conditions/rot/rot_instance/name')
for val in names:
    print(val.text)
which gives me:
BC_1
BC_2
But if I do the same thing for the parts tag:
names = tree.findall('.//boundary_conditions/rot/rot_instance/parts/name')
for val in names:
    print(val.text)
It will give me:
rim_FL
tire_FL
disk_FL
center_FL
tire_FR
disk_FR
Which combines all data corresponding to parts/name together. I want output that gives me the 'parts' sub-element for each instance as separate lists. So this is what I want to get:
instance_BC_1 = ['rim_FL', 'tire_FL', 'disk_FL', 'center_FL']
instance_BC_2 = ['tire_FR', 'disk_FR']
Any help is appreciated,
Thanks.

You've got to first find all parts elements, then from each parts element find all name tags.
Take a look:
parts = tree.findall('.//boundary_conditions/rot/rot_instance/parts')
for part in parts:
    for val in part.findall("name"):
        print(val.text)
    print()
instance_BC_1 = [val.text for val in parts[0].findall("name")]
instance_BC_2 = [val.text for val in parts[1].findall("name")]
print(instance_BC_1)
print(instance_BC_2)
Output:
rim_FL
tire_FL
disk_FL
center_FL
tire_FR
disk_FR
['rim_FL', 'tire_FL', 'disk_FL', 'center_FL']
['tire_FR', 'disk_FR']
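Indexing parts[0] and parts[1] by hand stops scaling once there are more instances. A possible alternative, sketched here with a shortened inline copy of the XML (SAMPLE and the name parts_by_instance are assumptions for illustration), is to walk each rot_instance and key the part names by the instance's own name:

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical copy of the document from the question.
SAMPLE = """<data>
  <boundary_conditions>
    <rot>
      <rot_instance>
        <name>BC_1</name>
        <parts>
          <name>rim_FL</name>
          <name>tire_FL</name>
        </parts>
      </rot_instance>
      <rot_instance>
        <name>BC_2</name>
        <parts>
          <name>tire_FR</name>
        </parts>
      </rot_instance>
    </rot>
  </boundary_conditions>
</data>"""

root = ET.fromstring(SAMPLE)

parts_by_instance = {}
for instance in root.iterfind("./boundary_conditions/rot/rot_instance"):
    # "name" matches only the direct child, not the names inside <parts>.
    key = instance.findtext("name")
    parts_by_instance[key] = [n.text for n in instance.findall("./parts/name")]

print(parts_by_instance)
# → {'BC_1': ['rim_FL', 'tire_FL'], 'BC_2': ['tire_FR']}
```

A dictionary avoids creating one variable per instance; parts_by_instance["BC_1"] then plays the role of instance_BC_1.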

Extract the Kth tag data in XML using ElementTree

Below is my current XML file (output.xml), and I hope that I can get its tag value using Python.
It is an XML file with namespace.
<data xmlns="urn:ietf:params:ns:netconf:base:1.0">
<interfaces xmlns="http://namespace.net">
<interface>
<name>Interface0</name>
</interface>
<interface>
<name>Interface1</name>
</interface>
<interface>
<name>Interface2</name>
</interface>
</interfaces>
</data>
Below is my Python code to extract the value of each <interface> tag:
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9
tree = ET.ElementTree(file="output.xml")
root = tree.getroot()
nsmap = {'': 'http://namespace.net'}  # namespace applied to unprefixed names in the path
for name in root.iterfind('./interfaces/interface/name', namespaces=nsmap):
    print(name.text)
My question is:
Is it possible to only fetch "Interface0", "Interface1", or "Interface2"?
If there are multiple <interface> tags, can I only fetch the values of the tags within the kth <interface>?

Use enumerate to pair each match with its zero-based index, then test the index:
for index, name in enumerate(root.iterfind('./interfaces/interface/name', namespaces=nsmap)):
    # if index <= kth:
    if index == 0:  # 1, 2?
        print(name.text)

If Interface0, Interface1, ..., InterfaceN always appear in sorted order, you don't need the sorted() call; just remove it.
To pick the kth value:
import xml.etree.ElementTree as ET
tree = ET.parse("output.xml")
root = tree.getroot()
nsmap = {'': 'http://namespace.net'}  # same namespace map as in the question
k = 2  # zero-based index; must be smaller than the number of matches
print(sorted(list(map(lambda x: x.text, root.findall("./interfaces/interface/name", namespaces=nsmap))))[k])
To get the first k values:
import xml.etree.ElementTree as ET
tree = ET.parse("output.xml")
root = tree.getroot()
nsmap = {'': 'http://namespace.net'}  # same namespace map as in the question
k = 3  # exclusive upper limit, counting from index 0
print(sorted(list(map(lambda x: x.text, root.findall("./interfaces/interface/name", namespaces=nsmap))))[:k])
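For the second part of the question (reading the tags inside the kth <interface>), it may be simpler to select the <interface> element itself and then look inside it. The sketch below assumes the document from the question is embedded as SAMPLE; the helper name kth_interface_name and the prefix "n" are arbitrary choices for illustration. It also shows ElementTree's position predicate, which counts from 1 rather than 0.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<data xmlns="urn:ietf:params:ns:netconf:base:1.0">
  <interfaces xmlns="http://namespace.net">
    <interface><name>Interface0</name></interface>
    <interface><name>Interface1</name></interface>
    <interface><name>Interface2</name></interface>
  </interfaces>
</data>"""

NS = {"n": "http://namespace.net"}  # prefix "n" is an arbitrary choice


def kth_interface_name(root, k):
    """Return the <name> text of the k-th <interface> (zero-based), or None."""
    interfaces = root.findall("./n:interfaces/n:interface", NS)
    if not 0 <= k < len(interfaces):
        return None
    return interfaces[k].findtext("n:name", namespaces=NS)


root = ET.fromstring(SAMPLE)
by_index = kth_interface_name(root, 1)

# Same element via a position predicate; note that [2] means "second", 1-based.
by_predicate = root.findtext("./n:interfaces/n:interface[2]/n:name", namespaces=NS)

print(by_index, by_predicate)
# → Interface1 Interface1
```

Once the kth element is in hand, any other child tags it contains can be read from it with find or findtext in the same way.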

How to update value between specific xml tags, where input is string, Python?

Suppose I have a string like the one below. Its type is str, but it always represents an XML document. I'm researching the available Python libraries for XML. How can I update the value between two specific tags, and which library should I use for that?
<?xml version="1.0"?>
<PostTelemetryRequest xmlns:ns2="urn:com:onstar:global:common:schema:PostTelemetryData:1">
<ns2:PartnerVehicles>
<ns2:PartnerVehicle>
<ns2:partnerNotificationID>251029655</ns2:partnerNotificationID>
</ns2:PartnerVehicle>
</ns2:PartnerVehicles>
</PostTelemetryRequest>
For instance, if the input is the string above how can I update the value between <ns2:partnerNotificationID> and </ns2:partnerNotificationID> tags to a new value?

This is the base code:
>>> from xml.etree import ElementTree
>>> s = """<?xml version="1.0"?>
<PostTelemetryRequest xmlns:ns2="urn:com:onstar:global:common:schema:PostTelemetryData:1">
<ns2:PartnerVehicles>
<ns2:PartnerVehicle>
<ns2:partnerNotificationID>251029655</ns2:partnerNotificationID>
</ns2:PartnerVehicle>
</ns2:PartnerVehicles>
</PostTelemetryRequest>
"""
>>> root = ElementTree.fromstring(s)
>>> for e in root.iter():
...     if e.tag == '{urn:com:onstar:global:common:schema:PostTelemetryData:1}partnerNotificationID':
...         e.text = 'mytext'
...
>>> ElementTree.tostring(root)
b'<PostTelemetryRequest xmlns:ns0="urn:com:onstar:global:common:schema:PostTelemetryData:1">\n <ns0:PartnerVehicles>\n <ns0:PartnerVehicle>\n <ns0:partnerNotificationID>mytext</ns0:partnerNotificationID>\n </ns0:PartnerVehicle>\n </ns0:PartnerVehicles>\n</PostTelemetryRequest>'
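The same update can be wrapped in a small function that takes the XML string and returns the modified string. This is only a sketch: the function name set_notification_id and the replacement value "999" are made up for illustration, and it assumes there is exactly one partnerNotificationID to change. As in the output above, the serializer chooses its own prefix (ns0) for the namespace; the resulting XML is equivalent.

```python
import xml.etree.ElementTree as ET

NS = {"ns2": "urn:com:onstar:global:common:schema:PostTelemetryData:1"}

SAMPLE = """<?xml version="1.0"?>
<PostTelemetryRequest xmlns:ns2="urn:com:onstar:global:common:schema:PostTelemetryData:1">
  <ns2:PartnerVehicles>
    <ns2:PartnerVehicle>
      <ns2:partnerNotificationID>251029655</ns2:partnerNotificationID>
    </ns2:PartnerVehicle>
  </ns2:PartnerVehicles>
</PostTelemetryRequest>"""


def set_notification_id(xml_text, new_value):
    """Return xml_text with the first partnerNotificationID text replaced."""
    root = ET.fromstring(xml_text)
    node = root.find(".//ns2:partnerNotificationID", NS)
    if node is None:
        raise ValueError("partnerNotificationID not found")
    node.text = new_value
    # encoding="unicode" makes tostring return str instead of bytes.
    return ET.tostring(root, encoding="unicode")


updated = set_notification_id(SAMPLE, "999")
print(updated)
```

Using find with a namespace map avoids looping over every element and comparing the full {uri}tag string by hand.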

Python3 parse XML into dictionary

It seems the original post was too vague, so I'm narrowing down the focus of this post. I have an XML file from which I want to pull values from specific branches, and I am having difficulty in understanding how to effectively navigate the XML paths. Consider the XML file below. There are several <mi> branches. I want to store the <r> value of certain branches, but not others. In this example, I want the <r> values of counter1 and counter3, but not counter2.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="Data.xsl" ?>
<!DOCTYPE mdc SYSTEM "Data.dtd">
<mdc xmlns:HTML="http://www.w3.org/TR/REC-xml">
<mfh>
<vn>TEST</vn>
<cbt>20140126234500.0+0000</cbt>
</mfh>
<mi>
<mts>20140126235000.0+0000</mts>
<mt>counter1</mt>
<mv>
<moid>DEFAULT</moid>
<r>58</r>
</mv>
</mi>
<mi>
<mts>20140126235000.0+0000</mts>
<mt>counter2</mt>
<mv>
<moid>DEFAULT</moid>
<r>100</r>
</mv>
</mi>
<mi>
<mts>20140126235000.0+0000</mts>
<mt>counter3</mt>
<mv>
<moid>DEFAULT</moid>
<r>7</r>
</mv>
</mi>
</mdc>
From that I would like to build a tuple with the following:
('20140126234500.0+0000', 58, 7)
where 20140126234500.0+0000 is taken from <cbt>, 58 is taken from the <r> value of the <mi> element that has <mt>counter1</mt> and 7 is taken from the <mi> element that has <mt>counter3</mt>.
I would like to use xml.etree.cElementTree since it seems to be standard and should be more than capable for my purposes. But I am having difficulty in navigating the tree and extracting the values I need. Below is some of what I have tried.
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET
tree = ET.ElementTree(file='Data.xml')
root = tree.getroot()
for mi in root.iter('mi'):
    print(mi.tag)
    for mt in mi.findall("./mt") if mt.value == 'counter1':
        print(mi.find("./mv/r").value)  # I know this is invalid syntax, but it's what I want to do :)
From a pseudo code standpoint, what I am wanting to do is:
find the <cbt> value and store it in the first position of the tuple.
find the <mi> element where <mt>counter1</mt> exists and store the <r> value in the second position of the tuple.
find the <mi> element where <mt>counter3</mt> exists and store the <r> value in the third position of the tuple.
I'm not clear when to use element.iter() or element.findall(). Also, I'm not having the best of luck with using XPath within the functions, or being able to extract the info I'm needing.
Thanks,
Rusty

Starting with:
import xml.etree.ElementTree as ET  # cElementTree was removed in Python 3.9; or use the try/except from your edit
xml_data = """<?xml version="1.0"?> and the rest of your XML here"""
tree = ET.fromstring(xml_data)  # or `ET.parse(<filename>)`
xml_dict = {}
Now tree holds the parsed XML, and xml_dict is the dictionary that will collect the result.
# first get the key & val for 'cbt'
cbt_val = tree.find('mfh').find('cbt').text
xml_dict['cbt'] = cbt_val
The counters are in 'mi':
for elem in tree.findall('mi'):
    counter_name = elem.find('mt').text  # key
    counter_val = elem.find('mv').find('r').text  # value
    xml_dict[counter_name] = counter_val
At this point, xml_dict is:
>>> xml_dict
{'counter2': '100', 'counter1': '58', 'cbt': '20140126234500.0+0000', 'counter3': '7'}
Some shortening, though possibly less readable: the body of the for elem in tree.findall('mi'): loop can be:
xml_dict[elem.find('mt').text] = elem.find('mv').find('r').text
# that combines the key/value extraction to one line
Or further, building the xml_dict can be done in just two lines with the counters first and cbt after:
xml_dict = {elem.find('mt').text: elem.find('mv').find('r').text for elem in tree.findall('mi')}
xml_dict['cbt'] = tree.find('mfh').find('cbt').text
Edit:
From the docs: when given a plain tag name, Element.findall() finds only the matching elements that are direct children of the current element.
find() returns only the first matching direct child, or None if there is none.
iter() iterates over the element and all of its descendants recursively.
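To get from here to the tuple the question asks for, one option is ElementTree's [tag='text'] path predicate, which selects an <mi> by the text of its <mt> child. The following is a sketch under assumptions: SAMPLE is a trimmed copy of the document, and the function name extract and its wanted parameter are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical copy of the document from the question.
SAMPLE = """<mdc>
  <mfh><vn>TEST</vn><cbt>20140126234500.0+0000</cbt></mfh>
  <mi><mt>counter1</mt><mv><moid>DEFAULT</moid><r>58</r></mv></mi>
  <mi><mt>counter2</mt><mv><moid>DEFAULT</moid><r>100</r></mv></mi>
  <mi><mt>counter3</mt><mv><moid>DEFAULT</moid><r>7</r></mv></mi>
</mdc>"""


def extract(root, wanted=("counter1", "counter3")):
    """Return (cbt, r of each wanted counter as int, or None if absent)."""
    cbt = root.findtext("./mfh/cbt")
    values = []
    for name in wanted:
        # [mt='...'] keeps only the <mi> whose <mt> child has exactly this text.
        text = root.findtext("./mi[mt='%s']/mv/r" % name)
        values.append(int(text) if text is not None else None)
    return (cbt, *values)


result = extract(ET.fromstring(SAMPLE))
print(result)
# → ('20140126234500.0+0000', 58, 7)
```

Because the counters are named explicitly, counter2 is simply never looked up, so no filtering step is needed afterwards.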

Element Tree: How to parse subElements of child nodes

I have an XML tree, which I'd like to parse using Elementtree. My XML looks something like
<?xml version="1.0" encoding="UTF-8"?>
<GetOrdersResponse xmlns="urn:ebay:apis:eBLBaseComponents">
<Ack>Success</Ack>
<Version>857</Version>
<Build>E857_INTL_APIXO_16643800_R1</Build>
<PaginationResult>
<TotalNumberOfPages>1</TotalNumberOfPages>
<TotalNumberOfEntries>2</TotalNumberOfEntries>
</PaginationResult>
<HasMoreOrders>false</HasMoreOrders>
<OrderArray>
<Order>
<OrderID>221362908003-1324471823012</OrderID>
<CheckoutStatus>
<eBayPaymentStatus>NoPaymentFailure</eBayPaymentStatus>
<LastModifiedTime>2014-02-03T12:08:51.000Z</LastModifiedTime>
<PaymentMethod>PaisaPayEscrow</PaymentMethod>
<Status>Complete</Status>
<IntegratedMerchantCreditCardEnabled>false</IntegratedMerchantCreditCardEnabled>
</CheckoutStatus>
</Order>
<Order> ...
</Order>
<Order> ...
</Order>
</OrderArray>
</GetOrdersResponse>
I want to parse the 6th child of the XML. I am able to get the values of subelements by index. E.g., if I want the OrderID of the first order, I can use root[5][0][0].text. But I would like to get the values of subelements by name. I tried the following code, but it does not print anything:
tree = ET.parse('response.xml')
root = tree.getroot()
for child in root:
    try:
        for ids in child.find('Order').find('OrderID'):
            print(ids.text)
    except:
        continue
Could someone please help me with this? Thanks.

Since the XML document has a namespace declaration (xmlns="urn:ebay:apis:eBLBaseComponents"), you have to use universal names when referring to elements in the document. For example, you need {urn:ebay:apis:eBLBaseComponents}OrderID instead of just OrderID.
This snippet prints all OrderIDs in the document:
from xml.etree import ElementTree as ET
NS = "urn:ebay:apis:eBLBaseComponents"
tree = ET.parse('response.xml')
for elem in tree.iter("*"):  # Use tree.getiterator("*") in Python 2.5 and 2.6
    if elem.tag == '{%s}OrderID' % NS:
        print(elem.text)
See http://effbot.org/zone/element-namespaces.htm for details about ElementTree and namespaces.

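Try to avoid chaining your finds. If the first find does not match anything, it returns None, and the chained call then fails. Check each result instead (the tag names are namespace-qualified here because the document declares a default namespace):
NS = '{urn:ebay:apis:eBLBaseComponents}'  # the document's default namespace
for child in root:
    order = child.find(NS + 'Order')
    if order is not None:
        ids = order.find(NS + 'OrderID')
        if ids is not None:
            print(ids.text)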

You can find the OrderArray first and then iterate over its children by name. Because the document declares a default namespace, the names have to be namespace-qualified:
tree = ET.parse('response.xml')
root = tree.getroot()
ns = {'e': 'urn:ebay:apis:eBLBaseComponents'}  # the prefix 'e' is arbitrary
order_array = root.find('e:OrderArray', ns)
for order in order_array.findall('e:Order', ns):
    order_id_element = order.find('e:OrderID', ns)
    if order_id_element is not None:
        print(order_id_element.text)
A side note: never use a bare except: continue. It hides every exception you get and makes debugging really hard.
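Putting the answers together, here is a sketch that reads several sub-elements of each Order by name. It assumes a trimmed inline copy of the response (SAMPLE); the second order's ID, the helper name order_summaries, and the dictionary keys are invented for illustration. findtext returns None for a missing element, so no try/except is needed.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response; the second OrderID is a made-up placeholder.
SAMPLE = """<GetOrdersResponse xmlns="urn:ebay:apis:eBLBaseComponents">
  <Ack>Success</Ack>
  <OrderArray>
    <Order>
      <OrderID>221362908003-1324471823012</OrderID>
      <CheckoutStatus><Status>Complete</Status></CheckoutStatus>
    </Order>
    <Order>
      <OrderID>hypothetical-second-id</OrderID>
    </Order>
  </OrderArray>
</GetOrdersResponse>"""

NS = {"e": "urn:ebay:apis:eBLBaseComponents"}  # prefix "e" is arbitrary


def order_summaries(root):
    """Return one dict per <Order> with its ID and checkout status (or None)."""
    rows = []
    for order in root.iterfind("./e:OrderArray/e:Order", NS):
        rows.append({
            "id": order.findtext("e:OrderID", namespaces=NS),
            "status": order.findtext("e:CheckoutStatus/e:Status", namespaces=NS),
        })
    return rows


rows = order_summaries(ET.fromstring(SAMPLE))
for row in rows:
    print(row)
```

The second order has no CheckoutStatus in this sample, so its "status" comes back as None rather than raising an error.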
