Iterate through XML to get all child nodes text value - python

I have an XML file with the following data. I need to get the value of every <driver> element and all of the root element's attributes. I wrote a Python script, but it prints only the first driver value.
My XML:
<volume name="sp" type="span" operation="create">
<driver>HDD1</driver>
<driver>HDD2</driver>
<driver>HDD3</driver>
<driver>HDD4</driver>
</volume>
My script:
import xml.etree.ElementTree as ET
doc = ET.parse("vol.xml")
root = doc.getroot() #Returns the root element for this tree.
root.keys() #Returns the elements attribute names as a list. The names are returned in an arbitrary order
root.attrib["name"]
root.attrib["type"]
root.attrib["operation"]
print root.get("name")
print root.get("type")
print root.get("operation")
for child in root:
    #print child.tag, child.attrib
    print root[0].text
My output:
sr-query:~# python volume_check.py aaa
sp
span
create
HDD1
sr-queryC:~#
I am not getting HDD2, HDD3 and HDD4. How do I iterate through this XML to get all the values? Is there an optimized way? I think a for loop can do it, but I am not familiar with Python.

In your for loop, it should be
child.text
not
root[0].text
because root[0] is always the first child, whatever the loop variable currently points at.
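As a minimal sketch of that fix, the script below parses the same document from a string (an assumption made only to keep the example self-contained; the question reads it from vol.xml) and uses the loop variable to collect every driver value. It is written for Python 3, so print is a function.

```python
import xml.etree.ElementTree as ET

# Same document as in the question, inlined so the example is
# self-contained (the question parses it from vol.xml instead).
xml_text = """<volume name="sp" type="span" operation="create">
<driver>HDD1</driver>
<driver>HDD2</driver>
<driver>HDD3</driver>
<driver>HDD4</driver>
</volume>"""

root = ET.fromstring(xml_text)

# Attributes of the root element
for key in ("name", "type", "operation"):
    print(root.get(key))

# Text of every <driver> child: use the loop variable, not root[0]
drivers = [child.text for child in root]
for driver in drivers:
    print(driver)
```

Iterating over an element yields its direct children in document order, so no index is needed.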

Related

Insert an ElementTree.Element as a SubElement

I'm using xml.etree.ElementTree to create some basic XML in Python. I have a block of XML that I need to access on its own, so I make it as a root Element:
import xml.etree.ElementTree as Tree
def correction_xml(self):
    correction = Tree.Element('ColourCorrection')
    sop_node = Tree.SubElement(correction, "SOPNode")
    slope = Tree.SubElement(sop_node, 'Slope')
    offset = Tree.SubElement(sop_node, 'Offset')
    power = Tree.SubElement(sop_node, 'Power')
    return correction
I also need to insert multiple instances of this block into a bigger XML document, so is there a way to insert my correction Element into another tree as a SubElement? Something like this, except that the SubElement factory only accepts a tag string, not an Element object:
def correction_list(self):
    list = Tree.Element("List")
    # insert correction_xml into list as a subelement, keeping its children intact
    item_1 = Tree.SubElement(list, self.correction_xml())
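One possible approach, offered as a sketch rather than a definitive answer: Element.append() attaches an already built Element (with its children) to a parent, which is what SubElement cannot do. The functions below are written as plain functions instead of methods, and the count parameter and the list_elem name are assumptions introduced for illustration.

```python
import xml.etree.ElementTree as Tree

def correction_xml():
    """Build a standalone ColourCorrection element, as in the question."""
    correction = Tree.Element('ColourCorrection')
    sop_node = Tree.SubElement(correction, 'SOPNode')
    Tree.SubElement(sop_node, 'Slope')
    Tree.SubElement(sop_node, 'Offset')
    Tree.SubElement(sop_node, 'Power')
    return correction

def correction_list(count=2):
    """Hypothetical helper: attach `count` corrections to a List element."""
    list_elem = Tree.Element('List')  # avoids shadowing the built-in list
    for _ in range(count):
        # append() attaches an existing Element, children included
        list_elem.append(correction_xml())
    return list_elem

result = correction_list()
print(Tree.tostring(result, encoding='unicode'))
```

Each call to correction_xml() builds a fresh element, so every list entry is an independent subtree.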

XML counting and printing elements

<?xml version="1.0" encoding="utf-8"?>
<export_full date="2022-03-15 07:01:30" version="20160107">
<items>
<item code="A1005" image="https://www.astramodel.cz/images/A/800x600/A1005.jpg" imageDate="2014-04-08" name="Uhlíková tyčka 0.6mm (1m)" brandId="32" brand="ASTRA" czk="89.00" eur="3.50" czksmap="89.00" eursmap="3.50" hasPrice="true" created="2014-01-09" changed="" new="false" stock="true" date="" stock2="true" date2="" stock3="high" date3="" discontinued="false" weight="0.001" length="0.001" width="0.001" height="1.000" recycling_fee="">
<descriptions>
<description title="Charakteristika" order="1"><p>Tyč z uhlíkových vláken kruhového průřezu ø0.6&nbsp;mm v délce 1&nbsp;m. Hmotnost 0,3&nbsp;g</p></description>
</descriptions>
</item>
I have a fairly large XML file. I am trying to count the total number of items and print the name attribute of each item; the snippet above shows what an individual item and its tags look like. I do get a number when printing the total item count, but I'm not sure I'm going about it the right way, and for the name attributes I am getting nothing so far. Please help.
import xml.etree.ElementTree as ET
tree = ET.parse('export_full.xml')
root = tree.getroot()
test = [elem.tag for elem in root.iter("item")]
print(len(test))
for item in root.iter('./item[@name]'):
    print(item.attrib)
To evaluate an XPath expression, use the findall() function; iter() only accepts a tag name. Note that the "item" elements are children of the "items" element, so you need to add 'items' to the XPath if using an absolute path; otherwise use ".//item[@name]".
for item in root.findall('./items/item[@name]'):
    print(item.attrib)
If you want to iterate over all items and collect the name attributes in a list:
items = [elem.get('name') for elem in root.iter("item")]
print(len(items), items) # print count of items and list of names
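To make the difference between the two paths concrete, here is a small sketch. The XML is a cut-down stand-in for export_full.xml with made-up item names, and the second item deliberately has no name attribute so the [@name] predicate has something to filter out.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for export_full.xml; the item names are invented
# and the second item has no name attribute on purpose.
xml_text = """<export_full version="20160107">
<items>
<item code="A1005" name="Carbon rod 0.6mm"/>
<item code="A1006"/>
<item code="A1007" name="Carbon rod 0.8mm"/>
</items>
</export_full>"""

root = ET.fromstring(xml_text)

all_items = root.findall('./items/item')           # every <item>
named_items = root.findall('./items/item[@name]')  # only items with a name
names = [item.get('name') for item in named_items]

print(len(all_items), len(named_items), names)
```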
If the XML is huge, you can benefit from parsing it incrementally with the iterparse() function.
The example below iterates over the XML and, if the tag is 'item', prints its 'name' attribute. You can add whatever logic you want to check.
count = 0
for _, elem in ET.iterparse('export_full.xml'):
    if elem.tag == 'item':
        print(elem.get('name')) # print out just the name
        count += 1
        # print(elem.attrib) # print out all attributes
print(count) # display number of items
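A variation on the iterparse() loop above, sketched under the assumption that each item is no longer needed once its name has been read: calling elem.clear() releases the element's attributes and children as you go. The in-memory document is a stand-in so the example runs without a file.

```python
import io
import xml.etree.ElementTree as ET

# In-memory stand-in for a large export_full.xml file.
xml_bytes = b"""<export_full>
<items>
<item code="A1005" name="first"/>
<item code="A1006" name="second"/>
</items>
</export_full>"""

count = 0
names = []
# By default iterparse() yields an element when its end tag is reached.
for _, elem in ET.iterparse(io.BytesIO(xml_bytes)):
    if elem.tag == 'item':
        names.append(elem.get('name'))
        count += 1
        elem.clear()  # drop the element's attributes and children

print(count, names)
```

The emptied element still stays attached to its parent, so this reduces memory use rather than eliminating it.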

How do I pull attributes from deeply nested XML sub elements using ElementTree?

I have .xml files that I am attempting to pull attributes from, and I am having trouble grabbing the attributes from the Raw_Material sub-element. Below is some sample data that represents what the files look like:
<XML_Order order_Id='1' terms='Net30' ship_via='UPS'>
<Line_Items>
<Line_Item upc='1234567' item_id='1' color='blk' qty='15'>
<Raw_Materials>
<Raw_Material Item_Id='H188' Vendor_Id='DI0001'> # This is what i need to grab
<Raw_Material Item_Id='ST03' Vendor_Id='DI0001'>
</Raw_Materials>
</Line_Item>
<Line_Item>
<Raw_Materials>
<Raw_Material>
<Raw_Material>
</Raw_Materials>
</Line_Item>
<Line_Item>
<Raw_Materials>
<Raw_Material>
<Raw_Material>
</Raw_Materials>
</Line_Item>
</Line_Items>
</XML_Order>
I am having no problem iterating and pulling the attributes from the Line_Item tags using the following code:
if filename.endswith('.xml'):
    tree = Et.ElementTree(file = filename)
    root = tree.getroot()
    # order info (attribute names spelled as in the sample data above)
    orderID = root.attrib['order_Id'] # grab order ID from XML document
    terms = root.attrib['terms']
    shipVia = root.attrib['ship_via']
    for child in root:
        for grandchild in child:
            upc = grandchild.get('upc')
            lineItemID = grandchild.get('item_id')
            color = grandchild.get('color')
            # I assume this is where i would need a for loop to access the
            # nested <Raw_Material> element and its attributes
I attempted to fill a list with the values using the following in my code (where the last comment is):
for element in tree.iter(tag = 'Raw_Material'):
    itemID.append(element.get('Item_Id'))
Python returns the itemID list with the correct Item_Id values, but they are repeated over and over, when I only need the 2 Item_Id attribute values for the current line item. I think it is appending to the list for each Line_Item tag in my XML doc, instead of building a new list per Line_Item tag.
Once I grab the data I need, would a list be the best way to hold the values? There will only ever be 1 or 2 Raw_Material sub-elements, and I don't want my variables to be overwritten in a loop.
Try using XPath, something like this:
for raw_material in grandchild.findall('.//Raw_Material'):
    # your code here
EDIT: Since grandchild refers to your Line_Item elements, you might actually need something like .//Raw_Materials/Raw_Material as your XPath.
See below.
There are 6 Raw_Material elements in the XML doc. 4 of them are empty (zero attributes) and 2 of them have 2 attributes each. This is reflected in the output.
import xml.etree.ElementTree as ET
xml = """<XML_Order order_Id="1" terms="Net30" ship_via="UPS">
<Line_Items>
<Line_Item upc="1234567" item_id="1" color="blk" qty="15">
<Raw_Materials>
<Raw_Material Item_Id="H188" Vendor_Id="DI0001"/>
<Raw_Material Item_Id="ST03" Vendor_Id="DI0001"/>
</Raw_Materials>
</Line_Item>
<Line_Item>
<Raw_Materials>
<Raw_Material/>
<Raw_Material/>
</Raw_Materials>
</Line_Item>
<Line_Item>
<Raw_Materials>
<Raw_Material/>
<Raw_Material/>
</Raw_Materials>
</Line_Item>
</Line_Items>
</XML_Order>"""
root = ET.fromstring(xml)
# getting the attributes across the xml doc
print('Raw_Materials across the XML doc:')
raw_materials_lst = [entry.attrib for entry in
                     list(root.findall(".//Raw_Material"))]
print(raw_materials_lst)
# getting the attributes per Line_Item
print('Raw_Materials per line item:')
line_items = [entry for entry in list(root.findall(".//Line_Item"))]
for idx, line_item in enumerate(line_items,1):
    print('{}) {}'.format(idx, [entry.attrib for entry in
                                list(line_item.findall(".//Raw_Material"))]))
Output:
Raw_Materials across the XML doc:
[{'Item_Id': 'H188', 'Vendor_Id': 'DI0001'}, {'Item_Id': 'ST03', 'Vendor_Id': 'DI0001'}, {}, {}, {}, {}]
Raw_Materials per line item:
1) [{'Item_Id': 'H188', 'Vendor_Id': 'DI0001'}, {'Item_Id': 'ST03', 'Vendor_Id': 'DI0001'}]
2) [{}, {}]
3) [{}, {}]
HUGE thanks to @audiodude; we worked through this for about an hour last night and managed to come up with a workable solution. Below is what he came up with. The attribute data is going into a FileMaker database, so he set up some boolean flags to capture the Item_Id in the Raw_Material tag (since some of my XML files have these tags, and some do not).
for child in root:
    for grandchild in child:
        # grab any needed attribute data from the line_items element
        has_sticker = False
        has_tag = False
        for material_item in grandchild.findall('.//Raw_Material'):
            # default to '' so a tag without Item_Id does not fail on startswith()
            item_id = material_item.get('Item_Id', '')
            if item_id.startswith('H'):
                has_tag = True
                liRecord['hasTag'] = 'True' # this is the record in my database
                fms.edit(liRecord)
            if item_id.startswith('ST'):
                has_sticker = True
                liRecord['hasSticker'] = 'True'
                fms.edit(liRecord)
        if liRecord['hasTag'] == 'False' and liRecord['hasSticker'] == 'False':
            liRecord['hasTag'] = 'True'
            fms.edit(liRecord)

Python - lxml - how to 'move' around the tree when building the tree

Basic question - how do you 'move' around in a tree when you are building a tree.
I can populate the first level:
import lxml.etree as ET
def main():
    root = ET.Element('baseURL')
    root.attrib["URL"]='www.com'
    root.attrib["title"]='Level Title'
    myList = [["www.1.com","site 1 Title"],["www.2.com","site 2 Title"],["www.3.com","site 3 Title"]]
    for i in xrange(len(myList)):
        ET.SubElement(root, "link_"+str(i), URL=myList[i][0], title=myList[i][1])
This gives me something like:
baseURL:
    link_0
    link_1
    link_2
From there, I want to add a subtree under each of the new nodes so it looks something like:
baseURL:
    link_0:
        link_A
        link_B
        link_C
    link_1
    link_2
I can't see how to 'point' the subElement call to the next node down - I tried:
myList2 = [["www.A.com","site A Title"],["www.B.com","site B Title"],["www.C.com","site C Title"]]
for i in xrange(len(myList2)):
    ET.SubElement('link_0', "link_"+str(i), URL=myList2[i][0], title=myList2[i][1])
But that throws the error:
TypeError: Argument '_parent' has incorrect type (expected lxml.etree._Element, got str)
since I am giving the SubElement call a string, not an element reference. I also tried it as a variable (i.e. link_0 rather than "link_0"), and that raises an error about a missing global variable, so my reference is obviously incorrect.
How do I 'point' my lxml builder to a child as a parent, and write a new child?
ET.SubElement(parent_node,type) creates a new XML element node as a child of parent_node. It also returns this new node.
So you could do this:
import lxml.etree as ET
def main():
    root = ET.Element('baseURL')
    myList = [1,2,3]
    children = []
    for x in myList:
        children.append( ET.SubElement(root, "link_"+str(x)) )
    for y in myList:
        ET.SubElement( children[0], "child_"+str(y) )
But keeping track of the children is probably excessive, since lxml already provides you with many ways to get to them.
Here's a way using lxml's built-in children lists:
node = root[0]
for y in myList:
    ET.SubElement( node, "child_"+str(y) )
Here's a way using XPath (possibly better if your XML is getting ugly):
node = root.xpath("/baseURL/link_0")[0]
for y in myList:
    ET.SubElement( node, "child_"+str(y) )
Found the answer. I should be using Python index referencing, root[n], rather than trying to get to it via link_0.
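The point that SubElement returns the node it creates can be sketched as follows. This uses the standard library's xml.etree.ElementTree instead of lxml, purely so the example runs without extra packages; SubElement behaves the same way for this purpose. The shortened URL lists and the variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.Element('baseURL', URL='www.com', title='Level Title')

top_level = [("www.1.com", "site 1 Title"), ("www.2.com", "site 2 Title")]
second_level = [("www.A.com", "site A Title"), ("www.B.com", "site B Title")]

for i, (url, title) in enumerate(top_level):
    # SubElement returns the node it just created ...
    link = ET.SubElement(root, "link_%d" % i, URL=url, title=title)
    if i == 0:
        # ... so it can be passed straight back in as the parent.
        for letter, (sub_url, sub_title) in zip("AB", second_level):
            ET.SubElement(link, "link_" + letter, URL=sub_url, title=sub_title)

print([child.tag for child in root])
print([child.tag for child in root[0]])
```

Holding on to the returned node avoids looking the parent up again by index or by path.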

Turning ElementTree findall() into a list

I'm using ElementTree findall() to find elements in my XML which have a certain tag. I want to turn the result into a list. At the moment, I'm iterating through the elements, picking out the .text for each element, and appending to the list. I'm sure there's a more elegant way of doing this.
#!/usr/bin/python2.7
#
from xml.etree import ElementTree
import os
myXML = '''<root>
<project project_name="my_big_project">
<event name="my_first_event">
<location>London</location>
<location>Dublin</location>
<location>New York</location>
<month>January</month>
<year>2013</year>
</event>
</project>
</root>
'''
tree = ElementTree.fromstring(myXML)
for node in tree.findall('.//project'):
    for element in node.findall('event'):
        event_name=element.attrib.get('name')
        print event_name
        locations = []
        if element.find('location') is not None:
            for events in element.findall('location'):
                locations.append(events.text)
        # Could I use something like this instead?
        # locations.append(''.join.text(*events) for events in element.findall('location'))
        print locations
This outputs the following, which is correct, but I'd like to assign the findall() results directly to a list, in text format, if possible:
my_first_event
['London', 'Dublin', 'New York']
You can try this - it uses a list comprehension to generate the list without having to create a blank one and then append.
if element.find('location') is not None:
    locations = [events.text for events in element.findall('location')]
With this, you can also get rid of the locations definition above, so your code would be:
tree = ElementTree.fromstring(myXML)
for node in tree.findall('.//project'):
    for element in node.findall('event'):
        event_name=element.attrib.get('name')
        print event_name
        if element.find('location') is not None:
            locations = [events.text for events in element.findall('location')]
        print locations
One thing you will want to be wary of is what you are doing with locations - it won't be defined if location doesn't exist, so you will get a NameError if you try to print it and it doesn't exist. If that is an issue, you can retain the locations = [] definition - if the matching element isn't found, the result will just be an empty list.
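Another way to sidestep the NameError, sketched here in Python 3: findall() returns an empty list when nothing matches, so the list comprehension can run without the find() guard and locations is always defined. The second event, which has no location children, is a hypothetical addition to show that case.

```python
import xml.etree.ElementTree as ET

# The event without locations is a hypothetical addition for illustration.
my_xml = """<root>
<project project_name="my_big_project">
<event name="my_first_event">
<location>London</location>
<location>Dublin</location>
<location>New York</location>
</event>
<event name="event_without_locations"/>
</project>
</root>"""

tree = ET.fromstring(my_xml)
result = {}
for event in tree.findall('.//project/event'):
    # findall() returns [] when nothing matches, so no guard is needed
    result[event.get('name')] = [loc.text for loc in event.findall('location')]

print(result)
```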
