ElementTree, .set() and iteration - python

This is my first post on Stack Overflow and I am a novice programmer.
I am having trouble using ElementTree and the .set() method. Using an f-string I am able to assign recipe_id with the correct number.
When I try to set the recipe_name attribute, every element gets only the last item in the name_list list. I'm a bit lost! I'm sure it's something in my syntax or in my understanding of how I'm iterating through the items. I don't understand it because the recipe_id portion works just fine.
Expected output (within the XML)
<recipes recipe_id="1" recipe_name="Apples">
<recipes recipe_id="2" recipe_name="Oranges">
Instead I get:
<recipes recipe_id="1" recipe_name="Oranges">
<recipes recipe_id="2" recipe_name="Oranges">
My code:
#!/usr/bin/env python3
import os
import os.path
import xml.etree.ElementTree as ET
filename = "my_recipes.xml"
xmlTree = ET.parse(filename)
root = xmlTree.getroot()
# change the recipe id in recipes
i = 0
for element in root.iter("recipes"):
    i += 1
    element.set('recipe_id', f"{i}")
# for every tbody in the tree
name_list = []
for tbody in root.iter("tbody"):
    # for every time you find a row
    for row in tbody.findall('row'):
        data = row.find('entry').text
        # get those rows attribs
        rec_name = row.find('entry').attrib
        # if the row is the row that i want (contains the recipe name)...i couldn't figure out a better way to get this value precisely
        if rec_name == {'namest': 'c1', 'nameend': 'c2', 'align': 'left', 'valign': 'bottom'}:
            # yoink name and stick it in name_list
            name_list.append(data)
for recipes in root.findall('recipes'):
    for i in range(len(name_list)):
        recipes.set('recipe_name', F"{name_list[i]}")
xmlTree.write(filename, encoding='UTF-8', xml_declaration=True)
My XML:
<?xml version='1.0' encoding='UTF-8'?>
<Root>
<recipes>
<tbody>
<row>
<entry namest="c1" nameend="c2" align="left" valign="bottom">Apples</entry>
</row>
</tbody>
</recipes>
<recipes>
<tbody>
<row>
<entry namest="c1" nameend="c2" align="left" valign="bottom">Oranges</entry>
</row>
</tbody>
</recipes>
</Root>
FIXED:
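The inner loop ran over every name for each recipes element, so every element ended up with the last name in name_list. The code should be
for i, recipes in enumerate(root.findall('recipes')):
    recipes.set('recipe_name', name_list[i])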
and I am just stupid.
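A possible alternative, sketched under the assumption that each recipes element holds exactly one name entry marked with namest="c1": look the name up inside each element instead of keeping a parallel name_list, so the id and the name cannot drift apart. The inline XML is a shortened copy of the sample above, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample document (assumption: one name entry per recipe).
xml_string = """<Root>
<recipes><tbody><row>
<entry namest="c1" nameend="c2" align="left" valign="bottom">Apples</entry>
</row></tbody></recipes>
<recipes><tbody><row>
<entry namest="c1" nameend="c2" align="left" valign="bottom">Oranges</entry>
</row></tbody></recipes>
</Root>"""

root = ET.fromstring(xml_string)

for number, recipe in enumerate(root.findall("recipes"), start=1):
    recipe.set("recipe_id", str(number))
    # Search only inside this recipe, so the name always belongs to it.
    entry = recipe.find("./tbody/row/entry[@namest='c1']")
    if entry is not None:
        recipe.set("recipe_name", entry.text)

names = [r.get("recipe_name") for r in root.findall("recipes")]
ids = [r.get("recipe_id") for r in root.findall("recipes")]
```

Here each recipes element is visited once, which is the same idea as the enumerate fix but without depending on name_list having the same order and length as the elements.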

Related

ElementTree wrong encoding

I have been searching for hours but cannot find a solution online, so I am asking here.
I just want to print the inner content of an HTML tag in an XML document, but I only get things like &lt;, &gt; and so on.
It looks like this in the XML document:
<data table="tt_content" elementUid="2490" key="tt_content:NEW/1/2490:bodytext"><![CDATA[<img src="/fileadmin/public/Redaktion/Bilder/Icons/Icon-CE.png" width="28" height="21" class="float-left mt-1 mr-2">
<h4>EU-Baumusterprüfbescheinigung</h4>
When I print it, it looks like this:
<data table="tt_content" elementUid="2490" key="tt_content:NEW/1/2490:bodytext"><img src="/fileadmin/public/Redaktion/Bilder/Icons/Icon-CE.png" width="28" height="21" class="float-left mt-1 mr-2">
<h4>EU-Baumusterprüfbescheinigung</h4>
As you can see, it is very different: not only are the German characters not displayed, but the "CDATA" wrapper, which is very important to me, is also gone.
The angle brackets are replaced with &lt; and so on.
And now to my code:
raw = <data table="tt_content" elementUid="2490" key="tt_content:NEW/1/2490:bodytext"><![CDATA[<img src="/fileadmin/public/Redaktion/Bilder/Icons/Icon-CE.png" width="28" height="21" class="float-left mt-1 mr-2">
<h4>EU-Baumusterprüfbescheinigung</h4>
raw = ET.tostring(data).decode()
print(raw) # print is showed before
What I've also tried
# raw = ET.tostring(raw, encoding='unicode', method='xml')
# raw = ET.tostring(raw, encoding='unicode', method='xml')
First I iterate to the position of the data element that I showed above:
def copy_content():
    for pageGrp in root.findall('pageGrp'):
        for data in pageGrp.iter('data'):
            tag = data.get("key").split(":")[2]
            if (tag == "bodytext"):
                raw = ET.tostring(data).decode()  # it starts here
                # ET.dump(data)
                # print(raw)
                # file = open('new.xml', 'a')
                # file.write(raw)
                print(raw)
I hope you can help me. Thanks in advance.
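No answer is recorded for this question, so the following is only a sketch of what seems to be going on. The standard xml.etree.ElementTree parser reads a CDATA section as ordinary text, and tostring() escapes that text again when serializing; with the default encoding it also writes non-ASCII characters as numeric references. The shortened element and the variable names below are illustrative, and re-wrapping the text in CDATA by hand is an assumption about what the asker wants.

```python
import xml.etree.ElementTree as ET

# Shortened, properly closed version of the element from the question.
xml_string = (
    '<data table="tt_content" key="tt_content:NEW/1/2490:bodytext">'
    '<![CDATA[<h4>EU-Baumusterprüfbescheinigung</h4>]]>'
    '</data>'
)
data = ET.fromstring(xml_string)

# The parsed text is the raw, unescaped content of the CDATA section.
inner = data.text

# Serializing escapes the markup; encoding="unicode" keeps the umlaut readable.
as_unicode = ET.tostring(data, encoding="unicode")

# The default encoding is ASCII, so the umlaut becomes a character reference.
as_ascii = ET.tostring(data).decode()

# If the CDATA wrapper is needed in the output, it can be re-added by hand.
rewrapped = f"<![CDATA[{inner}]]>"
print(inner)
```

If only the HTML fragment is needed, data.text is usually enough and no serialization step is required.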

Remove a specific xml tag with ElementTree in python

I am searching for a way to remove a specific tag <e> whose value is mmm (i.e. <e>mmm</e>) from an XML file. I am using this thread as a starting guide: How to remove elements from XML using Python, but without the lxml library, using ElementTree with Python v2.6.6 instead. I have tried to connect the dots between that thread and the ElementTree API docs, but I haven't been successful.
I appreciate your advice and thoughts on this.
<?xml version='1.0' encoding='UTF-8'?>
<parent>
<first>
<a>123</a>
<c>987</c>
<d>
<e>mmm</e>
<e>yyy</e>
</d>
</first>
<second>
<a>456</a>
<c>345</c>
<d>
<e>mmm</e>
<e>hhh</e>
</d>
</second>
</parent>
It took a while for me to realise that all <e> tags are subnodes of <d>.
If we can assume the above is true for all your target nodes (<e> nodes with value mmm), you can use this script. (I added some extra nodes to check that it works.)
import xml.etree.ElementTree as ET
xml_string = """<?xml version='1.0' encoding='UTF-8'?>
<parent>
<first>
<a>123</a>
<c>987</c>
<d>
<e>mmm</e>
<e>aaa</e>
<e>mmm</e>
<e>yyy</e>
</d>
</first>
<second>
<a>456</a>
<c>345</c>
<d>
<e>mmm</e>
<e>hhh</e>
</d>
</second>
</parent>"""
# this is how I create my root, if you choose to do it in a different way the end of this script might not be useful
root = ET.fromstring(xml_string)
target_node_first_parent = 'd'
target_node = 'e'
target_text = 'mmm'
# find all <d> nodes
for node in root.iter(target_node_first_parent):
    # find <e> subnodes of <d>; findall returns a list, so removing
    # children inside the loop cannot skip the next sibling
    for subnode in node.findall(target_node):
        if subnode.text == target_text:
            node.remove(subnode)
# output the result
tree = ET.ElementTree(root)
tree.write('output.xml')
I first tried to remove the nodes found by root.iter(yourtag) directly from the root, but that is not possible: remove() only removes direct children of the element it is called on, so each <e> has to be removed through its parent <d>.
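When the parent tag is not known in advance, one common workaround is to build a child-to-parent map first, since ElementTree elements do not keep a reference to their parent. This is a minimal sketch of that idea, not part of the original answer; the shortened XML and the name parent_of are illustrative.

```python
import xml.etree.ElementTree as ET

xml_string = """<parent>
<first><d><e>mmm</e><e>mmm</e><e>yyy</e></d></first>
<second><d><e>mmm</e><e>hhh</e></d></second>
</parent>"""
root = ET.fromstring(xml_string)

# Map every element to its parent so any match can be removed later.
parent_of = {child: parent for parent in root.iter() for child in parent}

# Take a snapshot with list() so the tree is not modified while iterating.
for elem in list(root.iter("e")):
    if elem.text == "mmm":
        parent_of[elem].remove(elem)

remaining = [e.text for e in root.iter("e")]
```

The snapshot also handles two adjacent <e>mmm</e> siblings, which removing during live iteration can miss.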
The answer by @Queuebee is exactly correct, but in case you want to read from a file, the code below provides a way to do that.
import xml.etree.ElementTree as ET
file_loc = " "
xml_tree_obj = ET.parse(file_loc)
xml_roots = xml_tree_obj.getroot()
target_node_first_parent = 'd'
target_node = 'e'
target_text = 'mmm'
# find all <d> nodes
for node in xml_roots.iter(target_node_first_parent):
    # find <e> subnodes of <d>; findall returns a list, so removing
    # children inside the loop cannot skip the next sibling
    for subnode in node.findall(target_node):
        if subnode.text == target_text:
            node.remove(subnode)
out_tree = ET.ElementTree(xml_roots)
out_tree.write('output.xml')

How to parse XML grouped by specific tag id

I have the following XML file and I would like to group its contents by Table Id.
xml = """
<Tables Count="19">
<Table Id="1" >
<Data>
<Cell>
<Brush/>
<Text>AA</Text>
<Text>BB</Text>
</Cell>
</Data>
</Table>
<Table Id="2" >
<Data>
<Cell>
<Brush/>
<Text>CC</Text>
<Text>DD</Text>
</Cell>
</Data>
</Table>
</Tables>
"""
I would like to parse it and get something like this.
I have tried the code below but couldn't figure it out.
from lxml import etree
tree = etree.fromstring(xml)
users = {}
for user in tree.xpath("//Tables"):
    name = user.xpath("Table")[0].text
    users[name] = []
    for group in user.xpath("Data/Cell/Text"):
        users[name].append(group.text)
print (users)
Is it possible to get the above result? If so, could anyone help me do this? I really appreciate your effort.
You need to change your xpath queries to:
from lxml import etree
tree = etree.fromstring(xml)
users = {}
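for user in tree.xpath("//Tables/Table"):
    # changed: select each <Table> element, not the <Tables> wrapper
    name = user.attrib['Id']
    users[name] = []
    for group in user.xpath(".//Data/Cell/Text"):
        # changed: the leading ".//" keeps the search inside this <Table>
        users[name].append(group.text)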
print (users)
...and use the attrib dictionary.
This yields for your string:
{'1': ['AA', 'BB'], '2': ['CC', 'DD']}
If you're into "one-liners", you could even do:
users = {name: [group.text for group in user.xpath(".//Data/Cell/Text")]
         for user in tree.xpath("//Tables/Table")
         for name in [user.attrib["Id"]]}
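For readers without lxml installed, the same grouping can be sketched with the standard library's xml.etree.ElementTree, which supports the limited path syntax used here. This swaps in a different library from the answer above; the shortened XML string and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

xml = """<Tables Count="19">
<Table Id="1"><Data><Cell><Brush/><Text>AA</Text><Text>BB</Text></Cell></Data></Table>
<Table Id="2"><Data><Cell><Brush/><Text>CC</Text><Text>DD</Text></Cell></Data></Table>
</Tables>"""

root = ET.fromstring(xml)  # root is the <Tables> element

# Key each table by its Id attribute and collect the text of its <Text> cells.
tables = {
    table.get("Id"): [text.text for text in table.findall("./Data/Cell/Text")]
    for table in root.findall("Table")
}
print(tables)
```

Using table.get("Id") returns None for a table without an Id instead of raising KeyError, which may or may not be the behaviour you want.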

Extracting Data from Mysql XML dump with xml.dom.minidom

I exported a mysql database to xml with phpmyadmin and now I would like to parse it with minidom but I'm having trouble getting the content in the form that I need it.
Summary: I need to assign the variable title to the text contained within <column name="news_title">This is the title</column>
The extracted db looks like this:
<pma_xml_export version="1.0" >
<database name="dbname">
<!-- Table newsbox -->
<table name="newsbox">
<column name="news_id">1</column>
<column name="news_title">This is the title</column>
<column name="news_text">This is the news text</column>
<column name="date">Thu, 28 Feb 2008 20:10:30 -0500</column>
<column name="author">author</column>
<column name="category">site_announcement</column>
</table>
</database>
</pma_xml_export>
I am able to extract the text with the following script but it's not in the form that I need:
from xml.dom.minidom import parseString
# document is the exported XML as a string
doc = parseString(document)
pmaexport = doc.getElementsByTagName("pma_xml_export")[0]
columns = pmaexport.getElementsByTagName("column")
for item in columns:
    name = item.getAttribute("name")
    text = item.firstChild.data.strip()
    print(name, text)
What I need is something where I can assign the text contents of these elements to variables which can be passed on e.g.,
for item in columns:
    title = ???
    text = ???
    date = ???
    author = ???
If the db output was in the form of <title>Here's the Title</title> I would have plenty of examples to go off, but I just can't find any reference to something like <column name="news_title">This is the title</column>
It's been a while since I've used xml.dom.minidom but this should work...
columns = [c.firstChild.data for c in pmaexport.getElementsByTagName('column') if c.getAttribute('name') == 'news_title']
Plus, like, list comprehension!
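To get every field at once rather than only news_title, one option is to build a dictionary per table element keyed by each column's name attribute. This is a sketch under the assumption that each table element in the export represents one row; the names rows and row are illustrative.

```python
from xml.dom.minidom import parseString

# Shortened copy of the export shown in the question.
document = """<pma_xml_export version="1.0">
<database name="dbname">
<table name="newsbox">
<column name="news_id">1</column>
<column name="news_title">This is the title</column>
<column name="news_text">This is the news text</column>
<column name="author">author</column>
</table>
</database>
</pma_xml_export>"""

doc = parseString(document)

rows = []
for table in doc.getElementsByTagName("table"):
    row = {}
    for column in table.getElementsByTagName("column"):
        # An empty <column/> has no text child, so guard against None.
        node = column.firstChild
        row[column.getAttribute("name")] = node.data.strip() if node else ""
    rows.append(row)

title = rows[0]["news_title"]
text = rows[0]["news_text"]
author = rows[0]["author"]
```

Each dictionary can then be passed on as a unit, or unpacked into variables as in the last three lines.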

Parsing nested xml with lxml and Python

I am having trouble parsing XML when it is in the form of:
<Cars>
<Car>
<Color>Blue</Color>
<Make>Ford</Make>
<Model>Mustang</Model>
</Car>
<Car>
<Color>Red</Color>
<Make>Chevy</Make>
<Model>Camaro</Model>
</Car>
</Cars>
I have figured out how to parse 1st level children like this:
<Car>
<Color>Blue</Color>
<Make>Chevy</Make>
<Model>Camaro</Model>
</Car>
With this kind of code:
import os
from lxml import etree
a = os.path.join(localPath, file)
element = etree.parse(a)
cars = element.xpath('//Root/Foo/Bar/Car/node()[text()]')
parsedCars = [{field.tag: field.text for field in cars} for action in cars]
print(parsedCars[0]['Make'])  # Chevy
How can I parse out multiple "Car" tags that are children of "Cars"?
Try this
import os
from lxml import etree
a = os.path.join(localPath, file)
element = etree.parse(a)
cars = element.xpath('//Root/Foo/Bar/Car')
for car in cars:
    # relative queries: each returns a list of matching children of this car
    colors = car.xpath('./Color')
    makes = car.xpath('./Make')
    models = car.xpath('./Model')
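Building on the per-car loop above, each Car can be turned into its own dictionary so the result matches the parsedCars shape from the question. The sketch below uses the standard library's xml.etree.ElementTree to stay self-contained; the findall call and child iteration used here are also available on lxml elements. The inline XML and the name parsed_cars are illustrative.

```python
import xml.etree.ElementTree as ET

xml = """<Cars>
<Car><Color>Blue</Color><Make>Ford</Make><Model>Mustang</Model></Car>
<Car><Color>Red</Color><Make>Chevy</Make><Model>Camaro</Model></Car>
</Cars>"""

root = ET.fromstring(xml)  # root is the <Cars> element

# One dictionary per <Car>, mapping each child tag to its text.
parsed_cars = [
    {field.tag: field.text for field in car}
    for car in root.findall("Car")
]
print(parsed_cars[1]["Make"])
```

The outer comprehension walks the Car elements and the inner one walks the fields of a single car, which is the nesting the original one-liner was missing.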
