ElementTree, .set() and iteration


This is my first post on Stack Overflow and I am a novice programmer.
I am having trouble using ElementTree and the .set() method. Using an f-string, I am able to assign recipe_id the correct number.
When I try to set the recipe_name attribute, every element gets only the last item in name_list. I'm a bit lost! I'm sure it's something in my syntax or just my understanding of how I'm actually iterating through the items... I just don't understand, because the recipe_id portion works just fine.
Expected output (within the XML)
<recipes recipe_id="1" recipe_name="Apples">
<recipes recipe_id="2" recipe_name="Oranges">
Instead I get:
<recipes recipe_id="1" recipe_name="Oranges">
<recipes recipe_id="2" recipe_name="Oranges">
My code:
#!/usr/bin/env python3
import os
import os.path
import xml.etree.ElementTree as ET
filename = "my_recipes.xml"
xmlTree = ET.parse(filename)
root = xmlTree.getroot()
# change the recipe id in recipes
i = 0
for element in root.iter("recipes"):
    i += 1
    element.set('recipe_id', f"{i}")
# for every tbody in the tree
name_list = []
for tbody in root.iter("tbody"):
    # for every time you find a row
    for row in tbody.findall('row'):
        data = row.find('entry').text
        # get those rows attribs
        rec_name = row.find('entry').attrib
        # if the row is the row that i want (contains the recipe name)...i couldn't figure out a better way to get this value precisely
        if rec_name == {'namest': 'c1', 'nameend': 'c2', 'align': 'left', 'valign': 'bottom'}:
            # yoink name and stick it in name_list
            name_list.append(data)
for recipes in root.findall('recipes'):
    for i in range(len(name_list)):
        recipes.set('recipe_name', F"{name_list[i]}")
xmlTree.write(filename, encoding='UTF-8', xml_declaration=True)
My XML:
<?xml version='1.0' encoding='UTF-8'?>
<Root>
    <recipes>
        <tbody>
            <row>
                <entry namest="c1" nameend="c2" align="left" valign="bottom">Apples</entry>
            </row>
        </tbody>
    </recipes>
    <recipes>
        <tbody>
            <row>
                <entry namest="c1" nameend="c2" align="left" valign="bottom">Oranges</entry>
            </row>
        </tbody>
    </recipes>
</Root>
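FIXED:
The inner loop ran through every entry of name_list for each recipes element, so each .set() call overwrote the previous one and the last name always won.
The code should be
for i, recipes in enumerate(root.findall('recipes')):
    recipes.set('recipe_name', name_list[i])
and I am just stupid.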
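A possible variation, sketched here under assumptions rather than taken from the original post: the name can be looked up inside each recipes element while numbering it, so no separate name_list has to stay in sync with the elements. The function name label_recipes is hypothetical, and the sketch assumes the entry with namest="c1" is the one that holds the recipe name.

```python
import xml.etree.ElementTree as ET

# Inline copy of the sample XML from the question, so the sketch runs on its own.
XML = """<Root>
<recipes><tbody><row>
<entry namest="c1" nameend="c2" align="left" valign="bottom">Apples</entry>
</row></tbody></recipes>
<recipes><tbody><row>
<entry namest="c1" nameend="c2" align="left" valign="bottom">Oranges</entry>
</row></tbody></recipes>
</Root>"""


def label_recipes(root):
    """Set recipe_id and recipe_name on each <recipes> element in one pass.

    Hypothetical helper; assumes the name lives in the entry with namest="c1".
    """
    for number, recipe in enumerate(root.findall("recipes"), start=1):
        recipe.set("recipe_id", str(number))
        # Search inside this recipe only, so names cannot get mixed up.
        entry = recipe.find("./tbody/row/entry[@namest='c1']")
        if entry is not None and entry.text:
            recipe.set("recipe_name", entry.text.strip())
    return root


root = label_recipes(ET.fromstring(XML))
for recipe in root.findall("recipes"):
    print(recipe.attrib)
```

Because each name is read from the element being updated, a recipe without a matching entry is simply left without a recipe_name instead of shifting every later name by one.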

Related

ElementTree wrong encoding

I have been searching for hours but can't find a solution online, so I am asking here.
I just want to print the content of an HTML tag inside an XML document, but I only get things like &lt;, &gt; and so on.
It looks like this in the XML document:
<data table="tt_content" elementUid="2490" key="tt_content:NEW/1/2490:bodytext"><![CDATA[<img src="/fileadmin/public/Redaktion/Bilder/Icons/Icon-CE.png" width="28" height="21" class="float-left mt-1 mr-2">
<h4>EU-Baumusterprüfbescheinigung</h4>
When I print it, it looks like this:
<data table="tt_content" elementUid="2490" key="tt_content:NEW/1/2490:bodytext"><img src="/fileadmin/public/Redaktion/Bilder/Icons/Icon-CE.png" width="28" height="21" class="float-left mt-1 mr-2">
<h4>EU-Baumusterprüfbescheinigung</h4>
As you can see, it is very different: not only are the German characters not displayed correctly, but the CDATA section, which is very important to me, is gone as well.
Its contents are replaced with &lt; and so on.
Now to my code:
# data is the <data> element shown above
raw = ET.tostring(data).decode()
print(raw)  # prints the output shown above
What I've also tried:
# raw = ET.tostring(raw, encoding='unicode', method='xml')
First I iterate to the position of the data element I showed you above:
def copy_content():
    for pageGrp in root.findall('pageGrp'):
        for data in pageGrp.iter('data'):
            tag = data.get("key").split(":")[2]
            if tag == "bodytext":
                raw = ET.tostring(data).decode()  # it starts here
                # ET.dump(data)
                # print(raw)
                # file = open('new.xml', 'a')
                # file.write(raw)
                print(raw)
I hope you can help me. Thanks in advance.
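A hedged sketch of what is probably happening, not an answer from the original thread: the xml.etree.ElementTree parser folds a CDATA section into ordinary element text, and ET.tostring() then escapes <, > and & when it serializes that text. With its default encoding it also writes non-ASCII characters as numeric character references. The SAMPLE string below is a shortened stand-in for the element in the question and assumes the CDATA section and the data element are closed properly.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the <data> element from the question (an assumption).
SAMPLE = (
    '<data key="tt_content:NEW/1/2490:bodytext">'
    '<![CDATA[<h4>EU-Baumusterprüfbescheinigung</h4>]]>'
    '</data>'
)

data = ET.fromstring(SAMPLE)

# The CDATA wrapper is gone after parsing, but its content survives
# unescaped as the element's text.
inner = data.text
print(inner)

# Default serialization: ASCII bytes, so "ü" becomes a character reference
# and the markup inside the text is escaped.
ascii_xml = ET.tostring(data).decode()
print(ascii_xml)

# encoding="unicode" returns a str and keeps "ü", but "<" and ">" in the
# text are still escaped because they are character data, not tags.
unicode_xml = ET.tostring(data, encoding="unicode")
print(unicode_xml)
```

If only the inner HTML is needed, reading data.text may be enough. The standard-library serializer has no option to write CDATA sections back out; lxml provides etree.CDATA for that, which would be a different library from the one used in the question.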

Remove a specific xml tag with ElementTree in python

I am searching for a way to remove a specific tag <e> whose value is mmm (i.e. <e>mmm</e>) from an XML file. I am using this thread as a starting guide: How to remove elements from XML using Python. I want to do this without the lxml library, using ElementTree with Python v2.6.6 instead. I have tried to connect the dots between that thread and the ElementTree API docs, but I haven't been successful.
I appreciate your advice and thoughts on this.
<?xml version='1.0' encoding='UTF-8'?>
<parent>
    <first>
        <a>123</a>
        <c>987</c>
        <d>
            <e>mmm</e>
            <e>yyy</e>
        </d>
    </first>
    <second>
        <a>456</a>
        <c>345</c>
        <d>
            <e>mmm</e>
            <e>hhh</e>
        </d>
    </second>
</parent>

It took a while for me to realise that all <e> tags are subnodes of <d>.
If we can assume the above is true for all your target nodes (<e> nodes with value mmm), you can use this script. (I added some extra nodes to check if it worked.)
import xml.etree.ElementTree as ET
xml_string = """<?xml version='1.0' encoding='UTF-8'?>
<parent>
    <first>
        <a>123</a>
        <c>987</c>
        <d>
            <e>mmm</e>
            <e>aaa</e>
            <e>mmm</e>
            <e>yyy</e>
        </d>
    </first>
    <second>
        <a>456</a>
        <c>345</c>
        <d>
            <e>mmm</e>
            <e>hhh</e>
        </d>
    </second>
</parent>"""
# this is how I create my root, if you choose to do it in a different way the end of this script might not be useful
root = ET.fromstring(xml_string)
target_node_first_parent = 'd'
target_node = 'e'
target_text = 'mmm'
# find all <d> nodes
for node in root.iter(target_node_first_parent):
    # find <e> children of <d>; copy them to a list first, because removing
    # while iterating over the live element can skip adjacent matches
    for subnode in list(node.findall(target_node)):
        if subnode.text == target_text:
            node.remove(subnode)
# output the result
tree = ET.ElementTree(root)
tree.write('output.xml')
I tried to just remove the nodes found by root.iter(yourtag), but that is not possible from the root: remove() only removes direct children of the element it is called on, so it was not that easy.
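If the parent tag is not known in advance, one common workaround is to build a child-to-parent lookup first, since ElementTree elements do not keep a reference to their parent. The following is a minimal sketch under that assumption; the helper name remove_matching is hypothetical and the sample XML is a shortened version of the one above.

```python
import xml.etree.ElementTree as ET


def remove_matching(root, tag, text):
    """Remove every <tag> element whose text equals `text`, at any depth.

    Hypothetical helper. Returns the number of elements removed.
    """
    # Map each element to its parent; the root has no entry.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Collect the targets first so the tree is not modified while iterating.
    targets = [
        el for el in root.iter(tag)
        if el.text == text and el in parent_of
    ]
    for el in targets:
        parent_of[el].remove(el)
    return len(targets)


root = ET.fromstring(
    "<parent>"
    "<first><d><e>mmm</e><e>mmm</e><e>yyy</e></d></first>"
    "<second><d><e>mmm</e><e>hhh</e></d></second>"
    "</parent>"
)
removed = remove_matching(root, "e", "mmm")
remaining = [e.text for e in root.iter("e")]
print(removed, remaining)
```

Collecting the targets before removing anything also handles two matching siblings in a row, which a remove-while-iterating loop can miss.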

The answer by @Queuebee is exactly correct, but in case you want to read from a file, the code below provides a way to do that.
import xml.etree.ElementTree as ET
file_loc = " "  # path to your XML file
xml_tree_obj = ET.parse(file_loc)
xml_roots = xml_tree_obj.getroot()
target_node_first_parent = 'd'
target_node = 'e'
target_text = 'mmm'
# find all <d> nodes
for node in xml_roots.iter(target_node_first_parent):
    # find <e> children of <d>; iterate over a copy so removal is safe
    for subnode in list(node.findall(target_node)):
        if subnode.text == target_text:
            node.remove(subnode)
out_tree = ET.ElementTree(xml_roots)
out_tree.write('output.xml')

How to parse XML grouped by specific tag id

I have the following XML and I would like to restructure it, grouped by Table Id.
xml = """
<Tables Count="19">
    <Table Id="1" >
        <Data>
            <Cell>
                <Brush/>
                <Text>AA</Text>
                <Text>BB</Text>
            </Cell>
        </Data>
    </Table>
    <Table Id="2" >
        <Data>
            <Cell>
                <Brush/>
                <Text>CC</Text>
                <Text>DD</Text>
            </Cell>
        </Data>
    </Table>
</Tables>
"""
I would like to parse it so that the Text values are grouped under each Table Id.
I have tried the code below but couldn't figure it out.
from lxml import etree
tree = etree.fromstring(xml)
users = {}
for user in tree.xpath("//Tables"):
    name = user.xpath("Table")[0].text
    users[name] = []
    for group in user.xpath("Data/Cell/Text"):
        users[name].append(group.text)
print(users)
Is it possible to get that result? If so, could anyone help me do this? I really appreciate your effort.

You need to change your XPath queries to:
from lxml import etree
tree = etree.fromstring(xml)
users = {}
for user in tree.xpath("//Tables/Table"):  # changed: select each Table, not Tables
    name = user.attrib['Id']
    users[name] = []
    for group in user.xpath(".//Data/Cell/Text"):  # changed: leading "." searches inside this Table
        users[name].append(group.text)
print(users)
...and use the attrib dictionary.
This yields for your string:
{'1': ['AA', 'BB'], '2': ['CC', 'DD']}
If you're into "one-liners", you could even do:
users = {name: [group.text for group in user.xpath(".//Data/Cell/Text")]
         for user in tree.xpath("//Tables/Table")
         for name in [user.attrib["Id"]]}
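For readers without lxml installed, the same grouping can be sketched with the standard library's xml.etree.ElementTree, which supports a limited XPath subset. This swaps in a different library from the one used in the answer; the function name group_text_by_table and the constant TABLES_XML are hypothetical.

```python
import xml.etree.ElementTree as ET

# Inline copy of the sample from the question, so the sketch runs on its own.
TABLES_XML = """<Tables Count="19">
  <Table Id="1"><Data><Cell><Brush/><Text>AA</Text><Text>BB</Text></Cell></Data></Table>
  <Table Id="2"><Data><Cell><Brush/><Text>CC</Text><Text>DD</Text></Cell></Data></Table>
</Tables>"""


def group_text_by_table(xml_text):
    """Return {Table Id: [Text values]} for every Table element."""
    root = ET.fromstring(xml_text)
    return {
        # The relative path keeps the search inside the current Table.
        table.get("Id"): [t.text for t in table.iterfind("./Data/Cell/Text")]
        for table in root.iter("Table")
    }


grouped = group_text_by_table(TABLES_XML)
print(grouped)
```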

Extracting Data from Mysql XML dump with xml.dom.minidom

I exported a MySQL database to XML with phpMyAdmin and now I would like to parse it with minidom, but I'm having trouble getting the content in the form that I need.
Summary: I need to assign the variable title to the text contained within <column name="news_title">This is the title</column>
The extracted db looks like this:
<pma_xml_export version="1.0" >
    <database name="dbname">
        <!-- Table newsbox -->
        <table name="newsbox">
            <column name="news_id">1</column>
            <column name="news_title">This is the title</column>
            <column name="news_text">This is the news text</column>
            <column name="date">Thu, 28 Feb 2008 20:10:30 -0500</column>
            <column name="author">author</column>
            <column name="category">site_announcement</column>
        </table>
    </database>
</pma_xml_export>
I am able to extract the text with the following script but it's not in the form that I need:
from xml.dom.minidom import parseString
doc = parseString(document)
pmaexport = doc.getElementsByTagName("pma_xml_export")[0]
columns = pmaexport.getElementsByTagName("column")
for item in columns:
    name = item.getAttribute("name")
    text = item.firstChild.data.strip()
    print(name, text)
What I need is something where I can assign the text contents of these elements to variables which can be passed on e.g.,
for item in columns:
    title = ???
    text = ???
    date = ???
    author = ???
If the db output was in the form of <title>Here's the Title</title> I would have plenty of examples to go off, but I just can't find any reference to something like <column name="news_title">This is the title</column>

It's been a while since I've used xml.dom.minidom but this should work...
columns = [c.firstChild.data for c in pmaexport.getElementsByTagName('column') if c.getAttribute('name') == 'news_title']
Plus, like, list comprehension!
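To get all of the fields at once rather than one list per column name, one option is to collect each table element's columns into a dict keyed by the name attribute. This is a minimal sketch, assuming the export contains one table element per database row; the helper name rows_from_export is hypothetical.

```python
from xml.dom.minidom import parseString

# Trimmed copy of the export from the question, so the sketch runs on its own.
document = """<pma_xml_export version="1.0">
<database name="dbname">
<table name="newsbox">
<column name="news_id">1</column>
<column name="news_title">This is the title</column>
<column name="news_text">This is the news text</column>
<column name="date">Thu, 28 Feb 2008 20:10:30 -0500</column>
<column name="author">author</column>
</table>
</database>
</pma_xml_export>"""


def rows_from_export(xml_text):
    """Return one dict per <table> element, keyed by each column's name."""
    doc = parseString(xml_text)
    rows = []
    for table in doc.getElementsByTagName("table"):
        row = {}
        for column in table.getElementsByTagName("column"):
            # An empty <column/> has no child text node.
            node = column.firstChild
            value = node.data.strip() if node is not None else ""
            row[column.getAttribute("name")] = value
        rows.append(row)
    return rows


rows = rows_from_export(document)
for row in rows:
    title = row["news_title"]
    text = row["news_text"]
    date = row["date"]
    author = row["author"]
    print(title, text, date, author)
```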

Parsing nested xml with lxml and Python

I am having trouble parsing XML when it is in the form of:
<Cars>
    <Car>
        <Color>Blue</Color>
        <Make>Ford</Make>
        <Model>Mustang</Model>
    </Car>
    <Car>
        <Color>Red</Color>
        <Make>Chevy</Make>
        <Model>Camaro</Model>
    </Car>
</Cars>
I have figured out how to parse 1st level children like this:
<Car>
    <Color>Blue</Color>
    <Make>Chevy</Make>
    <Model>Camaro</Model>
</Car>
With this kind of code:
import os
from lxml import etree
a = os.path.join(localPath, file)
element = etree.parse(a)
cars = element.xpath('//Root/Foo/Bar/Car/node()[text()]')
parsedCars = [{field.tag: field.text for field in cars} for action in cars]
print(parsedCars[0]['Make'])  # Chevy
How can I parse out multiple "Car" tags that are children of "Cars"?

Try this:
import os
from lxml import etree
a = os.path.join(localPath, file)
element = etree.parse(a)
cars = element.xpath('//Root/Foo/Bar/Car')
for car in cars:
    # each xpath() call returns a list of matching child elements
    colors = car.xpath('./Color')
    makes = car.xpath('./Make')
    models = car.xpath('./Model')
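To end up with one dict per car, as the question's parsedCars list intended, the loop over Car elements can build the dicts directly. The sketch below uses the standard library's xml.etree.ElementTree instead of lxml so it runs without extra packages, and it assumes Cars is the root element as in the sample; the name parse_cars is hypothetical.

```python
import xml.etree.ElementTree as ET

# Inline copy of the sample from the question, so the sketch runs on its own.
CARS_XML = """<Cars>
  <Car><Color>Blue</Color><Make>Ford</Make><Model>Mustang</Model></Car>
  <Car><Color>Red</Color><Make>Chevy</Make><Model>Camaro</Model></Car>
</Cars>"""


def parse_cars(xml_text):
    """Return a list with one {tag: text} dict per <Car> element."""
    root = ET.fromstring(xml_text)
    # Iterating over an element yields its direct children.
    return [{field.tag: field.text for field in car} for car in root.iter("Car")]


parsed_cars = parse_cars(CARS_XML)
print(parsed_cars[1]["Make"])
```

The key difference from the question's comprehension is that the inner loop runs over the children of each individual car, not over one flat list of every field in the document.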
