Using lxml to add a string as a sub element


I have an lxml element with children built like this:
xml = etree.Element('presentation')
format_xml = etree.SubElement(xml, 'format')
content_xml = etree.SubElement(xml, 'slides')
I then have several strings that I would like to iterate over, adding each as a child element to slides. Each string will be something like this:
<slide1>
<title>My Presentation</title>
<subtitle>A sample presentation</subtitle>
<phrase>Some sample text
<subphrase>Some more text</subphrase>
</phrase>
</slide1>
How can I append these strings as children to the slides element?

Just append:
import lxml.etree as etree
xml = etree.Element('presentation')
format_xml = etree.SubElement(xml, 'format')
content_xml = etree.SubElement(xml, 'slides')
new = """<slide1>
<title>My Presentation</title>
<subtitle>A sample presentation</subtitle>
<phrase>Some sample text
<subphrase>Some more text</subphrase>
</phrase>
</slide1>"""
content_xml.append(etree.fromstring(new))
print(etree.tostring(xml, pretty_print=True).decode())
Which will give you:
<presentation>
<format/>
<slides>
<slide1>
<title>My Presentation</title>
<subtitle>A sample presentation</subtitle>
<phrase>Some sample text
<subphrase>Some more text</subphrase>
</phrase>
</slide1>
</slides>
</presentation>

The fromstring() function loads an XML string directly into an Element instance, which you can then append:
from lxml import etree as ET
slide = ET.fromstring(xml_string)
content_xml.append(slide)
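
For several strings, the same call can run in a loop. The following is a minimal sketch that uses the standard library's xml.etree.ElementTree, which exposes the same fromstring(), SubElement() and append() calls as lxml.etree. The slide_strings list and its contents are made up for illustration.

```python
import xml.etree.ElementTree as ET  # lxml.etree offers the same calls used here

xml = ET.Element('presentation')
format_xml = ET.SubElement(xml, 'format')
content_xml = ET.SubElement(xml, 'slides')

# Hypothetical input: each string must be a well-formed XML fragment
# with exactly one root element.
slide_strings = [
    "<slide1><title>My Presentation</title></slide1>",
    "<slide2><title>Second slide</title></slide2>",
]

for slide_string in slide_strings:
    # fromstring() parses the string into an Element; append() attaches it.
    content_xml.append(ET.fromstring(slide_string))

result = ET.tostring(xml, encoding='unicode')
print(result)
```

A string that is not well-formed makes fromstring() raise a parse error, so untrusted input may need a try/except around the loop body.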

Related

How to find if there are empty attributes in XML?

Having a XML like this one (located in /home/user/):
<?xml version="1.0" ?>
<DataClient xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:cnmc="http://www.example.com/Tipos_DataClient" xmlns="http://www.example.com/DataClient">
<PersonalData Operation="3" Date="2022-09-06">
<ExtendedData>
<Person Code="XXX" OtherCode="Y12354"/>
</ExtendedData>
<Home Type="Street" Num="10" Code="12003" Poblation="Imaginary street"/>
</PersonalData>
</DataClient>
How could I identify if the "Num" attribute is empty? And then generate a list of all those elements that have the "Num" empty...
I tried to count all those with "None" as value, but it always returns 0:
#! /usr/bin/python3
import xml.etree.ElementTree as ET
tree = ET.parse('/home/user/file.xml')
root = tree.getroot()
b = None
a = sum(1 for s in root.findall('./DataClient/PersonalData/ExtendedData/Num') if s.b)
print (a)

Since Python's etree API exposes attributes as a dictionary, use get() to check for a specific attribute. You also need to pass the namespaces argument to findall, since the XML declares a default namespace. Note that DataClient is the root element, so the path starts at its children.
import xml.etree.ElementTree as ET
tree = ET.parse('/home/user/file.xml')
nmsp = {"doc": "http://www.example.com/DataClient"}
xpath = "./doc:PersonalData/doc:Home"
# not node.get("Num") is true when Num is missing or set to ""
a = sum(1 for node in tree.findall(xpath, nmsp) if not node.get("Num"))
print(a)
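
To collect the elements themselves rather than a count, a list comprehension works. The sketch below uses an inline sample (hypothetical data, not the asker's file) and distinguishes an attribute that is absent from one that is present but empty.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second Home has an empty Num, the third has none.
sample = """<DataClient xmlns="http://www.example.com/DataClient">
  <PersonalData Operation="3" Date="2022-09-06">
    <Home Type="Street" Num="10" Code="12003"/>
    <Home Type="Street" Num="" Code="12004"/>
    <Home Type="Street" Code="12005"/>
  </PersonalData>
</DataClient>"""

root = ET.fromstring(sample)
nmsp = {"doc": "http://www.example.com/DataClient"}

# .//doc:Home matches Home elements at any depth below the root.
homes = root.findall(".//doc:Home", nmsp)

# get() returns None for a missing attribute and "" for an empty one;
# both are falsy, so "not" catches either case.
empty_num = [home for home in homes if not home.get("Num")]
missing_num = [home for home in homes if home.get("Num") is None]

print([home.get("Code") for home in empty_num])
print(len(missing_num))
```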

How to update value between specific xml tags, where input is string, Python?

Consider a string like the one below. Its type is str, but it will always represent an XML document. I'm researching the available Python libraries for XML. How can I update the value between two specific tags, and which library should I use for that?
<?xml version="1.0"?>
<PostTelemetryRequest xmlns:ns2="urn:com:onstar:global:common:schema:PostTelemetryData:1">
<ns2:PartnerVehicles>
<ns2:PartnerVehicle>
<ns2:partnerNotificationID>251029655</ns2:partnerNotificationID>
</ns2:PartnerVehicle>
</ns2:PartnerVehicles>
</PostTelemetryRequest>
For instance, if the input is the string above how can I update the value between <ns2:partnerNotificationID> and </ns2:partnerNotificationID> tags to a new value?

This is the base code:
>>> from xml.etree import ElementTree
>>> s = """<?xml version="1.0"?>
<PostTelemetryRequest xmlns:ns2="urn:com:onstar:global:common:schema:PostTelemetryData:1">
<ns2:PartnerVehicles>
<ns2:PartnerVehicle>
<ns2:partnerNotificationID>251029655</ns2:partnerNotificationID>
</ns2:PartnerVehicle>
</ns2:PartnerVehicles>
</PostTelemetryRequest>
"""
>>> root = ElementTree.fromstring(s)
>>> for e in root.iter():
...     if e.tag == '{urn:com:onstar:global:common:schema:PostTelemetryData:1}partnerNotificationID':
...         e.text = 'mytext'
...
>>> ElementTree.tostring(root)
b'<PostTelemetryRequest xmlns:ns0="urn:com:onstar:global:common:schema:PostTelemetryData:1">\n <ns0:PartnerVehicles>\n <ns0:PartnerVehicle>\n <ns0:partnerNotificationID>mytext</ns0:partnerNotificationID>\n </ns0:PartnerVehicle>\n </ns0:PartnerVehicles>\n</PostTelemetryRequest>'
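
As an alternative to comparing the full tag string, find() accepts a prefix-to-URI mapping. This is a minimal sketch; the helper name set_notification_id is made up for illustration.

```python
from xml.etree import ElementTree

NS = {"ns2": "urn:com:onstar:global:common:schema:PostTelemetryData:1"}

def set_notification_id(xml_string, new_value):
    """Return xml_string with the partnerNotificationID text replaced.

    Hypothetical helper; raises ValueError if the element is absent.
    """
    root = ElementTree.fromstring(xml_string)
    # ".//" searches all descendants; the ns2 prefix is resolved through NS.
    node = root.find(".//ns2:partnerNotificationID", NS)
    if node is None:
        raise ValueError("partnerNotificationID not found")
    node.text = new_value
    return ElementTree.tostring(root, encoding="unicode")

s = """<PostTelemetryRequest xmlns:ns2="urn:com:onstar:global:common:schema:PostTelemetryData:1">
<ns2:PartnerVehicles>
<ns2:PartnerVehicle>
<ns2:partnerNotificationID>251029655</ns2:partnerNotificationID>
</ns2:PartnerVehicle>
</ns2:PartnerVehicles>
</PostTelemetryRequest>"""

updated = set_notification_id(s, "mytext")
print(updated)
```

The serializer chooses its own prefix on output (ns0 in the result above), which is equivalent XML even though the prefix differs from the input.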

Adding subElement at a specific location with xml.dom.minidom (appendChild)

I intend to insert a sub element at a specified location. However, I do not know how to do that using appendChild in xml.dom
Here is my xml code:
<?xml version='1.0' encoding='UTF-8'?>
<VOD>
<root>
<ab>sdsd
<pp>pras</pp>
<ps>sinha</ps>
</ab>
<ab>prashu</ab>
<ab>sakshi</ab>
<cd>dfdf</cd>
</root>
<root>
<ab>pratik</ab>
</root>
<root>
<ab>Mum</ab>
</root>
</VOD>
I would like to insert another sub element "new" in first "root" element just before the "cd" tag. The result should look like this:
<ab>prashu</ab>
<ab>sakshi</ab>
<new>Anydata</new>
<cd>dfdf</cd>
The code I used for this is:
import xml.dom.minidom as m
doc = m.parse("file_notes.xml")
root=doc.getElementsByTagName("root")
valeurs = doc.getElementsByTagName("root")[0]
element = doc.createElement("new")
element.appendChild(doc.createTextNode("Anydata"))
valeurs.appendChild(element)
doc.writexml(open("newxmlfile.xml","w"))
In what way can I achieve my goal?
Thank you in advance..!!

Try using insertBefore instead. Something along these lines:
element = doc.createElement("new")
element.appendChild(doc.createTextNode("Anydata"))
cd = doc.getElementsByTagName("cd")[0]
cd.parentNode.insertBefore(element, cd)
To insert new nodes based on an index you can just do:
cd_list = doc.getElementsByTagName("cd")
cd_list[0].parentNode.insertBefore(element, cd_list[0])
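
Put together as a self-contained sketch, with a shortened inline copy of the document standing in for file_notes.xml and the search for cd restricted to the first root element:

```python
import xml.dom.minidom as m

# Shortened version of the document from the question.
doc = m.parseString(
    "<VOD><root><ab>prashu</ab><ab>sakshi</ab><cd>dfdf</cd></root>"
    "<root><ab>pratik</ab></root></VOD>"
)

element = doc.createElement("new")
element.appendChild(doc.createTextNode("Anydata"))

# Look for <cd> only inside the first <root>, then insert before it.
first_root = doc.getElementsByTagName("root")[0]
cd = first_root.getElementsByTagName("cd")[0]
cd.parentNode.insertBefore(element, cd)

result = first_root.toxml()
print(result)
```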

Extract all the text from xml data with python

I'm new to xml data processing. I want to extract the text data in the following xml file:
<data>
<p>12345<strong>45667</strong>abcde</p>
</data>
so that expected result is:
['12345', '45667', 'abcde']
Currently I have tried:
tree = ET.parse('data.xml')
data = tree.getiterator()
text = [data[i].text for i in range(0, len(data))]
But the result only shows ['12345','45667'] . 'abcde' is missing. Can someone help me? Thanks in advance!

Try doing this using XPath and lxml:
import lxml.etree as etree
string = '''
<data>
<p>12345<strong>45667</strong>abcde</p>
</data>
'''
tree = etree.fromstring(string)
print(tree.xpath('//p//text()'))
The XPath expression means: "select every text node that is a descendant of any p element".
OUTPUT:
['12345', '45667', 'abcde']

getiterator() (or its replacement, iter()) iterates over elements, and .text holds only the text before an element's first child. abcde is the tail of the strong element, that is, the text following its closing tag.
You can use the itertext() method:
import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
print(list(tree.find('p').itertext()))
Prints:
['12345', '45667', 'abcde']
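
The difference between text and tail can be seen directly. A short sketch on the same sample, parsed from a string rather than a file:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<data><p>12345<strong>45667</strong>abcde</p></data>")
p = root.find("p")
strong = p.find("strong")

# text is the content before the first child; tail is the content that
# follows an element's closing tag, up to the next sibling or parent end.
print(p.text)       # 12345
print(strong.text)  # 45667
print(strong.tail)  # abcde

# itertext() walks text and tail values in document order.
parts = list(p.itertext())
print(parts)
```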

How would one remove the CDATA tags but preserve the actual data in Python using lxml or BeautifulSoup

I have some XML I am parsing in which I am using BeautifulSoup as the parser. I pull the CDATA out with the following code, but I only want the data and not the CDATA TAGS.
myXML = open("c:\myfile.xml", "r")
soup = BeautifulSoup(myXML)
data = soup.find(text=re.compile("CDATA"))
print data
<![CDATA[TEST DATA]]>
What I would like to see is the following output:
TEST DATA
I don't care if the solution is in LXML or BeautifulSoup. Just want the best or easiest way to get the job done. Thanks!
Here is a solution:
parser = etree.XMLParser(strip_cdata=False)
root = etree.parse(self.param1, parser)
data = root.findall('./config/script')
for item in data:  # iterate over the elements whose text was wrapped in CDATA
    print(item.text)

Based on the lxml docs:
>>> from lxml import etree
>>> parser = etree.XMLParser(strip_cdata=False)
>>> root = etree.XML('<root><data><![CDATA[test]]></data></root>', parser)
>>> data = root.findall('data')
>>> for item in data:  # iterate over the elements whose text was wrapped in CDATA
...     print(item.text)
...
test
This might be the best way to get the job done, depending on how amenable your xml structure is to this approach.

Based on BeautifulSoup:
>>> from bs4 import BeautifulSoup
>>> xml_str = '<xml> <MsgType><![CDATA[text]]></MsgType> </xml>'
>>> soup = BeautifulSoup(xml_str, "xml")
>>> soup.MsgType.get_text()
'text'
>>> soup.MsgType.string
'text'
>>> soup.MsgType.text
'text'
Each of these returns just the text inside MsgType, without the CDATA markers.
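
If neither lxml nor BeautifulSoup is strictly required, the standard library's xml.etree.ElementTree is another option (one not mentioned in the answers above). Its parser unwraps CDATA sections while parsing, so .text returns the bare content. A minimal sketch on a made-up input string:

```python
import xml.etree.ElementTree as ET

# Hypothetical input modelled on the question's TEST DATA example.
xml_string = "<root><data><![CDATA[TEST DATA]]></data></root>"

root = ET.fromstring(xml_string)
# The parser unwraps the CDATA section; .text holds only its content.
data_text = root.find("data").text
print(data_text)
```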
