Modifying element in xml using python

Can anyone please explain how to modify an XML element in Python using ElementTree?
I want to keep the rego AD-4214 and change make 'Tata' into 'Nissan' and model 'Sumo' into 'Skyline'.

If rewriting the entire file is acceptable [1], the easiest way would be to turn the XML file into a dictionary (see for example here: How to convert an XML string to a dictionary?), make your modifications on that dictionary, and convert the dict back to XML (for example with https://pypi.org/project/dicttoxml/).
[1] Be aware of lost formatting: whitespace, number formats, etc. may not be preserved by this.

This should work:
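import xml.etree.ElementTree as ET
tree = ET.parse('your_xml_source.xml')
root = tree.getroot()
root[1][1].text = "Nissan"   # <make> of the second <vehicle>
root[1][2].text = "Skyline"  # <model> of the second <vehicle>
tree.write('your_xml_source.xml')  # save the change back to the file
getroot() gives you the root element (<motorvehicle>), and [1] selects its second child, the <vehicle> with rego AD-4214. The second index, [1] or [2], gives you that vehicle's <make> or <model> respectively. Setting the text attribute changes their text content, and tree.write() saves the modified tree; without it the change exists only in memory.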
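Positional indexing breaks as soon as the order of the vehicles changes, so it can be safer to look the vehicle up by its rego. The sketch below assumes a layout in which each <vehicle> has <rego>, <make> and <model> children; the question's file is not shown, so the sample data (including the first vehicle) is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the real file is not shown in the question,
# so this structure and the first vehicle are assumptions.
XML_SOURCE = """\
<motorvehicle>
  <vehicle><rego>AB-1234</rego><make>Ford</make><model>Falcon</model></vehicle>
  <vehicle><rego>AD-4214</rego><make>Tata</make><model>Sumo</model></vehicle>
</motorvehicle>
"""

root = ET.fromstring(XML_SOURCE)

# Find the vehicle by its rego instead of by position.
for vehicle in root.iter("vehicle"):
    if vehicle.findtext("rego") == "AD-4214":
        vehicle.find("make").text = "Nissan"
        vehicle.find("model").text = "Skyline"

print(ET.tostring(root, encoding="unicode"))
```

When working from a file, the same loop would run on ET.parse(...).getroot(), followed by tree.write(...).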

Related

Get inner xml from lxml

I have the following string which is part of a bigger XML document:
content = '<odvNameElem stopID="9001002"><itdMapItemList/>Rathaus</odvNameElem>'
I want to access Rathaus. My current approach is to parse it with lxml and try to access the text of the element 'odvNameElem':
from lxml import etree
content = '<odvNameElem stopID="9001002"><itdMapItemList/>Rathaus</odvNameElem>'
root = etree.fromstring(content)
print(root.text)
This however results in None. What am I doing wrong?
etree.__version__ = '4.2.5'
I am not sure why the following works:
root.xpath("string()") but root.xpath("//text()") only returns an empty list. Can somebody please explain this?
The "Rathaus" string is the value of the tail property of the itdMapItemList element. Examples:
root.xpath("itdMapItemList")[0].tail
root.find("itdMapItemList").tail
See https://lxml.de/tutorial.html#elements-contain-text.
root.xpath("string()") returns the concatenation of the string values of the root node and its descendants, which indeed is "Rathaus" in this case.
See https://www.w3.org/TR/xpath-10/#function-string.
root.xpath("//test") does not make sense (there is no test element). Did you mean root.xpath("//text()")?
root.xpath("//text()") returns a list of all text nodes, which in this case is ['Rathaus'].
If the input XML is changed to
<odvNameElem stopID="9001002">ABC<itdMapItemList/>Rathaus</odvNameElem>
then the result is ['ABC', 'Rathaus']
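The text/tail split described above can be checked with a short sketch. It uses the standard library's xml.etree.ElementTree rather than lxml (an assumption made so that it runs without extra installs); the text and tail properties behave the same way there, and itertext() stands in for the //text() XPath query.

```python
import xml.etree.ElementTree as ET

content = '<odvNameElem stopID="9001002"><itdMapItemList/>Rathaus</odvNameElem>'
root = ET.fromstring(content)

# .text only holds the text before the first child, and there is none here.
leading_text = root.text
# "Rathaus" follows the child's end tag, so it is stored as the child's tail.
tail_text = root.find("itdMapItemList").tail
# itertext() yields every text chunk (text and tails) in document order.
all_text = list(root.itertext())

print(leading_text, tail_text, all_text)
```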

Modifying xml attributes through python

I have the following XML file in which the following information is present.
<PHYSICAL_TLINE>
<Traces general_diff="0" z_array="0" s_array="0" w_array="0" etch_factor="0.35" TS_track2track="0" TS_DQS="0" TW_DQS="0" TS_byte2dqs="0" TS_byte2byte="0" TS_DQ="0" TW_DQ="0" dsl_offset="0" D="20" TS="7" TW="5"/>
</PHYSICAL_TLINE>
Is there a way to set the values of these attributes through Python? For example, what if I want to change the value of s_array to 5 instead of 0?
I know that there is the xml.etree set command but I'm not too sure on how to set the values of these attributes in the child through python.
child.attrib["s_array"] = '5'
Assuming that child is the <Traces/> node.
Edit:
The value needs to be a string ('5', not 5).
This documentation may be helpful for you:
https://docs.python.org/2/library/xml.etree.elementtree.html
See section 19.7.1.4, "Modifying an XML File".
Adapting some code like this should achieve the desired result:
for rank in root.iter('rank'):
    rank.set('updated', 'yes')
tree.write('output.xml')
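Applied to the question's data, the same idea might look like the sketch below. The XML is a shortened copy of the question's snippet (only a few attributes kept), with the closing tag written as </PHYSICAL_TLINE> so that it parses.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's XML; the real file has more attributes.
XML_SOURCE = (
    '<PHYSICAL_TLINE>'
    '<Traces general_diff="0" z_array="0" s_array="0" w_array="0"/>'
    '</PHYSICAL_TLINE>'
)

root = ET.fromstring(XML_SOURCE)
traces = root.find("Traces")

# Attribute values must be strings, so pass "5" rather than 5.
traces.set("s_array", "5")

print(ET.tostring(root, encoding="unicode"))
```

traces.set("s_array", "5") and traces.attrib["s_array"] = "5" are equivalent; for a file on disk, use ET.parse(), then tree.write() to save the result.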

Creating Properly-Nested XML Output in Python

I'm attempting to save data from several lists in XML format, but I cannot understand how to make the XML display properly. An example of my code right now is as follows:
from lxml import etree
#Create XML Root
articles = etree.Element('root')
#Create Lists & Data
t_list = ['title1', 'title2', 'title3', 'title4', 'title5']
c_list = ['content1', 'content2', 'content3', 'content4', 'content5']
sum_list = ['summary1', 'summary2', 'summary3', 'summary4', 'summary5']
s_list = ['source1', 'source2', 'source3', 'source4', 'source5']
i = 0
for t in t_list:
    for i in range(len(t_list)):
        #Create SubElements of XML Root
        article = etree.SubElement(articles, 'Article')
        titles = etree.SubElement(article, 'Title')
        summary = etree.SubElement(article, 'Summary')
        source = etree.SubElement(article, 'Source')
        content = etree.SubElement(article, 'Content')
        #Add List Data to SubElements
        titles.text = t_list[i]
        summary.text = sum_list[i]
        source.text = s_list[i]
        content.text = c_list[i]
print(etree.tostring(articles, pretty_print=True))
My current output is jumbled, all on a single line, as follows:
b'<root>\n <Article>\n <Title>title1</Title>\n <Summary>summary1</Summary>\n <Source>source1</Source>\n <Content>content1</Content>\n </Article>\n
It looks like the pretty_print option in lxml is adding proper indentation, as well as the \n breaks I want, but they are not interpreted during output; everything is written on a single line.
The output I'm trying to get is as follows:
<root>
  <Article>
    <Title>title1</Title>
    <Summary>summary1</Summary>
    <Source>source1</Source>
    <Content>content1</Content>
  </Article>
Ideally, I'd like for my output to be viewed as a valid XML document, and display in proper nested format.
Your "Current Output" is the repr of the bytes object returned by etree.tostring(). In Python 3, print() on a bytes object prints that representation (the b'...' literal with escaped \n) rather than the decoded text.
Fortunately the solution is quite simple: ask etree.tostring() for a unicode string instead, i.e.:
xml = etree.tostring(articles, encoding="unicode", pretty_print=True)
print(xml)
I've only used the base ET module in Python and can't find an lxml download for Python 3.5 (which I'm on) in order to test it, but the b before the line indicates bytes, and a quick glance at the documentation indicates that tostring() has an encoding keyword, so you should just need to set that to "unicode". (A real encoding name such as utf-8 still returns bytes, which you would then have to decode before printing.)
I'll also mention that you don't need to set i before your for loop (Python creates the i it needs for the loop). Personally, I would zip the lists and iterate over the items themselves, though that has no real impact on the code in this situation.
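Both suggestions can be put together in a short sketch. It uses the standard library's ElementTree instead of lxml (an assumption, so it runs without extra installs): ET.indent(), available from Python 3.9, plays the role of lxml's pretty_print=True, and the lists are shortened to three items.

```python
import xml.etree.ElementTree as ET

t_list = ['title1', 'title2', 'title3']
sum_list = ['summary1', 'summary2', 'summary3']
s_list = ['source1', 'source2', 'source3']
c_list = ['content1', 'content2', 'content3']

articles = ET.Element('root')

# zip() walks the four lists in step, so no index variable is needed
# and each article is created exactly once.
for title, summary, source, content in zip(t_list, sum_list, s_list, c_list):
    article = ET.SubElement(articles, 'Article')
    ET.SubElement(article, 'Title').text = title
    ET.SubElement(article, 'Summary').text = summary
    ET.SubElement(article, 'Source').text = source
    ET.SubElement(article, 'Content').text = content

ET.indent(articles)  # add newlines and indentation in place (Python 3.9+)
xml_text = ET.tostring(articles, encoding='unicode')  # a str, not bytes
print(xml_text)
```

Because xml_text is a str, print() shows real line breaks instead of the escaped b'...' form.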

Python XML 'TypeError: must be xml.etree.ElementTree.Element, not str'

I currently am trying to build an XML file from a CSV file. Currently my code reads the CSV file to data and begins creating the XML from the data that is stored within the CSV.
CSV Example:
Element,XMLFile
SubElement,XMLName,XMLFile
SubElement,XMLDate,XMLName
SubElement,XMLInformation,XMLDate
SubElement,XMLTime,XMLName
Expected Output:
<XMLFile>
  <XMLName>
    <XMLDate>
      <XMLInformation />
    </XMLDate>
    <XMLTime />
  </XMLName>
</XMLFile>
Currently my code attempts to look at the CSV to see what the parent is for the new subelement:
# Defines main element
# xmlElement = xml.Element(XMLFile)
xmlElement = xml.Element(csvData[rowNumber][columnNumber])
# Should Define desired parent (FAIL) and SubElement name (PASS)
# xmlSubElement = xml.SubElement(XMLFile, XMLName)
xmlSubElement = xml.SubElement(csvData[rowNumber][columnNumber + 2], csvData[rowNumber][columnNumber + 1])
When the code attempts to use the CSV source string as the parent parameter, Python 3.5 generates the following error:
TypeError: must be xml.etree.ElementTree.Element, not str
The known cause of the error is that the parent parameter is passed as a string, when it is expected to be an Element or SubElement.
Is it possible to recall the stored value from the CSV and have it reference the Element or SubElement, instead of a string? The goal is to allow the code to read the CSV file and assign any SubElement to the parent listed in the CSV.
I cannot tell for sure, but it looks like you are doing:
ElementTree.SubElement(str, str)
when you should be doing:
ElementTree.SubElement(Element, str)
It also seems like you already know this. The real question, then, is how are you going to reference the parent object when you only know its tag string? You could search for Elements in the ElementTree with that particular tag string, but this is generally not a good idea as XML allows multiple instances of similar elements.
I would suggest you either find a strategy to store references to parent elements, or see if there is a way to uniquely identify the parent element using XPath.
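One way to store references to parent elements is a dictionary that maps each tag name to the element created for it. The sketch below is only an illustration under the assumption that tag names are unique and that parents appear in the CSV before their children; build_tree and CSV_TEXT are hypothetical names, and the CSV is inlined from the question's example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The question's CSV example, inlined so the sketch is self-contained.
CSV_TEXT = """\
Element,XMLFile
SubElement,XMLName,XMLFile
SubElement,XMLDate,XMLName
SubElement,XMLInformation,XMLDate
SubElement,XMLTime,XMLName
"""

def build_tree(csv_text):
    """Build an element tree, resolving parents via a tag -> Element dict."""
    elements = {}  # assumes every tag name occurs only once
    root = None
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        kind, name = row[0], row[1]
        if kind == "Element":
            root = ET.Element(name)
            elements[name] = root
        elif kind == "SubElement":
            parent = elements[row[2]]  # an Element object, not a string
            elements[name] = ET.SubElement(parent, name)
    return root

root = build_tree(CSV_TEXT)
print(ET.tostring(root, encoding="unicode"))
```

If the same tag can occur more than once, the dictionary key would need to be something unique, such as a path or an id column in the CSV.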

Copy and Write XML Node in Python

I've got a large XML file that I need to parse and look for a specific node. Once it has been found, I need to make a copy, edit a couple of values and write the file again.
So far I've managed to get the DOM element that I want. There are actually two of these elements already in the XML, so after I'm finished there will be three. Once I've made a copy of the element and edited the value, how do I then write this into the DOM (and thus the file)?
I'm using Python's from xml.dom import minidom at the moment.
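In minidom you start by creating a Document:
from xml.dom.minidom import Document
doc = Document()
Then, if it is a text node you want to add, create it and append it to the element that should contain it (elem here; a text node cannot be appended directly to the document):
text_node = doc.createTextNode(str(some_content))
elem.appendChild(text_node)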
If you had, for example, <some_elem key="my value">some my text</some_elem>, build it like this:
elem = doc.createElement('some_elem')
text_node = doc.createTextNode('some my text')
elem.appendChild(text_node)
elem.setAttribute('key', 'my value')
If it is a complex element, create it with:
elem = doc.createElement('your_elem')
If you need to set attributes, do:
elem.setAttribute("some-attribute", your_attr)
If you need to append something to it:
elem.appendChild(some_other_elem)
Then append the element:
doc.appendChild(elem)
If you need a string representation, use:
doc.toxml()
or
doc.toprettyxml()
From the minidom documentation:
from xml.dom.minidom import getDOMImplementation
impl = getDOMImplementation()
newdoc = impl.createDocument(None, "some_tag", None)
top_element = newdoc.documentElement
text = newdoc.createTextNode('Some textual content.')
top_element.appendChild(text)
So I guess appendChild is what you ask for?
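For the copy step itself, minidom nodes have cloneNode(). The sketch below shows one possible way to copy an existing node, edit the copy, and append it next to the original; the <items>/<item> document is hypothetical, since the question's XML is not shown.

```python
from xml.dom import minidom

# Hypothetical document with two similar nodes, as described in the question.
SOURCE = (
    "<items>"
    "<item id='1'><name>first</name></item>"
    "<item id='2'><name>second</name></item>"
    "</items>"
)

doc = minidom.parseString(SOURCE)
original = doc.getElementsByTagName("item")[1]

# cloneNode(True) makes a deep copy, including attributes and children.
copy = original.cloneNode(True)
copy.setAttribute("id", "3")
copy.getElementsByTagName("name")[0].firstChild.data = "third"

# Attach the copy to the same parent so that it becomes part of the DOM.
original.parentNode.appendChild(copy)

result = doc.toxml()
print(result)
```

To save the result, write doc.toxml() to a file, or pass an open file object to doc.writexml().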
