Changing a specific xml element using Python 3 ElementTree - python

I have a set of metadata files in XML that are updated regularly, and I'm trying to automate the updates.
I've worked out how to iteratively find and then replace the text in the desired element of the XML, but I thought there must be a direct way to access and change the element. I just can't work it out.
The metadata XML is formatted as follows:
<?xml version="1.0" ?>
<metadata xml:lang="en">
<Esri>
<CreaDate>20120405</CreaDate>
<CreaTime>13113000</CreaTime>
<ArcGISFormat>1.0</ArcGISFormat>
<SyncOnce>TRUE</SyncOnce>
<ModDate>20121129</ModDate>
<ModTime>11433300</ModTime>
<ArcGISProfile>ItemDescription</ArcGISProfile>
</Esri>
<dataIdInfo>
<idPurp>Updated :: 121129_114038</idPurp>
</dataIdInfo>
</metadata>
My iterative approach was:
for child in root:
    for xel in child.iter('idPurp'):
        download_new_datetime = strftime('%y%m%d_%H%M%S')
        download_new_text = 'Downloaded :: '
        xel.text = download_new_text + download_new_datetime
        tree.write(xmlfile)
Ideas appreciated on a better way.

I would write to the file only once I'm done with the loop:
import xml.etree.ElementTree as ET
from time import strftime
xmlfile = '/tmp/file'
tree = ET.parse(xmlfile)
root = tree.getroot()
for child in root:
    for xel in child.iter('idPurp'):
        download_new_datetime = strftime('%y%m%d_%H%M%S')
        download_new_text = 'Downloaded :: '
        xel.text = download_new_text + download_new_datetime
# Write once, after the loop has finished
tree.write(xmlfile)
I would even simplify that loop further to:
for child in root:
    for xel in child.iter('idPurp'):
        # strftime was imported directly above, so call it without the time. prefix
        xel.text = 'Downloaded :: ' + strftime('%y%m%d_%H%M%S')

Here are two simpler ways. I tested both and they work.
First:
import xml.etree.ElementTree as ET
from time import strftime
xmlfile = 'metadata.xml'
tree = ET.parse(xmlfile)
root = tree.getroot()
xel = root.find('./dataIdInfo/idPurp')
xel.text = 'Downloaded :: ' + strftime('%y%m%d_%H%M%S')
tree.write(xmlfile)
Second:
import xml.etree.ElementTree as ET
from time import strftime
xmlfile = 'metadata.xml'
tree = ET.parse(xmlfile)
root = tree.getroot()
xel = root[1][0]
xel.text = 'Downloaded :: ' + strftime('%y%m%d_%H%M%S')
tree.write(xmlfile)
I prefer the first one, it's more readable in my opinion.
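
As a rough sketch of the `find` approach, the snippet below also guards against a missing element, since `find` returns `None` when the path does not match. The helper name `stamp_purpose` and the trimmed inline sample document are illustrative assumptions, not part of the original posts.

```python
import xml.etree.ElementTree as ET
from time import strftime

# Trimmed copy of the metadata document from the question (illustrative).
SAMPLE = """<metadata xml:lang="en">
  <Esri><CreaDate>20120405</CreaDate></Esri>
  <dataIdInfo><idPurp>Updated :: 121129_114038</idPurp></dataIdInfo>
</metadata>"""


def stamp_purpose(root, path='./dataIdInfo/idPurp'):
    """Hypothetical helper: set the text at `path`; return False if absent."""
    xel = root.find(path)
    if xel is None:  # find() returns None when nothing matches
        return False
    xel.text = 'Downloaded :: ' + strftime('%y%m%d_%H%M%S')
    return True


root = ET.fromstring(SAMPLE)
updated = stamp_purpose(root)
missing = stamp_purpose(root, './dataIdInfo/noSuchElement')
print(updated, missing, root.find('./dataIdInfo/idPurp').text)
```

Checking for `None` avoids an `AttributeError` on `xel.text` when a metadata file happens to lack the element.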

Related

How to extract values from xml file with namespaces?

I have the XML file shown below, which uses namespaces, and I'm trying to extract the values of Node24.
My current code, below, prints nothing:
import xml.etree.ElementTree as ET
filename = 'ifile.xml'
tree = ET.parse(filename)
root = tree.getroot()
for neighbor in root.iter('Node24'):
    print(neighbor)
My expected output would be:
03-c34ko
04-c64ko
07-c54ko
This is ifile.xml:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<data-main-43:DATAMAINXZ123 xmlns="https://example.com/DATA-MAIN-XZ123" xmlns:data-gen="https://example.com/DATA-GEN" xmlns:data-main-43="https://example.com/DATA-MAIN-XZ123" xmlns:xsi="http://www.w3.org/2011/XMLSchema-instance" xsi:schemaLocation="https://example.com/DATA-MAIN-XZ123 data-main-ir21-12.1.xsd">
<MAINXZ123FileHeader>
<DATAGenSchemaVersion>2.4</DATAGenSchemaVersion>
<DATAMAINXZ123SchemaVersion>12.1</DATAMAINXZ123SchemaVersion>
</MAINXZ123FileHeader>
<Node1>
<Node2>WTRT DDK</Node2>
<Node3>XYZW</Node3>
<Node4>
<Node5>
<Node6>XYZW882</Node6>
<Node5Type>Ter</Node5Type>
<Node5Data>
<Node9>
<Node10>
<Node11>2019-02-18</Node11>
<Node12>
<Node13>
<Node14>
<Node15>Ermso</Node15>
<Node16>
<PrimaryNode16>
<Node18>19.32</Node18>
<Node18>12.11</Node18>
</PrimaryNode16>
<SecondaryNode16>
<Node18>82.97</Node18>
<Node18>12.41</Node18>
</SecondaryNode16>
</Node16>
<Node20>Muuatippw</Node20>
</Node14>
</Node13>
</Node12>
<Node21>
<Node22>
<Node23>
<Node24>03-c34ko</Node24>
<Node24>04-c64ko</Node24>
<Node24>07-c54ko</Node24>
</Node23>
<Node26Node22EdgeAgent>
<Node26>jjkksonem</Node26>
<PrimaryNode18DEANode26>
<Node18>2.40</Node18>
</PrimaryNode18DEANode26>
</Node26Node22EdgeAgent>
</Node22>
</Node21>
<Node28>
<Node29>
<Node30>false</Node30>
<Node31>true</Node31>
</Node29>
</Node28>
</Node10>
</Node9>
</Node5Data>
</Node5>
</Node4>
</Node1>
</data-main-43:DATAMAINXZ123>
How can I do this? Thanks in advance.
Like the duplicate mzjn referenced, just add the namespace uri to the element name...
import xml.etree.ElementTree as ET
filename = 'ifile.xml'
tree = ET.parse(filename)
root = tree.getroot()
for neighbor in root.iter('{https://example.com/DATA-MAIN-XZ123}Node24'):
    print(neighbor.text)
Note: I also added .text to neighbor so you'd get the requested result.
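
A possible variation, sketched here with a cut-down copy of ifile.xml: instead of spelling out the URI inside the tag name, pass a prefix-to-URI mapping to `findall`. The prefix `m` is an arbitrary choice for this example, and the `{*}` wildcard form assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

# Cut-down version of ifile.xml from the question (illustrative).
SAMPLE = """<data-main-43:DATAMAINXZ123
    xmlns="https://example.com/DATA-MAIN-XZ123"
    xmlns:data-main-43="https://example.com/DATA-MAIN-XZ123">
  <Node1><Node23>
    <Node24>03-c34ko</Node24>
    <Node24>04-c64ko</Node24>
    <Node24>07-c54ko</Node24>
  </Node23></Node1>
</data-main-43:DATAMAINXZ123>"""

root = ET.fromstring(SAMPLE)

# The prefix 'm' is arbitrary; only the URI has to match the document.
ns = {'m': 'https://example.com/DATA-MAIN-XZ123'}
values = [el.text for el in root.findall('.//m:Node24', ns)]

# Python 3.8+ also accepts a namespace wildcard in find/findall paths.
wildcard_values = [el.text for el in root.findall('.//{*}Node24')]

print(values)
```

The mapping keeps long URIs out of the path expressions, which helps when several elements from the same namespace are queried.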
I'm using a regular expression, so this is an alternative answer.
I convert the XML to a string and then search for every string between Node24 tags:
import xml.etree.ElementTree as ET
import re
filename = 'ifile.xml'
tree = ET.parse(filename)
root = tree.getroot()
# encoding='unicode' returns a str rather than bytes, so no str() conversion is needed
xml_str = ET.tostring(root, encoding='unicode')
for s in re.findall(r'ns0:Node24>(.*?)</ns0:Node24', xml_str):
    print(s)
Result:
03-c34ko
04-c64ko
07-c54ko

case insensitive xml and python

I have this piece of code and I'm trying to read the 'href' attribute of every 'ref' tag. I'm not sure how to make it case-insensitive, as some of my XML files have REF, Ref, or ref.
Any suggestions?
f = urllib.urlopen(url)
tree = ET.parse(f)
root = tree.getroot()
for child in root.iter('ref'):
    t = child.get('href')
    if t not in self.href:
        self.href.append(t)
        print self.href[-1]
You can normalize tags and attributes by converting them to lowercase as a preprocessing step, using the functions below:
import urllib
import xml.etree.ElementTree as ET
f = urllib.urlopen(url)
tree = ET.parse(f)
root = tree.getroot()
def normalize_tags(root):
    root.tag = root.tag.lower()
    for child in root:
        normalize_tags(child)
def normalize_attr(root):
    # Iterate over a copy: the attribute dict is modified inside the loop
    for attr,value in list(root.attrib.items()):
        norm_attr = attr.lower()
        if norm_attr != attr:
            root.set(norm_attr,value)
            root.attrib.pop(attr)
    for child in root:
        normalize_attr(child)
normalize_tags(root)
normalize_attr(root)
print(ET.tostring(root))
The following should help:
f = urllib.urlopen(url)
tree = ET.parse(f)
root = tree.getroot()
# iter() with no argument visits every element, like root.iter('ref') in the question
for child in root.iter():
    if child.tag.lower() == 'ref':
        t = child.get('href')
        if t not in self.href:
            self.href.append(t)
            print self.href[-1]
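
For reference, here is a self-contained Python 3 sketch of the same tag comparison, run against an inline document rather than a URL. The generator name `iter_tag_ci` and the sample markup are assumptions made for this illustration. Attribute names are still matched case-sensitively here.

```python
import xml.etree.ElementTree as ET

# Illustrative document mixing ref, REF and Ref.
SAMPLE = """<links>
  <ref href="a.html"/>
  <REF href="b.html"/>
  <group><Ref href="c.html"/><other href="d.html"/></group>
  <ref href="a.html"/>
</links>"""


def iter_tag_ci(root, name):
    """Hypothetical helper: yield elements whose tag equals `name`, ignoring case."""
    wanted = name.lower()
    for el in root.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if isinstance(el.tag, str) and el.tag.lower() == wanted:
            yield el


root = ET.fromstring(SAMPLE)
hrefs = []
for el in iter_tag_ci(root, 'ref'):
    href = el.get('href')
    if href is not None and href not in hrefs:
        hrefs.append(href)

print(hrefs)
```

Unlike the normalization functions, this leaves the tree untouched, which matters if the document is written back out later.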
If you are using lxml then one option is to use XPath with regular expressions through XSLT extensions (https://stackoverflow.com/a/2756994/2997179):
root.xpath("./*[re:test(local-name(), '(?i)^ref$')]",
           namespaces={"re": "http://exslt.org/regular-expressions"})

Copy a node from one xml file to another using lxml

I'm trying to find the simplest way of copying one node to another XML file. Both files will contain the same node - just the contents of that node will be different.
In the past I've done some crazy copying of each element and subelement - but there has to be a better way..
#Master XML
parser = etree.XMLParser(strip_cdata=False)
tree = etree.parse('file1.xml', parser)
# Find the //input node - which has a lot of subelems
inputMaster= tree.xpath('//input')[0]
#Dest XML -
parser2 = etree.XMLParser(strip_cdata=False)
tree2 = etree.parse('file2.xml', parser2)
# this won't work but.. it would be nice
etree.SubElement(tree2,'input') = inputMaster
Here's one way. It's not brilliant, as it loses the position (i.e. it puts the node at the end), but hey..
from lxml import etree
def getMaster(somefile):
    parser = etree.XMLParser(strip_cdata=False)
    tree = etree.parse(somefile, parser)
    doc = tree.getroot()
    inputMaster = doc.find('input')
    return inputMaster
inputXML = getMaster('master_file.xml')
parser = etree.XMLParser(strip_cdata=False)
tree = etree.parse('file_to_copy_node_to.xml', parser)
doc = tree.getroot()
doc.remove(doc.find('input'))
doc.append(inputXML)
# Now write it (tostring returns bytes, so open the file in binary mode)
newxml = etree.tostring(tree, pretty_print=True)
with open('file_to_copy_node_to.xml', 'wb') as f:
    f.write(newxml)
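
One way to keep the node's position, sketched with the standard library's ElementTree so it runs without lxml (lxml elements support the same `find` call and index assignment). The function name `replace_child` and the two inline documents are made up for this example.

```python
import copy
import xml.etree.ElementTree as ET

# Illustrative stand-ins for the master and destination files.
MASTER = "<doc><input><a>1</a><b>2</b></input></doc>"
TARGET = "<doc><first/><input><old/></input><last/></doc>"


def replace_child(dest_root, src_root, tag):
    """Hypothetical helper: swap dest_root's child `tag` for a copy of src_root's."""
    new = copy.deepcopy(src_root.find(tag))  # copy, so the source tree stays intact
    old = dest_root.find(tag)
    if new is None or old is None:
        raise ValueError('both documents need a <%s> child' % tag)
    index = list(dest_root).index(old)
    new.tail = old.tail  # keep the whitespace that followed the old node
    dest_root[index] = new  # index assignment keeps the original position


dest = ET.fromstring(TARGET)
replace_child(dest, ET.fromstring(MASTER), 'input')
print(ET.tostring(dest, encoding='unicode'))
```

Assigning by index replaces the old node where it sits, so siblings before and after it keep their order.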

Python: specify XMLNS on xml.etree elements

In my Python code I'm currently using the xml.etree library to create a tree and then dump it to an XML string. Unfortunately, I can't use any modules other than the ones in the Python Standard Library to do that.
Here is my code:
import xml.etree.ElementTree as ET
def dump_to_XML():
    root_node = ET.Element("root")
    c1_node = ET.SubElement(root_node, "child1")
    c1_node.text = "foo"
    c2_node = ET.SubElement(root_node, "child2")
    gc1_node = ET.SubElement(c2_node, "grandchild1")
    gc1_node.text = "bar"
    return ET.tostring(root_node, encoding='utf8', method='xml')
which gives the string:
<?xml version='1.0' encoding='utf8'?>
<root>
<child1>foo</child1>
<child2>
<grandchild1>bar</grandchild1>
</child2>
</root>
Now, I have two schema files located at, say, http://myhost.com/p.xsd and http://myhost.com/q.xsd, and I want the output string to be turned into:
<?xml version='1.0' encoding='UTF-8'?>
<root xmlns:p="http://myhost.com/p.xsd" xmlns:q="http://myhost.com/q.xsd">
<p:child1>foo</p:child1>
<p:child2>
<q:grandchild1>bar</q:grandchild1>
</p:child2>
</root>
How can I leverage the etree library in order to achieve that?
Thanks in advance
Here we go:
import xml.etree.ElementTree as ET
xmlns_uris = {'p': 'http://myhost.com/p.xsd',
              'q': 'http://myhost.com/q.xsd'}
def dump_to_XML():
    root_node = ET.Element("root")
    c1_node = ET.SubElement(root_node, "child1")
    c1_node.text = "foo"
    c2_node = ET.SubElement(root_node, "child2")
    gc1_node = ET.SubElement(c2_node, "grandchild1")
    gc1_node.text = "bar"
    annotate_with_XMLNS_prefixes(gc1_node, 'q', False)
    annotate_with_XMLNS_prefixes(root_node, 'p')
    add_XMLNS_attributes(root_node, xmlns_uris)
    return ET.tostring(root_node, encoding='UTF-8', method='xml')
def annotate_with_XMLNS_prefixes(tree, xmlns_prefix, skip_root_node=True):
    if not ET.iselement(tree):
        tree = tree.getroot()
    iterator = tree.iter()
    if skip_root_node: # Add XMLNS prefix also to the root node?
        next(iterator) # built-in next() works on both Python 2.6+ and Python 3
    for e in iterator:
        if ':' not in e.tag:
            e.tag = xmlns_prefix + ":" + e.tag
def add_XMLNS_attributes(tree, xmlns_uris_dict):
    if not ET.iselement(tree):
        tree = tree.getroot()
    for prefix, uri in xmlns_uris_dict.items():
        tree.attrib['xmlns:' + prefix] = uri
Executing: print dump_to_XML() gives:
<?xml version='1.0' encoding='UTF-8'?>
<root xmlns:p="http://myhost.com/p.xsd" xmlns:q="http://myhost.com/q.xsd">
<p:child1>foo</p:child1>
<p:child2>
<q:grandchild1>bar</q:grandchild1>
</p:child2>
</root>
An alternative using lxml (not part of the standard library):
from lxml import etree
xmlns_uris = {'p': 'http://myhost.com/p.xsd', 'q': 'http://myhost.com/q.xsd'}
root = etree.Element('root', nsmap = xmlns_uris)
child1 = etree.SubElement(root,'{%s}child1'%xmlns_uris['p'])
child1.text = 'foo'
child2 = etree.SubElement(root,'{%s}child2'%xmlns_uris['p'])
grandchild1 = etree.SubElement(child2,'{%s}grandchild1'%xmlns_uris['q'])
grandchild1.text = 'bar'
# Decode with the same encoding that tostring() was asked to produce
print(etree.tostring(root, pretty_print=True, encoding='UTF-8', xml_declaration=True).decode('utf-8'))
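
For completeness, a standard-library sketch that produces real namespaced elements: `ET.register_namespace` sets the preferred prefixes, and the tags use the `{uri}name` form. The constant names `P` and `Q` and the function name `dump_to_xml` are choices made for this example. Note that `register_namespace` changes a process-wide registry.

```python
import xml.etree.ElementTree as ET

P = 'http://myhost.com/p.xsd'
Q = 'http://myhost.com/q.xsd'

# Ask the serializer to use these prefixes instead of ns0, ns1, ...
ET.register_namespace('p', P)
ET.register_namespace('q', Q)


def dump_to_xml():
    root_node = ET.Element('root')
    c1_node = ET.SubElement(root_node, '{%s}child1' % P)
    c1_node.text = 'foo'
    c2_node = ET.SubElement(root_node, '{%s}child2' % P)
    gc1_node = ET.SubElement(c2_node, '{%s}grandchild1' % Q)
    gc1_node.text = 'bar'
    # encoding='unicode' returns a str without an XML declaration
    return ET.tostring(root_node, encoding='unicode')


xml_text = dump_to_xml()
print(xml_text)
```

Because the elements carry their namespace URIs, a parser reading the result sees `{http://myhost.com/p.xsd}child1` rather than a tag that merely contains a colon.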

Code being dropped from xml created using python

I am copying and then updating a metadata XML file using Python. This works fine, except that the following lines from the original metadata file are being deleted:
<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type='text/xsl' href='ANZMeta.xsl'?>
It needs to go at the start of the file.
There is an answer for this in PHP (xml insertion at specific point of xml file), but I need a solution for Python.
The code and full explanation are in my original post (Search and replace multiple lines in xml/text files using python), but I am separating this question as it differs from the original issues I had.
Thanks,
FULL CODE
import os, xml, arcpy, shutil, datetime, Tkinter, tkFileDialog, tkSimpleDialog
from xml.etree import ElementTree as et
path=os.getcwd()
RootDirectory=path
currentPath=path
arcpy.env.workspace = path
Count=0
DECLARATION = """<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type='text/xsl' href='ANZMeta.xsl'?>\n"""
Generated_XMLs=RootDirectory+'\GeneratedXML_LOG.txt'
f = open(Generated_XMLs, 'a')
f.write("Log of Metadata Creation Process - Update: "+str(datetime.datetime.now())+"\n")
f.close()
for root, dirs, files in os.walk(RootDirectory, topdown=False):
    #print root, dirs
    for directory in dirs:
        try:
            currentPath=os.path.join(root,directory)
        except:
            pass
        os.chdir(currentPath)
        arcpy.env.workspace = currentPath
        print currentPath
        #def Create_xml(currentPath):
        FileList = arcpy.ListFeatureClasses()
        zone="_Zone"
        for File in FileList:
            Count+=1
            FileDesc_obj = arcpy.Describe(File)
            FileNm=FileDesc_obj.file
            check_meta=os.listdir(currentPath)
            existingXML=FileNm[:FileNm.find('.')]
            existingExtension=FileNm[FileNm.find('.'):]
            print "XML: "+existingXML
            #print check_meta
            #if existingXML+'.xml' in check_meta:
            #newMetaFile='new'
            for f in check_meta:
                if f.startswith(existingXML) and f.endswith('.xml'):
                    print "exists, file name:", f
                    newMetaFile=FileNm+"_2012Metadata.xml"
                    try:
                        shutil.copy2(f, newMetaFile)
                    except:
                        pass
                    break
            else:
                #print "Does not exist"
                newMetaFile=FileNm+"_BaseMetadata.xml"
            print "New meta file: "+newMetaFile+ " for: "+File
            if newMetaFile.endswith('_BaseMetadata.xml'):
                print "calling tkinter"
                root = Tkinter.Tk()
                root.withdraw()
                file = tkFileDialog.askopenfile(parent=root,mode='rb',title='Choose a xml base file to match with: '+File)
                if file != None:
                    metafile=os.path.abspath(file.name)
                    file.close()
                    #print metafile
                    shutil.copy2(metafile,newMetaFile)
                    print "copied"+metafile
                    root.destroy
                else:
                    shutil.copy2('L:\Data_Admin\QA\Metadata_python_toolset\Master_Metadata.xml', newMetaFile)
            #root = Tkinter.Tk()
            #root.withdraw()
            #newTitle=tkSimpleDialog.askstring('title', 'prompt')
            #root.destroy
            #print newTitle
            print "Parsing meta file: "+newMetaFile
            tree=et.parse(newMetaFile)
            print "Processing: "+str(File)
            for node in tree.findall('.//title'):
                node.text = str(FileNm)
            for node in tree.findall('.//procstep/srcused'):
                node.text = str(currentPath+"\\"+existingXML+".xml")
            dt=dt=str(datetime.datetime.now())
            for node in tree.findall('.//procstep/date'):
                node.text = str(dt[:10])
            for node in tree.findall('.//procstep/time'):
                node.text = str(dt[11:13]+dt[16:19])
            for node in tree.findall('.//metd/date'):
                node.text = str(dt[:10])
            for node in tree.findall('.//northbc'):
                node.text = str(FileDesc_obj.extent.YMax)
            for node in tree.findall('.//southbc'):
                node.text = str(FileDesc_obj.extent.YMin)
            for node in tree.findall('.//westbc'):
                node.text = str(FileDesc_obj.extent.XMin)
            for node in tree.findall('.//eastbc'):
                node.text = str(FileDesc_obj.extent.XMax)
            for node in tree.findall('.//native/nondig/formname'):
                node.text = str(os.getcwd()+"\\"+File)
            for node in tree.findall('.//native/digform/formname'):
                node.text = str(FileDesc_obj.featureType)
            for node in tree.findall('.//avlform/nondig/formname'):
                node.text = str(FileDesc_obj.extension)
            for node in tree.findall('.//avlform/digform/formname'):
                node.text = str(float(os.path.getsize(File))/int(1024))+" KB"
            for node in tree.findall('.//theme'):
                node.text = str(FileDesc_obj.spatialReference.name +" ; EPSG: "+str(FileDesc_obj.spatialReference.factoryCode))
                print node.text
            projection_info=[]
            Zone=FileDesc_obj.spatialReference.name
            if "GCS" in str(FileDesc_obj.spatialReference.name):
                projection_info=[FileDesc_obj.spatialReference.GCSName, FileDesc_obj.spatialReference.angularUnitName, FileDesc_obj.spatialReference.datumName, FileDesc_obj.spatialReference.spheroidName]
                print "Geographic Coordinate system"
            else:
                projection_info=[FileDesc_obj.spatialReference.datumName, FileDesc_obj.spatialReference.spheroidName, FileDesc_obj.spatialReference.angularUnitName, Zone[Zone.rfind(zone)-3:]]
                print "Projected Coordinate system"
            x=0
            for node in tree.findall('.//spdom'):
                for node2 in node.findall('.//keyword'):
                    #print node2.text
                    node2.text = str(projection_info[x])
                    #print node2.text
                    x=x+1
            tree.write(newMetaFile)
            with open(newMetaFile, 'w') as output: # would be better to write to temp file and rename
                output.write(DECLARATION)
                tree.write(output, xml_declaration=False, encoding='utf-8')
                # xml_declaration=False - don't write default declaration
            f = open(Generated_XMLs, 'a')
            f.write(str(Count)+": "+File+"; "+newMetaFile+"; "+currentPath+";"+existingXML+"\n")
            f.close()
        # Create_xml(currentPath)
Error message from Wing IDE:
xml.parsers.expat.ExpatError: no element found: line 3, column 0
File "L:\Data_Admin\QA\Metadata_python_toolset\test2\update_Metadata1f.py", line 78, in
    tree=et.parse(newMetaFile)
File "C:\Python26\ArcGIS10.0\Lib\xml\etree\ElementTree.py", line 862, in parse
    tree.parse(source, parser)
File "C:\Python26\ArcGIS10.0\Lib\xml\etree\ElementTree.py", line 587, in parse
    self._root = parser.close()
File "C:\Python26\ArcGIS10.0\Lib\xml\etree\ElementTree.py", line 1254, in close
    self._parser.Parse("", 1) # end of data
I struggled with adding processing instructions (PIs) to the start of an ElementTree document too. I came up with a solution that uses a fake root node (with None as the element tag) to hold any required processing instructions, followed by the real document root node.
import xml.etree.ElementTree as ET
# Build your XML document as normal...
root = ET.Element('root')
# Create 'fake' root node
fake_root = ET.Element(None)
# Add desired processing instructions. Repeat as necessary.
pi = ET.PI("xml-stylesheet", "type='text/xsl' href='ANZMeta.xsl'")
pi.tail = "\n"
fake_root.append(pi)
# Add real root as last child of fake root
fake_root.append(root)
# Write to file, using ElementTree.write( ) to generate <?xml ...?> tag.
tree = ET.ElementTree(fake_root)
tree.write("doc.xml", xml_declaration=True)
The resulting doc.xml file:
<?xml version='1.0' encoding='us-ascii'?>
<?xml-stylesheet type='text/xsl' href='ANZMeta.xsl'?>
<root />
If all your XML files have the same declaration, you can write it yourself:
import xml.etree.ElementTree as ET
DECLARATION = """<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type='text/xsl' href='ANZMeta.xsl'?>\n"""
tree = ET.parse(filename)
# do some work on tree
# Binary mode: with encoding='utf-8', tree.write() emits bytes on Python 3
with open(filename, 'wb') as output: # would be better to write to temp file and rename
    output.write(DECLARATION.encode('utf-8'))
    tree.write(output, xml_declaration=False, encoding='utf-8')
    # xml_declaration=False - don't write default declaration
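
The "write to temp file and rename" idea from the comment above could look roughly like this on Python 3. The function name `write_with_header` and the throwaway demo document are assumptions for this sketch. The temporary file is created in the destination folder so that the final rename stays on one filesystem.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

DECLARATION = (b'<?xml version="1.0" encoding="utf-8"?>\n'
               b"<?xml-stylesheet type='text/xsl' href='ANZMeta.xsl'?>\n")


def write_with_header(tree, filename, header=DECLARATION):
    """Hypothetical helper: write header + tree to a temp file, then swap it in."""
    folder = os.path.dirname(os.path.abspath(filename))
    fd, tmp_name = tempfile.mkstemp(suffix='.xml', dir=folder)
    try:
        with os.fdopen(fd, 'wb') as output:
            output.write(header)
            # xml_declaration=False: the header already contains the declaration
            tree.write(output, encoding='utf-8', xml_declaration=False)
        os.replace(tmp_name, filename)  # overwrite the target in one step
    except Exception:
        os.remove(tmp_name)  # do not leave a half-written temp file behind
        raise


# Demonstration on a throwaway file in a fresh temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.xml')
tree = ET.ElementTree(ET.fromstring('<metadata><title>old</title></metadata>'))
tree.find('.//title').text = 'new'
write_with_header(tree, demo_path)

with open(demo_path, 'rb') as f:
    written = f.read()
```

If writing fails partway through, the original file is left as it was, which avoids the truncated file behind the "no element found" error in the question.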
