Python in XML: Getting text from grandchildren - python

I'm quite new to programming. I believe the data I'm looking for should be easy to get, but I can't seem to wrap my head around it.
My XML has several parents, each with children that have siblings, and siblings with children of their own. I am trying to reach a specific grandchild where one of its siblings has a specific value in a certain tag.
The XML (actually a KML) looks like this:
<Folder>
<name> Run-1</name>
<Placemark>
<name> run 1</name>
<Snippet></Snippet>
<styleUrl>#flightline</styleUrl>
<LineString>
<extrude>0</extrude>
<altitudeMode>clampToGround</altitudeMode>
<coordinates>54.72664746,24.91070844,2008 54.76968330,24.91068150,2008
</coordinates>
</LineString>
</Placemark>
</Folder>
Each folder named Run-X can have any number of placemarks.
I want the name of each folder and the coordinates in the one placemark (there is only one per folder) that contains <styleUrl>#flightline</styleUrl>, and nothing else.
That would build me a list of the run number and the 'flight line' coordinates.
Of course I am trying the python and w3 schools tutorials and I understand the basics but I can't seem to put it all together. Do I need a for loop to reach each child and a nested loop to reach every sub-child? Or can I just look for tags throughout the tree and get the coordinates value IF there is a <styleUrl>#flightline</styleUrl> tag?
I have been playing around with root.iter and root.findall but I can't seem to get any kind of result.

How about the following? Assuming your KML data resides in data.xml:
from collections import OrderedDict
from xml.etree import ElementTree as ET
tree = ET.parse("data.xml")
root = tree.getroot()
result = OrderedDict()
for folder in root.iter('Folder'):
    for placemark in folder.findall('Placemark'):
        # Keep only the placemark styled as a flight line
        if placemark.find('styleUrl').text == '#flightline':
            result[folder.find('name').text.strip()] = placemark.find('LineString/coordinates').text.strip()
print(result)

Thanks so much for your help. I found a solution based on your code:
for folder in root.iter('Folder'):
    for placemark in folder.findall('Placemark'):
        if placemark.find('styleUrl').text == '#flightline':
            runLine = folder.find('name').text[5:]
            startLat = placemark.find('LineString/coordinates').text[:11]
            startLong = placemark.find('LineString/coordinates').text[12:23]
            endLat = placemark.find('LineString/coordinates').text[29:40]
            endLong = placemark.find('LineString/coordinates').text[41:52]
            print('Flightline: ' + runLine + ', coordinates start: ' + startLat + ' ' + startLong + '. Coordinates end: ' + endLat + ' ' + endLong + '.')
In case you are wondering, I'm trying to read files outputted by an aerial survey program (flightlines are lines flown to take pictures) and create a csv and flight plan file for the GPS in the aircraft to read so it can fly them automatically.
Now I need to find a way to remove the <kml> </kml> tags from the initial .kml file (on whatever line they might be) and only then open and parse it, output the line number and coordinates (with custom name) for each flightline to a CSV, and also output another flight plan file in a Garmin-specific format. At least now I know how to scan the file. Thanks again Sir!
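Removing the <kml> tags may not be necessary. If parsing only works once they are gone, a likely cause (an assumption, since the full file is not shown) is that the <kml> root declares the KML default namespace, and ElementTree then expects that namespace in every tag name. The sketch below parses such a file as-is; the sample document, the NS mapping and the flightlines helper are illustrative names, not part of the original code. It also splits the coordinates on whitespace and commas instead of slicing at fixed character positions, so it does not depend on the number of digits.

```python
import xml.etree.ElementTree as ET

# Assumption: the real file declares the standard KML 2.2 namespace on <kml>.
NS = {"k": "http://www.opengis.net/kml/2.2"}

# Illustrative sample, modelled on the snippet in the question.
SAMPLE = """<kml xmlns="http://www.opengis.net/kml/2.2">
<Document>
<Folder>
<name> Run-1</name>
<Placemark>
<name> run 1</name>
<styleUrl>#flightline</styleUrl>
<LineString>
<coordinates>54.72664746,24.91070844,2008 54.76968330,24.91068150,2008
</coordinates>
</LineString>
</Placemark>
</Folder>
</Document>
</kml>"""


def flightlines(root):
    """Return (folder name, [coordinate tuples]) for each #flightline placemark."""
    rows = []
    for folder in root.iter("{%s}Folder" % NS["k"]):
        name = folder.findtext("k:name", default="", namespaces=NS).strip()
        for placemark in folder.findall("k:Placemark", NS):
            style = placemark.findtext("k:styleUrl", default="", namespaces=NS)
            if style.strip() != "#flightline":
                continue
            text = placemark.findtext(
                "k:LineString/k:coordinates", default="", namespaces=NS)
            # Each whitespace-separated token is one point: "x,y,altitude".
            points = [tuple(token.split(",")) for token in text.split()]
            rows.append((name, points))
    return rows


rows = flightlines(ET.fromstring(SAMPLE))
for name, points in rows:
    print(name, points[0], points[-1])
```

For a file on disk, ET.parse("data.xml").getroot() would take the place of ET.fromstring(SAMPLE). One thing worth double-checking before building the flight plan: the KML format lists each point as longitude,latitude,altitude, so the first number may not be the latitude.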

Related

Parse deeply nested XML to pandas dataframe

I'm trying to fetch particular parts of a XML file and move it into a pandas dataframe. Following some tutorials from xml.etree I'm still stuck at getting the output. So far, I've managed to find the child nodes, but I can't access them (i.e. can't get the actual data out of it). So, here is what I've got so far.
import xml.etree.ElementTree as ET
tree = ET.parse('data.xml')
root = tree.getroot()
root.tag
# find all nodes within the XML data
tree.findall(".//*")
# access the node
tree.findall(".//{http://someUrl.nl/schema/enterprise/program}programSummaryText")
What I want is to get the data from the node programDescriptions and specifically the child programDescriptionText xml:lang="nl", and of course a couple extra. But first focus on this one.
Some data to work with:
<?xml version="1.0" encoding="UTF-8"?>
<programs xmlns="http://someUrl.nl/schema/enterprise/program">
<program xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://someUrl.nl/schema/enterprise/program http://someUrl.nl/schema/enterprise/program.xsd">
<customizableOnRequest>true</customizableOnRequest>
<editor>webmaster#url</editor>
<expires>2019-04-21</expires>
<format>Edu-dex 1.0</format>
<generator>www.Url.com</generator>
<includeInCatalog>Catalogs</includeInCatalog>
<inPublication>true</inPublication>
<lastEdited>2019-04-12T20:03:09Z</lastEdited>
<programAdmission>
<applicationOpen>true</applicationOpen>
<applicationType>individual</applicationType>
<maxNumberOfParticipants>12</maxNumberOfParticipants>
<minNumberOfParticipants>8</minNumberOfParticipants>
<paymentDue>up-front</paymentDue>
<requiredLevel>academic bachelor</requiredLevel>
<startDateDetermination>fixed starting date</startDateDetermination>
</programAdmission>
<programCurriculum>
<instructionMode>training</instructionMode>
<teacher>
<id>{D83FFC12-0863-44A6-BDBB-ED618627F09D}</id>
<name>SomeName</name>
<summary xml:lang="nl">
Long text of the summary. Not needed.
</summary>
</teacher>
<studyLoad period="hour">26</studyLoad>
</programCurriculum>
<programDescriptions>
<programName xml:lang="nl">Program Course Name</programName>
<programSummaryText xml:lang="nl">short Program Course Name summary</programSummaryText>
<programSummaryHtml xml:lang="nl">short Program Course Name summary in HTML format</programSummaryHtml>
<programDescriptionText xml:lang="nl">This part is needed from the XML.
Big program description text. This part is needed to parse from the XML file.
</programDescriptionText>
<programDescriptionHtml xml:lang="nl">Not needed;
Not needed as well;
</programDescriptionHtml>
<subjectText>
<subject>curriculum</subject>
<header1 xml:lang="nl">Beschrijving</header1>
<descriptionHtml xml:lang="nl">Yet another HTML desscription;
Not necessarily needed;</descriptionHtml>
</subjectText>
<searchword xml:lang="nl">search word</searchword>
<webLink xml:lang="nl">website-url</webLink>
</programDescriptions>
<programSchedule>
<programRun>
<id>PR-019514</id>
<status>application opened</status>
<startDate isFinal="true">2019-06-26</startDate>
<endDate isFinal="true">2020-02-11</endDate>
</programRun>
</programSchedule>
</program>
</programs>
Try the code below: (55703748.xml contains the xml you have posted)
import xml.etree.ElementTree as ET
tree = ET.parse('55703748.xml')
root = tree.getroot()
nodes = root.findall(".//{http://someUrl.nl/schema/enterprise/program}programSummaryText")
for node in nodes:
    print(node.text)
Output
short Program Course Name summary
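The question also asks for programDescriptionText restricted to xml:lang="nl". A minimal sketch of that, assuming the document layout posted above: the NS prefix mapping, the XML_LANG constant and the records list are names chosen here for illustration, and the sample is a trimmed copy of the posted XML. The xml: prefix is always bound to the fixed namespace shown in XML_LANG, which is how ElementTree stores the attribute name.

```python
import xml.etree.ElementTree as ET

# Prefix "p" is arbitrary; it only has to map to the document's default namespace.
NS = {"p": "http://someUrl.nl/schema/enterprise/program"}
# ElementTree stores xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Trimmed copy of the XML from the question.
SAMPLE = """<programs xmlns="http://someUrl.nl/schema/enterprise/program">
<program>
<programDescriptions>
<programName xml:lang="nl">Program Course Name</programName>
<programSummaryText xml:lang="nl">short Program Course Name summary</programSummaryText>
<programDescriptionText xml:lang="nl">This part is needed from the XML.
</programDescriptionText>
</programDescriptions>
</program>
</programs>"""

root = ET.fromstring(SAMPLE)
records = []
for program in root.findall("p:program", NS):
    record = {}
    for tag in ("programName", "programDescriptionText"):
        for node in program.findall("p:programDescriptions/p:%s" % tag, NS):
            # Keep only the Dutch variant of each field.
            if node.get(XML_LANG) == "nl":
                record[tag] = (node.text or "").strip()
    records.append(record)

print(records)
```

A list of dictionaries like records is a shape that pandas.DataFrame accepts directly, with one row per program and one column per key.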

Python Parsing XML with a complex hierarchy - Nuke9.0v8

I am working with NukeX9.0v8, Adobe Premiere Pro CC 2015 and Nuke's internal Python interpreter.
# Result: 2.7.3 (default, Jul 24 2013, 15:50:23)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
I am a vfx artist and I'm trying to wrap my brain around the best method to parse xml files in order to: create a folder structure, batch create .nk comp files, and plug in the data within specific parts as I make my .nk comps. I have a bit of a grasp of how to do each of these things in isolation, but plugging it all together, and trying to find tutorials on such a complex parse, has ground me to a halt.
I know the scope of this is big but any small pieces of advice are appreciated.
Right now I have a nuke comp that has a node tree that takes in camera inputs and stitches them into a latlong image for 360 video, I am going to wrap that up into a gizmo for each different kind of rig configuration. This just simplifies the .nk files that are created and I can expose the parts of that gizmo I can feed data into.
Every day we receive a ton of footage from a shoot and we have to make a new .nk comp for each shot and set it to render right away. What I want to do is have the guys on set create a premiere project and organize the files based on this folder structure. That premiere project will be exported as an .xml file.
The design of the structure in premiere.
Day_01 (the day of the shoot)
-^-R001 (Roll number for the shots. R referring to camera type)
--^-R001_C001 (The name of the shot)
---^-Acamera clip (path to file name, video in point as frame#)
---^-Bcamera clip (path to file name, video in point as frame#)
---^-Ccamera clip (path to file name, video in point as frame#)
Right now in my script panel inside Nuke I can enter where the xml for the day is and which day to look for. The script is then supposed to look at each folder name for the roll and, using the first letter (R for RED camera), look inside for the clip folder. It then uses the pathurl directory for the camera files on the drive and can also take in data like the in and out points if present in the xml. I also have fields to enter the template version in case I update a stitch process. That will tell the nuke comp which gizmo to use.
Here is my panel in Nuke.
import os
import nuke

def sesquixmlparse():
    '''
    This imports the xml file from premiere. It looks for the bin that it is working for today and starts looking in what is inside the bins
    It then sees the bins inside and uses them to create nuke scripts with these as inputs
    It asks what template version to use for the rig. things change or maybe even get better
    '''
    # Lets build the Nuke Panel that tells us our inputs
    p = nuke.Panel("Sesqui XML Parse for Dailies")
    xml_file = 'Daily XML'
    daynumber = 'Day_##'
    nk_output_dir = 'Directory to build VFX folder structure'
    dnx_render_dir = 'Directory for write nodes'
    r_template_vr = 'VER1'
    g_template_vr = 'VER1'
    c_template_vr = 'VER1'
    p.addFilenameSearch("Daily XML", xml_file)
    p.addSingleLineInput("Bin to process", daynumber)
    p.addFilenameSearch("Directory to build VFX folder structure", nk_output_dir)
    p.addFilenameSearch("Directory to render from write nodes", dnx_render_dir)
    p.addSingleLineInput("3 Red stmap version", r_template_vr)
    p.addSingleLineInput("6 Gopro stmap version", g_template_vr)
    p.addSingleLineInput("5 Canon stmap version", c_template_vr)
    p.setWidth(600)
    print "Panel created"
    if not p.show():
        return
    # Assign var from nuke panel user-entered data
    xml_file = p.value("Daily XML")
    daynumber = p.value("Bin to process")
    nk_output_dir = p.value("Directory to build VFX folder structure")
    dnx_render_dir = p.value("Directory to render from write nodes")
    r_template_vr = p.value("3 Red stmap version")
    g_template_vr = p.value("6 Gopro stmap version")
    c_template_vr = p.value("5 Canon stmap version")
    print "var's assigned from panel"
    # Create paths for render directory if it does not exist
    if not os.path.isdir(dnx_render_dir):
        os.mkdir(dnx_render_dir)
        print dnx_render_dir + " directory created"
    if not os.path.isdir(nk_output_dir):
        os.mkdir(nk_output_dir)
        print nk_output_dir + " directory created"
I am at a loss on how to best read the xml file. All the tutorials I have seen on both DOM and elementtree are very basic and deal with direct code to read known XML tags and break data down to a simple str output.
I need to enter variables, which then constrain the parsing to a specific part of the tree, and go into an unknown hierarchy setup and seeing what is inside, and then make decisions on what to do with what it finds.
Here is a sample of my test XML file. The eventual plan is to have other different roll types that reference different camera types but for now I'm just working with 3 camera red rigs.
It's a very big file so here is a pastebin: http://pastebin.com/vLaRA0X8
Basically I am wanting to constrain the script to looking within my variable <bin><name>'daynumber'</name>~~~~</bin>. In this case looking in the Day_00 bin. If there is anything else in the root hierarchy I want to ignore it as sequences, unused clips and other data can get very very huge. I then want to create the directory of daynumber in the nk_output_dir & dnx_render_dir so that everything for this shoot day is contained in that folder.
An annoying part of the XML file is that the name of a bin is a child of the <bin> itself, so once a bin name is found, any <children> of that bin are on the same level of the tree as the <name>. I can't find sample code for locating a tag and then working with the tags that are in the same branch instead of its children.
Now that it has found the bin for the day I want it to start to look for all the bins in <children></children>. Example being <bin><name>R001</name>~~~</bin> and create directories inside the Day_00 folder I made in nk_output_dir & dnx_render_dir for each bin it finds in this part of the structure. Every time the camera reloads that will roll up to R002, R003, etc etc. Also different camera types like Gopros will create G001, G002, G003.
Then I want to look for in the <children> of the above bins and find all the bins inside like <bin><name>R001_C001</name>~~~</bin> and create folders in the nk_output_dir\daynumber\~whatever bin this is contained~\~name of this bin~\. Which is user created of the roll number and clip number. (R001_C001, R001_C002, etc etc) This will be the new clip name, the name of the .nk comp that will be generated and the file name of the render on the write node.
The goal here is to recreate the bin folder structure in the directory I've chosen for nk_output_dir.
The dnx_render_dir is what gets plugged into the write nodes of my nuke scripts later, as the place the files should be rendered to. It's separate because it will be a different RAID drive that changes as drives fill up. The renders just need to be put in a directory for daynumber\~rollnumber~ and don't need to be constrained to a folder for the clip name.
Here is where I am really lost. Because I have to account for user error, I can't be entirely sure how deep in the tree I need to go. I know I want the <pathurl>~</pathurl>, which I can plug into the .nk (nuke) scripts I make. With red camera files they can either be directly here as .R3D or in the folder structure, which can be 2-3 bins deep. I know that I can't 100% rely on the guys on set to be consistent in how they make this bin.
All I can trust them to do is make sure they are in correct alphabetical order, so the order of them in the xml is important. I also know that if I am looking at an R### roll bin I need 3 <pathurl></pathurl>, if I'm looking inside G### I need 6, and for C### only 5.
The order of them is important as they can rename the name tag inside `~~~~ to rename cameras that were the wrong setting without renaming source files. (which breaks important metadata that is needed in other programs)
While in this part of the tree I'd also like to grab the <clip id=~><in>###</in> to get the in marker frame offset, so that if the cameras have gone out of sync their start points can be set. But of course this tag is not a child of the <pathurl></pathurl> and is actually 3 parents up! Also this tag won't be on every clip, so I can't look for it first!
<clip id="masterclip-40" explodedTracks="true" frameBlend="FALSE">
<uuid>85f87acc-308f-401e-bf82-55e8ea41e55a</uuid>
<masterclipid>masterclip-40</masterclipid>
<ismasterclip>TRUE</ismasterclip>
<duration>5355</duration>
<rate>
<timebase>30</timebase>
<ntsc>TRUE</ntsc>
</rate>
<in>876</in>
<name>B002_C002_0216AM_002.R3D</name>
<media>
<video>
<track>
<clipitem id="clipitem-118" frameBlend="FALSE">
<masterclipid>masterclip-40</masterclipid>
<name>B002_C002_0216AM_002.R3D</name>
<rate>
<timebase>30</timebase>
<ntsc>TRUE</ntsc>
</rate>
<alphatype>none</alphatype>
<pixelaspectratio>square</pixelaspectratio>
<anamorphic>FALSE</anamorphic>
<file id="file-40">
<name>B002_C002_0216AM_002.R3D</name>
<pathurl>file://localhost/Volumes/REDLAB_3A/SESQUI/MASTER_FILES/DAY_00/RED/R002/B002/B002_0216G4.RDM/B002_C002_0216AM.RDC/B002_C002_0216AM_002.R3D</pathurl>
So once I've parsed all this the information I'd like to have is.
The original bin folder structure of the XML contained in the daynumber. Take the names of the bins and construct the same folder structure in the nk_output_dir (Day_00/R001/R001_C001 etc etc)
I also want to make a daynumber directory in the dnx_render_dir folder and a directory for each bin referencing a camera roll.
Based on whether the clip name starts with an R, G or C, I want to be able to select what kind of .nk to make.
I want the pathurl information for each bin that refers to a clip. I also want any <in> information if there is any for that clip. That way I can plug it into the read node information for my nuke gizmo.
I think once I figure out how to parse such a complicated xml tree I'll be able to fuss and fumble through the rest of the process.
I am just really struggling with finding examples of parsing a complicated XML file like this.
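On the specific point about a bin's <name> sitting beside its <children>: with ElementTree, once you hold the <bin> element you can reach both from it, so no sibling navigation is needed. A minimal sketch follows, assuming the xmeml layout described above; the sample XML is a heavily trimmed, made-up stand-in for the Premiere export (its pathurl values are placeholders), and find_bin, child_bins and collect are hypothetical helper names. Only calls that already exist in the Python 2.7 ElementTree are used.

```python
import xml.etree.ElementTree as ET

# Illustrative stand-in for the export: one day bin, one roll, one shot, two clips.
SAMPLE = """<xmeml>
<bin><name>Day_00</name><children>
  <bin><name>R001</name><children>
    <bin><name>R001_C001</name><children>
      <clip id="masterclip-40"><in>876</in><name>B002.R3D</name>
        <media><video><track><clipitem><file>
          <pathurl>file://localhost/Volumes/X/B002.R3D</pathurl>
        </file></clipitem></track></video></media>
      </clip>
      <clip id="masterclip-41"><name>C002.R3D</name>
        <media><video><track><clipitem><file>
          <pathurl>file://localhost/Volumes/X/C002.R3D</pathurl>
        </file></clipitem></track></video></media>
      </clip>
    </children></bin>
  </children></bin>
</children></bin>
<bin><name>Unused</name><children/></bin>
</xmeml>"""


def child_bins(parent):
    """Bins sitting directly under <parent>/<children>, in document order."""
    return parent.findall("children/bin")


def find_bin(root, name):
    """First top-level <bin> whose own <name> child matches, else None."""
    for candidate in root.findall("bin"):
        if (candidate.findtext("name") or "").strip() == name:
            return candidate
    return None


def collect(root, daynumber):
    """Return (relative folder, [(pathurl, in-frame or None), ...]) per shot bin."""
    shots = []
    day = find_bin(root, daynumber)
    if day is None:
        return shots
    for roll in child_bins(day):
        for shot in child_bins(roll):
            clips = []
            # iter() descends to any depth, so extra nested bins do not matter.
            for clip in shot.iter("clip"):
                clips.append((clip.findtext(".//pathurl"), clip.findtext("in")))
            folder = "/".join(
                [daynumber, roll.findtext("name"), shot.findtext("name")])
            shots.append((folder, clips))
    return shots


shots = collect(ET.fromstring(SAMPLE), "Day_00")
print(shots)
```

Starting from the <clip> element and searching downward for <pathurl> sidesteps the "3 parents up" problem: the optional <in> is a direct child of the clip, so findtext("in") simply returns None when it is absent.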
Whenever faced with a complex XML, consider an XSLT script to transform your XML into a simpler structure. For background, XSLT is a special-purpose, declarative language (the same type as SQL) designed to transform XML into various structures for end-use needs. Python, like other general-purpose languages, can run XSLT 1.0 scripts, specifically through the third-party lxml module.
While this transformation does not address your entire needs, you can parse the simpler structure for your Nuke application needs. Directories and names are simplified and labeled for daynumber, rollnumber, shotnames, and clip with pathurls.
XSLT script (save as .xsl or .xslt to be referenced in .py script below)
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>
<xsl:key name="idkey" match="ctype" use="@id" />
<xsl:template match="/">
<root>
<xsl:apply-templates select="*"/>
</root>
</xsl:template>
<xsl:template match="xmeml/bin">
<daynumber><xsl:value-of select="name"/></daynumber>
<xsl:apply-templates select="children/bin"/>
</xsl:template>
<xsl:template match="xmeml/bin/children/bin">
<roll>
<rollnumber><xsl:value-of select="name"/></rollnumber>
<rollnumberdir><xsl:value-of select="concat(ancestor::bin/name,
'/', name)"/></rollnumberdir>
<xsl:apply-templates select="children/bin"/>
</roll>
</xsl:template>
<xsl:template match="xmeml/bin/children/bin/children/bin">
<shot>
<shotname><xsl:value-of select="name"/></shotname>
<shotnamedir><xsl:value-of select="concat(/xmeml/bin/name, '/',
/xmeml/bin/children/bin/name, '/', name)"/></shotnamedir>
<xsl:apply-templates select="descendant::clip[position() &lt; 4]"/>
</shot>
</xsl:template>
<xsl:template match="clip">
<clip>
<clipname><xsl:value-of select="descendant::name"/></clipname>
<xsl:copy-of select="in"/>
<pathurl><xsl:value-of select="descendant::pathurl"/></pathurl>
</clip>
</xsl:template>
</xsl:transform>
Python script (transform, parse, and export simpler structure)
#!/usr/bin/python
import lxml.etree as ET
# LOAD INPUT XML AND XSLT
dom = ET.parse('Input.xml')
xslt = ET.parse('XSLTScript.xsl')
# TRANSFORM XML (SIMPLER NEWDOM CAN BE FURTHER PARSED: ITER(), FINDALL(), XPATH())
transform = ET.XSLT(xslt)
newdom = transform(dom)
# XPATH EXPRESSIONS (LIST OUTPUTS)
daynumber = newdom.xpath('//daynumber/text()')
# ['Day_00']
rolls = newdom.xpath('//rollnumber/text()')
# ['R001', 'R002']
shots = newdom.xpath('//shotname/text()')
# ['R001_C001', 'R002_C001', 'R002_C002']
# CONVERT TO STRING (IF NEEDED)
tree_out = ET.tostring(newdom, encoding='UTF-8', pretty_print=True, xml_declaration=True)
print(tree_out.decode("utf-8"))
# OUTPUT TO FILE (IF NEEDED)
with open('Output.xml', 'wb') as xmlfile:
    xmlfile.write(tree_out)
TRANSFORMED XML (contained in newdom object in .py script)
<?xml version='1.0' encoding='UTF-8'?>
<root>
<daynumber>Day_00</daynumber>
<roll>
<rollnumber>R001</rollnumber>
<rollnumberdir>Day_00/R001</rollnumberdir>
<shot>
<shotname>R001_C001</shotname>
<shotnamedir>Day_00/R001/R001_C001</shotnamedir>
<clip>
<clipname>A002_C001_0216MW_001.R3D</clipname>
<pathurl>file://localhost/Volumes/REDLAB_3A/SESQUI/MASTER_FILES/DAY_00/RED/R001/A002/A002_0216FE.RDM/A002_C001_0216MW.RDC/A002_C001_0216MW_001.R3D</pathurl>
</clip>
<clip>
<clipname>A002_C001_0216MW_002.R3D</clipname>
<pathurl>file://localhost/Volumes/REDLAB_3A/SESQUI/MASTER_FILES/DAY_00/RED/R001/A002/A002_0216FE.RDM/A002_C001_0216MW.RDC/A002_C001_0216MW_002.R3D</pathurl>
</clip>
<clip>
<clipname>A002_C001_0216MW_003.R3D</clipname>
<pathurl>file://localhost/Volumes/REDLAB_3A/SESQUI/MASTER_FILES/DAY_00/RED/R001/A002/A002_0216FE.RDM/A002_C001_0216MW.RDC/A002_C001_0216MW_003.R3D</pathurl>
</clip>
</shot>
</roll>
<roll>
<rollnumber>R002</rollnumber>
<rollnumberdir>Day_00/R002</rollnumberdir>
<shot>
<shotname>R002_C001</shotname>
<shotnamedir>Day_00/R001/R002_C001</shotnamedir>
<clip>
<clipname>A003_C001_0216XI_001.R3D</clipname>
<pathurl>file://localhost/Volumes/REDLAB_3A/SESQUI/MASTER_FILES/DAY_00/RED/R002/A003/A003_0216XO.RDM/A003_C001_0216XI.RDC/A003_C001_0216XI_001.R3D</pathurl>
</clip>
<clip>
<clipname>B002_C001_02169H_002.R3D</clipname>
<pathurl>file://localhost/Volumes/REDLAB_3A/SESQUI/MASTER_FILES/DAY_00/RED/R002/B002/B002_0216G4.RDM/B002_C001_02169H.RDC/B002_C001_02169H_002.R3D</pathurl>
</clip>
<clip>
<clipname>C002_C001_02168R_001.R3D</clipname>
<pathurl>file://localhost/Volumes/REDLAB_3A/SESQUI/MASTER_FILES/DAY_00/RED/R002/C002/C002_0216RL.RDM/C002_C001_02168R.RDC/C002_C001_02168R_001.R3D</pathurl>
</clip>
</shot>
<shot>
<shotname>R002_C002</shotname>
<shotnamedir>Day_00/R001/R002_C002</shotnamedir>
<clip>
<clipname>C002_C002_0216M9_001.R3D</clipname>
<pathurl>file://localhost/Volumes/REDLAB_3A/SESQUI/MASTER_FILES/DAY_00/RED/R002/C002/C002_0216RL.RDM/C002_C002_0216M9.RDC/C002_C002_0216M9_001.R3D</pathurl>
</clip>
<clip>
<clipname>C002_C002_0216M9_002.R3D</clipname>
<pathurl>file://localhost/Volumes/REDLAB_3A/SESQUI/MASTER_FILES/DAY_00/RED/R002/C002/C002_0216RL.RDM/C002_C002_0216M9.RDC/C002_C002_0216M9_002.R3D</pathurl>
</clip>
<clip>
<clipname>C002_C002_0216M9_003.R3D</clipname>
<pathurl>file://localhost/Volumes/REDLAB_3A/SESQUI/MASTER_FILES/DAY_00/RED/R002/C002/C002_0216RL.RDM/C002_C002_0216M9.RDC/C002_C002_0216M9_003.R3D</pathurl>
</clip>
</shot>
</roll>
</root>
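Once the structure is this flat, turning it into the folder layout and per-shot inputs is a short loop. The sketch below is one way to do it, assuming the transformed XML shown above; plan_jobs, the dictionary keys and the "nk_out" and "renders" directory names are hypothetical, and the sample is a trimmed copy of the output. It uses the standard-library ElementTree, which is enough for reading the result.

```python
import os
import xml.etree.ElementTree as ET

# Trimmed copy of the transformed structure shown above.
SAMPLE = """<root>
<daynumber>Day_00</daynumber>
<roll>
<rollnumber>R001</rollnumber>
<rollnumberdir>Day_00/R001</rollnumberdir>
<shot>
<shotname>R001_C001</shotname>
<shotnamedir>Day_00/R001/R001_C001</shotnamedir>
<clip>
<clipname>A002_C001_0216MW_001.R3D</clipname>
<pathurl>file://localhost/Volumes/REDLAB_3A/SESQUI/MASTER_FILES/DAY_00/RED/R001/A002/A002_0216FE.RDM/A002_C001_0216MW.RDC/A002_C001_0216MW_001.R3D</pathurl>
</clip>
</shot>
</roll>
</root>"""


def plan_jobs(root, nk_output_dir, dnx_render_dir):
    """Return one dict per shot: rig letter, .nk folder, render folder, inputs."""
    jobs = []
    for roll in root.findall("roll"):
        # Renders are grouped per roll, not per clip.
        render_dir = os.path.join(
            dnx_render_dir, *roll.findtext("rollnumberdir").split("/"))
        for shot in roll.findall("shot"):
            shot_dir = os.path.join(
                nk_output_dir, *shot.findtext("shotnamedir").split("/"))
            jobs.append({
                "rig": shot.findtext("shotname")[0],  # 'R', 'G' or 'C'
                "nk_dir": shot_dir,
                "render_dir": render_dir,
                # <in> is optional, so the second item may be None.
                "inputs": [(clip.findtext("pathurl"), clip.findtext("in"))
                           for clip in shot.findall("clip")],
            })
    return jobs


jobs = plan_jobs(ET.fromstring(SAMPLE), "nk_out", "renders")
for job in jobs:
    print(job["rig"], job["nk_dir"], job["render_dir"], len(job["inputs"]))
```

Nothing is created on disk here; each nk_dir and render_dir could be passed to os.makedirs once the plan looks right.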

How can I modify the attributes of an SVG file from Python?

I have an svg file that was generated by the map data-visualisation software 'Kartograph'. It contains a large number of paths representing areas on a map. These paths each have some data fields:
<path d=" ...path info... " data-electorate="Canberra" data-id="Canberra" data-no="23" data-nop="0.92" data-percentile="6" data-state="ACT" data-totalvotes="25" data-yes="2" data-yesp="0.08" id="Canberra"/>
So that I don't have to generate a new svg file every time, I want to modify some attributes, such as the number of 'yes' votes, from within python. Specifically, I would like to increment/increase the 'yes' votes value by one (for each execution of the code).
I have tried lxml and have browsed the documentation for it extensively, but so far this code has not worked:
from lxml import etree
filename = "aus4.svg"
tree = etree.parse(open(filename, 'r'))
for element in tree.iter():
    if element.tag.split("}")[1] == "path":
        if element.get("id") == "Lingiari":
            yes_votes = element.get("data-yes")
            print(yes_votes)
            yes_votes.set(yes_votes, str(int(yes_votes) + 1))
            print(yes_votes)
Is python the best tool to use for this task? If so how might I change the above code or start afresh. Apologies for any confusion. I am new to this 'lxml' module and svg files, so I'm a bit lost.
You do not set the attribute on the element; you call set on the attribute's value instead of on the element in this line:
yes_votes.set(yes_votes, str(int(yes_votes) + 1))
yes_votes contains the content of the attribute (a string), not a reference to the attribute itself. Change it to:
element.set("data-yes", str(int(yes_votes) + 1))
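Put together, the whole read, increment and serialize cycle could look like the sketch below. It uses the standard-library ElementTree rather than lxml (the element get and set calls are the same in both), and the two-path sample SVG, its vote counts for Lingiari and the increment_yes helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize with SVG as the default namespace instead of an ns0: prefix.
ET.register_namespace("", SVG_NS)

# Illustrative stand-in for the Kartograph output.
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg">
<path d="M0 0" data-yes="2" data-no="23" id="Canberra"/>
<path d="M1 1" data-yes="5" data-no="7" id="Lingiari"/>
</svg>"""


def increment_yes(root, electorate_id):
    """Add one to data-yes on the <path> with the given id; return the new value."""
    for path in root.iter("{%s}path" % SVG_NS):
        if path.get("id") == electorate_id:
            new_value = int(path.get("data-yes", "0")) + 1
            # Attribute values are strings, so convert back before storing.
            path.set("data-yes", str(new_value))
            return new_value
    return None


root = ET.fromstring(SAMPLE)
increment_yes(root, "Lingiari")
updated_svg = ET.tostring(root, encoding="unicode")
print(updated_svg)
```

To update the file in place, ET.parse(filename) followed by tree.write(filename) would replace the fromstring and tostring calls.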

Find and Replace tags in XML using Python

I have asked a similar question before, but this one is slightly different. I want to find and replace XML tags using Python. I am using the XML files as metadata for some GIS shapefiles. In the metadata editor, I have options to choose dates for when certain data was collected. The options are 'single date', 'multiple dates' and 'range of dates'. In the first XML, which contains tags for a range of dates, you will see the tag 'rngdates' with the subelements 'begdate', 'begtime', 'enddate' and 'endtime'. I want to edit these tags out so that it looks like the second XML, which contains multiple single dates. The new tags are 'mdattim', 'sngdate' and 'caldate'. I hope this is clear enough, but please ask for more info if needed. XML is a weird beast, and I still don't fully understand it.
Thanks,
Mike
First XML:
<idinfo>
<citation>
<citeinfo>
<origin>My Company Name</origin>
<pubdate>05/04/2009</pubdate>
<title>Feature Class Name</title>
<edition>0</edition>
<geoform>vector digital data</geoform>
<onlink>.</onlink>
</citeinfo>
</citation>
<descript>
<abstract>This dataset represents the GPS location of inspection points collected in the field for the Site Name</abstract>
<purpose>This dataset was created to accompany the clients Assessment Plan. This point feature class represents the location within the area that the field crews collected related data.</purpose>
</descript>
<timeperd>
<timeinfo>
<rngdates>
<begdate>7/13/2010</begdate>
<begtime>unknown</begtime>
<enddate>7/15/2010</enddate>
<endtime>unknown</endtime>
</rngdates>
</timeinfo>
<current>ground condition</current>
</timeperd>
Second XML:
<idinfo>
<citation>
<citeinfo>
<origin>My Company Name</origin>
<pubdate>03/07/2011</pubdate>
<title>Feature Class Name</title>
<edition>0</edition>
<geoform>vector digital data</geoform>
<onlink>.</onlink>
</citeinfo>
</citation>
<descript>
<abstract>This dataset represents the GPS location of inspection points collected in the field for the Site Name</abstract>
<purpose>This dataset was created to accompany the clients Assessment Plan. This point feature class represents the location within the area that the field crews collected related data.</purpose>
</descript>
<timeperd>
<timeinfo>
<mdattim>
<sngdate>
<caldate>08-24-2009</caldate>
<time>unknown</time>
</sngdate>
<sngdate>
<caldate>08-26-2009</caldate>
</sngdate>
<sngdate>
<caldate>08-26-2009</caldate>
</sngdate>
<sngdate>
<caldate>07-07-2010</caldate>
</sngdate>
</mdattim>
</timeinfo>
This is my Python code so far:
folderPath = "Z:\ESRI\Figure_Sourcing\Figures\Metadata\IOR_Run_Metadata_2009"
for filename in glob.glob(os.path.join(folderPath, "*.xml")):
    fullpath = os.path.join(folderPath, filename)
    if os.path.isfile(fullpath):
        basename, filename2 = os.path.split(fullpath)
        root = ElementTree(file=r"Z:\ESRI\Figure_Sourcing\Figures\Metadata\Run_Metadata_2009\\" + filename2)
        iter = root.getiterator()
        #Iterate
        for element in iter:
            print element.tag
            if element.tag == "begdate":
                element.tag.replace("begdate", "sngdate")
I believe I succeeded in making the code work. This will allow you to edit certain tags if you need to change them from an existing XML file. I needed to do this to create metadata for some GIS shapefiles in a batch processing script to change certain date values depending on if they were single dates, multiple dates or a range of dates.
This webpage helped a lot: http://lxml.de/tutorial.html
I have some more work to do, but this was the answer I was looking for from my original question :) I'm sure this can be used in many other applications.
import glob
import os
# Assumption: xml.etree is used here; lxml.etree offers the same calls
import xml.etree.ElementTree as ET

# Set workspace location for XML files
folderPath = r"Z:\ESRI\Figure_Sourcing\Figures\Metadata\IOR_Run_Metadata_2009"
# Loop through each file with an .xml extension (glob already returns full paths)
for fullpath in glob.glob(os.path.join(folderPath, "*.xml")):
    if os.path.isfile(fullpath):
        # Parse the XML file
        tree = ET.parse(fullpath)
        # Find the "timeinfo" tag
        timeinfo = tree.find(".//timeinfo")
        if timeinfo is not None:
            # Clear all tags below the "timeinfo" tag
            timeinfo.clear()
            # Append the new "mdattim" element under "timeinfo"
            mdattim = ET.SubElement(timeinfo, "mdattim")
            # Create SubElements below "mdattim"
            sngdate = ET.SubElement(mdattim, "sngdate")
            caldate = ET.SubElement(sngdate, "caldate")
            time = ET.SubElement(sngdate, "time")
            # Set text values for tags
            caldate.text = "08-24-2009"
            time.text = "unknown"
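The code above stops before the result is serialized, and it hard-codes a single date. A minimal sketch of the same idea as a reusable function follows, working on an in-memory copy of the first XML; to_multiple_dates and its dates argument are hypothetical names, and the date values are taken from the second XML in the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the "range of dates" XML from the question.
SAMPLE = """<idinfo><timeperd><timeinfo><rngdates>
<begdate>7/13/2010</begdate><begtime>unknown</begtime>
<enddate>7/15/2010</enddate><endtime>unknown</endtime>
</rngdates></timeinfo><current>ground condition</current></timeperd></idinfo>"""


def to_multiple_dates(root, dates):
    """Replace the children of <timeinfo> with <mdattim> holding one <sngdate> per date.

    dates is a list of (caldate, time) pairs; time may be None to omit <time>.
    """
    timeinfo = root.find(".//timeinfo")
    if timeinfo is None:
        return root
    # Copy the child list first, because it is modified while looping.
    for child in list(timeinfo):
        timeinfo.remove(child)
    mdattim = ET.SubElement(timeinfo, "mdattim")
    for caldate, time in dates:
        sngdate = ET.SubElement(mdattim, "sngdate")
        ET.SubElement(sngdate, "caldate").text = caldate
        if time is not None:
            ET.SubElement(sngdate, "time").text = time
    return root


root = to_multiple_dates(
    ET.fromstring(SAMPLE),
    [("08-24-2009", "unknown"), ("08-26-2009", None)],
)
result = ET.tostring(root, encoding="unicode")
print(result)
```

When working on files, tree = ET.parse(path), to_multiple_dates(tree.getroot(), dates) and tree.write(path) would persist the change for each metadata file.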

Editing all text in childNodes of XML file with Python

I'm trying to edit the text inside of all of the tags named "Volume" in an XML file by multiplying that text by a number entered by the user. The text inside of the "Volume" tag will always be a number. My code works so far, but only on the first instance of the "Volume" text.
Here's an example of the XML:
<blah>
<moreblah> sometext </moreblah> ;
<blah2>
<blah3> <blah4> 30 </blah4> <Volume> 15 </Volume> </blah3>
</blah2>
</blah>
<blah>
<moreblah> sometext </moreblah> ;
<blah2>
<blah3> <blah4> 30 </blah4> <Volume> 25 </Volume> </blah3>
</blah2>
</blah>
And here's my Python code:
#import modules
import xml.dom.minidom
from xml.dom.minidom import parse
import os
import fileinput
#create a backup of original file
new_file_name = 'blah.xml'
old_file_name = new_file_name + "_old"
os.rename(new_file_name, old_file_name)
#find the first instance of "Volume"
doc = parse(old_file_name)
volume = doc.getElementsByTagName('Volume')[0]
child = volume.childNodes[0]
txt = child.nodeValue
#ask for percentage input
print
percentage = raw_input("Set Volume Percentage (1 - 100): ")
#accept whole numbers from 1 to 100 only
if percentage.isdigit() and 1 <= int(percentage) <= 100:
    print 'Thank You'
    #replace text of <Volume> tag
    child.nodeValue = str(int(float(txt) * (int(percentage)/100.0)))
    #persist changes to new file
    xml_file = open(new_file_name, "w")
    doc.writexml(xml_file)
    xml_file.close()
    #remove XML Declaration
    text = open("blah.xml", "r").read()
    text = text.replace('<?xml version="1.0" ?>', '')
    open("blah.xml", "w").write(text)
else:
    print
    print 'Please enter a number between 1 and 100.'
    print
    print 'Try again.'
    print
    print 'Exiting.'
    #write the unchanged document back
    xml_file = open(new_file_name, "w")
    doc.writexml(xml_file)
    xml_file.close()
os.remove(old_file_name)
I know that in my code, I have "doc.getElementsByTagName('Volume')[0]" which denotes the first instance of the "Volume" tag, but I was just doing that as a test to see if it would work. So I'm aware that the code is working exactly as it should. But I'm wondering if anyone has any suggestions, or could tell me the easiest way to apply the user input percentage to all of the instances of the "Volume" tag.
This is also my first attempt at Python, so if you see anything else that seems weird, please let me know.
Thank you for your help!
You'll be much happier if you use a more modern XML API, like ElementTree (in the standard library) or lxml (more advanced).
In ElementTree or lxml you get access to XPath (or something close), which allows for a much more flexible syntax in finding elements and attributes in XML documents.
In ElementTree:
volumes = my_parsed_xml_file.findall('.//Volume')
...will find all occurrences of the Volume element.
If you stick with the current syntax, by doing:
doc.getElementsByTagName('Volume')[0]
...you're specifically asking for the zero-th (first) Volume. If you want to process them all, you want a loop:
for volume in doc.getElementsByTagName('Volume'):
    child = volume.childNodes[0]
    # ... rest of your code inside the loop
If constructs like loops are unfamiliar to you, you should probably step back and read an introductory programming guide, as things will get pretty complicated quickly without some fundamentals. Best of luck!
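For comparison, the ElementTree version of that loop is short. This is a minimal sketch, not a drop-in replacement for the script above: scale_volumes is a hypothetical helper, and because the sample in the question has two top-level <blah> elements, a wrapper <root> element is assumed here so the text parses as a single document.

```python
import xml.etree.ElementTree as ET

# The wrapper <root> element is an assumption; the inner elements follow the question.
SAMPLE = """<root>
<blah><blah2><blah3> <blah4> 30 </blah4> <Volume> 15 </Volume> </blah3></blah2></blah>
<blah><blah2><blah3> <blah4> 30 </blah4> <Volume> 25 </Volume> </blah3></blah2></blah>
</root>"""


def scale_volumes(root, percentage):
    """Multiply the text of every <Volume> element by percentage / 100."""
    if not 1 <= percentage <= 100:
        raise ValueError("percentage must be between 1 and 100")
    # iter() visits every <Volume> in the tree, however deeply nested.
    for volume in root.iter("Volume"):
        volume.text = str(int(float(volume.text) * percentage / 100.0))
    return root


root = scale_volumes(ET.fromstring(SAMPLE), 50)
print([volume.text for volume in root.iter("Volume")])
```

The same truncation as in the original script applies: int() drops the fractional part, so 15 at 50 percent becomes 7 rather than 7.5.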
