The XML looks more or less like this:
<title>Introduction to Anthropology</title>
Here is my code to get the title and so on. I would also like to get the time or the place tag, each of which has child elements. How can I do that? I tried several different methods, but none of them seems to work. Thank you! Any help is appreciated.
for c in courses:
    title = c.find('title').text
    num = c.find('crse').text
    days = c.find('days').text
    # time = c.find('time').text
    # for t in c:
    #     timeSlot1 = t.find('start_time')
    #     timeSlot2 = t.find('end_time')
    # format text using {}
    print(' *{} {} [{}] {} {} {}'.format(b, title, days, num, timeSlot1, timeSlot2))
    # how to get date
You're almost there: just select the correct child by specifying a path relative to <course>:
for c in courses:
    title = c.find('title').text
    # [...]
    timeSlot1 = c.find('time/start_time').text
    timeSlot2 = c.find('time/end_time').text
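Here is a self-contained sketch of the same idea. The XML in the question is abbreviated, so the `<course>` layout below (the `<time>` children and all sample values) is assumed purely for illustration, and `course_times` is a hypothetical helper name. It uses `findtext()`, which accepts the same relative paths as `find()` but returns `None` instead of raising when a tag is missing.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the real file's layout may differ.
SAMPLE = """
<courses>
  <course>
    <title>Introduction to Anthropology</title>
    <crse>101</crse>
    <days>M-W</days>
    <time>
      <start_time>09:00AM</start_time>
      <end_time>10:20AM</end_time>
    </time>
  </course>
</courses>
"""

def course_times(course):
    """Return (start, end) text for one <course>; None where a tag is absent."""
    # The path is relative to <course>: first the <time> child, then its child.
    return (course.findtext('time/start_time'),
            course.findtext('time/end_time'))

root = ET.fromstring(SAMPLE)
for c in root.findall('course'):
    start, end = course_times(c)
    print('{} [{}] {} {}'.format(c.findtext('title'), c.findtext('days'), start, end))
```

If the missing-tag case should be an error instead, keeping `find(...).text` as in the answer above is the simpler choice.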
I want to get the text value of every child that has one and every attribute value of every child that has one. I can get the text values but I am having trouble getting the attribute values one by one and assigning each to a variable.
I have the following XML file:
<CardAuthorisation xmlns:xsi="http://...">
<Card productid="16" PAN="64256700991593" product="MC" programid="AUST" branchcode="" />
I have the following code:
import xml.etree.ElementTree as et
xFile = "test.XML"
xtree = et.parse(xFile)
xRoot = xtree.getroot()
for cardAuthorisation in xRoot.findall('CardAuthorisation'):
    recType = cardAuthorisation.find('./RecType').text
    authId = cardAuthorisation.find('./AuthId').text
    localDate = cardAuthorisation.find('./LocalDate').text
    settlementDate = cardAuthorisation.find('./SettlementDate').text
    #here is where I am having trouble with
    for every attribute in Card:
        card_productid = #the value of productid if not None else None
        branchcode = #the value of branchcode if not None else None
This is my first time working with XML files, I have done a lot of research but none of them matches my use case. Any help would be highly appreciated, thanks in advance.
To get all <Card> tags and each attribute/value of <Card>, you can do:
for c in cardAuthorisation.findall('Card'):
    for k, v in c.items():
        print(k, v)
productid 16
PAN 64256700991593
product MC
programid AUST
You can access attributes of the "Card" element as follows:
card = cardAuthorisation.find('./Card')
for key in card.keys():
    print(key, card.get(key))
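To assign individual attributes to variables, as the question asks, something like the following sketch might do. The wrapping `<Root>` element and the helper name `card_fields` are assumptions for the example; only the `<Card>` line comes from the question. `Element.get()` returns `None` when an attribute is absent, and the `or None` below additionally treats an empty string such as `branchcode=""` as missing.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened document; only the <Card> line is from the question.
SAMPLE = """
<Root>
  <CardAuthorisation>
    <Card productid="16" PAN="64256700991593" product="MC" programid="AUST" branchcode="" />
  </CardAuthorisation>
</Root>
"""

def card_fields(card_authorisation):
    """Return (productid, branchcode) from the <Card> child, using None for gaps."""
    card = card_authorisation.find('Card')
    if card is None:
        return None, None
    card_productid = card.get('productid')        # None if the attribute is absent
    branchcode = card.get('branchcode') or None   # also maps "" to None
    return card_productid, branchcode

root = ET.fromstring(SAMPLE)
for auth in root.findall('CardAuthorisation'):
    print(card_fields(auth))
```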
I'm trying to parse the following xml to pull out certain data then eventually edit the data as needed.
Here is the xml:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
The data I want to pull from this is the STATUS, the COMMENTS, and the ATTRIBUTE_DATA that directly follows the VULN_ATTRIBUTE equal to Rule_Ver. So in this example I should get the following:
Gen000000 NotAFinding None
What I have so far is that I can get the Status and Comments easily, but I can't figure out the ATTRIBUTE_DATA portion. I can find the first one (Vuln_Num), but when I tried to add an index I got a "list index out of range" error.
This is where I'm at now.
import xml.etree.ElementTree as ET
doc = ET.parse('test.ckl')
TagList = doc.findall("./VULN")
for curTag in TagList:
    StatusTag = curTag.find("STATUS")
    CommentTag = curTag.find("COMMENTS")
    DataTag = curTag.find("./STIG_DATA/ATTRIBUTE_DATA")
    print("GEN:[%s] Status:[%s] Comments: %s" % (DataTag.text, StatusTag.text, CommentTag.text))
This gives the following output:
GEN:[V-38438] Status:[NotAFinding] Comments: None
I want:
GEN:[Gen000000] Status:[NotAFinding] Comments: None
So the end goal is to be able to parse hundreds of these and edit the comments field as needed. I don't think the editing part will be that hard once I get the right element.
Logically I see two ways of doing this. Either go to the ATTRIBUTE_DATA[5] and grab the text or find VULN_ATTRIBUTE == Rule_Ver then grab the next ATTRIBUTE_DATA.
I have tried doing this:
DataTag = curTag.find(".//STIG_DATA//ATTRIBUTE_DATA")[5]
and both give me IndexError: list index out of range
I saw lxml had get_element_by_id and xpath, but I can't add modules to this system so it is etree for me.
Thanks in advance.
One can find an element by position, but you've used the incorrect XPath syntax. Either of the following lines should work:
DataTag = curTag.find("./STIG_DATA[5]/ATTRIBUTE_DATA") # Note: 5, not 4
DataTag = curTag.findall("./STIG_DATA/ATTRIBUTE_DATA")[4] # Note: 4, not 5
However, I strongly recommend against using that. There is no guarantee that the Rule_Ver instance of STIG_DATA is always the fifth item.
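To make the off-by-one difference concrete, here is a small sketch on a made-up `<VULN>` with only three `STIG_DATA` children (the sample values other than those quoted in the question are invented). The XPath predicate counts from 1, while the list returned by `findall()` counts from 0, so `[3]` in the path and `[2]` on the list select the same element.

```python
import xml.etree.ElementTree as ET

# Hypothetical three-entry sample; a real .ckl file has many more STIG_DATA items.
SAMPLE = """
<VULN>
  <STIG_DATA><VULN_ATTRIBUTE>Vuln_Num</VULN_ATTRIBUTE><ATTRIBUTE_DATA>V-38438</ATTRIBUTE_DATA></STIG_DATA>
  <STIG_DATA><VULN_ATTRIBUTE>Severity</VULN_ATTRIBUTE><ATTRIBUTE_DATA>low</ATTRIBUTE_DATA></STIG_DATA>
  <STIG_DATA><VULN_ATTRIBUTE>Rule_Ver</VULN_ATTRIBUTE><ATTRIBUTE_DATA>Gen000000</ATTRIBUTE_DATA></STIG_DATA>
</VULN>
"""

vuln = ET.fromstring(SAMPLE)

# Position inside the path: 1-based, applies to the STIG_DATA step.
by_predicate = vuln.find("./STIG_DATA[3]/ATTRIBUTE_DATA")

# Position on the Python list returned by findall(): 0-based.
by_index = vuln.findall("./STIG_DATA/ATTRIBUTE_DATA")[2]

print(by_predicate.text, by_index.text)
```

Note that `find(...)[5]` indexes the children of the single element that `find()` returned, which is why it fails on a leaf such as ATTRIBUTE_DATA.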
If you could change to lxml, then this works:
DataTag = curTag.xpath(
Since you can't use lxml, you must iterate the STIG_DATA elements by hand, like so:
def GetData(curTag):
    for stig in curTag.findall('STIG_DATA'):
        if stig.find('VULN_ATTRIBUTE').text == 'Rule_Ver':
            return stig.find('ATTRIBUTE_DATA')
Here is a complete program with error checking added to GetData():
import xml.etree.ElementTree as ET
doc = ET.parse('test.ckl')
TagList = doc.findall("./VULN")
def GetData(curTag):
    for stig in curTag.findall('STIG_DATA'):
        vuln = stig.find('VULN_ATTRIBUTE')
        if vuln is not None and vuln.text == 'Rule_Ver':
            data = stig.find('ATTRIBUTE_DATA')
            return data
for curTag in TagList:
    StatusTag = curTag.find("STATUS")
    CommentTag = curTag.find("COMMENTS")
    DataTag = GetData(curTag)
    print("GEN:[%s] Status:[%s] Comments: %s" % (DataTag.text, StatusTag.text, CommentTag.text))
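For readers who want to try this without a .ckl file, here is a minimal runnable sketch of the same lookup on an inline sample. The `<CHECKLIST>` wrapper, the sample values, and the helper name `attribute_data` are assumptions for the example, not taken from a real checklist. It returns `None` rather than raising when no matching STIG_DATA exists.

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down checklist; real files carry many more fields.
SAMPLE = """
<CHECKLIST>
  <VULN>
    <STIG_DATA><VULN_ATTRIBUTE>Vuln_Num</VULN_ATTRIBUTE><ATTRIBUTE_DATA>V-38438</ATTRIBUTE_DATA></STIG_DATA>
    <STIG_DATA><VULN_ATTRIBUTE>Rule_Ver</VULN_ATTRIBUTE><ATTRIBUTE_DATA>Gen000000</ATTRIBUTE_DATA></STIG_DATA>
    <STATUS>NotAFinding</STATUS>
    <COMMENTS></COMMENTS>
  </VULN>
</CHECKLIST>
"""

def attribute_data(vuln, name):
    """Text of the ATTRIBUTE_DATA whose sibling VULN_ATTRIBUTE equals name, else None."""
    for stig in vuln.findall('STIG_DATA'):
        if stig.findtext('VULN_ATTRIBUTE') == name:
            return stig.findtext('ATTRIBUTE_DATA')
    return None

root = ET.fromstring(SAMPLE)
for vuln in root.findall('./VULN'):
    rule_ver = attribute_data(vuln, 'Rule_Ver')
    status = vuln.findtext('STATUS')
    # Editing is then a plain assignment on the element that was found.
    vuln.find('COMMENTS').text = 'reviewed'
    print('GEN:[%s] Status:[%s]' % (rule_ver, status))
```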
OK, I'll be the first to admit it is, just not the path I want, and I don't know how to get it.
I'm using Python 3.3 in Eclipse with the PyDev plugin, on Windows 7 at work and Ubuntu 13.04 at home. I'm new to Python and have limited programming experience.
I'm trying to write a script to take in an XML Lloyds market insurance message, find all the tags and dump them in a .csv where we can easily update them and then reimport them to create an updated xml.
I have managed to do all of that except when I get all the tags it only gives the tag name and not the tags above it.
<TechAccount Sender="broker" Receiver="insurer">
That is a fragment of the XML. What I want is to find all the tags and their paths. For example, I want to show it as ItemsInGroupTotal/Count but can only get it as Count.
Here is my code:
xml = etree.parse(fullpath)
print( xml.xpath('.//*'))
all_xpath = xml.xpath('.//*')
every_tag = []
for i in all_xpath:
    single_tag = '%s,%s' % (i.tag, i.text)
This gives:
'{}ServiceProviderGroupReference,8-2012-08-10', '{}ServiceProviderGroupItemsTotal,\n', '{}Count,13',
As you can see Count is shown as {namespace}Count, 13 and not {namespace}ItemsInGroupTotal/Count, 13
Can anyone point me towards what I need?
Thanks (hope my first post is OK)
This is my code now:
with open(fullpath, 'rb') as xmlFilepath:
xmlfile =
fulltext = '%s' % xmlfile
text = fulltext[2:]
xml = etree.fromstring(fulltext)
tree = etree.ElementTree(xml)
every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in xml.iter()]
But this returns an error:
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
I remove the first two chars as they are b' and it complained it didn't start with a tag.
I have been playing around with this and if I remove the xis: xxx tags and the namespace stuff at the top it works as expected. I need to keep the xis tags and be able to identify them as xis tags so can't just delete them.
Any help on how I can achieve this?
ElementTree objects have a method getpath(element), which returns a
structural, absolute XPath expression to find that element
Calling getpath on each element in an iter() loop should work for you:
from pprint import pprint
from lxml import etree
text = """
<TechAccount Sender="broker" Receiver="insurer">
xml = etree.fromstring(text)
tree = etree.ElementTree(xml)
every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in xml.iter()]
['/TechAccount, \n',
'/TechAccount/UUId, 2EF40080-F618-4FF7-833C-A34EA6A57B73',
'/TechAccount/BrokerReference, HOY123/456',
'/TechAccount/ServiceProviderReference, 2012080921401A1',
'/TechAccount/CreationDate, 2012-08-10',
'/TechAccount/AccountTransactionType, premium',
'/TechAccount/GroupReference, 2012080921401A1',
'/TechAccount/ItemsInGroupTotal, \n',
'/TechAccount/ItemsInGroupTotal/Count, 1',
'/TechAccount/ServiceProviderGroupReference, 8-2012-08-10',
'/TechAccount/ServiceProviderGroupItemsTotal, \n',
'/TechAccount/ServiceProviderGroupItemsTotal/Count, 13']
If your xml data is in the file test.xml, the code would look like:
from pprint import pprint
from lxml import etree
xml = etree.parse('test.xml').getroot()
tree = etree.ElementTree(xml)
every_tag = ['%s, %s' % (tree.getpath(e), e.text) for e in xml.iter()]
Hope that helps.
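`getpath()` is specific to lxml. Where only the standard library is available, a rough substitute (a different technique, not what the answer above uses) is to build the path while walking the tree. The sketch below does that with `xml.etree.ElementTree`; the helper name `iter_paths` is made up, the sample is a shortened version of the fragment in the question, and unlike `getpath()` it adds no positional indexes for repeated siblings. With namespaced documents each tag would appear in `{uri}name` form.

```python
import xml.etree.ElementTree as ET

# Shortened sample based on the fragment in the question.
SAMPLE = """
<TechAccount Sender="broker" Receiver="insurer">
  <ItemsInGroupTotal>
    <Count>1</Count>
  </ItemsInGroupTotal>
  <ServiceProviderGroupItemsTotal>
    <Count>13</Count>
  </ServiceProviderGroupItemsTotal>
</TechAccount>
"""

def iter_paths(element, prefix=''):
    """Yield (path, stripped text) for element and every descendant, depth-first."""
    path = prefix + '/' + element.tag
    yield path, (element.text or '').strip()
    for child in element:
        # Recurse with the parent's path as the new prefix.
        for item in iter_paths(child, path):
            yield item

root = ET.fromstring(SAMPLE)
every_tag = ['%s, %s' % pair for pair in iter_paths(root)]
for line in every_tag:
    print(line)
```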
getpath() does indeed return an xpath that's not suited for human consumption. From this xpath, you can build up a more useful one though. Such as with this quick-and-dirty approach:
def human_xpath(element):
    full_xpath = element.getroottree().getpath(element)
    xpath = ''
    human_xpath = ''
    for i, node in enumerate(full_xpath.split('/')[1:]):
        xpath += '/' + node
        element = element.xpath(xpath)[0]
        namespace, tag = element.tag[1:].split('}', 1)
        if element.getparent() is not None:
            nsmap = {'ns': namespace}
            same_name = element.getparent().xpath('./ns:' + tag,
                                                  namespaces=nsmap)
            if len(same_name) > 1:
                tag += '[{}]'.format(same_name.index(element) + 1)
        human_xpath += '/' + tag
    return human_xpath
I am new to Python. Now I have to replace a number of values in an XML file with Python. The example snippet of XML is:
<gmd:description gco:nilReason="missing">
<gco:CharacterString />
What I want to do is to replace those decimal values, i.e. 112.907, with a specified value.
<gmd:description gco:nilReason="missing">
<gco:CharacterString />
<gco:Decimal>new value</gco:Decimal>
<gco:Decimal>new value</gco:Decimal>
<gco:Decimal>new value</gco:Decimal>
<gco:Decimal>new value</gco:Decimal>
I tried with a few methods but none of them worked with my assumption that the difficulty is with the namespace prefix gmd and gco.
Please help me out. Thanks in advance!
Cheers, Alex
I couldn't get lxml to process your XML without adding fake namespace declarations at the top, so here is how your input looked:
<gmd:extent xmlns:gmd="urn:x:y:z:1" xmlns:gco="urn:x:y:z:1">
<gmd:description gco:nilReason="missing">
<gco:CharacterString />
I assumed you have two lists, one for the current values and one for the new ones, like this:
old = [112.907, 158.96, -54.7539, -10.1357]
new = [1,2,3,4]
d = dict(zip(old,new))
Here is the full code
#!/usr/bin/env python
import sys
from lxml import etree
def process(fname):
    f = open(fname)
    tree = etree.parse(f)
    root = tree.getroot()
    old = [112.907, 158.96, -54.7539, -10.1357]
    new = [1,2,3,4]
    d = dict(zip(old,new))
    nodes = root.findall('.//gco:Decimal', root.nsmap)
    for node in nodes:
        node.text = str(d[float(node.text)])
    return etree.tostring(root, pretty_print=True)
def main():
    fname = sys.argv[1]
    text = process(fname)
    # etree.tostring() returns bytes, so open the output in binary mode
    outfile = open('out.xml', 'wb')
    outfile.write(text)
    outfile.close()
if __name__ == '__main__':
    main()
and here is what the output looked like:
<gmd:extent xmlns:gmd="urn:x:y:z:1" xmlns:gco="urn:x:y:z:1">
<gmd:description gco:nilReason="missing">
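If lxml is not an option, roughly the same replacement can be sketched with the standard library by passing a prefix-to-URI map to `findall()`. Everything specific below is an assumption for illustration: the `urn:example:...` namespace URIs are placeholders (use the URIs declared in the real document), the element names around `gco:Decimal` are made up, and `replace_decimals` is a hypothetical helper. Keys are compared as strings here, which sidesteps float-equality surprises.

```python
import xml.etree.ElementTree as ET

# Placeholder URIs; they must match the xmlns declarations of the real file.
NS = {'gmd': 'urn:example:gmd', 'gco': 'urn:example:gco'}

# Hypothetical sample; only gco:Decimal and the values come from the question.
SAMPLE = """
<gmd:extent xmlns:gmd="urn:example:gmd" xmlns:gco="urn:example:gco">
  <gmd:west><gco:Decimal>112.907</gco:Decimal></gmd:west>
  <gmd:east><gco:Decimal>158.96</gco:Decimal></gmd:east>
</gmd:extent>
"""

def replace_decimals(root, mapping):
    """Replace the text of each gco:Decimal found in mapping; return the count changed."""
    changed = 0
    for node in root.findall('.//gco:Decimal', NS):
        key = (node.text or '').strip()
        if key in mapping:
            node.text = mapping[key]
            changed += 1
    return changed

root = ET.fromstring(SAMPLE)
count = replace_decimals(root, {'112.907': '1', '158.96': '2'})
print(count, [n.text for n in root.findall('.//gco:Decimal', NS)])
```

When writing the result back out, `ET.register_namespace()` can be called for each prefix beforehand so the serializer keeps `gmd` and `gco` instead of generated prefixes.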
I'm trying to generate customized xml files from a template xml file in python.
Conceptually, I want to read in the template xml, remove some elements, change some text attributes, and write the new xml out to a file. I wanted it to work something like this:
conf_base = ConvertXmlToDict('config-template.xml')
conf_base_dict = conf_base.UnWrap()
del conf_base_dict['root-name']['level1-name']['leaf1']
del conf_base_dict['root-name']['level1-name']['leaf2']
conf_new = ConvertDictToXml(conf_base_dict)
now I want to write to file, but I don't see how to get to
Is there some way to do this, or can someone suggest doing this a different way?
This'll get you a dict minus attributes. I don't know if this is useful to anyone. I was looking for an XML-to-dict solution myself when I came up with this.
import xml.etree.ElementTree as etree
tree = etree.parse('test.xml')
root = tree.getroot()
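def xml_to_dict(el):
    d = {}
    if el.text:
        d[el.tag] = el.text
    else:
        d[el.tag] = {}
    children = list(el)  # getchildren() was removed in Python 3.9
    if children:
        # a list comprehension, so the result is a list on Python 3 as well
        d[el.tag] = [xml_to_dict(child) for child in children]
    return d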
<body>Don't forget me this weekend!</body>
Would equal this:
{'note': [{'to': 'Tove'},
{'from': 'Jani'},
{'heading': 'Reminder'},
{'body': "Don't forget me this weekend!"}]}
I'm not sure if converting the info set to nested dicts first is easier. Using ElementTree, you can do this:
import xml.etree.ElementTree as ET
doc = ET.parse("template.xml")
lvl1 = doc.findall("level1-name")[0]
# or use del lvl1[idx]
ElementTree was designed so that you don't have to convert your XML trees to lists and attributes first, since it uses exactly that internally.
It also supports a small subset of XPath.
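A runnable sketch of that workflow, under assumptions: the template below is invented to match the names used in the question (`root-name`, `level1-name`, `leaf1`, `leaf2`), and `strip_leaves` is a hypothetical helper. It removes the unwanted leaves in place and then serializes with `ElementTree.write()`, here into an in-memory buffer so the example has no side effects.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical template matching the names used in the question.
TEMPLATE = """
<root-name>
  <level1-name>
    <leaf1>a</leaf1>
    <leaf2>b</leaf2>
    <leaf3>c</leaf3>
  </level1-name>
</root-name>
"""

def strip_leaves(tree, parent_path, leaf_tags):
    """Remove every direct child of parent_path whose tag is in leaf_tags."""
    parent = tree.find(parent_path)
    for tag in leaf_tags:
        # findall() returns a list, so removing while looping is safe.
        for leaf in parent.findall(tag):
            parent.remove(leaf)
    return tree

tree = ET.ElementTree(ET.fromstring(TEMPLATE))
strip_leaves(tree, 'level1-name', ['leaf1', 'leaf2'])

buffer = io.BytesIO()
tree.write(buffer)  # pass a filename instead to write the new XML to disk
print(buffer.getvalue().decode('ascii'))
```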
For easy manipulation of XML in python, I like the Beautiful Soup library. It works something like this:
Sample XML File:
Python code:
from BeautifulSoup import BeautifulStoneSoup, Tag, NavigableString
soup = BeautifulStoneSoup(open('config-template.xml')) # parse the xml file (the constructor takes markup or a file object, not a filename)
# u'root'
You can use the node names as methods:
# u'level1'
It is also possible to use regexes:
import re
tags_starting_with_level = soup.findAll(re.compile('^level'))
for tag in tags_starting_with_level: print
# level1
# level2
Adding and inserting new nodes is pretty straightforward:
# build and insert a new level with a new leaf
level3 = Tag(soup, 'level3')
level3.insert(0, NavigableString('leaf3'))
soup.root.insert(2, level3)
print soup.prettify()
# <root>
# <level1>
# leaf1
# </level1>
# <level2>
# leaf2
# </level2>
# <level3>
# leaf3
# </level3>
# </root>
My modification of Daniel's answer, to give a marginally neater dictionary:
def xml_to_dictionary(element):
    l = len(namespace)
    dictionary = {}
    tag = element.tag[l:]
    if element.text:
        if (element.text == ' '):
            dictionary[tag] = {}
        else:
            dictionary[tag] = element.text
    children = list(element)
    if children:
        subdictionary = {}
        for child in children:
            for k,v in xml_to_dictionary(child).items():
                if k in subdictionary:
                    if ( isinstance(subdictionary[k], list)):
                        subdictionary[k].append(v)
                    else:
                        subdictionary[k] = [subdictionary[k], v]
                else:
                    subdictionary[k] = v
        if (dictionary[tag] == {}):
            dictionary[tag] = subdictionary
        else:
            dictionary[tag] = [dictionary[tag], subdictionary]
    if element.attrib:
        attribs = {}
        for k,v in element.attrib.items():
            attribs[k] = v
        if (dictionary[tag] == {}):
            dictionary[tag] = attribs
        else:
            dictionary[tag] = [dictionary[tag], attribs]
    return dictionary
namespace is the xmlns string, including braces, that ElementTree prepends to all tags, so here I've cleared it as there is one namespace for the entire document
NB that I adjusted the raw xml too, so that 'empty' tags would produce at most a ' ' text property in the ElementTree representation
spacepattern = re.compile(r'\s+')
mydictionary = xml_to_dictionary(ElementTree.XML(spacepattern.sub(' ', content)))
would give for instance
{'note': {'to': 'Tove',
'from': 'Jani',
'heading': 'Reminder',
'body': "Don't forget me this weekend!"}}
it's designed for specific xml that is basically equivalent to json, should handle element attributes such as
<elementName attributeName='attributeContent'>elementContent</elementName>
there's the possibility of merging the attribute dictionary / subtag dictionary similarly to how repeat subtags are merged, although nesting the lists seems kind of appropriate :-)
Adding this line
d.update(('#' + k, v) for k, v in el.attrib.items())
to user247686's code gives you node attributes too.
Found it in this post
import xml.etree.ElementTree as etree
from urllib.request import urlopen  # on Python 2: from urllib import urlopen
xml_file = "http://your_xml_url"
tree = etree.parse(urlopen(xml_file))
root = tree.getroot()
def xml_to_dict(el):
    d = {}
    if el.text:
        d[el.tag] = el.text
    else:
        d[el.tag] = {}
    children = list(el)
    if children:
        d[el.tag] = [xml_to_dict(child) for child in children]
    d.update(('#' + k, v) for k, v in el.attrib.items())
    return d
Call as
Have you tried this?
print xml.etree.ElementTree.tostring( conf_new )
The most direct way, to me:
root = ET.parse(xh)
data = root.getroot()
xdic = {}
if data is not None:
    for part in data:
        xdic[part.tag] = part.text
XML has a rich infoset, and it takes some special tricks to represent that in a Python dictionary. Elements are ordered, attributes are distinguished from element bodies, etc.
One project to handle round-trips between XML and Python dictionaries, with some configuration options to handle the tradeoffs in different ways is XML Support in Pickling Tools. Version 1.3 and newer is required. It isn't pure Python (and in fact is designed to make C++ / Python interaction easier), but it might be appropriate for various use cases.