Accessing XMLNS attribute with Python ElementTree? - python

How can one access namespace (NS) attributes using ElementTree?
With the following:
<data xmlns="http://www.foo.net/a" xmlns:a="http://www.foo.net/a" book="1" category="ABS" date="2009-12-22">
When I call root.get('xmlns') I get back None, while category and date are returned fine. Any help appreciated.

I think element.tag is what you're looking for. Note that your example is missing a trailing slash, so it's unbalanced and won't parse. I've added one in my example.
>>> from xml.etree import ElementTree as ET
>>> data = '''<data xmlns="http://www.foo.net/a"
... xmlns:a="http://www.foo.net/a"
... book="1" category="ABS" date="2009-12-22"/>'''
>>> element = ET.fromstring(data)
>>> element
<Element {http://www.foo.net/a}data at 1013b74d0>
>>> element.tag
'{http://www.foo.net/a}data'
>>> element.attrib
{'category': 'ABS', 'date': '2009-12-22', 'book': '1'}
If you just want to know the xmlns URI, you can split it out with a function like:
def tag_uri_and_name(elem):
    if elem.tag[0] == "{":
        uri, ignore, tag = elem.tag[1:].partition("}")
    else:
        uri = None
        tag = elem.tag
    return uri, tag
For much more on namespaces and qualified names in ElementTree, see effbot's examples.

Look at the effbot namespaces documentation/examples; specifically the parse_map function. It shows you how to add an *ns_map* attribute to each element which contains the prefix/URI mapping that applies to that specific element.
However, that adds the ns_map attribute to every element. For my needs, I wanted a single global map of all the namespaces used, to make element lookup easier and to avoid hardcoding URIs.
Here's what I came up with:
import xml.etree.ElementTree as ET  # formerly the standalone "elementtree" package
def parse_and_get_ns(file):
    events = "start", "start-ns"
    root = None
    ns = {}
    for event, elem in ET.iterparse(file, events):
        if event == "start-ns":
            # elem is a (prefix, uri) tuple; stored values are wrapped in braces
            if elem[0] in ns and ns[elem[0]] != "{%s}" % elem[1]:
                # NOTE: It is perfectly valid to have the same prefix refer
                # to different URI namespaces in different parts of the
                # document. This exception serves as a reminder that this
                # solution is not robust. Use at your own peril.
                raise KeyError("Duplicate prefix with different URI found.")
            ns[elem[0]] = "{%s}" % elem[1]
        elif event == "start":
            if root is None:
                root = elem
    return ET.ElementTree(root), ns
With this you can parse an xml file and obtain a dict with the namespace mappings. So, if you have an xml file like the following ("my.xml"):
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<feed>
<item>
<title>Foo</title>
<dc:creator>Joe McGroin</dc:creator>
<description>etc...</description>
</item>
</feed>
</rss>
You will be able to use the XML namespaces to look up elements like dc:creator:
>>> tree, ns = parse_and_get_ns("my.xml")
>>> ns
{u'content': '{http://purl.org/rss/1.0/modules/content/}',
u'dc': '{http://purl.org/dc/elements/1.1/}'}
>>> item = tree.find("feed/item")
>>> item.findtext(ns['dc']+"creator")
'Joe McGroin'
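In Python 3's xml.etree.ElementTree, find, findall and findtext accept a prefix-to-URI mapping directly, so the map can hold plain URIs instead of "{uri}" strings. Here is a minimal sketch of that variant. RSS_SAMPLE and collect_namespaces are illustrative names, not part of the answer above, and the sample is a trimmed copy of the feed.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample, trimmed from the feed shown above.
RSS_SAMPLE = b"""<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <feed>
    <item>
      <title>Foo</title>
      <dc:creator>Joe McGroin</dc:creator>
    </item>
  </feed>
</rss>"""


def collect_namespaces(source):
    """Parse source and return (root element, {prefix: uri}).

    A later declaration of the same prefix silently replaces an earlier
    one, so this is not robust for documents that rebind prefixes.
    """
    namespaces = {}
    root = None
    for event, payload in ET.iterparse(source, events=("start", "start-ns")):
        if event == "start-ns":
            prefix, uri = payload  # start-ns yields a (prefix, uri) tuple
            namespaces[prefix] = uri
        elif root is None:
            root = payload  # the first "start" event carries the root element
    return root, namespaces


root, ns = collect_namespaces(io.BytesIO(RSS_SAMPLE))
creator = root.findtext("feed/item/dc:creator", namespaces=ns)
print(creator)
```

Passing the map as the namespaces argument keeps the path readable ("dc:creator") without string concatenation.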

Try this:
import xml.etree.ElementTree as ET
import re
import sys
with open(sys.argv[1]) as f:
    root = ET.fromstring(f.read())
xmlns = ''
m = re.search('{.*}', root.tag)
if m:
    xmlns = m.group(0)
print(root.find(xmlns + 'the_tag_you_want').text)

Related

XML parse does not show nodes

from xml.etree import ElementTree
t = """<collection xmlns:y="http://tail-f.com/ns/rest">
<appliance xmlns="http://networks.com/vnms/nms">
<uuid>088fbb70-40d1-4aaf-8ea3-590fd8238828</uuid>
<name>SRVDHCPE1</name>
<num-cpus>0</num-cpus>
<memory-size>0</memory-size>
<num-nics>4</num-nics>
</appliance>
<appliance xmlns="http://networks.com/vnms/nms">
<uuid>088fbb70-40d1-4aaf-8ea3-590fd8238828</uuid>
<name>SRVDHCPE2</name>
<num-cpus>0</num-cpus>
<memory-size>0</memory-size>
<num-nics>4</num-nics>
</appliance>
</collection>"""
dom = ElementTree.fromstring(t)
for n in dom.findall("collection/appliance/name"):
    print(n.text)
I am looking for all the names, but nothing is printed. What am I doing wrong here?
Your case is definitely related to Parsing XML with Namespaces. The URIs in the map must match the xmlns values declared in the document exactly:
dom = ElementTree.fromstring(t)
ns = {'rest': 'http://tail-f.com/ns/rest','nms': 'http://versa-networks.com/vnms/nms'}
for n in dom.findall("nms:appliance/nms:name", ns):
    print(n.text)
The output:
SRVDHCPE1
SRVDHCPE2
You need to namespace your selectors:
from xml.etree import ElementTree
from xml.etree.ElementTree import Element
t = """<collection xmlns:y="http://tail-f.com/ns/rest">
<appliance xmlns="http://versa-networks.com/vnms/nms">
<uuid>088fbb70-40d1-4aaf-8ea3-590fd8238828</uuid>
<name>SRVDHCPE1</name>
<num-cpus>0</num-cpus>
<memory-size>0</memory-size>
<num-nics>4</num-nics>
</appliance>
<appliance xmlns="http://versa-networks.com/vnms/nms">
<uuid>088fbb70-40d1-4aaf-8ea3-590fd8238828</uuid>
<name>SRVDHCPE2</name>
<num-cpus>0</num-cpus>
<memory-size>0</memory-size>
<num-nics>4</num-nics>
</appliance>
</collection>"""
if __name__ == '__main__':
    dom: Element = ElementTree.fromstring(t)
    namespaces = {'n': 'http://versa-networks.com/vnms/nms'}
    for name in dom.findall("./n:appliance/n:name", namespaces=namespaces):
        print(name.text)
which prints:
SRVDHCPE1
SRVDHCPE2
For reference:
https://docs.python.org/3.7/library/xml.etree.elementtree.html#parsing-xml-with-namespaces
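If you would rather not spell out the namespace URI at all, Python 3.8 and newer also accept a "{*}" wildcard in ElementTree paths, which matches a tag in any namespace (or in none). This is a small sketch on a trimmed copy of the document. SAMPLE is assumed data, not the full file from the question.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the document from the question (assumed sample data).
SAMPLE = """<collection xmlns:y="http://tail-f.com/ns/rest">
  <appliance xmlns="http://versa-networks.com/vnms/nms">
    <name>SRVDHCPE1</name>
  </appliance>
  <appliance xmlns="http://versa-networks.com/vnms/nms">
    <name>SRVDHCPE2</name>
  </appliance>
</collection>"""

dom = ET.fromstring(SAMPLE)

# "{*}tag" matches tag regardless of namespace; requires Python 3.8+.
names = [n.text for n in dom.findall("{*}appliance/{*}name")]
print(names)
```

The trade-off is precision: the wildcard also matches same-named elements from unrelated namespaces, so an explicit map is safer when a document mixes vocabularies.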

Python parse standalone-full.xml from Wildfly

I'm trying to parse the standalone-full.xml from Wildfly 8.1 Final with Python to extract some information, such as the datasources.
The example XML below.
<?xml version="1.0" ?>
<server xmlns="urn:jboss:domain:2.1">
<profile>
<subsystem xmlns="urn:jboss:domain:datasources:2.0">
<datasources>
<datasource jndi-name="java:jboss/datasources/JNDI" pool-name="JNDI" enabled="true">
<connection-url>jdbc:oracle:thin:@//HOST</connection-url>
<driver>ojdbc6</driver>
<pool>
<min-pool-size>50</min-pool-size>
<max-pool-size>100</max-pool-size>
</pool>
<security>
<user-name>USER</user-name>
<password>USER</password>
</security>
<validation>
<valid-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.oracle.OracleValidConnectionChecker"/>
<validate-on-match>false</validate-on-match>
<background-validation>true</background-validation>
<background-validation-millis>10000</background-validation-millis>
<exception-sorter class-name="org.jboss.resource.adapter.jdbc.vendor.OracleExceptionSorter"/>
</validation>
</datasource>
<drivers>
<driver name="h2" module="com.h2database.h2">
<xa-datasource-class>org.h2.jdbcx.JdbcDataSource</xa-datasource-class>
</driver>
<driver name="ojdbc6" module="oracle.ojdbc">
<xa-datasource-class>oracle.ojdbc.xa.client.OracleXADataSource</xa-datasource-class>
</driver>
</drivers>
</datasources>
</subsystem>
</profile>
</server>
EDIT: How can I get deeper in the tree?
I tried something like this:
In[16]: from lxml import etree
In[18]: xml = etree.parse('standalone-full.xml')
In[21]: root = xml.getroot()
In[28]: children = root[0].getchildren()
In[31]: children[0]
Out[31]: <Element {urn:jboss:domain:datasources:2.0}subsystem at 0x4bef208>
In[32]: datasources = children[0]
In[33]: datasources.getchildren()
Out[33]: [<Element {urn:jboss:domain:datasources:2.0}datasources at 0x4befa48>]
Your question is rather unspecific, but as far as I can see from the regex you posted, you want to grab the text values of the connection-url, user-name, and password nodes under each datasource node that has a pool-name attribute with a value of JNDI. Here is one possibility of doing that (tested under Python 2.7):
import xml.etree.cElementTree as ET
ns = {'ds': 'urn:jboss:domain:datasources:2.0'}
root = ET.parse('standalone-full.xml').getroot()
children = root.findall(".//ds:datasource[@pool-name='JNDI']", ns)
for child in children:
    print child.find("ds:connection-url", ns).text
    security = child.find("ds:security", ns)
    print security.find("ds:user-name", ns).text
    print security.find("ds:password", ns).text
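The same idea can be written for Python 3 so that it returns the values instead of printing them. This is only a sketch: SAMPLE is a cut-down stand-in for standalone-full.xml, and datasource_settings and its dictionary keys are names made up for illustration.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for standalone-full.xml (assumed sample data).
SAMPLE = """<server xmlns="urn:jboss:domain:2.1">
  <profile>
    <subsystem xmlns="urn:jboss:domain:datasources:2.0">
      <datasources>
        <datasource jndi-name="java:jboss/datasources/JNDI" pool-name="JNDI" enabled="true">
          <connection-url>jdbc:oracle:thin:@//HOST</connection-url>
          <security>
            <user-name>USER</user-name>
            <password>USER</password>
          </security>
        </datasource>
      </datasources>
    </subsystem>
  </profile>
</server>"""

NS = {"ds": "urn:jboss:domain:datasources:2.0"}


def datasource_settings(root):
    """Return a dict mapping each datasource's pool-name to its settings."""
    settings = {}
    for ds in root.iterfind(".//ds:datasource", NS):
        settings[ds.get("pool-name")] = {
            "url": ds.findtext("ds:connection-url", namespaces=NS),
            "user": ds.findtext("ds:security/ds:user-name", namespaces=NS),
            "password": ds.findtext("ds:security/ds:password", namespaces=NS),
        }
    return settings


result = datasource_settings(ET.fromstring(SAMPLE))
print(result)
```

findtext returns None for a missing child, so an incomplete datasource yields None values rather than raising AttributeError.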
You could use Augeas to parse it:
$ augtool -At "Xml.lns incl $PWD/standalone-full.xml"
augtool> get //standalone-full.xml//datasource//password/#text
//standalone-full.xml//datasource//password/#text = USER
Just use the python-augeas bindings with Python:
import augeas
a = augeas.Augeas(flags=augeas.Augeas.NO_MODL_AUTOLOAD)
a.transform("Xml", "/home/raphink/bas/augeas/standalone-full.xml")
a.load()
v = a.get("//standalone-full.xml//datasource//password/#text")
I've solved my problem with a regex, which is a bad idea, but it works.
import re
# Read the file contents: the pattern has to run on the XML text, not on the file name.
with open("standalone-full.xml") as f:
    data = f.read()
regex_result = re.findall(r'.*:domain:datasources[\S\s]*?pool-name="JNDI"[\S\s]*?connection-url>.*' +
                          r'@//(.*)<.*[\S\s]*?user-name>(.*)<.*\s*<password>(.*)<', data, re.M)

xmlns namespace breaking lxml

I am trying to open an xml file, and get values from certain tags. I have done this a lot but this particular xml is giving me some issues. Here is a section of the xml file:
<?xml version='1.0' encoding='UTF-8'?>
<package xmlns="http://apple.com/itunes/importer" version="film4.7">
<provider>filmgroup</provider>
<language>en-GB</language>
<actor name="John Smith" display="Doe John"></actor>
</package>
And here is a sample of my python code:
metadata = '/Users/mylaptop/Desktop/Python/metadata.xml'
from lxml import etree
parser = etree.XMLParser(remove_blank_text=True)
open(metadata)
tree = etree.parse(metadata, parser)
root = tree.getroot()
for element in root.iter(tag='provider'):
    providerValue = tree.find('//provider')
    providerValue = providerValue.text
    print providerValue
tree.write('/Users/mylaptop/Desktop/Python/metadataDone.xml', pretty_print = True, xml_declaration = True, encoding = 'UTF-8')
When I run this it can't find the provider tag or its value. If I remove xmlns="http://apple.com/itunes/importer" then all work as expected.
My question is: how can I remove this namespace, as I'm not at all interested in it, so that I can get the tag values I need using lxml?
The provider tag is in the http://apple.com/itunes/importer namespace, so you either need to use the fully qualified name
{http://apple.com/itunes/importer}provider
or use one of the lxml methods that has the namespaces parameter, such as root.xpath. Then you can specify it with a namespace prefix (e.g. ns:provider):
from lxml import etree
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse(metadata, parser)
root = tree.getroot()
namespaces = {'ns':'http://apple.com/itunes/importer'}
items = iter(root.xpath('//ns:provider/text()|//ns:actor/@name',
                        namespaces=namespaces))
for provider, actor in zip(*[items]*2):
    print(provider, actor)
yields
('filmgroup', 'John Smith')
Note that the XPath used above assumes that <provider> and <actor> elements always appear in alternation. If that is not true, then there are of course ways to handle it, but the code becomes a bit more verbose:
for package in root.xpath('//ns:package', namespaces=namespaces):
    for provider in package.xpath('ns:provider', namespaces=namespaces):
        providerValue = provider.text
        print providerValue
    for actor in package.xpath('ns:actor', namespaces=namespaces):
        print actor.attrib['name']
My suggestion is to not ignore the namespace but, instead, to take it into account. I wrote some related functions (copied with slight modification) for my work on the django-quickbooks library. With these functions, you should be able to do this:
providers = getels(root, 'provider', ns='http://apple.com/itunes/importer')
Here are those functions:
class TagNotFound(Exception):
    """ Raised by getel when no matching child element exists. """
def get_tag_with_ns(tag_name, ns):
    return '{%s}%s' % (ns, tag_name)
def getel(elt, tag_name, ns=None):
    """ Gets the first tag that matches the specified tag_name taking into
    account the QB namespace.
    :param ns: The namespace to use if not using the default one for
    django-quickbooks.
    :type ns: string
    """
    res = elt.find(get_tag_with_ns(tag_name, ns=ns))
    if res is None:
        raise TagNotFound('Could not find tag by name "%s"' % tag_name)
    return res
def getels(elt, *path, **kwargs):
    """ Gets the first set of elements found at the specified path.
    Example:
    >>> xml = (
    "<root>" +
    "<item>" +
    "<id>1</id>" +
    "</item>" +
    "<item>" +
    "<id>2</id>" +
    "</item>" +
    "</root>")
    >>> el = etree.fromstring(xml)
    >>> getels(el, 'root', 'item', ns='correct/namespace')
    [<Element item>, <Element item>]
    """
    ns = kwargs['ns']
    # Walk every path component except the last, then collect all matches for the last one.
    i = -1
    for i in range(len(path)-1):
        elt = getel(elt, path[i], ns=ns)
    tag_name = path[i+1]
    return elt.findall(get_tag_with_ns(tag_name, ns=ns))
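For completeness, the question's original goal (ignoring the namespace entirely) can be approximated by rewriting the tags after parsing. The sketch below uses the standard library's xml.etree.ElementTree so that it is self-contained. lxml elements expose the same tag attribute and iter() method, so the loop should carry over, though that is an assumption not tested here. strip_namespaces is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the metadata file from the question (assumed sample data).
SAMPLE = """<package xmlns="http://apple.com/itunes/importer" version="film4.7">
  <provider>filmgroup</provider>
  <language>en-GB</language>
</package>"""


def strip_namespaces(root):
    """Rewrite every '{uri}name' tag in the tree to plain 'name', in place."""
    for element in root.iter():
        # Comments and processing instructions have non-string tags; skip them.
        if isinstance(element.tag, str) and element.tag.startswith("{"):
            element.tag = element.tag.split("}", 1)[1]
    return root


root = strip_namespaces(ET.fromstring(SAMPLE))
provider = root.findtext("provider")
print(provider)
```

Be aware that this loses information: namespaced attributes are left untouched, and a tree written back out after stripping no longer carries the xmlns declaration, which matters if the consumer of the file expects it.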

Reading Maven Pom xml in Python

I have a pom file that has the following defined:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>org.welsh</groupId>
<artifactId>my-site</artifactId>
<version>1.0.0</version>
<packaging>pom</packaging>
<profiles>
<profile>
<build>
<plugins>
<plugin>
<groupId>org.welsh.utils</groupId>
<artifactId>site-tool</artifactId>
<version>1.0</version>
<executions>
<execution>
<configuration>
<mappings>
<property>
<name>homepage</name>
<value>/content/homepage</value>
</property>
<property>
<name>assets</name>
<value>/content/assets</value>
</property>
</mappings>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>
</profile>
</profiles>
</project>
And I am looking to build a dictionary off the name & value elements under property under the mappings element.
So I'm trying to figure out how to get all possible mappings elements (in case of multiple build profiles) so that I can get all property elements under them. From reading about the supported XPath syntax, the following should print out all possible text/value elements:
import logging
import xml.etree.ElementTree as xml
pomFile = xml.parse('pom.xml')
root = pomFile.getroot()
for mapping in root.findall('*/mappings'):
    for prop in mapping.findall('.//property'):
        logging.info(prop.find('name').text + " => " + prop.find('value').text)
Which is returning nothing. I tried just printing out all the mappings elements and get:
>>> print root.findall('*/mappings')
[]
And when I print out everything from root I get:
>>> print root.findall('*')
[<Element '{http://maven.apache.org/POM/4.0.0}modelVersion' at 0x10b38bd50>, <Element '{http://maven.apache.org/POM/4.0.0}groupId' at 0x10b38bd90>, <Element '{http://maven.apache.org/POM/4.0.0}artifactId' at 0x10b38bf10>, <Element '{http://maven.apache.org/POM/4.0.0}version' at 0x10b3900d0>, <Element '{http://maven.apache.org/POM/4.0.0}packaging' at 0x10b390110>, <Element '{http://maven.apache.org/POM/4.0.0}name' at 0x10b390150>, <Element '{http://maven.apache.org/POM/4.0.0}properties' at 0x10b390190>, <Element '{http://maven.apache.org/POM/4.0.0}build' at 0x10b390310>, <Element '{http://maven.apache.org/POM/4.0.0}profiles' at 0x10b390390>]
Which made me try to print:
>>> print root.findall('*/{http://maven.apache.org/POM/4.0.0}mappings')
[]
But that's not working either.
Any suggestions would be great.
Thanks,
The main issues with the code in the question are
that it doesn't specify namespaces, and
that it uses */, which matches exactly one intermediate level, instead of //, which matches descendants at any depth.
As you can see at the top of the XML file, Maven uses the namespace http://maven.apache.org/POM/4.0.0. The attribute xmlns in the root node defines the default namespace. The attribute xmlns:xsi defines a namespace that is only used for xsi:schemaLocation.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
To specify tags like profile in methods like find, you have to specify the namespace as well. For example, you could write the following to find all profile-tags.
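import xml.etree.ElementTree as xml
pom = xml.parse('pom.xml')
# './/' searches at any depth; a bare leading '//' triggers a FutureWarning in current Python
for profile in pom.findall('.//{http://maven.apache.org/POM/4.0.0}profile'):
    print(repr(profile))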
Also note that I'm using //. Using */ would have the same result for your specific xml file above. However, it would not work for other tags like mappings. Since * represents only one level, */child can be expanded to parent/tag or xyz/tag but not to xyz/parent/tag.
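The difference between the two forms is easy to see on a tiny document. The three-level tree below is made up for illustration and is not taken from the POM.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one <target> two levels down, one three levels down.
root = ET.fromstring("<a><b><target/><c><target/></c></b></a>")

# "*/target": exactly one element (any tag) between root and target.
one_level = root.findall("*/target")

# ".//target": target at any depth below root.
any_depth = root.findall(".//target")

print(len(one_level), len(any_depth))
```

Only the shallower target sits exactly one level below a child of the root, so the first query finds one element while the second finds both.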
Now, you should be able to come up with something like this to find all mappings:
pom = xml.parse('pom.xml')
map = {}
for mapping in pom.findall('.//{http://maven.apache.org/POM/4.0.0}mappings'
                           '/{http://maven.apache.org/POM/4.0.0}property'):
    name = mapping.find('{http://maven.apache.org/POM/4.0.0}name').text
    value = mapping.find('{http://maven.apache.org/POM/4.0.0}value').text
    map[name] = value
Specifying the namespaces like this is quite verbose. To make it easier to read, you can define a namespace map and pass it as second argument to find and findall:
# ...
nsmap = {'m': 'http://maven.apache.org/POM/4.0.0'}
for mapping in pom.findall('.//m:mappings/m:property', nsmap):
    name = mapping.find('m:name', nsmap).text
    value = mapping.find('m:value', nsmap).text
    map[name] = value
OK, I found out that when I remove the Maven stuff from the project element so it's just <project>, I can do this:
for mapping in root.findall('*//mappings'):
    logging.info(mapping)
    for prop in mapping.findall('./property'):
        logging.info(prop.find('name').text + " => " + prop.find('value').text)
Which would result in:
INFO:root:<Element 'mappings' at 0x10d72d350>
INFO:root:homepage => /content/homepage
INFO:root:assets => /content/assets
However, if I leave the Maven stuff in at the top I can do this:
for mapping in root.findall('*//{http://maven.apache.org/POM/4.0.0}mappings'):
    logging.info(mapping)
    for prop in mapping.findall('./{http://maven.apache.org/POM/4.0.0}property'):
        logging.info(prop.find('{http://maven.apache.org/POM/4.0.0}name').text + " => " + prop.find('{http://maven.apache.org/POM/4.0.0}value').text)
Which results in:
INFO:root:<Element '{http://maven.apache.org/POM/4.0.0}mappings' at 0x10aa7f310>
INFO:root:homepage => /content/homepage
INFO:root:assets => /content/assets
However, I'd love to be able to figure out how to avoid having to account for the maven stuff since it locks me into this one format.
EDIT:
Ok, I managed to get something a bit more verbose:
import xml.etree.ElementTree as xml
def getMappingsNode(node, nodeName):
    # Depth-first search for the first descendant whose tag contains nodeName.
    for n in node.findall('*'):
        if nodeName in n.tag:
            return n
        found = getMappingsNode(n, nodeName)
        if found is not None:
            return found
    return None
def getMappings(rootNode):
    mappingsNode = getMappingsNode(rootNode, 'mappings')
    mapping = {}
    for prop in mappingsNode.findall('*'):
        key = ''
        val = ''
        for child in prop.findall('*'):
            # Substring tests ignore the '{namespace}' prefix on each tag.
            if 'name' in child.tag:
                key = child.text
            if 'value' in child.tag:
                val = child.text
        if val and key:
            mapping[key] = val
    return mapping
pomFile = xml.parse('pom.xml')
root = pomFile.getroot()
mappings = getMappings(root)
print(mappings)
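One way to avoid hard-coding the Maven URI while still using path queries is to read the namespace off the root element's tag and build the paths from it. The sketch below assumes the whole document uses the root's namespace, which holds for the POM shown but not for XML in general. POM is a cut-down stand-in for the real file, and namespace_of and read_mappings are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for the POM in the question (assumed sample data).
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <profiles><profile><build><plugins><plugin>
    <executions><execution><configuration>
      <mappings>
        <property><name>homepage</name><value>/content/homepage</value></property>
        <property><name>assets</name><value>/content/assets</value></property>
      </mappings>
    </configuration></execution></executions>
  </plugin></plugins></build></profile></profiles>
</project>"""


def namespace_of(element):
    """Return the namespace URI in element.tag, or '' if the tag has none."""
    if element.tag.startswith("{"):
        return element.tag[1:].split("}", 1)[0]
    return ""


def read_mappings(root):
    """Collect name -> value pairs from every mappings/property element."""
    uri = namespace_of(root)
    prefix = "{%s}" % uri if uri else ""  # empty when the document has no namespace
    result = {}
    for prop in root.iterfind(".//%smappings/%sproperty" % (prefix, prefix)):
        result[prop.findtext(prefix + "name")] = prop.findtext(prefix + "value")
    return result


mappings = read_mappings(ET.fromstring(POM))
print(mappings)
```

Because the prefix falls back to an empty string, the same function also works on a POM whose project element carries no xmlns declaration.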

Editing XML as a dictionary in python?

I'm trying to generate customized xml files from a template xml file in python.
Conceptually, I want to read in the template xml, remove some elements, change some text attributes, and write the new xml out to a file. I wanted it to work something like this:
conf_base = ConvertXmlToDict('config-template.xml')
conf_base_dict = conf_base.UnWrap()
del conf_base_dict['root-name']['level1-name']['leaf1']
del conf_base_dict['root-name']['level1-name']['leaf2']
conf_new = ConvertDictToXml(conf_base_dict)
# now I want to write to file, but I don't see how to get to
# ElementTree.ElementTree.write()
conf_new.write('config-new.xml')
Is there some way to do this, or can someone suggest doing this a different way?
This'll get you a dict minus attributes. I don't know if this is useful to anyone. I was looking for an XML-to-dict solution myself when I came up with this.
import xml.etree.ElementTree as etree
tree = etree.parse('test.xml')
root = tree.getroot()
def xml_to_dict(el):
    d = {}
    if el.text:
        d[el.tag] = el.text
    else:
        d[el.tag] = {}
    children = list(el)  # getchildren() was removed in Python 3.9
    if children:
        # A list of one-key dicts, one per child, in document order
        d[el.tag] = [xml_to_dict(child) for child in children]
    return d
This: http://www.w3schools.com/XML/note.xml
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
Would equal this:
{'note': [{'to': 'Tove'},
{'from': 'Jani'},
{'heading': 'Reminder'},
{'body': "Don't forget me this weekend!"}]}
I'm not sure if converting the info set to nested dicts first is easier. Using ElementTree, you can do this:
import xml.etree.ElementTree as ET
doc = ET.parse("template.xml")
lvl1 = doc.findall("level1-name")[0]
lvl1.remove(lvl1.find("leaf1"))
lvl1.remove(lvl1.find("leaf2"))
# or use del lvl1[idx]
doc.write("config-new.xml")
ElementTree was designed so that you don't have to convert your XML trees to lists and attributes first, since it uses exactly that internally.
It also supports a small subset of XPath.
For easy manipulation of XML in python, I like the Beautiful Soup library. It works something like this:
Sample XML File:
<root>
<level1>leaf1</level1>
<level2>leaf2</level2>
</root>
Python code:
from BeautifulSoup import BeautifulStoneSoup, Tag, NavigableString
soup = BeautifulStoneSoup(open('config-template.xml')) # parse the xml file (pass the open file, not the file name)
soup.contents[0].name
# u'root'
You can use the node names as methods:
soup.root.contents[0].name
# u'level1'
It is also possible to use regexes:
import re
tags_starting_with_level = soup.findAll(re.compile('^level'))
for tag in tags_starting_with_level: print tag.name
# level1
# level2
Adding and inserting new nodes is pretty straightforward:
# build and insert a new level with a new leaf
level3 = Tag(soup, 'level3')
level3.insert(0, NavigableString('leaf3'))
soup.root.insert(2, level3)
print soup.prettify()
# <root>
# <level1>
# leaf1
# </level1>
# <level2>
# leaf2
# </level2>
# <level3>
# leaf3
# </level3>
# </root>
My modification of Daniel's answer, to give a marginally neater dictionary:
def xml_to_dictionary(element):
    l = len(namespace)
    dictionary = {}
    tag = element.tag[l:]
    if element.text:
        if (element.text == ' '):
            dictionary[tag] = {}
        else:
            dictionary[tag] = element.text
    children = list(element)  # getchildren() was removed in Python 3.9
    if children:
        subdictionary = {}
        for child in children:
            for k, v in xml_to_dictionary(child).items():
                if k in subdictionary:
                    # Repeated subtags are merged into a list
                    if (isinstance(subdictionary[k], list)):
                        subdictionary[k].append(v)
                    else:
                        subdictionary[k] = [subdictionary[k], v]
                else:
                    subdictionary[k] = v
        if (dictionary[tag] == {}):
            dictionary[tag] = subdictionary
        else:
            dictionary[tag] = [dictionary[tag], subdictionary]
    if element.attrib:
        attribs = {}
        for k, v in element.attrib.items():
            attribs[k] = v
        if (dictionary[tag] == {}):
            dictionary[tag] = attribs
        else:
            dictionary[tag] = [dictionary[tag], attribs]
    return dictionary
namespace is the xmlns string, including braces, that ElementTree prepends to all tags. I strip it here because there is one namespace for the entire document.
Note that I adjusted the raw XML too, so that 'empty' tags produce at most a ' ' text property in the ElementTree representation:
spacepattern = re.compile(r'\s+')
mydictionary = xml_to_dictionary(ElementTree.XML(spacepattern.sub(' ', content)))
would give for instance
{'note': {'to': 'Tove',
'from': 'Jani',
'heading': 'Reminder',
'body': "Don't forget me this weekend!"}}
It's designed for specific XML that is basically equivalent to JSON. It should handle element attributes such as
<elementName attributeName='attributeContent'>elementContent</elementName>
too.
There's the possibility of merging the attribute dictionary / subtag dictionary similarly to how repeated subtags are merged, although nesting the lists seems kind of appropriate :-)
Adding this line
d.update(('@' + k, v) for k, v in el.attrib.items())
to user247686's code gives you the node attributes too.
Found it in this post https://stackoverflow.com/a/7684581/1395962
Example:
import xml.etree.ElementTree as etree
from urllib.request import urlopen  # on Python 2: from urllib import urlopen
xml_file = "http://your_xml_url"
tree = etree.parse(urlopen(xml_file))
root = tree.getroot()
def xml_to_dict(el):
    d = {}
    if el.text:
        d[el.tag] = el.text
    else:
        d[el.tag] = {}
    children = list(el)  # getchildren() was removed in Python 3.9
    if children:
        d[el.tag] = [xml_to_dict(child) for child in children]
    # Attributes are stored next to the tag key, prefixed with '@'
    d.update(('@' + k, v) for k, v in el.attrib.items())
    return d
Call as
xml_to_dict(root)
Have you tried this?
print xml.etree.ElementTree.tostring( conf_new )
The most direct way, to me:
root = ET.parse(xh)
data = root.getroot()
xdic = {}
if data is not None:
    # Only the root's direct children are captured; repeated tags overwrite each other
    for part in data:
        xdic[part.tag] = part.text
XML has a rich infoset, and it takes some special tricks to represent that in a Python dictionary. Elements are ordered, attributes are distinguished from element bodies, etc.
One project to handle round-trips between XML and Python dictionaries, with some configuration options to handle the tradeoffs in different ways is XML Support in Pickling Tools. Version 1.3 and newer is required. It isn't pure Python (and in fact is designed to make C++ / Python interaction easier), but it might be appropriate for various use cases.
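None of the answers above shows the way back from a dictionary to XML, which the question also asks for. Here is a minimal sketch, assuming a deliberately simple shape: each value is either text, None, a dict of children, or a list standing for a repeated child tag. Attributes and mixed content are not handled, and dict_to_element is a hypothetical helper name, not an ElementTree API.

```python
import xml.etree.ElementTree as ET


def dict_to_element(tag, value):
    """Build an Element named tag from a str, a dict of children, or None."""
    element = ET.Element(tag)
    if isinstance(value, dict):
        for child_tag, child_value in value.items():
            # A list means the child tag repeats.
            items = child_value if isinstance(child_value, list) else [child_value]
            for item in items:
                element.append(dict_to_element(child_tag, item))
    elif value is not None:
        element.text = str(value)
    return element


# Hypothetical template contents, mirroring the keys used in the question.
conf = {"root-name": {"level1-name": {"leaf1": "a", "leaf2": "b", "leaf3": "c"}}}
del conf["root-name"]["level1-name"]["leaf1"]
del conf["root-name"]["level1-name"]["leaf2"]

(root_tag, root_value), = conf.items()  # exactly one top-level key is expected
tree = ET.ElementTree(dict_to_element(root_tag, root_value))
xml_text = ET.tostring(tree.getroot(), encoding="unicode")
print(xml_text)
# tree.write("config-new.xml") would save the result to disk.
```

Wrapping the root element in ET.ElementTree is what gives access to write(), which is the missing step in the question's snippet.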
