"None" result at Parsing XML with ElementTree - python

I've tried everything to get the XML content, but all I get back is None. Could anybody help me?
The code I'm trying is:
import xml.etree.cElementTree as ET
parsedXML = ET.parse("C:\\Users\\denis\\Documents\\Projetos\\NFe\\Arquivos\\33180601279711000100550020001554261733208443-nfeo.xml")
for node in parsedXML.getroot():
    email = node.find('cNF')
    phone = node.find('natOp')
    street = node.find('nNF')
    print(email)
Part of the XML (the full content is bigger than this) is right below:
<?xml version="1.0" encoding="ISO-8859-1"?>
<nfeProc xmlns="http://www.portalfiscal.inf.br/nfe" versao="3.10">
<NFe xmlns="http://www.portalfiscal.inf.br/nfe">
<infNFe versao="3.10" Id="NFe33180601279711000100550020001554261733208443">
<ide>
<cUF>33</cUF>
<cNF>73320844</cNF>
<natOp>VENDA DE PRODUCAO DO ESTABELECIMENTO</natOp>
<indPag>1</indPag>
<mod>55</mod>
<serie>2</serie>
<nNF>155426</nNF>
<dhEmi>2018-06-25T16:06:33-03:00</dhEmi>
<dhSaiEnt>2018-06-25T16:06:08-03:00</dhSaiEnt>
<tpNF>1</tpNF>
<idDest>2</idDest>
<cMunFG>3304557</cMunFG>
<tpImp>2</tpImp>
<tpEmis>1</tpEmis>
<cDV>3</cDV>
<tpAmb>1</tpAmb>
<finNFe>1</finNFe>
<indFinal>1</indFinal>
<indPres>9</indPres>
<procEmi>0</procEmi>
<verProc>NeoGrid NFe 1.63.4</verProc>
</ide>
<emit>
I appreciate your help!

Your XML document uses namespaces, so you need to supply the namespace in your call, as shown in this answer.
Here, we get
namespaces = {'n': 'http://www.portalfiscal.inf.br/nfe'}
root = parsedXML.getroot()
root.find('n:NFe', namespaces)
to return the element, while root.find('NFe') returns None.
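Also note that find and findall with a bare tag name only search the direct children of the element, not nested descendants (cf. documentation), which means that you will have to either iterate over the children (see e.g. here for an example) or use a path such as './/n:cNF' to search at any depth.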
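To make the two points above concrete, here is a minimal sketch using a trimmed-down copy of the XML from the question. The SAMPLE string and the prefix 'n' are assumptions chosen for illustration; any prefix works as long as it maps to the document's namespace URI.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the document from the question (illustrative subset).
SAMPLE = """<nfeProc xmlns="http://www.portalfiscal.inf.br/nfe" versao="3.10">
  <NFe>
    <infNFe versao="3.10" Id="NFe33180601279711000100550020001554261733208443">
      <ide>
        <cNF>73320844</cNF>
        <natOp>VENDA DE PRODUCAO DO ESTABELECIMENTO</natOp>
        <nNF>155426</nNF>
      </ide>
    </infNFe>
  </NFe>
</nfeProc>"""

# The prefix "n" is arbitrary; only the URI has to match the document.
NS = {"n": "http://www.portalfiscal.inf.br/nfe"}

root = ET.fromstring(SAMPLE)

# ".//" searches all descendants, so there is no need to walk each level by hand.
ide = root.find(".//n:ide", NS)

# Children of <ide> live in the same namespace, so they need the prefix too.
fields = {
    name: ide.findtext(f"n:{name}", namespaces=NS)
    for name in ("cNF", "natOp", "nNF")
}
print(fields)
```

The same lookup without the prefix, root.find('.//cNF'), returns None, which is the symptom described in the question.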

Related

Generate XML Document in Python 3 using Namespaces and ElementTree

I am having problems generating an XML document using the ElementTree framework in Python 3. I tried registering the namespace before setting up the document. Right now it seems that I can generate an XML document only by adding the namespace to each element, like a=Element("{full_namespace_URI}element_name"), which seems tedious.
How do I set up the default namespace so that I can omit it from each element?
Any help is appreciated.
I have written a small demo program for Python 3:
from io import BytesIO
from xml.etree import ElementTree as ET
ET.register_namespace("", "urn:dslforum-org:service-1-0")
"""
desired output
==============
<?xml version='1.0' encoding='utf-8'?>
<topNode xmlns="urn:dslforum-org:service-1-0">
<childNode>content</childNode>
</topNode>
"""
# build XML document without namespaces
a = ET.Element("topNode")
b = ET.Element("childNode")
b.text = "content"
a.append(b)
tree = ET.ElementTree(a)
# build XML document with namespaces
a_ns = ET.Element("{dsl}topNode")
b_ns = ET.Element("{dsl}childNode")
b_ns.text = "content"
a_ns.append(b_ns)
tree_ns = ET.ElementTree(a_ns)
def print_element_tree(element_tree, comment, default_namespace=None):
    """
    print element tree with comment to standard out
    """
    with BytesIO() as buf:
        element_tree.write(buf, encoding="utf-8", xml_declaration=True,
                           default_namespace=default_namespace)
        buf.seek(0)
        print(comment)
        print(buf.read().decode("utf-8"))
print_element_tree(tree, "Element Tree without XML namespace")
print_element_tree(tree_ns, "Element Tree with XML namespace", "dsl")
I believe you are overthinking this.
Registering a default namespace in your code avoids the ns0: aliases.
Registering any namespaces you will use while creating a document allows you to designate the alias used for each namespace.
To achieve your desired output, assign the namespace to your top element:
a = ET.Element("{urn:dslforum-org:service-1-0}topNode")
The preceding ET.register_namespace("", "urn:dslforum-org:service-1-0") will make that the default namespace in the document, assign it to topNode, and not prefix your tag names.
<?xml version='1.0' encoding='utf-8'?>
<topNode xmlns="urn:dslforum-org:service-1-0"><childNode>content</childNode></topNode>
If you remove the register_namespace() call, then you get this monstrosity:
<?xml version='1.0' encoding='utf-8'?>
<ns0:topNode xmlns:ns0="urn:dslforum-org:service-1-0"><childNode>content</childNode></ns0:topNode>
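As a rough sketch of how the tedium of repeating the namespace can be reduced, a small helper can build the qualified tag names. The helper name q and the constant NS are assumptions made for this example, not part of ElementTree. Unlike the answer above, this sketch qualifies the child element as well, so that the in-memory tree matches what a parser would produce when reading the output back.

```python
import xml.etree.ElementTree as ET

NS = "urn:dslforum-org:service-1-0"

# Register the URI with an empty prefix so it is serialized as xmlns="..."
# instead of an auto-generated ns0: prefix.
ET.register_namespace("", NS)


def q(tag):
    """Return the tag in Clark notation: {namespace}tag (hypothetical helper)."""
    return f"{{{NS}}}{tag}"


top = ET.Element(q("topNode"))
child = ET.SubElement(top, q("childNode"))
child.text = "content"

xml_text = ET.tostring(top, encoding="unicode")
print(xml_text)
```

Note that register_namespace changes module-level state, so it affects every tree serialized afterwards in the same process.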

How can I print XPaths of lxml tree elements?

I'm trying to print the XPaths of all elements in an XML tree, but I get strange output when using lxml. Instead of an XPath that contains the name of each node in the path, I get "*" placeholders.
Do you know what might be the issue here? Here is the code, along with the XML I am trying to analyze.
from lxml import etree
xml = """
<filter xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<bundles xmlns="http://cisco.com/ns/yang/Cisco-IOS-XR-bundlemgr-oper">
<bundles>
<bundle>
<data>
<bundle-status/>
<lacp-status/>
<minimum-active-links/>
<ipv4bfd-status/>
<active-member-count/>
<active-member-configured/>
</data>
<members>
<member>
<member-interface/>
<interface-name/>
<member-mux-data>
<member-state/>
</member-mux-data>
</member>
</members>
<bundle-interface>{{bundle_name}}</bundle-interface>
</bundle>
</bundles>
</bundles>
<bfd xmlns="http://cisco.com/ns/yang/Cisco-IOS-XR-ip-bfd-oper">
<session-briefs>
<session-brief>
<state/>
<interface-name>{{bundle_name}}</interface-name>
</session-brief>
</session-briefs>
</bfd>
</filter>
"""
root = etree.XML(xml)
tree = etree.ElementTree(root)
for element in root.iter():
    print(tree.getpath(element))
The output looks like this (there should be node names instead of "*"):
/*
/*/*[1]
/*/*[1]/*
/*/*[1]/*/*
/*/*[1]/*/*/*[1]
/*/*[1]/*/*/*[1]/*[1]
/*/*[1]/*/*/*[1]/*[2]
/*/*[1]/*/*/*[1]/*[3]
/*/*[1]/*/*/*[1]/*[4]
/*/*[1]/*/*/*[1]/*[5]
/*/*[1]/*/*/*[1]/*[6]
/*/*[1]/*/*/*[2]
/*/*[1]/*/*/*[2]/*
/*/*[1]/*/*/*[2]/*/*[1]
/*/*[1]/*/*/*[2]/*/*[2]
/*/*[1]/*/*/*[2]/*/*[3]
/*/*[1]/*/*/*[2]/*/*[3]/*
/*/*[1]/*/*/*[3]
/*/*[2]
/*/*[2]/*
/*/*[2]/*/*
/*/*[2]/*/*/*[1]
/*/*[2]/*/*/*[2]
Thanks a lot!
Dragan
I found that besides getpath, etree also has a "sibling" method called getelementpath, which gives a proper result for namespaced elements as well.
So change your code to:
for element in root.iter():
    print(tree.getelementpath(element))
For your source sample, with namespaces shortened for readability,
the initial part of the result is:
.
{http://cisco.com/ns}bundles
{http://cisco.com/ns}bundles/{http://cisco.com/ns}bundles

How to parse tiered XML String

I have an xml string that I need to parse in python that looks like this:
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<PostLoadsResponse xmlns="http://webservices.truckstop.com/v11">
<PostLoadsResult xmlns:a="http://schemas.datacontract.org/2004/07/WebServices.Objects" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<Errors xmlns="http://schemas.datacontract.org/2004/07/WebServices">
<Error>
<ErrorMessage>Invalid Location</ErrorMessage>
</Error>
</Errors>
</PostLoadsResult>
</PostLoadsResponse>
</s:Body>
</s:Envelope>
I'm having trouble using ElementTree to get to the error message of this tree without something like:
import xml.etree.ElementTree as ET
ET.fromstring(text).findall('{http://schemas.xmlsoap.org/soap/envelope/}Body')[0].getchildren()[0].getchildren()[0].getchildren()
You need to handle namespaces and you can do it with xml.etree.ElementTree:
tree = ET.fromstring(data)
namespaces = {
's': 'http://schemas.xmlsoap.org/soap/envelope/',
'd': "http://schemas.datacontract.org/2004/07/WebServices"
}
print(tree.find(".//d:ErrorMessage", namespaces=namespaces).text)
Prints Invalid Location.
Using the partial XPath support:
ET.fromstring(text).find('.//{http://schemas.datacontract.org/2004/07/WebServices}ErrorMessage')
That will instruct it to find the first element named ErrorMessage with namespace http://schemas.datacontract.org/2004/07/WebServices at any depth.
However, if you know your message will always contain those elements, it may be faster to use something like
(ET.fromstring(text)
    .find('{http://schemas.xmlsoap.org/soap/envelope/}Body')
    .find('{http://webservices.truckstop.com/v11}PostLoadsResponse')
    .find('{http://webservices.truckstop.com/v11}PostLoadsResult')
    .find('{http://schemas.datacontract.org/2004/07/WebServices}Errors')
    .find('{http://schemas.datacontract.org/2004/07/WebServices}Error')
    .find('{http://schemas.datacontract.org/2004/07/WebServices}ErrorMessage'))
You can use the iter method on the tree (called getiterator in older Python versions; that alias was removed in Python 3.9) to iterate through the items in it. You can check the tag on each item to see if it's the right one.
>>> err = [node.text for node in tree.iter() if node.tag.endswith('ErrorMessage')]
>>> err
['Invalid Location']
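The namespace handling discussed in the answers above can also be sketched as one explicit path that collects every error message instead of only the first. The function name error_messages, the prefixes 's', 't' and 'd', and the SOAP constant are assumptions made for this example; the xmlns:a and xmlns:i declarations from the question are left out because nothing in the sample uses them.

```python
import xml.etree.ElementTree as ET

# Response from the question, minus the unused xmlns:a / xmlns:i declarations.
SOAP = """<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
  <s:Body>
    <PostLoadsResponse xmlns="http://webservices.truckstop.com/v11">
      <PostLoadsResult>
        <Errors xmlns="http://schemas.datacontract.org/2004/07/WebServices">
          <Error>
            <ErrorMessage>Invalid Location</ErrorMessage>
          </Error>
        </Errors>
      </PostLoadsResult>
    </PostLoadsResponse>
  </s:Body>
</s:Envelope>"""

# Prefixes are local to this script; only the URIs must match the document.
NS = {
    "s": "http://schemas.xmlsoap.org/soap/envelope/",
    "t": "http://webservices.truckstop.com/v11",
    "d": "http://schemas.datacontract.org/2004/07/WebServices",
}


def error_messages(xml_text):
    """Return the text of every ErrorMessage element on the expected path."""
    root = ET.fromstring(xml_text)
    path = "s:Body/t:PostLoadsResponse/t:PostLoadsResult/d:Errors/d:Error/d:ErrorMessage"
    return [node.text for node in root.findall(path, NS)]


messages = error_messages(SOAP)
print(messages)
```

An explicit path fails quietly (it returns an empty list) if the response has a different shape, whereas './/d:ErrorMessage' matches at any depth; which behaviour is preferable depends on how strictly the response format is defined.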

XML parsing specific values - Python

I've been attempting to parse a list of xml files. I'd like to print specific values such as the userName value.
<?xml version="1.0" encoding="utf-8"?>
<Drives clsid="{8FDDCC1A-0C3C-43cd-A6B4-71A6DF20DA8C}"
disabled="1">
<Drive clsid="{935D1B74-9CB8-4e3c-9914-7DD559B7A417}"
name="S:"
status="S:"
image="2"
changed="2007-07-06 20:57:37"
uid="{4DA4A7E3-F1D8-4FB1-874F-D2F7D16F7065}">
<Properties action="U"
thisDrive="NOCHANGE"
allDrives="NOCHANGE"
userName=""
cpassword=""
path="\\scratch"
label="SCRATCH"
persistent="1"
useLetter="1"
letter="S"/>
</Drive>
</Drives>
My script is working fine collecting a list of xml files etc. However the below function is to print the relevant values. I'm trying to achieve this as suggested in this post. However I'm clearly doing something incorrectly as I'm getting errors suggesting that elm object has no attribute text. Any help would be appreciated.
Current Code
from lxml import etree as ET
def read_files(files):
    for fi in files:
        doc = ET.parse(fi)
        elm = doc.find('userName')
        print(elm.text)
doc.find looks for a tag (element) with the given name. You are looking for an attribute with the given name.
elm.text is giving you an error because doc.find doesn't find any matching tag, so it returns None, which has no text attribute.
Read the lxml.etree docs some more, and then try something like this:
doc = ET.parse(fi)
root = doc.getroot()
prop = root.find(".//Properties") # finds the first <Properties> tag anywhere
elm = prop.attrib['userName']
userName is an attribute, not an element. Attributes don't have text nodes attached to them at all.
for el in doc.xpath('//*[@userName]'):
    print(el.attrib['userName'])
You can fetch the element by tag name and then read its attribute (userName is an attribute of Properties). Note that getElementsByTagName belongs to the DOM API in xml.dom.minidom, not to lxml.etree:
from xml.dom import minidom
def read_files(files):
    for fi in files:
        doc = minidom.parse(fi)
        props = doc.getElementsByTagName('Properties')
        elm = props[0].attributes['userName']
        print(elm.value)
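The attribute lookup described in the answers above can be sketched with the standard library's ElementTree as well. The function name drive_properties, its names parameter, and the shortened SAMPLE document are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Shortened version of the file from the question (several attributes omitted).
SAMPLE = """<Drives clsid="{8FDDCC1A-0C3C-43cd-A6B4-71A6DF20DA8C}" disabled="1">
  <Drive clsid="{935D1B74-9CB8-4e3c-9914-7DD559B7A417}" name="S:">
    <Properties action="U" userName="" label="SCRATCH" letter="S"/>
  </Drive>
</Drives>"""


def drive_properties(xml_text, names=("userName", "label", "letter")):
    """Collect the requested attributes from every <Properties> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for prop in root.iter("Properties"):
        # Element.get returns None for a missing attribute instead of raising.
        rows.append({name: prop.get(name) for name in names})
    return rows


rows = drive_properties(SAMPLE)
print(rows)
```

Using Element.get rather than attrib[...] avoids a KeyError when a file lacks one of the attributes, which may matter when looping over many files.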

Python and ElementTree: return "inner XML" excluding parent element

In Python 2.6 using ElementTree, what's a good way to fetch the XML (as a string) inside a particular element, like what you can do in HTML and javascript with innerHTML?
Here's a simplified sample of the XML node I am starting with:
<label attr="foo" attr2="bar">This is some text and a link in embedded HTML</label>
I'd like to end up with this string:
This is some text and a link in embedded HTML
I've tried iterating over the parent node and concatenating the tostring() of the children, but that gave me only the subnodes:
# returns only subnodes (e.g. and a link)
''.join([et.tostring(sub, encoding="utf-8") for sub in node])
I can hack up a solution using regular expressions, but was hoping there'd be something less hacky than this:
re.sub("</\w+?>\s*?$", "", re.sub("^\s*?<\w*?>", "", et.tostring(node, encoding="utf-8")))
How about:
from xml.etree import ElementTree as ET
xml = '<root>start here<child1>some text<sub1/>here</child1>and<child2>here as well<sub2/><sub3/></child2>end here</root>'
root = ET.fromstring(xml)
def content(tag):
    return (tag.text or '') + ''.join(ET.tostring(e, encoding='unicode') for e in tag)
print(content(root))
print(content(root.find('child2')))
Resulting in:
start here<child1>some text<sub1 />here</child1>and<child2>here as well<sub2 /><sub3 /></child2>end here
here as well<sub2 /><sub3 />
This is based on the other solutions, but they did not work in my case (they raised exceptions), whereas this one did:
from xml.etree import ElementTree
from xml.etree.ElementTree import Element
def inner_xml(element: Element):
    return (element.text or '') + ''.join(ElementTree.tostring(e, 'unicode') for e in element)
Use it the same way as in Mark Tolonen's answer.
The following worked for me:
from xml.etree import ElementTree as etree
xml = '<root>start here<child1>some text<sub1/>here</child1>and<child2>here as well<sub2/><sub3/></child2>end here</root>'
dom = etree.XML(xml)
(dom.text or '') + ''.join(map(etree.tostring, dom)) + (dom.tail or '')
# 'start here<child1>some text<sub1 />here</child1>and<child2>here as well<sub2 /><sub3 /></child2>end here'
dom.text or '' is used to get the text at the start of the root element; if there is no text, dom.text is None.
Note that the result is not valid XML: a valid XML document must have exactly one root element.
Have a look at the ElementTree docs about mixed content.
Using Python 2.6.5, Ubuntu 10.04
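For Python 3, the approach from the answers above can be sketched as follows. The function name inner_xml follows one of the answers; the variable names whole and part are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def inner_xml(element):
    """Serialize everything inside `element`, excluding its own start/end tags."""
    # Text that appears before the first child (None when there is none).
    parts = [element.text or ""]
    for child in element:
        # tostring() includes the child's tail, i.e. the text that follows
        # the child's closing tag, so mixed content is preserved.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


root = ET.fromstring(
    "<root>start here<child1>some text<sub1/>here</child1>and"
    "<child2>here as well<sub2/><sub3/></child2>end here</root>"
)

whole = inner_xml(root)
part = inner_xml(root.find("child2"))
print(whole)
print(part)
```

Passing encoding="unicode" makes tostring return str rather than bytes, which is what allows the pieces to be joined with the element's text.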
