Python lxml iterfind w/ namespace but prefix=None

Python lxml iterfind w/ namespace but prefix=None - python

I want to perform iterfind() for elements which have a namespace but no prefix. I'd like to call
iterfind([tagname]) or iterfind([tagname], [namespace dict])
I don't care to enter the tag as follows every time:
"{%s}tagname" % tree.nsmap[None]
Details
I'm running through an xml response from a Google API. The root node defines several namespaces, including one for which there is no prefix: xmlns="http://www.w3.org/2005/Atom"
It looks as though when I try to search through my etree, everything behaves as I would expect for elements with a prefix. e.g.:
>>> for x in root.iterfind('dxp:segment'): print x
...
<Element {http://schemas.google.com/analytics/2009}segment at 0x1211b98>
<Element {http://schemas.google.com/analytics/2009}segment at 0x1211d78>
<Element {http://schemas.google.com/analytics/2009}segment at 0x1211a08>
>>>
But when I try to search for something without a prefix, the search doesn't automatically add the namespace for root.nsmap[None]. e.g.:
>>> for x in root.iterfind('entry'): print x
...
>>>
Even if I try to throw the namespace map in as the optional argument for iterfind, It won't attach the namespace.

Try this:
for x in root.iterfind('{http://www.w3.org/2005/Atom}entry'):
print x
For more information: read the docs: http://lxml.de/tutorial.html#namespaces
If you do not want to type that, and you want to provide a namespace map, you always have to use a prefix, like this for example:
nsmap = {'atom': 'http://www.w3.org/2005/Atom'}
for x in root.iterfind('atom:entry', namespaces=nsmap):
print x
(same thing goes if you want to use xpath)
What prefix is used in the document, if any, is not important, it's about you specifying the fully qualified name of the element, either writing it out complete with URI using the curly bracket notation, or using a prefix that is mapped to a URI.

I found that you can simply add an empty string that maps to the default namespace (verified in Python 3.9):
nsmap = {'': 'http://www.w3.org/2005/Atom'}
for x in root.iterfind('entry', namespaces=nsmap):
print(x)

Related

ElementTree namespace dictionary not working with find() or findall()

I'm stumped with how to do the ElementTree namespace dictionary and subsequent find() and findall() calls using the documented sytnax:
A better way to search the namespaced XML example is to create a
dictionary with your own prefixes and use those in the search
functions:
ns = {'real_person': 'http://people.example.com',
'role': 'http://characters.example.com'}
for actor in root.findall('real_person:actor', ns):
name = actor.find('real_person:name', ns)
print(name.text)
for char in actor.findall('role:character', ns):
print(' |-->', char.text)
The issue i'm having is if i try to use the syntax noted in that doc, by passing the "ns" dictionary as a 2nd argument in find() or findall(), i get an empty list. If I type out the full namespace without passing the 2nd argument, it returns all of the expected elements.
I've defined my namespace dictionary as such:
ns = {'ws':'{urn:com.workday/workersync}'}
And here is the ElementTree and root setup:
xmlparser = ET.parse(xmlfile)
xmlroot = xmlparser.getroot()
Here is what i get when i try to use the dictionary shortcut syntax noted in the docs:
>>> xmlroot.findall('ws:Worker', ns)
[]
Just an empty list... Here is what i get if type out the namespace in the call:
xmlroot.findall('{urn:com.workday/workersync}Worker')
[<Element '{urn:com.workday/workersync}Worker' at 0x03220A78>, <Element'{urn:com.workday/workersync}Worker' at 0x0322D8C0>]
That returns the expected 2 elements in my sample file.
Here is what the top of my sample file looks like for reference:
<?xml version="1.0" encoding="UTF-8"?>
<ws:Worker_Sync xmlns:ws="urn:com.workday/workersync" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ws:Header>
<ws:Version>34.0</ws:Version>
<ws:Prior_Entry_Time>2020-07-04T21:40:25.822-07:00</ws:Prior_Entry_Time>
<ws:Current_Entry_Time>2020-07-04T22:03:47.458-07:00</ws:Current_Entry_Time>
<ws:Prior_Effective_Time>2020-07-04T00:00:00.000-07:00</ws:Prior_Effective_Time>
<ws:Current_Effective_Time>2020-07-05T00:00:00.000-07:00</ws:Current_Effective_Time>
<ws:Full_File>true</ws:Full_File>
<ws:Document_Retention_Policy>30</ws:Document_Retention_Policy>
<ws:Worker_Count>2</ws:Worker_Count>
</ws:Header>
<ws:Worker>
*<snipped rest of XML data>*
The snipped XML data contains 2 <ws:Worker> elements with many subchildren under them.
I've been messing with this for longer than i'd care to admit. I feel like I'm missing something incredibly obvious, as to my eyes, my code looks like every example i've found online and the example code on the docs.
Please help!

Remove the curly brackets from the URI string. The namespace dictionary should look like this:
ns = {'ws': 'urn:com.workday/workersync'}
Another option is to use a wildcard for the namespace. This is supported for find() and findall() since Python 3.8:
print(xmlroot.findall('{*}Worker'))
Output:
[<Element '{urn:com.workday/workersync}Worker' at 0x033E6AC8>]

Parsing XML with Python - accessing elements

I'm using lxml to parse some xml, but for some reason I can't find a specific element.
I'm trying to access the <Constant> elements.
Here's an xml snippet:
</rdf:Description>
</rdf:RDF>
</MiriamAnnotation>
<ListOfSubstrates>
<Substrate metabolite="Metabolite_5" stoichiometry="1"/>
</ListOfSubstrates>
<ListOfModifiers>
<Modifier metabolite="Metabolite_9" stoichiometry="1"/>
</ListOfModifiers>
<ListOfConstants>
<Constant key="Parameter_4344" name="Kcat" value="433.724"/>
<Constant key="Parameter_4343" name="km" value="479.617"/>
The code I'm using is like this:
>>> from lxml import etree as ET
>>> parsed = ET.parse('ct.cps')
>>> root = parsed.getroot()
>>> for a in root.findall(".//Constant"):
... print a.attrib['key']
...
>>> for a in root.findall('Constant'):
... print a.get('key')
...
>>> for a in root.findall('Constant'):
... print a.attrib['key']
...
As you can see, none of these things seem to work.
What am I doing wrong?
EDIT: I'm wondering if it has something to do with the fact that <Constant> elements are empty?
EDIT2: Source xml here: https://www.dropbox.com/s/i6hga7nvmcd6rxx/ct.cps?dl=0

Here is how you can get the values you are looking for:
from lxml import etree
parsed = etree.parse('ct.cps')
for a in parsed.findall("//{http://www.copasi.org/static/schema}Constant"):
print a.attrib["key"]
Output:
Parameter_4344
Parameter_4343
Parameter_4342
Parameter_4341
Parameter_4340
Parameter_4339
Parameter_4338
Parameter_4337
Parameter_4336
Parameter_4335
Parameter_4334
Parameter_4333
Parameter_4332
Parameter_4331
Parameter_4330
Parameter_4329
Parameter_4328
Parameter_4327
Parameter_4326
Parameter_4325
Parameter_4324
Parameter_4323
Parameter_4322
Parameter_4321
Parameter_4320
Parameter_4319
The important thing here is that the COPASI root element in your XML file (the real one at the Dropbox URL) declares a default namespace (http://www.copasi.org/static/schema). This means that the element and all its descendants, including Constant, belong to that namespace.
So instead of Constant elements, you need to look for {http://www.copasi.org/static/schema}Constant elements.
See http://lxml.de/tutorial.html#namespaces.
Here is how you could do it using XPath instead of findall:
from lxml import etree
NSMAP = {"c": "http://www.copasi.org/static/schema"}
parsed = etree.parse('ct.cps')
for a in parsed.xpath("//c:Constant", namespaces=NSMAP):
print a.attrib["key"]
See http://lxml.de/xpathxslt.html#namespaces-and-prefixes.

First, please disregard my comment. It turns out that xml.etree is much better than the standard xml.etree.ElementTree in that it takes care of the namespace. The problem you have is you want to search for '//Constant', which means the nodes can be at any level. However, the root element does not allow you to do it:
>>> root.findall('//Constant')
SyntaxError: cannot use absolute path on element
However, you can do that at higher level:
>>> parsed.findall('//Constant')
[<Element Constant at 0x10a7ce128>, <Element Constant at 0x10a7ce170>]
Update
I am posting here the full text. Since I don't have your full XML file, I make something up to fill in the blank.
from lxml import etree as ET
from StringIO import StringIO
xml_text = """<?xml version='1.0' encoding='utf-8' ?>
<rdf:root xmlns:rdf='http://foo.bar.com/rdf'>
<rdf:RDF>
<rdf:Description>
DescriptionX
</rdf:Description>
</rdf:RDF>
<rdf:foo>
<MiriamAnnotation>
bar
</MiriamAnnotation>
<ListOfSubstrates>
<Substrate metabolite="Metabolite_5" stoichiometry="1"/>
</ListOfSubstrates>
<ListOfModifiers>
<Modifier metabolite="Metabolite_9" stoichiometry="1"/>
</ListOfModifiers>
<ListOfConstants>
<Constant key="Parameter_4344" name="Kcat" value="433.724"/>
<Constant key="Parameter_4343" name="km" value="479.617"/>
</ListOfConstants>
</rdf:foo>
</rdf:root>
"""
buffer = StringIO(xml_text)
tree = ET.parse(buffer)
for constant_node in tree.findall('//Constant'):
print constant_node.attrib['key']

Don't use findall. It is has a limited featureset and is designed to be compatible with ElementTree.
Instead, use xpath, which supports namespaces. From the above, it appears that you probably want to say something like
# possibilities, you need to get these right...
ns_dict = {'atom':"http://www.w3.org/2005/Atom",,
"rdf":"http://www.w3.org/2000/01/rdf-schema#" }
root = parsed.getroot()
for a in root.xpath('.//rdf:Constant', namespaces=ns_dict):
print a.attrib['key']
Note that you must include a namespace prefix in your xpath expression whenever an element has a non-blank namespace, and they must map to one of the namespace URLs that match the same URLs in your document.
Update
Since you posted your original document, I see that there is no namespace assigned to the elements you are looking for. This will work, I just tried it with your source document:
for a in tree.xpath("//Constant"):
print a.attrib['key']
You don't need a namespace because there is no default namespace specified in the document itself.

XML Python Parsing Namespace [duplicate]

I have the following XML which I want to parse using Python's ElementTree:
<rdf:RDF xml:base="http://dbpedia.org/ontology/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns="http://dbpedia.org/ontology/">
<owl:Class rdf:about="http://dbpedia.org/ontology/BasketballLeague">
<rdfs:label xml:lang="en">basketball league</rdfs:label>
<rdfs:comment xml:lang="en">
a group of sports teams that compete against each other
in Basketball
</rdfs:comment>
</owl:Class>
</rdf:RDF>
I want to find all owl:Class tags and then extract the value of all rdfs:label instances inside them. I am using the following code:
tree = ET.parse("filename")
root = tree.getroot()
root.findall('owl:Class')
Because of the namespace, I am getting the following error.
SyntaxError: prefix 'owl' not found in prefix map
I tried reading the document at http://effbot.org/zone/element-namespaces.htm but I am still not able to get this working since the above XML has multiple nested namespaces.
Kindly let me know how to change the code to find all the owl:Class tags.

You need to give the .find(), findall() and iterfind() methods an explicit namespace dictionary:
namespaces = {'owl': 'http://www.w3.org/2002/07/owl#'} # add more as needed
root.findall('owl:Class', namespaces)
Prefixes are only looked up in the namespaces parameter you pass in. This means you can use any namespace prefix you like; the API splits off the owl: part, looks up the corresponding namespace URL in the namespaces dictionary, then changes the search to look for the XPath expression {http://www.w3.org/2002/07/owl}Class instead. You can use the same syntax yourself too of course:
root.findall('{http://www.w3.org/2002/07/owl#}Class')
Also see the Parsing XML with Namespaces section of the ElementTree documentation.
If you can switch to the lxml library things are better; that library supports the same ElementTree API, but collects namespaces for you in .nsmap attribute on elements and generally has superior namespaces support.

Here's how to do this with lxml without having to hard-code the namespaces or scan the text for them (as Martijn Pieters mentions):
from lxml import etree
tree = etree.parse("filename")
root = tree.getroot()
root.findall('owl:Class', root.nsmap)
UPDATE:
5 years later I'm still running into variations of this issue. lxml helps as I showed above, but not in every case. The commenters may have a valid point regarding this technique when it comes merging documents, but I think most people are having difficulty simply searching documents.
Here's another case and how I handled it:
<?xml version="1.0" ?><Tag1 xmlns="http://www.mynamespace.com/prefix">
<Tag2>content</Tag2></Tag1>
xmlns without a prefix means that unprefixed tags get this default namespace. This means when you search for Tag2, you need to include the namespace to find it. However, lxml creates an nsmap entry with None as the key, and I couldn't find a way to search for it. So, I created a new namespace dictionary like this
namespaces = {}
# response uses a default namespace, and tags don't mention it
# create a new ns map using an identifier of our choice
for k,v in root.nsmap.iteritems():
if not k:
namespaces['myprefix'] = v
e = root.find('myprefix:Tag2', namespaces)

Note: This is an answer useful for Python's ElementTree standard library without using hardcoded namespaces.
To extract namespace's prefixes and URI from XML data you can use ElementTree.iterparse function, parsing only namespace start events (start-ns):
>>> from io import StringIO
>>> from xml.etree import ElementTree
>>> my_schema = u'''<rdf:RDF xml:base="http://dbpedia.org/ontology/"
... xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
... xmlns:owl="http://www.w3.org/2002/07/owl#"
... xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
... xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
... xmlns="http://dbpedia.org/ontology/">
...
... <owl:Class rdf:about="http://dbpedia.org/ontology/BasketballLeague">
... <rdfs:label xml:lang="en">basketball league</rdfs:label>
... <rdfs:comment xml:lang="en">
... a group of sports teams that compete against each other
... in Basketball
... </rdfs:comment>
... </owl:Class>
...
... </rdf:RDF>'''
>>> my_namespaces = dict([
... node for _, node in ElementTree.iterparse(
... StringIO(my_schema), events=['start-ns']
... )
... ])
>>> from pprint import pprint
>>> pprint(my_namespaces)
{'': 'http://dbpedia.org/ontology/',
'owl': 'http://www.w3.org/2002/07/owl#',
'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
'rdfs': 'http://www.w3.org/2000/01/rdf-schema#',
'xsd': 'http://www.w3.org/2001/XMLSchema#'}
Then the dictionary can be passed as argument to the search functions:
root.findall('owl:Class', my_namespaces)

I've been using similar code to this and have found it's always worth reading the documentation... as usual!
findall() will only find elements which are direct children of the current tag. So, not really ALL.
It might be worth your while trying to get your code working with the following, especially if you're dealing with big and complex xml files so that that sub-sub-elements (etc.) are also included.
If you know yourself where elements are in your xml, then I suppose it'll be fine! Just thought this was worth remembering.
root.iter()
ref: https://docs.python.org/3/library/xml.etree.elementtree.html#finding-interesting-elements
"Element.findall() finds only elements with a tag which are direct children of the current element. Element.find() finds the first child with a particular tag, and Element.text accesses the element’s text content. Element.get() accesses the element’s attributes:"

To get the namespace in its namespace format, e.g. {myNameSpace}, you can do the following:
root = tree.getroot()
ns = re.match(r'{.*}', root.tag).group(0)
This way, you can use it later on in your code to find nodes, e.g using string interpolation (Python 3).
link = root.find(f"{ns}link")

This is basically Davide Brunato's answer however I found out that his answer had serious problems the default namespace being the empty string, at least on my python 3.6 installation. The function I distilled from his code and that worked for me is the following:
from io import StringIO
from xml.etree import ElementTree
def get_namespaces(xml_string):
namespaces = dict([
node for _, node in ElementTree.iterparse(
StringIO(xml_string), events=['start-ns']
)
])
namespaces["ns0"] = namespaces[""]
return namespaces
where ns0 is just a placeholder for the empty namespace and you can replace it by any random string you like.
If I then do:
my_namespaces = get_namespaces(my_schema)
root.findall('ns0:SomeTagWithDefaultNamespace', my_namespaces)
It also produces the correct answer for tags using the default namespace as well.

My solution is based on #Martijn Pieters' comment:
register_namespace only influences serialisation, not search.
So the trick here is to use different dictionaries for serialization and for searching.
namespaces = {
'': 'http://www.example.com/default-schema',
'spec': 'http://www.example.com/specialized-schema',
}
Now, register all namespaces for parsing and writing:
for name, value in namespaces.iteritems():
ET.register_namespace(name, value)
For searching (find(), findall(), iterfind()) we need a non-empty prefix. Pass these functions a modified dictionary (here I modify the original dictionary, but this must be made only after the namespaces are registered).
self.namespaces['default'] = self.namespaces['']
Now, the functions from the find() family can be used with the default prefix:
print root.find('default:myelem', namespaces)
but
tree.write(destination)
does not use any prefixes for elements in the default namespace.

Processing large XML file [duplicate]

I have the following XML which I want to parse using Python's ElementTree:
<rdf:RDF xml:base="http://dbpedia.org/ontology/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns="http://dbpedia.org/ontology/">
<owl:Class rdf:about="http://dbpedia.org/ontology/BasketballLeague">
<rdfs:label xml:lang="en">basketball league</rdfs:label>
<rdfs:comment xml:lang="en">
a group of sports teams that compete against each other
in Basketball
</rdfs:comment>
</owl:Class>
</rdf:RDF>
I want to find all owl:Class tags and then extract the value of all rdfs:label instances inside them. I am using the following code:
tree = ET.parse("filename")
root = tree.getroot()
root.findall('owl:Class')
Because of the namespace, I am getting the following error.
SyntaxError: prefix 'owl' not found in prefix map
I tried reading the document at http://effbot.org/zone/element-namespaces.htm but I am still not able to get this working since the above XML has multiple nested namespaces.
Kindly let me know how to change the code to find all the owl:Class tags.

You need to give the .find(), findall() and iterfind() methods an explicit namespace dictionary:
namespaces = {'owl': 'http://www.w3.org/2002/07/owl#'} # add more as needed
root.findall('owl:Class', namespaces)
Prefixes are only looked up in the namespaces parameter you pass in. This means you can use any namespace prefix you like; the API splits off the owl: part, looks up the corresponding namespace URL in the namespaces dictionary, then changes the search to look for the XPath expression {http://www.w3.org/2002/07/owl}Class instead. You can use the same syntax yourself too of course:
root.findall('{http://www.w3.org/2002/07/owl#}Class')
Also see the Parsing XML with Namespaces section of the ElementTree documentation.
If you can switch to the lxml library things are better; that library supports the same ElementTree API, but collects namespaces for you in .nsmap attribute on elements and generally has superior namespaces support.

Here's how to do this with lxml without having to hard-code the namespaces or scan the text for them (as Martijn Pieters mentions):
from lxml import etree
tree = etree.parse("filename")
root = tree.getroot()
root.findall('owl:Class', root.nsmap)
UPDATE:
5 years later I'm still running into variations of this issue. lxml helps as I showed above, but not in every case. The commenters may have a valid point regarding this technique when it comes merging documents, but I think most people are having difficulty simply searching documents.
Here's another case and how I handled it:
<?xml version="1.0" ?><Tag1 xmlns="http://www.mynamespace.com/prefix">
<Tag2>content</Tag2></Tag1>
xmlns without a prefix means that unprefixed tags get this default namespace. This means when you search for Tag2, you need to include the namespace to find it. However, lxml creates an nsmap entry with None as the key, and I couldn't find a way to search for it. So, I created a new namespace dictionary like this
namespaces = {}
# response uses a default namespace, and tags don't mention it
# create a new ns map using an identifier of our choice
for k,v in root.nsmap.iteritems():
if not k:
namespaces['myprefix'] = v
e = root.find('myprefix:Tag2', namespaces)

Note: This is an answer useful for Python's ElementTree standard library without using hardcoded namespaces.
To extract namespace's prefixes and URI from XML data you can use ElementTree.iterparse function, parsing only namespace start events (start-ns):
>>> from io import StringIO
>>> from xml.etree import ElementTree
>>> my_schema = u'''<rdf:RDF xml:base="http://dbpedia.org/ontology/"
... xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
... xmlns:owl="http://www.w3.org/2002/07/owl#"
... xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
... xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
... xmlns="http://dbpedia.org/ontology/">
...
... <owl:Class rdf:about="http://dbpedia.org/ontology/BasketballLeague">
... <rdfs:label xml:lang="en">basketball league</rdfs:label>
... <rdfs:comment xml:lang="en">
... a group of sports teams that compete against each other
... in Basketball
... </rdfs:comment>
... </owl:Class>
...
... </rdf:RDF>'''
>>> my_namespaces = dict([
... node for _, node in ElementTree.iterparse(
... StringIO(my_schema), events=['start-ns']
... )
... ])
>>> from pprint import pprint
>>> pprint(my_namespaces)
{'': 'http://dbpedia.org/ontology/',
'owl': 'http://www.w3.org/2002/07/owl#',
'rdf': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#',
'rdfs': 'http://www.w3.org/2000/01/rdf-schema#',
'xsd': 'http://www.w3.org/2001/XMLSchema#'}
Then the dictionary can be passed as argument to the search functions:
root.findall('owl:Class', my_namespaces)

I've been using similar code to this and have found it's always worth reading the documentation... as usual!
findall() will only find elements which are direct children of the current tag. So, not really ALL.
It might be worth your while trying to get your code working with the following, especially if you're dealing with big and complex xml files so that that sub-sub-elements (etc.) are also included.
If you know yourself where elements are in your xml, then I suppose it'll be fine! Just thought this was worth remembering.
root.iter()
ref: https://docs.python.org/3/library/xml.etree.elementtree.html#finding-interesting-elements
"Element.findall() finds only elements with a tag which are direct children of the current element. Element.find() finds the first child with a particular tag, and Element.text accesses the element’s text content. Element.get() accesses the element’s attributes:"

To get the namespace in its namespace format, e.g. {myNameSpace}, you can do the following:
root = tree.getroot()
ns = re.match(r'{.*}', root.tag).group(0)
This way, you can use it later on in your code to find nodes, e.g using string interpolation (Python 3).
link = root.find(f"{ns}link")

This is basically Davide Brunato's answer however I found out that his answer had serious problems the default namespace being the empty string, at least on my python 3.6 installation. The function I distilled from his code and that worked for me is the following:
from io import StringIO
from xml.etree import ElementTree
def get_namespaces(xml_string):
namespaces = dict([
node for _, node in ElementTree.iterparse(
StringIO(xml_string), events=['start-ns']
)
])
namespaces["ns0"] = namespaces[""]
return namespaces
where ns0 is just a placeholder for the empty namespace and you can replace it by any random string you like.
If I then do:
my_namespaces = get_namespaces(my_schema)
root.findall('ns0:SomeTagWithDefaultNamespace', my_namespaces)
It also produces the correct answer for tags using the default namespace as well.

My solution is based on #Martijn Pieters' comment:
register_namespace only influences serialisation, not search.
So the trick here is to use different dictionaries for serialization and for searching.
namespaces = {
'': 'http://www.example.com/default-schema',
'spec': 'http://www.example.com/specialized-schema',
}
Now, register all namespaces for parsing and writing:
for name, value in namespaces.iteritems():
ET.register_namespace(name, value)
For searching (find(), findall(), iterfind()) we need a non-empty prefix. Pass these functions a modified dictionary (here I modify the original dictionary, but this must be made only after the namespaces are registered).
self.namespaces['default'] = self.namespaces['']
Now, the functions from the find() family can be used with the default prefix:
print root.find('default:myelem', namespaces)
but
tree.write(destination)
does not use any prefixes for elements in the default namespace.

Find tag text in xml

I have the following xml:
<a:something>text-a</a:something>
<a:otherthing>text-b</a:otherthing>
and I want to assign a variable with the text of <a:otherthing>.
I tried txt = xml.find("a:otherthing").text but it shows me SyntaxError: prefix 'a' not found in prefix map
how do I do this?

Your XML shall somewhere above declare namespace for given prefix "a".
Note, that XML allows changing purpose of namespace few times in one document (but this is not used often).
Then you will find, that for "ns:a" there is something line "http://a.alfa.aa/a/aaa.aa" string, which is so called fully qualified namespace.
In your find you shall then use a namespace map in form of
nsmap = {"a": "http://a.alfa.aa/a/aaa.aa"}
xml.find("a:otherthing", namespaces=nsmap)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python lxml iterfind w/ namespace but prefix=None - python

I found that you can simply add an empty string that maps to the default namespace (verified in Python 3.9): nsmap = {'': 'http://www.w3.org/2005/Atom'} for x in root.iterfind('entry', namespaces=nsmap): print(x)

Related

ElementTree namespace dictionary not working with find() or findall()

Parsing XML with Python - accessing elements

XML Python Parsing Namespace [duplicate]

Processing large XML file [duplicate]

Find tag text in xml

Categories

Resources