XPath in Python: getting SyntaxError("invalid predicate") - python

import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
xpathobjects = tree.findall(".//BuildingNodeBase[name = 'Building name']")
I want to pull a BuildingNodeBase element whose name child has the value 'Building name'.
But I'm getting:
SyntaxError("invalid predicate")

The XPath support in ElementTree is limited, but your type of expression is supported. It's just that you need to remove the extra spaces around the =:
.//BuildingNodeBase[name='Building name']
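
For instance, with a minimal in-memory document (a hypothetical stand-in for test.xml, based on the structure described in the question), the corrected expression matches as expected:

import xml.etree.ElementTree as ET

# Hypothetical stand-in for test.xml
root = ET.fromstring(
    "<root>"
    "<BuildingNodeBase><name>Building name</name></BuildingNodeBase>"
    "<BuildingNodeBase><name>Other</name></BuildingNodeBase>"
    "</root>"
)

# No spaces around '=' inside the predicate
matches = root.findall(".//BuildingNodeBase[name='Building name']")
print(len(matches))  # 1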

I use lxml, but I guess you can adapt this for your use:
from lxml import etree
tree = etree.parse('test.xml')
xpathobjects = tree.xpath(".//BuildingNodeBase[name = 'Building name']")
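
Note that lxml's xpath() uses libxml2's full XPath engine, so the spaces around = that trip up ElementTree are not a problem there. A minimal sketch using an in-memory document (same hypothetical structure as above):

from lxml import etree

root = etree.fromstring(
    "<root><BuildingNodeBase><name>Building name</name></BuildingNodeBase></root>"
)

# Spaces around '=' are accepted by lxml's XPath engine
print(root.xpath(".//BuildingNodeBase[name = 'Building name']"))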

Related

lxml create CDATA element

I am trying to create a CDATA element as per https://lxml.de/apidoc/lxml.etree.html#lxml.etree.CDATA
The simplified version of my code looks like this:
description = ET.SubElement(item, "description")
description.text = CDATA('test')
But when I later try to convert it to a string:
xml_str = ET.tostring(self.__root, xml_declaration=True).decode()
I get an exception
cannot serialize <lxml.etree.CDATA object at 0x122c30ef0> (type CDATA)
Could you advise me on what I am missing?
Here is a simple example:
import xml.etree.cElementTree as ET
from lxml.etree import CDATA
root = ET.Element('rss')
root.set("version", "2.0")
description = ET.SubElement(root, "description")
description.text = CDATA('test')
xml_str = ET.tostring(root, xml_declaration=True).decode()
print(xml_str)
lxml.etree and xml.etree are two different libraries; you should pick one and stick with it, rather than using both and trying to pass objects created by one to the other.
A working example, using lxml only:
import lxml.etree as ET
from lxml.etree import CDATA
root = ET.Element('rss')
root.set("version", "2.0")
description = ET.SubElement(root, "description")
description.text = CDATA('test')
xml_str = ET.tostring(root, xml_declaration=True).decode()
print(xml_str)
You can run this yourself at https://replit.com/@CharlesDuffy2/JovialMediumLeadership
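
For reference, running the lxml-only example should print something along these lines (lxml declares ASCII as the encoding when none is passed to tostring):

<?xml version='1.0' encoding='ASCII'?>
<rss version="2.0"><description><![CDATA[test]]></description></rss>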

Restore CDATA during lxml serialization

I know that I can preserve CDATA sections during XML parsing, using the following:
from lxml import etree
parser = etree.XMLParser(strip_cdata=False)
root = etree.XML('<root><![CDATA[test]]></root>', parser)
See APIs specific to lxml.etree
But, is there a simple way to "restore" CDATA section during serialization?
For example, by specifying a list of tag names…
For instance, I want to turn:
<CONFIG>
<BODY>This is a <message>.</BODY>
</CONFIG>
to:
<CONFIG>
<BODY><![CDATA[This is a <message>.]]></BODY>
</CONFIG>
Just by specifying that BODY should contain CDATA…
Something like this?
from lxml import etree
parser = etree.XMLParser(strip_cdata=True)
root = etree.XML('<root><x><![CDATA[<test>]]></x></root>', parser)
print etree.tostring(root)
for elem in root.findall('x'):
    elem.text = etree.CDATA(elem.text)
print etree.tostring(root)
Produces:
<root><x><test></x></root>
<root><x><![CDATA[<test>]]></x></root>
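
Generalizing that to the "list of tag names" idea from the question, a small sketch (the helper name wrap_in_cdata and the tag list are hypothetical, not part of the lxml API):

from lxml import etree

def wrap_in_cdata(root, tag_names):
    # Wrap the text of every element whose tag is in tag_names in a CDATA section
    for tag in tag_names:
        for elem in root.iter(tag):
            if elem.text is not None:
                elem.text = etree.CDATA(elem.text)

root = etree.fromstring("<CONFIG><BODY>This is a &lt;message&gt;.</BODY></CONFIG>")
wrap_in_cdata(root, ["BODY"])
print(etree.tostring(root).decode())
# <CONFIG><BODY><![CDATA[This is a <message>.]]></BODY></CONFIG>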

Adding a blank space in an XML attrib with lxml in Python

from lxml import etree
html = etree.Element("html")
body = etree.SubElement(html, "body")
body.text = "TEXT"
body.set("p style", "color:red")
print(etree.tostring(html))
Gives me the error: ValueError: Invalid attribute name u'p style'
You can't have an attribute name with a space in it in XML, which is what lxml and etree work with. The XML specification defines what a valid attribute name is.
If you are trying to achieve this:
<html><body p style="color:red">TEXT</body></html>
You can't do that in XML. You can do something similar in HTML: empty attributes. See the HTML5 specification for details. But you wouldn't use the kind of code written above to get that result.
If you are trying to get the following result (which seems more likely):
<html><body><p style="color:red">TEXT</p></body></html>
Then it is very easy.
from lxml import etree
html = etree.Element("html")
body = etree.SubElement(html, "body")
p = etree.SubElement(body, "p")
p.text = "TEXT"
p.set("style", "color:red")
print(etree.tostring(html))
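With the corrected SubElement call, print(etree.tostring(html)) should show b'<html><body><p style="color:red">TEXT</p></body></html>', which matches the desired markup above.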

Python - use lxml to return value of title.text attrib

I'm trying to figure out how to use lxml to parse the XML from a URL and return the value of the title attribute. Does anyone know what I have wrong, or what would return the title value/text? In the example below I want to return the value 'Weeds - S05E05 - Van Nuys - HD TV'.
XML from URL:
<?xml version="1.0" encoding="UTF-8"?>
<subsonic-response xmlns="http://subsonic.org/restapi" status="ok" version="1.8.0">
<song id="11345" parent="11287" title="Weeds - S05E05 - Van Nuys - HD TV" album="Season 5" artist="Weeds" isDir="false" created="2009-07-06T22:21:16" duration="1638" bitRate="384" size="782304110" suffix="mkv" contentType="video/x-matroska" isVideo="true" path="Weeds/Season 5/Weeds - S05E05 - Van Nuys - HD TV.mkv" transcodedSuffix="flv" transcodedContentType="video/x-flv"/>
</subsonic-response>
My current Python code:
import lxml
from lxml import html
from urllib2 import urlopen
url = 'https://myurl.com'
tree = html.parse(urlopen(url))
songs = tree.findall('{*}song')
for song in songs:
    print song.attrib['title']
With the above code I get no data returned; any ideas?
print out of tree =
<lxml.etree._ElementTree object at 0x0000000003348F48>
print out of songs =
[]
First of all, you are not actually using lxml in your code. You import the lxml HTML parser, but otherwise ignore it and just use the standard library xml.etree.ElementTree module instead.
Secondly, you search for data/song but you do not have any data elements in your document, so no matches will be found. And last, but not least, you have a document there that uses namespaces. You'll have to include those when searching for elements, or use a {*} wildcard search.
The following finds songs for you:
from lxml import etree
tree = etree.parse(URL) # lxml can load URLs for you
songs = tree.findall('{*}song')
for song in songs:
    print song.attrib['title']
To use an explicit namespace, you'd have to replace the {*} wildcard with the full namespace URL; the default namespace is available in the .nsmap dictionary on the root element:
namespace = tree.getroot().nsmap[None]
songs = tree.findall('{%s}song' % namespace)
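
Alternatively, findall accepts a prefix-to-URI mapping, which avoids interpolating the URI into the path (the prefix s here is arbitrary):

songs = tree.findall('s:song', namespaces={'s': 'http://subsonic.org/restapi'})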
The whole issue is that the subsonic-response tag has an xmlns attribute indicating that an XML namespace is in effect. The code below takes that into account and correctly picks up the song tags.
import xml.etree.ElementTree as ET
root = ET.parse('test.xml').getroot()
print root.findall('{http://subsonic.org/restapi}song')
Thanks for the help guys, I used a combination of both of yours to get it working.
import xml.etree.ElementTree as ET
from urllib2 import urlopen
url = 'https://myurl.com'
root = ET.parse(urlopen(url)).getroot()
for song in root:
    print song.attrib['title']
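
For anyone on Python 3, the same approach works with urllib.request in place of urllib2 (a sketch; https://myurl.com is still a placeholder):

import xml.etree.ElementTree as ET
from urllib.request import urlopen

url = 'https://myurl.com'
root = ET.parse(urlopen(url)).getroot()
for song in root:
    print(song.attrib['title'])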

Python Minidom XML Query

I'm trying to query this XML with lxml:
<lista_tareas>
<tarea id="1" realizzato="False" data_limite="12/10/2012" priorita="1">
<description>XML TEST</description>
</tarea>
<tarea id="2" realizzato="False" data_limite="12/10/2012" priorita="1">
<description>XML TEST2</description>
</tarea>
</lista_tareas>
I wrote this code:
from lxml import etree
doc = etree.parse(file_path)
root = etree.Element("lista_tareas")
for x in root:
    z = x.Element("tarea")
    for y in z:
        element_text = y.Element("description").text
        print element_text
It doesn't print anything; could you suggest how to do it?
You do not want to use the minidom; use the ElementTree API instead. The DOM API is a very verbose and constrained API; the ElementTree API plays to Python's strengths instead.
The MiniDOM module doesn't offer any query API like you are looking for.
You can use the bundled xml.etree.ElementTree module, or you could install lxml, which offers more powerful XPath and other query options.
import xml.etree.ElementTree as ET
root = ET.parse('document.xml').getroot()
for c in root.findall("./Root_Node[@id='1']/sub_node"):
    # Do something with c
Using lxml:
from lxml import etree
doc = etree.parse(source)
for c in doc.xpath("//Root_Node[@id='1']"):
    subnode = c.find("sub_node")
    # ... etc ...
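
Applied to the XML from the question, the same pattern looks like this (a sketch; the document is inlined here so the example is self-contained):

from lxml import etree

xml = """<lista_tareas>
<tarea id="1" realizzato="False" data_limite="12/10/2012" priorita="1">
<description>XML TEST</description>
</tarea>
<tarea id="2" realizzato="False" data_limite="12/10/2012" priorita="1">
<description>XML TEST2</description>
</tarea>
</lista_tareas>"""

root = etree.fromstring(xml)
for tarea in root.xpath("//tarea"):
    print(tarea.findtext("description"))
# XML TEST
# XML TEST2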
