I want to find all the Placemarks in a kml file:
from lxml import etree
doc = etree.parse(filename)
for elem in doc.findall('<Placemark>'):
    print(elem.find("<Placemark>").text)
This doesn't work, i.e. it doesn't find anything, I think because each Placemark is unique in that each has its own id, e.g.:
<Placemark id="ID_09795">
<Placemark id="ID_15356">
<Placemark id="ID_64532">
How do I do this?
Edit: changed code based on @ScottHunter's comment:
placemark_list = doc.findall("Placemark")
print ("length:" + str(len(placemark_list)))
for placemark in placemark_list:
    print(placemark.text)
length is 0
It's hard to tell without seeing the full file, but try something like this
placemark_list = doc.xpath("//*[local-name()='Placemark']")
print(len(placemark_list))
and see if it works.
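If the file declares the usual KML default namespace (an assumption here, since the full file isn't shown), a bare Placemark won't match because each element's real name includes the namespace URI. A minimal sketch with the standard library's ElementTree and a made-up inline sample:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the namespace URI is assumed to be the KML 2.2 one.
KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark id="ID_09795"><name>first</name></Placemark>
    <Placemark id="ID_15356"><name>second</name></Placemark>
  </Document>
</kml>"""

root = ET.fromstring(KML)
ns = {"kml": "http://www.opengis.net/kml/2.2"}

# A bare tag name matches nothing because every element is namespaced.
unqualified = root.findall(".//Placemark")

# Qualifying the tag with a prefix mapped to the namespace URI finds them.
placemarks = root.findall(".//kml:Placemark", ns)
placemark_ids = [p.get("id") for p in placemarks]
print(len(unqualified), placemark_ids)
```

The prefix kml is only a local label for the lookup; what has to match is the URI, so check the xmlns attribute on the root element of your own file.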
I've tried everything to read the content of an XML file, but all I get back is None. Could anybody help me?
The code I'm trying is:
import xml.etree.cElementTree as ET
parsedXML = ET.parse("C:\\Users\\denis\\Documents\\Projetos\\NFe\\Arquivos\\33180601279711000100550020001554261733208443-nfeo.xml")
for node in parsedXML.getroot():
    email = node.find('cNF')
    phone = node.find('natOp')
    street = node.find('nNF')
    print(email)
Part of the XML (the full file is larger than this) is shown below:
<?xml version="1.0" encoding="ISO-8859-1"?>
<nfeProc xmlns="http://www.portalfiscal.inf.br/nfe" versao="3.10">
<NFe xmlns="http://www.portalfiscal.inf.br/nfe">
<infNFe versao="3.10" Id="NFe33180601279711000100550020001554261733208443">
<ide>
<cUF>33</cUF>
<cNF>73320844</cNF>
<natOp>VENDA DE PRODUCAO DO ESTABELECIMENTO</natOp>
<indPag>1</indPag>
<mod>55</mod>
<serie>2</serie>
<nNF>155426</nNF>
<dhEmi>2018-06-25T16:06:33-03:00</dhEmi>
<dhSaiEnt>2018-06-25T16:06:08-03:00</dhSaiEnt>
<tpNF>1</tpNF>
<idDest>2</idDest>
<cMunFG>3304557</cMunFG>
<tpImp>2</tpImp>
<tpEmis>1</tpEmis>
<cDV>3</cDV>
<tpAmb>1</tpAmb>
<finNFe>1</finNFe>
<indFinal>1</indFinal>
<indPres>9</indPres>
<procEmi>0</procEmi>
<verProc>NeoGrid NFe 1.63.4</verProc>
</ide>
<emit>
I appreciate your help!
You are using an XML document with namespaces, so you need to pass the namespace in your call, as shown in this answer.
Here, we get
namespaces = {'n': 'http://www.portalfiscal.inf.br/nfe'}
root = parsedXML.getroot()
root.find('n:NFe', namespaces)
to return the element, while root.find('NFe') returns None.
Also note that find and findall with a bare tag name only search the direct children, not nested descendants (cf. documentation), which means that you will have to iterate over the children (see e.g. here for an example) or use a path that starts with .// to search the whole subtree.
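To make both points concrete, here is a small sketch using a trimmed-down copy of the XML from the question and the standard library's ElementTree. The prefix n is arbitrary; only the URI has to match the document.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the document from the question.
XML = """<nfeProc xmlns="http://www.portalfiscal.inf.br/nfe" versao="3.10">
  <NFe>
    <infNFe versao="3.10">
      <ide>
        <cNF>73320844</cNF>
        <natOp>VENDA DE PRODUCAO DO ESTABELECIMENTO</natOp>
        <nNF>155426</nNF>
      </ide>
    </infNFe>
  </NFe>
</nfeProc>"""

root = ET.fromstring(XML)
namespaces = {"n": "http://www.portalfiscal.inf.br/nfe"}

# Without the namespace the lookup fails, even with a descendant path.
no_ns = root.find(".//cNF")

# ".//" searches all descendants, so no manual loop over children is needed.
cnf = root.find(".//n:cNF", namespaces)
nat_op = root.findtext(".//n:natOp", namespaces=namespaces)
print(no_ns, cnf.text, nat_op)
```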
I'm trying to rebuild a TEI-XML file with lxml.
The beginning of my file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="https://www.ssrq-sds-fds.ch/tei/TEI_Schema_SSRQ.rng"
type="application/xml"
schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="https://www.ssrq-sds-fds.ch/tei/TEI_Schema_SSRQ.rng"
type="application/xml"
schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-stylesheet type="text/css"
href="https://www.ssrq-sds-fds.ch/tei/Textkritik_Version_tei-ssrq.css"?>
<TEI xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns="http://www.tei-c.org/ns/1.0" n=""
xml:id="[To be generated]" <!-- e.g. StAAG_U-17_0007a --> >
The first four lines should not matter too much in my opinion, but I included them for completeness. My problem starts with the TEI-Element.
So my code to copy this looks like this:
NSMAP = {"xml":"http://www.tei-c.org/ns/1.0",
"xi":"http://www.w3.org/2001/XInclude"}
root = et.Element('TEI', n="", nsmap=NSMAP)
root.attrib["id"] = xml_id
root.attrib["xmlns"] = "http://www.tei-c.org/ns/1.0"
The string xml_id is assigned earlier and does not matter for my question. My code produces this output:
<TEI xmlns:xi="http://www.w3.org/2001/XInclude"
n=""
id="StAAG_U-17_0006"
xmlns="http://www.tei-c.org/ns/1.0">
So the only thing that is missing is this xml:id attribute. I found this specification page: https://www.w3.org/TR/xml-id/ and I know it is mentioned to be supported in lxml in its FAQ.
Btw, root.attrib["xml:id"] does not work, as it is not a viable attribute name.
So, does anyone know how I can assign my id to an element's xml:id attribute?
You need to specify that id belongs to the reserved XML namespace (the one that is always bound to the xml prefix), using the {namespace}name form. Try this:
root.attrib["{http://www.w3.org/XML/1998/namespace}id"] = xml_id
Reference: https://www.w3.org/TR/xml-names/#ns-decl
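The same {namespace}name key also works with the standard library's ElementTree, which this sketch uses so that it runs without lxml installed. The element and id value are taken from the question; everything else is illustrative.

```python
import xml.etree.ElementTree as ET

# URI of the reserved namespace that the "xml" prefix always refers to.
XML_NS = "http://www.w3.org/XML/1998/namespace"

root = ET.Element("TEI", {"n": ""})

# Clark notation: "{namespace-uri}localname"
root.set("{%s}id" % XML_NS, "StAAG_U-17_0006")

serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

The serializer writes the attribute as xml:id and does not emit a declaration for the xml prefix, because that prefix is predefined by the XML namespaces specification.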
I have a Test.xml file as:
<?xml version="1.0" encoding="utf-8"?>
<SetupConf>
<LocSetup>
<Src>
<Dir1>C:\User1\test1</Dir1>
<Dir2>C:\User2\log</Dir2>
<Dir3>D:\Users\Checkup</Dir3>
<Dir4>D:\Work1</Dir4>
<Dir5>E:\job1</Dir5>
</Src>
</LocSetup>
</SetupConf>
The number of Dir nodes depends on user input: there may be 1, 2, 5, or 10 of them. With help from @Padraic Cunningham, I am able to extract the data from Test.xml using the Python code below:
from xml.dom import minidom
from StringIO import StringIO
dom = minidom.parse('Test.xml')
Src = dom.getElementsByTagName('Src')
output = ", ".join([a.childNodes[0].nodeValue for node in Src for a in node.getElementsByTagName('Dir')])
print [output]
And getting the output:
C:\User1\test1, C:\User2\log, D:\Users\Checkup, D:\Work1, E:\job1
But the expected output is:
['C:\\User1\\test1', 'C:\\User2\\log', 'D:\\Users\\Checkup', 'D:\\Work1', 'E:\\job1']
I solved it myself:
from xml.dom import minidom
DOMTree = minidom.parse('Test0001.xml')
dom = DOMTree.documentElement
Src = dom.getElementsByTagName('Src')
for node in Src:
    output = [a.childNodes[0].nodeValue for a in node.getElementsByTagName('Dir')]
    print output
And getting output:
[u'C:\User1\test1', u'C:\User2\log', u'D:\Users\Checkup', u'D:\Work1', u'E:\job1']
I am sure there is a simpler way. Please let me know. Thanks in advance.
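One possibly simpler route is ElementTree, sketched here against a shortened copy of Test.xml. It matches children of Src by tag prefix, since the elements in the sample are named Dir1, Dir2, and so on; that matching rule is an assumption about what "Dir nodes" means.

```python
import xml.etree.ElementTree as ET

# Shortened copy of Test.xml; the raw string keeps the backslashes literal.
XML = r"""<SetupConf>
  <LocSetup>
    <Src>
      <Dir1>C:\User1\test1</Dir1>
      <Dir2>C:\User2\log</Dir2>
      <Dir3>D:\Users\Checkup</Dir3>
    </Src>
  </LocSetup>
</SetupConf>"""

root = ET.fromstring(XML)
src = root.find(".//Src")

# Take every child whose tag starts with "Dir", whatever its number.
dirs = [child.text for child in src if child.tag.startswith("Dir")]
print(dirs)
```

Printing the list shows each backslash doubled. That is only how Python displays strings inside a list; the strings themselves contain single backslashes. For a file on disk, ET.parse('Test.xml').getroot() would replace ET.fromstring(XML).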
I've been attempting to parse a list of xml files. I'd like to print specific values such as the userName value.
<?xml version="1.0" encoding="utf-8"?>
<Drives clsid="{8FDDCC1A-0C3C-43cd-A6B4-71A6DF20DA8C}"
disabled="1">
<Drive clsid="{935D1B74-9CB8-4e3c-9914-7DD559B7A417}"
name="S:"
status="S:"
image="2"
changed="2007-07-06 20:57:37"
uid="{4DA4A7E3-F1D8-4FB1-874F-D2F7D16F7065}">
<Properties action="U"
thisDrive="NOCHANGE"
allDrives="NOCHANGE"
userName=""
cpassword=""
path="\\scratch"
label="SCRATCH"
persistent="1"
useLetter="1"
letter="S"/>
</Drive>
</Drives>
My script collects the list of XML files without problems. The function below is meant to print the relevant values. I'm trying to do this as suggested in this post, but I'm clearly doing something wrong, because I get an error saying that the elm object has no attribute text. Any help would be appreciated.
Current Code
from lxml import etree as ET
def read_files(files):
    for fi in files:
        doc = ET.parse(fi)
        elm = doc.find('userName')
        print elm.text
doc.find looks for a tag with the given name. You are looking for an attribute with the given name.
elm.text is giving you an error because doc.find doesn't find any tags, so it returns None, which has no text property.
Read the lxml.etree docs some more, and then try something like this:
doc = ET.parse(fi)
root = doc.getroot()
prop = root.find(".//Properties") # finds the first <Properties> tag anywhere
elm = prop.attrib['userName']
userName is an attribute, not an element. Attributes don't have text nodes attached to them at all.
for el in doc.xpath('//*[@userName]'):
    print(el.attrib['userName'])
You can try to take the element using the tag name and then try to take its attribute (userName is an attribute for Properties):
from xml.dom import minidom  # getElementsByTagName is a minidom method, not an lxml one
def read_files(files):
    for fi in files:
        doc = minidom.parse(fi)
        props = doc.getElementsByTagName('Properties')
        elm = props[0].attributes['userName']  # an Attr node
        print(elm.value)
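For comparison, a sketch of the same attribute lookup with the standard library's ElementTree, run against a shortened copy of the XML from the question. The helper name read_user_names is made up for this example.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the document from the question.
XML = r"""<Drives clsid="{8FDDCC1A-0C3C-43cd-A6B4-71A6DF20DA8C}" disabled="1">
  <Drive name="S:" status="S:">
    <Properties action="U" userName="" cpassword=""
                path="\\scratch" label="SCRATCH" letter="S"/>
  </Drive>
</Drives>"""


def read_user_names(root):
    """Return the userName attribute of every <Properties> element under root."""
    # .get() returns None instead of raising when the attribute is absent.
    return [prop.get("userName") for prop in root.iter("Properties")]


root = ET.fromstring(XML)
user_names = read_user_names(root)
labels = [prop.get("label") for prop in root.iter("Properties")]
print(user_names, labels)
```

In the sample, userName is an empty string, so an empty result here does not mean the lookup failed; the label attribute is read the same way to show a non-empty value.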
I've been trying to do this for a while now.
Basically, I have an XML document in the following format (it contains the information I need: the ID and coordinates of some points):
<root>
<!-- Title element missing here -->
<Table>
<Point>
<ID>Point1</ID>
<latitude>numbers</latitude>
<longitude>numbers</longitude>
</Point>
</Table> <!-- This line should be eliminated -->
<Table> <!-- This line should be eliminated -->
<Point>
<ID>Point2</ID>
<latitude>numbers</latitude>
<longitude>numbers</longitude>
</Point>
</Table>
</root>
What I need to do is take this document and output it in a different format (as marked in the comments in the XML above), without changing the original XML file.
I wrote the following code for this task, but I hit a brick wall, so to speak. I am also rather new to Python.
from lxml import etree
import xml.etree.ElementTree as ET
doc=etree.parse('test2.xml')
root=doc.getroot()
elements=root.findall(".//Point")
root=ET.Element('root')
title=ET.SubElement(root,'Title')
title.text="Title"
table=ET.SubElement(root,'Table')
for element in elements:
    point=ET.SubElement(table,'Point')
    elem=ET.SubElement(point,'ID')
    elem.text="Name"
    elem2=ET.SubElement(point,'latitude')
    elem2.text="coords"
    elem3=ET.SubElement(point,'longitude')
    elem3.text="coords"
ET.dump(root) # using ET.dump just to display the output in the python SHELL
The code above gives me the following output in SHELL, which is what i need.
<root>
<Title>Title</Title>
<Table>
<Point>
<ID>Name</ID>
<latitude>coords</latitude>
<longitude>coords</longitude>
</Point>
<Point>
<ID>Name</ID>
<latitude>coords</latitude>
<longitude>coords</longitude>
</Point>
</Table>
</root>
My problem comes when I have to take the values of ID, latitude and longitude from the original XML file and write the whole new document to a new XML file, with pretty_print as well, for easier reading. I simply can't figure it out. Some tips would be greatly appreciated.
If you just want to copy the Point element from the original XML, you can just do:
from copy import deepcopy
for element in elements:
    table.append(deepcopy(element))
If you want to manipulate the values in some way, you can iterate over the element:
point=ET.SubElement(table,'Point')
for subelement in element:
    elem = ET.SubElement(point, subelement.tag)
    if elem.tag == 'ID':
        elem.text = dowhatyouwantwith(subelement.text)
    elif ....
Also, do you really need to use both lxml.etree and xml.etree at the same time? Why don't you just pick one of them and stick to it?
You can use ElementTree.write() and xml.dom.minidom to achieve what you want.
(Considering we do not use lxml and use only standard Python's ElementTree)
Just extending your code:
import xml.etree.ElementTree as ET
import xml.dom.minidom
doc=ET.parse('test2.xml')
root=doc.getroot()
elements=root.findall(".//Point")
root=ET.Element('root')
title=ET.SubElement(root,'Title')
title.text="Title"
table=ET.SubElement(root,'Table')
for element in elements:
    point=ET.SubElement(table,'Point')
    elem=ET.SubElement(point,'ID')
    elem.text="Name"
    elem2=ET.SubElement(point,'latitude')
    elem2.text="coords"
    elem3=ET.SubElement(point,'longitude')
    elem3.text="coords"
ET.dump(root) # using ET.dump just to display the output in the python SHELL
tree = ET.ElementTree(root)
tree.write('test3.xml') # This is enough but not yet pretty-print
# Using xml.dom.minidom to parse the non-pretty file to make it pretty
a = xml.dom.minidom.parse('test3.xml')
pretty_xml_as_string = a.toprettyxml()
with open('test3.xml', 'w') as f:
    f.write(pretty_xml_as_string) # Write again in pretty-print format
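The code above still writes the placeholder strings "Name" and "coords". A hedged variation that copies the real values from each original Point and pretty-prints in memory (so the file is written only once) might look like this. The sample coordinates are invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Stand-in for test2.xml; the coordinate values are made up.
SOURCE = """<root>
  <Table><Point><ID>Point1</ID><latitude>1.0</latitude><longitude>2.0</longitude></Point></Table>
  <Table><Point><ID>Point2</ID><latitude>3.0</latitude><longitude>4.0</longitude></Point></Table>
</root>"""

old_root = ET.fromstring(SOURCE)

new_root = ET.Element("root")
ET.SubElement(new_root, "Title").text = "Title"
table = ET.SubElement(new_root, "Table")

for old_point in old_root.findall(".//Point"):
    point = ET.SubElement(table, "Point")
    for child in old_point:
        # Copy tag and text instead of writing placeholder strings.
        ET.SubElement(point, child.tag).text = child.text

# Serialize in memory, then let minidom add the indentation.
pretty = minidom.parseString(ET.tostring(new_root)).toprettyxml(indent="  ")
print(pretty)
```

To save the result, open the target file for writing and call f.write(pretty), as in the last two lines of the code above. The original tree (old_root) is never modified.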