I have the following XML.
I am using ElementTree library to scrape the values.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc> Test1</loc>
</url>
<url>
<loc>Test 2</loc>
</url>
<url>
<loc>Test 3</loc>
</url>
</urlset>
I need to get the values out of 'loc tag'.
Desired Output:
Test 1
Test 2
Test 3
Tried Code:
tree = ET.parse('sitemap.xml')
root = tree.getroot()
for atype in root.findall('url'):
rank = atype.find('loc').text
print (rank)
Any suggestions on where am I wrong ?
Your XML has a default namespace (http://www.sitemaps.org/schemas/sitemap/0.9) so you either have to address all your tags as:
tree = ET.parse('sitemap.xml')
root = tree.getroot()
for atype in root.findall('{http://www.sitemaps.org/schemas/sitemap/0.9}url'):
rank = atype.find('{http://www.sitemaps.org/schemas/sitemap/0.9}loc').text
print(rank)
Or to define a namespace map:
nsmap = {"ns": "http://www.sitemaps.org/schemas/sitemap/0.9"}
tree = ET.parse('sitemap.xml')
root = tree.getroot()
for atype in root.findall('ns:url', nsmap):
rank = atype.find('ns:loc', nsmap).text
print(rank)
from lxml import etree
tree = etree.parse('sitemap.xml')
for element in tree.iter('*'):
if element.text.find('Test') != -1:
print element.text
Probably isn't the most beautiful solution, but it works :)
Related
I have an issue while creating an XML-file for a SOAP Call
I am using the following code to create the XML:
from lxml import etree as ET
SOAP_NS = "URL"
ENCODE_NS = "URL2/soap-encoding"
ns_map = {'soap' : SOAP_NS, 'encodingStyle' : ENCODE_NS}
root = ET.Element(ET.QName(SOAP_NS, 'Envelope'), nsmap=ns_map)
body = ET.SubElement(root, ET.QName(SOAP_NS, 'Body'), nsmap=ns_map)
Data = ET.SubElement(body, 'Data')
Data.text="1234"
Data.set('type','import')
xml_file = ET.ElementTree(root)
xml_file.write('Test.xml', pretty_print=True)
Thus I get the following XML-file:
<soap:Envelope xmlns:soap="URL1" xmlns:encodingStyle="URL2/soap-encoding">
<soap:Body>
<Data type="import">1234</Data>
</soap:Body>
</soap:Envelope>
The first line of the XML-file I need to create has to be like that
<soap:Envelope xmlns:soap="URL1" soap:encodingStyle="URL2/soap-encoding">
<soap:Body>
<Data type="import">1234</Data>
</soap:Body>
</soap:Envelope>
How do I change the prefix/namespace of URL 2 from xmlns:encodingStyle to soap:encodingStyle or if my approach is wrong how do I add soap:encodingStyle to the envolope?
Thanks in advance
soap:encodingStyle is an attribute bound to a namespace. Add it using the set() method.
from lxml import etree as ET
SOAP_NS = "URL"
ENCODE_NS = "URL2/soap-encoding"
ns_map = {'soap' : SOAP_NS}
root = ET.Element(ET.QName(SOAP_NS, 'Envelope'), nsmap=ns_map)
root.set(ET.QName(SOAP_NS, "encodingStyle"), ENCODE_NS)
body = ET.SubElement(root, ET.QName(SOAP_NS, 'Body'), nsmap=ns_map)
Data = ET.SubElement(body, 'Data')
Data.text="1234"
Data.set('type','import')
xml_file = ET.ElementTree(root)
xml_file.write('Test.xml', pretty_print=True)
I cant get my mind around this nor working properly:
data='''<?xml version="1.0" encoding="UTF-8"?>\n<div type="docs" xml:base="/kime-api/prod/api/emi/2" xml:lang="ja" xml:id="39532e30"> <div n="0001" type="doc" xml:id="_5738d00002"></div></div>'''
parser = etree.XMLParser(resolve_entities=False, strip_cdata=False, recover=True, ns_clean=True)
# I tried with and without this following line
#data = data.replace('<?xml version="1.0" encoding="UTF-8"?>','')
XML_tree = etree.fromstring(data.encode() , parser=parser)
lang = XML_tree.xpath('.//div[#xml:lang]')
lang
lang is an empty list and there is ONE element like: xml:lang="ja" in the XML.
What am I doing wrong please?
You could just do xpath(#xml:lang).
XML_tree = etree.fromstring(data.encode() , parser=parser)
lang = XML_tree.xpath('#xml:lang')
print(lang)
Output:
['ja']
XML_tree represents the root element (the <div> with an xml:lang attribute).
If you want to get the language, use the following:
lang = XML_tree.xpath('#xml:lang')
I want to create this xml, but I don't know how to create the subelement IsAddSegments with the namespace:
<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope">
<soapenv:Body>
<ISAddSegments xmlns="http://www.blue-order.com/ma/integrationservicews/api">
<accessKey>key</accessKey>
<objectID>
<guid>guid</guid>
</objectID>
<StratumName>STRATUM</StratumName>
<Segments>
<Segment>
<Begin>00:00:00:00</Begin>
<Content>TEXT</Content>
<End>00:00:10:00</End>
</Segment>
</Segments>
</ISAddSegments>
</soapenv:Body>
</soapenv:Envelope>
This is what I have:
import xml.etree.ElementTree as ET
Envelope = ET.Element("{http://www.w3.org/2003/05/soap-envelope}Envelope")
Body = ET.SubElement(Envelope, '{http://www.w3.org/2003/05/soap-envelope}Body')
ET.register_namespace('soapenv', 'http://www.w3.org/2003/05/soap-envelope')
ISAddSegments = ET.SubElement(Body, '{http://www.blue-order.com/ma/integrationservicews/api}ISAddSegments')
...
But this creates an extra namespace in the main element and that's not what I need.
<?xml version='1.0' encoding='utf-8'?>
<soapenv:Envelope xmlns:ns1="http://www.blue-order.com/ma/integrationservicews/api" xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope">
<soapenv:Body>
<ns1:ISAddSegments>
I solved it with lxml:
from lxml import etree as etree
ns1 = 'http://www.w3.org/2003/05/soap-envelope'
ns2 = 'http://www.blue-order.com/ma/integrationservicews/api'
Envelope = etree.Element('{ns1}Envelope', nsmap = {'soapenv': ns1})
Body = etree.SubElement(Envelope, '{ns1}Body')
ISAddSegments = etree.SubElement(Body, 'ISAddSegments', nsmap = {None : ns2})
accessKey = etree.SubElement(ISAddSegments, 'accessKey')
...
Consider using a dedicated SOAP module such as suds. Then you can create a custom namespace by providing ns to Element. The value should be a tuple containing the namespace name and a url in which it's defined:
from suds.sax.element import Element
custom_namespace = ('custom_namespace', 'http://url/namespace.xsd')
element_with_custom_namespace = Element('Element', ns=custom_namespace)
print(element_with_custom_namespace)
# <custom_namespace:Element xmlns:custom_namespace="http://url/namespace.xsd"/>
Hello I am making a requests call to return order data from a online store. My issue is that once I have passed my data to a root variable the method iter is not returning the correct results. e.g. Display multiple tags of the same name rather than one and not showing the data within the tag.
I thought this was due to the XML not being correctly formatted so I formatted it by saving it to a file using pretty_print but that hasn't fixed the error.
How do I fix this? - Thanks in advance
Code:
import requests, xml.etree.ElementTree as ET, lxml.etree as etree
url="http://publicapi.ekmpowershop24.com/v1.1/publicapi.asmx"
headers = {'content-type': 'application/soap+xml'}
body = """<?xml version="1.0" encoding="utf-8"?>
<soap12:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap12="http://www.w3.org/2003/05/soap-envelope">
<soap12:Body>
<GetOrders xmlns="http://publicapi.ekmpowershop.com/">
<GetOrdersRequest>
<APIKey>my_api_key</APIKey>
<FromDate>01/07/2018</FromDate>
<ToDate>04/07/2018</ToDate>
</GetOrdersRequest>
</GetOrders>
</soap12:Body>
</soap12:Envelope>"""
#send request to ekm
r = requests.post(url,data=body,headers=headers)
#save output to file
file = open("C:/Users/Mark/Desktop/test.xml", "w")
file.write(r.text)
file.close()
#take the file and format the xml
x = etree.parse("C:/Users/Mark/Desktop/test.xml")
newString = etree.tostring(x, pretty_print=True)
file = open("C:/Users/Mark/Desktop/test.xml", "w")
file.write(newString.decode('utf-8'))
file.close()
#parse the file to get the roots
tree = ET.parse("C:/Users/Mark/Desktop/test.xml")
root = tree.getroot()
#access elements names in the data
for child in root.iter('*'):
print(child.tag)
#show orders elements attributes
tree = ET.parse("C:/Users/Mark/Desktop/test.xml")
root = tree.getroot()
for order in root.iter('{http://publicapi.ekmpowershop.com/}Order'):
out = {}
for child in order:
if child.tag in ('OrderID'):
out[child.tag] = child.text
print(out)
Elements output:
{http://publicapi.ekmpowershop.com/}Orders
{http://publicapi.ekmpowershop.com/}Order
{http://publicapi.ekmpowershop.com/}OrderID
{http://publicapi.ekmpowershop.com/}OrderNumber
{http://publicapi.ekmpowershop.com/}CustomerID
{http://publicapi.ekmpowershop.com/}CustomerUserID
{http://publicapi.ekmpowershop.com/}Order
{http://publicapi.ekmpowershop.com/}OrderID
{http://publicapi.ekmpowershop.com/}OrderNumber
{http://publicapi.ekmpowershop.com/}CustomerID
{http://publicapi.ekmpowershop.com/}CustomerUserID
Orders Output:
{http://publicapi.ekmpowershop.com/}Order {}
{http://publicapi.ekmpowershop.com/}Order {}
XML Structure after formating:
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<GetOrdersResponse xmlns="http://publicapi.ekmpowershop.com/">
<GetOrdersResult>
<Status>Success</Status>
<Errors/>
<Date>2018-07-10T13:47:00.1682029+01:00</Date>
<TotalOrders>10</TotalOrders>
<TotalCost>100</TotalCost>
<Orders>
<Order>
<OrderID>100</OrderID>
<OrderNumber>102/040718/67</OrderNumber>
<CustomerID>6910</CustomerID>
<CustomerUserID>204</CustomerUserID>
<FirstName>TestFirst</FirstName>
<LastName>TestLast</LastName>
<CompanyName>Test Company</CompanyName>
<EmailAddress>test#Test.com</EmailAddress>
<OrderStatus>Dispatched</OrderStatus>
<OrderStatusColour>#00CC00</OrderStatusColour>
<TotalCost>85.8</TotalCost>
<OrderDate>10/07/2018 14:30:43</OrderDate>
<OrderDateISO>2018-07-10T14:30:43</OrderDateISO>
<AbandonedOrder>false</AbandonedOrder>
<EkmStatus>SUCCESS</EkmStatus>
</Order>
</Orders>
<Currency>GBP</Currency>
</GetOrdersResult>
</GetOrdersResponse>
</soap:Body>
</soap:Envelope>
You need to consider the namespace when checking for tags.
>>> # Include the namespace part of the tag in the tag values that we check.
>>> tags = ('{http://publicapi.ekmpowershop.com/}OrderID', '{http://publicapi.ekmpowershop.com/}OrderNumber')
>>> for order in root.iter('{http://publicapi.ekmpowershop.com/}Order'):
... out = {}
... for child in order:
... if child.tag in tags:
... out[child.tag] = child.text
... print(out)
...
{'{http://publicapi.ekmpowershop.com/}OrderID': '100', '{http://publicapi.ekmpowershop.com/}OrderNumber': '102/040718/67'}
If you don't want the namespace prefixes in the output, you can strip them by only including that part of the tag after the } character.
>>> for order in root.iter('{http://publicapi.ekmpowershop.com/}Order'):
... out = {}
... for child in order:
... if child.tag in tags:
... out[child.tag[child.tag.index('}')+1:]] = child.text
... print(out)
...
{'OrderID': '100', 'OrderNumber': '102/040718/67'}
Here is a little xml example:
<?xml version="1.0" encoding="UTF-8"?>
<list>
<person id="1">
<name>Smith</name>
<city>New York</city>
</person>
<person id="2">
<name>Pitt</name>
</person>
...
...
</list>
Now I need all Persons with a name and city.
I tried:
#!/usr/bin/python
# coding: utf8
import xml.dom.minidom as dom
tree = dom.parse("test.xml")
for listItems in tree.firstChild.childNodes:
for personItems in listItems.childNodes:
if personItems.nodeName == "name" and personItems.nextSibling == "city":
print personItems.firstChild.data.strip()
But the ouput is empty. Without the "and" condition I become all names. How can I check that the next tag after "name" is "city"?
You can do this in minidom:
import xml.dom.minidom as minidom
def getChild(n,v):
for child in n.childNodes:
if child.localName==v:
yield child
xmldoc = minidom.parse('test.xml')
person = getChild(xmldoc, 'list')
for p in person:
for v in getChild(p,'person'):
attr = v.getAttributeNode('id')
if attr:
print attr.nodeValue.strip()
This prints id of person nodes:
1
2
use element tree check this element tree
import xml.etree.ElementTree as ET
tree = ET.parse('a.xml')
root = tree.getroot()
for person in root.findall('person'):
name = person.find('name').text
try:
city = person.find('city').text
except:
continue
print name, city
for id u can get it by id= person.get('id')
output:Smith New York
Using lxml, you can use xpath to get in one step what you need:
from lxml import etree
xmlstr = """
<list>
<person id="1">
<name>Smith</name>
<city>New York</city>
</person>
<person id="2">
<name>Pitt</name>
</person>
</list>
"""
xml = etree.fromstring(xmlstr)
xp = "//person[city]"
for person in xml.xpath(xp):
print etree.tostring(person)
lxml is external python package, but is so useful, that to me it is always worth to install.
xpath is searching for any (//) element person having (declared by content of []) subelement city.