How to process xml response from flickr - python

import flickrapi
from xml.etree import ElementTree as ET
from lxml import etree
flickr = flickrapi.FlickrAPI(api_key,secret=api_secret)
r = flickr.photos_search(tags='e-waste', has_geo="1", per_page='100')
tree = ET.ElementTree(r)
xml_input = etree.parse("response_clean.xml")
transform = etree.XSLT(xslt_root)
links = str(transform(xml_input))
The idea of this little script is to get an XML response from Flickr and then process it further with an XSL file.
I want to convert the r object (which is of type lxml.etree._Element)
to xml_input (of type lxml.etree._ElementTree).
I used tree = ET.ElementTree(r), but the result is of type xml.etree.ElementTree.ElementTree.
I see that this is not exactly the same, but I don't understand the difference.
How should r be converted to xml_input?

The code creates an xml.etree.ElementTree.ElementTree because ET in the corresponding import statement refers to xml.etree.ElementTree. You should have used etree.ElementTree instead, which was imported from lxml:
>>> from xml.etree import ElementTree as ET
>>> from lxml import etree
>>> raw ='''<root></root>'''
>>> r = etree.fromstring(raw)
>>> root = etree.ElementTree(r)
>>> type(r)
<type 'lxml.etree._Element'>
>>> type(root)
<type 'lxml.etree._ElementTree'>
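To tie this back to the XSLT step in the question, here is a minimal sketch of the whole pipeline using lxml only. The response element and the stylesheet are made-up stand-ins (the question shows neither the real Flickr payload nor xslt_root), so treat the element names and the stylesheet as assumptions.

```python
from lxml import etree

# Hypothetical stylesheet: prints the id attribute of every <photo>, one per line.
xslt_root = etree.XML(
    '<xsl:stylesheet version="1.0" '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    '<xsl:output method="text"/>'
    '<xsl:template match="/">'
    '<xsl:for-each select="//photo">'
    '<xsl:value-of select="@id"/><xsl:text>&#10;</xsl:text>'
    '</xsl:for-each>'
    '</xsl:template>'
    '</xsl:stylesheet>'
)

# Stand-in for the element that flickr.photos_search() returns.
r = etree.fromstring(
    '<rsp stat="ok"><photos><photo id="1"/><photo id="2"/></photos></rsp>'
)

# Wrap the lxml element with lxml's own ElementTree, not xml.etree's.
tree = etree.ElementTree(r)

transform = etree.XSLT(xslt_root)
links = str(transform(tree))
print(type(tree).__name__)
print(links)
```

With this approach there is no need to write the response to a file and parse it again; the wrapped tree can be handed to the transform directly.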

Related

lxml create CDATA element

I am trying to create CDATA element as per https://lxml.de/apidoc/lxml.etree.html#lxml.etree.CDATA
The simplified version of my code looks like this:
description = ET.SubElement(item, "description")
description.text = CDATA('test')
But when I later try to convert it to string:
xml_str = ET.tostring(self.__root, xml_declaration=True).decode()
I get an exception
cannot serialize <lxml.etree.CDATA object at 0x122c30ef0> (type CDATA)
Could you advise me on what I am missing?
Here is a simple example:
import xml.etree.cElementTree as ET
from lxml.etree import CDATA
root = ET.Element('rss')
root.set("version", "2.0")
description = ET.SubElement(root, "description")
description.text = CDATA('test')
xml_str = ET.tostring(root, xml_declaration=True).decode()
print(xml_str)
lxml.etree and xml.etree are two different libraries; you should pick one and stick with it, rather than using both and trying to pass objects created by one to the other.
A working example, using lxml only:
import lxml.etree as ET
from lxml.etree import CDATA
root = ET.Element('rss')
root.set("version", "2.0")
description = ET.SubElement(root, "description")
description.text = CDATA('test')
xml_str = ET.tostring(root, xml_declaration=True).decode()
print(xml_str)
You can run this yourself at https://replit.com/#CharlesDuffy2/JovialMediumLeadership

Parsing of xml in Python

I am having an issue parsing an XML result using Python. I tried using etree.Element(text), but the error says Invalid tag name. Does anyone know whether this is actually XML, and whether there is a way to parse the result using a standard package? Thank you!
import requests, sys, json
from lxml import etree
response = requests.get("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=1593319917&report=XML")
text=response.text
print(text)
<?xml version="1.0" ?>
<ExchangeSet xmlns:xsi="https://www.w3.org/2001/XMLSchema-instance" xmlns="https://www.ncbi.nlm.nih.gov/SNP/docsum" xsi:schemaLocation="https://www.ncbi.nlm.nih.gov/SNP/docsum ftp://ftp.ncbi.nlm.nih.gov/snp/specs/docsum_eutils.xsd" ><DocumentSummary uid="1593319917"><SNP_ID>1593319917</SNP_ID><ALLELE_ORIGIN/><GLOBAL_MAFS><MAF><STUDY>SGDP_PRJ</STUDY><FREQ>G=0.5/1</FREQ></MAF></GLOBAL_MAFS><GLOBAL_POPULATION/><GLOBAL_SAMPLESIZE>0</GLOBAL_SAMPLESIZE><SUSPECTED/><CLINICAL_SIGNIFICANCE/><GENES><GENE_E><NAME>FLT3</NAME><GENE_ID>2322</GENE_ID></GENE_E></GENES><ACC>NC_000013.11</ACC><CHR>13</CHR><HANDLE>SGDP_PRJ</HANDLE><SPDI>NC_000013.11:28102567:G:A</SPDI><FXN_CLASS>upstream_transcript_variant</FXN_CLASS><VALIDATED>by-frequency</VALIDATED><DOCSUM>HGVS=NC_000013.11:g.28102568G>A,NC_000013.10:g.28676705G>A,NG_007066.1:g.3001C>T|SEQ=[G/A]|LEN=1|GENE=FLT3:2322</DOCSUM><TAX_ID>9606</TAX_ID><ORIG_BUILD>154</ORIG_BUILD><UPD_BUILD>154</UPD_BUILD><CREATEDATE>2020/04/27 06:19</CREATEDATE><UPDATEDATE>2020/04/27 06:19</UPDATEDATE><SS>3879653181</SS><ALLELE>R</ALLELE><SNP_CLASS>snv</SNP_CLASS><CHRPOS>13:28102568</CHRPOS><CHRPOS_PREV_ASSM>13:28676705</CHRPOS_PREV_ASSM><TEXT/><SNP_ID_SORT>1593319917</SNP_ID_SORT><CLINICAL_SORT>0</CLINICAL_SORT><CITED_SORT/><CHRPOS_SORT>0028102568</CHRPOS_SORT><MERGED_SORT>0</MERGED_SORT></DocumentSummary>
</ExchangeSet>
You're using the wrong method to parse your XML. The etree.Element
class is for creating a single XML element. For example:
>>> a = etree.Element('a')
>>> a
<Element a at 0x7f8c9040e180>
>>> etree.tostring(a)
b'<a/>'
As Jayvee has pointed out, to parse XML contained in a string you use
the etree.fromstring method (to parse XML content in a file you
would use the etree.parse method):
>>> response = requests.get("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=1593319917&report=XML")
>>> doc = etree.fromstring(response.text)
>>> doc
<Element {https://www.ncbi.nlm.nih.gov/SNP/docsum}ExchangeSet at 0x7f8c9040e180>
>>>
Note that because this XML document sets a default namespace, you'll
need to set namespaces properly when looking for elements. E.g., this
finds nothing and returns None:
>>> doc.find('DocumentSummary')
>>>
But this works:
>>> doc.find('docsum:DocumentSummary', {'docsum': 'https://www.ncbi.nlm.nih.gov/SNP/docsum'})
<Element {https://www.ncbi.nlm.nih.gov/SNP/docsum}DocumentSummary at 0x7f8c8e987200>
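The same namespace mapping works with the standard library's xml.etree.ElementTree, which may be relevant since the question asks for a standard package. The sketch below uses a trimmed-down copy of the response from the question rather than a live request; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the efetch response shown in the question.
xml_text = (
    '<ExchangeSet xmlns="https://www.ncbi.nlm.nih.gov/SNP/docsum">'
    '<DocumentSummary uid="1593319917">'
    '<SNP_ID>1593319917</SNP_ID>'
    '<CHRPOS>13:28102568</CHRPOS>'
    '</DocumentSummary>'
    '</ExchangeSet>'
)

# The prefix is arbitrary; only the namespace URI has to match the document.
NS = {"docsum": "https://www.ncbi.nlm.nih.gov/SNP/docsum"}

root = ET.fromstring(xml_text)

# Every path step needs the prefix, because all elements inherit the default namespace.
summary = root.find("docsum:DocumentSummary", NS)
snp_id = summary.findtext("docsum:SNP_ID", namespaces=NS)
chrpos = summary.findtext("docsum:CHRPOS", namespaces=NS)

# An unqualified name does not match a namespaced element.
unqualified = root.find("DocumentSummary")

print(snp_id, chrpos, unqualified)
```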
You can check whether the XML is well formed by trying to parse it:
import requests
from lxml import etree
response = requests.get("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=1593319917&report=XML")
text = response.text
try:
    # fromstring raises XMLSyntaxError if the text is not well-formed XML
    doc = etree.fromstring(text)
    print("valid")
except etree.XMLSyntaxError:
    print("not a valid xml")
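If you would rather avoid the lxml dependency, a similar check can be sketched with the standard library. The helper name is_well_formed is made up for this example. Note that this only tests well-formedness (the parser accepts the text); it does not validate against a schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML, False otherwise (hypothetical helper)."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Raised by the standard-library parser for malformed input.
        return False
    return True


print(is_well_formed("<a><b/></a>"))   # balanced tags
print(is_well_formed("<a><b></a>"))    # <b> is never closed
```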

How to get value from XML file?

I have that xml file, and I need only to get value from steamID64 (76561198875082603).
<profile>
<steamID64>76561198875082603</steamID64>
<steamID>...</steamID>
<onlineState>online</onlineState>
<stateMessage>...</stateMessage>
<privacyState>public</privacyState>
<visibilityState>3</visibilityState>
<avatarIcon>...</avatarIcon>
<avatarMedium>...</avatarMedium>
<avatarFull>...</avatarFull>
<vacBanned>0</vacBanned>
<tradeBanState>None</tradeBanState>
<isLimitedAccount>0</isLimitedAccount>
<customURL>...</customURL>
<memberSince>December 8, 2018</memberSince>
<steamRating/>
<hoursPlayed2Wk>0.0</hoursPlayed2Wk>
<headline>...</headline>
<location>...</location>
<realname>
<![CDATA[ THEMakci7m87 ]]>
</realname>
<summary>...</summary>
<mostPlayedGames>...</mostPlayedGames>
<groups>...</groups>
</profile>
Now I have only that code:
xml_url = f'{url}?xml=1'
then I don't know what to do.
It's fairly simple with lxml:
import lxml.html as lh
steam = """your html above"""
doc = lh.fromstring(steam)
doc.xpath('//steamid64/text()')
Output:
['76561198875082603']
Edit:
With the actual url, it's clear that the underlying data is xml; so the better way to do it is:
import requests
from lxml import etree
url = 'https://steamcommunity.com/id/themakci7m87/?xml=1'
req = requests.get(url)
doc = etree.XML(req.text.encode())
doc.xpath('//steamID64/text()')
Same output.
It is better to use the built-in XML library, ElementTree.
lxml is an external XML library that requires a separate installation.
See below:
import requests
import xml.etree.ElementTree as ET
r = requests.get('https://steamcommunity.com/id/themakci7m87/?xml=1')
if r.status_code == 200:
    root = ET.fromstring(r.text)
    steam_id_64 = root.find('./steamID64').text
    print(steam_id_64)
else:
    print('Failed to read data.')
output:
76561198875082603
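A small offline variation on the above, assuming an abridged copy of the profile XML from the question instead of a live request. It also reads the CDATA-wrapped realname, whose text arrives with surrounding whitespace, and shows findtext with a default for a tag that is absent (noSuchTag is a made-up name).

```python
import xml.etree.ElementTree as ET

# Abridged copy of the profile document from the question.
profile_xml = """<profile>
<steamID64>76561198875082603</steamID64>
<onlineState>online</onlineState>
<realname>
<![CDATA[ THEMakci7m87 ]]>
</realname>
</profile>"""

root = ET.fromstring(profile_xml)

# findtext returns the element's text directly, or None if the tag is missing.
steam_id_64 = root.findtext("steamID64")

# CDATA content is exposed as ordinary text; strip the padding around it.
real_name = (root.findtext("realname") or "").strip()

# A default avoids the AttributeError that find(...).text raises on a missing tag.
missing = root.findtext("noSuchTag", default="n/a")

print(steam_id_64, real_name, missing)
```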

XML Not Parsing in Python 2.7 with ElementTree

I have the following XML file which I get from REST API
<?xml version="1.0" encoding="utf-8"?>
<boxes>
<home id="1" name="productname"/>
<server>111.111.111.111</server>
<approved>yes</approved>
<creation>2007 handmade</creation>
<description>E-Commerce, buying and selling both attested</description>
<boxtype>
<sizes>large, medium, small</sizes>
<vendor>Some Organization</vendor>
<version>ANY</version>
</boxtype>
<method>Handmade, Handcrafted</method>
<time>2014</time>
</boxes>
I am able to get the above output, store it in a string variable, and print it to the console,
but when I pass it to ElementTree:
import base64
import urllib2
from xml.dom.minidom import Node, Document, parseString
from xml.etree import ElementTree as ET
from xml.etree.ElementTree import XML, fromstring, tostring
print outputxml ##Printing xml correctly, outputxml contains xml above
content = ET.fromstring(outputxml)
boxes = content.find('boxes')
print boxes
boxtype = boxes.find("boxes/boxtype")
Printing boxes gives None, and hence I get the error below:
boxtype = boxes.find("boxes/boxtype")
AttributeError: 'NoneType' object has no attribute 'find'
ET.fromstring returns the root element, which is already boxes, and find only searches below the element it is called on, so it cannot find boxes within itself.
boxtype = content.find("boxtype")
should be sufficient.
DEMO:
>>> import base64
>>> import urllib2
>>> from xml.dom.minidom import Node, Document, parseString
>>> from xml.etree import ElementTree as ET
>>> from xml.etree.ElementTree import XML, fromstring, tostring
>>>
>>> print outputxml ##Printing xml correctly, outputxml contains xml above
<?xml version="1.0" encoding="utf-8"?>
<boxes>
<home id="1" name="productname"/>
<server>111.111.111.111</server>
<approved>yes</approved>
<creation>2007 handmade</creation>
<description>E-Commerce, buying and selling both attested</description>
<boxtype>
<sizes>large, medium, small</sizes>
<vendor>Some Organization</vendor>
<version>ANY</version>
</boxtype>
<method>Handmade, Handcrafted</method>
<time>2014</time>
</boxes>
>>> content = ET.fromstring(outputxml)
>>> boxes = content.find('boxes')
>>> print boxes
None
>>>
>>> boxes
>>> content #note that the content is the root level node - boxes
<Element 'boxes' at 0x1075a9250>
>>> content.find('boxtype')
<Element 'boxtype' at 0x1075a93d0>
>>>
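For readers on Python 3, here is a rough equivalent of the demo, using a shortened copy of the XML from the question. It also shows a child path and a descendant search with .//, which are sketches of common follow-up lookups rather than part of the original answer.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the document from the question.
outputxml = """<?xml version="1.0" encoding="utf-8"?>
<boxes>
<home id="1" name="productname"/>
<server>111.111.111.111</server>
<boxtype>
<sizes>large, medium, small</sizes>
<vendor>Some Organization</vendor>
<version>ANY</version>
</boxtype>
<time>2014</time>
</boxes>"""

content = ET.fromstring(outputxml)

# fromstring returns the root element itself, so its tag is already 'boxes'.
root_tag = content.tag

# Searching for 'boxes' under 'boxes' finds nothing.
nested = content.find("boxes")

# Paths are relative to the element find() is called on.
vendor = content.find("boxtype/vendor").text

# './/' searches all descendants, whatever their depth.
sizes = [s.strip() for s in content.find(".//sizes").text.split(",")]

# Attributes are read with get().
home_name = content.find("home").get("name")

print(root_tag, nested, vendor, sizes, home_name)
```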

_ElementInterface instance has no attribute 'tostring'

The code below generates this error. I can't figure out why. If ElementTree has parse, why doesn't it have tostring? http://docs.python.org/library/xml.etree.elementtree.html#xml.etree.ElementTree.ElementTree
from xml.etree.ElementTree import ElementTree
...
tree = ElementTree()
node = ElementTree()
node = tree.parse(open("my_xml.xml"))
text = node.tostring()
tostring is a function of the xml.etree.ElementTree module, not a method of the confusingly similarly named xml.etree.ElementTree.ElementTree class.
from xml.etree.ElementTree import ElementTree
from xml.etree.ElementTree import tostring
tree = ElementTree()
node = tree.parse(open("my_xml.xml"))
text = tostring(node)
tostring() is actually a function of the ElementTree module not a method of the ElementTree wrapper class.
>>> import xml.etree.ElementTree as ET
>>> x = ET.fromstring('<xml><one>one</one></xml>')
>>> x
<Element xml at 7f749572f710>
>>> ET.tostring(x)
'<xml><one>one</one></xml>'
The docs you've linked to do not support the existence of an ElementTree.tostring() method.
Also, your call to tree.parse() rebinds node.
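A minimal Python 3 sketch of the same idea, assuming an in-memory document in place of my_xml.xml. In Python 3, tostring returns bytes unless encoding="unicode" is passed; the ElementTree wrapper offers write() for serializing a whole tree.

```python
import io
import xml.etree.ElementTree as ET

# In-memory stand-in for open("my_xml.xml").
tree = ET.parse(io.StringIO("<xml><one>one</one></xml>"))
root = tree.getroot()

# Module-level function: bytes by default in Python 3.
as_bytes = ET.tostring(root)

# Pass encoding="unicode" to get a str instead.
as_text = ET.tostring(root, encoding="unicode")

# The ElementTree wrapper serializes through write(), here into a bytes buffer.
buffer = io.BytesIO()
tree.write(buffer)

# Neither the wrapper nor the element has a tostring attribute.
print(hasattr(tree, "tostring"), hasattr(root, "tostring"))
print(as_text)
```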
