Reading an XML response and printing the required data in Python

My code returns XML data as its output, and now I want to get an element's value from that XML.
I used the following commands:
data1 = r1.read()
dom = xml.dom.minidom.parseString(data1)
conference=dom.getElementsByTagName('totalResults')
print conference.nodeValue
But I was unable to get the value.
My XML looks like this:
<first:totalresults>100</first:totalresults>
...and so on.
I want the value 100 to be printed.
Can anyone help me solve this? I have been trying since last night.

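I'd recommend using lxml's etree for easier XML parsing. Replace 'http://api.com' with the namespace URI that your response declares for the first: prefix, otherwise the XPath will not match:
from lxml import etree
with open("file.xml", "rb") as myFile:
    tree = etree.parse(myFile)
# text() returns the element's text instead of the element objects
data = tree.xpath('//ns:totalresults/text()', namespaces={'ns': 'http://api.com'})
print data[0]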
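If you want to stay with minidom, as in the question, the original attempt fails for a few reasons: getElementsByTagName() returns a NodeList rather than a single node, it matches the qualified tag name (here first:totalresults) case-sensitively, and the number is stored in the element's child Text node rather than on the element itself. The sketch below is Python 3 and assumes a hypothetical response body that declares xmlns:first="http://api.com"; substitute your real data and namespace URI.

```python
import xml.dom.minidom

# Hypothetical response body; the namespace URI "http://api.com" is an
# assumption and must match the xmlns:first declaration in your real data.
data1 = b'<root xmlns:first="http://api.com"><first:totalresults>100</first:totalresults></root>'

dom = xml.dom.minidom.parseString(data1)

# Matching is case-sensitive and uses the prefixed name, so this finds nothing.
wrong = dom.getElementsByTagName('totalResults')
print(len(wrong))

# getElementsByTagName returns a NodeList; take the first element, then
# read the value from its first child, which is a Text node.
nodes = dom.getElementsByTagName('first:totalresults')
value = nodes[0].firstChild.nodeValue
print(value)

# Namespace-aware alternative: match by namespace URI and local name.
same = dom.getElementsByTagNameNS('http://api.com', 'totalresults')[0].firstChild.nodeValue
print(same)
```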

Related

How to extract some text from json file without loading it?

Python's lxml can be used to extract text (e.g., with XPath) from XML files without having to fully parse the XML. For example, I can do the following, which is faster than BeautifulSoup, especially for large input. I'd like to have some equivalent code for JSON.
from lxml import etree
tree = etree.XML('<foo><bar>abc</bar></foo>')
print type(tree)
r = tree.xpath('/foo/bar')
print [x.tag for x in r]
I have seen http://goessner.net/articles/JsonPath/, but I don't see any example Python code that extracts text from a JSON file without having to use json.load(). Could anybody show me an example? Thanks.
I'm assuming you don't want to load the entire JSON document for performance reasons.
If that's the case, ijson may be what you need. I used it to search huge JSON files (>8 GB) and it works well.
However, you will have to implement the search code yourself.

Scraping certain parts of a website [Python]

Let's say we have a website www.example.com
and I need five specific elements from it. I have found every element and extracted them using BeautifulSoup:
g_data1 = soup.find_all("td", {"class": "title"})
for item in g_data1:
    try:
        print item.****[3].text
    except:
        pass
Now I have to save this information in a CSV file named ****.csv.
This is my code for trying to save it in the CSV file:
def save_csv(f, tvseries):
    '''
    Output a CSV file containing highest ranking TV-series.
    '''
    import urllib2
    url = *example url*
    response = urllib2.urlopen(url)
    with open('****.csv', 'w') as f:
        f.write(response.read())
I'm getting the entire HTML page, obviously because I told it to fetch the URL. Can someone explain a different approach? I don't really understand how to do it.
With kind regards,
1337
You should be using Python's csv module.
Specifically, csv.writer.
Take the text items you grabbed using BeautifulSoup and write them into the CSV file.
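A minimal Python 3 sketch of that suggestion. The function name write_titles_csv, the output file name and the sample titles are hypothetical; in the question's code the list would hold the .text values collected from the BeautifulSoup results instead of the raw HTML returned by urlopen().

```python
import csv


def write_titles_csv(path, titles):
    """Write one scraped title per row, preceded by a header row.

    `titles` is assumed to be a list of strings, e.g. the .text values of
    the <td class="title"> cells found with BeautifulSoup.
    """
    # newline='' lets the csv module control line endings (no blank rows on Windows)
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['title'])
        for title in titles:
            writer.writerow([title.strip()])


# Hypothetical data standing in for the scraped titles.
titles = ['Series A ', 'Series B', ' Series C']
write_titles_csv('tvseries.csv', titles)
```

Each call to writerow() takes a list, so adding more columns later (rank, rating, and so on) only means putting more items in that list.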

Parsing XML from API response

I've been trying for some hours to grab the response from the imgur API. I got the XML in the terminal, but I don't know how to grab it and parse it. Here's my code.
import pycurl
c = pycurl.Curl()
values = [
("key", "Super Secret API Number"),
("image", (c.FORM_FILE, "pic.jpg"))]
c.setopt(c.URL, "http://api.imgur.com/2/upload.xml")
c.setopt(c.HTTPPOST, values)
c.perform()
c.close()
I'm a complete beginner with Python; this is my first time using it. I read that you can parse the XML with ElementTree, but I can't find any good documentation.
Hope you can help me. Thanks.
Store the response from the Imgur API in a file (or in memory), then use an XML parser to parse the response you get back.
There are lots of options available, such as lxml or BeautifulSoup.
Here is an example of how to use lxml with XPath expressions.
>>> from lxml import etree
>>> xml = """<foo>baz!</foo>"""
>>> xp = etree.fromstring(xml)
>>> values = xp.xpath("//foo/text()")
>>> values
['baz!']
If you need to parse an XML file:
# parse from file
et = etree.parse(source_xml)
value = et.xpath("your xpath xpr here")
If you need to parse directly from a URL:
# parse from URL
etree.parse("http://example.com/somefile.xml")
To work out XPath expressions, use Firefox's Firebug extension or install FirePath.
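To parse the response without saving it to a file first, you can keep it in memory and hand it to the standard library's ElementTree. This Python 3 sketch assumes a made-up XML body (it is not Imgur's actual response schema); the comment shows how pycurl's WRITEDATA option could capture the body into a buffer instead of printing it to the terminal.

```python
import xml.etree.ElementTree as ET


def parse_response(body):
    """Parse an XML response body (bytes or str) held in memory."""
    return ET.fromstring(body)


# With pycurl you could capture the body instead of printing it, e.g.:
#   buf = io.BytesIO(); c.setopt(c.WRITEDATA, buf); c.perform()
#   root = parse_response(buf.getvalue())
# The XML below is a hypothetical stand-in, not Imgur's real schema.
body = b"""<upload>
  <image><hash>abc123</hash></image>
  <links><original>http://example.com/abc123.jpg</original></links>
</upload>"""

root = parse_response(body)
# find()/findtext() accept a limited XPath subset: child paths and .// searches.
original_link = root.findtext('links/original')
image_hash = root.findtext('.//hash')
print(original_link, image_hash)
```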
When I started using the included ElementTree module I found the documentation lacking good examples (currently there are only 3, and only one of those shows anything immediately practical).
I've answered a couple of questions here on SO related to lxml/ElementTree, and I usually see people getting stuck trying to write these weird list comprehensions to deal with something XPath handles in one line much more clearly:
Parsing lxml.etree._Element contents
lxml classic: Get text content except for that of nested tags?
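As an illustration of that difference, here is a small sketch (assuming lxml is installed) that collects only the text directly inside an element, skipping the text of nested tags: first with a one-line XPath text() query, then with the equivalent manual approach using .text and .tail.

```python
from lxml import etree

root = etree.fromstring('<p>Hello <b>bold</b> world</p>')

# XPath: the text nodes that are direct children of <p>, nothing from <b>.
direct = root.xpath('text()')

# Manual equivalent: the element's leading .text plus each child's .tail,
# skipping the None values lxml uses when there is no text.
manual = ([root.text] if root.text else []) + [child.tail for child in root if child.tail]

print(direct, manual)
```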
If you have a more specific question, please post some source XML and desired effect.
I hope this helps.

In Python - Parsing an XML response and finding a specific text value

I'm new to Python and I'm having a particularly difficult time working with XML in Python. Here is my situation: I'm trying to count the number of times a word appears in an XML document. Simple enough, but the XML document is a response from a server. Is it possible to do this without writing to a file? It would be great to do it in memory.
Here is a sample xml code:
<xml>
<title>Info</title>
<foo>aldfj</foo>
<data>Text I want to count</data>
</xml>
Here is what I have in python
import urllib2
import StringIO
import xml.dom.minidom
from xml.etree.ElementTree import parse
usock = urllib.urlopen('http://www.example.com/file.xml')
xmldoc = minidom.parse(usock)
print xmldoc.toxml()
Past this point I have tried using StringIO, ElementTree, and minidom without success, and I'm not sure what else to do.
Any help would be greatly appreciated.
It's quite simple, as far as I can tell:
import urllib2
from xml.dom import minidom
usock = urllib2.urlopen('http://www.example.com/file.xml')
xmldoc = minidom.parse(usock)
for element in xmldoc.getElementsByTagName('data'):
    print element.firstChild.nodeValue
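So to count the occurrences of a string, try this (a bit condensed, but I like one-liners). Note that str.count() returns the number of matches, whereas str.find() would only return the position of the first one:
count = sum(element.firstChild.nodeValue.count('substring') for element in xmldoc.getElementsByTagName('data'))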
If you are just trying to count the number of times a word appears in an XML document, just read the document as a string and do a count:
import urllib2
data = urllib2.urlopen('http://www.example.com/file.xml').read()
print data.count('foobar')
Otherwise, you can just iterate through the tags you are looking for:
import urllib2
from xml.etree import cElementTree as ET
xml = ET.fromstring(urllib2.urlopen('http://www.example.com/file.xml').read())
for data in xml.iter('data'):
    # do something with data.text
    print data.text
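Putting these ideas together, here is a Python 3 sketch that works entirely in memory. The body variable stands in for the bytes returned by urlopen(...).read(), and count_word is a hypothetical helper. It counts whole-word matches only in element text, which avoids a pitfall of the plain string count: searching for a word such as data would also match the tag names.

```python
import re
import xml.etree.ElementTree as ET

# Stand-in for urlopen(...).read(); nothing is written to disk.
body = b"""<xml>
<title>Info</title>
<foo>aldfj</foo>
<data>Text I want to count</data>
</xml>"""

root = ET.fromstring(body)


def count_word(root, word, tag=None):
    """Count whole-word matches of `word` in the document's text.

    With tag=None every text node is searched (via itertext()); otherwise
    only the direct .text of elements with that tag is searched.
    """
    pattern = re.compile(r'\b%s\b' % re.escape(word))
    if tag is None:
        texts = root.itertext()
    else:
        texts = (el.text or '' for el in root.iter(tag))
    return sum(len(pattern.findall(t)) for t in texts)


print(count_word(root, 'count'))
print(count_word(root, 'Text', tag='data'))
# The raw byte count also sees the markup, i.e. the <data> and </data> tags.
print(body.count(b'data'), count_word(root, 'data'))
```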
Does this help ...
from xml.etree.ElementTree import XML
txt = """<xml>
<title>Info</title>
<foo>aldfj</foo>
<data>Text I want to count</data>
</xml>"""
# this will give us the contents of the data tag.
data = XML(txt).find("data").text
# ... so here we could do whatever we want
print data
Just replace the string 'count' with whatever word you want to count. If you want to count phrases, you'll have to adapt this code, since it only counts single words. In any case, the way to get at all the embedded text is XML('<your xml string here>').itertext().
from xml.etree.ElementTree import XML
from re import findall
txt = """<xml>
<title>Info</title>
<foo>aldfj</foo>
<data>Text I want to count</data>
</xml>"""
print sum(len([w for w in findall(r'\w+', t) if w == 'count']) for t in XML(txt).itertext())

Editing all text in childNodes of XML file with Python

I'm trying to edit the text inside of all of the tags named "Volume" in an XML file by multiplying that text by a number entered by the user. The text inside of the "Volume" tag will always be a number. My code works so far, but only on the first instance of the "Volume" text.
Here's an example of the XML:
<blah>
<moreblah> sometext </moreblah> ;
<blah2>
<blah3> <blah4> 30 </blah4> <Volume> 15 </Volume> </blah3>
</blah2>
</blah>
<blah>
<moreblah> sometext </moreblah> ;
<blah2>
<blah3> <blah4> 30 </blah4> <Volume> 25 </Volume> </blah3>
</blah2>
</blah>
And here's my Python code:
#import modules
import os
from xml.dom.minidom import parse
#create a backup of original file
new_file_name = 'blah.xml'
old_file_name = new_file_name + "_old"
os.rename(new_file_name, old_file_name)
#find the first instance of "Volume"
doc = parse(old_file_name)
volume = doc.getElementsByTagName('Volume')[0]
child = volume.childNodes[0]
txt = child.nodeValue
#ask for percentage input
print
percentage = raw_input("Set Volume Percentage (1 - 100): ")
#a chained test like "p <101 >1" only checks p < 101, so test both bounds explicitly
if percentage.isdigit() and 1 <= int(percentage) <= 100:
    print 'Thank You'
    #replace text of <Volume> tag
    child.nodeValue = str(int(float(txt) * (int(percentage)/100.0)))
    #persist changes to new file
    xml_file = open(new_file_name, "w")
    doc.writexml(xml_file)
    xml_file.close()
    #remove XML Declaration
    text = open("blah.xml", "r").read()
    text = text.replace('<?xml version="1.0" ?>', '')
    open("blah.xml", "w").write(text)
else:
    print
    print 'Please enter a number between 1 and 100.'
    print
    print 'Try again.'
    print
    print 'Exiting.'
    #write the unchanged document back so the original file is restored
    xml_file = open(new_file_name, "w")
    doc.writexml(xml_file)
    xml_file.close()
os.remove(old_file_name)
I know that in my code, I have "doc.getElementsByTagName('Volume')[0]" which denotes the first instance of the "Volume" tag, but I was just doing that as a test to see if it would work. So I'm aware that the code is working exactly as it should. But I'm wondering if anyone has any suggestions, or could tell me the easiest way to apply the user input percentage to all of the instances of the "Volume" tag.
This is also my first attempt at Python, so if you see anything else that seems weird, please let me know.
Thank you for your help!
You'll be much happier if you use a more modern XML API, like ElementTree (in the standard library) or lxml (more advanced).
In ElementTree or lxml you get access to XPath (or something close), which allows for a much more flexible syntax in finding elements and attributes in XML documents.
In ElementTree:
volumes = my_parsed_xml_file.findall('.//Volume')
...will find all occurrences of the Volume element (find() would return only the first match).
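A fuller ElementTree sketch (Python 3) that applies the user's percentage to every Volume element. scale_volumes is a hypothetical helper that keeps the question's arithmetic (truncating to an integer). Because the sample in the question has two top-level blah elements, which is not a well-formed document by itself, the sketch wraps them in an assumed root element.

```python
import xml.etree.ElementTree as ET


def scale_volumes(root, percentage):
    """Multiply the text of every <Volume> element by percentage/100.

    Mirrors the question's arithmetic: the result is truncated to an int.
    """
    if not 1 <= percentage <= 100:
        raise ValueError('percentage must be between 1 and 100')
    for volume in root.iter('Volume'):  # every Volume, at any depth
        value = float(volume.text)      # float() tolerates the surrounding spaces
        volume.text = str(int(value * percentage / 100.0))
    return root


# Hypothetical <root> wrapper around the two <blah> elements from the question.
sample = """<root>
<blah><blah2><blah3><blah4> 30 </blah4><Volume> 15 </Volume></blah3></blah2></blah>
<blah><blah2><blah3><blah4> 30 </blah4><Volume> 25 </Volume></blah3></blah2></blah>
</root>"""

root = scale_volumes(ET.fromstring(sample), 50)
print([v.text for v in root.iter('Volume')])
```

To work on a real (well-formed) file, the same helper could be used with tree = ET.parse('blah.xml'), scale_volumes(tree.getroot(), pct) and tree.write('blah.xml'). With its default encoding, ElementTree.write() does not emit an XML declaration, so the declaration-stripping step from the question should not be needed.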
If you stick with the current syntax, by doing:
doc.getElementsByTagName('Volume')[0]
...you're specifically asking for the zero-th (first) Volume. If you want to process them all, you want a loop:
for volume in doc.getElementsByTagName('Volume'):
    child = volume.childNodes[0]
    # ... rest of your code inside the loop
If constructs like loops are unfamiliar to you, you should probably step back and read an introductory programming guide, as things will get pretty complicated quickly without some fundamentals. Best of luck!
