I am working on writing a keyword extractor in Python. I would like to use the Yahoo Content API. The question is: is there a Python 2.7 (or even 3.x) wrapper for the Yahoo Content API? I could not find one through normal searches.
In parallel, I am trying AlchemyAPI, OpenCalais, and DBpedia Spotlight. I would love to make a comparison to figure out which one to use in production.
Any guidance would be most appreciated.
Thanks
I was interested in the answer as well. This is a possible solution:
import requests

# Text to analyze, embedded into a YQL query against the contentanalysis.analyze table.
text = """
Italian sculptors and painters of the renaissance favored the Virgin Mary for inspiration
"""
payload = {'q': "select * from contentanalysis.analyze where text='{text}'".format(text=text)}

# POST the query to the public YQL endpoint and print the raw response.
r = requests.post("http://query.yahooapis.com/v1/public/yql", data=payload)
print(r.text)
According to the documentation, you can POST requests to the Yahoo Content Analysis API and get JSON back. Python has the urllib2, requests, and json libraries for that, all of which are well documented and easy to use.
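For example, here is a minimal sketch that asks YQL for JSON output and parses it directly; it assumes the format parameter described in the YQL docs:

import requests

# Minimal sketch: request JSON output from the public YQL endpoint
# (assumes the format=json parameter described in the YQL docs).
payload = {
    'q': "select * from contentanalysis.analyze where text='Italian sculptors of the renaissance'",
    'format': 'json',
}
r = requests.post("http://query.yahooapis.com/v1/public/yql", data=payload)
print(r.json())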
For example, I want to download the latest WHO PDF on COVID-19. I'm really not sure how to do this.
If you type 'who covid19 pdf' into Google, the PDF and its link come up.
I noticed that the links branch off from the main WHO domain name - maybe this can help?
Does anyone know how I can go about this?
From Python's standard library, use the urllib.request module, specifically the urlretrieve function. A succinct example, adapted from this reference:
import urllib.request

# Download the PDF and save it locally as file.pdf.
urllib.request.urlretrieve("http://randomsite.com/file.pdf", "file.pdf")
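If you are already using requests, a streamed download is another option. A sketch, with a placeholder URL:

import requests

# Stream the response to disk in chunks instead of loading it all into memory.
url = "http://randomsite.com/file.pdf"  # placeholder URL
r = requests.get(url, stream=True)
r.raise_for_status()
with open("file.pdf", "wb") as f:
    for chunk in r.iter_content(chunk_size=8192):
        f.write(chunk)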
I am trying to use the Microsoft Academic Graph API with Python to get information about authors' affiliations. However, the information provided in
https://learn.microsoft.com/en-us/azure/cognitive-services/academic-knowledge/graphsearchmethod
is not clear to me.
I have also read Microsoft Academic Graph Search - retrieving all papers from a journal within a time-frame?
I am trying with something like this:
import requests
url = "https://westus.api.cognitive.microsoft.com/academic/v1.0/graph/search"
querystring = {"mode": "json"}
payload = "{}"
response = requests.request("POST", url, data=payload, params=querystring)
print(response.text)
What should I put in "payload" to retrieve the affiliation of, for example, the author "John Doe"?
It seems you are using the wrong endpoint.
As with anything experimental, the documentation seems to be out of date.
I have had success calling https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate
These endpoints can be seen in the cognitive labs documentation.
I have yet to figure out how to retrieve academic profiles, as the query below yields no results, whereas academic.microsoft.com has loads.
https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate?expr=Composite(AA.AuN='Harry L. Anderson')&model=latest&count=10&attributes=Id,Ti,AA.AuN,E,AA.AuId
Hope this may help anyone stumbling upon this.
Update:
Here is a working query for the same author:
https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate?model=latest&count=100&expr=Composite(AA.AuN=='harry l anderson')&attributes=Id,Ti,AA.AuN,E,AA.AuId
Notice the author name has to be in lowercase.
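From Python, the same call can be sketched with requests. The header name is the standard Cognitive Services subscription-key header, and YOUR_KEY is a placeholder for a key from the Labs portal:

import requests

# Sketch of the working query above. YOUR_KEY is a placeholder for a
# Cognitive Services Labs subscription key.
headers = {"Ocp-Apim-Subscription-Key": "YOUR_KEY"}
params = {
    "model": "latest",
    "count": 100,
    "expr": "Composite(AA.AuN=='harry l anderson')",
    "attributes": "Id,Ti,AA.AuN,E,AA.AuId",
}
r = requests.get("https://api.labs.cognitive.microsoft.com/academic/v1.0/evaluate",
                 headers=headers, params=params)
print(r.json())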
There's also a tool for migrating MAG into Apache Elasticsearch ;)
https://github.com/vwoloszyn/mag2elasticsearch
I've been trying for hours using requests and urllib, and I'm lost; Google hasn't helped me either. Some tips, or really anything, would be useful. Thank you.
Goal: POST the country code and phone number, then get the mobile carrier, etc.
Problem: Nothing is printed. The variable name prints out as None.
def do_ccc(self):  # part of a bigger class
    """Phone Number Scan"""
    # prefix = input("Phone Number Prefix: ")
    # number = input("Phone Number: ")
    url = "https://freecarrierlookup.com/index.php"
    from bs4 import BeautifulSoup
    import urllib

    data = {'cc': "COUNTRY CODE",
            'phonenum': "PHONE NUMBER"}  # .encode('ascii')
    data = json.dump(data, sys.stdout)
    page = urllib.request.urlopen(url, data)
    soup = BeautifulSoup(page, 'html.parser')
    name = soup.find('div', attrs={'class': 'col-sm-6 col-md-8'})
    # ^^^ test (should print the phone number)
    print(name)
As Zags pointed out, it is not a good idea to use a website in violation of its terms of service, especially when the site offers a cheap API.
But to answer your original question:
You are using json.dump (which writes to sys.stdout and returns None) instead of json.dumps, so data ends up empty.
If you look at the page, you will see that the URL for POST requests is different: getcarrier.php instead of index.php.
You would also need to convert the str from json.dumps to bytes, and even then the site will reject your calls, since a hidden token is added to each request submitted by the website to prevent automated scraping.
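Putting those fixes together would look roughly like the sketch below. It assumes the endpoint accepts form-encoded data (check the browser's network tools to confirm), and since it does not include the hidden token, the site will most likely still reject it:

import urllib.parse
import urllib.request

# Sketch only: POST bytes to getcarrier.php instead of index.php.
# Assumes form-encoded fields; the hidden anti-scraping token is not
# included, so the request is still expected to be rejected.
data = urllib.parse.urlencode({'cc': '1', 'phonenum': '5551234567'}).encode('ascii')
page = urllib.request.urlopen("https://freecarrierlookup.com/getcarrier.php", data)
print(page.read())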
The problem is with what your code is trying to do. freecarrierlookup.com is just a website. In order to do what you want, you'd need to do web scraping, which is complicated, unmaintainable, and usually a violation of the site's terms of service.
What you need to do is find an API that provides the data you're looking for. A good API will usually have either sample code or a Python library that you can use to make requests.
I've read a lot about multipart forms, mechanize, and twill, but I couldn't figure out how to implement the code.
Using MultipartPostHandler to POST form-data with Python
First I Tried to fill the forms on
www.imagebam.com/basic-upload
I can fill in the forms, but I can't actually send the data, even when I call submit().
After looking at the source code of the page above, I realized all I need to do is POST the data with the correct content type (correct me if I'm wrong, please) to
http://www.imagebam.com/sys/upload/save
directly.
I tried to use poster.py, but I couldn't understand how it works. I can use mechanize and twill a little, but I am stuck, since this is more complex than simple form posting, I think.
So, my questions:
- How can I use poster.py (or user-created multipart form classes) to upload images to imagebam.com?
- Or is there any other alternative solution? :)
Don't rely completely on third-party libraries like mechanize. Either implement its official API in Python (see the ImageBam API), or look at pymguploader, a project developed in PyQt4 that uploads images, and then try to implement it yourself.
Mechanize is not the right tool for the task.
Implementing http://code.google.com/p/imagebam-api/ in Python is far more robust.
The examples are in PHP/curl; converting them to Python/urllib2 should be trivial.
Yes! I did it. I used this question.
Here is the code:
>>> from poster.encode import multipart_encode
>>> from poster.streaminghttp import register_openers
>>> import urllib2
>>> register_openers()
<urllib2.OpenerDirector instance at 0x02CDD828>
>>> datagen, headers = multipart_encode({"file[]": open("D:\hedef\myfile.jpg","rb"),"content_type":"1","thumb_size":"350"})
>>> request = urllib2.Request("http://www.imagebam.com/sys/upload/save", datagen, headers)
>>> print urllib2.urlopen(request).read()
Now all I need to do is use BeautifulSoup to fetch the thumbnail codes :)
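For reference, the same upload can be sketched with the requests library, which handles multipart encoding natively; the endpoint and field names are taken from the transcript above:

import requests

# Sketch: the same multipart POST using requests instead of poster.
# Endpoint and field names come from the transcript above.
with open(r"D:\hedef\myfile.jpg", "rb") as f:
    files = {"file[]": f}
    data = {"content_type": "1", "thumb_size": "350"}
    r = requests.post("http://www.imagebam.com/sys/upload/save",
                      files=files, data=data)
print(r.text)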
I was wondering how Blippy is able to get my data? It requires me to put in my bank name, bank card number and password, so is it doing a simple website scrape by logging in?
My bank, however, also requires a separate passphrase. How does it get around that?
Can urllib and such libraries be used in Python to replicate Blippy functionality?
Blippy probably uses a service like Yodlee to interface with the bank rather than simple screen scraping. Having said that, it is possible to implement similar functionality by combining the urllib, BeautifulSoup, and re (regex) modules.