Python program to search videos using Bing

I have been trying to search for videos using the Bing search engine, but every time I try I get the error HTTPError: HTTP Error 403: Forbidden. Here is my code:
import urllib
import urllib2
import json

def main():
    query = "'pyscripter'"
    print bing_search(query, 'Video')

def bing_search(query, search_type):
    # search_type: Web, Image, News, Video
    key = 'LsE7jElMmTDfbrnCEmrCmCEBbaPxMG5BvKr9CsfmSNS'
    query = urllib.quote(query)
    # create credentials for authentication
    user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; FDM; .NET CLR 2.0.50727; InfoPath.2; .NET CLR 1.1.4322)'
    credentials = (':%s' % key).encode('base64')[:-1]
    auth = 'Basic %s' % credentials
    url = 'https://api.datamarket.azure.com/Data.ashx/Bing/Search/' + search_type + '?Query=%27' + query + '%27&$top=5&$format=json'
    request = urllib2.Request(url)
    request.add_header('Authorization', auth)
    request.add_header('User-Agent', user_agent)
    request_opener = urllib2.build_opener()
    response = request_opener.open(request)
    response_data = response.read()
    json_result = json.loads(response_data)
    result_list = json_result['d']['results']
    print result_list
    return result_list

if __name__ == '__main__':
    main()
The error shown is:
Traceback (most recent call last):
File "<module1>", line 30, in <module>
File "<module1>", line 7, in main
File "<module1>", line 22, in bing_search
File "C:\Python27\lib\urllib2.py", line 410, in open
response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 523, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 448, in error
return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden
Before trying this I worked with the YouTube Search API, which worked fine, but it was limited to the videos in YouTube's database. What I want is a list of the URLs of all videos on the internet related to the keyword, so I started with the Bing search engine. Any help regarding this would be appreciated.

I had the same issue.
A web server may return a 403 Forbidden HTTP status code in response to a request from a client for a web page or resource to indicate that the server can be reached and understood the request, but refuses to take any further action. Status code 403 responses are the result of the web server being configured to deny access to the requested resource by the client, for some reason.
In my case, I had forgotten to activate the "Bing Search" subscription, so go to https://datamarket.azure.com/dataset/bing/search and activate the "Bing Search" subscription.
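Once the subscription is active, the request above should work as-is. For reference, here is a minimal sketch of the same authenticated call using the third-party requests library, which builds the Basic-auth header for you (the account key is a placeholder, and the exact result fields depend on the Datamarket schema for each search type):

import requests

# Placeholder account key; Azure Datamarket expects it as the Basic-auth
# password, with an empty username.
key = 'YOUR_ACCOUNT_KEY'
url = 'https://api.datamarket.azure.com/Data.ashx/Bing/Search/Video'
params = {'Query': "'pyscripter'", '$top': 5, '$format': 'json'}

response = requests.get(url, params=params, auth=('', key))
response.raise_for_status()  # a 403 here usually means an inactive subscription
for result in response.json()['d']['results']:
    print result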

Related

Is 2 legged oauth possible with Rauth or alternative python 3 library?

I have been searching for a way to implement 2-legged OAuth in Python 3 to work with the Brightcloud API. They offer several code samples in Java, PHP, Ruby, and .NET C# here: https://bcws.brightcloud.com/code-samples.php. I tried following the same logic to convert the Java example into Python; however, I'm relatively new to Python and quickly came unstuck.
I tried implementing it with rauth; however, the basic setup uses a request_token_url, which Brightcloud does not provide. I also tried the following code, which is based on this answer:
How do I send a POST using 2-legged oauth2 in python?
import time            # Unix timestamp for oauth_timestamp
import urllib.parse    # for URL encoding
import urllib.request
import oauth2

# construct request url
base_url = "http://thor.brightcloud.com/rest"
uri_info_path = "/uris"
url = urllib.parse.quote_plus("http://www.booking.com")

# api key and secret
consumer_key = 'MY_CONSUMER_KEY'
consumer_secret = 'MY_CONSUMER_SECRET'

# construct endpoint
endpoint = base_url + uri_info_path + '/' + url

# build request
def build_request(url, method):
    params = {
        'oauth_version': "1.0",
        'oauth_nonce': oauth2.generate_nonce(),
        'oauth_timestamp': int(time.time())
    }
    consumer = oauth2.Consumer(key=consumer_key, secret=consumer_secret)
    params['oauth_consumer_key'] = consumer.key
    req = oauth2.Request(method=method, url=url, parameters=params)
    signature_method = oauth2.SignatureMethod_HMAC_SHA1()
    req.sign_request(signature_method, consumer, None)
    return req

# call
request = build_request(endpoint, 'GET')
u = urllib.request.urlopen(request.to_url())
data = u.read()
print(data)
There is a problem with this line:
u = urllib.request.urlopen(request.to_url())
Which generates the following traceback:
Traceback (most recent call last):
File "bright.py", line 37, in
u = urllib.request.urlopen(request.to_url())
File "/usr/lib/python3.5/urllib/request.py", line 163, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.5/urllib/request.py", line 472, in open
response = meth(req, response)
File "/usr/lib/python3.5/urllib/request.py", line 582, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.5/urllib/request.py", line 510, in error
return self._call_chain(*args)
File "/usr/lib/python3.5/urllib/request.py", line 444, in _call_chain
result = func(*args)
File "/usr/lib/python3.5/urllib/request.py", line 590, in >http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 401: Unauthorized
Any help would be much appreciated.
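As a point of comparison, a minimal two-legged sketch using the third-party requests_oauthlib library, which signs the request with only the consumer key and secret (the endpoint and keys are the placeholders from the question, not verified against the Brightcloud docs):

import requests
from requests_oauthlib import OAuth1

# Two-legged OAuth 1.0a: sign with the consumer credentials only, no
# access token. OAuth1 defaults to HMAC-SHA1 signing.
auth = OAuth1('MY_CONSUMER_KEY', 'MY_CONSUMER_SECRET')
endpoint = 'http://thor.brightcloud.com/rest/uris/http%3A%2F%2Fwww.booking.com'
response = requests.get(endpoint, auth=auth)
print(response.status_code, response.text)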

"HTTP Error 401: Unauthorized" when querying youtube api for playlist with python

I'm trying to write a simple Python 3 script that gets some playlist information via the YouTube API. However, I always get a 401 error, whereas it works perfectly when I enter the request string in a browser or make the request with wget. I'm relatively new to Python and I guess I'm missing some important point here.
This is my script. Of course, I actually use a real API key.
from urllib.request import Request, urlopen
from urllib.parse import urlencode

api_key = "myApiKey"
playlist_id = input('Enter playlist id: ')
output_file = input('Enter name of output file (default is playlist id): ')
if output_file == '':
    output_file = playlist_id

url = 'https://www.googleapis.com/youtube/v3/playlistItems'
params = {'part': 'snippet',
          'playlistId': playlist_id,
          'key': api_key,
          'fields': 'items/snippet(title,description,position,resourceId/videoId),nextPageToken,pageInfo/totalResults',
          'maxResults': 50,
          'pageToken': ''}
data = urlencode(params)
request = Request(url, data.encode('utf-8'))
response = urlopen(request)
content = response.read()
print(content)
Unfortunately it raises an error at response = urlopen(request):
Traceback (most recent call last):
File "gpd-helper.py", line 35, in <module>
response = urlopen(request)
File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.4/urllib/request.py", line 461, in open
response = meth(req, response)
File "/usr/lib/python3.4/urllib/request.py", line 571, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.4/urllib/request.py", line 499, in error
return self._call_chain(*args)
File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
result = func(*args)
File "/usr/lib/python3.4/urllib/request.py", line 579, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 401: Unauthorized
I looked up the documentation but couldn't find any hint. According to the docs, no authentication other than the API key is required for listing a public playlist.
After diving deeper into the Python and Google docs, I found the solution to my problem.
Python's Request object automatically creates a POST request when the data parameter is given, but the YouTube API expects a GET request with the parameters in the query string.
The solution is either to supply 'GET' for the method parameter (Python 3.4):
request = Request(url, data.encode('utf-8'), method='GET')
or to concatenate the URL with the urlencoded parameters:
request = Request(url + '?' + data)
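For comparison, the third-party requests library encodes params into the query string by default, so it issues a GET with no extra arguments (reusing the params dict from the script above):

import requests

# requests puts params in the query string, so this is a GET request
response = requests.get('https://www.googleapis.com/youtube/v3/playlistItems',
                        params=params)
print(response.json())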

Logging into a Coursera account Using Python

I have learned a lot from MOOCs, so I wanted to give something back. For this purpose I am thinking of designing a small app in Kivy, which requires a Python implementation. What I want to achieve is to log in to my Coursera account programmatically and collect information about the courses I am currently pursuing. For this I first have to log in to Coursera (https://accounts.coursera.org/signin?post_redirect=https%3A%2F%2Fwww.coursera.org%2F). Searching the web, I came across this piece of code:
import urllib2, cookielib, urllib

username = "abcdef#abcdef.com"
password = "uvwxyz"
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username': username, 'password': password})
info = opener.open("https://accounts.coursera.org/signin", login_data)
for line in info:
    print line
and some similar code as well, but nothing worked for me; every approach led to this type of error:
Traceback (most recent call last):
File "C:\Python27\Practice\web programming\coursera login.py", line 9, in <module>
info = opener.open("https://accounts.coursera.org/signin",login_data)
File "C:\Python27\lib\urllib2.py", line 410, in open
response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 523, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 448, in error
return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 404: Not Found
Is the error due to the HTTPS protocol, or is there something I am missing?
I don't want to use any 3rd-party libraries.
I'm using requests for this purpose and I think it is a great Python library. Here is some example code showing how it could work:
import requests
from requests.auth import HTTPBasicAuth

credentials = HTTPBasicAuth('username', 'password')
response = requests.get("https://accounts.coursera.org/signin", auth=credentials)
print response.status_code
# if everything was fine then it prints:
# 200
Here is the link to requests:
http://docs.python-requests.org/en/latest/
I think you need to use the HTTPBasicAuthHandler module of urllib2. Check the 'Basic Authentication' section of https://docs.python.org/2/howto/urllib2.html.
And I strongly recommend the requests module; it will make your code better. http://docs.python-requests.org/en/latest/
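A minimal sketch of that HTTPBasicAuthHandler approach, following the 'Basic Authentication' section of the linked howto (note this only helps if the server really uses HTTP Basic auth; Coursera's sign-in page is a web form, so treat this as illustrative):

import urllib2

password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
# None as the realm means these credentials apply to any realm at this URL
password_mgr.add_password(None, 'https://accounts.coursera.org/',
                          'username', 'password')
handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(handler)
response = opener.open('https://accounts.coursera.org/signin')
print response.read()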

python automatic re-accessing without changing cookie

I have a problem accessing a specific web site.
The site automatically redirects to a check page that displays "check your Browser".
The check page returns an HTTP 503 error the first time.
The web browser (Chrome, IE, etc.) then automatically requests the page again.
Finally I can get into the web site.
The problem is that I want to access the site in Python.
So I used both urllib and urllib2.
u = urllib.urlopen(url)
print u.read()
It's the same with urllib2, except that urllib2 raises the 503 as an error.
urllib also gets the HTTP 503 code, but it doesn't raise an error.
So I need to re-access without changing the cookie:
u = urllib.urlopen(url)
u = urllib.urlopen(url)  # cookie is changed
print u.read()
Simply put, I tried calling the open function twice, but the cookie changes and it doesn't work (I get the check page again).
So I used urllib2 with cookielib:
import os.path
import cookielib
import urllib2

cj = cookielib.LWPCookieJar()
if os.path.isfile('cookie.lpw'):
    cj.load('cookie.lpw')
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)

theurl = url
txdata = None
txheaders = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
req = urllib2.Request(theurl, txdata, txheaders)
handle = urllib2.urlopen(req)  # error raised here
The error raised:
Traceback (most recent call last):
File "<pyshell#20>", line 1, in <module>
handle = urlopen(req)
File "C:\Python27\lib\urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 410, in open
response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 523, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 448, in error
return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 503: Service Temporarily Unavailable
Simply put, I want to re-access the site after getting the HTTP 503 error, without changing cookies.
But I don't know how to do it.
Could somebody please help me?
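One possible approach, sketched below: keep a single opener with one cookie jar and retry on 503. HTTPCookieProcessor stores cookies from the 503 response before the error is raised, so the retry carries them. Note that if the check page requires JavaScript, as many browser checks do, plain urllib2 may still not get through:

import time
import cookielib
import urllib2

# One opener with one cookie jar: retries reuse the same cookies.
cj = cookielib.LWPCookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

def fetch_with_retry(url, retries=3, delay=2):
    # Retry on 503 with the same opener, so cookies set by the check
    # page are kept across attempts instead of being reset.
    req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'})
    last_error = None
    for attempt in range(retries):
        try:
            return opener.open(req)
        except urllib2.HTTPError as e:
            if e.code != 503:
                raise
            last_error = e  # cookies from the 503 are already in cj
            time.sleep(delay)
    raise last_error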

Intermittent DownloadError Application Error 2 on Google App Engine

We have two applications that are both running on Google App Engine. App1 makes requests to app2 as an authenticated user. The authentication works by requesting an authentication token from Google ClientLogin that is exchanged for a cookie. The cookie is then used for subsequent requests (as described here). App1 runs the following code:
class AuthConnection:
    def __init__(self):
        self.cookie_jar = cookielib.CookieJar()
        self.opener = urllib2.OpenerDirector()
        self.opener.add_handler(urllib2.ProxyHandler())
        self.opener.add_handler(urllib2.UnknownHandler())
        self.opener.add_handler(urllib2.HTTPHandler())
        self.opener.add_handler(urllib2.HTTPRedirectHandler())
        self.opener.add_handler(urllib2.HTTPDefaultErrorHandler())
        self.opener.add_handler(urllib2.HTTPSHandler())
        self.opener.add_handler(urllib2.HTTPErrorProcessor())
        self.opener.add_handler(urllib2.HTTPCookieProcessor(self.cookie_jar))
        self.headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; ' +
                                      'Windows NT 6.1; en-US; rv:1.9.1.2) ' +
                                      'Gecko/20090729 Firefox/3.5.2 ' +
                                      '(.NET CLR 3.5.30729)'}

    def fetch(self, url, method, payload=None):
        self.__updateJar(url)
        request = urllib2.Request(url)
        request.get_method = lambda: method
        for key, value in self.headers.iteritems():
            request.add_header(key, value)
        response = self.opener.open(request)
        return response.read()

    def __updateJar(self, url):
        cache = memcache.Client()
        cookie = cache.get('auth_cookie')
        if cookie:
            self.cookie_jar.set_cookie(cookie)
        else:
            cookie = self.__getCookie(url=url)
            cache.set('auth_cookie', cookie, 5000)

    def __getCookie(self, url):
        auth_url = 'https://www.google.com/accounts/ClientLogin'
        auth_data = urllib.urlencode({'Email': USER_NAME,
                                      'Passwd': PASSPHRASE,
                                      'service': 'ah',
                                      'source': 'app1',
                                      'accountType': 'HOSTED_OR_GOOGLE'})
        auth_request = urllib2.Request(auth_url, data=auth_data)
        auth_response_body = self.opener.open(auth_request).read()
        auth_response_dict = dict(x.split('=')
                                  for x in auth_response_body.split('\n') if x)
        cookie_args = {}
        cookie_args['continue'] = url
        cookie_args['auth'] = auth_response_dict['Auth']
        cookie_url = 'https://%s/_ah/login?%s' % \
                     ('app2.appspot.com', urllib.urlencode(cookie_args))
        cookie_request = urllib2.Request(cookie_url)
        for key, value in self.headers.iteritems():
            cookie_request.add_header(key, value)
        try:
            self.opener.open(cookie_request)
        except:
            pass
        for cookie in self.cookie_jar:
            if cookie.domain == 'app2domain':
                return cookie
For 10-30% of the requests a DownloadError is raised:
Error fetching https://app2/Resource
Traceback (most recent call last):
File "/base/data/home/apps/app1/5.344034030246386521/source/main/connection/authenticate.py", line 112, in fetch
response = self.opener.open(request)
File "/base/python_runtime/python_dist/lib/python2.5/urllib2.py", line 381, in open
response = self._open(req, data)
File "/base/python_runtime/python_dist/lib/python2.5/urllib2.py", line 399, in _open
'_open', req)
File "/base/python_runtime/python_dist/lib/python2.5/urllib2.py", line 360, in _call_chain
result = func(*args)
File "/base/python_runtime/python_dist/lib/python2.5/urllib2.py", line 1115, in https_open
return self.do_open(httplib.HTTPSConnection, req)
File "/base/python_runtime/python_dist/lib/python2.5/urllib2.py", line 1080, in do_open
r = h.getresponse()
File "/base/python_runtime/python_dist/lib/python2.5/httplib.py", line 197, in getresponse
self._allow_truncated, self._follow_redirects)
File "/base/data/home/apps/app1/5.344034030246386521/source/main/connection/monkeypatch_urlfetch_deadline.py", line 18, in new_fetch
follow_redirects, deadline, *args, **kwargs)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/urlfetch.py", line 241, in fetch
return rpc.get_result()
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 501, in get_result
return self.__get_result_hook(self)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/urlfetch.py", line 325, in _get_fetch_result
raise DownloadError(str(err))
DownloadError: ApplicationError: 2
The request logs for app2 (the "server") seem fine, as expected (according to the docs DownloadError is only raised if there was no valid HTTP response).
Why is the exception raised?
See this: http://bitbucket.org/guilin/gae-rproxy/src/tip/gae_rproxy/niceurllib.py
This happens because urllib and urllib2 handle the HTTP 302 code by default and automatically redirect to wherever the server points. But the redirected request does not contain the cookie the server just set.
For example:
1. urllib2 requests //server/login.
2. The server responds 302 to //server/profile, with Set-Cookie: session-id=xxxx.
3. urllib2 requests //server/profile.
4. The server responds with a "not logged in" or 500 error, because no session-id is found.
5. urllib2 throws an error.
So there is no chance for you to set the cookie.
self.opener.add_handler(urllib2.HTTPRedirectHandler())
I think you should remove this line and add your own HTTPRedirectHandler that neither throws an error nor automatically redirects, but just returns the HTTP code and headers, so you have the chance to set the cookie.
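A minimal sketch of such a handler, assuming Python 2's urllib2 (the URL is a placeholder): it hands the 3xx response back to the caller instead of following it, and since HTTPCookieProcessor runs before the redirect is dispatched, the Set-Cookie values are already in the jar when you decide where to go next.

import cookielib
import urllib2

class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    # Return 3xx responses to the caller instead of following them.
    def http_error_302(self, req, fp, code, msg, headers):
        return fp
    http_error_301 = http_error_303 = http_error_307 = http_error_302

cj = cookielib.CookieJar()
opener = urllib2.build_opener(NoRedirectHandler(),
                              urllib2.HTTPCookieProcessor(cj))
response = opener.open('https://app2.appspot.com/_ah/login')
if response.code in (301, 302, 303, 307):
    # cookies from the redirect response are already in cj;
    # now follow the redirect manually, carrying those cookies
    response = opener.open(response.headers['Location'])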
