Get JSON object from Python urllib2 request

Get JSON object from Python urllib2 request - python

I'm trying to use python urllib2 to access the files in a SharePoint 2013 folder using the SharePoint REST API. https://msdn.microsoft.com/en-us/library/office/dn450841.aspx
I can't seem to get the response as a json object even after adding the header 'accept':'application/json;odata=verbose'.
I am using Python 2.7.5, urllib2, and can not use the python requests module due to restrictions and permission constraints.
If I use the .../_api/web/GetFolderByServerRelativeUrl('/File Name')/Files url I get an AttributeError("'str' object has no attribute'read'"). If I just use the normal url to the site I get an error that there is not a 'JSON object'in the response.
I don't know what I am doing wrong and why I can't get at any json from this request?
import sys
import urllib
import urllib2
import json
from ntlm import HTTPNtlmAuthHandler
try:
url = 'https://sharepointsite/subfolder/subfolder/subfolder/Forms/AllItems.aspx/_api/web/GetFolderByServerRelativeUrl(/Folder Name)/Files'
user = r'DOMAIN\USER'
password = '*********'
req = urllib2.Request(url)
req.add_header('accept', 'application/json;odata=verbose')
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, user, password)
handler = HTTPNtlmAuthHandler.HTTPNtlmAuthHandler(passman)
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)
response = urllib2.urlopen(req)
data_json = json.load(response.read())
except Exception as e:
print '{0}'.format(e)

What the request actually returns is a string representation of the json object.
You can use use the json.loads() function instead of the json.load() function in order to convert the string into a json object.

Related

How to Manually craft the Authorization header for Github in Python?

Im using requests Python module to access Github's API. Github responses 404 Not Found, as expected.
This is my code:
import requests
import sys
username = sys.argv[1]
password = sys.argv[2]
url = 'https://api.github.com/{}'.format(username)
print(url)
response = requests.get(url, data={'password': password})
print(response)
How this header must be done?
I need to get the user's id!
Thanks!

https get request with python urllib2

I am trying to fetch data from quandl using urllib2.Please check code below.
import json
from pymongo import MongoClient
import urllib2
import requests
import ssl
#import quandl
codes = [100526];
for id in codes:
url = 'https://www.quandl.com.com//api/v3/datasets/AMFI/"+str(id)+".json?api_key=XXXXXXXX&start_date=2013-08-30'
req = urllib2.Request(url)
response = urllib2.urlopen(req)
data = response.read()
print data
OR
for id in codes:
url = "https://www.quandl.com.com//api/v3/datasets/AMFI/"+str(id)+".json?api_key=XXXXXXXX&start_date=2013-08-30"
request = requests.get(url,verify=False)
print request
I am getting HTTPERROR exception 404 in 1st case. and when I use request module I get SSL error even after using verify=false. I am looking through previous posts but most of them are related to HTTP request.
Thanks for help.
J

This is working for me, but you get a warning about the SSL certificate but you don't need to care about it.
import requests
codes = [100526];
for id in codes:
url = "https://www.quandl.com.com//api/v3/datasets/AMFI/"+str(id)+".json?api_key=XXXXXXXX&start_date=2013-08-30"
request = requests.get(url, verify=False)
print request.text
request.text has your response data.

You seem to be using a wrong URL (.com.com instead of .com) as well as a combination of different quotes in the first version of your code. Use the following instead and it should work:
import urllib2
import requests
codes = [100526]
for id in codes:
url = "https://www.quandl.com//api/v3/datasets/AMFI/"+str(id)+".json?start_date=2013-08-30"
req = urllib2.Request(url)
response = urllib2.urlopen(req)
print response.read()
for id in codes:
url = "https://www.quandl.com//api/v3/datasets/AMFI/"+str(id)+".json?start_date=2013-08-30"
response = requests.get(url,verify=False)
print response.text
To disable the warning about the SSL certificate, use the following code before making the request using requests:
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)

REST GET request on Python sending named argument

I am playing from Linux using curl inside bash but now I need to move my script to python in order to be more effective.
What is the best way to do something like the following line on python?
curl baseuri:port/resource -u user:pswd
I played already with urllib2 but I dont get a clue on how to send the "-u user:pswd"

You'll need to use a HTTPPasswordMgrWithDefaultRealm instance to handle the authentication, and add in a HTTPBasicAuthHandler handler to respond to the authentication challenge:
import urllib2
url = 'baseuri:port/resource'
username = 'user'
password = 'pswd'
pwmgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
pwmgr.add_password(None, url, username, password)
authhandler = urllib2.HTTPBasicAuthHandler(passman)
opener = urllib2.build_opener(authhandler)
response = opener.open(theurl)
Yes, this is a handful.
If you can install a 3rd party library, then add use the requests library; it'll be so much easier:
import requests
url = 'baseuri:port/resource'
username = 'user'
password = 'pswd'
response = requests.get(url, auth=(username, password))

Actually, we use pycurl instead which also works pretty well. Here an example to get a json from the request:
import pycurl
import json
from io import BytesIO
data = BytesIO()
pyCurl = pycurl.Curl()
pyCurl.setopt(pyCurlClass.URL,string_http)
pyCurl.setopt(pycurl.USERPWD, 'user:password')
pycurl.setopt(c.WRITEFUNCTION, data.write)
pyCurl.perform()
dictionary = json.loads(data.getvalue())

using python urlopen for a url query

Using urlopen also for url queries seems obvious. What I tried is:
import urllib2
query='http://www.onvista.de/aktien/snapshot.html?ID_OSI=86627'
f = urllib2.urlopen(query)
s = f.read()
f.close()
However, for this specific url query it fails with HTTP error 403 forbidden
When entering this query in my browser, it works.
Also when using http://www.httpquery.com/ to submit the query, it works.
Do you have suggestions how to use Python right to grab the correct response?

Looks like it requires cookies... (which you can do with urllib2), but an easier way if you're doing this, is to use requests
import requests
session = requests.session()
r = session.get('http://www.onvista.de/aktien/snapshot.html?ID_OSI=86627')
This is generally a much easier and less-stressful method of retrieving URLs in Python.
requests will automatically store and re-use cookies for you. Creating a session is slightly overkill here, but is useful for when you need to submit data to login pages etc..., or re-use cookies across a site... etc...
using urllib2 is something like
import urllib2, cookielib
cookies = cookielib.CookieJar()
opener = urllib2.build_opener( urllib2.HTTPCookieProcessor(cookies) )
data = opener.open('url').read()

It appears that the urllib2 default user agent is banned by the host. You can simply supply your own user agent string:
import urllib2
url = 'http://www.onvista.de/aktien/snapshot.html?ID_OSI=86627'
request = urllib2.Request(url, headers={"User-Agent" : "MyUserAgent"})
contents = urllib2.urlopen(request).read()
print contents

Python urllib2, basic HTTP authentication, and tr.im

I'm playing around, trying to write some code to use the tr.im
APIs to shorten a URL.
After reading http://docs.python.org/library/urllib2.html, I tried:
TRIM_API_URL = 'http://api.tr.im/api'
auth_handler = urllib2.HTTPBasicAuthHandler()
auth_handler.add_password(realm='tr.im',
uri=TRIM_API_URL,
user=USERNAME,
passwd=PASSWORD)
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
response = urllib2.urlopen('%s/trim_simple?url=%s'
% (TRIM_API_URL, url_to_trim))
url = response.read().strip()
response.code is 200 (I think it should be 202). url is valid, but
the basic HTTP authentication doesn't seem to have worked, because the
shortened URL isn't in my list of URLs (at http://tr.im/?page=1).
After reading http://www.voidspace.org.uk/python/articles/authentication.shtml#doing-it-properly
I also tried:
TRIM_API_URL = 'api.tr.im/api'
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, TRIM_API_URL, USERNAME, PASSWORD)
auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
response = urllib2.urlopen('http://%s/trim_simple?url=%s'
% (TRIM_API_URL, url_to_trim))
url = response.read().strip()
But I get the same results. (response.code is 200 and url is valid,
but not recorded in my account at http://tr.im/.)
If I use query string parameters instead of basic HTTP authentication,
like this:
TRIM_API_URL = 'http://api.tr.im/api'
response = urllib2.urlopen('%s/trim_simple?url=%s&username=%s&password=%s'
% (TRIM_API_URL,
url_to_trim,
USERNAME,
PASSWORD))
url = response.read().strip()
...then not only is url valid but it's recorded in my tr.im account.
(Though response.code is still 200.)
There must be something wrong with my code though (and not tr.im's API), because
$ curl -u yacitus:xxxx http://api.tr.im/api/trim_url.json?url=http://www.google.co.uk
...returns:
{"trimpath":"hfhb","reference":"nH45bftZDWOX0QpVojeDbOvPDnaRaJ","trimmed":"11\/03\/2009","destination":"http:\/\/www.google.co.uk\/","trim_path":"hfhb","domain":"google.co.uk","url":"http:\/\/tr.im\/hfhb","visits":0,"status":{"result":"OK","code":"200","message":"tr.im URL Added."},"date_time":"2009-03-11T10:15:35-04:00"}
...and the URL does appear in my list of URLs on http://tr.im/?page=1.
And if I run:
$ curl -u yacitus:xxxx http://api.tr.im/api/trim_url.json?url=http://www.google.co.uk
...again, I get:
{"trimpath":"hfhb","reference":"nH45bftZDWOX0QpVojeDbOvPDnaRaJ","trimmed":"11\/03\/2009","destination":"http:\/\/www.google.co.uk\/","trim_path":"hfhb","domain":"google.co.uk","url":"http:\/\/tr.im\/hfhb","visits":0,"status":{"result":"OK","code":"201","message":"tr.im URL Already Created [yacitus]."},"date_time":"2009-03-11T10:15:35-04:00"}
Note code is 201, and message is "tr.im URL Already Created [yacitus]."
I must not be doing the basic HTTP authentication correctly (in either attempt). Can you spot my problem? Perhaps I should look and see what's being sent over the wire? I've never done that before. Are there Python APIs I can use (perhaps in pdb)? Or is there another tool (preferably for Mac OS X) I can use?

This seems to work really well (taken from another thread)
import urllib2, base64
request = urllib2.Request("http://api.foursquare.com/v1/user")
base64string = base64.encodestring('%s:%s' % (username, password)).replace('\n', '')
request.add_header("Authorization", "Basic %s" % base64string)
result = urllib2.urlopen(request)

Really cheap solution:
urllib.urlopen('http://user:xxxx#api.tr.im/api')
(which you may decide is not suitable for a number of reasons, like security of the url)
Github API example:
>>> import urllib, json
>>> result = urllib.urlopen('https://personal-access-token:x-oauth-basic#api.github.com/repos/:owner/:repo')
>>> r = json.load(result.fp)
>>> result.close()

Take a look at this SO post answer and also look at this basic authentication tutorial from the urllib2 missing manual.
In order for urllib2 basic authentication to work, the http response must contain HTTP code 401 Unauthorized and a key "WWW-Authenticate" with the value "Basic" otherwise, Python won't send your login info, and you will need to either use Requests, or urllib.urlopen(url) with your login in the url, or add a the header like in #Flowpoke's answer.
You can view your error by putting your urlopen in a try block:
try:
urllib2.urlopen(urllib2.Request(url))
except urllib2.HTTPError, e:
print e.headers
print e.headers.has_key('WWW-Authenticate')

The recommended way is to use requests module:
#!/usr/bin/env python
import requests # $ python -m pip install requests
####from pip._vendor import requests # bundled with python
url = 'https://httpbin.org/hidden-basic-auth/user/passwd'
user, password = 'user', 'passwd'
r = requests.get(url, auth=(user, password)) # send auth unconditionally
r.raise_for_status() # raise an exception if the authentication fails
Here's a single source Python 2/3 compatible urllib2-based variant:
#!/usr/bin/env python
import base64
try:
from urllib.request import Request, urlopen
except ImportError: # Python 2
from urllib2 import Request, urlopen
credentials = '{user}:{password}'.format(**vars()).encode()
urlopen(Request(url, headers={'Authorization': # send auth unconditionally
b'Basic ' + base64.b64encode(credentials)})).close()
Python 3.5+ introduces HTTPPasswordMgrWithPriorAuth() that allows:
..to eliminate unnecessary 401 response handling, or to unconditionally send credentials on the first request in order to communicate with servers that return a 404 response instead of a 401 if the Authorization header is not sent..
#!/usr/bin/env python3
import urllib.request as urllib2
password_manager = urllib2.HTTPPasswordMgrWithPriorAuth()
password_manager.add_password(None, url, user, password,
is_authenticated=True) # to handle 404 variant
auth_manager = urllib2.HTTPBasicAuthHandler(password_manager)
opener = urllib2.build_opener(auth_manager)
opener.open(url).close()
It is easy to replace HTTPBasicAuthHandler() with ProxyBasicAuthHandler() if necessary in this case.

I would suggest that the current solution is to use my package urllib2_prior_auth which solves this pretty nicely (I work on inclusion to the standard lib.

Same solutions as Python urllib2 Basic Auth Problem apply.
see https://stackoverflow.com/a/24048852/1733117; you can subclass urllib2.HTTPBasicAuthHandler to add the Authorization header to each request that matches the known url.
class PreemptiveBasicAuthHandler(urllib2.HTTPBasicAuthHandler):
'''Preemptive basic auth.
Instead of waiting for a 403 to then retry with the credentials,
send the credentials if the url is handled by the password manager.
Note: please use realm=None when calling add_password.'''
def http_request(self, req):
url = req.get_full_url()
realm = None
# this is very similar to the code from retry_http_basic_auth()
# but returns a request object.
user, pw = self.passwd.find_user_password(realm, url)
if pw:
raw = "%s:%s" % (user, pw)
auth = 'Basic %s' % base64.b64encode(raw).strip()
req.add_unredirected_header(self.auth_header, auth)
return req
https_request = http_request

Try python-request or python-grab

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Get JSON object from Python urllib2 request - python

What the request actually returns is a string representation of the json object. You can use use the json.loads() function instead of the json.load() function in order to convert the string into a json object.

Related

How to Manually craft the Authorization header for Github in Python?

https get request with python urllib2

REST GET request on Python sending named argument

using python urlopen for a url query

Python urllib2, basic HTTP authentication, and tr.im

Categories

Resources