grequests to get .csv file from API - python

I need to get multiple .csv files from SharePoint.
If I make this request via Postman
https://mycompany.sharepoint.com/teams/a/g/_api/web/GetFolderByServerRelativeUrl('Data%20Sources\')/Files('sharepoint_test.csv')/$value
With headers
Authorization: Bearer eyJ...
Accept: application/json;odata=verbose
I get the contents of "sharepoint_test.csv":
column a,column b,column c
32,523,88
46,34,659
25,767,78
I need to download multiple files at once, and SharePoint doesn't seem to provide an endpoint for that. So, using Python and grequests, I get responses, but not the binary data:
>>> base_url = "https://mycompany.sharepoint.com/teams/a/g/_api/web/GetFolderByServerRelativeUrl('Data%20Sources\')/"
>>> url_1 = "Files('sharepoint_test.csv')/$value"
>>> url_2 = "Files('sharepoint_test_2.csv')/$value"
>>> allurls = [base_url + url_1, base_url + url_2]
>>> headers = {"Authorization": authtoken, "Content-Type": "application/json;odata=verbose", "Accept": "application/json;odata=verbose"}
>>> rs = (grequests.get(u, headers=headers, stream=True) for u in allurls)
>>> s = grequests.map(rs)
>>> s
[<Response [200]>, <Response [200]>]
>>> data = open(s[0], "rb").read()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: expected str, bytes or os.PathLike object, not Response
How can I actually get the binary data via grequests?

grequests.get, like requests.get, returns a Response object.
The very first example in the requests documentation shows how to use this object:
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
'utf-8'
>>> r.text
u'{"type":"User"...'
>>> r.json()
{u'private_gists': 419, u'total_private_repos': 77, ...}
The Binary Response Content section says:
You can also access the response body as bytes, for non-text requests:
>>> r.content
b'[{"repository":{"open_issues":0,"url":"https://github.com/...
So, what you're looking for is:
>>> data = open(s[0].content, "rb").read()
Although I'm not sure what good you expect this to do (is the HTTP response content really going to be a path to a file in your current working directory or local filesystem, encoded in your default filesystem encoding?), it is what you asked for.
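What you presumably want is just the bytes themselves, with no open() call at all:
>>> data = s[0].content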
Also, it's worth noting that the first thing the documentation for GRequests that you linked to says is:
Note: You should probably use requests-threads or requests-futures instead.
GRequests is barely maintained nowadays, and will probably break with Requests 3.0, while the newer alternatives are among the main drivers behind 3.0's redesign.
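As a minimal sketch of that advice, assuming the allurls and headers variables from the question, the same parallel download can be done with plain requests and a standard-library thread pool; the local file names here are made up for illustration:
import requests
from concurrent.futures import ThreadPoolExecutor

def download(url):
    # .content is the raw bytes of the response body
    return requests.get(url, headers=headers).content

# fetch all the files concurrently on a small thread pool
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(download, allurls))

# write each body to a local file; these names are hypothetical
for i, data in enumerate(results):
    with open("sharepoint_file_{}.csv".format(i), "wb") as f:
        f.write(data)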

Related

Parsing a JSON string and storing it in a variable

Hi guys, I am calling this API to see live price data from CoinGecko. I am trying to parse the JSON response but keep getting an error in my code when I use json.loads, even though I imported json.
Here is a snippet of my code:
import json
import requests

class LivePrice(object): # Coingecko API
    def GetPrice(self, coin):
        coinprice = coin
        Gecko_endpoint = 'https://api.coingecko.com/api/v3/simple/price?ids='
        currency = '&vs_currencies=usd'
        url = Gecko_endpoint + coinprice + currency
        r = requests.get(url, headers={'accept': 'application/json'})
        y = json.loads(r)
        #print(r.json()[coinprice]['usd'])
If I use that print function I get the price, but I want to be able to store the value in a variable and pass it to another class to do some calculations. I'm just making a simple trading bot for fun, using the Alpaca API for paper trading. The error:
Traceback (most recent call last):
  File "AlapacaBot.py", line 76, in <module>
    r.GetPrice(Bitcoin)
  File "AlapacaBot.py", line 65, in GetPrice
    y = json.loads(r)
  File "/usr/lib/python3.8/json/__init__.py", line 341, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not Response
I am following the example from w3schools but I keep getting an error
https://www.w3schools.com/python/python_json.asp
json.loads only accepts the types listed in your error.
The requests get method returns a Response object, not one of those types. The W3Schools page is not a replacement for the Python Requests documentation; it only shows parsing strings, not Response objects.
Response objects have a json() method that parses the body into a dictionary, which is exactly the line you commented out:
r = requests.get(url, headers = {'accept': 'application/json'})
y = r.json()
print(y[coin]['usd'])
Your code is almost correct. You only need to use the Response object's json() method to retrieve the JSON information:
import requests

class LivePrice: # Coingecko API
    def GetPrice(coin):
        coinprice = coin
        Gecko_endpoint = 'https://api.coingecko.com/api/v3/simple/price?ids='
        currency = '&vs_currencies=usd'
        url = Gecko_endpoint + coinprice + currency
        r = requests.get(url, headers={'accept': 'application/json'})
        y = r.json()
        print(y[coinprice]['usd'])

LivePrice.GetPrice("bitcoin")
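Since the stated goal is to pass the price to another class for calculations, here is a small follow-up sketch that returns the value instead of printing it (HoldingsValue is a hypothetical consumer class, not from the question):
import requests

class LivePrice: # Coingecko API
    @staticmethod
    def GetPrice(coin):
        url = ('https://api.coingecko.com/api/v3/simple/price?ids='
               + coin + '&vs_currencies=usd')
        r = requests.get(url, headers={'accept': 'application/json'})
        return r.json()[coin]['usd']

class HoldingsValue: # hypothetical class doing a calculation with the price
    @staticmethod
    def Total(coin, amount):
        return amount * LivePrice.GetPrice(coin)

print(HoldingsValue.Total('bitcoin', 0.5))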

Content-length header not the same as when manually calculating it?

An answer here (Size of raw response in bytes) says:
Just take the len() of the content of the response:
>>> response = requests.get('https://github.com/')
>>> len(response.content)
51671
However, doing that does not give the content length reported in the header. For example, check out this Python code:
import sys
import requests

def proccessUrl(url):
    try:
        r = requests.get(url)
        print("Correct Content Length: "+r.headers['Content-Length'])
        print("bytes of r.text : "+str(sys.getsizeof(r.text)))
        print("bytes of r.content : "+str(sys.getsizeof(r.content)))
        print("len r.text : "+str(len(r.text)))
        print("len r.content : "+str(len(r.content)))
    except Exception as e:
        print(str(e))

# this url contains a Content-Length header; we will use it to see if the content length we calculate is the same
proccessUrl("https://stackoverflow.com")
If we manually calculate the content length and compare it to what is in the header, we get an answer that is much larger:
Correct Content Length: 51504
bytes of r.text : 515142
bytes of r.content : 257623
len r.text : 257552
len r.content : 257606
Why does len(r.content) not return the correct content length? And how can we manually calculate it accurately if the header is missing?
The Content-Length header reflects the length of the body as sent over the wire. That's not the same thing as the length of the text or content attributes, because the response could be compressed; requests decompresses the response for you.
You'd have to bypass a lot of internal plumbing to get the original, compressed, raw content, and then access some more internals if you want the response object to still work correctly. The 'easiest' method is to enable streaming, then read from the raw socket:
from io import BytesIO
r = requests.get(url, stream=True)
# read directly from the raw urllib3 connection
raw_content = r.raw.read()
content_length = len(raw_content)
# replace the internal file-object to serve the data again
r.raw._fp = BytesIO(raw_content)
Demo:
>>> import requests
>>> from io import BytesIO
>>> url = "https://stackoverflow.com"
>>> r = requests.get(url, stream=True)
>>> r.headers['Content-Encoding'] # a compressed response
'gzip'
>>> r.headers['Content-Length'] # the raw response contains 52055 bytes of compressed data
'52055'
>>> r.headers['Content-Type'] # we are served UTF-8 HTML data
'text/html; charset=utf-8'
>>> raw_content = r.raw.read()
>>> len(raw_content) # the raw content body length
52055
>>> r.raw._fp = BytesIO(raw_content)
>>> len(r.content) # the decompressed binary content, byte count
258719
>>> len(r.text) # the Unicode content decoded from UTF-8, character count
258658
This reads the full response into memory, so don't use this if you expect large responses! In that case, you could instead use shutil.copyfileobj() to copy the data from the r.raw file to a spooled temporary file (which will switch to an on-disk file once a certain size is reached), get the file size of that file, then stuff that file onto r.raw._fp.
A function that adds a Content-Length header to any request that is missing that header would look like this:
import requests
import shutil
import tempfile

def ensure_content_length(
        url, *args, method='GET', session=None, max_size=2**20,  # 1 MiB
        **kwargs):
    kwargs['stream'] = True
    session = session or requests.Session()
    r = session.request(method, url, *args, **kwargs)
    if 'Content-Length' not in r.headers:
        # stream content into a temporary file so we can get the real size
        spool = tempfile.SpooledTemporaryFile(max_size)
        shutil.copyfileobj(r.raw, spool)
        r.headers['Content-Length'] = str(spool.tell())
        spool.seek(0)
        # replace the original socket with our temporary file
        r.raw._fp.close()
        r.raw._fp = spool
    return r
This accepts an existing session, and lets you specify the request method too. Adjust max_size as needed for your memory constraints. Demo on https://github.com, which lacks a Content-Length header:
>>> r = ensure_content_length('https://github.com/')
>>> r
<Response [200]>
>>> r.headers['Content-Length']
'14490'
>>> len(r.content)
54814
Note that if there is no Content-Encoding header present, or its value is set to identity, and a Content-Length header is available, then you can rely on Content-Length being the full size of the response, because no compression was applied.
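A small helper capturing that rule might look like this (a sketch; the function name is my own):
def content_length_is_exact(response):
    # Content-Length equals len(response.content) only when the body
    # was not compressed in transit
    encoding = response.headers.get('Content-Encoding', 'identity')
    return 'Content-Length' in response.headers and encoding == 'identity'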
As a side note: you should not use sys.getsizeof() if what you are after is the length of a bytes or str object (the number of bytes or characters in it). sys.getsizeof() gives you the internal memory footprint of a Python object, which covers more than just the number of bytes or characters it contains. See What is the difference between len() and sys.getsizeof() methods in python?
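A quick illustration of the difference (the exact getsizeof figure varies by Python version and platform):
>>> import sys
>>> len(b'abc') # number of bytes in the object
3
>>> sys.getsizeof(b'abc') # includes the object's internal header overhead
36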

python requests gives 'None' response, where json data is expected

Firstly, I should add that you can reproduce this request by doing the following:
1. Go to the airline site (singaporeair.com)
2. Type in "From" = "syd"
3. Type in "To" = "sin"
4. Set the departure date to Sep 3, click one-way, and search
5. On the search result page, check the network GET request fired when you click an available seat option radio button
I'm trying to use the requests module to get the response from that site.
This is what I'm trying:
url = "http://www.singaporeair.com/chooseFlightJson.form?"
payload = {'selectedFlightIdDetails[0]':amount_data,'hid_flightIDs':'','hid_recommIDs':'','tripType':"O",'userPreferedCurrency':""}
response = requests.get(url, params=payload)
print response.json()
The response is supposed to be:
{"price":{"total":"595.34","currency":{"code":"AUD","label":""},"adult":{"count":1,"label":"Adult","cost":"328.00","total":"328.00"},"child":{"count":0,"label":"Child","cost":"0.00","total":"0.00"},"infant":{"count":0,"label":"Infant","cost":"0.00","total":"0.00"},"addOns":[{"label":"Airport / Government taxes ","cost":"83.24"},{"label":"Carrier Surcharges","cost":"184.10"}],"disclaimer":"Prices are shown in Canadian Dollar(CAD)","rate":"595.34 AUD \u003d 913.80 CAD","ratehint":"Estimated","unFormattedTotal":"595.34"},"messages":{"O3FF11SYD":"A few seats left","O1FF31SYD":" ","R0FF31SYD":"A few seats left","O2FF31SYD":"A few seats left","O0FF31SYD":" ","O1FF11SYD":"A few seats left","O0FF21SYD":" ","O2FF21SYD":" ","O3FF21SYD":" ","O1FF21SYD":" "},"cabinClass":{"onwardCabin":["Economy"]}}
The response is the value None, encoded in JSON; the server returns null\r\n, which means the same as None in Python.
The content type is wrong here; it is set to text/html, but the response.json() return value is entirely correct for what the server sent:
>>> import requests
>>> url = "http://www.singaporeair.com/chooseFlightJson.form?"
>>> amount_data = 0
>>> payload = {'selectedFlightIdDetails[0]':amount_data,'hid_flightIDs':'','hid_recommIDs':'','tripType':"O",'userPreferedCurrency':""}
>>> response = requests.get(url, params=payload)
>>> response
<Response [200]>
>>> response.headers['content-type']
'text/html; charset=ISO-8859-1'
>>> response.text
'null\r\n'
>>> response.json() is None
True
Change the protocol from http to https:
url = "https://www.singaporeair.com/chooseFlightJson.form?"
The solution is to use a requests Session, like so:
session = requests.Session()
Then to call all of the urls you need to simply do:
response = session.get(url)
This sets the cookies and session variables that are necessary to retrieve the data.
I have seen jsessionid handled in different ways, in the URL or, as in this case, in a cookie; there is probably other session state required as well, all of which a requests Session object takes care of.
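As a sketch of that idea (which page actually sets the required cookies is an assumption on my part), prime the Session by loading the site first, then reuse it for the JSON call:
import requests

session = requests.Session()
# visiting the site first lets the server set jsessionid and related cookies
session.get("https://www.singaporeair.com/")
response = session.get(url, params=payload)  # url and payload as defined above
print(response.json())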
While making the HTTP request, make sure that response_type is set to the exact use case you are trying. In my case, response_type='object' worked to eliminate the None response.

How to make a Python HTTP Request with POST data and Cookie?

I am trying to do an HTTP POST using cookies in Python.
I have the values of URL, POST data and cookie.
import urllib2
url="http://localhost/testing/posting.php"
data="subject=Alice-subject&addbbcode18=%23444444&addbbcode20=0&helpbox=Close+all+open+bbCode+tags&message=alice-body&poll_title=&add_poll_option_text=&poll_length=&mode=newtopic&sid=5b2e663a3d724cc873053e7ca0f59bd0&f=1&post=Submit"
cookie = "phpbb2mysql_data=a%3A2%3A%7Bs%3A11%3A%22autologinid%22%3Bs%3A0%3A%22%22%3Bs%3A6%3A%22userid%22%3Bs%3A1%3A%223%22%3B%7D; phpbb2mysql_t=a%3A9%3A%7Bi%3A3%3Bi%3A1330156986%3Bi%3A1%3Bi%3A1330160737%3Bi%3A5%3Bi%3A1330161702%3Bi%3A6%3Bi%3A1330179284%3Bi%3A2%3Bi%3A1330160743%3Bi%3A7%3Bi%3A1330163187%3Bi%3A8%3Bi%3A1330164442%3Bi%3A9%3Bi%3A1330164739%3Bi%3A10%3Bi%3A1330176335%3B%7D; phpbb2mysql_sid=5b2e663a3d724cc873053e7ca0f59bd0"
#creating HTTP Req
req = urllib2.Request(url,data,cookie)
f = urllib2.urlopen(req)
print f.read()
However, when I run the program, it throws an error:
Traceback (most recent call last):
  File "task-4.py", line 7, in <module>
    req = urllib2.Request(url,data,cookie)
  File "/usr/lib/python2.6/urllib2.py", line 197, in __init__
    for key, value in headers.items():
AttributeError: 'str' object has no attribute 'items'
I have two questions:
1. Is my HTTP POST request proper? (I have been able to execute the same thing successfully in Java and got an HTTP 200 with a successful post to phpBB; however, I am new to Python.)
2. Can someone show me a toy example of handling HTTP POST with POST data and cookies?
Thanks in advance,
Roy
You can try requests, which makes life easier when dealing with HTTP queries.
import requests

url = "http://localhost/testing/posting.php"
data = {
    'subject': 'Alice-subject',
    'addbbcode18': '%23444444',
    'addbbcode20': '0',
    'helpbox': 'Close all open bbCode tags',
    'message': 'alice-body',
    'poll_title': '',
    'add_poll_option_text': '',
    'poll_length': '',
    'mode': 'newtopic',
    'sid': '5b2e663a3d724cc873053e7ca0f59bd0',
    'f': '1',
    'post': 'Submit',
}
cookies = {
    'phpbb2mysql_data': 'a%3A2%3A%7Bs%3A11%3A%22autologinid%22%3Bs%3A0%3A%22%22%3Bs%3A6%3A%22userid%22%3Bs%3A1%3A%223%22%3B%7D',
    'phpbb2mysql_t': 'a%3A9%3A%7Bi%3A3%3Bi%3A1330156986%3Bi%3A1%3Bi%3A1330160737%3Bi%3A5%3Bi%3A1330161702%3Bi%3A6%3Bi%3A1330179284%3Bi%3A2%3Bi%3A1330160743%3Bi%3A7%3Bi%3A1330163187%3Bi%3A8%3Bi%3A1330164442%3Bi%3A9%3Bi%3A1330164739%3Bi%3A10%3Bi%3A1330176335%3B%7D',
    'phpbb2mysql_sid': '5b2e663a3d724cc873053e7ca0f59bd0',
}
# this is a POST, so use requests.post rather than requests.get
print requests.post(url, data=data, cookies=cookies).text
http://python-requests.org/
The 3rd argument you pass is the headers and should be a dictionary. This should do it:
cookie = {"Cookie" : "phpbb2mysql_data=a%3A2%3A%7Bs%3A11%3A%22autologinid%22%3Bs%3A0%3A%22%22%3Bs%3A6%3A%22userid%22%3Bs%3A1%3A%223%22%3B%7D; phpbb2mysql_t=a%3A9%3A%7Bi%3A3%3Bi%3A1330156986%3Bi%3A1%3Bi%3A1330160737%3Bi%3A5%3Bi%3A1330161702%3Bi%3A6%3Bi%3A1330179284%3Bi%3A2%3Bi%3A1330160743%3Bi%3A7%3Bi%3A1330163187%3Bi%3A8%3Bi%3A1330164442%3Bi%3A9%3Bi%3A1330164739%3Bi%3A10%3Bi%3A1330176335%3B%7D; phpbb2mysql_sid=5b2e663a3d724cc873053e7ca0f59bd0"}
I like httplib:
from urlparse import urlparse
from httplib import HTTPConnection
url = "http://localhost/testing/posting.php"
data = "subject=Alice-subject&addbbcode18=%23444444&addbbcode20=0&helpbox=Close+all+open+bbCode+tags&message=alice-body&poll_title=&add_poll_option_text=&poll_length=&mode=newtopic&sid=5b2e663a3d724cc873053e7ca0f59bd0&f=1&post=Submit"
cookie = "phpbb2mysql_data=a%3A2%3A%7Bs%3A11%3A%22autologinid%22%3Bs%3A0%3A%22%22%3Bs%3A6%3A%22userid%22%3Bs%3A1%3A%223%22%3B%7D; phpbb2mysql_t=a%3A9%3A%7Bi%3A3%3Bi%3A1330156986%3Bi%3A1%3Bi%3A1330160737%3Bi%3A5%3Bi%3A1330161702%3Bi%3A6%3Bi%3A1330179284%3Bi%3A2%3Bi%3A1330160743%3Bi%3A7%3Bi%3A1330163187%3Bi%3A8%3Bi%3A1330164442%3Bi%3A9%3Bi%3A1330164739%3Bi%3A10%3Bi%3A1330176335%3B%7D; phpbb2mysql_sid=5b2e663a3d724cc873053e7ca0f59bd0"
urlparts = urlparse(url)
conn = HTTPConnection(urlparts.netloc, urlparts.port or 80)
conn.request("POST", urlparts.path, data, {'Cookie': cookie})
resp = conn.getresponse()
body = resp.read()
Not really. That error is raised because the urllib2 library is trying to iterate over the items of the cookie string you gave it. Try using:
cookies = urllib.urlencode({'phpbb2mysql_data':'foo', 'autologinid':'blahblah'})
# Can do the same for data, allowing you to store it as a map.
headers = {'Cookie': cookies}
req = urllib2.Request(url, data, headers)
See python: urllib2 how to send cookie with urlopen request, but your best reference is still the urllib2 Request docs. Yes, it's a tricky (but powerful) library compared to some newer ones.
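Putting that together, a toy Python 2 urllib2 POST with form data and a cookie might look like this (the form fields are trimmed for brevity):
import urllib
import urllib2

url = "http://localhost/testing/posting.php"
# urlencode builds the application/x-www-form-urlencoded body
data = urllib.urlencode({'subject': 'Alice-subject',
                         'message': 'alice-body',
                         'mode': 'newtopic'})  # remaining fields omitted
headers = {'Cookie': 'phpbb2mysql_sid=5b2e663a3d724cc873053e7ca0f59bd0'}

req = urllib2.Request(url, data, headers)  # a data argument makes this a POST
print urllib2.urlopen(req).read()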

Can I do preemptive authentication with httplib2?

I need to perform preemptive basic authentication against an HTTP server, i.e., authenticate right away without waiting on a 401 response. Can this be done with httplib2?
Edit:
I solved it by adding an Authorization header to the request, as suggested in the accepted answer:
headers["Authorization"] = "Basic {0}".format(
base64.b64encode("{0}:{1}".format(username, password)))
Add an appropriately formed 'Authorization' header to your initial request.
This also works with the built-in httplib (for anyone wishing to minimize 3rd-party libs/modules). I am using it to authenticate with our Jenkins server using the API Token that Jenkins can create for each user.
>>> import base64, httplib
>>> headers = {}
>>> headers["Authorization"] = "Basic {0}".format(
base64.b64encode("{0}:{1}".format('<username>', '<jenkins_API_token>')))
>>> ## Enable the job
>>> conn = httplib.HTTPConnection('jenkins.myserver.net')
>>> conn.request('POST', '/job/Foo-trunk/enable', None, headers)
>>> resp = conn.getresponse()
>>> resp.status
302
>>> ## Disable the job
>>> conn = httplib.HTTPConnection('jenkins.myserver.net')
>>> conn.request('POST', '/job/Foo-trunk/disable', None, headers)
>>> resp = conn.getresponse()
>>> resp.status
302
I realize this is old, but I figured I'd throw in the solution if you're using Python 3 with httplib2, since I haven't been able to find it anywhere else. I'm also authenticating against a Jenkins server, using the API Token for each Jenkins user. If you're not concerned with Jenkins, simply substitute the actual user's password for the API Token.
b64encode expects a binary string of ASCII characters. With Python 3, a TypeError is raised if a plain string is passed in. To get around this, the "user:api_token" portion of the header must be encoded to bytes using either 'ascii' or 'utf-8', passed to b64encode, and the resulting byte string decoded back to a plain string before being placed in the header. The following code did what I needed:
import httplib2, base64

cred = base64.b64encode("{0}:{1}".format(
    <user>, <api_token>).encode('utf-8')).decode()
headers = {'Authorization': "Basic %s" % cred}
h = httplib2.Http('.cache')
response, content = h.request("http://my.jenkins.server/job/my_job/enable",
                              "GET", headers=headers)
