I am working on a program that reads the content of a RESTful API from ImportIO. The connection works and data is returned, but it's a jumbled mess. I'm trying to clean it up so that only the ASINs are returned.
I have tried using the split method with a delimiter, without success.
stuff = requests.get('https://data.import.io/extractor***')
stuff.content
I get the content, but I want to extract only Asins.
results
While .content gives you access to the raw bytes of the response payload, you will often want to convert them into a string using a character encoding such as UTF-8. The response will do that for you when you access .text.
response.text
Because the decoding of bytes to str requires an encoding scheme, requests will try to guess the encoding based on the response’s headers if you do not specify one. You can provide an explicit encoding by setting .encoding before accessing .text:
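With requests that is a single assignment, response.encoding = 'utf-8', followed by reading response.text. What the assignment changes can be sketched without a network call. The byte string below is a made-up stand-in for .content, and the two decodes show roughly what .text would produce under two different encodings:

```python
# Hypothetical payload standing in for response.content (raw bytes).
raw = "café".encode("utf-8")

# Decoding with the encoding the bytes were written in recovers the text.
as_utf8 = raw.decode("utf-8")

# Decoding with a different encoding does not fail here, but it yields
# garbled characters instead of the original text.
as_latin1 = raw.decode("latin-1")

print(as_utf8)
print(as_latin1)
```

The wrong guess does not raise an error in this case, which is why garbled text is often the only symptom of a bad encoding.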
If you take a look at the response, you’ll see that it is actually serialized JSON content. To get a dictionary, you could take the str you retrieved from .text and deserialize it using json.loads(). However, a simpler way to accomplish this task is to use .json():
response.json()
When the top-level JSON value is an object, the return value of .json() is a dictionary, so you can access values in the object by key.
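The chain from bytes to text to dictionary can be sketched with the standard library alone. The payload below is invented for illustration; a real response body depends on the API being called:

```python
import json

# Hypothetical response body; stands in for response.content.
content = b'{"status": "ok", "items": [1, 2, 3]}'

text = content.decode("utf-8")   # roughly what .text returns
data = json.loads(text)          # roughly what .json() returns

print(type(data).__name__)
print(data["status"])
print(data["items"][0])
```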
You can do a lot with status codes and message bodies. But, if you need more information, like metadata about the response itself, you’ll need to look at the response’s headers.
For More Info: https://realpython.com/python-requests/
What format is the returned information in? RESTful APIs typically return data as JSON, so you will likely have luck parsing it as a JSON object.
https://realpython.com/python-requests/#content
stuff_dictionary = stuff.json()
With that, the content is loaded as a dictionary and you will have a much easier time.
EDIT:
Since I don't have the full URL to test, I can't give an exact answer. Given the content type is CSV, using a pandas DataFrame is pretty easy. With a quick StackOverflow search, I found the following answer: https://stackoverflow.com/a/43312861/11530367
So I tried the following in the terminal and got a dataframe from it
from io import StringIO
import pandas as pd
pd.read_csv(StringIO("HI\r\ntest\r\n"))
So you should be able to perform the following
from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO(stuff.content.decode('utf-8')))  # StringIO needs str, not bytes
If that doesn't work, consider dropping the first three bytes of your response: b'\xef\xbb\xbf'. Check the answer from Mark Tolonen for how to parse this.
After that, selecting the ASIN (your second column) from your dataframe should be easy.
asins = df.loc[:, 'ASIN']
asins_arr = asins.array
The response is the byte string of CSV content encoded in UTF-8. The first three escaped byte codes are a UTF-8-encoded BOM signature. So stuff.content.decode('utf-8-sig') should decode it. stuff.text may also work if the encoding was returned correctly in the response headers.
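A minimal sketch of that decoding step, using only the standard library. The CSV bytes, column names, and ASIN values below are invented placeholders for stuff.content; only the leading BOM bytes match what the question describes:

```python
import csv
import io

# Hypothetical CSV bytes with a UTF-8 BOM, standing in for stuff.content.
content = b"\xef\xbb\xbfTitle,ASIN\r\nWidget,B000000001\r\nGadget,B000000002\r\n"

# 'utf-8-sig' strips the BOM if it is present.
text = content.decode("utf-8-sig")

# Plain 'utf-8' keeps the BOM as '\ufeff' at the start of the text,
# where it would end up attached to the first header name.
text_with_bom = content.decode("utf-8")

reader = csv.DictReader(io.StringIO(text, newline=""))
asins = [row["ASIN"] for row in reader]
print(asins)
```

The same decoded text can be passed to pd.read_csv(StringIO(text)) if a DataFrame is preferred.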
Related
I am putting a JSON response into a variable via requests.json() like this:
response = requests.get(some_url, params=some_params).json()
This however converts JSON's original " to Python's ', true to True, null to None.
This poses a problem when trying to save the response as text and then convert it back to JSON. Sure, I can use .replace() for all the conversions mentioned above, but even once I do that, I get other funny JSON decoder errors.
Is there any way in Python to get JSON response and keep original JavaScript format?
json() is the JSON decoder method. You are looking at a Python object, that is why it looks like Python.
Other formats are listed on the same page, starting from Response Content
.text: text - it has no separate link/paragraph, it is right under "Response Content"
.content: binary, as bytes
.json(): decoded JSON, as Python object
.raw: streamed bytes (so you can get parts of content as it comes)
You need .text for getting text, including JSON data.
You can get the raw text of your response with requests.get(some_url, params=some_params).text
It is the json method which converts to a Python friendly format.
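If the parsed object has to be saved and read back later, json.dumps() is the counterpart that produces valid JSON again, whereas str() produces the Python repr that caused the trouble in the question. A small sketch, with an invented body standing in for response.text:

```python
import json

# Hypothetical response body; stands in for response.text.
text = '{"name": "example", "active": true, "parent": null}'

data = json.loads(text)      # Python object: True, None, dict

as_repr = str(data)          # Python repr, NOT valid JSON
as_json = json.dumps(data)   # valid JSON again

print(as_repr)
print(as_json)
```

Saving as_json (or the original .text) avoids any need for .replace() calls.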
I'm having a hard time understanding what is going on with this Walmart API, and I can't seem to iterate through the keys and values the way I want. I get different errors depending on the way I attack the problem.
import requests
import json
import urllib
response=requests.get("https://grocery.walmart.com/v0.1/api/stores/4104/departments/1256653758154/aisles/1256653758260/products?count=60&start=0")
info = json.loads(response.text)
print(info)
I'm not sure if I'm playing with a dictionary or a JSON object.
I'm thrown off because the API itself has no quotes over key/val.
When I do a json.loads it comes in but only comes in with single quotes.
I've tried going at it with for-loops but can only traverse the top layer and nothing else. My overall goal is to retrieve the info from the API link, turn it into JSON and be able to grab which ever key/val I need from it.
I'm not sure if I'm playing with a dictionary or a JSON object.
Python has no concept of a "JSON Object". It's a dictionary.
I'm thrown off because the API itself has no quotes over key/val.
Yes it does
{"aisleName":"Organic Dairy, Eggs & Meat","productCount":17,"products":[{"data":
When I do a json.loads it comes in but only comes in with single quotes
Because it's a Python dictionary, and the repr() of dict uses single quotes.
Try print(info['aisleName']) for example
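To get past the top layer, the nested dictionaries and lists have to be walked recursively. The sketch below assumes a payload shaped loosely like the one above; the fields inside "products" are invented for illustration, since the real response is not shown in full, and walk is a hypothetical helper name:

```python
import json

# Hypothetical payload; the inner product fields are made up.
text = (
    '{"aisleName": "Example Aisle", "productCount": 2, '
    '"products": [{"name": "Milk", "price": 3.5}, '
    '{"name": "Eggs", "price": 2.0}]}'
)
info = json.loads(text)


def walk(node, path=""):
    """Yield (path, value) for every leaf in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk(value, f"{path}[{index}]")
    else:
        yield path, node


leaves = list(walk(info))
for path, value in leaves:
    print(path, "=", value)

# Direct access works too once the structure is known.
names = [product["name"] for product in info["products"]]
```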
The request's content-type is application/json, but I want to get the request body bytes. Flask will auto convert the data to json. How do I get the request body?
You can get the non-form-related data by calling request.get_data(). You can get the parsed form data by accessing request.form and request.files.
However, the order in which you access these two will change what is returned from get_data. If you call it first, it will contain the full request body, including the raw form data. If you call it second, it will typically be empty, and form will be populated. If you want consistent behavior, call request.get_data(parse_form_data=True).
You can get the body parsed as JSON by using request.get_json(), but this does not happen automatically like your question suggests.
See the docs on dealing with request data for more information.
To stream the data rather than reading it all at once, access request.stream.
If you want the data as a string instead of bytes, use request.get_data(as_text=True). This will only work if the body is actually text, not binary, data.
Files in a FormData request can be accessed at request.files then you can select the file you included in the FormData e.g. request.files['audio'].
To read the actual bytes of the file ('audio' in this case) through .stream, first make sure the stream's cursor points to the first byte and not to the end of the file; otherwise the read returns empty bytes.
Hence, a good way to do it:
file = request.files['audio']
file.stream.seek(0)
audio = file.read()
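The cursor behavior can be reproduced without Flask. In the sketch below, io.BytesIO is an assumption standing in for the uploaded file's stream, which is a similar file-like object:

```python
import io

# Stand-in for the stream behind request.files['audio'].
stream = io.BytesIO(b"audio-bytes")

first = stream.read()    # reads everything; the cursor is now at the end
second = stream.read()   # nothing left to read, so this is empty
stream.seek(0)           # rewind to the first byte
third = stream.read()    # the full content again
```

If something earlier in the request handling has already read the stream, the seek(0) call is what makes the later read return data.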
If the data is JSON, use request.get_json() to parse it.
Note that the following pieces of code are used for a remote file inclusion exploit in a controlled environment (not doing anything malicious here).
I'm trying to perform a post request to a URL:
resp = requests.post("http://example.com/test/index.php",data=post_data,cookies=cookie,proxies=proxies,config={'encode_uri': False})
One of the data parameters is a url which is used for file inclusion, at the end it has a nullbyte:
http://mysite.org/simple-backdoor.php%00
But what requests is doing is re-encoding the nullbyte at the end, making it useless
http%3A%2F%2Fmysite.org%2Fsimple-backdoor.php%2500
I tried appending config={'encode_uri': False}) but this results in the same behavior. Does anyone have a clue how to disable this encoding or how to introduce a nullbyte character which gets encoded to %00?
Requests v2.0.0 onwards doesn't have (and thus doesn't respect) encode_uri. It tries to encode data if data isn't a string.
Use a unicode null-byte instead of %00, OR manually encode every component of data and form data as a string.
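The difference between the two inputs can be seen with the standard library's form encoder, which behaves much like the encoding requests applies to a dict passed as data. The parameter name "url" is a hypothetical placeholder:

```python
from urllib.parse import urlencode

# A literal "%00" in the value: the percent sign itself gets encoded.
literal = urlencode({"url": "http://mysite.org/simple-backdoor.php%00"})

# An actual null character in the value: it is encoded to %00.
actual_null = urlencode({"url": "http://mysite.org/simple-backdoor.php\x00"})

print(literal)
print(actual_null)
```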
I am trying to use the requests library in Python to push data (a raw value) to a firebase location.
Say, I have urladd (the url of the location with authentication token). At the location, I want to push a string, say International. Based on the answer here, I tried
data = {'.value': 'International'}
p = requests.post(urladd, data = sjson.dumps(data))
I get <Response [400]>. p.text gives me:
u'{\n "error" : "Invalid data; couldn\'t parse JSON object, array, or value. Perhaps you\'re using invalid characters in your key names."\n}\n'
It appears that the key .value is invalid. But that is what the answer linked above suggests. Any idea why this may not be working, or how I can do this through Python? There are no problems with connection or authentication because the following works. However, that pushes an object instead of a raw value.
data = {'name': 'International'}
p = requests.post(urladd, data = sjson.dumps(data))
Thanks for your help.
The answer you've linked is a special case for when you want to assign a priority to a value. In general, '.value' is an invalid name and will throw an error.
If you want to write just "International", you should write the stringified-JSON version of that data. I don't have a Python example in front of me, but the curl command would be:
curl -X POST -d "\"International\"" https://...
Andrew's answer above works. In case someone else wants to know how to do this using the requests library in Python, I thought this would be helpful.
import simplejson as sjson
data = sjson.dumps("International")
p = requests.post(urladd, data = data)
For some reason I had thought that the data had to be in a dictionary format before it is converted to stringified JSON version. That is not the case, and a simple string can be used as an input to sjson.dumps().
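The difference between the two payloads is easy to inspect before sending anything. This sketch uses the standard library's json module in place of simplejson, which is assumed to behave the same way for these inputs:

```python
import json

# A bare string serializes to a quoted JSON string (a raw value).
raw_value = json.dumps("International")

# A dictionary serializes to a JSON object.
as_object = json.dumps({"name": "International"})

print(raw_value)
print(as_object)
```

The first result matches the body used in the curl command above.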