Parse POSTed Excel file in Python

Sorry, I am a noob when it comes to web development. I am trying to send an Excel file through API Gateway and process it in a Python Lambda function that writes it to S3. I am sending the file as "application/octet-stream" and, once I get the event object, parsing it as follows:
import io
import cgi
import pandas as pd
import xlrd
def read_file(event):
    c_type, c_data = cgi.parse_header(event['headers']['Content-Type'])
    encoded_file = event['body'].encode('utf-8')
    c_data['boundary'] = bytes(c_data['boundary'], "utf-8")
    parsed_body = cgi.parse_multipart(io.BytesIO(encoded_file), c_data)
    return parsed_body
This should essentially give me an io.BytesIO stream, which I should be able to read as
df = pd.ExcelFile(list(parsed_body.values())[0][0], engine = 'xlrd')
The function read_file() is called by lambda_handler as:
def lambda_handler(event, context):
    p_body = read_file(event)
    df = pd.ExcelFile(list(p_body.values())[0][0], engine='xlrd')
    # Some post-processing of the df
I am failing at the point where pandas cannot read this parsed body. I also tried the multipart library, but that did not work either.
If anyone can show me a way to parse the event body and get a result, I would be grateful.
The error that I get is:
  File "<ipython-input-264-dfd56a631cc4>", line 1, in <module>
    cgi.parse_multipart(event_bytes, c_data)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/cgi.py", line 261, in parse_multipart
    line = fp.readline()
AttributeError: 'bytes' object has no attribute 'readline'
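The traceback points at the immediate cause: cgi.parse_multipart() calls readline() on its first argument, so it needs a binary file object such as io.BytesIO(...), not raw bytes. Two further caveats: re-encoding a text body with UTF-8, as read_file() does, can corrupt binary file bytes, and the cgi module was deprecated in Python 3.11 and removed in 3.13. A minimal sketch of the same parsing using the standard-library email package instead of cgi (a swapped-in technique), assuming the Lambda proxy event carries the raw multipart body, base64-encoded when isBase64Encoded is true as API Gateway proxy integrations typically do for configured binary media types. Both helper names are hypothetical:

```python
import base64
from email.parser import BytesParser
from email.policy import default


def event_body_bytes(event):
    """Return the raw request body as bytes (hypothetical helper)."""
    # Assumption: the proxy event flags binary bodies with isBase64Encoded.
    if event.get("isBase64Encoded"):
        return base64.b64decode(event["body"])
    return event["body"].encode("utf-8")


def parse_multipart_files(content_type, body):
    """Split a multipart/form-data body into {field name: part bytes}."""
    # Prepend the Content-Type header (it carries the boundary) so the
    # email parser knows how to split the body into parts.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = BytesParser(policy=default).parsebytes(raw)
    files = {}
    for part in message.iter_parts():
        name = part.get_param("name", header="content-disposition")
        # decode=True returns the part's payload as bytes
        files[name] = part.get_payload(decode=True)
    return files
```

The resulting bytes can then be wrapped for pandas, e.g. pd.ExcelFile(io.BytesIO(files["file"])), where "file" is whatever form field name the client used.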

I finally found an answer: base64-encode the file, wrap it in a JSON body, and send it to the API with cURL like this:
curl -H 'Content-Type:application/octet-stream' --data-binary '{"file": "'"$(base64 /Path/to/file)"'"}' 'https://someAPI.com/some/path?param1=value1&param2=value2'
With this, API Gateway receives a JSON body with the structure {"file": "Base64 encoded string here"}.
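The same JSON body can also be built in Python. This sketch (build_file_payload is a hypothetical name) uses base64.b64encode, which never inserts line breaks. That matters because GNU coreutils base64 wraps its output at 76 columns by default, and a raw newline inside a JSON string makes json.loads() fail; with GNU base64, the -w 0 option disables the wrapping.

```python
import base64
import json


def build_file_payload(path):
    """Build the {"file": "<base64>"} JSON body for the upload."""
    with open(path, "rb") as fh:
        # b64encode output has no line breaks, so it is safe inside a JSON string
        encoded = base64.b64encode(fh.read()).decode("ascii")
    return json.dumps({"file": encoded})
```

The returned string can be sent as the request body, for example with requests.post(url, data=build_file_payload("/Path/to/file"), headers={"Content-Type": "application/octet-stream"}).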
Once you have this body, first extract and decode the base64 string:
eventBody = base64.b64decode(json.loads(event['body'])['file'])
Now create an empty stream, write the decoded bytes into it, and set the seek position back to 0:
toread=io.BytesIO()
toread.write(eventBody)
toread.seek(0)
Finally, just pass this stream to pandas (sn holds the sheet name to read):
df=pd.read_excel(toread, sheet_name=sn)
And it worked.
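The decoding steps above can be condensed into one helper. A minimal sketch, assuming the {"file": ...} body structure from this answer; excel_stream_from_event is a hypothetical name. Passing the bytes to the io.BytesIO constructor leaves the stream position at 0, so the separate write() and seek(0) calls are not needed:

```python
import base64
import io
import json


def excel_stream_from_event(event):
    """Turn a {"file": "<base64>"} JSON body into a seekable binary stream."""
    payload = json.loads(event["body"])
    # io.BytesIO(initial_bytes) starts at position 0, ready for pandas to read
    return io.BytesIO(base64.b64decode(payload["file"]))
```

Inside lambda_handler this becomes df = pd.read_excel(excel_stream_from_event(event), sheet_name=sn).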

Related

JSONDecodeError: Expecting value: line 1 column 1 (char 0) / While json parameter include

I'm trying to retrieve data from https://clinicaltrials.gov/ and, although I've specified JSON as the format in the request parameters:
fmt=json
the returned value is text by default.
As a consequence, I'm not able to read the response with json().
Good:
import requests
response = requests.get('https://clinicaltrials.gov/api/query/study_fields?expr=heart+attack&fields=NCTId%2CBriefTitle%2CCondition&min_rnk=1&max_rnk=&fmt=json')
response.text
Not Good:
import requests
response = requests.get('https://clinicaltrials.gov/api/query/study_fields?expr=heart+attack&fields=NCTId%2CBriefTitle%2CCondition&min_rnk=1&max_rnk=&fmt=json')
response.json()
Any idea how to turn this text into JSON?
I've tried response.text, which works, but I want to retrieve the data with json().
You can use the following code snippet:
import requests, json
response = requests.get('https://clinicaltrials.gov/api/query/study_fields?expr=heart+attack&fields=NCTId%2CBriefTitle%2CCondition&min_rnk=1&max_rnk=&fmt=json')
jsonResponse = json.loads(response.content)
You should use the json package (it is built into Python, so you don't need to install anything), which converts the text into a Python object (a dictionary) with the json.loads() function.
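The error in the title, "Expecting value: line 1 column 1 (char 0)", means the parser found no JSON value at the very first character, which is what happens with an empty body or one that starts with HTML or plain text. A small sketch (parse_json_body is a hypothetical helper) that passes the raw bytes to json.loads(), which since Python 3.6 accepts bytes and detects UTF-8, UTF-16 or UTF-32 itself, including a leading UTF-8 byte-order mark, and reports the start of the body when parsing fails:

```python
import json


def parse_json_body(content, content_type=""):
    """Parse a JSON HTTP body given as bytes, with a clearer error if it isn't JSON."""
    try:
        # json.loads accepts bytes (Python 3.6+) and works out the encoding itself
        return json.loads(content)
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        # Show the start of the body: often an HTML error page or an empty string
        raise ValueError(
            "body is not JSON (Content-Type %r); it starts with %r"
            % (content_type, content[:80])
        ) from exc
```

With requests this would be called as parse_json_body(response.content, response.headers.get("Content-Type", "")).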

Python & Json - Ebay Api Upload image Error

I've been trying to upload a PNG image to the eBay API with the return_file_upload call:
http://developer.ebay.com/Devzone/post-order/post-order_v2_return-returnId_file_upload__post.html#Samples
It's odd because the documentation says the data parameter accepts an array, but the samples don't use arrays. When I tried using an array, I got this error: Can not deserialize instance of byte out of VALUE_STRING at [Source: java.io.SequenceInputStream#4d57f134; line: 1, column: 11] (through reference chain: com.ebay.marketplace.returns.v3.services.request.UploadFileRequest["data"])
This is my code:
import json
import base64
import requests
with open("take_full_login.png", "rb") as image_file:
    encoded_string = base64.encodestring(image_file.read())
url2 = 'https://api.ebay.com/post-order/v2/return/123456/file/upload'
payload2 = {
    "data" : encoded_string,
    "filePurpose" : "LABEL_RELATED"
}
requests.post(url=url2, data=json.dumps(payload2), headers=headers)
That currently outputs
{"error":[{"errorId":1616,"domain":"returnErrorDomain","severity":"ERROR","category":"REQUEST","message":"Invalid Input.","parameter":[{"value":"data","name":"parameter"}],"longMessage":"Invalid Input.","httpStatusCode":400}]}
Try replacing data=json.dumps(payload2) with json=payload2.
The call /post-order/v2/cancellation/check_eligibility only worked that way for me.
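A sketch of the suggested fix, assuming (as the question's code and eBay's samples appear to) that data should be a single base64 string. It also swaps base64.encodestring, which inserts a newline every 76 characters and was removed in Python 3.9, for base64.b64encode, and decodes the result to str so it is JSON-serializable; build_upload_payload is a hypothetical helper name:

```python
import base64


def build_upload_payload(image_bytes, purpose="LABEL_RELATED"):
    """Build the upload body from the question as a JSON-serializable dict."""
    return {
        # b64encode returns bytes without line breaks; .decode() turns them
        # into a str that the json module can serialize
        "data": base64.b64encode(image_bytes).decode("ascii"),
        "filePurpose": purpose,
    }
```

The dict can then be sent with requests.post(url2, json=build_upload_payload(image_file.read()), headers=headers); with json=, requests serializes the dict itself and sets Content-Type: application/json if the headers do not already include one.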

Can't extract JSON from an http request

I'm having problems getting data from an HTTP response. The response unfortunately comes back with '\n' attached to all the key/value pairs, and json.loads() says the input must be a str and not "bytes".
I have tried a number of fixes, so my list of imports might look weird or redundant. Any suggestions would be appreciated.
#!/usr/bin/env python3
import urllib.request
from urllib.request import urlopen
import json
import requests
url = "http://finance.google.com/finance/info?client=ig&q=NASDAQ,AAPL"
response = urlopen(url)
content = response.read()
print(content)
data = json.loads(content)
info = data[0]
print(info)
#got this far - planning to extract "id:" "22144"
When it comes to making requests in Python, I personally like to use the requests library. I find it easier to use.
import json
import requests
r = requests.get('http://finance.google.com/finance/info?client=ig&q=NASDAQ,AAPL')
json_obj = json.loads(r.text[4:])
print(json_obj[0].get('id'))
The above solution prints: 22144
The response data had a couple of unnecessary characters at the start, which is why I only load the relevant (JSON) portion of the response: r.text[4:]. That is why you couldn't load it as JSON initially.
A bytes object has a decode() method, which converts bytes to a string. Checking the response in the browser, there seem to be some extra characters at the beginning of the string that need to be removed (a line feed followed by two slashes: '\n//'). To skip the first three characters of the string returned by decode(), we add [3:] after the method call.
data = json.loads(content.decode()[3:])
print(data[0]['id'])
The output is exactly what you expect:
22144
JSON says it must be a str and not "bytes".
Your content is bytes, so decode it first:
data = json.loads(content.decode())
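The answers above slice off different prefix lengths ([4:] and [3:]), which suggests the exact prefix may vary. A slightly more defensive sketch, with a hypothetical helper name and sample data, strips leading whitespace and then an optional '//' guard instead of a fixed number of characters:

```python
import json


def load_guarded_json(content):
    """Parse JSON that may be preceded by whitespace and a '//' guard."""
    text = content.decode("utf-8") if isinstance(content, bytes) else content
    text = text.lstrip()
    if text.startswith("//"):
        # Drop exactly the two slashes rather than slicing a fixed count
        text = text[2:]
    return json.loads(text)
```

For the question's code this would be data = load_guarded_json(content), followed by data[0]['id'].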

Python3 error: initial_value must be str or None, with StringIO

While porting code from Python 2 to 3, I get this error when reading from a URL:
TypeError: initial_value must be str or None, not bytes.
import urllib.request
import json
import gzip
from io import StringIO
from urllib.parse import urlencode
from urllib.request import Request
service_url = 'https://babelfy.io/v1/disambiguate'
text = 'BabelNet is both a multilingual encyclopedic dictionary and a semantic network'
lang = 'EN'
Key = 'KEY'
params = {
    'text' : text,
    'key' : Key,
    'lang' :'EN'
}
url = service_url + '?' + urlencode(params)
request = Request(url)
request.add_header('Accept-encoding', 'gzip')
response = urllib.request.urlopen(request)
if response.info().get('Content-Encoding') == 'gzip':
    buf = StringIO(response.read())
    f = gzip.GzipFile(fileobj=buf)
    data = json.loads(f.read())
The exception is thrown at this line
buf = StringIO(response.read())
If I use Python 2, it works fine.
response.read() returns an instance of bytes while StringIO is an in-memory stream for text only. Use BytesIO instead.
From What's new in Python 3.0 - Text Vs. Data Instead Of Unicode Vs. 8-bit
The StringIO and cStringIO modules are gone. Instead, import the io module and use io.StringIO or io.BytesIO for text and data respectively.
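Putting this together for the gzip case: decompress the bytes through io.BytesIO, then decode to text. A minimal sketch with a hypothetical helper name; gzip.decompress(raw) would be an equivalent one-liner for the decompression step:

```python
import gzip
import io
import json


def decode_json_body(raw, content_encoding=None):
    """Decompress (if gzip) and parse a JSON HTTP body given as bytes."""
    if content_encoding == "gzip":
        # GzipFile needs a binary file object, hence BytesIO rather than StringIO
        with gzip.GzipFile(fileobj=io.BytesIO(raw)) as f:
            raw = f.read()
    # Decode only after decompressing: compressed bytes are not UTF-8 text
    return json.loads(raw.decode("utf-8"))
```

In the question's code this would replace the last three lines with data = decode_json_body(response.read(), response.info().get('Content-Encoding')).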
This looks like another Python 3 bytes vs. str problem. Your response is of type bytes, which in Python 3 is different from str. You need to get it into a string first, e.g. with response.read().decode('utf-8'), and then use StringIO on it. Or you may want to use BytesIO, as the other answer says. If you expect text, the preferred way is to decode into a str first; note, though, that gzip-compressed data like this has to be decompressed as bytes before it can be decoded.
Consider using six.StringIO instead of io.StringIO.
And if you are migrating code from Python 2 to Python 3 and are using an old version of suds, use "suds-py3" for Python 3.

How do I fix a "JSONDecodeError: No JSON object could be decoded: line 1 column 0 (char 0)"?

I'm trying to get Twitter API search results for a given hashtag using Python, but I'm having trouble with this "No JSON object could be decoded" error. I had to add the extra % towards the end of the URL to prevent a string formatting error. Could this JSON error be related to the extra %, or is it caused by something else? Any suggestions would be much appreciated.
A snippet:
import simplejson
import urllib2
def search_twitter(quoted_search_term):
    url = "http://search.twitter.com/search.json?callback=twitterSearch&q=%%23%s" % quoted_search_term
    f = urllib2.urlopen(url)
    json = simplejson.load(f)
    return json
There were a couple of problems with your initial code. First, you never read in the content from Twitter; you just opened the URL. Second, in the URL you set a callback (twitterSearch). A callback wraps the returned JSON in a function call, so in this case the response would have been twitterSearch(...), which is no longer plain JSON. This is useful if you want a special function to handle the returned results.
import simplejson
import urllib2
def search_twitter(quoted_search_term):
    url = "http://search.twitter.com/search.json?&q=%%23%s" % quoted_search_term
    f = urllib2.urlopen(url)
    content = f.read()
    json = simplejson.loads(content)
    return json
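If the callback parameter cannot be removed, the JSONP wrapper can instead be stripped before parsing. A sketch in Python 3 with a hypothetical helper name; the regular expression assumes the response has the form name({...}) with an optional trailing semicolon:

```python
import json
import re

# Matches "name( ... )" with an optional trailing semicolon, e.g. twitterSearch({...});
_JSONP = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def loads_maybe_jsonp(text):
    """Parse plain JSON, or JSON wrapped in a JSONP callback."""
    match = _JSONP.match(text)
    # Group 1 is everything between the callback's outer parentheses
    return json.loads(match.group(1) if match else text)
```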
