How can I parse a JSON response? - python

I am getting the error Expecting value: line 1 column 1 (char 0) when trying to decode JSON.
The URL I use for the API call works fine in the browser, but gives this error when done through a curl request. The following is the code I use for the curl request.
The error happens at return simplejson.loads(response_json)
response_json = self.web_fetch(url)
response_json = response_json.decode('utf-8')
return json.loads(response_json)

def web_fetch(self, url):
    buffer = StringIO()
    curl = pycurl.Curl()
    curl.setopt(curl.URL, url)
    curl.setopt(curl.TIMEOUT, self.timeout)
    curl.setopt(curl.WRITEFUNCTION, buffer.write)
    curl.perform()
    curl.close()
    response = buffer.getvalue().strip()
    return response
Traceback:
File "/Users/nab/Desktop/myenv2/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
111. response = callback(request, *callback_args, **callback_kwargs)
File "/Users/nab/Desktop/pricestore/pricemodels/views.py" in view_category
620. apicall=api.API().search_parts(category_id= str(categoryofpart.api_id), manufacturer = manufacturer, filter = filters, start=(catpage-1)*20, limit=20, sort_by='[["mpn","asc"]]')
File "/Users/nab/Desktop/pricestore/pricemodels/api.py" in search_parts
176. return simplejson.loads(response_json)
File "/Users/nab/Desktop/myenv2/lib/python2.7/site-packages/simplejson/__init__.py" in loads
455. return _default_decoder.decode(s)
File "/Users/nab/Desktop/myenv2/lib/python2.7/site-packages/simplejson/decoder.py" in decode
374. obj, end = self.raw_decode(s)
File "/Users/nab/Desktop/myenv2/lib/python2.7/site-packages/simplejson/decoder.py" in raw_decode
393. return self.scan_once(s, idx=_w(s, idx).end())
Exception Type: JSONDecodeError at /pricemodels/2/dir/
Exception Value: Expecting value: line 1 column 1 (char 0)

Your code produced an empty response body; you'd want to check for that or catch the exception raised. It is possible the server responded with a 204 No Content response, or that a non-2xx status code was returned (404 Not Found, etc.). Check for this.
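A minimal sketch of that check, assuming you already have the status code and the raw body in hand (with pycurl, the status code can be read with curl.getinfo(pycurl.RESPONSE_CODE) before calling curl.close()). The function name parse_json_body is hypothetical:

```python
import json


def parse_json_body(status_code, body):
    """Parse an HTTP response body as JSON, guarding the usual failure modes.

    status_code: int, e.g. from curl.getinfo(pycurl.RESPONSE_CODE)
    body: bytes or str holding the raw response body
    Returns None when there is nothing to parse.
    """
    # Anything outside 2xx is an error page, not the data you asked for
    if not 200 <= status_code < 300:
        raise ValueError(f"HTTP {status_code}: not a successful response")
    # 204 No Content, or an empty / whitespace-only body: nothing to decode
    if status_code == 204 or not body.strip():
        return None
    # json.loads accepts str, and UTF-8 bytes on Python 3.6+
    return json.loads(body)
```

Returning None for 204 and empty bodies is one design choice; raising instead would make the "nothing came back" case impossible to overlook.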
Note:
There is no need to use the simplejson library; the same library is included with Python as the json module.
There is no need to decode a response from UTF-8 to Unicode; the simplejson / json .loads() method can handle UTF-8 encoded data natively.
pycurl has a very archaic API. Unless you have a specific requirement for using it, there are better choices.
Either requests or httpx offers a much friendlier API, including JSON support. If you can, replace your call with:
import requests
response = requests.get(url)
response.raise_for_status()  # raises an exception when not a 2xx response
if response.status_code != 204:
    return response.json()
Of course, this won't protect you from a URL that doesn't comply with HTTP standards; when using arbitrary URLs where this is a possibility, check whether the server intended to give you JSON by checking the Content-Type header, and for good measure catch the exception:
if (
    response.status_code != 204 and
    response.headers.get("content-type", "").strip().startswith("application/json")
):
    try:
        return response.json()
    except ValueError:
        # decide how to handle a server that's misbehaving to this extent
        pass

Be sure to remember to invoke json.loads() on the contents of the file, as opposed to the file path of that JSON:
json_file_path = "/path/to/example.json"
with open(json_file_path, 'r') as j:
    contents = json.loads(j.read())
I think a lot of people are guilty of doing this every once in a while (myself included):
contents = json.loads(json_file_path)  # wrong: parses the path string itself
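To see the difference concretely, here is a small sketch using a temporary file (the file name and data are made up for illustration). json.loads() on the path string fails at char 0, because a path such as /path/to/example.json is not JSON; passing the path string to json.load() fails differently, with an AttributeError, since a str has no .read() method:

```python
import json
import os
import tempfile

# Write a small JSON file to a temporary location (illustrative data only)
path = os.path.join(tempfile.mkdtemp(), "example.json")
with open(path, "w", encoding="utf-8") as f:
    f.write('{"name": "example", "count": 3}')

# Wrong: json.loads() tries to parse the path text itself,
# which is not JSON, so it fails at the very first character.
try:
    json.loads(path)
    loads_on_path_failed = False
except json.JSONDecodeError as e:
    loads_on_path_failed = True
    print("loads(path) failed:", e)

# Right: json.load() takes an open file object and reads its contents.
with open(path, encoding="utf-8") as f:
    contents = json.load(f)
print(contents)
```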

Check the response body: whether actual data is present, and whether a dump of it looks well-formed.
In most cases, the json.loads() error JSONDecodeError: Expecting value: line 1 column 1 (char 0) is due to:
non-JSON-conforming quoting
XML/HTML output (that is, a string starting with <), or
incompatible character encoding
Ultimately the error tells you that at the very first position the string already doesn't conform to JSON.
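A rough triage helper along those lines, offered as a heuristic sketch (describe_non_json is a hypothetical name). It looks only at the first non-whitespace character, which is exactly where the parser gave up, and also checks for a UTF-8 byte order mark, one common encoding culprit:

```python
def describe_non_json(text):
    """Return a rough guess at why `text` fails to parse as JSON at char 0.

    Heuristic only: it inspects the first non-whitespace character.
    """
    if isinstance(text, bytes):
        text = text.decode("utf-8", errors="replace")
    if text.startswith("\ufeff"):
        return "starts with a byte order mark (BOM); decode with utf-8-sig"
    stripped = text.lstrip()
    if not stripped:
        return "empty body"
    first = stripped[0]
    if first == "<":
        return "looks like HTML or XML"
    if first == "'":
        return "single-quoted string; JSON requires double quotes"
    if first in '{["-0123456789tfn':
        return "starts like JSON; the problem is probably further in"
    return f"unexpected first character {first!r}"
```

Printing describe_non_json(response_text) next to repr(response_text) in the except block usually narrows things down quickly.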
As such, if parsing fails despite a data body that looks JSON-like at first glance, try replacing the quotes in the data body:
import sys, json
struct = {}
try:  # try parsing to dict
    dataform = str(response_json).strip("'<>() ").replace('\'', '\"')
    struct = json.loads(dataform)
except ValueError:
    print(repr(response_json))
    print(sys.exc_info())
Note: Quotes within the data must be properly escaped

With the requests library, a JSONDecodeError can happen when you get an HTTP error code such as 404 and try to parse the response as JSON!
You must first check for 200 (OK), or let it raise on error, to avoid this case.
I wish it failed with a less cryptic error message.
NOTE: as Martijn Pieters stated in the comments, servers can respond with JSON in case of errors (it depends on the implementation), so checking the Content-Type header is more reliable.

Check the encoding of your file and use the corresponding encoding when reading it. This may solve your problem.
with open("AB.json", encoding='utf-8', errors='ignore') as json_data:
    data = json.load(json_data, strict=False)

I had the same issue trying to read json files with
json.loads("file.json")
I solved the problem with
with open("file.json", "r") as read_file:
    data = json.load(read_file)
Maybe this can help in your case.

A lot of times, this will be because the string you're trying to parse is blank:
>>> import json
>>> x = json.loads("")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
You can remedy this by checking whether json_string is empty beforehand:
import json
if json_string:
    x = json.loads(json_string)
else:
    # Your code/logic here
    x = {}
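One caveat, as a small sketch (loads_or_default is a hypothetical helper): a string containing only whitespace, such as "\n", is truthy, so the if json_string: check lets it through, and json.loads() still rejects it with an "Expecting value" error. Stripping before the check covers that case too:

```python
import json


def loads_or_default(json_string, default=None):
    """Parse json_string, returning `default` for None, empty, or whitespace-only input."""
    # "  \n" is truthy, so a plain `if json_string:` would still pass it to json.loads()
    if json_string is None or not json_string.strip():
        return default
    return json.loads(json_string)
```

Usage would be x = loads_or_default(json_string, default={}); the default is passed in rather than written as a mutable default argument.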

I encountered the same problem. While printing out the JSON string read from a JSON file, I found that it started with a byte order mark (BOM). After some research, I found this was because the file was decoded as plain UTF-8; by changing the encoding to utf-8-sig, the mark is stripped out and the JSON loads without problems:
open('test.json', encoding='utf-8-sig')
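A self-contained sketch reproducing this with a temporary file (the content is made up). Writing with utf-8-sig puts a BOM at the start, as some editors do; reading it back as plain utf-8 leaves "\ufeff" as the first character, which Python 3's json module rejects (its error message even suggests decoding with utf-8-sig):

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.json")
# Encoding "utf-8-sig" on write prepends the BOM bytes EF BB BF
with open(path, "w", encoding="utf-8-sig") as f:
    f.write('{"ok": true}')

# Plain utf-8 keeps the BOM as "\ufeff" at the start of the text
try:
    with open(path, encoding="utf-8") as f:
        json.load(f)
    plain_utf8_failed = False
except json.JSONDecodeError as e:
    plain_utf8_failed = True
    print("utf-8 failed:", e)

# utf-8-sig strips a leading BOM if present (and also reads files without one)
with open(path, encoding="utf-8-sig") as f:
    data = json.load(f)
print(data)
```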

This is the most minimal solution I found for loading a JSON file in Python:
import json
data = json.load(open('file_name.json'))
If this gives an error saying a character can't be decoded at some position, add encoding='utf-8' inside the open() call:
data = json.load(open('file_name.json', encoding='utf-8'))
Explanation
open() opens the file, and json.load() reads its contents and parses them.
Do note that using with open(...) as f is more reliable than the syntax above, since it makes sure the file gets closed afterwards. The complete syntax would be:
with open('file_name.json') as f:
    data = json.load(f)

There may be embedded null characters ('\0'), even after calling decode(). Remove them with replace():
import json
struct = {}
try:
    response_json = response_json.decode('utf-8').replace('\0', '')
    struct = json.loads(response_json)
except ValueError:  # covers both UnicodeDecodeError and JSONDecodeError
    print('bad json: ', response_json)
return struct

I had the same issue; in my case I solved it like this:
import json
with open("migrate.json", "rb") as read_file:
    data = json.load(read_file)

I was having the same problem with requests (the Python library). It turned out to be the Accept-Encoding header.
It was set this way: 'accept-encoding': 'gzip, deflate, br'
I simply removed it from the request and stopped getting the error.

Just check whether the response has status code 200. For example:
if response.status_code != 200:
    print("An error has occurred. [Status code", response.status_code, "]")
else:
    data = response.json()  # Only convert to JSON when status is OK.
    if not data["elements"]:
        print("Empty JSON")
    else:
        pass  # You can extract data here

I had exactly this issue using requests.
Thanks to Christophe Roussy for his explanation.
To debug, I used:
response = requests.get(url)
logger.info(response.status_code)
I was getting a 404 response back from the API.

In my case, I was calling file.read() twice, in an if block and again in its else block, which was causing this error. So make sure not to make this mistake: store the contents in a variable and use the variable as many times as you need.
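Why this fails, as a short sketch (io.StringIO stands in for a real file here): the first read() moves the file position to the end, so the second read() returns an empty string, and json.loads("") is exactly the "Expecting value: line 1 column 1 (char 0)" case. The same happens with json.load(file) after a read(), since json.load() simply calls read() again:

```python
import io
import json

# io.StringIO stands in for an open text file
f = io.StringIO('{"status": "ok"}')

first = f.read()   # reads everything; the position is now at the end
second = f.read()  # returns "" because nothing is left to read

try:
    json.loads(second)
    second_read_failed = False
except json.JSONDecodeError as e:
    second_read_failed = True
    print("second read failed:", e)

# Fix 1: read once, keep the text in a variable, and reuse it
data = json.loads(first)

# Fix 2: rewind to the start before reading (or json.load-ing) again
f.seek(0)
again = json.load(f)
```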

In my case, it occurred because I read the data of the file using file.read() and then tried to parse it using json.load(file). I fixed the problem by replacing json.load(file) with json.loads(data).
Not working code:
with open("text.json") as file:
    data = file.read()
    json_dict = json.load(file)
Working code:
with open("text.json") as file:
    data = file.read()
    json_dict = json.loads(data)

For me, it was not using authentication in the request.

For me, it was the server responding with something other than 200, and the response was not JSON-formatted. I ended up doing this before the JSON parse:
# this is the HTTPS request for data in JSON format
response_json = requests.get(url)
# only proceed if I have a 200 response, which is stored in status_code
if response_json.status_code == 200:
    response = response_json.json()  # convert from JSON to a dictionary

I received such an error in a Python-based web API's response .text, and it led me here, so this may help others with a similar issue (it's very difficult to tell response issues from request issues in a search when using requests):
Using json.dumps() on the request data arg to create a correctly-escaped string of JSON before POSTing fixed the issue for me
requests.post(url, data=json.dumps(data))
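A small sketch of why this matters (the dict contents are made up): str() on a dict gives Python's repr, with single quotes, True and None, which a JSON parser on the receiving end cannot read, while json.dumps() produces proper JSON. As an alternative, requests also accepts requests.post(url, json=data), which serializes the data and sets the Content-Type: application/json header for you:

```python
import json

data = {"name": "widget", "active": True, "tags": None}

# str() gives Python's repr: single quotes, True/None, which is not JSON
as_python = str(data)
try:
    json.loads(as_python)
    repr_is_json = True
except json.JSONDecodeError as e:
    repr_is_json = False
    print("str(data) is not JSON:", e)

# json.dumps() produces a correctly quoted and escaped JSON document
as_json = json.dumps(data)
print(as_json)
```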

In my case, it was because the server occasionally returns an HTTP error. So once in a while my script gets a response like this rather than the expected one:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<h1>502 Bad Gateway</h1>
<p>The proxy server received an invalid response from an upstream server.<hr/>Powered by Tengine</body>
</html>
Clearly this is not in JSON format, and trying to call .json() on it will yield JSONDecodeError: Expecting value: line 1 column 1 (char 0).
You can print the exact response that causes this error to debug it more easily.
For example, if you are using requests, simply printing the .text field (before you call .json()) will do.
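A sketch of that debugging step wrapped in a helper (parse_or_explain is a hypothetical name; it takes plain text and a status code so it can be tried without a network). With requests you would pass response.text and response.status_code:

```python
import json


def parse_or_explain(text, status_code, preview=200):
    """json.loads(text), but on failure raise an error showing what came back."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as e:
        snippet = text[:preview]  # first characters of the body, e.g. an HTML error page
        raise ValueError(
            f"Expected JSON but got HTTP {status_code} with body starting: {snippet!r}"
        ) from e
```

The chained exception (raise ... from e) keeps the original JSONDecodeError in the traceback while putting the status code and body preview front and center.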

I did:
1. Open test.txt file, write data
2. Open test.txt file, read data
I didn't close the file after step 1. I added
outfile.close()
and now it works.
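The same fix, sketched using with statements so the close cannot be forgotten (the file name and data are illustrative). Writes are buffered, so until the file is flushed or closed, the second open() may see an empty file and json.load() fails at char 0:

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Step 1: write. Leaving the with block closes (and therefore flushes) the file.
with open(path, "w", encoding="utf-8") as outfile:
    json.dump({"value": 42}, outfile)

# Step 2: read. The data is guaranteed to be on disk by now.
with open(path, encoding="utf-8") as infile:
    data = json.load(infile)
```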

If you are a Windows user, the Tweepy API can generate an empty line between data objects, which can cause the "JSONDecodeError: Expecting value: line 1 column 1 (char 0)" error. To avoid it, you can delete the empty lines.
For example:
def on_data(self, data):
    try:
        with open('sentiment.json', 'a', newline='\n') as f:
            f.write(data)
            return True
    except BaseException as e:
        print("Error on_data: %s" % str(e))
    return True
Reference:
Twitter stream API gives JSONDecodeError("Expecting value", s, err.value) from None
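On the reading side, blank lines can also simply be skipped. A sketch, assuming the file holds one JSON object per line as the listener above writes it (read_json_lines is a hypothetical helper, and the sample file is made up):

```python
import json
import os
import tempfile


def read_json_lines(path):
    """Yield one parsed object per non-blank line of a newline-delimited JSON file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # skip the empty lines that would break json.loads()
                continue
            yield json.loads(line)


# Usage with a small sample file that has blank lines between objects
sample = os.path.join(tempfile.mkdtemp(), "sentiment.json")
with open(sample, "w", encoding="utf-8", newline="") as f:
    f.write('{"id": 1}\r\n\r\n{"id": 2}\n\n')
records = list(read_json_lines(sample))
```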

If you send headers that include "Accept-Encoding": "gzip, deflate, br", install the brotli library with pip install brotli so that requests can decompress Brotli-encoded responses. You don't need to import brotli in your .py file.

In my case, the solution was simply replacing single quotes with double quotes.
You can find my answer here

Related

Simple JSON Decode Error in Gemini API Started Out of Nowhere

I am trying to play with the Gemini trading API. I have issued myself an API key and a secret, and after configuring my environment, in which I had a lot of issues setting up and installing requests through pip, I used their example code to create a simple script to read my recent trades. Here is the script, minus my API keys:
#!/usr/bin/env python
import requests
import json
import base64
import hmac
import hashlib
import datetime, time
url = "https://api.sandbox.gemini.com"
gemini_api_key = "master-xxx"
gemini_api_secret = "xxx".encode()
t = datetime.datetime.now()
payload_nonce = str(int(time.mktime(t.timetuple())*1000))
payload = {"request": "/v1/mytrades", "nonce": payload_nonce}
encoded_payload = json.dumps(payload).encode()
b64 = base64.b64encode(encoded_payload)
signature = hmac.new(gemini_api_secret, b64, hashlib.sha384).hexdigest()
request_headers = {
    'Content-Type': "text/plain",
    'Content-Length': "0",
    'X-GEMINI-APIKEY': gemini_api_key,
    'X-GEMINI-PAYLOAD': b64,
    'X-GEMINI-SIGNATURE': signature,
    'Cache-Control': "no-cache"
}
response = requests.post(url, headers=request_headers)
my_trades = response.json()
print(my_trades)
Now at first, it would run, but give me an error saying I hadn't specified an account. Then, without changing ANYTHING AT ALL, it suddenly quit working altogether. So while I still have some sort of issue accessing the API, I can't even get to the errors anymore to try to figure out why. Now what I get is a JSON decode error, which looks like the following:
Traceback (most recent call last):
File "c:\Users\david\Desktop\Code Projects\GeminiTrader\GeminiTrader-v0.1.py", line 33, in <module>
my_trades = response.json()
File "C:\Users\david\AppData\Local\Programs\Python\Python39\lib\site-packages\requests-2.25.1-py3.9.egg\requests\models.py", line 900, in json
return complexjson.loads(self.text, **kwargs)
File "C:\Users\david\AppData\Local\Programs\Python\Python39\lib\json\__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "C:\Users\david\AppData\Local\Programs\Python\Python39\lib\json\decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\Users\david\AppData\Local\Programs\Python\Python39\lib\json\decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
What is causing this json decode issue? Why was it not coming up before, when the API was simply rejecting my request due to an account parameter error? Why did it suddenly change to this error without me modifying anything in the code? How can I fix it? I kept having issues with installing requests and getting it to work, maybe I messed something up in that process?
Certainly after fixing this I will have a new host of issues to fix because the documentation on this API is abysmal. Any help in progressing this project would be greatly appreciated! Thanks!
As you are calling an API, there is a chance that your API call fails and returns just a string or an empty response. I would suggest first checking the status code of your response, something like below, and then processing the JSON data:
data = requests.post(url, headers=request_headers)
if data.status_code != 200:
    raise Exception("Error", data.reason)
json_data = data.json()

Python HTTP server giving error some time after

I coded a Python HTTP server as below, and I run the server from the directory where this Python file exists. I type "python myserver.py" in cmd and the server successfully starts and serves index.html from that directory, but after some time my code gives the following error and the server shuts down:
Traceback (most recent call last):
File "myserver.py", line 20, in
requesting_file = string_list[1]
IndexError: list index out of range
How can I fix this problem?
import socket
HOST,PORT = '127.0.0.1',8082
my_socket = socket.socket(socket.AF_INET,socket.SOCK_STREAM)
my_socket.setsockopt(socket.SOL_SOCKET,socket.SO_REUSEADDR,1)
my_socket.bind((HOST,PORT))
my_socket.listen(1)
print('Serving on port ',PORT)
while True:
    connection,address = my_socket.accept()
    request = connection.recv(1024).decode('utf-8')
    string_list = request.split(' ')  # Split request from spaces
    print (request)
    method = string_list[0]
    requesting_file = string_list[1]
    print('Client request ',requesting_file)
    myfile = requesting_file.split('?')[0]  # After the "?" symbol not relevant here
    myfile = myfile.lstrip('/')
    if(myfile == ''):
        myfile = 'index.html'  # Load index file as default
    try:
        file = open(myfile,'rb')  # open file, r => read, b => byte format
        response = file.read()
        file.close()
        header = 'HTTP/1.1 200 OK\n'
        if(myfile.endswith(".jpg")):
            mimetype = 'image/jpg'
        elif(myfile.endswith(".css")):
            mimetype = 'text/css'
        else:
            mimetype = 'text/html'
        header += 'Content-Type: '+str(mimetype)+'\n\n'
    except Exception as e:
        header = 'HTTP/1.1 404 Not Found\n\n'
        response = '<html><body><center><h3>Error 404: File not found</h3><p>Python HTTP Server</p></center></body></html>'.encode('utf-8')
    final_response = header.encode('utf-8')
    final_response += response
    connection.send(final_response)
    connection.close()
socket.recv(n) is not guaranteed to read the entire n bytes of the message in one go and can return fewer bytes than requested in some circumstances.
Regarding your code, it's possible that only the method, or part of it, is received without any space character being present in the received data. In that case split() will return a list with one element, not two as you assume.
The solution is to check that a full message has been received. You could do that by looping until sufficient data has been received, e.g. you might ensure that some minimum number of bytes has been received by checking the length of data and looping until the minimum has been reached.
Alternatively you might continue reading until a new line or some other sentinel character is received. It's probably worth capping the length of the incoming data to avoid your server being swamped by data from a rogue client.
Finally, check whether split() returns the two values that you expect and handle accordingly if it does not. Furthermore, be very careful about the file name; what if it contains a relative path, e.g. ../../etc/passwd?
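A sketch of those suggestions, under a few assumptions: the request head is read until the blank line (\r\n\r\n) that ends HTTP headers, capped at a hypothetical MAX_HEADER_BYTES, and files are only served from under a chosen root directory. The helper names are made up for illustration:

```python
import os
import socket

MAX_HEADER_BYTES = 8192  # assumed cap, to avoid being swamped by a rogue client


def recv_request_head(conn: socket.socket) -> bytes:
    """Read until the blank line that ends the HTTP headers, or until the cap."""
    data = b""
    while b"\r\n\r\n" not in data and len(data) < MAX_HEADER_BYTES:
        chunk = conn.recv(1024)
        if not chunk:  # client closed the connection early
            break
        data += chunk
    return data


def parse_request_line(data: bytes):
    """Return (method, target), or None if the request line is incomplete."""
    first_line = data.split(b"\r\n", 1)[0].decode("latin-1")
    parts = first_line.split(" ")
    if len(parts) < 2:  # the case that raised IndexError above
        return None
    return parts[0], parts[1]


def resolve_path(root: str, target: str):
    """Map a request target to a file under root; None if it escapes root."""
    name = target.split("?", 1)[0].lstrip("/") or "index.html"
    root = os.path.realpath(root)
    full = os.path.realpath(os.path.join(root, name))
    if os.path.commonpath([root, full]) != root:  # e.g. ../../etc/passwd
        return None
    return full
```

In the accept loop, you would call recv_request_head(connection), and if parse_request_line() or resolve_path() returns None, send an error response (or just close the connection) and continue, instead of indexing string_list[1] unconditionally.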

Python Requests - ChunkedEncodingError(e) - requests.iter_lines

I'm getting a ChunkedEncodingError(e) using Python requests. I'm using the following to rip down JSON:
r = requests.get(url, headers=auth, stream=True)
And then iterating over each line, using the newline as a delimiter, which is how this API distinguishes between distinct JSON events:
for d in r.iter_lines(delimiter="\n"):
    d += "\n"
    sock.send(d)
I'm splitting on the newline and then adding it back in, as the endpoint I'm pushing the logs to also expects a newline at the end of each event. This seems to work for roughly 100k log files. When I try to make a larger call, the following is thrown:
for d in r.iter_lines(delimiter="\n"):
logs_1 | File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 783, in iter_lines
logs_1 | for chunk in self.iter_content(chunk_size=chunk_size, decode_unicode=decode_unicode):
logs_1 | File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 742, in generate
logs_1 | raise ChunkedEncodingError(e)
logs_1 | requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
UPDATE: I've discovered the API is sending back a NoneType at some point as well. So how can I account for this null byte somewhere in the response without blowing everything up? Each individual event ends with a \n, and I need to be able to inspect each event individually. Should I chunk the content instead of using iter_lines, and then ensure there is no NoneType in the chunk? That way I don't try to iter_lines over a NoneType and blow up?
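One way to do what the update describes, as a hedged sketch: read with r.iter_content() and reassemble newline-terminated events yourself, skipping empty or None chunks defensively. iter_events is a hypothetical helper, shown on plain bytes so it runs without a network. Note that this does not prevent the ChunkedEncodingError itself, which comes from the connection being cut mid-response:

```python
def iter_events(chunks, delimiter=b"\n"):
    """Reassemble delimiter-terminated events from arbitrary byte chunks.

    `chunks` would be r.iter_content(chunk_size=...) on a streamed response.
    """
    pending = b""
    for chunk in chunks:
        if not chunk:  # skip None / b"" instead of failing on them
            continue
        pending += chunk
        # Everything before the last delimiter is complete; the tail may be partial
        *complete, pending = pending.split(delimiter)
        for event in complete:
            if event:  # ignore blank lines between events
                yield event + delimiter  # keep the terminator the receiver expects
    if pending:
        yield pending  # trailing data that never got a final delimiter
```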
ChunkedEncodingError is caused by httplib.IncompleteRead:
import httplib
def patch_http_response_read(func):
    def inner(*args):
        try:
            return func(*args)
        except httplib.IncompleteRead as e:
            return e.partial
    return inner
httplib.HTTPResponse.read = patch_http_response_read(httplib.HTTPResponse.read)
I think this could work as a patch. It allows you to deal with defective HTTP servers.
Most servers transmit all the data, but due to implementation errors they wrongly close the session, and httplib raises an error and buries your precious bytes.
As I posted elsewhere, and as another user mentioned regarding IncompleteRead, you can use a with statement to make sure that your previous request has been closed.
with requests.request("POST", url_base, json=task, headers=headers) as report:
    print('report: ', report)
If you are sharing a requests.Session object across multiple processes (multiprocessing), it may lead to this error. You can create a separate Session per process (os.getpid()).

How to deal with incorrect json format ? (simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char 0))

1) A script that I had working for many weeks broke a few days ago. I can't parse the JSON properly now. So this is not net new code, it's something that has been in operation for months.
2) Something changed in the servicing website, and it's making the JSON non-compliant but I have been trying to circumvent the issue with no success. I think it may be an extra space or something, but I can't change the information returned from the servicing website.
3) I know the json is not compliant because I used a validator (https://jsonformatter.curiousconcept.com/) by putting the URL of the service I need with my credentials/format needed, and I get proper results but the validation fails with "Invalid encoding, expecting UTF-8, UTF-16 or UTF-32.[Code 29, Structure 0]". There is a way to tell the validator not to validate and the Json looks proper, but Python will not have anything to do with it. When I run my script it reports:
simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char 0).
4) Below is my URL entry manually and the script. I have obfuscated all sensitive and personal information so the URL if you try won't work, but when I do the non-obfuscated format, I do get a JSON response.
5) Manual URL (obfuscated):
https://mystuff.mydevices.com/Membership/SomeOtherURLrelated?appId=BB8pQgg123450WHahgl12345nAkkX67890q2HrHD7H1nabcde5KqtN654321LB%2fi&securityToken=null&username=myemail#somedomain.com&password=mypassword&culture=en
6) If I manually opened a browser and put the previous real URL (unmodified), the browser responds with json. An example (obfuscated):
{"UserId":0,"SecurityToken":"abcdb8c3-1ef1-1110-1234-402a914f52aa","ReturnCode":"0","ErrorMessage":"","BrandId":2,"BrandName":"Mydevicebrandname","RegionId":1}
7) What can I do to overcome this ? any suggestions ? I have been reading and testing but no luck!
8) Now the script (obfuscated) that basically builds the previous URL and extracts from the JSON a one-time security token that then I can use for other purposes in a much bigger application:
import json, requests, sys
APPID = 'BB8pQgg123450WHahgl12345nAkkX67890q2HrHD7H1nabcde5KqtN654321LB%2fi'
USERNAME = 'myemail#somedomain.com'
PASSWORD = 'mypassword'
CULTURE = 'en'
SERVICE = 'https://mystuff.mydevices.com'
def get_token_formydevices():
    payload = {'appId': APPID,
               'securityToken': 'null',
               'username': USERNAME,
               'password': PASSWORD,
               'culture': CULTURE,}
    login_url = SERVICE + '/Membership/SomeOtherURLrelated'
    try:
        r = requests.get(login_url, params=payload)
    except requests.exceptions.RequestException as err:
        return
    data = r.json()
    if data['ReturnCode'] != '0':
        print(data['ErrorMessage'])
        sys.exit(1)
    return data['SecurityToken']
tokenneeded = get_token_formydevices()
print(tokenneeded)
9) When I run the previous code this is what I get back:
Traceback (most recent call last):
File "testtoken.py", line 33, in <module>
tokenneeded = get_token_formydevices()
File "testtoken.py", line 26, in get_token_formydevices
data = r.json()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/requests/models.py", line 826, in json
return complexjson.loads(self.text, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/simplejson/__init__.py", line 516, in loads
return _default_decoder.decode(s)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/simplejson/decoder.py", line 370, in decode
obj, end = self.raw_decode(s)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/simplejson/decoder.py", line 400, in raw_decode
return self.scan_once(s, idx=_w(s, idx).end())
simplejson.scanner.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
I found a solution and want to share.
So I was very puzzled by the fact that I could open the service URL in a web browser and get JSON back, but couldn't do it in my script or even with cURL. I kept getting "request denied" even though the request worked from the browser, so it had to be something else.
So I started experimenting with sending user agent information in the request from my script, and voilà! The code below works, although I have obfuscated the original URL and my credentials.
I should further explain that I was doing this because the service URL provides a one-time token that I can then use to trigger another action. So I need this routine to run as many times as I need to carry out specific actions; all I wanted was to retrieve the token from the JSON at that URL. Hopefully this makes more sense with the code below.
import json,urllib2
url='https://mystuff.mydevices.com/Membership/SomeOtherURLrelated?appId=BB8pQgg123450WHahgl12345nAkkX67890q2HrHD7H1nabcde5KqtN654321LB%2fi&securityToken=null&username=myemail#somedomain.com&password=mypassword&culture=en'
request = urllib2.Request(url)
#request.add_header('User-Agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36') # <--- this works too
request.add_header('User-Agent', 'Mozilla/5.0')
data = json.loads(str(urllib2.urlopen(request).read()))
token = data["SecurityToken"]
print token
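For Python 3, where urllib2 no longer exists, a roughly equivalent sketch with the standard library's urllib.request (the helper names are hypothetical, and the parameters mirror the obfuscated ones above). One caution: urlencode() percent-encodes values, so a value that is already encoded, like the %2f in APPID, should be passed decoded (e.g. via urllib.parse.unquote) or it will be encoded twice:

```python
import json
import urllib.parse
import urllib.request


def build_token_request(service, path, params, user_agent="Mozilla/5.0"):
    """Build a GET request whose query string is URL-encoded from params."""
    query = urllib.parse.urlencode(params)
    req = urllib.request.Request(f"{service}{path}?{query}")
    # The server in question rejected requests without a browser-like User-Agent
    req.add_header("User-Agent", user_agent)
    return req


def fetch_token(req, timeout=10):
    """Send the request and pull SecurityToken out of the JSON reply."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    return data["SecurityToken"]
```

With requests, the same idea is requests.get(login_url, params=payload, headers={'User-Agent': 'Mozilla/5.0'}).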

How to query a restful webservice using Python

I'm writing a Python script that uses the Requests library to fire off a request to a remote web service. Here is my code (test.py):
import logging.config
from requests import Request, Session

logging.config.fileConfig('../../resources/logging.conf')
logr = logging.getLogger('pyLog')
url = 'https://158.74.36.11:7443/hqu/hqapi1/user/get.hqu'
token01 = 'hqstatus_python'
token02 = 'ytJFRyV7g'
response_length = 351

def main():
    try:
        logr.info('start SO example')
        s = Session()
        prepped = Request('GET', url, auth=(token01, token02), params={'name': token01}).prepare()
        response = s.send(prepped, stream=True, verify=False)
        logr.info('status: ' + str(response.status_code))
        logr.info('elapsed: ' + str(response.elapsed))
        logr.info('headers: ' + str(response.headers))
        logr.info('content: ' + response.raw.read(response_length).decode())
    except Exception:
        logr.exception("Exception")
    finally:
        logr.info('stop')

if __name__ == '__main__':
    main()
I get the following successful output when I run this:
INFO test - start SO example
INFO test - status: 200
INFO test - elapsed: 0:00:00.532053
INFO test - headers: CaseInsensitiveDict({'server': 'Apache-Coyote/1.1', 'set-cookie': 'JSESSIONID=8F87A69FB2B92F3ADB7F8A73E587A10C; Path=/; Secure; HttpOnly', 'content-type': 'text/xml;charset=UTF-8', 'transfer-encoding': 'chunked', 'date': 'Wed, 18 Sep 2013 06:34:28 GMT'})
INFO test - content: <?xml version="1.0" encoding="utf-8"?>
<UserResponse><Status>Success</Status> .... </UserResponse>
INFO test - stop
As you can see, there is this weird variable 'response_length' that I need to pass to the response object (as an optional argument) to be able to read the content. This variable has to be set to a numeric value equal to the length of the 'content'. This obviously means that I need to know the response content length beforehand, which is unreasonable.
If I don't pass that variable, or set it to a value greater than the content length, I get the following error:
Traceback (most recent call last):
File "\Python33\lib\http\client.py", line 590, in _readall_chunked
chunk_left = self._read_next_chunk_size()
File "\Python33\lib\http\client.py", line 562, in _read_next_chunk_size
return int(line, 16)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 0: invalid start byte
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test.py", line 22, in main
logr.info('content: ' + response.raw.read().decode())
File "\Python33\lib\site-packages\requests\packages\urllib3\response.py", line 167, in read
data = self._fp.read()
File "\Python33\lib\http\client.py", line 509, in read
return self._readall_chunked()
File "\Python33\lib\http\client.py", line 594, in _readall_chunked
raise IncompleteRead(b''.join(value))
http.client.IncompleteRead: IncompleteRead(351 bytes read)
How do I make this work without this 'response_length' variable?
Also, are there any better options than 'Requests' lib?
PS: this code is an independent script, and does not run in the Django framework.
Use the public API instead of internals and leave worrying about content length and reading to the library:
import requests
s = requests.Session()
s.verify = False
s.auth = (token01, token02)
resp = s.get(url, params={'name': token01}, stream=True)
content = resp.content
or, since stream=True, you can iterate over the response incrementally:
for line in resp.iter_lines():
    pass  # process a line
or
for chunk in resp.iter_content():
    pass  # process a chunk
If you must have a file-like object, then resp.raw can be used (provided stream=True is set on the request, like done above), but then just use .read() calls without a length to read to EOF.
If, however, you are not querying a resource that requires streaming (anything but a large file request, a need to check headers first, or a web service that is explicitly documented as a streaming service), just leave off stream=True and use resp.content or resp.text for byte or Unicode response data.
In the end, however, it appears your server is sending chunked responses that are malformed or incomplete; a chunked transfer encoding includes length information for each chunk and the server appears to be lying about a chunk length or sending too little data for a given chunk. The decode error is merely the result of incomplete data having been sent.
The server you are requesting uses "chunked" transfer encoding, so there is no Content-Length header. A raw response in chunked transfer encoding contains not only the actual content but also chunk-size lines (a number in hex followed by "\r\n"), which cause an XML or JSON parser error if they are left in the data.
Try using:
response.raw.read(decode_content=True)
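To illustrate the chunked format described above, here is a minimal decoder sketch (decode_chunked is hypothetical; in practice http.client and urllib3 decode chunked bodies for you, and this version ignores trailers). It also shows the failure mode from the earlier answer: if a server announces a chunk size larger than the data it actually sends, the body cannot be completed:

```python
def decode_chunked(raw: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked transfer-encoded body (sketch).

    Each chunk is: <size in hex>[;extensions]\r\n<size bytes of data>\r\n
    A chunk of size 0 ends the body; trailers after it are ignored here.
    """
    body = bytearray()
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)  # ValueError if the size line is cut off
        size_field = raw[pos:line_end].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field, 16)
        if size == 0:
            break
        start = line_end + 2
        end = start + size
        if end > len(raw):
            raise ValueError("incomplete chunk: server sent less data than it announced")
        body += raw[start:end]
        pos = end + 2  # skip the CRLF that follows the chunk data
    return bytes(body)
```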
