Python Requests - ChunkedEncodingError(e) - requests.iter_lines

I'm getting a ChunkedEncodingError(e) using Python requests. I'm using the following to rip down JSON:
r = requests.get(url, headers=auth, stream=True)
And then iterating over each line, using the newline as a delimiter, which is how this API distinguishes between distinct JSON events.
for d in r.iter_lines(delimiter="\n"):
d += "\n"
sock.send(d)
I'm delimiting on the newline and then adding it back in, as the endpoint I'm pushing the logs to expects a newline at the end of each event as well. This seems to work for roughly 100k log files. When I try to make a larger call, the following is thrown:
for d in r.iter_lines(delimiter="\n"):
logs_1 | File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 783, in iter_lines
logs_1 | for chunk in self.iter_content(chunk_size=chunk_size, decode_unicode=decode_unicode):
logs_1 | File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 742, in generate
logs_1 | raise ChunkedEncodingError(e)
logs_1 | requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
UPDATE: I've discovered the API is sending back a NoneType at some point as well. So how can I account for this null byte somewhere in the response without blowing everything up? Each individual event is terminated with a \n, and I need to be able to inspect each event individually. Should I chunk the content instead of using iter_lines? Then ensure there is no NoneType in the chunk, so that I don't try to iter_lines over a NoneType and it blows up?

ChunkedEncodingError is caused by httplib.IncompleteRead.
import httplib

def patch_http_response_read(func):
    def inner(*args):
        try:
            return func(*args)
        except httplib.IncompleteRead, e:
            # return the bytes that were read before the connection broke
            return e.partial
    return inner

httplib.HTTPResponse.read = patch_http_response_read(httplib.HTTPResponse.read)
I think this could work as a patch. It allows you to deal with defective HTTP servers.
Most servers transmit all the data, but due to implementation errors they close the session incorrectly, so httplib raises an error and buries your precious bytes.

As I posted here, and as another user mentioned for IncompleteRead, you can use the with statement to make sure that your previous request has closed:
with requests.request("POST", url_base, json=task, headers=headers) as report:
print('report: ', report)

If you are sharing a requests.Session object across multiple processes (multiprocessing), it may lead to this error. You can create a separate Session per process (keyed by os.getpid()), as sketched below.
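A minimal sketch of that idea, assuming sessions are looked up through a small helper (the _sessions cache and get_session name are illustrative, not from the original answer):
import os
import requests

_sessions = {}  # hypothetical cache: one Session per process id

def get_session():
    # Each worker process gets its own Session, so connection pools
    # are never shared across process boundaries.
    pid = os.getpid()
    if pid not in _sessions:
        _sessions[pid] = requests.Session()
    return _sessions[pid]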


How can I parse a JSON response? [duplicate]

I am getting the error Expecting value: line 1 column 1 (char 0) when trying to decode JSON.
The URL I use for the API call works fine in the browser, but gives this error when done through a curl request. The following is the code I use for the curl request.
The error happens at return simplejson.loads(response_json)
response_json = self.web_fetch(url)
response_json = response_json.decode('utf-8')
return json.loads(response_json)

def web_fetch(self, url):
    buffer = StringIO()
    curl = pycurl.Curl()
    curl.setopt(curl.URL, url)
    curl.setopt(curl.TIMEOUT, self.timeout)
    curl.setopt(curl.WRITEFUNCTION, buffer.write)
    curl.perform()
    curl.close()
    response = buffer.getvalue().strip()
    return response
Traceback:
File "/Users/nab/Desktop/myenv2/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
111. response = callback(request, *callback_args, **callback_kwargs)
File "/Users/nab/Desktop/pricestore/pricemodels/views.py" in view_category
620. apicall=api.API().search_parts(category_id= str(categoryofpart.api_id), manufacturer = manufacturer, filter = filters, start=(catpage-1)*20, limit=20, sort_by='[["mpn","asc"]]')
File "/Users/nab/Desktop/pricestore/pricemodels/api.py" in search_parts
176. return simplejson.loads(response_json)
File "/Users/nab/Desktop/myenv2/lib/python2.7/site-packages/simplejson/__init__.py" in loads
455. return _default_decoder.decode(s)
File "/Users/nab/Desktop/myenv2/lib/python2.7/site-packages/simplejson/decoder.py" in decode
374. obj, end = self.raw_decode(s)
File "/Users/nab/Desktop/myenv2/lib/python2.7/site-packages/simplejson/decoder.py" in raw_decode
393. return self.scan_once(s, idx=_w(s, idx).end())
Exception Type: JSONDecodeError at /pricemodels/2/dir/
Exception Value: Expecting value: line 1 column 1 (char 0)
Your code produced an empty response body; you'd want to check for that or catch the exception raised. It is possible the server responded with a 204 No Content response, or that a non-200-range status code was returned (404 Not Found, etc.). Check for this.
Note:
There is no need to use the simplejson library; the same library is included with Python as the json module.
There is no need to decode a response from UTF-8 to unicode; the simplejson / json .loads() method can handle UTF-8 encoded data natively.
pycurl has a very archaic API. Unless you have a specific requirement for using it, there are better choices.
Either requests or httpx offers a much friendlier API, including JSON support. If you can, replace your call with:
import requests

response = requests.get(url)
response.raise_for_status()  # raises an exception when not a 2xx response
if response.status_code != 204:
    return response.json()
Of course, this won't protect you from a URL that doesn't comply with HTTP standards; when using arbitrary URLs where this is a possibility, check if the server intended to give you JSON by checking the Content-Type header, and for good measure catch the exception:
if (
    response.status_code != 204 and
    response.headers["content-type"].strip().startswith("application/json")
):
    try:
        return response.json()
    except ValueError:
        # decide how to handle a server that's misbehaving to this extent
        pass
Be sure to remember to invoke json.loads() on the contents of the file, as opposed to the file path of that JSON:
json_file_path = "/path/to/example.json"

with open(json_file_path, 'r') as j:
    contents = json.loads(j.read())
I think a lot of people are guilty of doing this every once in a while (myself included):
contents = json.load(json_file_path)
Check the response body: whether actual data is present and whether the data dump appears to be well-formatted.
In most cases your json.loads JSONDecodeError: Expecting value: line 1 column 1 (char 0) error is due to:
non-JSON conforming quoting
XML/HTML output (that is, a string starting with <), or
incompatible character encoding
Ultimately the error tells you that at the very first position the string already doesn't conform to JSON.
As such, if parsing fails despite having a data-body that looks JSON like at first glance, try replacing the quotes of the data-body:
import sys, json

struct = {}
try:
    # try parsing to dict
    dataform = str(response_json).strip("'<>() ").replace('\'', '\"')
    struct = json.loads(dataform)
except:
    print repr(response_json)
    print sys.exc_info()
Note: Quotes within the data must be properly escaped
With the requests lib, a JSONDecodeError can happen when you have an HTTP error code like 404 and try to parse the response as JSON!
You must first check for 200 (OK), or let it raise on error, to avoid this case.
I wish it failed with a less cryptic error message.
NOTE: as Martijn Pieters stated in the comments, servers can respond with JSON in case of errors (it depends on the implementation), so checking the Content-Type header is more reliable.
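A compact version of both checks with requests might look like this (a sketch; the url is a hypothetical endpoint):
import requests

url = "https://api.example.com/data"  # hypothetical endpoint
response = requests.get(url)
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
if response.headers.get("content-type", "").startswith("application/json"):
    data = response.json()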
Check the encoding format of your file and use the corresponding encoding format while reading the file. It will solve your problem.
with open("AB.json", encoding='utf-8', errors='ignore') as json_data:
data = json.load(json_data, strict=False)
I had the same issue trying to read json files with
json.loads("file.json")
I solved the problem with
with open("file.json", "r") as read_file:
data = json.load(read_file)
maybe this can help in your case
A lot of times, this will be because the string you're trying to parse is blank:
>>> import json
>>> x = json.loads("")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
You can remedy by checking whether json_string is empty beforehand:
import json
if json_string:
    x = json.loads(json_string)
else:
    # Your code/logic here
    x = {}
I encountered the same problem while printing out the JSON string loaded from a JSON file: I found the string starts with a byte order mark, which, after some research, is because the file is decoded with UTF-8 by default. By changing the encoding to utf-8-sig, the mark is stripped out and the JSON loads with no problem:
open('test.json', encoding='utf-8-sig')
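Put together, the fix might look like this (a sketch; test.json is assumed to start with a UTF-8 byte order mark):
import json

with open('test.json', encoding='utf-8-sig') as f:
    data = json.load(f)  # the BOM is stripped before parsing, so this succeeds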
This is the minimalist solution I found when you want to load a JSON file in Python:
import json
data = json.load(open('file_name.json'))
If this gives an error saying the character doesn't match at position X and Y, then just add encoding='utf-8' inside the open parentheses:
data = json.load(open('file_name.json', encoding='utf-8'))
Explanation
open opens the file and reads the contents, which are later parsed by json.load.
Do note that using with open() as f is more reliable than the syntax above, since it makes sure that the file gets closed after execution; the complete syntax would be
with open('file_name.json') as f:
    data = json.load(f)
There may be embedded null characters ('\0'), even after calling decode(). Use replace():
import json

struct = {}
try:
    response_json = response_json.decode('utf-8').replace('\0', '')
    struct = json.loads(response_json)
except:
    print('bad json: ', response_json)
return struct
I had the same issue, in my case I solved like this:
import json
with open("migrate.json", "rb") as read_file:
data = json.load(read_file)
I was having the same problem with requests (the Python library). It happened to be the accept-encoding header.
It was set this way: 'accept-encoding': 'gzip, deflate, br'
I simply removed it from the request and stopped getting the error.
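A sketch of that removal (the headers dict and url are illustrative, not from the original answer):
import requests

url = "https://api.example.com/data"  # hypothetical endpoint
headers = {"accept-encoding": "gzip, deflate, br", "user-agent": "my-client"}
headers.pop("accept-encoding", None)  # let requests negotiate the encoding itself
response = requests.get(url, headers=headers)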
Just check if the request has a status code of 200. For example:
if status != 200:
    print("An error has occurred. [Status code", status, "]")
else:
    data = response.json()  # Only convert to JSON when the status is OK.
    if not data["elements"]:
        print("Empty JSON")
    else:
        pass  # You can extract data here
I had exactly this issue using requests.
Thanks to Christophe Roussy for his explanation.
To debug, I used:
response = requests.get(url)
logger.info(type(response))
I was getting a 404 response back from the API.
In my case I was calling file.read() two times, in the if and else blocks, which was causing this error. So make sure not to make this mistake: hold the contents in a variable and use the variable multiple times.
In my case it occurred because I read the data of the file using file.read() and then tried to parse it using json.load(file). After file.read(), the file pointer sits at the end of the file, so json.load(file) sees an empty string. I fixed the problem by replacing json.load(file) with json.loads(data).
Not working code:
with open("text.json") as file:
    data = file.read()
    json_dict = json.load(file)
Working code:
with open("text.json") as file:
    data = file.read()
    json_dict = json.loads(data)
For me, it was not using authentication in the request.
For me it was the server responding with something other than 200, and the response was not JSON-formatted. I ended up doing this before the JSON parse:
# this is the https request for data in json format
response_json = requests.get(url)  # url: the endpoint being called
# only proceed if I have a 200 response which is saved in status_code
if response_json.status_code == 200:
    response = response_json.json()  # converting from json to dictionary using json library
I received such an error in a Python-based web API's response .text, but it led me here, so this may help others with a similar issue (it's very difficult to filter response and request issues in a search when using requests).
Using json.dumps() on the request data arg to create a correctly-escaped string of JSON before POSTing fixed the issue for me
requests.post(url, data=json.dumps(data))
In my case it is because the server is giving an HTTP error occasionally. So basically, once in a while my script gets a response like this rather than the expected response:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<h1>502 Bad Gateway</h1>
<p>The proxy server received an invalid response from an upstream server.<hr/>Powered by Tengine</body>
</html>
Clearly this is not in JSON format, and trying to call .json() on it will yield JSONDecodeError: Expecting value: line 1 column 1 (char 0).
You can print the exact response that causes this error to debug better.
For example, if you are using requests, simply printing the .text field (before you call .json()) would do.
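A minimal debug sketch along those lines (url is a hypothetical endpoint):
import requests

url = "https://api.example.com/data"  # hypothetical endpoint
response = requests.get(url)
print(response.status_code)
print(response.text)  # inspect the raw body before attempting response.json()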
I did:
1. Open test.txt file, write data
2. Open test.txt file, read data
So I didn't close the file after step 1. I added
outfile.close()
and now it works.
If you are a Windows user, the Tweepy API can generate an empty line between data objects. Because of this situation, you can get the "JSONDecodeError: Expecting value: line 1 column 1 (char 0)" error. To avoid this error, you can delete the empty lines (a reading sketch follows the reference below).
For example:
def on_data(self, data):
    try:
        with open('sentiment.json', 'a', newline='\n') as f:
            f.write(data)
            return True
    except BaseException as e:
        print("Error on_data: %s" % str(e))
    return True
Reference:
Twitter stream API gives JSONDecodeError("Expecting value", s, err.value) from None
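When reading such a file back, a minimal sketch that skips the blank lines (assuming sentiment.json holds one JSON object per line, as written by the handler above):
import json

tweets = []
with open('sentiment.json') as f:
    for line in f:
        line = line.strip()
        if line:  # skip the empty lines that can appear between objects
            tweets.append(json.loads(line))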
If you use headers that include "Accept-Encoding": "gzip, deflate, br", install the brotli library with pip install. You don't need to import brotli into your .py file.
In my case, the solution was as simple as replacing single quotes with double quotes.
You can find my answer here.

Cannot read urllib error message once it is read()

My problem is with error handling of the python urllib error object. I am unable to read the error message while still keeping it intact in the error object, for it to be consumed later.
response = urllib.request.urlopen(request) # request that will raise an error
response.read()
response.read() # is empty now
# Also tried seek(0), that does not work either.
So this is how I intend to use it, but when the exception bubbles up, the second .read() is empty.
try:
    response = urllib.request.urlopen(request)
except urllib.error.HTTPError as err:
    self.log.exception(err.read())
    raise err
I tried making a deepcopy of the err object,
import copy

try:
    response = urllib.request.urlopen(request)
except urllib.error.HTTPError as err:
    err_obj_copy = copy.deepcopy(err)
    self.log.exception(
        "Method:{}\n"
        "URL:{}\n"
        "Data:{}\n"
        "Details:{}\n"
        "Headers:{}".format(method, url, data, err_obj_copy.read(), headers))
    raise err
but copy is unable to make a deepcopy and throws an error -
TypeError: __init__() missing 5 required positional arguments: 'url', 'code', 'msg', 'hdrs', and 'fp'.
How do I read the error message, while still keeping it intact in the object?
I do know how to do it using requests, but I am stuck with legacy code and need to make it work with urllib
This is what I did; it worked for me.
When reading the error for the first time, save it to a variable like this: msg = response.read().decode('utf8'). You can then create a new HTTPError instance with the message, and propagate it:
import io
import urllib.request
from urllib.error import HTTPError

resp = urllib.request.urlopen(request)
msg = resp.read().decode('utf8')
self.log.exception(msg)
raise HTTPError(resp.url, resp.code, resp.reason, resp.headers, io.BytesIO(bytes(msg, 'utf8')))
The error object may read from the network. The network is not seekable -- you can't go back in the general case.
You could replace err with a new HTTPError instance that reads from a buffer (like io.BytesIO()) instead of the network, e.g. (not tested):
content = err.read()
self.log.exception(content)
raise HTTPError(err.url, err.code, err.reason, err.headers, io.BytesIO(content))
Though I'm not sure that you should -- handle the error in a single place instead, e.g., re-raise a more application-specific exception or leave the logging to an upstream handler.
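For example, a sketch of that last suggestion (ApiError is a hypothetical application-specific exception, not part of urllib):
import urllib.request
import urllib.error

class ApiError(Exception):
    # hypothetical application-specific exception carrying the response body
    def __init__(self, status, body):
        super().__init__("API returned status %s" % status)
        self.status = status
        self.body = body

try:
    response = urllib.request.urlopen(request)  # request built as in the question
except urllib.error.HTTPError as err:
    # read the body once, then re-raise in application terms so it travels with the exception
    raise ApiError(err.code, err.read()) from err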

Werkzeug response too slow

I have the following Werkzeug application for returning a file to the client:
from werkzeug.wrappers import Request, Response
@Request.application
def application(request):
    fileObj = file(r'C:\test.pdf', 'rb')
    response = Response(response=fileObj.read())
    response.headers['content-type'] = 'application/pdf'
    return response
The part I want to focus on is this one:
response = Response( response=fileObj.read() )
In this case the response takes about 500 ms (C:\test.pdf is a 4 MB file; the web server is on my local machine).
But if I rewrite that line to this:
response = Response()
response.response = fileObj
Now the response takes about 1500 ms. (3 times slower)
And if I write it like this:
response = Response()
response.response = fileObj.read()
Now the response takes about 80 seconds (that's right, 80 SECONDS).
Why is there that much difference between the 3 methods?
And why is the third method sooooo slow?
The answer to that is pretty simple:
x.read() reads the whole file into memory; inefficient.
Setting response to a file object: very inefficient, as the protocol for that object is an iterator, so you will send the file line by line. If it's binary, you will even send it in random chunk sizes.
Setting response to a string: a bad idea. It's an iterator, as mentioned before, so you are now sending each character in the string as a separate packet.
The correct solution is to wrap the file in the file wrapper provided by the WSGI server:
from werkzeug.wsgi import wrap_file
return Response(wrap_file(environ, yourfile), direct_passthrough=True)
The direct_passthrough flag is required so that the response object does not attempt to iterate over the file wrapper but leaves it untouched for the WSGI server.
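Put together as a complete handler in the question's setup (a sketch; the path and MIME type follow the question's example):
from werkzeug.wrappers import Request, Response
from werkzeug.wsgi import wrap_file

@Request.application
def application(request):
    fileObj = open(r'C:\test.pdf', 'rb')
    # direct_passthrough leaves the file wrapper untouched for the WSGI server
    response = Response(wrap_file(request.environ, fileObj),
                        direct_passthrough=True)
    response.headers['content-type'] = 'application/pdf'
    return response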
After some testing I think I've figured out the mystery.
@Armin already explained why this...
response = Response()
response.response = fileObj.read()
...is so slow. But that doesn't explain why this...
response = Response( response=fileObj.read() )
...is so fast. They appear to be the same thing, but obviously they are not. Otherwise there wouldn't be that tremendous difference in speed.
The key here is in this part of the docs: http://werkzeug.pocoo.org/docs/wrappers/
Response can be any kind of iterable or string. If it’s a string it’s considered being an iterable with one item which is the string passed.
i.e. when you give a string to the constructor, it's converted to an iterable with the string being its only element. But when you do this: response.response = fileObj.read(), the string is treated as is.
So to make it behave like the constructor, you have to do this:
response.response = [ fileObj.read() ]
and now the file is sent as fast as possible.
I can't give you a precise answer as to why this occurs, however http://werkzeug.pocoo.org/docs/wsgi/#werkzeug.wsgi.wrap_file may help address your underlying problem.

Python send packet over 443

I have looked and perhaps I missed it. I currently have a file such as the one below:
PUT /URL/TO/SEND/REQUEST
Host: 127.0.0.1
Connection: keep-alive
...
bunch of data here
This file contains the header and the data I want to send over SSL. I know that on Windows I can use Fiddler etc. to send this raw data, but I was hoping to use Python. I tried looking (maybe not hard enough) at urllib2, urllib, and httplib to see if I could just send this file as the entire request; I don't want to deal with parsing the file, etc. Is this possible?
I did notice that in httplib I can use request, where "body can be a file object," but from the description it seems as though it still sends the header separately and that file is only for the data being sent.
Thanks
It isn't documented, but it looks like you should be able to use httplib.HTTPConnection.send() for this:
In [13]: httplib.HTTPConnection.send??
Type:        instancemethod
String Form: <unbound method HTTPConnection.send>
File:        /usr/local/lib/python2.7/httplib.py
Definition:  httplib.HTTPConnection.send(self, data)
Source:
def send(self, data):
    """Send `data' to the server."""
    if self.sock is None:
        if self.auto_open:
            self.connect()
        else:
            raise NotConnected()
    if self.debuglevel > 0:
        print "send:", repr(data)
    blocksize = 8192
    if hasattr(data, 'read') and not isinstance(data, array):
        if self.debuglevel > 0: print "sendIng a read()able"
        datablock = data.read(blocksize)
        while datablock:
            self.sock.sendall(datablock)
            datablock = data.read(blocksize)
    else:
        self.sock.sendall(data)
The request() method combines the header and body and passes it to this function, which looks like it should handle strings or file objects.
Of course you will still need to know the host so that you can create the HTTPConnection object, so your code might look something like this (untested):
import httplib
conn = httplib.HTTPConnection('127.0.0.1')
conn.send(open(filename))
response = conn.getresponse()
edit: It turns out there is some internal state that keeps this from working as is. Here is a workaround (full example with the Google main page), but it is a bit of a hack. Tested using Python 2.6 and 2.7; it does not appear to work on 3.x by just replacing httplib with http.client:
import httplib
conn = httplib.HTTPConnection('www.google.com')
conn.send('GET / HTTP/1.1\r\nHost: www.google.com\r\n\r\n')
conn._HTTPConnection__state = httplib._CS_REQ_SENT
response = conn.getresponse()
The key part here is setting conn.__state (mangled name) to the httplib._CS_REQ_SENT after calling send().
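Since the question asks about sending over 443/SSL, the same raw-file approach can also be sketched with a plain socket wrapped in TLS (standard library only; the host, port, and file name are illustrative, and this is Python 2 to match the code above):
import socket, ssl

raw_request = open('request.txt', 'rb').read()  # the saved header + body
sock = socket.create_connection(('127.0.0.1', 443))
ssock = ssl.wrap_socket(sock)  # Python 2 API; on 3.x use ssl.SSLContext
ssock.sendall(raw_request)
print ssock.recv(4096)  # first bytes of the raw HTTP response
ssock.close()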

XML parser syntax error

So I'm working with a block of code which communicates with the Flickr API.
I'm getting a 'syntax error' in xml.parsers.expat.ExpatError (below). Now, I can't figure out how it'd be a syntax error in a Python module.
I saw another similar question on SO regarding the Wikipedia API, which seemed to return HTML instead of XML. The Flickr API returns XML, and I'm also getting the same error when there shouldn't be a response from Flickr (such as flickr.galleries.addPhoto).
CODE:
def _dopost(method, auth=False, **params):
    #uncomment to check you aren't killing the flickr server
    #print "***** do post %s" % method
    params = _prepare_params(params)
    url = '%s%s/%s' % (HOST, API, _get_auth_url_suffix(method, auth, params))
    payload = 'api_key=%s&method=%s&%s' % \
        (API_KEY, method, urlencode(params))
    #another useful debug print statement
    #print url
    #print payload
    return _get_data(minidom.parse(urlopen(url, payload)))
TRACEBACK:
Traceback (most recent call last):
File "TESTING.py", line 30, in <module>
flickr.galleries_create('test_title', 'test_descriptionn goes here.')
File "/home/vlad/Documents/Computers/Programming/LEARNING/curatr/flickr.py", line 1006, in galleries_create
primary_photo_id=primary_photo_id)
File "/home/vlad/Documents/Computers/Programming/LEARNING/curatr/flickr.py", line 1066, in _dopost
return _get_data(minidom.parse(urlopen(url, payload)))
File "/usr/lib/python2.6/xml/dom/minidom.py", line 1918, in parse
return expatbuilder.parse(file)
File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 928, in parse
result = builder.parseFile(file)
File "/usr/lib/python2.6/xml/dom/expatbuilder.py", line 207, in parseFile
parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: syntax error: line 1, column 62
(Code from http://code.google.com/p/flickrpy/ under New BSD licence)
UPDATE:
print urlopen(url, payload) == <addinfourl at 43340936 whose fp = <socket._fileobject object at 0x29400d0>>
Doing a urlopen(url, payload).read() returns HTML, which is hard to read in a terminal :P but I managed to make out a 'You are not signed in.'
The strange part is that Flickr shouldn't return anything here, or if permissions are a problem, it should return a 99: User not logged in / Insufficient permissions error as it does with the GET function (which I'd expect would be in valid XML).
I'm signed in to Flickr (in the browser) and the program is properly authenticated with delete permissions (dangerous, but I wanted to avoid permission problems.)
SyntaxError normally means an error in Python syntax, but I think here expatbuilder is overloading it to mean an XML syntax error. Put a try/except block around it, and print out the contents of payload to work out what's wrong with the first line of it.
My guess would be that Flickr is rejecting your request for some reason and giving back a plain-text error message, which has an invalid XML character at column 62, but it could be any number of things. You probably want to check the HTTP status code before parsing it, as sketched below.
Also, it's a bit strange that this method is called _dopost but you seem to actually be sending an HTTP GET. Perhaps that's why it's failing.
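A minimal sketch of that status-code check for the last lines of _dopost (Python 2, as in the question; getcode() is available on the object urlopen returns, and url and payload are as built above):
response = urlopen(url, payload)
if response.getcode() != 200:
    # plain-text or HTML error page; don't hand it to the XML parser
    print "HTTP error %d: %s" % (response.getcode(), response.read()[:200])
else:
    return _get_data(minidom.parse(response))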
This seems to fix my problem:
url = '%s%s/?api_key=%s&method=%s&%s' % \
    (HOST, API, API_KEY, method, _get_auth_url_suffix(method, auth, params))
payload = '%s' % (urlencode(params))
It seems that the API key and method had to be in the URL not in the payload. (Or maybe only one needed to be there, but anyways, it works :-)
