Multiprocessing in Python - UnboundLocalError: local variable 'data' referenced before assignment

I am trying to access an API that returns a set of products. Since the execution is slow, I was hoping I could use multiprocessing to make it faster. The API works perfectly when accessed using a simple for loop.
Here is my code:
from multiprocessing import Pool
from urllib2 import Request, urlopen, URLError
import json

def f(a):
    request = Request('API' + str(a))
    try:
        response = urlopen(request)
        data = response.read()
    except URLError, e:
        print 'URL ERROR:', e
    s = json.loads(data)
    #count += len(s['Results'])
    #print count
    products = []
    for i in range(len(s['Results'])):
        if (s['Results'][i]['IsSyndicated'] == False):
            try:
                products.append(int(s['Results'][i]['ProductId']))
            except ValueError as e:
                products.append(s['Results'][i]['ProductId'])
    return products

list = [0, 100, 200]

if __name__ == '__main__':
    p = Pool(4)
    result = p.map(f, list)
    print result
Here is the error message:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\z080302\Desktop\WinPython-32bit-2.7.6.3\python-2.7.6\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 540, in runfile
execfile(filename, namespace)
File "C:/Users/z080302/Desktop/Python_Projects/mp_test.py", line 36, in <module>
result=p.map(f, list)
File "C:\Users\z080302\Desktop\WinPython-32bit-2.7.6.3\python-2.7.6\lib\multiprocessing\pool.py", line 250, in map
return self.map_async(func, iterable, chunksize).get()
File "C:\Users\z080302\Desktop\WinPython-32bit-2.7.6.3\python-2.7.6\lib\multiprocessing\pool.py", line 554, in get
raise self._value
UnboundLocalError: local variable 'data' referenced before assignment
I was thinking that even with multiprocessing, the function would still be executed sequentially. So why am I getting an UnboundLocalError?

In this code:
try:
    response = urlopen(request)
    data = response.read()
except URLError, e:
    print 'URL ERROR:', e
If urlopen throws a URLError exception, the following line (data = response.read()) is never executed. So when you come to:
s = json.loads(data)
the variable data has never been assigned. You probably want to abort processing in the event of a URLError, since that suggests you will not have any JSON data.
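For example, a minimal sketch of that early abort, reusing the names from the question's Python 2 code (returning an empty list is just one choice; re-raising the error would also work):
def f(a):
    request = Request('API' + str(a))
    try:
        response = urlopen(request)
        data = response.read()
    except URLError, e:
        print 'URL ERROR:', e
        return []  # abort: no response body, so there is no JSON to parse
    s = json.loads(data)
    # ... build and return the products list as before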

The accepted answer addresses the actual problem, but I thought I'd add my experience for others who come here because of mystifying errors raised by multiprocessing's ApplyResult.get via raise self._value. If you are getting a TypeError, ValueError, or basically any other error that seemingly has nothing to do with multiprocessing, it's because the error is not really raised by multiprocessing itself, but by the code you are running in the process you are attempting to manage (or the thread, if you happen to be using multiprocessing.pool.ThreadPool, which I was).
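A minimal sketch of that behaviour (worker is a hypothetical function; any exception type propagates the same way):
from multiprocessing.pool import ThreadPool

def worker(x):
    # Any exception raised here is captured by the pool and
    # re-raised in the parent when the result is fetched.
    return 1 / x  # ZeroDivisionError for x == 0

if __name__ == '__main__':
    pool = ThreadPool(2)
    try:
        pool.map(worker, [1, 0])
    except ZeroDivisionError as e:
        # The traceback ends at pool.py's "raise self._value",
        # but the error originated inside worker().
        print('worker raised: %s' % e)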

Related

urllib IncompleteRead() error can I solve by just re-requesting?

I am running a script that is scraping several hundred pages on a site, but recently I have been running into IncompleteRead() errors. My understanding from looking on Stack Overflow is that they can happen for any number of unknown reasons.
From searching around, I believe the error is caused randomly by the Request() function:
for ec in unq:
    print(ec)
    url = Request("https://www.brenda-enzymes.org/enzyme.php?ecno=" +
                  ec, headers={'User-Agent': 'Mozilla/5.0'})
    html = urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
3.5.2.3
2.1.3.15
2.5.1.72
1.5.1.2
6.1.1.9
3.2.2.27
Traceback (most recent call last):
File "C:\Users\wmn262\Anaconda3\lib\http\client.py", line 554, in _get_chunk_left
chunk_left = self._read_next_chunk_size()
File "C:\Users\wmn262\Anaconda3\lib\http\client.py", line 521, in _read_next_chunk_size
return int(line, 16)
ValueError: invalid literal for int() with base 16: b''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\wmn262\Anaconda3\lib\http\client.py", line 571, in _readall_chunked
chunk_left = self._get_chunk_left()
File "C:\Users\wmn262\Anaconda3\lib\http\client.py", line 556, in _get_chunk_left
raise IncompleteRead(b'')
IncompleteRead: IncompleteRead(0 bytes read)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<ipython-input-20-82f1876d3006>", line 5, in <module>
html = urlopen(url).read()
File "C:\Users\wmn262\Anaconda3\lib\http\client.py", line 464, in read
return self._readall_chunked()
File "C:\Users\wmn262\Anaconda3\lib\http\client.py", line 578, in _readall_chunked
raise IncompleteRead(b''.join(value))
IncompleteRead: IncompleteRead(1772944 bytes read)
The error happens randomly, as in it is not always the same URL that causes it; https://www.brenda-enzymes.org/enzyme.php?ecno=3.2.2.27 caused this specific one.
Some solutions seem to introduce a try clause, but within the except they store the partial data (I think). Why is that the case? Why not just resubmit the request?
If so, how would I just re-run the request, as doing that normally seems to solve the issue? Beyond this I have no idea how I can fix the problem.
As per Serge's answer, a try block seems to be the way:
The stacktrace suggests that you are reading a chunked transfer-encoded response and that for some reason you lost the connection between 2 chunks.
As you have said, this can happen for numerous causes, and the occurrence is random. So:
you cannot predict when or for what file it will happen
you cannot prevent it from happening
The best you can do is to catch the error and retry, after an optional delay.
For example:
import http.client
import time

for ec in unq:
    print(ec)
    url = Request("https://www.brenda-enzymes.org/enzyme.php?ecno=" +
                  ec, headers={'User-Agent': 'Mozilla/5.0'})
    sleep = 0
    for i in range(4):
        try:
            html = urlopen(url).read()
            break
        except http.client.IncompleteRead:
            if i == 3:
                raise           # give up after 4 attempts
            time.sleep(sleep)   # optionally add a delay here
            sleep += 5
    soup = BeautifulSoup(html, 'html.parser')
I have faced the same issue and found this solution.
After some small changes, the code looks like this:
from http.client import IncompleteRead, HTTPResponse
from urllib.request import urlopen
from urllib.error import URLError, HTTPError
...

def patch_http_response_read(func):
    def inner(*args):
        try:
            return func(*args)
        except IncompleteRead as e:
            return e.partial  # return whatever was read before the error
    return inner

HTTPResponse.read = patch_http_response_read(HTTPResponse.read)

try:
    response = urlopen(my_url)
    result = json.loads(response.read().decode('UTF-8'))
except URLError as e:
    print('URL Error Reason: ', e.reason)
except HTTPError as e:
    print('HTTP Error code: ', e.code)
I'm not sure that it is the best way, but it works in my case. I'll be happy if this advice is useful to you or helps you find some other good solution. Happy coding!
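To make the mechanism concrete, here is a minimal self-contained sketch of the same decorator applied to a stand-in function (flaky_read is hypothetical and exists only to trigger the exception):
from http.client import IncompleteRead

def patch_http_response_read(func):
    def inner(*args):
        try:
            return func(*args)
        except IncompleteRead as e:
            return e.partial  # fall back to the bytes received so far
    return inner

@patch_http_response_read
def flaky_read():
    raise IncompleteRead(b'partial body')

print(flaky_read())  # b'partial body' instead of an exception
Note that the patch trades the exception for possibly truncated data, so downstream code should be prepared for incomplete bodies.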

Mocking builtin read() function in Python so it throws Exception

I am trying to test the error handling of a class and I need to simulate read() raising a MemoryError. This is a simplified example.
import mock

def memory_error_read():
    raise MemoryError

def read_file_by_filename(filename):
    try:
        handle = open(filename)
        content = handle.read()
    except MemoryError:
        raise Exception("%s is too big" % filename)
    finally:
        handle.close()
    return content

@mock.patch("__builtin__.file.read", memory_error_read)
def testfunc():
    try:
        read_file_by_filename("/etc/passwd")
    except Exception, e:
        return
    print("Should have received exception")

testfunc()
When I run this I get the following traceback.
# ./read_filename.py
Traceback (most recent call last):
File "./read_filename.py", line 34, in <module>
testfunc()
File "build/bdist.linux-i686/egg/mock.py", line 1214, in patched
File "build/bdist.linux-i686/egg/mock.py", line 1379, in __exit__
TypeError: can't set attributes of built-in/extension type 'file'
It appears that I can't patch the builtin read function. Is there a way to trick it?
Is it possible to do what I want?
ANSWER
Here's the updated code based on Ben's suggestion below.
from __future__ import with_statement
...

def test_func():
    with mock.patch("__builtin__.open", new_callable=mock.mock_open) as mo:
        mock_file = mo.return_value
        mock_file.read.side_effect = MemoryError
        try:
            read_file_by_filename("/etc/passwd")
        except Exception, e:
            if "is too big" in str(e):
                return
            else:
                raise
        print("Should have caught an exception")
Have you looked at mock_open?
You should be able to have it return a mock object with an exception-raising side effect on read().

Python Execnet and Exception Handling

I am using execnet to call a jython module from within a python script.
From the docs:
Note that exceptions from the remotely executing code will be reraised as channel.RemoteError exceptions containing a textual representation of the remote traceback.
Let's say my remote module could result in two different exceptions, and I would like to be able to handle each exception differently. How will I be able to handle this, given that both exceptions will instead throw a RemoteError exception that only contains a string of the traceback?
For example, this particular calling code:
#...
channel.send('bogus')
results in the following RemoteError, which just contains a formatted attribute holding a string of the traceback:
RemoteError: Traceback (most recent call last):
File "<string>", line 1072, in executetask
File "<string>", line 1, in do_exec
File "<remote exec>", line 33, in <module>
IOError: Open failed for table: bogus, error: No such file or directory (2)
I cannot do a try ... except IOError:. I could do a try ... except RemoteError as ex: and parse ex.formatted to see if it contains IOError, and then raise that instead, but this seems rather sloppy:
from execnet.gateway_base import RemoteError

try:
    channel.send('bogus')
except RemoteError as ex:
    if 'IOError' in ex.formatted:
        raise IOError(ex.formatted[ex.formatted.find('IOError'): -1])
    if 'ValueError' in ex.formatted:
        raise ValueError(ex.formatted[ex.formatted.find('ValueError'): -1])
    # otherwise, reraise the uncaptured error:
    raise ex
An old question; I tried to answer it:
import unittest

import execnet
from execnet.gateway_base import RemoteError

class Test(unittest.TestCase):
    def RemoteErrorHandler(self, error):
        # last line of the remote traceback, e.g. "NameError: name 'o' is not defined";
        # split on the first colon only, since the message itself may contain colons
        e, t = error.formatted.splitlines()[-1].split(':', 1)
        raise getattr(__builtins__, e)(t)

    def raising_receive(self, ch):
        try:
            return ch.receive()
        except RemoteError as ex:
            self.RemoteErrorHandler(ex)

    def setUp(self):
        self.gateway = execnet.makegateway()

    def test_NameError(self):
        ch = self.gateway.remote_exec("print o")
        with self.assertRaises(NameError):
            self.raising_receive(ch)

if __name__ == '__main__':
    unittest.main()

Printing passed bytes from Exception

This seems to be the code for poplib.error_proto:
class error_proto(Exception): pass
It just passes the bytes from the POP response in the exception. What I would like to do is catch any exception, take those bytes, use .decode('ascii') on them, and print them as a string. I've written my own test setup like so:
class B(Exception): pass

def bex(): raise B(b'Problem')

try:
    bex()
except B as err:
    print(err.decode('ascii'))
I've tried replacing the last line with:
b = bytes(err)
print(b.decode('ascii'))
But to no avail. Is this possible and if so, how would I implement this?
UPDATE: Though, as falsetru points out, the documentation says results are returned as strings, this is not the case:
>>> p = poplib.POP3('mail.site.com')
>>> p.user('skillian@site.com')
b'+OK '
>>> p.pass_('badpassword')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python33\lib\poplib.py", line 201, in pass_
return self._shortcmd('PASS %s' % pswd)
File "C:\Python33\lib\poplib.py", line 164, in _shortcmd
return self._getresp()
File "C:\Python33\lib\poplib.py", line 140, in _getresp
raise error_proto(resp)
poplib.error_proto: b'-ERR authorization failed Check your server settings.'
>>>
According to poplib.error_proto documentation:
Exception raised on any errors from this module (errors from socket module are not caught). The reason for the exception is passed to the constructor as a string.
So, you don't need to decode it.
UPDATE: It seems like the documentation does not match the actual implementation.
You can access the arguments passed to the exception constructor using the args attribute.
p = poplib.POP3('mail.site.com')
try:
    p.user('skillian@site.com')
    p.pass_('badpassword')
except poplib.error_proto as e:
    print(e.args[0].decode('ascii'))  # 'ascii' is not necessary.
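The same pattern applies to the test setup from the question; a quick sketch:
class B(Exception): pass

try:
    raise B(b'Problem')
except B as err:
    # err.args holds the constructor arguments, bytes included
    print(err.args[0].decode('ascii'))  # prints: Problem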

Exception handling in Python and Praw

I am having trouble with the following code:
import praw
import argparse

# argument handling was here

def main():
    r = praw.Reddit(user_agent='Python Reddit Image Grabber v0.1')
    for i in range(len(args.subreddits)):
        try:
            r.get_subreddit(args.subreddits[i])  # test to see if the subreddit is valid
        except:
            print "Invalid subreddit"
        else:
            submissions = r.get_subreddit(args.subreddits[i]).get_hot(limit=100)
            print [str(x) for x in submissions]

if __name__ == '__main__':
    main()
Subreddit names are taken as arguments to the program.
When an invalid args.subreddits entry is passed to get_subreddit, it should throw an exception that is caught in the above code.
When a valid subreddit name is given as an argument, the program runs fine.
But when an invalid name is given, the exception is not thrown; instead, the following uncaught exception is output:
Traceback (most recent call last):
File "./pyrig.py", line 33, in <module>
main()
File "./pyrig.py", line 30, in main
print [str(x) for x in submissions]
File "/usr/local/lib/python2.7/dist-packages/praw/__init__.py", line 434, in get_content
page_data = self.request_json(url, params=params)
File "/usr/local/lib/python2.7/dist-packages/praw/decorators.py", line 95, in wrapped
return_value = function(reddit_session, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/praw/__init__.py", line 469, in request_json
response = self._request(url, params, data)
File "/usr/local/lib/python2.7/dist-packages/praw/__init__.py", line 342, in _request
response = handle_redirect()
File "/usr/local/lib/python2.7/dist-packages/praw/__init__.py", line 316, in handle_redirect
url = _raise_redirect_exceptions(response)
File "/usr/local/lib/python2.7/dist-packages/praw/internal.py", line 165, in _raise_redirect_exceptions
.format(subreddit))
praw.errors.InvalidSubreddit: `soccersdsd` is not a valid subreddit
I can't tell what I am doing wrong. I have also tried rewriting the exception code as
except praw.errors.InvalidSubreddit:
which also does not work.
EDIT: exception info for Praw can be found here
File "./pyrig.py", line 30, in main
print [str(x) for x in submissions]
The problem, as your traceback indicates, is that the exception doesn't occur when you call get_subreddit. In fact, it also doesn't occur when you call get_hot. The first is a lazy invocation that just creates a dummy Subreddit object but doesn't do anything with it. The second is a generator that doesn't make any requests until you actually try to iterate over it.
Thus you need to move the exception handling code around your print statement (line 30), which is where the request that results in the exception is actually made.
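A sketch of that rearrangement, based on the question's own code (praw.errors.InvalidSubreddit is the exception shown in the traceback above):
def main():
    r = praw.Reddit(user_agent='Python Reddit Image Grabber v0.1')
    for name in args.subreddits:
        submissions = r.get_subreddit(name).get_hot(limit=100)
        try:
            # the HTTP request only happens once the generator is consumed
            print [str(x) for x in submissions]
        except praw.errors.InvalidSubreddit:
            print "Invalid subreddit"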
