I am working on an HTML parser that uses Python's multiprocessing Pool, because it runs through a huge number of pages. The output from every page is saved to a separate CSV file. The problem is that sometimes I get an unexpected error and the whole program crashes, even though I have error handling almost everywhere - reading pages, parsing pages, even writing files. Moreover, it looks like the script crashes after it finishes writing a batch of files, so there shouldn't be anything left to crash on. After a whole day of debugging I am left clueless.
Error:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "D:\Programy\Python36-32\lib\multiprocessing\pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "D:\Programy\Python36-32\lib\multiprocessing\pool.py", line 44, in mapstar
return list(map(*args))
File "D:\ppp\Python\parser\run.py", line 244, in media_process
save_media_product(DIRECTORY, category, media_data)
File "D:\ppp\Python\parser\manage_output.py", line 180, in save_media_product
_file_manager(target_file, temp, temp2)
File "D:\ppp\Python\store_parser\manage_output.py", line 214, in _file_manager
file_to_write.close()
UnboundLocalError: local variable 'file_to_write' referenced before assignment
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\ppp\Python\store_parser\run.py", line 356, in <module>
main()
File "D:\Rzeczy Mariusza\Python\store_parser\run.py", line 318, in main
process.map(media_process, batch)
File "D:\Programy\Python36-32\lib\multiprocessing\pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "D:\Programy\Python36-32\lib\multiprocessing\pool.py", line 644, in get
raise self._value
UnboundLocalError: local variable 'file_to_write' referenced before assignment
It looks like there is an error with the variable assignment, but that isn't the case:
try:
    file_to_write = open(target_file, 'w')
except OSError:
    message = 'OSError while writing file name - {}'.format(target_file)
    log_error(message)
except UnboundLocalError:
    message = 'UnboundLocalError while writing file name - {}'.format(target_file)
    log_error(message)
except Exception as e:
    message = 'Total failure "{}" while writing file name - {}'.format(e, target_file)
    log_error(message)
else:
    file_to_write.write(temp)
    file_to_write.write(temp2)
finally:
    file_to_write.close()
The line except Exception as e: does not help at all; the whole thing still crashes. So far I have excluded only the out-of-memory scenario: the script is designed to run on a low-spec VPS, but in the testing stage I run it in an environment with 8 GB of RAM. So if you have any theories, please share.
The exception really does tell you what is happening.
This part points out the obvious issue:
UnboundLocalError: local variable 'file_to_write' referenced before assignment
Even though your try/except blocks catch various exceptions, the else/finally blocks are not covered by them.
More specifically, in the finally block you reference a variable that might not exist: if file_to_write = open(target_file, 'w') raises, the exception is handled by (at the latest) the except Exception as e block, but finally still runs afterwards.
Since the exception happened because the target file could not be opened, nothing was ever assigned to file_to_write, so that name does not exist once the exception has been handled. That is why the finally block crashes.
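One way to avoid this is to let a with statement handle the closing, so there is nothing left for finally to do; a minimal sketch reusing the target_file, temp, temp2 and log_error names from the question:
try:
    # the context manager closes the file even if a write fails,
    # so no separate close() in finally is needed
    with open(target_file, 'w') as file_to_write:
        file_to_write.write(temp)
        file_to_write.write(temp2)
except OSError:
    log_error('OSError while writing file name - {}'.format(target_file))
except Exception as e:
    log_error('Total failure "{}" while writing file name - {}'.format(e, target_file))
Alternatively, keep the original structure but move file_to_write.close() into the else block, where the variable is guaranteed to exist.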
I'm afraid I won't be of much help in tracking this down, since the issue might come from basically anywhere in my code.
The exception occurs after running a Tensorflow training run (I don't know whether the issue comes from Tensorflow, though). I can't see anything broken, so I assume this is just an exception that happens somewhere during teardown.
My biggest issue, besides the annoying exception itself, is that I have literally zero information on where it originates.
...
I1123 07:51:21.469002 140168846268160 experiment.py:174] Experiment completed.
Exception ignored in: <function Pool.__del__ at 0x7f7b225304c0>
Traceback (most recent call last):
File "/home/sfalk/miniconda3/envs/speech/lib/python3.9/multiprocessing/pool.py", line 268, in __del__
self._change_notifier.put(None)
File "/home/sfalk/miniconda3/envs/speech/lib/python3.9/multiprocessing/queues.py", line 378, in put
self._writer.send_bytes(obj)
File "/home/sfalk/miniconda3/envs/speech/lib/python3.9/multiprocessing/connection.py", line 205, in send_bytes
self._send_bytes(m[offset:offset + size])
File "/home/sfalk/miniconda3/envs/speech/lib/python3.9/multiprocessing/connection.py", line 416, in _send_bytes
self._send(header + buf)
File "/home/sfalk/miniconda3/envs/speech/lib/python3.9/multiprocessing/connection.py", line 373, in _send
n = write(self._handle, buf)
OSError: [Errno 9] Bad file descriptor
Is there any way I can get an idea where this is happening in my code?
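One common source of this particular teardown message is a multiprocessing.Pool that is never closed and only gets garbage-collected at interpreter shutdown, when its internal pipes may already be gone. A minimal sketch of explicit pool lifecycle management (assuming a Pool is created somewhere in the training code; work is a placeholder):
from multiprocessing import Pool

def work(x):
    return x * x

if __name__ == "__main__":
    pool = Pool(processes=4)
    try:
        results = pool.map(work, range(10))
    finally:
        # Tearing the pool down explicitly leaves nothing for Pool.__del__
        # to do at interpreter exit, which is where the
        # "Exception ignored in <function Pool.__del__ ...>" message comes from.
        pool.close()
        pool.join()
    print(results)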
I've tried this with multiple different Python files. Every time I use the debugger in VS Code and set breakpoints, the breakpoint gets ignored, an exception gets raised, and the script continues on. I've been googling and tinkering for over 2 hours and can't figure out what's going on. I've tried rebooting the PC, running VS Code as admin, and uninstalling/reinstalling the Python extension for VS Code. I also dug into the files mentioned in the traceback and pinpointed the function that seems to be raising the exception, but I can't figure out where it's being called from or why it raises. I'm still new-ish to Python. Debugging works properly on my laptop, but for whatever reason my desktop has this issue.
Traceback (most recent call last):
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\pydevd_file_utils.py", line 529, in _original_file_to_client
return cache[filename]
KeyError: 'c:\\users\\joel\\local settings\\application data\\programs\\python\\python37-32\\lib\\runpy.py'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_comm.py", line 330, in _on_run
self.process_net_command_json(self.py_db, json_contents)
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_process_net_command_json.py", line 190, in process_net_command_json
cmd = on_request(py_db, request)
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_process_net_command_json.py", line 771, in on_stacktrace_request
self.api.request_stack(py_db, request.seq, thread_id, fmt=fmt, start_frame=start_frame, levels=levels)
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_api.py", line 214, in request_stack
if internal_get_thread_stack.can_be_executed_by(get_current_thread_id(threading.current_thread())):
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_comm.py", line 661, in can_be_executed_by
py_db, self.seq, self.thread_id, frame, self._fmt, must_be_suspended=not timed_out, start_frame=self._start_frame, levels=self._levels)
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_net_command_factory_json.py", line 213, in make_get_thread_stack_message
py_db, frames_list
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_net_command_factory_xml.py", line 175, in _iter_visible_frames_info
new_filename_in_utf8, applied_mapping = pydevd_file_utils.norm_file_to_client(filename_in_utf8)
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\pydevd_file_utils.py", line 531, in _original_file_to_client
translated = _path_to_expected_str(get_path_with_real_case(_AbsFile(filename)))
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\pydevd_file_utils.py", line 221, in _get_path_with_real_case
return _resolve_listing(drive, iter(parts))
File "c:\Users\Joel\.vscode\extensions\ms-python.python-2020.7.96456\pythonFiles\lib\python\debugpy\_vendored\pydevd\pydevd_file_utils.py", line 184, in _resolve_listing
dir_contents = cache[resolved_lower] = os.listdir(resolved)
PermissionError: [WinError 5] Access is denied: 'C:\\Users\\Joel\\Local Settings'
So I get this traceback every time a breakpoint is hit. Taking a peek at the "_original_file_to_client" function in "pydevd_file_utils.py" we get this:
def _original_file_to_client(filename, cache={}):
    try:
        return cache[filename]
    except KeyError:
        translated = _path_to_expected_str(get_path_with_real_case(_AbsFile(filename)))
        cache[filename] = (translated, False)
        return cache[filename]
I wasn't able to figure out where this function was being called from or what the expected output was supposed to be. Any help with this would be greatly appreciated!
Edit: Forgot to mention I'm using Windows 10 if it wasn't obvious from the trace
This is a similar question. The spaces in the path cause this problem:
"Local Settings", "Application Data"
I have a function that tries a list of regexes on some text to see if there's a match.
@timeout(1)
def get_description(data, old):
    description = None
    if old:
        for rx in rxs:
            try:
                matched = re.search(rx, data, re.S|re.M)
                if matched is not None:
                    try:
                        description = matched.groups(1)
                        if description:
                            return description
                        else:
                            continue
                    except TimeoutError as why:
                        print(why)
                        continue
                else:
                    continue
            except Exception as why:
                print(why)
                pass
I use this function in a loop and run a bunch of text files through. In one file, execution keeps stopping:
Traceback (most recent call last):
File "extract.py", line 223, in <module>
scrape()
File "extract.py", line 40, in scrape
metadata = get_metadata(f)
File "extract.py", line 186, in get_metadata
description = get_description(text, True)
File "extract.py", line 64, in get_description
matched = re.search(rx, data, re.S|re.M)
File "C:\Users\Joseph\AppData\Local\Programs\Python\Python36\lib\re.py", line 182, in search
return _compile(pattern, flags).search(string)
KeyboardInterrupt
It simply hangs on evaluating matched = re.search(rx, data, re.S|re.M). For many other files, when no match is found, it goes on to the next regex. With this file, it does nothing and throws no exception. Any ideas what could be causing this?
EDIT:
I'm now trying to detect timeout errors (this is more efficient for me than changing the regexes).
The TimeoutError, borrowed from this question, is triggered but doesn't make the script keep running. It simply writes 'Timer expired' and stays frozen.
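For reference, a signal- or thread-based timeout decorator typically cannot interrupt re.search while it is stuck backtracking inside the C regex engine, which matches the "triggered but stays frozen" behaviour. One approach that can interrupt a stuck search is running it in a separate process and terminating it after a deadline; a rough sketch (search_with_timeout and _search are hypothetical helpers, not part of the original script):
import re
import multiprocessing as mp

def _search(rx, data, out):
    # runs in a child process so a catastrophically backtracking pattern
    # can be abandoned by the parent instead of hanging the whole script
    m = re.search(rx, data, re.S | re.M)
    out.put(m.groups(1) if m else None)

def search_with_timeout(rx, data, seconds=1):
    out = mp.Queue()
    proc = mp.Process(target=_search, args=(rx, data, out))
    proc.start()
    proc.join(seconds)
    if proc.is_alive():
        # the pattern is still backtracking: kill the worker, treat as no match
        proc.terminate()
        proc.join()
        return None
    return out.get() if not out.empty() else None

if __name__ == "__main__":
    print(search_with_timeout(r"<p>(.*?)</p>", "<p>example</p>"))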
I am using the Google Cloud NL API to analyse the sentiment of some descriptions. For some rows the error InvalidArgument: 400 The language vi is not supported for document_sentiment analysis. keeps popping up, so I would like to build a way around it instead of desperately trying to find the reason why it happens and erase the responsible rows. Unfortunately, I am relatively new to Python and not sure how to do this properly.
My code is the following:
description_list = []
sentimentscore_list = []
magnitude_list = []

# Create a Language client
language_client = google.cloud.language.LanguageServiceClient()

for i in range(len(description)):  # use the translated description if the original description is not in English
    if description_trans[i] == '':
        descr = description[i]
    else:
        descr = description_trans[i]
    document = google.cloud.language.types.Document(
        content=descr,
        type=google.cloud.language.enums.Document.Type.PLAIN_TEXT)
    # Use Language to detect the sentiment of the text.
    response = language_client.analyze_sentiment(document=document)
    sentiment = response.document_sentiment
    sentimentscore_list.append(sentiment.score)
    magnitude_list.append(sentiment.magnitude)
    # Add the description that was actually used to the description list
    description_list.append(descr)
Would anyone be able to explain to me how to wrap this for loop (or probably just the latter part) in error/exception handling so that it simply skips the descriptions it can't analyse and continues with the next one? Also, I want description_list to be appended only when the description is actually analysed (not when it gets stuck in the error handling).
Any help is much appreciated!! Thanks :)
Edit: I was asked for a more complete error traceback:
Traceback (most recent call last):
File "<ipython-input-64-6e3db1d976c9>", line 1, in <module>
runfile('/Users/repos/NLPAnalysis/GoogleTest.py', wdir='/Users/repos/NLPAnalysis')
File "/Users/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 710, in runfile
execfile(filename, namespace)
File "/Users/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/Users/repos/NLPAnalysis/GoogleTest.py", line 45, in <module>
response = language_client.analyze_sentiment(document=document)
File "/Users/anaconda3/lib/python3.6/site-packages/google/cloud/language_v1/gapic/language_service_client.py", line 180, in analyze_sentiment
return self._analyze_sentiment(request, retry=retry, timeout=timeout)
File "/Users/anaconda3/lib/python3.6/site-packages/google/api_core/gapic_v1/method.py", line 139, in __call__
return wrapped_func(*args, **kwargs)
File "/Users/anaconda3/lib/python3.6/site-packages/google/api_core/retry.py", line 260, in retry_wrapped_func
on_error=on_error,
File "/Users/anaconda3/lib/python3.6/site-packages/google/api_core/retry.py", line 177, in retry_target
return target()
File "/Users/anaconda3/lib/python3.6/site-packages/google/api_core/timeout.py", line 206, in func_with_timeout
return func(*args, **kwargs)
File "/Users/anaconda3/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 56, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
InvalidArgument: 400 The language vi is not supported for document_sentiment analysis.
I agree with ThatBird that wrapping too much code in a try block can make debugging internal errors complicated. I would suggest utilizing Python's continue keyword.
try:
    # smallest block of code you foresee an error in
    response = language_client.analyze_sentiment(document=document)  # I think your exception is being raised in this call
except InvalidArgument as e:
    # your trace shows InvalidArgument being raised and it appears you don't care about it
    continue  # continue to next iteration since this error is expected
except SomeOtherOkayException as e:
    # this is an example exception that is also OK and "skippable"
    continue  # continue to next iteration
except Exception as e:
    # all other exceptions are BAD and unexpected. This is a larger problem than just this loop
    raise e  # break the looping and raise to calling function

sentiment = response.document_sentiment
sentimentscore_list.append(sentiment.score)
magnitude_list.append(sentiment.magnitude)
# Add the description that was actually used to the description list
description_list.append(descr)
# more code here...
Essentially, you're explicitly catching the exceptions that are expected, discarding that iteration when they occur, and continuing to the next one. You should raise all other exceptions that are not expected.
In the traceback, look at the fourth File line: it is the same line that is in your code and is causing the exception. We always put try/except around the block of code that we think is going to raise an exception; everything else goes outside the block.
try:
    response = language_client.analyze_sentiment(document=document)
except InvalidArgument:
    continue

# Assuming none of these would work if we don't get response?
description_list.append(descr)
sentiment = response.document_sentiment
sentimentscore_list.append(sentiment.score)
magnitude_list.append(sentiment.magnitude)
# Add the description that was actually used to the description list
We try to get a response from the language client; if it raises an InvalidArgument exception, we catch it. We then know we don't need to do anything else for this row, so we use continue and move on to the next iteration.
You will probably need to import InvalidArgument like this -
from google.api_core.exceptions import InvalidArgument
before using it in the code.
You are right about continue. More on the continue statement and how to handle exceptions in Python.
Hi, I am trying to create an FTP server, and to aid development I'm using pyftpdlib. What I want to do is perform some file operations when a user downloads a specific file, but sometimes this raises an exception and I don't really know why.
I wrote my own handler in pyftpdlib after this tutorial: http://code.google.com/p/pyftpdlib/wiki/Tutorial#3.8_-_Event_callbacks
But something sometimes goes terribly wrong when the user downloads the log file (which I intend to perform the file operations on) and I don't really understand why. I have another class that basically reads from a configuration file, and the error message says it couldn't find 'FTP Section'. That's strange, because I clearly have it in my configuration file, and sometimes it works perfectly.
Could this error appear because I have two "Connection" objects? That's the only guess I have, so I would be very glad if someone could explain what's going wrong. Here is the code that's causing trouble (never mind the file.name check, because that was added very recently):
class ArchiveHandler(ftpserver.FTPHandler):

    def on_login(self, username):
        # do something when user login
        pass

    def on_logout(self, username):
        # do something when user logs out
        pass

    def on_file_sent(self, file):
        "What to do when retrieved the file the class is watching over"
        attr = Connection()
        if attr.getarchive() == 'true':
            t = datetime.now()
            if file.name == "log.log":
                try:
                    shutil.copy2(file, attr.getdir() + ".archive/" + str(t.strftime("%Y-%m-%d_%H:%M:%S") + '.log'))
                except OSError:
                    print 'Could not copy file'
                    raise
        if attr.getremain() == 'false':
            try:
                os.remove(file)
            except OSError:
                print 'Could not remove file'
                raise
The full source:
http://pastie.org/3552079
Source of the config-file:
http://pastie.org/3552085
EDIT-> (and of course the error):
[root]#85.230.122.159:40659 unhandled exception in instance <pyftpdlib.ftpserver.DTPHandler object at 0xb75f49ec>
Traceback (most recent call last):
File "/usr/lib/python2.6/asyncore.py", line 84, in write
obj.handle_write_event()
File "/usr/lib/python2.6/asyncore.py", line 435, in handle_write_event
self.handle_write()
File "/usr/lib/python2.6/asynchat.py", line 174, in handle_write
self.initiate_send()
File "/usr/lib/python2.6/asynchat.py", line 215, in initiate_send
self.handle_close()
File "/usr/local/lib/python2.6/dist-packages/pyftpdlib/ftpserver.py", line 1232, in handle_close
self.close()
File "/usr/local/lib/python2.6/dist-packages/pyftpdlib/ftpserver.py", line 1261, in close
self.cmd_channel.on_file_sent(filename)
File "ftp.py", line 87, in on_file_sent
File "ftp.py", line 12, in __init__
File "/usr/lib/python2.6/ConfigParser.py", line 311, in get
raise NoSectionError(section)
NoSectionError: No section: 'FTP Section'
The problem is in a section you didn't include. It says
File "ftp.py", line 12, in __init__
File "/usr/lib/python2.6/ConfigParser.py", line 311, in get
raise NoSectionError(section)
NoSectionError: No section: 'FTP Section'
So from the first line, we know that whatever is on line 12 of ftp.py is the problem (since everything below that isn't our code, so we assume that it's correct).
Line 12 is this:
self.ip = config.get('FTP Section', 'hostname')
And the error message says "No section: 'FTP Section'".
From this we can assume there's an error in the config file (that it doesn't have a "FTP Section").
Are you pointing at the correct config file? Is it in the same directory that you're running the script from? Being in the same folder as the script will not work; it must be in the folder that you run the script from.
I think this is the problem you're having, since according to the documentation:
If none of the named files exist, the ConfigParser instance will contain an empty dataset.
You can confirm this by trying to open the file.
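A quick sketch of that check (assuming Python 2, as in the question, and that config.cfg is expected to be in the current working directory):
import ConfigParser  # Python 2, matching the question's environment

config = ConfigParser.RawConfigParser()
parsed = config.read('config.cfg')  # returns the list of files it could actually read
if not parsed:
    # read() silently returns an empty list when the file is not found,
    # which later surfaces as NoSectionError on config.get(...)
    raise IOError('config.cfg not found relative to the current working directory')
print config.sections()  # should include 'FTP Section'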
The problem in this case was that reading the configuration file left it open.
I changed to this and it's working much better:
config = ConfigParser.RawConfigParser()
fp = open('config.cfg')
config.readfp(fp)
And then when I'm finished reading in the constructor I add:
#Close the file
fp.close()
And voilà, you can create as many objects of the class as you want and it won't show any errors. :)