I have a function that tries a list of regexes on some text to see if there's a match.
#timeout(1)
def get_description(data, old):
description = None
if old:
for rx in rxs:
try:
matched = re.search(rx, data, re.S|re.M)
if matched is not None:
try:
description = matched.groups(1)
if description:
return description
else:
continue
except TimeoutError as why:
print(why)
continue
else:
continue
except Exception as why:
print(why)
pass
I use this function in a loop and run a bunch of text files through. In one file, execution keeps stopping:
Traceback (most recent call last):
File "extract.py", line 223, in <module>
scrape()
File "extract.py", line 40, in scrape
metadata = get_metadata(f)
File "extract.py", line 186, in get_metadata
description = get_description(text, True)
File "extract.py", line 64, in get_description
matched = re.search(rx, data, re.S|re.M)
File "C:\Users\Joseph\AppData\Local\Programs\Python\Python36\lib\re.py", line 182, in search
return _compile(pattern, flags).search(string)
KeyboardInterrupt
It simply hangs on evaluating matched = re.search(rx, data, re.S|re.M). For many other files, when no match is found, it goes on to the next regex. With this file, it does nothing and throws no exception. Any ideas what could be causing this?
EDIT:
I'm now trying to detect timeout errors (This is more efficient for me than changing the rx's)
The TimeoutError, borrowed from this question, is triggered but doesn't cause the script to keep running. It simply writes 'Timer expired' and stays frozen.
Related
I have the following function:
def get_prev_match_elos(player_id, prev_matches):
try:
last_match = prev_matches[-1]
return last_match, player_id
except IndexError:
return
Sometimes prev_matches can be an empty list so I've added the try except block to catch an IndexError. However, I'm still getting an explicit IndexError on last_match = prev_matches[-1] when I pass an empty list instead of the except block kicking in.
I've tried replicating this function in another file and it works fine! Any ideas?
Full error:
Exception has occurred: IndexError
list index out of range
File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\elo.py", line 145, in get_prev_match_elos
last_match = prev_matches[-1]
File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\elo.py", line 24, in engineer_elos
get_prev_match_elos(player_id, prev_matches_all_surface)
File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\updater.py", line 499, in engineer_variables
engineer_elos(dal, p1_id, date, surface, params)
File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\updater.py", line 99, in run_updater
engineer_variables(dal, matches_for_engineering, params)
File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\decorators.py", line 12, in wrapper_timer
value = func(*args, **kwargs)
File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\updater.py", line 72, in main
run_updater(dal, scraper)
File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\updater.py", line 645, in <module>
main()
I also can't replicate the error, but an easy fix is to not use Exceptions this way. Programming languages aren't optimized for manually handling exceptions often. They should only be used for preemptively capturing possible failures, not for normal logic. Try checking if it's empty instead.
def get_prev_match_elos(player_id, prev_matches):
if not prev_matches:
return
last_match = prev_matches[-1]
return last_match, player_id
Here's Microsoft's take, using C# as the language:
I am using the Google Cloud NL API to analyse the sentiment of some descriptions. As for some rows the error InvalidArgument: 400 The language vi is not supported for document_sentiment analysis.keeps popping up, I would like to build a way around it instead of desperately trying to find the reason why this happens and erase the responsible rows. Unfortunately, I am relatively new to Python and am not sure how to properly do it.
My code is the following:
description_list = []
sentimentscore_list=[]
magnitude_list=[]
# Create a Language client
language_client = google.cloud.language.LanguageServiceClient()
for i in range(len(description)): # use the translated description if the original description is not in English
if description_trans[i] == '':
descr = description[i]
else:
descr = description_trans[i]
document = google.cloud.language.types.Document(
content=descr,
type=google.cloud.language.enums.Document.Type.PLAIN_TEXT)
# Use Language to detect the sentiment of the text.
response = language_client.analyze_sentiment(document=document)
sentiment = response.document_sentiment
sentimentscore_list.append(sentiment.score)
magnitude_list.append(sentiment.magnitude)
# Add the description that was actually used to the description list
description_list.append(descr)
Would anyone be able to explain me how to wrap this for loop (or probably the latter part is sufficient) into the error/exception handling so that it simply "skips over" the one it can't read and continues with the next one? Also I want the 'description_list' to be only appended when the description is actually analysed (so not when it gets stuck in the error handling).
Any help is much appreciated!! Thanks :)
Edit: I was asked for a more complete error traceback:
Traceback (most recent call last):
File "<ipython-input-64-6e3db1d976c9>", line 1, in <module>
runfile('/Users/repos/NLPAnalysis/GoogleTest.py', wdir='/Users/repos/NLPAnalysis')
File "/Users/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 710, in runfile
execfile(filename, namespace)
File "/Users/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/Users/repos/NLPAnalysis/GoogleTest.py", line 45, in <module>
response = language_client.analyze_sentiment(document=document)
File "/Users/anaconda3/lib/python3.6/site-packages/google/cloud/language_v1/gapic/language_service_client.py", line 180, in analyze_sentiment
return self._analyze_sentiment(request, retry=retry, timeout=timeout)
File "/Users/anaconda3/lib/python3.6/site-packages/google/api_core/gapic_v1/method.py", line 139, in __call__
return wrapped_func(*args, **kwargs)
File "/Users/anaconda3/lib/python3.6/site-packages/google/api_core/retry.py", line 260, in retry_wrapped_func
on_error=on_error,
File "/Users/anaconda3/lib/python3.6/site-packages/google/api_core/retry.py", line 177, in retry_target
return target()
File "/Users/anaconda3/lib/python3.6/site-packages/google/api_core/timeout.py", line 206, in func_with_timeout
return func(*args, **kwargs)
File "/Users/anaconda3/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 56, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
InvalidArgument: 400 The language vi is not supported for document_sentiment analysis.
I agree with ThatBird that wrapping too much code in a try-block can make debugging internal errors complicated. I would suggest utilizing python's continue keyword.
try:
# smallest block of code you foresee an error in
response = language_client.analyze_sentiment(document=document) # I think your exception is being raised in this call
except InvalidArgument as e:
# your trace shows InvalidArgument being raised and it appears you dont care about it
continue # continue to next iteration since this error is expected
except SomeOtherOkayException as e:
# this is an example exception that is also OK and "skippable"
continue # continue to next iteration
except Exception as e:
# all other exceptions are BAD and unexpected.This is a larger problem than just this loop
raise e # break the looping and raise to calling function
sentiment = response.document_sentiment
sentimentscore_list.append(sentiment.score)
magnitude_list.append(sentiment.magnitude)
# Add the description that was actually used to the description list
description_list.append(descr)
# more code here...
Essentially, you're explicitly catching Exceptions that are expected, and discarding that iteration if they occur and continuing to the next one. You should raise all other exceptions that are not expected.
In the traceback, look at the fourth line, it's the same line that is in your code and causing an exception. We always put try except around the code block that we think is going to cause an exception. Everything else is put outside the block.
try:
response = language_client.analyze_sentiment(document=document)
except InvalidArgument:
continue
# Assuming none of these would work if we don't get response?
description_list.append(descr)
sentiment = response.document_sentiment
entimentscore_list.append(sentiment.score)
magnitude_list.append(sentiment.magnitude)
# Add the description that was actually used to the description list
We try to get response from language client, it raises an exception saying InvalidArgument, we catch that. Now we know we don't need to do anything and we use continue, and move on to the next iteration.
You probably will need to import InvalidArgument like -
from google.api_core.exceptions import InvalidArgument
before using it in the code.
You are right about continue. More about continue statement and how to handle exceptions in python.
I am working on html parser, it uses Python multiprocessing Pool, because it runs through huge number of pages. The output from every page is saved to a separate CSV file. The problem is sometimes I get unexpected error and whole program crashes and I have errors handling almost everywhere - reading pages, parsing pages, even writing files. Moreover it looks like the script crashes after it finishes writing a batch of files, so it shouldn't be anything to crush on. Thus after whole day of debugging I am left clueless.
Error:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "D:\Programy\Python36-32\lib\multiprocessing\pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "D:\Programy\Python36-32\lib\multiprocessing\pool.py", line 44, in mapstar
return list(map(*args))
File "D:\ppp\Python\parser\run.py", line 244, in media_process
save_media_product(DIRECTORY, category, media_data)
File "D:\ppp\Python\parser\manage_output.py", line 180, in save_media_product
_file_manager(target_file, temp, temp2)
File "D:\ppp\Python\store_parser\manage_output.py", line 214, in _file_manager
file_to_write.close()
UnboundLocalError: local variable 'file_to_write' referenced before assignment
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\ppp\Python\store_parser\run.py", line 356, in <module>
main()
File "D:\Rzeczy Mariusza\Python\store_parser\run.py", line 318, in main
process.map(media_process, batch)
File "D:\Programy\Python36-32\lib\multiprocessing\pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "D:\Programy\Python36-32\lib\multiprocessing\pool.py", line 644, in get
raise self._value
UnboundLocalError: local variable 'file_to_write' referenced before assignment
It look like, there is an error with variable assignment, but it is not:
try:
file_to_write = open(target_file, 'w')
except OSError:
message = 'OSError while writing file name - {}'.format(target_file)
log_error(message)
except UnboundLocalError:
message = 'UnboundLocalError while writing file name - {}'.format(target_file)
log_error(message)
except Exception as e:
message = 'Total failure "{}" while writing file name - {}'.format(e, target_file)
log_error(message)
else:
file_to_write.write(temp)
file_to_write.write(temp2)
finally:
file_to_write.close()
Line - except Exception as e:, does not help with anything, the whole thing still crashes. So far i have excluded only Out Of Memory scenario, because this script is designed to be handled on low spec VPS, but in testing stage I run it in environment with 8 GB of ram. So if You have any theories please share.
The exception really says what is happening.
This part is telling you obvious issue:
UnboundLocalError: local variable 'file_to_write' referenced before assignment
Even you have try/except blocks that catches various exceptions, else/finally doesn't.
More specifically in finally block you reference variable that might not exist since exception with doing: file_to_write = open(target_file, 'w') is being handled by at least last except Exception as e block, but then finally is run too.
Since exception happened as a result of not being able to open target file, you do not have anything assigned to file_to_write and that variable doesn't exist after exception is handled. That is why finally block crashes.
I am trying to scrape stock prices from Yahoo! Finance into a local database as per a tutorial by Chris Reeves, and I keep getting the above error when trying to execute this code. Can anyone tell me what is wrong here? Thanks.
from threading import Thread
import urllib
import re
import MySQLdb
gmap = {}
def th(ur):
base = "http://finance.yahoo.com/q?s="+ur
regex = '<span id="yfs_l84_'+ur.lower()+'">(.+?)</span>'
pattern = re.compile(regex)
htmltext = urllib.urlopen(base).read()
results = re.findall(pattern, htmltext)
try:
gmap[ur] = results[0]
except:
print "Got an error"
symbolslist = open("multithread/stocks.txt").read()
symbolslist = symbolslist.replace(" ","").split(",")
print symbolslist
threadlist = []
for u in symbolslist:
t = Thread(target=th,args=(u,))
t.start()
threadlist.append(t)
for b in threadlist:
b.join()
This is the exact error that I'm getting:
Exception in thread Thread-1:
Traceback (most recent call last):
File "C:\Python27\lib\threading.py", line 810, in __bootstrap_inner
self.run()
File "C:\Python27\lib\threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "multithread/threads.py", line 11, in th
pattern = re.compile(regex)
File "C:\Python27\lib\re.py", line 190, in compile
return _compile(pattern, flags)
File "C:\Python27\lib\re.py", line 242, in _compile
raise error, v # invalid expression
error: unexpected end of regular expression
Alas, you didn't show us the important part. That is, print symbolslist. Something in that list is creating an invalid regular expression when you paste it into the <span ... boilerplate.
You can probably fix it by changing that line like so:
regex = '<span id="yfs_l84_' + re.escape(ur.lower()) + '">(.+?)</span>'
^^^^^^^^^^ ^
However, if that works, it would probably only be hiding the real problem. The real problem is probably that you have some kind of nonsense in symbolslist.
I am having trouble with the following code:
import praw
import argparse
# argument handling was here
def main():
r = praw.Reddit(user_agent='Python Reddit Image Grabber v0.1')
for i in range(len(args.subreddits)):
try:
r.get_subreddit(args.subreddits[i]) # test to see if the subreddit is valid
except:
print "Invalid subreddit"
else:
submissions = r.get_subreddit(args.subreddits[i]).get_hot(limit=100)
print [str(x) for x in submissions]
if __name__ == '__main__':
main()
subreddit names are taken as arguments to the program.
When an invalid args.subreddits is passed to get_subreddit, it throws an exception which should be caught in the above code.
When a valid args.subreddit name is given as an argument, the program runs fine.
But when an invalid args.subreddit name is given, the exception is not thrown, and instead the following uncaught exception is outputted.
Traceback (most recent call last):
File "./pyrig.py", line 33, in <module>
main()
File "./pyrig.py", line 30, in main
print [str(x) for x in submissions]
File "/usr/local/lib/python2.7/dist-packages/praw/__init__.py", line 434, in get_content
page_data = self.request_json(url, params=params)
File "/usr/local/lib/python2.7/dist-packages/praw/decorators.py", line 95, in wrapped
return_value = function(reddit_session, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/praw/__init__.py", line 469, in request_json
response = self._request(url, params, data)
File "/usr/local/lib/python2.7/dist-packages/praw/__init__.py", line 342, in _request
response = handle_redirect()
File "/usr/local/lib/python2.7/dist-packages/praw/__init__.py", line 316, in handle_redirect
url = _raise_redirect_exceptions(response)
File "/usr/local/lib/python2.7/dist-packages/praw/internal.py", line 165, in _raise_redirect_exceptions
.format(subreddit))
praw.errors.InvalidSubreddit: `soccersdsd` is not a valid subreddit
I can't tell what I am doing wrong. I have also tried rewriting the exception code as
except praw.errors.InvalidSubreddit:
which also does not work.
EDIT: exception info for Praw can be found here
File "./pyrig.py", line 30, in main
print [str(x) for x in submissions]
The problem, as your traceback indicates is that the exception doesn't occur when you call get_subreddit In fact, it also doesn't occur when you call get_hot. The first is a lazy invocation that just creates a dummy Subreddit object but doesn't do anything with it. The second, is a generator that doesn't make any requests until you actually try to iterate over it.
Thus you need to move the exception handling code around your print statement (line 30) which is where the request is actually made that results in the exception.