I am having trouble with the following code:
import praw
import argparse

# argument handling was here

def main():
    r = praw.Reddit(user_agent='Python Reddit Image Grabber v0.1')
    for i in range(len(args.subreddits)):
        try:
            r.get_subreddit(args.subreddits[i])  # test to see if the subreddit is valid
        except:
            print "Invalid subreddit"
        else:
            submissions = r.get_subreddit(args.subreddits[i]).get_hot(limit=100)
            print [str(x) for x in submissions]

if __name__ == '__main__':
    main()
Subreddit names are taken as arguments to the program. When an invalid subreddit name is passed to get_subreddit, it should throw an exception, which the code above is meant to catch. When a valid subreddit name is given as an argument, the program runs fine. But when an invalid name is given, the exception is not thrown; instead, the following uncaught exception is output:
Traceback (most recent call last):
  File "./pyrig.py", line 33, in <module>
    main()
  File "./pyrig.py", line 30, in main
    print [str(x) for x in submissions]
  File "/usr/local/lib/python2.7/dist-packages/praw/__init__.py", line 434, in get_content
    page_data = self.request_json(url, params=params)
  File "/usr/local/lib/python2.7/dist-packages/praw/decorators.py", line 95, in wrapped
    return_value = function(reddit_session, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/praw/__init__.py", line 469, in request_json
    response = self._request(url, params, data)
  File "/usr/local/lib/python2.7/dist-packages/praw/__init__.py", line 342, in _request
    response = handle_redirect()
  File "/usr/local/lib/python2.7/dist-packages/praw/__init__.py", line 316, in handle_redirect
    url = _raise_redirect_exceptions(response)
  File "/usr/local/lib/python2.7/dist-packages/praw/internal.py", line 165, in _raise_redirect_exceptions
    .format(subreddit))
praw.errors.InvalidSubreddit: `soccersdsd` is not a valid subreddit
I can't tell what I am doing wrong. I have also tried rewriting the exception code as
except praw.errors.InvalidSubreddit:
which also does not work.
EDIT: exception info for Praw can be found here
File "./pyrig.py", line 30, in main
print [str(x) for x in submissions]
The problem, as your traceback indicates, is that the exception doesn't occur when you call get_subreddit. In fact, it also doesn't occur when you call get_hot. The first is a lazy invocation that just creates a dummy Subreddit object but doesn't do anything with it. The second is a generator that doesn't make any requests until you actually try to iterate over it.

Thus you need to move the exception handling code around your print statement (line 30), which is where the request that results in the exception is actually made.
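For example, a minimal sketch of that restructuring (my own illustration against the old get_subreddit/get_hot API, assuming the same args handling as in your script):

def main():
    r = praw.Reddit(user_agent='Python Reddit Image Grabber v0.1')
    for name in args.subreddits:
        submissions = r.get_subreddit(name).get_hot(limit=100)
        try:
            # iterating over the generator triggers the actual HTTP request,
            # so praw.errors.InvalidSubreddit surfaces here
            print [str(x) for x in submissions]
        except praw.errors.InvalidSubreddit:
            print "Invalid subreddit"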
Related
I have the following function:
def get_prev_match_elos(player_id, prev_matches):
    try:
        last_match = prev_matches[-1]
        return last_match, player_id
    except IndexError:
        return
Sometimes prev_matches can be an empty list, so I've added the try/except block to catch an IndexError. However, I'm still getting an explicit IndexError on last_match = prev_matches[-1] when I pass an empty list, instead of the except block kicking in.
I've tried replicating this function in another file and it works fine! Any ideas?
Full error:
Exception has occurred: IndexError
list index out of range
  File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\elo.py", line 145, in get_prev_match_elos
    last_match = prev_matches[-1]
  File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\elo.py", line 24, in engineer_elos
    get_prev_match_elos(player_id, prev_matches_all_surface)
  File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\updater.py", line 499, in engineer_variables
    engineer_elos(dal, p1_id, date, surface, params)
  File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\updater.py", line 99, in run_updater
    engineer_variables(dal, matches_for_engineering, params)
  File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\decorators.py", line 12, in wrapper_timer
    value = func(*args, **kwargs)
  File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\updater.py", line 72, in main
    run_updater(dal, scraper)
  File "C:\Users\Philip\OneDrive\Betting\Capra\Tennis\polgara\updater.py", line 645, in <module>
    main()
I also can't replicate the error, but an easy fix is to not use exceptions this way. Exception handling is relatively expensive, and it should be reserved for capturing possible failures, not used for normal control flow. Try checking if the list is empty instead:
def get_prev_match_elos(player_id, prev_matches):
    if not prev_matches:
        return
    last_match = prev_matches[-1]
    return last_match, player_id
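One thing to keep in mind with either version (my own note, not from the original answer): the function returns None for an empty list and a tuple otherwise, so the caller has to handle both shapes, for example:

result = get_prev_match_elos(player_id, prev_matches)
if result is None:
    last_match = None  # no previous matches for this player
else:
    last_match, player_id = result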
Microsoft's take, using C# as the language, is the same: don't use exceptions for the normal flow of control, only for genuinely exceptional conditions.
I have a function that tries a list of regexes on some text to see if there's a match.
@timeout(1)
def get_description(data, old):
    description = None
    if old:
        for rx in rxs:
            try:
                matched = re.search(rx, data, re.S|re.M)
                if matched is not None:
                    try:
                        description = matched.groups(1)
                        if description:
                            return description
                        else:
                            continue
                    except TimeoutError as why:
                        print(why)
                        continue
                else:
                    continue
            except Exception as why:
                print(why)
                pass
I use this function in a loop and run a bunch of text files through. In one file, execution keeps stopping:
Traceback (most recent call last):
  File "extract.py", line 223, in <module>
    scrape()
  File "extract.py", line 40, in scrape
    metadata = get_metadata(f)
  File "extract.py", line 186, in get_metadata
    description = get_description(text, True)
  File "extract.py", line 64, in get_description
    matched = re.search(rx, data, re.S|re.M)
  File "C:\Users\Joseph\AppData\Local\Programs\Python\Python36\lib\re.py", line 182, in search
    return _compile(pattern, flags).search(string)
KeyboardInterrupt
It simply hangs on evaluating matched = re.search(rx, data, re.S|re.M). For many other files, when no match is found, it goes on to the next regex. With this file, it does nothing and throws no exception. Any ideas what could be causing this?
EDIT:

I'm now trying to detect timeout errors (this is more efficient for me than changing the regexes). The TimeoutError, borrowed from this question, is triggered but doesn't cause the script to keep running. It simply writes 'Timer expired' and stays frozen.
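For reference, the timeout decorator used above isn't shown in the post. The commonly cited version from the linked question is built on SIGALRM, roughly like the sketch below (my reconstruction; an assumption about which decorator is actually in use). Note that SIGALRM only exists on Unix, which may matter here given the Windows paths in the traceback:

import signal

class TimeoutError(Exception):
    pass

def timeout(seconds):
    def decorator(func):
        def handler(signum, frame):
            # runs in the main thread when the alarm fires
            raise TimeoutError('Timer expired')
        def wrapper(*args, **kwargs):
            old_handler = signal.signal(signal.SIGALRM, handler)
            signal.alarm(seconds)  # deliver SIGALRM after `seconds`
            try:
                return func(*args, **kwargs)
            finally:
                signal.alarm(0)  # cancel any pending alarm
                signal.signal(signal.SIGALRM, old_handler)
        return wrapper
    return decorator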
I am trying to access an API that returns a set of products. Since the execution is slow, I was hoping I could use multiprocessing to make it faster. The API works perfectly when accessed using a simple for loop.
Here is my code:
from multiprocessing import Pool
from urllib2 import Request, urlopen, URLError
import json

def f(a):
    request = Request('API' + str(a))
    try:
        response = urlopen(request)
        data = response.read()
    except URLError, e:
        print 'URL ERROR:', e
    s = json.loads(data)
    #count += len(s['Results'])
    #print count
    products = []
    for i in range(len(s['Results'])):
        if (s['Results'][i]['IsSyndicated'] == False):
            try:
                products.append(int(s['Results'][i]['ProductId']))
            except ValueError as e:
                products.append(s['Results'][i]['ProductId'])
    return products

list = [0, 100, 200]

if __name__ == '__main__':
    p = Pool(4)
    result = p.map(f, list)
    print result
Here is the error message:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\z080302\Desktop\WinPython-32bit-2.7.6.3\python-2.7.6\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 540, in runfile
    execfile(filename, namespace)
  File "C:/Users/z080302/Desktop/Python_Projects/mp_test.py", line 36, in <module>
    result=p.map(f, list)
  File "C:\Users\z080302\Desktop\WinPython-32bit-2.7.6.3\python-2.7.6\lib\multiprocessing\pool.py", line 250, in map
    return self.map_async(func, iterable, chunksize).get()
  File "C:\Users\z080302\Desktop\WinPython-32bit-2.7.6.3\python-2.7.6\lib\multiprocessing\pool.py", line 554, in get
    raise self._value
UnboundLocalError: local variable 'data' referenced before assignment
I was thinking that even with multiprocessing, the function would still be executed sequentially. So why am I getting an UnboundLocalError?
In this code:
try:
    response = urlopen(request)
    data = response.read()
except URLError, e:
    print 'URL ERROR:', e
If urlopen throws a URLError exception, the following line (data = response.read()) is never executed. So when you come to:

s = json.loads(data)

the variable data has never been assigned. You probably want to abort processing in the event of a URLError, since that suggests you will not have any JSON data.
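One way to abort, sketched against the function above (returning an empty list on failure is my assumption about what the caller can tolerate; re-raising the exception would also work):

def f(a):
    request = Request('API' + str(a))
    try:
        response = urlopen(request)
        data = response.read()
    except URLError, e:
        print 'URL ERROR:', e
        return []  # abort: there is no JSON to parse for this chunk
    s = json.loads(data)
    # ... continue exactly as before ...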
The accepted answer addresses the actual problem, but I thought I'd add my experience for others who come here because of mystic errors raised by multiprocessing's ApplyResult.get via raise self._value. If you are getting a TypeError, a ValueError, or basically any other error that in your case has nothing to do with multiprocessing, it is because that error is not really raised by multiprocessing itself: it is raised by your own code, running in the process you are attempting to manage (or the thread, if you happen to be using multiprocessing.pool.ThreadPool, which I was).
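A minimal demonstration of the effect (my own example, not from the original post):

from multiprocessing.pool import ThreadPool

def work(x):
    # this error has nothing to do with multiprocessing itself
    raise ValueError('raised in the worker')

if __name__ == '__main__':
    pool = ThreadPool(2)
    async_result = pool.apply_async(work, (1,))
    async_result.get()  # re-raises the worker's ValueError via "raise self._value"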
This seems to be the code for poplib.error_proto:
class error_proto(Exception): pass
It just passes the bytes from the POP response in the exception. What I would like to do is catch any exception, take those bytes, use .decode('ascii') on them, and print them as a string. I've written my own test setup like so:
class B(Exception): pass

def bex(): raise B(b'Problem')

try:
    bex()
except B as err:
    print(err.decode('ascii'))
I've tried replacing the last line with:
b = bytes(err)
print(b.decode('ascii'))
But to no avail. Is this possible and if so, how would I implement this?
UPDATE: Though, as falsetru points out, the documentation says results are returned as strings, this is not the case:
>>> p = poplib.POP3('mail.site.com')
>>> p.user('skillian#site.com')
b'+OK '
>>> p.pass_('badpassword')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python33\lib\poplib.py", line 201, in pass_
    return self._shortcmd('PASS %s' % pswd)
  File "C:\Python33\lib\poplib.py", line 164, in _shortcmd
    return self._getresp()
  File "C:\Python33\lib\poplib.py", line 140, in _getresp
    raise error_proto(resp)
poplib.error_proto: b'-ERR authorization failed Check your server settings.'
>>>
According to poplib.error_proto documentation:
Exception raised on any errors from this module (errors from socket module are not caught). The reason for the exception is passed to the constructor as a string.
So, you don't need to decode it.
UPDATE: It seems like the documentation does not match the actual implementation.
You can access the arguments passed to the exception constructor using args attribute.
p = poplib.POP3('mail.site.com')
try:
    p.user('skillian#site.com')
    p.pass_('badpassword')
except poplib.error_proto as e:
    print(e.args[0].decode('ascii'))  # 'ascii' is not necessary.
I'm writing a crawler to download static HTML pages using urllib. The get_page function works for one cycle, but when I try to loop it, it doesn't fetch the content of the next URL I've fed in.

How do I make urllib.urlopen continuously download HTML pages? If that is not possible, is there any other suggestion for downloading webpages within my Python code?

My code below only returns the HTML for the first website in the seed list:
import urllib

def get_page(url):
    return urllib.urlopen(url).read().decode('utf8')

seed = ['http://www.pmo.gov.sg/content/pmosite/home.html',
        'http://www.pmo.gov.sg/content/pmosite/aboutpmo.html']

for j in seed:
    print "here"
    print get_page(j)
The same crawl "once-only" problem also occurs with urllib2:
import urllib2

def get_page(url):
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    return response.read().decode('utf8')

seed = ['http://www.pmo.gov.sg/content/pmosite/home.html',
        'http://www.pmo.gov.sg/content/pmosite/aboutpmo.html']

for j in seed:
    print "here"
    print get_page(j)
Without any exception handling, I'm getting an IOError with urllib:
Traceback (most recent call last):
  File "/home/alvas/workspace/SingCorp/sgcrawl.py", line 91, in <module>
    print get_page(j)
  File "/home/alvas/workspace/SingCorp/sgcrawl.py", line 4, in get_page
    return urllib.urlopen(url).read().decode('utf8')
  File "/usr/lib/python2.7/urllib.py", line 86, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.7/urllib.py", line 207, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 462, in open_file
    return self.open_local_file(url)
  File "/usr/lib/python2.7/urllib.py", line 476, in open_local_file
    raise IOError(e.errno, e.strerror, e.filename)
IOError: [Errno 2] No such file or directory: 'http://www.pmo.gov.sg/content/pmosite/aboutpmo.html'
Without any exception handling, I'm getting a ValueError with urllib2:
Traceback (most recent call last):
  File "/home/alvas/workspace/SingCorp/sgcrawl.py", line 95, in <module>
    print get_page(j)
  File "/home/alvas/workspace/SingCorp/sgcrawl.py", line 7, in get_page
    response = urllib2.urlopen(req)
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 392, in open
    protocol = req.get_type()
  File "/usr/lib/python2.7/urllib2.py", line 254, in get_type
    raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: http://www.pmo.gov.sg/content/pmosite/aboutpmo.html
ANSWERED:

The IOError and ValueError occurred because there was some sort of Unicode byte order mark (BOM); a non-breaking space was found in the second URL. Thanks for all your help and suggestions in solving the problem!!
Your code is choking on .read().decode('utf8'), but you wouldn't see that, since you are just swallowing exceptions. urllib works fine "more than once":
import urllib

def get_page(url):
    return urllib.urlopen(url).read()

seeds = ['http://www.pmo.gov.sg/content/pmosite/home.html',
         'http://www.pmo.gov.sg/content/pmosite/aboutpmo.html']

for seed in seeds:
    print 'here'
    print get_page(seed)
Both of your examples work fine for me. The only explanation I can think of for your exact errors is that the second URL string contains some sort of non-printable character (a Unicode BOM, perhaps) that got filtered out when pasting the code here. Try copying the code back from this site into your file, or retyping the entire second string from scratch.
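If you want to verify that, one quick check (my own suggestion, assuming the same Python 2.7 as in the tracebacks) is to print the repr of each seed string; a UTF-8 BOM or non-breaking space shows up as an escape sequence such as \xef\xbb\xbf or \xc2\xa0 instead of staying invisible:

for j in seed:
    print repr(j)  # hidden bytes appear as \x.. escapes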