How do I catch all possible errors with url fetch (python)? - python

In my application users enter a url and I try to open the link and get the title of the page. But I realized that there can be many different kinds of errors, including unicode characters or newlines in titles and AttributeError and IOError. I first tried to catch each error, but now in case of a url fetch error I want to redirect to an error page where the user will enter the title manually. How do I catch all possible errors? This is the code I have now:
title = "title"
try:
    soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(url))
    title = str(soup.html.head.title.string)
    if title == "404 Not Found":
        self.redirect("/urlparseerror")
    elif title == "403 - Forbidden":
        self.redirect("/urlparseerror")
    else:
        title = str(soup.html.head.title.string).lstrip("\r\n").rstrip("\r\n")
except UnicodeDecodeError:
    self.redirect("/urlparseerror?error=UnicodeDecodeError")
except AttributeError:
    self.redirect("/urlparseerror?error=AttributeError")
#https url:
except IOError:
    self.redirect("/urlparseerror?error=IOError")
#I tried this else clause to catch any other error
#but it does not work
#this is executed when none of the errors above is true:
#
#else:
#    self.redirect("/urlparseerror?error=some-unknown-error-caught-by-else")
UPDATE
As suggested by @Wooble in the comments, I added try...except around the code that writes the title to the database:
try:
    new_item = Main(
        ....
        title = unicode(title, "utf-8"))
    new_item.put()
except UnicodeDecodeError:
    self.redirect("/urlparseerror?error=UnicodeDecodeError")
This works, although the out-of-range character "—" is still in the title according to the logging info:
title: 7.2. re — Regular expression operations — Python v2.7.1 documentation
Do you know why?

You can use except without specifying any type to catch all exceptions.
From the python docs http://docs.python.org/tutorial/errors.html:
import sys
try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
except IOError as (errno, strerror):
    print "I/O error({0}): {1}".format(errno, strerror)
except ValueError:
    print "Could not convert data to an integer."
except:
    print "Unexpected error:", sys.exc_info()[0]
    raise
The last except will catch any exception that has not been caught before (i.e. an exception that is neither an IOError nor a ValueError).

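You can use the top-level exception type Exception, which will catch any exception not handled by an earlier clause (apart from system-exiting ones such as SystemExit and KeyboardInterrupt, which derive from BaseException rather than Exception).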
http://docs.python.org/library/exceptions.html#exception-hierarchy
try:
    soup = BeautifulSoup.BeautifulSoup(urllib.urlopen(url))
    title = str(soup.html.head.title.string)
    if title == "404 Not Found":
        self.redirect("/urlparseerror")
    elif title == "403 - Forbidden":
        self.redirect("/urlparseerror")
    else:
        title = str(soup.html.head.title.string).lstrip("\r\n").rstrip("\r\n")
except UnicodeDecodeError:
    self.redirect("/urlparseerror?error=UnicodeDecodeError")
except AttributeError:
    self.redirect("/urlparseerror?error=AttributeError")
#https url:
except IOError:
    self.redirect("/urlparseerror?error=IOError")
except Exception, ex:
    print "Exception caught: %s" % ex.__class__.__name__
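The pattern in these answers (specific handlers first, a general Exception handler last) can be sketched in Python 3 as below. This is only an illustration under assumptions: fetch_title, fetch, extract_title and broken_fetch are hypothetical names that stand in for the urllib and BeautifulSoup calls in the question, so the sketch runs without network access.

```python
def fetch_title(url, fetch, extract_title):
    """Return (title, error_name); error_name is None on success.

    fetch and extract_title are hypothetical callables standing in for
    urllib.urlopen and the BeautifulSoup title lookup from the question.
    """
    try:
        title = extract_title(fetch(url))
    except (UnicodeDecodeError, AttributeError, IOError) as exc:
        # Errors the question already handles one by one.
        return None, type(exc).__name__
    except Exception as exc:
        # Anything else that was not anticipated.
        return None, "unknown-" + type(exc).__name__
    return title.strip("\r\n"), None


def broken_fetch(url):
    # Hypothetical fetcher that simulates an unexpected failure.
    raise ValueError("boom")


print(fetch_title("http://example.com", broken_fetch, lambda page: page))
print(fetch_title("http://example.com", lambda url: "\r\nHello\r\n", lambda page: page))
```

A caller could then redirect to the error page whenever error_name is not None and pass the name along in the query string.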

Related

Basic practice for throwing exception/error from function

For example I have script like this
def form_valid(self, form):
    temp = form.save(commit=False)
    try:
        temp.contents = makecontents()
    except:
        messages.error(self.request, error)
        return self.render_to_response(self.get_context_data(form=form))
    messages.success(self.request, self.success_message)
    return super().form_valid(form)
def makecontents():
    if works_well:  # pseudocode: whatever condition means success
        return "this is my contents"
    else:
        return "this is error!!"
So my basic idea is that when makecontents() succeeds, it returns a string, which is set as a member of the model instance.
However, when makecontents() fails, I want to return the error to form_valid.
So maybe I should change return "this is error!!" somehow.
Is this possible, or is my idea wrong?
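One way to get the error back to the caller is to raise an exception in makecontents() and catch it where the function is called. The following is a minimal sketch, not the asker's real view code: ContentError and build_contents are hypothetical names, and build_contents stands in for form_valid so the example runs without Django.

```python
class ContentError(Exception):
    """Hypothetical exception type for failures inside makecontents()."""


def makecontents(works_well):
    if works_well:
        return "this is my contents"
    # Signal failure by raising instead of returning an error string.
    raise ContentError("this is error!!")


def build_contents(works_well):
    """Stand-in for form_valid(): returns (contents, error_message)."""
    try:
        return makecontents(works_well), None
    except ContentError as exc:
        # str(exc) gives the message passed to the exception.
        return None, str(exc)


print(build_contents(True))
print(build_contents(False))
```

In a real form_valid, the except branch would be the place to call messages.error with str(exc) and re-render the form.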
You could use raise <ExceptionType>("error message") to raise an error, and then except Exception as e: to get the exception message:
def makecontents(works_well):
    try:
        if (works_well):
            return "this is my contents"
        else:
            raise SyntaxError("this is an error")
    except Exception as e:
        return str(e)
print(makecontents(works_well=False).__repr__())
output:
'this is an error'

Use API to write to json file

I am facing this problem when I loop over the tweet_id values using the API and write the results to tweet_json.txt: the output for every ID is Failed, which I know is wrong.
It was working before, but when I ran all the code again it started printing Failed.
for tweet_id in df['tweet_id']:
    try:
        tweet = api.get_status(tweet_id, tweet_mode = 'extended')
        with open('tweet_json.txt', 'a+') as file:
            json.dump(tweet._json, file)
            file.write('\n')
        print (tweet_id, 'success')
    except:
        print (tweet_id, 'Failed')
Your except is swallowing whatever exception is causing your code to die. Until you comment out the except or make it more specific you won't know if your problem is the Twitter API or file I/O or something else. Good luck!
A quick step forward would be to adjust your exception handler so that it writes out the exception. I like to use the format_exc function to get my stack traces so I can write them with a logger, or handle them however I want.
from traceback import format_exc
try:
    a = "" + 1
except Exception as ex:
    print("Exception encountered! \n %s " % format_exc())
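Applied to a loop like the one in the question, the same idea might look like the sketch below. It uses the standard logging module, whose exception() method records the traceback along with the message. process_all, fetch_one and flaky are hypothetical names; fetch_one stands in for the API call plus the file write.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("tweets")


def process_all(ids, fetch_one):
    """Call fetch_one for each id and return (succeeded, failed) lists."""
    succeeded, failed = [], []
    for item_id in ids:
        try:
            fetch_one(item_id)
        except Exception:
            # Logs the message plus the full traceback of the current exception.
            log.exception("Failed for id %s", item_id)
            failed.append(item_id)
        else:
            succeeded.append(item_id)
    return succeeded, failed


def flaky(item_id):
    # Hypothetical worker that fails for odd ids.
    if item_id % 2:
        raise RuntimeError("simulated API failure")


print(process_all([1, 2, 3], flaky))
```

With the traceback in the log, it becomes clear whether the failure comes from the API call, the file I/O, or something else.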

Python: Can't convert 'tuple' object to str implicitly

I'm trying to get my code to store messages within txt file using json. Each time a new message comes in, it will add the new message to the array.
The structure will be:
{
    "Messages": {
        "Test Contact 2": {
            "0": "\"Message 1"
        },
        "Test Contact 1": {
            "0": "\"Message 1\"",
            "1": "\"Message 2\""
        }
    }
}
And here is my current code:
class PluginOne(IPlugin):
    def process(self):
        try:
            print("Database")
            data_store('Test contact', 'Text Message')
            pass
        except Exception as exc:
            print("Error in database: " + exc.args)
def data_store(key_id, key_info):
    try:
        with open('Plugins/Database/messages.txt', 'r+') as f:
            data = json.load(f)
            data[key_id] = key_info
            f.seek(0)
            json.dump(data, f)
            f.truncate()
            pass
    except Exception as exc:
        print("Error in data store: " + exc.args)
When I try to run the code, I get the following error:
Can't convert 'tuple' object to str implicitly
In your exception handler, you're adding exc.args to a string. The args attribute is a tuple, which can't be converted to a string implicitly. You could...
# print it separately
print("Error in data store")
print(exc.args)
# or alternatively
print("Error in data store: " + str(exc.args))
# or alternatively
print("Error in data store: " + str(exc))
However, since this is a problem in the exception handler itself, the root cause is something else, and your current exception handler isn't that great at handling it:
Without your exception handler, Python would show a complete traceback of the root cause of the exception and halt your program.
With your exception handler, only your message is printed and the program continues. This might not be what you want.
It would perhaps be better to only catch the specific exceptions you know you can recover from.
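As a rough illustration of catching only specific exceptions, the sketch below rewrites data_store for Python 3 so that it handles a missing file and malformed JSON and lets everything else propagate with a full traceback. The extra path parameter and the True/False return value are assumptions added for the example; they are not part of the original code.

```python
import json


def data_store(path, key_id, key_info):
    """Add key_id -> key_info to the JSON object stored at path.

    Returns True on success, False for the two failures handled here.
    """
    try:
        with open(path, "r+") as f:
            data = json.load(f)
            data[key_id] = key_info
            f.seek(0)
            json.dump(data, f)
            f.truncate()
    except FileNotFoundError as exc:
        # str(exc) is safe to concatenate; exc.args is a tuple and is not.
        print("Error in data store: " + str(exc))
        return False
    except json.JSONDecodeError as exc:
        print("Error in data store: invalid JSON: " + str(exc))
        return False
    return True
```

Any other exception (for example a TypeError from a value that cannot be serialized) is not caught and will stop the program with a traceback, which makes the root cause visible.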

Which key failed in Python KeyError?

If I catch a KeyError, how can I tell what lookup failed?
def poijson2xml(location_node, POI_JSON):
    try:
        man_json = POI_JSON["FastestMan"]
        woman_json = POI_JSON["FastestWoman"]
    except KeyError:
        # How can I tell what key ("FastestMan" or "FastestWoman") caused the error?
        LogErrorMessage ("POIJSON2XML", "Can't find mandatory key in JSON")
Take the current exception (I used it as e in this case); then for a KeyError the first argument is the key that raised the exception. Therefore we can do:
except KeyError as e: # One would do it as 'KeyError, e:' in Python 2.
    cause = e.args[0]
With that, you have the offending key stored in cause.
Expanding your sample code, your log might look like this:
def poijson2xml(location_node, POI_JSON):
    try:
        man_json = POI_JSON["FastestMan"]
        woman_json = POI_JSON["FastestWoman"]
    except KeyError as e:
        LogErrorMessage ("POIJSON2XML", "Can't find mandatory key '"
                         + str(e.args[0])
                         + "' in JSON")
It should be noted that e.message works in Python 2 but not Python 3, so it shouldn't be used.
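A small self-contained check of the e.args[0] approach, written as a sketch: find_missing_key is a hypothetical helper, and the key names are taken from the question.

```python
def find_missing_key(poi_json, required=("FastestMan", "FastestWoman")):
    """Return the first required key missing from poi_json, or None."""
    try:
        for key in required:
            poi_json[key]  # raises KeyError if the key is absent
    except KeyError as e:
        # For a KeyError, the first argument is the key that was not found.
        return e.args[0]
    return None


print(find_missing_key({"FastestMan": "A"}))
print(find_missing_key({"FastestMan": "A", "FastestWoman": "B"}))
```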
Not sure if you're using any modules to assist you - if the JSON is coming in as a dict, one can use dict.get() towards a useful end.
def POIJSON2DOM (location_node, POI_JSON):
    man_JSON = POI_JSON.get("FastestMan", 'No Data for fastest man')
    woman_JSON = POI_JSON.get("FastestWoman", 'No Data for fastest woman')
    #work with the answers as you see fit
dict.get() takes two arguments - the first being the key you want, the second being the value to return if that key does not exist.
If you import the sys module, you can get exception info with sys.exc_info(), like this:
import sys
def POIJSON2DOM (location_node, POI_JSON):
    try:
        man_JSON = POI_JSON["FastestMan"]
        woman_JSON = POI_JSON["FastestWoman"]
    except KeyError:
        # you can inspect these variables for error information
        err_type, err_value, err_traceback = sys.exc_info()
        REDI.LogErrorMessage ("POIJSON2DOM", "Can't find mandatory key in JSON")

Catch exceptions/runtime errors in python

I have a web scraping Python script that asks for a web address when you run it. I want to validate the user's input, e.g. check that it is a valid web address and handle the case where the user enters nothing. I have added a try and except, which almost works: it displays the message that I want the user to see, but it also shows a traceback, and I don't want that. I only want to display my custom error message. Could anyone help me implement this? Here's my code:
import sys, urllib, urllib2
try:
    url= raw_input('Please input address: ')
    webpage=urllib.urlopen(url)
    print 'Web address is valid'
except:
    print 'No input or wrong url format usage: http://wwww.domainname.com '
def wget(webpage):
    print '[*] Fetching webpage...\n'
    page = webpage.read()
    return page
def main():
    sys.argv.append(webpage)
    if len(sys.argv) != 2:
        print '[-] Usage: webpage_get URL'
        return
    print wget(sys.argv[1])
if __name__ == '__main__':
    main()
You can simply do:
try:
    # ...
except Exception as e:
    print("What you want to show")
Edit: "How do I stop it from executing when it reach an exception?"
You can either put the try and except in wget(), as @sabujhassan mentioned, or you can exit on catching the exception:
except Exception as e:
    print("Exception caught!")
    exit(1)
Edit 2: "is it possible to loop the program eg. when there is no user input, just keep asking the user to input a web address?"
Yes, you can simply wrap it in an infinite while loop and break once a valid value is entered.
while True:
    try:
        # Your logic ...
        break
    except:
        print 'No input or wrong url format usage: http://wwww.domainname.com '
        print 'Try again!'
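The retry loop can be sketched in Python 3 as follows. To keep the example runnable without network access, validation is done with urllib.parse.urlparse instead of actually opening the URL, which is a swapped-in simplification. validate_url and ask_for_url are hypothetical names, and read_input defaults to the built-in input so that a test can supply canned answers.

```python
from urllib.parse import urlparse


def validate_url(text):
    """Return the stripped URL, or raise ValueError if it is not http(s)."""
    candidate = text.strip()
    parts = urlparse(candidate)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("not an http(s) URL: %r" % text)
    return candidate


def ask_for_url(read_input=input):
    """Keep asking until validate_url accepts the answer."""
    while True:
        try:
            url = validate_url(read_input("Please input address: "))
            break  # only reached when validation did not raise
        except ValueError:
            print("No input or wrong url format. Usage: http://www.domainname.com")
            print("Try again!")
    return url
```

Catching ValueError rather than using a bare except means that Ctrl+C still interrupts the loop.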
Use try/except in both functions, wget() and main(). For example:
def wget(webpage):
    try:
        print '[*] Fetching webpage...\n'
        page = webpage.read()
        return page
    except:
        print "exception!"
You perform the initial try/except, but you don't exit once the exception is caught. webpage is only assigned when urlopen succeeds, so the script fails later because webpage has not been defined. The answer is to quit once the exception is caught.
So:
try:
    url= raw_input('Please input address: ')
    webpage=urllib.urlopen(url)
    print 'Web address is valid'
except:
    print 'No input or wrong url format usage: http://wwww.domainname.com '
    sys.exit(1)
Try this (replaces lines 3-8):
def main(url = None):
    if not url:  # no arguments through sys.argv
        url = raw_input('Please input address: ')
    if url:
        try:
            webpage = urllib.urlopen(url)
        except Exception:  # Funky arguments from the user
            print 'No input or wrong url format usage: http://wwww.domainname.com '
        else:
            print 'Web address is valid'
            print wget(webpage)
    else:  # No arguments from the user
        print 'No input or wrong url format usage: http://wwww.domainname.com '
For the latter half (from main onwards):
if __name__ == '__main__':
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        main()
(I disapprove of the pythonic 4 spaces rule, I keep on having to replace spacebars)
