I am using urllib2 to access a URL and read the data. The urlopen call is in a try/except block like the one below. I have seen other questions on the site from people hitting this 500 error, but I could not find a concrete answer as to why this exception is raised even when the call otherwise looks successful. Can anyone elaborate on that, or point out how one would run into it?
try:
    data = urllib2.urlopen(url).read().split('\n')
except urllib2.HTTPError, e:
    print "Could not get data with url {0} due to error code {1}.".format(url,e.code)
except urllib2.URLError, e:
    print "Could not get data with url {0} due to reason {1}.".format(url,e.reason)
    sys.exit(1)
HTTP Error 500 is a server error (https://en.wikipedia.org/wiki/List_of_HTTP_status_codes). You should investigate the server-side logs.
You're getting a server-side error.
You need to inspect the exception (e) to see whether it carries any feedback about what is causing the failure. It usually contains some of the actual error data from the server, although not all servers return error data; sometimes it only shows up in the server logs.
If this runs as a daemon, or only fails sporadically, you could write something that logs the contents of e somewhere.
You could also use pdb.set_trace() to set a breakpoint and inspect the object yourself.
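If the server does send anything useful back with the 500, you can read it straight off the exception, since urllib2's HTTPError also behaves like a response object. A minimal sketch (same url variable as in the question):
try:
    data = urllib2.urlopen(url).read().split('\n')
except urllib2.HTTPError, e:
    print "Status:", e.code
    print "Headers:", e.info()
    print "Body:", e.read()    # whatever error page or message the server returned, often empty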
Also, while this line looks great:
data = urllib2.urlopen(url).read().split('\n')
it's a real pain during debugging and troubleshooting, which happens a lot when using urllib.
I would suggest splitting it into a few lines, like this:
url_obj = urllib2.urlopen(url)
data = url_obj.read()
data = data.split('\n')
If you add a few breakpoints with pdb (pdb.set_trace()), you'll be able to inspect each variable, for example:
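A purely illustrative sketch of where those breakpoints could go (url is the same variable as above):
import pdb
import urllib2

url_obj = urllib2.urlopen(url)
pdb.set_trace()                  # inspect url_obj here (status, headers, etc.)
data = url_obj.read()
pdb.set_trace()                  # inspect the raw body before splitting
data = data.split('\n')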
Since you're not using a custom opener, I would also just use the requests library, which is built on top of urllib3 and makes this kind of thing far less painful.
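A rough equivalent with requests might look like this (just a sketch of the same fetch, using the url variable from the question, not your exact code):
import requests

response = requests.get(url)
if response.ok:                        # True when the status code is below 400
    data = response.text.split('\n')
else:
    print "Could not get data with url {0}, status {1}.".format(url, response.status_code)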
Related
I'm writing a tool that runs Google Lighthouse from the command line, and I want to catch the error if a URL isn't valid. What exception would I use?
I'm currently trying to catch a RuntimeError when an invalid URL is entered.
try:
    os.system("lighthouse --quiet {} {} {} {} {} --output-path={}/{}.html ".format(DevEmuStr,throttlingVar,CacheStr,presetVar,url,reportlocation,filename))
except RuntimeError:
    print("Please provide a proper URL")
Instead of "Please provide a proper URL" I still get:
Runtime error encountered: The URL you have provided appears to be invalid.
LHError: INVALID_URL
at lighthouse (C:\Users\sugar\AppData\Roaming\npm\node_modules\lighthouse\lighthouse-core\index.js:44:11)
at chromeP.then._ (C:\Users\sugar\AppData\Roaming\npm\node_modules\lighthouse\lighthouse-cli\run.js:182:12)
at process._tickCallback (internal/process/next_tick.js:68:7)
And Lighthouse just continues with the next URL.
Is there another error I could catch?
Thanks to everyone who tried to help me, I finally found a way to get it.
By adding this:
lh_url_ok = os.system("lighthouse --quiet {} {} {} {} {} --output-path={}/{}.html ".format(DevEmuStr,throttlingVar,CacheStr,presetVar,url,reportlocation,filename))
if lh_url_ok > 0:
    print("Error")
This way I was able to check whether the exit code was above 0 (0 = no error).
No, there isn't an exception you could catch from Python.
It appears to me that "Runtime error encountered" is output printed by lighthouse itself; it isn't an actual Python exception you could catch.
Python doesn't know anything about what is going on inside the executable you start with os.system; all you get back is the exit status, while the command's output goes straight to the console.
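If you also want that error text inside Python rather than just on the console, one option (a sketch, assuming Python 3.7+ and reusing the variables from your question) is the subprocess module instead of os.system:
import subprocess

cmd = "lighthouse --quiet {} {} {} {} {} --output-path={}/{}.html".format(
    DevEmuStr, throttlingVar, CacheStr, presetVar, url, reportlocation, filename)
result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
if result.returncode != 0:
    print("Lighthouse failed for {} (exit code {})".format(url, result.returncode))
    print(result.stderr)   # lighthouse's own message, e.g. the LHError: INVALID_URL text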
I created a script that connects via python-xmpp (xmpppy) to chat.euw1.lol.riotgames.com, but I always get an error, even though there is none.
Here is the code:
jid=xmpp.protocol.JID('my_jid#pvp.net')
cl=xmpp.Client(jid.getDomain(),debug=[])
con=cl.connect(server=('chat.euw1.lol.riotgames.com', 5223))
if not con:
    print 'could not connect!'
Again: everything works fine, but I still get this nasty error message:
An error occurred while looking up _xmpp-client._tcp.chat.euw1.lol.riotgames.com
I just wonder how I can prevent xmpppy from printing it; I have tried several techniques, like setting sys.stdout/stderr to os.devnull.
I think you can use a try/except structure to handle that problem:
try:
    jid=xmpp.protocol.JID('my_jid#pvp.net')
    cl=xmpp.Client(jid.getDomain(),debug=[])
    con=cl.connect(server=('chat.euw1.lol.riotgames.com', 5223))
except:
    print 'could not connect!'
In the beginner Python course I took on Lynda, it said to use .getcode() to get the HTTP code from a URL, and that this can be used as a test before reading the data:
webUrl = urllib2.urlopen('http://www.wired.com/tag/magazine-23-05/page/4')
print(str(webUrl.getcode()))
if webUrl.getcode() == 200:
    data = webUrl.read()
else:
    print 'error'
However, when used with the 404 page above, it causes Python to quit: Python function terminated unexpectedly: HTTP Error 404: Not Found. So it seems this lesson was completely wrong?
My question then is: what exactly is .getcode() actually good for? You can't use it to test what the HTTP code is unless you already know what it is (or at least that it's not a 404). Was the course wrong, or am I missing something?
My understanding is that the proper way to do it is like this, which doesn't use .getcode() at all (though tell me if there is a better way):
try:
    url = urllib2.urlopen('http://www.wired.com/tag/magazine-23-05/page/4')
except urllib2.HTTPError, e:
    print e
Am I misunderstanding the point of .getcode(), or is it pretty much useless? It seems strange to me that a method for getting a page's status code, in a library dedicated to opening URLs, can't handle something as trivial as returning a 404.
A 404 code is considered an error status by urllib2 and thus an exception is raised. The exception object also supports the getcode() method:
>>> import urllib2
>>> try:
...     url = urllib2.urlopen('http://www.wired.com/tag/magazine-23-05/page/4')
... except urllib2.HTTPError, e:
...     print e
...     print e.getcode()
...
HTTP Error 404: Not Found
404
The fact that errors are raised is poorly documented. The library uses a stack of handlers to form a URL opener (created with urllib2.build_opener() and installed with urllib2.install_opener()), and the default stack includes the urllib2.HTTPErrorProcessor class.
It is that class that causes any response with a status code outside the 2xx range to be handled as an error. The 3xx status codes are then handled by the HTTPRedirectHandler object, and some of the 40x codes (related to authentication) are handled by specialised authentication handlers, but most codes are simply left to be raised as an exception.
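If you really wanted urlopen-style calls that never raise for status codes, one option (a sketch only; in practice catching HTTPError or switching libraries is simpler) is to build an opener with a pass-through error processor:
import urllib2

class PassThroughErrorProcessor(urllib2.HTTPErrorProcessor):
    # Return every response untouched instead of routing non-2xx codes through
    # the error handlers. Note this also disables redirect handling, because
    # redirects are dispatched through the same error mechanism.
    def http_response(self, request, response):
        return response
    https_response = http_response

opener = urllib2.build_opener(PassThroughErrorProcessor)
response = opener.open('http://www.wired.com/tag/magazine-23-05/page/4')
print response.getcode()   # 404, with no exception raised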
If you are up to installing additional Python libraries, I recommend you install the requests library instead, where error handling is a lot saner. No exceptions are raised for HTTP error statuses unless you explicitly request it:
import requests
response = requests.get(url)
response.raise_for_status() # raises an exception for 4xx or 5xx status codes.
Yes, you are understanding it right: it throws an exception for a non-"OK" HTTP status code. At the time the lesson was written it might have worked because the URL was valid, but if you try that URL in a browser now, you will also get a 404 Not Found, because the URL is no longer valid.
In this case, urllib2.urlopen is in a way (arguably) abusing exceptions to return HTTP status codes as exceptions (see the docs for urllib2.HTTPError).
As an aside, I would suggest trying the requests library, which is much nicer to work with if you are planning to do some actual scripting work in this space outside of tutorials.
I am new to Python so please excuse my rudimentary question.
When I get an error I can usually figure out which line caused it, but sometimes the error message itself doesn't tell me which line is responsible, so I add some messages between the lines to track the issue down. Is there a more effective way to do that?
I am running my code from an ArcGIS toolbox script and I am not sure if I can trace the errors from there.
I always use print statements (okay, a function in Py3). It's the most standard way. Just use them to track where you are in your program and what it is doing.
However, if your application processes a lot of data, or if it's a large application, print statements may not be enough. Sometimes you'll need try and except statements just to narrow down where the error is.
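If the print statements start to pile up, a small step further (a generic sketch, nothing specific to your script) is the logging module, which can also record the full traceback from inside an except block:
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='%(levelname)s %(funcName)s:%(lineno)d %(message)s')

def convert(value):
    logging.debug('converting %r', value)               # a print you can switch off later
    try:
        return int(value)
    except ValueError:
        logging.exception('could not convert %r', value)   # writes the full traceback
        raise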
If you're trying to do this with an error you've already caught in an except block, do this:
import traceback
import sys

try:
    raise Exception("foo")
except:
    for frame in traceback.extract_tb(sys.exc_info()[2]):
        fname, lineno, fn, text = frame
        print "Error in %s on line %d" % (fname, lineno)
Otherwise, just read the traceback.
I was looking to save a traceback object and somehow pickle it to a file that I can access later. An example use case: if I submit some Python code to a farm computer to run and it fails, it would be nice to be able to open a session and inspect that traceback to debug the problem, rather than just reading a log of it. I do not know if there is any way to do this, but I thought it would be worth asking why not if there isn't.
You can use traceback.print_exception(etype, value, tb[, limit[, file]]) to write the formatted traceback out and save that in a text or JSON file; see the traceback docs for the other formatting helpers.
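Note that a traceback object itself can't be pickled directly, so the usual workaround is to save the formatted frames instead and read them back later. A sketch (the file name and the run_job call are made up):
import json
import sys
import traceback

try:
    run_job()   # hypothetical stand-in for the code submitted to the farm
except Exception:
    etype, value, tb = sys.exc_info()
    record = {
        'error': traceback.format_exception_only(etype, value),
        'frames': traceback.format_tb(tb),
    }
    with open('job_failure.json', 'w') as f:
        json.dump(record, f, indent=2)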
Depending on how you've written your code, the try statement is probably your best answer. Since any error is just a class that inherits from Python's built-in Exception, you can raise custom errors anywhere you need more information about a thrown error. You just need to name your errors or pass an appropriate string as the first argument. If you then try your code and use an except statement such as except CustomError as e, you can pull all the information you want out of e in the except block as a regular instance. Example:
Your code would be:
class Error1(Exception): pass   # your custom errors
class Error2(Exception): pass

def script():
    try: codeblock()                      # stands in for your first block of code
    except Exception as e: raise Error1('You hit %s error in the first block' % e)
    try: codeblock2()                     # stands in for your second block of code
    except Exception as e: raise Error2('You hit %s error in the second block' % e)

try: script()
except Exception as e:
    with open(r'path\to\file.txt','w') as outFile:
        outFile.write(str(e))
The last part is really nothing more than creating your own log file, but you have to write it down somewhere, right?
As for using the traceback module mentioned above, you can get error information out of that. Any of the functions here can get you a list of traceback entries:
http://docs.python.org/2/library/traceback.html
On the other hand, if you're trying to avoid looking at log files, the traceback module is only going to give you the same thing a log file would, in a different format. Adding your own error statements to your code gives you more information about what actually happened than a cryptic ValueError. If you print the traceback along with your custom error, it can give you still more information about the issue.
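For instance (a sketch building on the snippet above; format_exc comes from the standard traceback module), the log-writing block could include the full stack trace alongside the custom message:
import traceback

try: script()
except Exception as e:
    with open(r'path\to\file.txt','w') as outFile:
        outFile.write(str(e) + '\n')
        outFile.write(traceback.format_exc())   # the full traceback text for the error just caught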