I'm looking to check whether 500+ strings in a given dataframe are live URLs. I've seen that this can be done using the requests package, but I've found that if I provide a URL whose domain doesn't exist, instead of receiving a 404 status code my program crashes.
Because I'm applying this to a dataframe in which many of the strings are not active URLs, the current approach won't work for what I'm trying to accomplish.
I'm wondering if there is a way to adapt the code below to return 'No' (or anything else) when the URL isn't real. For example, providing the URL 'http://www.example.commmm' results in an error:
import requests
response = requests.get('http://www.example.com')
if response.status_code == 200:
    print('Yes')
else:
    print('No')
Thanks in advance!
I would add a try/except to prevent your code from crashing:
try:
    print(x)
except:
    print("An exception occurred")
Related
I have the following code that throws an "index out of range" error in the barcode-looping section below.
for each in data['articles']:
    f.writerow([each['local']['name'],
                each['information'][0]['barcodes'][0]['barcode']])
I wrote a try/except to catch and handle the case where a barcode is not present in the JSON I am parsing. This worked perfectly during testing with print; however, I have had trouble getting the try/except to work while calling writerow to a CSV file.
Does anyone have any suggestions, or another method I could try to get this to work?
My try/except, which worked when testing with print, was as follows:
for each in data['articles']:
    print(each['local']['name'])
    try:
        print(each['information'][0]['barcodes'][0]['barcode'])
    except:
        print("none")
Any help is much appreciated!
As komatiraju032 points out, one way of doing this is via get(), although if there are different elements of the dictionary that might have empty/incorrect values, it might get unwieldy to provide a default for each one. To do this via a try/except you might do:
for each in data['articles']:
    row = [each['local']['name']]
    try:
        row.append(each['information'][0]['barcodes'][0]['barcode'])
    except (IndexError, KeyError):
        row.append("none")
    f.writerow(row)
This will give you that "none" replacement value regardless of which of those lists/dicts is missing the requested index/key, since any of those lookups might raise but they'll all end up at the same except.
Use the dict.get() method. It will return None if the key does not exist:
res = each['information'][0]['barcodes'][0].get('barcode')
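Note that .get() here only guards the final 'barcode' key; the [0] index lookups can still raise IndexError if 'information' or 'barcodes' is an empty list. A rough sketch that guards every level with defaults, which also shows why this approach gets unwieldy:

# Each level falls back to a one-element list holding an empty dict,
# so the final .get() returns None if anything along the way is missing.
info = each.get('information') or [{}]
barcodes = info[0].get('barcodes') or [{}]
res = barcodes[0].get('barcode')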
Since it is not an error that halts execution, I am not sure what my options are for keeping it from popping up. I don't believe the specific code causing the error really matters if there is some universal way to suppress this error line from printing.
The script simply uses whois to determine whether a domain is registered. I was doing a basic test of the top 1,000 English words to see if their .com domains were taken.
Here is my code:
for url in wordlist:
    try:
        domain = whois.whois(url)
        boom.write(("%s,%s,%s\r\n" %
                    (str(number), url, "TAKEN")).encode('UTF-8'))
    except:
        boom.write(("%s,%s,%s\r\n" %
                    (str(number), url, "NOT TAKEN")).encode('UTF-8'))
A bit hard to know for sure without your code, but wrap the section that's generating the error like this:
try:
    # Your error-generating code
except:
    pass
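Applied to the whois loop above, it is generally safer to catch Exception rather than use a bare except, which would also swallow things like KeyboardInterrupt. A sketch under that assumption (narrow the exception class further if you know which one the whois package raises):

for url in wordlist:
    try:
        domain = whois.whois(url)
        status = "TAKEN"
    except Exception:  # a bare except would also swallow KeyboardInterrupt
        status = "NOT TAKEN"
    boom.write(("%s,%s,%s\r\n" %
                (str(number), url, status)).encode('UTF-8'))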
The title of the question may be a bit confusing but I don't really know how best to word it...
I've found the following chunk of code, which downloads a web page using the urllib2 library.
import urllib2

def download(url):
    try:
        html = urllib2.urlopen(url).read()
    except urllib2.URLError as e:
        print 'Download error:', e.reason
        html = None
    return html
Now, if e.code happens to be 404, e.reason is simply an empty string, which means it carries absolutely no information about what triggered the error, so I don't really understand the point of using e.reason here.
It seems like it would be more reasonable to print e instead, but even if I change it to simply print e, it still yields something awkward: HTTP Error 404: with the colon apparently followed by an empty string...
So it appears to me that the above code is a little clumsy in its exception handling. Is that right?
It would seem that you could either use the error itself (print e) or the code and the reason (print "Download Error: ", e.code, e.reason) if you wanted to see the 404 code.
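One way to make the handler less clumsy is to catch urllib2.HTTPError (a subclass of URLError that carries the status code) before the more general URLError. A minimal sketch of that split:

import urllib2

def download(url):
    try:
        html = urllib2.urlopen(url).read()
    except urllib2.HTTPError as e:
        # HTTPError has a .code attribute (e.g. 404); print it explicitly
        # since its reason/message can be empty
        print 'Download error:', e.code, e
        html = None
    except urllib2.URLError as e:
        # Non-HTTP failures (DNS errors, refused connections) carry a
        # meaningful .reason
        print 'Download error:', e.reason
        html = None
    return html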
I've got a list of ~100,000 links that I'd like to check the HTTP Response Code for. What might be the best method to use for doing this check programmatically?
I'm considering using the below Python code:
import requests

try:
    for x in range(0, 100000):
        r = requests.head(''.join(["http://stackoverflow.com/", str(x)]))
        # They'll actually be read from a file, and aren't sequential
        print r.status_code
except requests.ConnectionError:
    print "failed to connect"
... but I am not aware of the potential side effects of checking such a large number of URLs in one go. Thoughts?
The only side effect I can think of is time, which you can mitigate by making the requests in parallel, e.g. with gevent (http://gevent.org/) or threads (https://docs.python.org/2/library/thread.html).
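As a rough illustration, here is a minimal gevent-based sketch (the urls list and the pool size of 100 are assumptions; the pool caps concurrency so 100,000 requests don't all fire at once):

from gevent import monkey
monkey.patch_all()  # must run before importing requests

import requests
from gevent.pool import Pool

def check(url):
    try:
        return url, requests.head(url, timeout=10).status_code
    except requests.RequestException:
        return url, None  # connection failure

pool = Pool(100)  # at most 100 concurrent requests
results = pool.map(check, urls)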
I am having some strange behavior while using urllib2 to open a URL and download a video.
I am trying to open a video resource and here is an example link:
https://zencoder-temp-storage-us-east-1.s3.amazonaws.com/o/20130723/b3ed92cc582885e27cb5c8d8b51b9956/b740dc57c2a44ea2dc2d940d93d772e2.mp4?AWSAccessKeyId=AKIAI456JQ76GBU7FECA&Signature=S3lvi9n9kHbarCw%2FUKOknfpkkkY%3D&Expires=1374639361
I have the following code:
mp4_url = ''
# response_body is a json response that I get the mp4_url from
if response_body['outputs'][0]['label'] == 'mp4':
    mp4_url = response_body['outputs'][0]['url']
if mp4_url:
    logging.info('this is the mp4_url')
    logging.info(mp4_url)
    # if I add the line directly below this then it works just fine
    mp4_url = 'https://zencoder-temp-storage-us-east-1.s3.amazonaws.com/o/20130723/b3ed92cc582885e27cb5c8d8b51b9956/b740dc57c2a44ea2dc2d940d93d772e2.mp4?AWSAccessKeyId=AKIAI456JQ76GBU7FECA&Signature=S3lvi9n9kHbarCw%2FUKOknfpkkkY%3D&Expires=1374639361'
    mp4_video = urllib2.urlopen(mp4_url)
    logging.info('successfully opened the url')
The code works when I add the designated line, but gives me an HTTP Error 403: Forbidden message when I don't, which makes me think the mp4_url is getting mangled somehow. The confusing part is that when I check the logging line for mp4_url, it is exactly what I hardcoded. What could the difference be? Are there some characters in there that may be disrupting it? I have tried converting it to a string by doing:
mp4_video = urllib2.urlopen(str(mp4_url))
But that didn't do anything. Any ideas?
UPDATE:
Following the suggestion to use print repr(mp4_url), it gives me:
u'https://zencoder-temp-storage-us-east-1.s3.amazonaws.com/o/20130723/b3ed92cc582885e27cb5c8d8b51b9956/b740dc57c2a44ea2dc2d940d93d772e2.mp4?AWSAccessKeyId=AKIAI456JQ76GBU7FECA&Signature=S3lvi9n9kHbarCw%2FUKOknfpkkkY%3D&Expires=1374639361'
And I suppose that difference (the u prefix, meaning it's a unicode string) is what is causing the error, but what would be the best way to handle it?
UPDATE II:
It turned out that I did need to cast it to a string, but also that the source I was getting the link from (an encoded video) needed nearly a 60-second delay before it could serve that URL. That's why it kept working when I hardcoded it: by then the delay had already passed. Thanks for the help!
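Given that resolution, a minimal polling sketch (the retry count and sleep interval here are arbitrary assumptions) might look like:

import time
import urllib2

mp4_video = None
for attempt in range(6):
    try:
        mp4_video = urllib2.urlopen(str(mp4_url))
        break
    except urllib2.HTTPError as e:
        # The encoder may not have finished serving the asset yet
        if e.code == 403 and attempt < 5:
            time.sleep(15)
        else:
            raise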
It would be better to simply dump the response you obtain. That way you can check what response_body['outputs'][0]['label'] actually evaluates to. In your case you initialize mp4_url to ''; an empty string is falsy just like None, so if mp4_url: is only entered when a URL was actually assigned, which makes the label comparison the first thing to verify.
You may want to check that the initial if statement, where you compare response_body['outputs'][0]['label'], is doing what you expect.
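For example, a quick way to dump the whole payload for inspection (assuming response_body is already a parsed dict and logging is configured):

import json
import logging

logging.info(json.dumps(response_body, indent=2))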