Why does 'url' not work as a variable here? - python

I originally had the variable cpanel named url, and the code would not return anything. Any idea why? It doesn't seem to be used by anything else, but there has to be something I'm overlooking.
import urllib2

cpanel = 'http://www.tas-tech.com/cpanel'
req = urllib2.Request(cpanel)
try:
    handle = urllib2.urlopen(req)
except IOError, e:
    if hasattr(e, 'code'):
        if e.code != 401:
            print 'We got another error'
            print e.code
        else:
            print e.headers
            print e.headers['www-authenticate']

Note that urllib2.Request has a parameter named url, but that really shouldn't be the source of the problem; it works as expected:
>>> import urllib2
>>> url = "http://www.google.com"
>>> req = urllib2.Request(url)
>>> urllib2.urlopen(req).code
200
Note that your code above functions identically when you switch cpanel for url. So the problem must have been elsewhere.

I'm pretty sure that /cpanel (if it is the hosting control panel) actually redirects (302) you to http://www.tas-tech.com:2082/ or something like that. You should just update your code to deal with the redirect (or, better yet, send the request to the real address).
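A minimal sketch of checking that theory (assuming urllib2's default redirect handling; the URL is the one from the question):

import urllib2

cpanel = 'http://www.tas-tech.com/cpanel'
try:
    handle = urllib2.urlopen(cpanel)
    # urllib2 follows 302 redirects by default; geturl() shows the final address
    print handle.geturl()
except urllib2.HTTPError, e:
    # a 401 from the real control panel address would surface here instead
    print e.code
    print e.headers.get('www-authenticate')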

Related

Bad URLs in Python 3.4.3

I am new to this, so please help me. I am using urllib.request to open and read webpages. Can someone tell me how my code can handle redirects, timeouts, and badly formed URLs?
I have sort of found a way for timeouts; I am not sure if it is correct, though. Is it? All opinions are welcome! Here it is:
from socket import timeout
from urllib.error import HTTPError, URLError
import logging
import urllib.request

# url and name are defined elsewhere in my code
try:
    text = urllib.request.urlopen(url, timeout=10).read().decode('utf-8')
except (HTTPError, URLError) as error:
    logging.error('Data of %s not retrieved because %s\nURL: %s', name, error, url)
except timeout:
    logging.error('socket timed out - URL %s', url)
Please help me as I am new to this. Thanks!
Take a look at the urllib error page.
So for the following behaviours:
Redirect: HTTP code 302, so that's an HTTPError with a code. You could also use the HTTPRedirectHandler instead of failing.
Timeouts: You have that correct.
Badly formed URLs: That's a URLError.
Here's the code I would use:
from socket import timeout
import urllib.error
import urllib.request

try:
    text = urllib.request.urlopen("http://www.google.com", timeout=0.1).read()
except urllib.error.HTTPError as error:
    print(error)
except urllib.error.URLError as error:
    print(error)
except timeout as error:
    print(error)
I can't find a redirecting URL, so I'm not exactly sure how to check whether the HTTPError is a redirect.
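One way to see a redirect explicitly is to plug in a redirect handler that refuses to follow it, so the 3xx surfaces as an HTTPError you can inspect. This is only a sketch; the test URL below is an assumption, not from the question:

import urllib.error
import urllib.request

class NoFollow(urllib.request.HTTPRedirectHandler):
    # returning None here makes the opener raise HTTPError for 3xx responses
    # instead of silently following them
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

opener = urllib.request.build_opener(NoFollow())
try:
    opener.open("http://httpbin.org/redirect/1", timeout=10)
except urllib.error.HTTPError as error:
    if 300 <= error.code < 400:
        print("redirected to", error.headers.get("Location"))
    else:
        print(error)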
You might find the requests package is a bit easier to use (it's suggested on the urllib page).
Using the requests package I was able to find a better solution. The only exceptions you need to handle are:
import requests

try:
    r = requests.get(url, timeout=5)
except requests.exceptions.Timeout:
    pass  # maybe set up for a retry, or continue in a retry loop
except requests.exceptions.TooManyRedirects:
    pass  # tell the user their URL was bad and try a different one
except requests.exceptions.ConnectionError:
    pass  # connection could not be completed
except requests.exceptions.RequestException:
    raise  # catastrophic error, bail
And to get the text of that page, all you need to do is:
r.text
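A minimal end-to-end sketch of that (the URL is a placeholder, not from the question):

import requests

try:
    r = requests.get('http://www.google.com', timeout=5)
    r.raise_for_status()   # turn 4xx/5xx responses into exceptions as well
    print(r.text[:200])    # first part of the page body
except requests.exceptions.RequestException as error:
    print('request failed:', error)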

Python - urllib2 > how to escape HTTP errors

I am making a Python app and I want to read a file from the net.
This is the code that I am using to read it:
urllib2.urlopen("http://example.com/check.txt").read()
Everything works great, but when I point it to a URL that does not exist, it gives an HTTP 404: Not Found error, and that is normal.
The problem is that the app is designed to work on Windows, so it will be compiled.
On Windows, when the app tries to get a file from a URL that does not exist, the app crashes and shows an error window, plus it creates a log containing the HTTP 404: Not Found error.
I tried to handle this error but I failed. This is the full code:
import urllib2

file = urllib2.urlopen("http://example.com/check.txt")
try:
    file.read()
except urllib2.URLError:
    print "File Not Found"
else:
    print "File is found"
Please, if you know how to handle this error, help me.
You should apply the try..except around the urlopen, not the read.
Try this
import urllib2

try:
    fh = urllib2.urlopen('http://example.com/check.txt')
    print fh.read()
except urllib2.HTTPError, e:
    print e.code
except urllib2.URLError, e:
    print e.reason  # URLError has no code attribute; reason describes the failure

Getting the options in an HTTP request status 300

I read that when I get this error I should specify the URL better. I assume that I should choose between two displayed or accessible options. How can I do that?
I couldn't find anything in urllib or its tutorial. Is my assumption true? Can I read the possible URLs somewhere?
When I open this URL in my browser, I am redirected to a new URL.
The URL I try to access: http://www.uniprot.org/uniprot/P08198_CSG_HALHA.fasta
The new URL I am redirected to: http://www.uniprot.org/uniprot/?query=replaces:P08198&format=fasta
import urllib.error
import urllib.request

try:
    response = urllib.request.urlopen(url)
except urllib.error.HTTPError as e:
    if int(e.code) == 300:
        pass  # what now?
The status code 300 is returned by the server to tell you that your request is somehow not complete and you should be more specific.
Testing the URL, I tried searching from http://www.uniprot.org/ and entered "P08198" into the search. This resulted in the page http://www.uniprot.org/uniprot/P08198 telling me:
Demerged into Q9HM69, B0R8E4 and P0DME1. [ List ]
To me it seems the query for this protein is not specific enough, as this protein code was split into the subcodes Q9HM69, B0R8E4 and P0DME1.
Conclusion
Status code 300 is a signal from the server application that your request is somehow ambiguous. The way you can make it specific enough is application-specific and has nothing to do with Python or HTTP status codes; you have to find out what a good URL looks like from the application logic.
So I ran into this issue and wanted to get the actual content returned.
It turns out that this is the solution to my problem:
import urllib.error
import urllib.request

try:
    response = urllib.request.urlopen(url)
except urllib.error.HTTPError as e:
    if int(e.code) == 300:
        response = e.read()  # the 300 response body lists the available options

Python - check if file/webpage exists

I would like to use Python to check if a file/webpage exists based off its response code and act accordingly. However, I have a requirement to use HTTPS and to also provide username and password credentials. I couldn't get it running through curl (it doesn't like HTTPS), but had success using wget (with --spider, --user and --password). I suppose I can try incorporating wget into the script via os.system, but it prints out a lot of output that would be very tricky to parse, and if the URI does not exist (i.e. 404), I think it gets stuck "awaiting response..".
I've had a look at urllib2 around the web and have seen people do some stuff, but I'm not sure if this addresses my situation, and the solutions are always very convoluted (such as Python urllib2, basic HTTP authentication, and tr.im). Anyway, if I can get some guidance on what the easiest avenue for me to pursue is using Python, that would be appreciated.
Edit: using the os.system method (and providing wget with "-q") seems to return a different number depending on whether the URI exists or not, so that gives me something to work with for now.
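A rough sketch of that workaround, using the exit status rather than parsing output (this assumes wget is installed; the URL and credentials are placeholders):

import subprocess

# --spider checks existence without downloading; -q suppresses the chatty output
status = subprocess.call(['wget', '--spider', '-q',
                          '--user', 'someuser', '--password', 'somepass',
                          'https://example.com/check.txt'])
if status == 0:
    print 'URI exists'
else:
    print 'URI missing or unreachable (exit status %d)' % status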
You can make a HEAD request using Python requests.
import requests
r = requests.head('http://google.com/sjklfsjd', allow_redirects=True, auth=('user', 'pass'))
assert r.status_code != 404
If the request fails with a ConnectionError, the website does not exist. If you only want to check whether a certain page exists, the request will still succeed, but the status code will be 404 when the page is missing.
Requests has a pretty nice interface so I recommend checking it out. You'll probably like it a lot as it is very intuitive and powerful (while being lightweight).
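A small sketch of that distinction (the URL and credentials are placeholders, not from the question):

import requests

try:
    r = requests.head('https://example.com/maybe-missing', timeout=5,
                      allow_redirects=True, auth=('user', 'pass'))
    print('page exists' if r.status_code != 404 else 'page not found')
except requests.exceptions.ConnectionError:
    print('site unreachable - it probably does not exist')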
urllib2 is the way to go to open any web page
urllib2.urlopen('http://google.com')
For added functionality, you'll need an opener with handlers. I reckon you'll only need the HTTPS handler because you're barely extracting any info:
opener = urllib2.build_opener(
    urllib2.HTTPSHandler())
opener.open('https://google.com')
Add data and it will automatically become a POST request, or so I believe:
opener.open('https://google.com',data="username=bla&password=da")
The object you'll receive will have a code attribute.
That's the basic gist of it; do add as many handlers as you like, I believe they can't hurt.
source: https://docs.python.org/2.7/library/urllib2.html#httpbasicauthhandler-objects
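Since the question mentions username/password credentials, here's a hedged sketch of actual HTTP basic auth with urllib2 (the URL and credentials are placeholders); this is what the HTTPBasicAuthHandler in the link above is for, as opposed to POSTing the credentials as form data:

import urllib2

password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
# realm=None means "use these credentials for any realm" under this URL prefix
password_mgr.add_password(None, 'https://example.com/', 'someuser', 'somepass')
opener = urllib2.build_opener(
    urllib2.HTTPSHandler(),
    urllib2.HTTPBasicAuthHandler(password_mgr))
try:
    print opener.open('https://example.com/check.txt').code  # 200 if it exists
except urllib2.HTTPError, e:
    print e.code  # 404 if it does not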
You should use urllib2 to check that:
import urllib2, getpass

url = raw_input('Enter the url to search: ')
username = raw_input('Enter your username: ')
password = getpass.getpass('Enter your password: ')
# note: with 'or' this condition would always be true; 'and' only prepends
# a scheme when one is actually missing
if not url.startswith('http://') and not url.startswith('https://'):
    url = 'http://' + url

def check(url):
    try:
        urllib2.urlopen(url)
        return True
    except urllib2.HTTPError:
        return False

if check(url):
    print 'The webpage exists!'
else:
    print 'The webpage does not exist!'

opener = urllib2.build_opener(
    urllib2.HTTPSHandler())
opener.open(url, data="username=%s&password=%s" % (username, password))
This runs as:
bash-3.2$ python url.py
Enter the url to search: gmail.com
Enter your username: aj8uppal
Enter your password:
The webpage exists!

Calling a function with CherryPy

So I'm doing a bit of web development, and due to some restrictions set by my employer I need to use Cheetah and CherryPy. I have this form that runs a function upon submit, and from said function I call another via HTTPRedirect, and what I want is to call it without redirecting. Here is an example:
@cherrypy.expose
def onSubmit(**kwargs):
    ##Do something
    ##Do something
    ##Do something
    raise cherrypy.HTTPRedirect("/some_other_location/doSomethingElse?arg1=x&arg2=y")
Now I want to do more stuff after running the second function, but I can't, because once I redirect the code ends there. So my question is: is there a way to run that other function without redirecting, but still over HTTP? In JavaScript I would use AJAX and pass it the URL, storing the output in the loader variable, but I'm not sure how to do this with CherryPy.
Instead of doing the redirect, use one of the standard Python libraries for fetching HTTP data:
http://docs.python.org/library/urllib.html
http://docs.python.org/library/urllib2.html
or other arguably nicer third-party ones:
http://docs.python-requests.org/
http://code.google.com/p/httplib2/
Also, don't forget to convert the relative URL to an absolute URL, even if it's localhost.
To help you get started, here's an untested code snippet derived from your example, using urllib2:
import urllib2

@cherrypy.expose
def onSubmit(**kwargs):
    ##Do something
    ##Do something
    ##Do something
    url = "http://localhost/some_other_location/doSomethingElse?arg1=x&arg2=y"
    try:
        data = urllib2.urlopen(url).read()
    except urllib2.HTTPError, e:
        raise cherrypy.HTTPError(500, "HTTP error: %d" % e.code)
    except urllib2.URLError, e:
        raise cherrypy.HTTPError(500, "Network error: %s" % e.reason.args[1])
    ##Do something with the data
