My requirement is almost the same as in "Requests — how to tell if you're getting a success message?", but I also need to print an error whenever I cannot reach the URL. Here is my attempt:
# set up the URL and check the connection by printing the status
import requests

url = 'https://www.google.lk'
try:
    page = requests.get(url)
    print(page.status_code)
except requests.exceptions.HTTPError as err:
    print("Error")
The issue is that rather than printing just "Error", it prints the whole error message below.
Traceback (most recent call last):
File "testrun.py", line 22, in <module>
page = requests.get(url)
File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/requests/sessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/requests/sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='learn.microsoft.com', port=443): Max retries exceeded with url: /en-us/microsoft-365/enterprise/urls-and-ip-address-ranges?view=o365-worldwide (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7ff91a543198>: Failed to establish a new connection: [Errno 110] Connection timed out',))
Can someone show me how I should modify my code so it prints just "Error" if there is any issue? Then I can extend it to other requirements.
You're not catching the correct exception.
import requests

url = 'https://www.googlggggggge.lk'
try:
    page = requests.get(url)
    print(page.status_code)
except (requests.exceptions.HTTPError, requests.exceptions.ConnectionError):
    print("Error")
You can also use except Exception, but note that Exception is too broad and is not recommended in most cases, since it traps all errors indiscriminately.
You need to either use a general except clause or catch all of the exceptions that the requests module might throw, e.g. except (requests.exceptions.HTTPError, requests.exceptions.ConnectionError).
For the full list, see: Correct way to try/except using Python requests module?
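If you would rather not enumerate exception types, a minimal sketch (the timeout value here is an arbitrary choice) is to catch requests.exceptions.RequestException, the base class that all of the library's own exceptions inherit from:

import requests

url = 'https://www.google.lk'
try:
    page = requests.get(url, timeout=10)
    print(page.status_code)
except requests.exceptions.RequestException:
    # base class of ConnectionError, HTTPError, Timeout, etc.,
    # so this traps anything requests itself raises
    print("Error")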
I have a project structured something like this, with multiple pieces of functionality and often more possible sources of error. One piece of functionality may also call something else that raises an error.
def functionality_one(arguments) -> str:
    try:
        status_feedback = attempt_functionality_one(arguments)
        # this would usually be multiple lines
    except ValueError as e:
        return "known-failure-code"
    except ConnectionError as e:
        raise ConnectionError("Some user-friendly message for unexpected error") from e
    else:
        return status_feedback
def main():
    ## when the relevant CLI argument is passed:
    try:
        status = functionality_one(arguments)
    except Exception as e:
        send_notification_to_user(e.args[0])
    else:
        send_notification_to_user(USER_FRIENDLY_SUCCESS_MESSAGES.get(status, "Success!"))

if __name__ == "__main__":
    main()
Focus on this bit about re-raising errors:
except ConnectionError as e:
    raise ConnectionError("Some user-friendly message for unexpected error") from e
I do this to attach a user-friendly message in the error that I can later display to the user. Is there a better way to accomplish this?
In particular, error tracebacks normally just show the errors as they propagate. With this method, the traceback includes a message like "... was the direct cause of the following exception ..." and I don't know whether this is the norm in Python. Here's an example from the log file:
Traceback (most recent call last):
File "D:\username\Documents\tech-projects\project-name\src\auth.py", line 157, in login
login_request = post(
File "D:\username\Documents\tech-projects\project-name\.venv\lib\site-packages\requests\api.py", line 115, in post
return request("post", url, data=data, json=json, **kwargs)
File "D:\username\Documents\tech-projects\project-name\.venv\lib\site-packages\requests\api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
File "D:\username\Documents\tech-projects\project-name\.venv\lib\site-packages\requests\sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "D:\username\Documents\tech-projects\project-name\.venv\lib\site-packages\requests\sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "D:\username\Documents\tech-projects\project-name\.venv\lib\site-packages\requests\adapters.py", line 565, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='wifi-login.university-website.domain', port=80): Max retries exceeded with url: /cgi-bin/authlogin (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001FF681B27D0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\username\Documents\tech-projects\project-name\login_cli.py", line 269, in main
status_message: str = parsed_namespace.func(parsed_namespace)
File "D:\username\Documents\tech-projects\project-name\login_cli.py", line 197, in connect
return src.auth.login(credentials)
File "D:\username\Documents\tech-projects\project-name\src\auth.py", line 164, in login
raise ConnectionError(f"Server-side error. Contact IT support or wait until morning.") from e
requests.exceptions.ConnectionError: Server-side error. Contact IT support or wait until morning.
So what's the right way to do this? Feel free to suggest a change that completely changes the structure of the program too, if you feel that's necessary.
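For what it's worth, the "direct cause" wording is standard Python behaviour for raise ... from ...: the from clause stores the original exception as __cause__, and the interpreter prints that bridge line between the two tracebacks. A minimal standalone sketch:

def fetch():
    # stand-in for the real network call
    raise ConnectionError("low-level failure")

try:
    fetch()
except ConnectionError as e:
    # "from e" sets __cause__, which is what produces the
    # "... was the direct cause of the following exception" line
    raise ConnectionError("user-friendly message") from e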
I am learning to use requests in Python and I need a way to get a meaningful output if the site does not exist at all.
I looked at this question, but it is unclear whether the OP of that question actually wants to check if the site exists, or just whether it returns an error. The problem with all of the answers to that question is that if the site does not exist at all, we cannot use HTTP response headers, because no response is returned from a server that does not exist.
Here is an example.
If I use this code, I will not get any errors because the site exists:
import requests
r = requests.get('https://duckduckgo.com')
However, if I enter a web page I know does not exist, I get an error:
import requests

r = requests.get('https://thissitedoesnotexist.com')
if r.status_code == requests.codes.ok:
    print('Site good')
else:
    print('Site bad')
This error is super long, and I would prefer a shorter, more meaningful error if the site does not exist.
Traceback (most recent call last):
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connectionpool.py", line 382, in _make_request
self._validate_conn(conn)
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connectionpool.py", line 1010, in _validate_conn
conn.connect()
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connection.py", line 416, in connect
self.sock = ssl_wrap_socket(
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\util\ssl_.py", line 449, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\util\ssl_.py", line 493, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\ssl.py", line 512, in wrap_socket
return self.sslsocket_class._create(
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\ssl.py", line 1070, in _create
self.do_handshake()
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\ssl.py", line 1341, in do_handshake
self._sslobj.do_handshake()
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:997)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\adapters.py", line 439, in send
resp = conn.urlopen(
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\urllib3\util\retry.py", line 574, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='234876.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:997)')))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\ADMIN\Desktop\tetst.py", line 2, in <module>
r = requests.get('https://234876.com')
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "C:\Users\ADMIN\AppData\Local\Programs\Python\Python310\lib\site-packages\requests\adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='234876.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:997)')))
Is it possible to make a function that, for example, just prints 'The site probably does not exist', or at least does not give an EOF error?
Normally the desirable thing to do is to trap exceptions from requests. You can also call .raise_for_status() on the Response to get a meaningful exception for non-OK responses.

However, you want to watch out for where you handle an exception:

- Immediately? Can your program handle it meaningfully, or should it exit?
- Should the caller handle a specific exception (such as requests.exceptions.Timeout) or a more general one?
- Do you have many functions which call each other? Should any of them handle some subset of the possible exceptions, and which?

See the Python Exception Hierarchy for how the first-party exceptions' inheritance is structured.
import sys
import requests

def some_function_which_makes_requests():
    r = requests.get("https://example.com", timeout=(2, 10))
    r.raise_for_status()  # raise for non-OK statuses
    return r.json()  # interpret the response via some method (for example as JSON)

def main():
    ...
    try:
        result_json = some_function_which_makes_requests()
    except requests.exceptions.Timeout:
        print("WARNING: request timed out")
        result_json = None  # still effectively handled for the later program?
    except requests.exceptions.RequestException as ex:
        sys.exit(f"something wrong with Request: {repr(ex)}")
    except Exception as ex:
        sys.exit(f"something wrong around Request: {repr(ex)}")
    # now you can use result_json
Did some more research and just learned that I need to use a Python try/except, as mentioned by @Anand Sowmithiran. Here is a video explaining it for beginners: https://www.youtube.com/watch?v=NIWwJbo-9_8
import requests

try:
    r = requests.get("http://www.duckduckgo.com")
except requests.exceptions.ConnectionError:
    print('\n\tSorry. There was a network problem getting the URL. Perhaps it does not exist?\n\tCheck the URL, DNS issues or if you are being rejected by the server.')
else:
    print(r)
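This also covers the EOF error from the question: requests defines its SSLError as a subclass of its ConnectionError, so the handler above traps both. A quick check to confirm the relationship:

import requests

# requests' SSLError inherits from requests' ConnectionError, so an
# except requests.exceptions.ConnectionError handler also catches it
print(issubclass(requests.exceptions.SSLError, requests.exceptions.ConnectionError))  # True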
I'm trying to write a script in Python to send HTTP GET requests to automatically generated URLs and get each response code and elapsed time. The URLs need not be valid; 400 responses are acceptable too.
script1.py
import sys
import requests

str1 = "http://www.googl"
str3 = ".com"
str2 = 'a'
for x in range(0, 8):
    y = chr(ord(str2) + x)
    str_s = str1 + y + str3
    r = requests.get(str_s)
    print(str_s, r.status_code, r.elapsed.total_seconds())
Error:
File "script1.py", line 12, in <module><br>
r=requests.get(str_s)<br>
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 72, in get<br>
return request('get', url, params=params, **kwargs)<br>
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 58, in request<br>
return session.request(method=method, url=url, **kwargs)<br>
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 508, in request<br>
resp = self.send(prep, **send_kwargs)<br>
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 618, in send<br>
r = adapter.send(request, **kwargs)<br>
File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 508, in send<br>
raise ConnectionError(e, request=request)<br>
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.googla.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc44c891e50>: Failed to establish a new connection: [Errno -2] Name or service not known',))
I just want to see the time taken to receive the response for each request.
Only one request has to be sent.
Response code does not matter.
I guess you want to get something like this:
import sys
import requests

str1 = "http://www.googl"
str3 = ".com"
str2 = 'a'
for x in range(0, 8):
    y = chr(ord(str2) + x)
    str_s = str1 + y + str3
    print('Connecting to ' + str_s)
    try:
        r = requests.get(str_s)
        print(str_s, r.status_code, r.elapsed.total_seconds())
    except requests.ConnectionError as e:
        print(" Failed to open url")
In this case, using try...except you can catch the exception that get raises and handle it gracefully.
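Note that r.elapsed only exists when a response actually comes back, so a failed request has no elapsed time to print. If you also want a rough timing for failures, a sketch using time.monotonic() (the timeout value is an arbitrary choice) could look like this:

import time
import requests

url = "http://www.googla.com"  # one of the generated URLs from the question
start = time.monotonic()
try:
    r = requests.get(url, timeout=10)
    print(url, r.status_code, r.elapsed.total_seconds())
except requests.ConnectionError:
    # no Response object here, so measure wall-clock time ourselves
    print(url, "failed after", round(time.monotonic() - start, 2), "seconds")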
I am using urllib3.PoolManager to make HTTP requests, and in one part of my code I use this to make a request:
resp = h.request(self.method, self.url, body=body, headers=headers, timeout=TIMEOUT, retries=retries)
and I get the error SSL: CERTIFICATE_VERIFY_FAILED. Below is the full stack trace.
File "/lib/python2.7/site-packages/urllib3/request.py", line 69, in request
**urlopen_kw)
File "/lib/python2.7/site-packages/urllib3/request.py", line 90, in request_encode_url
return self.urlopen(method, url, **extra_kw)
File "/lib/python2.7/site-packages/urllib3/poolmanager.py", line 248, in urlopen
response = conn.urlopen(method, u.request_uri, **kw)
File "/lib/python2.7/site-packages/urllib3/connectionpool.py", line 621, in urlopen
raise SSLError(e)
[SSL: CERTIFICATE_VERIFY_FAILED]
The error is expected, but the problem is that I cannot catch it in a try/except block.
I tried to use
except ssl.SSLError:
but that does not catch this error.
I also tried ssl.CertificateError, but with no results.
I can catch it using the Exception class, but I need to catch the specific errors and handle them differently. Can someone please suggest a solution?
I found the solution. The exception class that was being raised is urllib3.exceptions.SSLError.
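As a minimal sketch of that fix (the URL is a placeholder), catch the urllib3 exception directly:

import urllib3

http = urllib3.PoolManager()
try:
    resp = http.request("GET", "https://expired.example.com")  # placeholder URL
except urllib3.exceptions.SSLError as e:
    # certificate verification failures from PoolManager surface as
    # urllib3.exceptions.SSLError, not the standard library's ssl.SSLError
    print("TLS problem:", e)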
Late answer, but you can catch SSL errors using requests.exceptions.SSLError:
import requests, traceback

try:
    r = requests.get('https://domain.tld')
except requests.exceptions.SSLError:
    print(traceback.format_exc())
To find the specific type of any exception, you can use type()
import requests

try:
    r = requests.get('https://domain.tld')
except Exception as e:
    print(type(e))
Output:
<class 'requests.exceptions.ConnectionError'>
Which leads us to:
import requests

try:
    r = requests.get('https://domain.tld')
except requests.exceptions.ConnectionError as e:
    print("Caught correctly")
Output:
Caught correctly
I have made a web crawler that takes thousands of URLs from a text file and then crawls the data on each webpage.
It now has many URLs, and some of them are broken.
So it gives me this error:
Traceback (most recent call last):
File "C:/Users/khize_000/PycharmProjects/untitled3/new.py", line 57, in <module>
crawl_data("http://www.foasdasdasdasdodily.com/r/126e7649cc-sweetssssie-pies-mac-and-cheese-recipe-by-the-dr-oz-show")
File "C:/Users/khize_000/PycharmProjects/untitled3/new.py", line 18, in crawl_data
data = requests.get(url)
File "C:\Python27\lib\site-packages\requests\api.py", line 67, in get
return request('get', url, params=params, **kwargs)
File "C:\Python27\lib\site-packages\requests\api.py", line 53, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 468, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 576, in send
r = adapter.send(request, **kwargs)
File "C:\Python27\lib\site-packages\requests\adapters.py", line 437, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.foasdasdasdasdodily.com', port=80): Max retries exceeded with url: /r/126e7649cc-sweetssssie-pies-mac-and-cheese-recipe-by-the-dr-oz-show (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x0310FCB0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed',))
Here's my code:
def crawl_data(url):
    global connectString
    data = requests.get(url)
    response = str(data)
    if response != "<Response [200]>":
        return
    soup = BeautifulSoup(data.text, "lxml")
    titledb = soup.h1.string
But it still gives me the same exception. I simply want it to ignore the URLs from which there is no response and move on to the next URL.
You need to learn about exception handling. The easiest way to ignore these errors is to surround the code that processes a single URL with a try/except construct, making your code read something like:
try:
    <process a single URL>
except requests.exceptions.ConnectionError:
    pass
This means that if the specified exception occurs, your program will just execute the pass (do nothing) statement and move on to the next URL.
Use try-except:
def crawl_data(url):
    global connectString
    try:
        data = requests.get(url)
    except requests.exceptions.ConnectionError:
        return
    response = str(data)
    soup = BeautifulSoup(data.text, "lxml")
    titledb = soup.h1.string
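With that change, a driver loop keeps going past broken URLs. A minimal sketch, assuming the URLs sit one per line in a text file named urls.txt (the file name is a placeholder):

# hypothetical driver: urls.txt holds one URL per line
with open("urls.txt") as f:
    for line in f:
        crawl_data(line.strip())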