Max retries exceeded with URL error when using requests.get() - Python

I'm trying to write a script in Python that sends an HTTP GET request to automatically generated URLs and records each response code and elapsed time. The URLs don't all have to be valid; 400 responses are acceptable too.
script1.py
import sys
import requests

str1="http://www.googl"
str3=".com"
str2='a'
for x in range(0, 8):
    y = chr(ord(str2)+x)
    str_s=str1+y+str3
    r=requests.get(str_s)
    print(str_s, r.status_code, r.elapsed.total_seconds())
Error:
File "script1.py", line 12, in <module><br>
r=requests.get(str_s)<br>
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 72, in get<br>
return request('get', url, params=params, **kwargs)<br>
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 58, in request<br>
return session.request(method=method, url=url, **kwargs)<br>
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 508, in request<br>
resp = self.send(prep, **send_kwargs)<br>
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 618, in send<br>
r = adapter.send(request, **kwargs)<br>
File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 508, in send<br>
raise ConnectionError(e, request=request)<br>
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.googla.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc44c891e50>: Failed to establish a new connection: [Errno -2] Name or service not known',))
I just want to see the time taken to receive the response for each request.
Only one request has to be sent per URL.
The response code does not matter.

I guess you want to get something like this:
import sys
import requests

str1="http://www.googl"
str3=".com"
str2='a'
for x in range(0, 8):
    y = chr(ord(str2)+x)
    str_s=str1+y+str3
    print('Connecting to ' + str_s)
    try:
        r = requests.get(str_s)
        print(str_s, r.status_code, r.elapsed.total_seconds())
    except requests.ConnectionError as e:
        print(" Failed to open url")
In this case, the try...except lets you catch the exception that get raises and handle it gracefully.
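If you also want a timing for the URLs that never respond, you can time the attempt yourself (Python 3). This is only a sketch, not part of the original answer; the 5-second timeout and the use of RequestException (the base class of requests' exceptions) are assumptions:

import time
import requests

str1 = "http://www.googl"
str3 = ".com"
for x in range(0, 8):
    str_s = str1 + chr(ord('a') + x) + str3
    start = time.monotonic()
    try:
        r = requests.get(str_s, timeout=5)  # give up after 5 seconds (assumed value)
        print(str_s, r.status_code, r.elapsed.total_seconds())
    except requests.exceptions.RequestException as e:
        # DNS failures, refused connections and timeouts all end up here
        print(str_s, "failed after", round(time.monotonic() - start, 2), "s:", type(e).__name__)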

Related

How to check the connectivity for a URL in python

My requirement is almost the same as Requests — how to tell if you're getting a success message?, but I need to print an error whenever I cannot reach the URL. Here is my attempt:
# setting up the URL and checking the connection by printing the status
url = 'https://www.google.lk'
try:
    page = requests.get(url)
    print(page.status_code)
except requests.exceptions.HTTPError as err:
    print("Error")
The issue is that, rather than printing just "Error", it prints the whole error message below.
Traceback (most recent call last):
  File "testrun.py", line 22, in <module>
    page = requests.get(url)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/root/anaconda3/envs/py36/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='learn.microsoft.com', port=443): Max retries exceeded with url: /en-us/microsoft-365/enterprise/urls-and-ip-address-ranges?view=o365-worldwide (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7ff91a543198>: Failed to establish a new connection: [Errno 110] Connection timed out',))
Can someone show me how I should modify my code so that it prints only "Error" if there is any issue? Then I can extend it to other requirements.
You're not catching the correct exception.
import requests

url = 'https://www.googlggggggge.lk'
try:
    page = requests.get(url)
    print(page.status_code)
except (requests.exceptions.HTTPError, requests.exceptions.ConnectionError):
    print("Error")
You can also do except Exception, but note that Exception is too broad and is not recommended in most cases, since it traps all errors.
You need to either use a general except or catch all the exceptions the requests module might throw, e.g. except (requests.exceptions.HTTPError, requests.exceptions.ConnectionError).
For the full list see: Correct way to try/except using Python requests module?
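If you would rather not enumerate exception classes, requests also exposes a common base class. A minimal sketch; the timeout value and the raise_for_status() call are assumptions added for illustration:

import requests

url = 'https://www.google.lk'
try:
    page = requests.get(url, timeout=10)
    page.raise_for_status()  # also turn 4xx/5xx responses into HTTPError
    print(page.status_code)
except requests.exceptions.RequestException:
    # base class of HTTPError, ConnectionError, Timeout, ...
    print("Error")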

Max retries exceeded with URL (Caused by NewConnectionError)

I am trying to create code that scrapes and downloads specific files from archive.org. When I run the program, I run into this error:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "D:\ROMS\Gamecube\main.py", line 16, in <module>
    response = requests.get(DOMAIN + file_link)
  File "C:\Users\cycle\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Users\cycle\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\cycle\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\cycle\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\cycle\AppData\Local\Programs\Python\Python38-32\lib\site-packages\requests\adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='archive.org007%20-%20agent%20under%20fire%20%28usa%29.nkit.gcz', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x043979B8>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
This is my code:
from bs4 import BeautifulSoup as bs
import requests

DOMAIN = 'https://archive.org'
URL = 'https://archive.org/download/GCRedumpNKitPart1'
FILETYPE = '%28USA%29.nkit.gcz'

def get_soup(url):
    return bs(requests.get(url).text, 'html.parser')

for link in get_soup(URL).find_all('a'):
    file_link = link.get('href')
    if FILETYPE in file_link:
        print(file_link)
        with open(link.text, 'wb') as file:
            response = requests.get(DOMAIN + file_link)
            file.write(response.content)
You simply forgot the / after https://archive.org, so you build incorrect URLs.
Add / at the end of the domain
DOMAIN = 'https://archive.org/'
or add / later
response = requests.get(DOMAIN + '/' + file_link)
or use urllib.parse.urljoin() to build the URLs
import urllib.parse
response = requests.get(urllib.parse.urljoin(DOMAIN, file_link))
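To illustrate the difference (the file names below are made up), plain concatenation glues a relative link onto the host name, while urljoin inserts the missing slash and handles absolute paths too:

from urllib.parse import urljoin

DOMAIN = 'https://archive.org'
print(DOMAIN + 'file.nkit.gcz')                    # https://archive.orgfile.nkit.gcz  -> bogus host
print(urljoin(DOMAIN, 'file.nkit.gcz'))            # https://archive.org/file.nkit.gcz
print(urljoin(DOMAIN, '/download/file.nkit.gcz'))  # https://archive.org/download/file.nkit.gcz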

How to move on if an error occurs in the response in Python with Beautiful Soup

I have made a web crawler that takes thousands of URLs from a text file and then crawls the data on each webpage.
Since there are many URLs, some of them are broken.
So it gives me this error:
Traceback (most recent call last):
  File "C:/Users/khize_000/PycharmProjects/untitled3/new.py", line 57, in <module>
    crawl_data("http://www.foasdasdasdasdodily.com/r/126e7649cc-sweetssssie-pies-mac-and-cheese-recipe-by-the-dr-oz-show")
  File "C:/Users/khize_000/PycharmProjects/untitled3/new.py", line 18, in crawl_data
    data = requests.get(url)
  File "C:\Python27\lib\site-packages\requests\api.py", line 67, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Python27\lib\site-packages\requests\api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "C:\Python27\lib\site-packages\requests\adapters.py", line 437, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.foasdasdasdasdodily.com', port=80): Max retries exceeded with url: /r/126e7649cc-sweetssssie-pies-mac-and-cheese-recipe-by-the-dr-oz-show (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x0310FCB0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed',))
Here's my code:
def crawl_data(url):
    global connectString
    data = requests.get(url)
    response = str(data)
    if response != "<Response [200]>":
        return
    soup = BeautifulSoup(data.text, "lxml")
    titledb = soup.h1.string
But it still gives me the same exception.
I simply want it to ignore the URLs that give no response
and move on to the next URL.
You need to learn about exception handling. The easiest way to ignore these errors is to surround the code that processes a single URL with a try-except construct, making your code read something like:
try:
    <process a single URL>
except requests.exceptions.ConnectionError:
    pass
This means that if the specified exception occurs, your program will just execute the pass (do nothing) statement and move on to the next URL.
Use try-except:
def crawl_data(url):
    global connectString
    try:
        data = requests.get(url)
    except requests.exceptions.ConnectionError:
        return
    response = str(data)
    soup = BeautifulSoup(data.text, "lxml")
    titledb = soup.h1.string
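For completeness, the calling loop can then just keep going. This is only a sketch; the file name urls.txt and the one-URL-per-line layout are assumptions:

with open('urls.txt') as f:      # one URL per line (assumed layout)
    for line in f:
        url = line.strip()
        if url:
            crawl_data(url)      # broken URLs now return early instead of raising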

Python error for requests.get

I am trying to write a Python script that lets me access a webpage and download a file from that page. As a first attempt at simply reaching the page, I tried the following code:
import requests
url = 'https://www.google.com/?gws_rd=ssl' #using google as an example
r = requests.get(url)
print(r.url)
I am given this error:
runfile('C:/Users/ME/Desktop/TMS502.py', wdir='C:/Users/ME/Desktop')
Traceback (most recent call last):
  File "<ipython-input-23-bc585dcceef8>", line 1, in <module>
    runfile('C:/Users/ME/Desktop/TMS502.py', wdir='C:/Users/ME/Desktop')
  File "C:\Users\ME\AppData\Local\Continuum\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 585, in runfile
    execfile(filename, namespace)
  File "C:/Users/ME/Desktop/TMS502.py", line 16, in <module>
    r = requests.get(url)
  File "C:\Users\ME\AppData\Local\Continuum\Anaconda\lib\site-packages\requests\api.py", line 55, in get
    return request('get', url, **kwargs)
  File "C:\Users\ME\AppData\Local\Continuum\Anaconda\lib\site-packages\requests\api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Users\ME\AppData\Local\Continuum\Anaconda\lib\site-packages\requests\sessions.py", line 456, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\ME\AppData\Local\Continuum\Anaconda\lib\site-packages\requests\sessions.py", line 559, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\ME\AppData\Local\Continuum\Anaconda\lib\site-packages\requests\adapters.py", line 375, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPSConnectionPool(host='www.google.com', port=443): Max retries exceeded with url: /?gws_rd=ssl (Caused by <class 'socket.error'>: [Errno 10054] An existing connection was forcibly closed by the remote host)
Can someone please help me?
You are getting that error because the remote side (in this case Google) is closing your connection, or you are otherwise unable to establish a connection to it.
From the error:
ConnectionError: HTTPSConnectionPool(host='www.google.com', port=443):
Max retries exceeded with url: /?gws_rd=ssl
(Caused by <class 'socket.error'>: [Errno 10054] An existing connection was forcibly closed by the remote host)
We can look into the source for a hint:
class MaxRetryError(RequestError):
    """Raised when the maximum number of retries is exceeded.

    :param pool: The connection pool
    :type pool: :class:`~urllib3.connectionpool.HTTPConnectionPool`
    :param string url: The requested Url
    :param exceptions.Exception reason: The underlying error
    """

    def __init__(self, pool, url, reason=None):
        self.reason = reason
        message = "Max retries exceeded with url: %s (Caused by %r)" % (
            url, reason)
        RequestError.__init__(self, pool, url, message)
Try another host, such as https://example.org, and your code should work.
The error message "An existing connection was forcibly closed by the remote host" comes from your operating system (Windows); Requests is showing you this text in an attempt to be helpful.
Your code is fine.
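If the resets are only intermittent, you can also ask requests to retry for you. This is a minimal sketch, not part of the original answer; the retry count, backoff factor and status list are arbitrary example values:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retry = Retry(total=3, backoff_factor=0.5, status_forcelist=[502, 503, 504])
adapter = HTTPAdapter(max_retries=retry)
session.mount('https://', adapter)   # apply the retry policy to all https:// URLs
session.mount('http://', adapter)

r = session.get('https://www.google.com/?gws_rd=ssl', timeout=10)
print(r.status_code)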
I guess the problem is caused by url = 'https://www.google.com/?gws_rd=ssl'.
Maybe your network can't reach www.google.com; try another URL.
Why are you bothering with requests anyway:
from urllib2 import urlopen
u = urlopen("https://www.google.com/?gws_rd=ssl")
data = u.read()
u.close()
Maybe this'll work.
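Note that urllib2 only exists on Python 2; on Python 3 the rough equivalent (same URL, standard library only) would be:

from urllib.request import urlopen

u = urlopen("https://www.google.com/?gws_rd=ssl")
data = u.read()
u.close()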

Handling [Errno 111] Connection refused returned by requests in Flask

My backend is developed in Java and does all kinds of processing, while my frontend is developed using Python's Flask framework. I am using requests to send a request to and get a response from the Java APIs.
The following line in my code does that:
req = requests.post(buildApiUrl.getUrl('user') + "/login", data=payload)
My problem is that whenever the Tomcat instance is not running, or there is some issue with the Java APIs, I get an error from requests as follows:
ERROR:root:HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /MYAPP/V1.0/user/login (Caused by <class 'socket.error'>: [Errno 111] Connection refused)
Traceback (most recent call last):
  File "/home/rahul/git/myapp/webapp/views/utils.py", line 31, in decorated_view
    return_value = func(*args, **kwargs)
  File "/home/rahul/git/myapp/webapp/views/public.py", line 37, in login
    req = requests.post(buildApiUrl.getUrl('user') + "/login", data=payload)
  File "/home/rahul/git/myapp/venv/local/lib/python2.7/site-packages/requests/api.py", line 88, in post
    return request('post', url, data=data, **kwargs)
  File "/home/rahul/git/myapp/venv/local/lib/python2.7/site-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/rahul/git/myapp/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 335, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/rahul/git/myapp/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 438, in send
    r = adapter.send(request, **kwargs)
  File "/home/rahul/git/myapp/venv/local/lib/python2.7/site-packages/requests/adapters.py", line 327, in send
    raise ConnectionError(e)
ConnectionError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /MYAPP/V1.0/user/login (Caused by <class 'socket.error'>: [Errno 111] Connection refused)
I want to handle any such errors in my Flask app so that I can show a proper response on the web page instead of a blank screen. How can I achieve this?
Catch the exception requests.post raises using try-except:
try:
    req = requests.post(buildApiUrl.getUrl('user') + "/login", data=payload)
except requests.exceptions.RequestException:
    # Handle exception ..
    pass
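A minimal sketch of what handling this can look like inside a Flask view; the route, the localhost URL, the form handling and the timeout are assumptions for illustration, not the asker's actual code:

from flask import Flask, request
import requests

app = Flask(__name__)
API_URL = 'http://localhost:8080/MYAPP/V1.0/user/login'  # backend endpoint taken from the traceback

@app.route('/login', methods=['POST'])
def login():
    try:
        resp = requests.post(API_URL, data=request.form, timeout=5)
    except requests.exceptions.RequestException:
        # Tomcat is down or refusing connections: return a visible error instead of a blank page
        return "Login service is currently unavailable, please try again later.", 503
    return resp.text, resp.status_code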
