Getting (requests.exceptions.ConnectionError) using requests.get(URL) - python

I get a requests.exceptions.ConnectionError when I run the following code:
from requests import *
from bs4 import *
URL = "https://www.ldoceonline.com/dictionary/"
response = get(URL)
But when I test it with another URL, it works. I really want to scrape this website. How can I fix this error?
Complete error message:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "e:/amirhossein/Project/Programming/L/Longmandict.py", line 5, in <module>
response = get(URL,verify=False)
File "C:\Users\User\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\User\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\User\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\requests\sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\User\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\requests\sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "C:\Users\User\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.7_qbz5n2kfra8p0\LocalCache\local-packages\Python37\site-packages\requests\adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

Apparently the server needs a correct User-Agent HTTP header to be set:
import requests
from bs4 import BeautifulSoup
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"
}
url = "https://www.ldoceonline.com/dictionary/"
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
print(soup.title.text)
Prints:
Longman English Dictionaries | Meanings, thesaurus, collocations and grammar
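A follow-up note: if you plan to fetch many entries, it may be worth setting the header once on a requests.Session, which also reuses the TCP connection and carries cookies between requests. A minimal sketch, assuming per-word pages follow the /dictionary/<word> pattern (the example words are hypothetical lookups):
import requests
from bs4 import BeautifulSoup

session = requests.Session()
session.headers["User-Agent"] = (
    "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"
)

for word in ("apple", "banana"):  # hypothetical lookups
    resp = session.get(f"https://www.ldoceonline.com/dictionary/{word}")
    resp.raise_for_status()  # fail loudly on HTTP errors
    soup = BeautifulSoup(resp.content, "html.parser")
    print(word, "->", soup.title.text)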

Related

Beautifulsoup: requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

I am trying to build a Python web scraper with beautifulsoup4. If I run the code on my MacBook the script works, but if I run it on my home server (Ubuntu VM) I get the error message below. I tried a VPN connection and multiple headers without success.
I'd highly appreciate your feedback on how to get the script working. Thanks!
Here is the error message:
{'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/534.7 (KHTML, like Gecko) Chrome/7.0.517.41 Safari/534.7 ChromePlus/1.5.0.0alpha1'}
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "/usr/lib/python3.10/http/client.py", line 1374, in getresponse
response.begin()
File "/usr/lib/python3.10/http/client.py", line 318, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.10/http/client.py", line 287, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
[...]
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
[Finished in 15.9s with exit code 1]
Here is my code:
from bs4 import BeautifulSoup
import requests
import pyuser_agent
URL = f"https://www.edmunds.com/inventory/srp.html?radius=5000&sort=publishDate%3Adesc&pagenumber=2"
ua = pyuser_agent.UA()
headers = {'User-Agent': ua.random}
print(headers)
response = requests.get(url=URL, headers=headers)
soup = BeautifulSoup(response.text, 'lxml')
overview = soup.find()
print(overview)
I tried multiple headers, but did not get a result.
Try using a real web-browser User-Agent instead of a random one from pyuser_agent. For example:
import requests
from bs4 import BeautifulSoup
URL = f"https://www.edmunds.com/inventory/srp.html?radius=5000&sort=publishDate%3Adesc&pagenumber=2"
headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:108.0) Gecko/20100101 Firefox/108.0"}
response = requests.get(url=URL, headers=headers)
soup = BeautifulSoup(response.text, "lxml")
overview = soup.find()
print(overview)
A possible explanation is that the server keeps a list of real-world User-Agents and doesn't serve pages to ones it doesn't recognize.
I'm pretty bad at figuring out the right set of headers and cookies, so in these situations I often end up resorting to:
either cloudscraper
import cloudscraper
response = cloudscraper.create_scraper().get(URL)
or HTMLSession (from the requests_html package), which is particularly nifty in that it also parses the HTML and has some JavaScript support as well (see the sketch below)
from requests_html import HTMLSession
response = HTMLSession().get(URL)
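For completeness, a minimal sketch of the HTMLSession route (HTMLSession lives in the requests_html package; render() downloads a headless Chromium on first use, so it is a heavier dependency than plain requests):
from requests_html import HTMLSession

URL = "https://www.edmunds.com/inventory/srp.html?radius=5000&sort=publishDate%3Adesc&pagenumber=2"
session = HTMLSession()
response = session.get(URL)
response.html.render()  # executes the page's JavaScript in headless Chromium
print(response.html.find("title", first=True).text)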

Python requests: using proxy but getting 'Connection aborted'

My VPN works well, and I can visit Google (from China).
SOCKS5 port of my VPN: 1080
But when I run the following code, I get an error.
import requests
headers = {'user-agent': ''}
proxies = {"http": "socks5://127.0.0.1:1080",'https': 'socks5://127.0.0.1:1080'}
# url = 'https://www.baidu.com/'
url = 'https://www.google.com/search?q=python' #
res = requests.get(url, headers=headers, proxies=proxies)
print("res.status_code:\n",res.status_code)
If I remove proxies=proxies and change the URL to Baidu, it works:
...
url = 'https://www.baidu.com/'
# url = 'https://www.google.com/search?q=python'
res = requests.get(url, headers=headers)
print("res.status_code:\n",res.status_code)
The error from the first snippet (the one using proxies):
Traceback (most recent call last):
File "Try.py", line 17, in <module>
res = requests.get(url, headers=headers, proxies=proxies)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests-2.22.0-py3.7.egg/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests-2.22.0-py3.7.egg/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests-2.22.0-py3.7.egg/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests-2.22.0-py3.7.egg/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/requests-2.22.0-py3.7.egg/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', OSError(0, 'Error'))
The first observation (the VPN works in the browser) seems to contradict the last (the proxied request is aborted). I don't really know where the problem is. I'd be extremely grateful if someone could help.
Solved by substituting socks5 with socks5h.
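For context on why that works: with socks5:// the hostname is resolved by your local DNS before the connection ever reaches the proxy, while socks5h:// passes the hostname to the proxy and lets it do the resolving, which matters when local DNS for the target is blocked or poisoned. A sketch of the corrected call (assumes the SOCKS extra is installed: pip install "requests[socks]"):
import requests

# socks5h:// = DNS resolution happens on the proxy side, not locally
proxies = {
    "http": "socks5h://127.0.0.1:1080",
    "https": "socks5h://127.0.0.1:1080",
}
res = requests.get("https://www.google.com/search?q=python",
                   headers={"user-agent": ""}, proxies=proxies)
print("res.status_code:\n", res.status_code)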

SSLError: requests module cannot connect via HTTPS

What am I missing?
HINT: I've also tried using the urllib module.
import requests
import sys
import time
import random
headers = {"User-Agent": "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/25.0"}
url = "HTTP LINK TO YOUTUBE VIDEO"
views = 10
videoMins = 3
videoSec = 33
refreshRate = videoMins * 60 + videoSec
proxy_list = [
    {"http": "49.156.37.30:65309"}, {"http": "160.202.42.106:8080"},
    {"http": "218.248.73.193:808"}, {"http": "195.246.57.154:8080"},
    {"http": "80.161.30.156:80"}, {"http": "122.228.25.97:8101"},
    {"http": "165.84.167.54:8080"}, {"https": "178.140.216.229:8080"},
    {"https": "46.37.193.74:3128"}, {"https": "5.1.27.124:53281"},
    {"https": "196.202.194.127:62225"}, {"https": "194.243.194.51:8080"},
    {"https": "206.132.165.246:8080"}, {"https": "92.247.127.177:3128"}]
proxies = random.choice(proxy_list)
while True:
    for view in range(views):  # to loop in the number of allocated views
        s = requests.Session()
        s.get(url, headers=headers, proxies=proxies, stream=True, timeout=refreshRate)
        s.close()
        time.sleep(60)  # time between loops so we appear real
    sys.exit()
Here's the traceback error I got:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "pytest.py", line 24, in <module>
s.get(url, headers=headers, proxies=proxies, stream=True,
timeout=refreshRate)
File "C:\Python\lib\site-packages\requests\sessions.py", line 521, in get
return self.request('GET', url, **kwargs)
File "C:\Python\lib\site-packages\requests\sessions.py", line 508, in
request
resp = self.send(prep, **send_kwargs)
File "C:\Python\lib\site-packages\requests\sessions.py", line 640, in send
history = [resp for resp in gen] if allow_redirects else []
File "C:\Python\lib\site-packages\requests\sessions.py", line 640, in
<listcomp>
history = [resp for resp in gen] if allow_redirects else []
File "C:\Python\lib\site-packages\requests\sessions.py", line 218, in
resolve_redirects
**adapter_kwargs
File "C:\Python\lib\site-packages\requests\sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "C:\Python\lib\site-packages\requests\adapters.py", line 506, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='www.youtube.com',
port=443): Max retries exceed
ch?v=dHUP25DkKWo (Caused by SSLError(SSLError("bad handshake:
SysCallError(-1, 'Unexpected EOF')",),))
I suspect max retries from YouTube, but it's confusing because I'm connecting via random proxies. If that's the case, maybe the proxies aren't working... or no HTTPS connection was made.
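Two things may be worth checking before blaming YouTube's rate limits. First, requests selects a proxy by matching the URL scheme against the keys of the proxies dict, so an entry like {"http": "..."} is ignored entirely for an https:// URL and the request goes out directly. Second, free proxy lists go stale quickly. A minimal sketch for filtering the list down to proxies that still answer (httpbin.org is just a convenient echo endpoint, and the http:// scheme prefixes are assumptions about the proxy type):
import requests

def proxy_works(address, timeout=5):
    # True if an HTTPS request succeeds through the proxy at host:port.
    proxies = {"http": f"http://{address}", "https": f"http://{address}"}
    try:
        return requests.get("https://httpbin.org/ip",
                            proxies=proxies, timeout=timeout).ok
    except requests.exceptions.RequestException:
        return False

# hypothetical: reuse the host:port strings from proxy_list above
candidates = ["49.156.37.30:65309", "178.140.216.229:8080"]
live = [p for p in candidates if proxy_works(p)]
print(live)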

Python: Requests patch method doesn't work

I have the code below which works fine and brings back what I need
import requests
from requests.auth import HTTPBasicAuth
response = requests.get('https://example/answers/331', auth=HTTPBasicAuth('username', 'password'),json={"solution": "12345"})
print response.content
However, when I change it to a PATCH method, which is accepted by the server, I get the following errors. Any idea why?
Traceback (most recent call last):
File "auth.py", line 8, in <module>
response = requests.patch('https://example/answers/331', auth=HTTPBasicAuth('username', 'password'),json={"solution": "12345"})
File "C:\Python27\lib\site-packages\requests-2.12.0-py2.7.egg\requests\api.py", line 138, in patch
return request('patch', url, data=data, **kwargs)
File "C:\Python27\lib\site-packages\requests-2.12.0-py2.7.egg\requests\api.py", line 56, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Python27\lib\site-packages\requests-2.12.0-py2.7.egg\requests\sessions.py", line 488, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python27\lib\site-packages\requests-2.12.0-py2.7.egg\requests\sessions.py", line 609, in send
r = adapter.send(request, **kwargs)
File "C:\Python27\lib\site-packages\requests-2.12.0-py2.7.egg\requests\adapters.py", line 473, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",))
Thanks
Try using a POST request with the following header: X-HTTP-Method-Override: PATCH
This is unique to the Oracle Service Cloud REST API implementation and is documented.
In cases where the browser or client application does not support PATCH requests, or network intermediaries block PATCH requests, HTTP tunneling can be used with a POST request by supplying an X-HTTP-Method-Override header.
Example:
import requests
restURL = <Your REST URL>
params = {'field': 'val'}
headers = {'X-HTTP-Method-Override':'PATCH'}
try:
    resp = requests.post(restURL, json=params, auth=('<uname>', '<pwd>'), headers=headers)
    print resp
except requests.exceptions.RequestException as err:
    errMsg = "Error: %s" % err
    print errMsg

Python requests.exceptions.ConnectionError: HTTPSConnectionPool : Max retries exceeded with url: [Errno 111] Connection refused)

So I am trying to use an API (http://twittercounter.com/pages/api) to get some data from the net. When I use my API key directly in the browser, I get the required results. But when using the requests.get() function in Python, I get an error; the traceback is given here.
code:
>>> import requests
>>> r = requests.get('https://api.twittercounter.com/?apikey=XXXX&twitter_id=57947109')
traceback:
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/site-packages/requests/api.py", line 55, in get
return request('get', url, **kwargs)
File "/usr/lib/python2.7/site-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 382, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 485, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 372, in send
raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.twittercounter.com', port=443): Max retries exceeded with url: /?apikey=XXXX&twitter_id=57947109 (Caused by <class 'socket.error'>: [Errno 111] Connection refused)
I made about 10 connections with this key, and the rate limit is 100, so I am sure I am not exceeding the limit. Can anyone please help? I am pretty much a beginner with requests and HTTP.
EDIT: Tried to set browser agent in the request headers
I tried this to change the browser agent, and it still does not work
>>> headers = {
... 'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0'}
>>> url = 'https://api.twittercounter.com/?apikey=XXXX&twitter_id=57947109'
>>> response = requests.get(url, headers=headers)
Am getting the same traceback as last time.
This method worked perfectly for me:
>>> import requests
>>> url = 'http://api.twittercounter.com/'
>>> payload = {'apikey': 'XXXXXX', 'twitter_id':53687449}
>>> requests.get(url, params=payload)
<Response [200]>
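Note that this working example uses http:// rather than https://; given that the original failure was a refused connection on port 443, the API host may simply not have been accepting HTTPS connections at the time.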
While I couldn't resolve this problem myself, I found a workaround: instead of using requests, I used httplib and then the ast module. The code I used is:
>>> import httplib
>>> conn = httplib.HTTPConnection("api.twittercounter.com")
>>> conn.request("GET", "/?apikey=XXXX&twitter_id=15160529")
>>> r1 = conn.getresponse()
>>> a = r1.read()
>>> import ast
>>> b = ast.literal_eval(a)
Now, b is a dictionary that has whatever data I am looking for. However, if someone can tell me the proper solution to this error, it would be very useful.
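One caveat on that workaround: if the response body is JSON, json.loads is the safer parser, since JSON's true/false/null are not valid Python literals and would make ast.literal_eval raise an error:
import json

b = json.loads(a)  # handles JSON true/false/null, which ast.literal_eval rejects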
