I have the following code:
res = requests.get(url)
I call it from multiple threads, and I get the following error:
ConnectionError: HTTPConnectionPool(host='bjtest.com', port=80): Max retries exceeded with url: /rest/data?method=check&test=123 (Caused by : [Errno 104] Connection reset by peer)
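For context, the multi-threaded usage looks roughly like this (a simplified sketch, not the exact code):
import threading
import requests

url = "http://bjtest.com/rest/data?method=check&test=123"

def worker():
    # each thread issues its own GET; the real code does more per request
    res = requests.get(url)
    print(res.status_code)

threads = [threading.Thread(target=worker) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()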
I have tried the following approaches, but the error still occurs:
s = requests.session()
s.keep_alive = False
OR
res = requests.get(url, headers={'Connection': 'close'})
So, what should I do?
By the way, the URL is fine; it can only be reached from our internal network, so the URL itself is not the problem. Thanks!
Are you running your script on a Mac? I ran into a similar problem. You can run ulimit -n to check how many files you are allowed to have open at a time.
You can use the call below to raise that limit:
import resource
# replace new_soft_limit with the number of open files you want to allow
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft_limit, resource.RLIM_INFINITY))
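For reference, resource.getrlimit is the programmatic counterpart of ulimit -n, so you can check the current limits before changing them:
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)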
Hope this helps.
Here is my blog post related to your problem.
I had a similar case; hopefully this saves you some time:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8001): Max retries exceeded with url: /enroll/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x10f96ecc0>: Failed to establish a new connection: [Errno 61] Connection refused'))
The problem was actually silly... the local server at port 8001 was down! Restarting the server solved it.
The error message (which is admittedly a little confusing) actually means that requests failed to connect to your requested URL at all.
In this case that's because your URL is http://bjtest.com/rest/data?method=check&test=123, which isn't a reachable website.
It has nothing to do with the format of the request you made. Fix your URL and it should (presumably) work for you.
I am trying to work through a simple example from Keith Galli's tutorials and scrape some headers from a target web page, but I don't know why the scrape succeeds in Keith's tutorial while mine fails. I am currently working at my company; if I replace this URL with my company's website, it works. I don't know if it is a proxy issue or something else.
The code is like this:
import requests
from bs4 import BeautifulSoup as bs

# Load the webpage content
r = requests.get("https://keithgalli.github.io/web-scraping/example.html")
# Convert to a BeautifulSoup object
soup = bs(r.content, "html.parser")
# Print out our html
print(soup.prettify())
And the error I get is:
ConnectionError: HTTPSConnectionPool(host='keithgalli.github.io', port=443): Max retries exceeded with url: /web-scraping/example.html (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000026C42D80190>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
Thanks~
The link to example.html is fine. Either you don't have an internet connection, or some network security policy is blocking your connection to example.html.
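If you want to confirm which of the two it is, a quick check (just a sketch, using only the standard library) is to try resolving the host directly; if this raises socket.gaierror, DNS resolution itself is being blocked, which is what "[Errno 11001] getaddrinfo failed" means:
import socket

# raises socket.gaierror if the hostname cannot be resolved on this network
print(socket.getaddrinfo("keithgalli.github.io", 443))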
I've been trying to download a file directly from a URL. The thing is, when I do it at home it works perfectly fine, but when I try to run it on my company's server it doesn't work.
My code is:
import requests

url = "https://servicos.ibama.gov.br/ctf/publico/areasembargadas/downloadListaAreasEmbargadas.php"
response = requests.get(url, verify=False)

# open the file in binary mode ("wb") because response.content is bytes
with open("teste.zip", "wb") as f:
    f.write(response.content)
The error message that I get is:
HTTPSConnectionPool(host='servicos.ibama.gov.br', port=443): Max retries exceeded with url: /ctf/publico/areasembargadas/downloadListaAreasEmbargadas.php (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x000001F2F792C0F0>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
I know that we use a proxy here. I've been looking for solutions to this problem and I can see that the proxy is relevant, but I couldn't figure out what to do with that information.
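For reference, requests can be pointed at a proxy explicitly; the proxy host and port below are placeholders, not your company's real values:
import requests

# placeholder proxy address; replace with the proxy your company actually uses
proxies = {
    "http": "http://proxy.example.com:8080",
    "https": "http://proxy.example.com:8080",
}

url = "https://servicos.ibama.gov.br/ctf/publico/areasembargadas/downloadListaAreasEmbargadas.php"
response = requests.get(url, proxies=proxies, verify=False)
requests also honors the HTTP_PROXY and HTTPS_PROXY environment variables if they are set.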
I have this host: http://retsau.torontomls.net:7890/ and I want to access http://retsau.torontomls.net:7890/rets-treb3pv/server/login. How can I accomplish this using Python Requests? All my attempts so far have failed.
I also followed the solution here (Python Requests - Use navigate site by servers IP) and came up with this:
response = requests.get('http://206.152.41.279/rets-treb3pv/server/login', headers={'Host': 'retsau.torontomls.net'})
but that resulted in this error:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='206.152.41.279', port=80): Max retries exceeded with url: /rets-treb3pv/server/login (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x10a4f6d84>: Failed to establish a new connection: [Errno 60] Operation timed out',))
The funny thing is that everything works perfectly fine in Postman; I am able to access all sorts of URLs on that server, from logging in to searching for something.
You left out the port number (7890) from the URL to your get call:
response = requests.get('http://206.152.41.279:7890/rets-treb3pv/server/login', headers={'Host': 'retsau.torontomls.net'})
# note the ':7890' port added to the URL
Also, unless you actually have a specific reason for accessing the site by IP address, it would make more sense to put the FQDN in the URL rather than the Host header:
response = requests.get('http://retsau.torontomls.net:7890/rets-treb3pv/server/login')
I keep getting the following error intermittently. I suspect a proxy on the network is causing this, since I can run my Python script when using a different connection.
I'm using Python 2.7 and also using Fiddler to help with the proxy authentication.
SSLError: HTTPSConnectionPool(host='api.bogus.com', port=443): Max retries exceeded with url: /api/v1/oauth2/token (Caused by SSLError(SSLError("bad handshake: SysCallError(-1, 'Unexpected EOF')",),))
At the moment I am having some limited success with the following setup; it still fails quite often, but it manages to work a few times. I am using a session with the following parameters:
import requests
from requests.adapters import HTTPAdapter

def get_session():
    session = requests.Session()
    # retry failed requests and bound the connection pool size
    session.mount('https://', HTTPAdapter(max_retries=5, pool_connections=100, pool_maxsize=100))
    # ask the server to close each connection after the response is delivered
    session.headers.update({"Connection": "close"})
    return session
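For completeness, this is roughly how the session gets used; the endpoint is the one from the traceback and the payload is only illustrative:
session = get_session()
# hypothetical call to the token endpoint from the error above; the payload is made up for this sketch
resp = session.post("https://api.bogus.com/api/v1/oauth2/token", data={"grant_type": "client_credentials"})
print(resp.status_code)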
While crawling data from a web page, I got a max retries error. Although I've searched for it online, the Errno code in other reports seems to be different from the one I got.
requests.exceptions.ConnectionError: HTTPConnectionPool(host={},port={}): Max retries exceeded with url: {} (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at {}>: Failed to establish a new connection: [Errno 60] Operation timed out'
I'm crawling websites with different remote host addresses, and my script fails only for this one address; for the others it works as usual. I tried adding time.sleep(), but it didn't help with this error, so I don't think it is caused by sending too many requests to the server.
I'd appreciate any help. Thank you!
The URL it is failing on:
http://222.175.25.10:8403/ajax/npublic/NData.ashx?jsoncallback=jQuery1111054523240929524232_1457362751668&Method=GetMonitorDataList&entCode=37150001595&subType=&subID=&year=2016&itemCode=&dtStart=2015-01-01&dtEnd=2015-12-31&monitoring=1&bReal=false&page=1&rows=500&_=1457362751769
(The pages I'm crawling are generated by JS, so I reconstructed the URL myself.)
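For what it's worth, requests can also build that query string from a params dict instead of hand-assembling the URL; a simplified sketch using only some of the parameters from the URL above:
import requests

base = "http://222.175.25.10:8403/ajax/npublic/NData.ashx"
params = {
    "Method": "GetMonitorDataList",
    "entCode": "37150001595",
    "year": "2016",
    "dtStart": "2015-01-01",
    "dtEnd": "2015-12-31",
    "page": "1",
    "rows": "500",
    # the remaining parameters from the URL above would be added here as well
}
r = requests.get(base, params=params, timeout=10)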
Update: it is working now! The reason seems to be just that the website timed out.