Is it possible to check the number of HTTPConnection requests made? - Python

I have a uWSGI server running on a Linux VM node, and multiple requests are made to it.
At some points there are errors like ReadTimeout and HTTPConnectionPool, which recover automatically:
ConnectionError: HTTPConnectionPool(host='10.1.1.1', port=8000): Max retries exceeded with url: /app_servers (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f16e8a89190>: Failed to establish a new connection: [Errno 101] Network is unreachable',))
Is it due to too many requests, or some network lookup issue?
I tried using the netstat and sar commands to identify the root cause, but the CPU and I/O stats are fine.
The number of connections in the ESTABLISHED and CLOSE_WAIT states is also low. I'm not sure how to check this for a past point in time.
How can I check the number of HTTP connections open at that point in time, or why the HTTPConnectionPool (Max retries exceeded) error occurs?
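One way to have this data available for a past point in time is to snapshot the kernel's connection table periodically (e.g. from cron) and log it. A minimal sketch, assuming the third-party psutil package is installed (any wrapper around netstat or ss would do the same job):

from collections import Counter

import psutil  # third-party: pip install psutil

# Tally TCP connections by state -- a rough equivalent of `netstat -ant`.
counts = Counter(conn.status for conn in psutil.net_connections(kind="tcp"))
print(counts)  # e.g. Counter({'ESTABLISHED': 42, 'TIME_WAIT': 7, 'CLOSE_WAIT': 3})

Run once a minute and appended to a log file, this gives you the connection counts at (approximately) the moment a failure occurred.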

Related

How to overcome the problem of a maximum response limit in web scraping?

I'm trying to scrape Yahoo Finance for stock market info. Since I want the data for the whole NASDAQ (over 8000 stocks), I'm using multithreading to reduce the execution time. The problem is that Yahoo seems to allow only a certain number of my requests to be answered and blocks all the others, giving the error:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='finance.yahoo.com', port=443): Max retries exceeded with url: /quote/CAT?p=CAT&.tsrc=fin-srch (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x0000024386785A30>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
Is there any way to resolve this issue?
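One common mitigation is to throttle the threads so that only a few requests are in flight at once. A sketch (the concurrency cap and the pause are assumptions to tune, not Yahoo's documented limits):

import threading
import time

import requests

MAX_IN_FLIGHT = 5   # assumed cap; tune to what the server tolerates
PAUSE = 0.5         # seconds each worker waits after a request (also an assumption)
gate = threading.BoundedSemaphore(MAX_IN_FLIGHT)

def fetch_quote(symbol):
    url = "https://finance.yahoo.com/quote/%s" % symbol
    with gate:  # at most MAX_IN_FLIGHT workers hit the server at once
        resp = requests.get(url, timeout=10)
        time.sleep(PAUSE)
    return resp.text

threads = [threading.Thread(target=fetch_quote, args=(s,)) for s in ("CAT", "AAPL", "MSFT")]
for t in threads:
    t.start()
for t in threads:
    t.join()

This is slower overall, but staying under the server's rate limit avoids the blocked connections entirely.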

Max retries exceeded with url (Failed to establish a new connection: [Errno 110] Connection timed out)

raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='mycompanyurl.in', port=443): Max retries exceeded with url: /api/v1/issues.json (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x51047d0>: Failed to establish a new connection: [Errno 110] Connection timed out',))
However, mycompanyurl.in is fine & I can open it in a browser as well.
I'm using Python 2.7.5.
I faced this issue earlier, and in my case the IP address of our server was not allowed to access the APIs by the API provider. So maybe you should contact your API provider to whitelist your server's IP.
I don't know your infrastructure exactly, but I resolved the same issue.
For example, assume you have one AWS EC2 instance (internal IP: 10.10.10.10, external IP: 150.150.150.150) and a second one hosting an API (internal IP: x.x.x.x, external IP: y.y.y.y), and you want to call the API on the second EC2.
If, in the security group of the second EC2, you allow (for example) port 5000 over HTTP only for the internal IP (10.10.10.10) of the first EC2, you will have this issue. When you instead allow the external IP (150.150.150.150), the call will succeed.
To relate this to your case: I suggest checking the security group/policy on the instance where 'mycompanyurl.in' is hosted. Something may be wrong there.
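If the API host is on AWS, the inbound rules can also be inspected programmatically. A hypothetical check with boto3 (the security group ID is a placeholder, and credentials are assumed to be configured):

import boto3

ec2 = boto3.client("ec2")
# "sg-0123456789abcdef0" is a placeholder; use the API instance's actual group ID.
resp = ec2.describe_security_groups(GroupIds=["sg-0123456789abcdef0"])
for perm in resp["SecurityGroups"][0]["IpPermissions"]:
    ranges = [r["CidrIp"] for r in perm.get("IpRanges", [])]
    print(perm.get("FromPort"), perm.get("IpProtocol"), ranges)

Check whether the caller's public IP (or its CIDR range) appears in the allowed ranges for the API's port.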

How to solve the Python requests error: "Max retries exceeded with url"

I have the following code:
res = requests.get(url)
I use a multi-threaded approach, which produces the following error:
ConnectionError: HTTPConnectionPool(host='bjtest.com', port=80): Max retries exceeded with url: /rest/data?method=check&test=123 (Caused by : [Errno 104] Connection reset by peer)
I have tried the following methods, but the error persists:
s = requests.session()
s.keep_alive = False
OR
res = requests.get(url, headers={'Connection': 'close'})
So, what should I do?
BTW, the URL is OK, but it can only be visited internally, so the URL itself is not the problem. Thanks!
Do you run your script on a Mac? I met a similar problem. You can execute ulimit -n to check how many file descriptors you can have open at a time.
You can use the snippet below to enlarge the limit:
import resource  # stdlib module for process resource limits
resource.setrlimit(resource.RLIMIT_NOFILE, (4096, resource.RLIM_INFINITY))  # 4096 is an example value
Hope this can help you. I wrote a blog post associated with this problem.
I had a similar case; hopefully this can save you some time:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8001): Max retries exceeded with url: /enroll/ (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x10f96ecc0>: Failed to establish a new connection: [Errno 61] Connection refused'))
The problem was actually silly... nothing was listening on localhost at port 8001! Restarting the server solved it.
The error message (which is admittedly a little confusing) actually means that requests failed to connect to your requested URL at all.
In this case that's because your url is http://bjtest.com/rest/data?method=check&test=123, which isn't a real website.
It has nothing to do with the format you made the request in. Fix your url and it should (presumably) work for you.
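Beyond disabling keep-alive, a common mitigation for intermittent resets is to mount an adapter with an explicit retry/backoff policy. A sketch (bjtest.com is the asker's internal host, so this won't resolve outside their network):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(
    total=5,             # retry up to 5 times
    backoff_factor=0.5,  # sleep 0.5s, 1s, 2s, ... between attempts
    status_forcelist=[500, 502, 503, 504],
)
session.mount("http://", HTTPAdapter(max_retries=retries))

res = session.get("http://bjtest.com/rest/data",
                  params={"method": "check", "test": "123"},
                  timeout=10)

With this, transient "Connection reset by peer" errors are retried automatically instead of bubbling up on the first failure.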

Python requests.get(url, timeout=75) does not wait for specified timeout

requests.get("http://172.19.235.178", timeout=75)
is my piece of code.
It makes a GET request to the URL, which is a phone, and it is supposed to wait up to 75 seconds for it to return a 200 OK.
This request works perfectly on one Ubuntu machine but does not wait for 75 seconds on another machine.
According to the documentation at https://2.python-requests.org/en/master/user/advanced/#timeouts you can set a timeout on the requests connection, but the timeout you are encountering is an OS-level socket timeout.
Notice that if you do:
requests.get("http://172.19.235.178", timeout=1)
you get:
ConnectTimeout: HTTPConnectionPool(host='172.19.235.178', port=80): Max retries exceeded with url: / (Caused by ConnectTimeoutError(, 'Connection to 172.19.235.178 timed out. (connect timeout=1)'))
while when you do
requests.get("http://172.19.235.178", timeout=75)
you get:
ConnectionError: HTTPConnectionPool(host='172.19.235.178', port=80): Max retries exceeded with url: / (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond',))
You could change your OS behavior as described here: http://willbryant.net/overriding_the_default_linux_kernel_20_second_tcp_socket_connect_timeout
In your case, though, I would use a timeout of 10 and retry a few times in a try/except loop, as sketched below.
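For example (the IP is the asker's phone endpoint, and 5 attempts is an arbitrary choice):

import requests

for attempt in range(5):
    try:
        resp = requests.get("http://172.19.235.178", timeout=10)
        break  # got a response, stop retrying
    except requests.exceptions.RequestException as exc:
        print("attempt %d failed: %s" % (attempt + 1, exc))
else:
    raise RuntimeError("host did not respond after 5 attempts")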

requests exception: max retries exceeded, Errno 60 operation timed out

When I was crawling data from a webpage, I got a max retries error. Although I've searched online, the Errno code in other reports seems to differ from mine.
requests.exceptions.ConnectionError: HTTPConnectionPool(host={},port={}): Max retries exceeded with url: {} (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at {}>: Failed to establish a new connection: [Errno 60] Operation timed out'
I'm crawling websites with different remote host addresses, and my script went wrong only for this one address; in other cases it worked as usual. I tried adding time.sleep(), but it didn't help with this error, so I don't think the error is caused by sending too many requests to the server.
I'd appreciate any help. Thank you!
The URL it is failing on:
http://222.175.25.10:8403/ajax/npublic/NData.ashx?jsoncallback=jQuery1111054523240929524232_1457362751668&Method=GetMonitorDataList&entCode=37150001595&subType=&subID=&year=2016&itemCode=&dtStart=2015-01-01&dtEnd=2015-12-31&monitoring=1&bReal=false&page=1&rows=500&_=1457362751769
(The pages I'm crawling are generated by JS, so I reconstructed the URL myself.)
Update: it is working now! The reason seems to be just that the website timed out.
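When crawling many hosts, it also helps to isolate a flaky one instead of letting a single connection error abort the whole run. A sketch (the URL list is illustrative):

import requests

urls = [
    "http://222.175.25.10:8403/ajax/npublic/NData.ashx",  # the flaky host
    # ... the other hosts being crawled
]
results, failed = {}, []
for url in urls:
    try:
        results[url] = requests.get(url, timeout=30).text
    except requests.exceptions.ConnectionError as exc:
        failed.append((url, exc))  # revisit later; the host may simply be down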
