I have a Python script that downloads product feeds from multiple affiliates in different ways. This didn't give me any problems until last Wednesday, when it started throwing all kinds of timeout exceptions from different locations.
Examples: Here I connect with a FTP service:
ftp = FTP(host=self.host)
threw:
Exception in thread Thread-7:
Traceback (most recent call last):
File "C:\Python27\Lib\threading.py", line 810, in __bootstrap_inner
self.run()
File "C:\Python27\Lib\threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "C:\Users\Administrator\Documents\Crawler\src\Crawlers\LDLC.py", line 23, in main
ftp = FTP(host=self.host)
File "C:\Python27\Lib\ftplib.py", line 120, in __init__
self.connect(host)
File "C:\Python27\Lib\ftplib.py", line 138, in connect
self.welcome = self.getresp()
File "C:\Python27\Lib\ftplib.py", line 215, in getresp
resp = self.getmultiline()
File "C:\Python27\Lib\ftplib.py", line 201, in getmultiline
line = self.getline()
File "C:\Python27\Lib\ftplib.py", line 186, in getline
line = self.file.readline(self.maxline + 1)
File "C:\Python27\Lib\socket.py", line 476, in readline
data = self._sock.recv(self._rbufsize)
timeout: timed out
Or downloading an XML File :
xmlFile = urllib.URLopener()
xmlFile.retrieve(url, self.feedPath + affiliate + "/" + website + '.' + fileType)
xmlFile.close()
throws:
File "C:\Users\Administrator\Documents\Crawler\src\Crawlers\FeedCrawler.py", line 106, in save
xmlFile.retrieve(url, self.feedPath + affiliate + "/" + website + '.' + fileType)
File "C:\Python27\Lib\urllib.py", line 240, in retrieve
fp = self.open(url, data)
File "C:\Python27\Lib\urllib.py", line 208, in open
return getattr(self, name)(url)
File "C:\Python27\Lib\urllib.py", line 346, in open_http
errcode, errmsg, headers = h.getreply()
File "C:\Python27\Lib\httplib.py", line 1139, in getreply
response = self._conn.getresponse()
File "C:\Python27\Lib\httplib.py", line 1067, in getresponse
response.begin()
File "C:\Python27\Lib\httplib.py", line 409, in begin
version, status, reason = self._read_status()
File "C:\Python27\Lib\httplib.py", line 365, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File "C:\Python27\Lib\socket.py", line 476, in readline
data = self._sock.recv(self._rbufsize)
IOError: [Errno socket error] timed out
These are just two examples but there are other methods, like authenticate or other API specific methods where my script throws these timeout errors. It never showed this behavior until Wednesday. Also, it starts throwing them at random times. Sometimes at the beginning of the crawl, sometimes later on. My script has this behavior on both my server and my local machine. I've been struggling with it for two days now but can't seem to figure it out.
This is what I know might have caused this:
On Wednesday one affiliate script broke down with the following error:
URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)>
I didn't change anything to my script but suddenly it stopped crawling that affiliate and threw that error all the time where I tried to authenticate. I looked it up and found that is was due to an OpenSSL error (where did that come from). I fixed it by adding the following before the authenticate method:
if hasattr(ssl, '_create_unverified_context'):
ssl._create_default_https_context = ssl._create_unverified_context
Little did I know, this was just the start of my problems... At that same time, I changed from Python 2.7.8 to Python 2.7.9. It seems that this is the moment that everything broke down and started throwing timeouts.
I tried changing my script in endless ways but nothing worked and like I said, it's not just one method that throws it. Also I switched back to Python 2.7.8, but this didn't do the trick either. Basically everything that makes a request to an external source can throw an error.
Final note: My script is multi threaded. It downloads product feeds from different affiliates at the same time. It used to run 10 threads per affiliate without a problem. Now I tried lowering it to 3 per affiliate, but it still throws these errors. Setting it to 1 is no option because that will take ages. I don't think that's the problem anyway because it used to work fine.
What could be wrong?
Related
Recently, I have been very frequently encountering the error pymongo.errors.AutoReconnect. What is strange is that I never really encountered this error before and it just started happening for some reason.
Here is the full stack trace:
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/command_cursor.py", line 259, in next
doc = self._try_next(True)
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/command_cursor.py", line 270, in _try_next
self._refresh()
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/command_cursor.py", line 196, in _refresh
self.__send_message(
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/command_cursor.py", line 139, in __send_message
response = client._run_operation_with_response(
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1342, in _run_operation_with_response
return self._retryable_read(
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1464, in _retryable_read
return func(session, server, sock_info, slave_ok)
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1334, in _cmd
return server.run_operation_with_response(
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/server.py", line 117, in run_operation_with_response
reply = sock_info.receive_message(request_id)
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/pool.py", line 646, in receive_message
self._raise_connection_failure(error)
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/pool.py", line 643, in receive_message
return receive_message(self.sock, request_id,
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/network.py", line 196, in receive_message
_receive_data_on_socket(sock, 16))
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/network.py", line 261, in _receive_data_on_socket
raise AutoReconnect("connection closed")
pymongo.errors.AutoReconnect: connection closed
This error is raised by various queries in various situations. I'm really not certain why this issue started when it wasn't a problem before.
I have tried correcting the issue at one place where the error was most common by utilizing an exponential backoff.
retry_count = 0
while retry_count < 10:
try:
collection.remove({"index" : {"$lt" : (theTime- datetime.timedelta(minutes=60)).timestamp()}})
break
except:
wait_t = 0.5 * pow(2, retry_count)
self.myLogger.error(f"An error occurred while trying to delete entries. Retry : {retry_count}, Wait Time : {wait_t}")
time.sleep( wait_t )
finally:
retry_count = retry_count + 1
So far, this appears to have worked. My question is as follows:
I have a lot of various MongoDB queries. It would be very tedious to track down each of these queries and wrap it into an exponential backoff block that I have above. Is there a way to apply this in all situations when MongoDB encounters this error, so that I don't have to manually add this code everywhere?
Adding this block of code to each MongoDB query would be excessively tedious.
I am using pandas.read_sql function with hive connection to extract a really large data. I have a script like this:
df = pd.read_sql(query_big, hive_connection)
df2 = pd.read_sql(query_simple, hive_connection)
The big query take a long time, and after it is executed, python returns the following error when trying to execute the second line:
raise NotSupportedError("Hive does not have transactions") # pragma: no cover
It seems there is something wrong with the connection.
Moreover, If I replace the second line with multirpocessing.Manager().Queue(), It returns the following error:
File "/usr/lib64/python3.6/multiprocessing/managers.py", line 662, in temp
token, exp = self._create(typeid, *args, **kwds)
File "/usr/lib64/python3.6/multiprocessing/managers.py", line 554, in _create
conn = self._Client(self._address, authkey=self._authkey)
File "/usr/lib64/python3.6/multiprocessing/connection.py", line 493, in Client
answer_challenge(c, authkey)
File "/usr/lib64/python3.6/multiprocessing/connection.py", line 732, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/usr/lib64/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib64/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/usr/lib64/python3.6/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
It seems this kind of error are related to exit function being messed up, in the connection.py. Moreover, when I changed the query in the first command to extract smaller data that doesn't take too long, Everything works fine. So I assume it may be that because it takes too long to execute the first query, something is improperly terminated. which caused the two error, both of which are so different in nature but both are related to broken connection issues.
When creating a new chromedriver instance (in python): webdriver.Chrome("./venv/selenium/webdriver/chromedriver"), I get an error http.client.BadStatusLine: ''. I am not navigating to a site, or using a server, just creating a new chromedriver. I am in a VirtualEnv that has the most recent version of Selenium (3.0.1) and chromedriver (2.24.1). This was working fine a few days ago, and I didn't change any code. I'm not really sure where to begin solving the code. My first step was to run pip install --upgrade -r requirements.txt to make sure all packages were up to date. My only idea now is that selenium is no handling the default start page, with url as data;,, because there is no response. However, as that is the default behavior, I would be surprised if selenium could not handle its' own default behavior. Any help would be much appreciated!
When the code is run (via python from the bash terminal), a new chromedriver instance is successfully created, but the error http.client.BadStatusLine: '' gets thrown, and the python terminal loses the connection to the chromedriver.
Full code:
import pythonscripts
# Creates a new webdriver
driver = pythonscripts.md()
# Never gets here, attempts to use driver get NameError: name 'driver' is not defined
Pythonscripts md method:
def md():
return webdriver.Chrome("./venv/selenium/webdriver/chromedriver")
Full error output:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/brydenr/server_scripts/cad_tests/pythonscripts.py", line 65, in md
return webdriver.Chrome("./venv/selenium/webdriver/chromedriver")
File "/Users/brydenr/server_scripts/venv/lib/python3.4/site-packages/selenium/webdriver/chrome/webdriver.py", line 69, in __init__
desired_capabilities=desired_capabilities)
File "/Users/brydenr/server_scripts/venv/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 92, in __init__
self.start_session(desired_capabilities, browser_profile)
File "/Users/brydenr/server_scripts/venv/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 179, in start_session
response = self.execute(Command.NEW_SESSION, capabilities)
File "/Users/brydenr/server_scripts/venv/lib/python3.4/site-packages/selenium/webdriver/remote/webdriver.py", line 234, in execute
response = self.command_executor.execute(driver_command, params)
File "/Users/brydenr/server_scripts/venv/lib/python3.4/site-packages/selenium/webdriver/remote/remote_connection.py", line 407, in execute
return self._request(command_info[0], url, body=data)
File "/Users/brydenr/server_scripts/venv/lib/python3.4/site-packages/selenium/webdriver/remote/remote_connection.py", line 439, in _request
resp = self._conn.getresponse()
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 1171, in getresponse
response.begin()
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 351, in begin
version, status, reason = self._read_status()
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 321, in _read_status
raise BadStatusLine(line)
http.client.BadStatusLine: ''
Tried doing
try:
webdriver.Chrome("./venv/selenium/webdriver/chromedriver")
except Exception:
webdriver.Chrome("./venv/selenium/webdriver/chromedriver")
The result is two of the same traceback as before, and two chromedriver instances. It seems like this question points to an error in urllib, but it is for a slightly different situation.
This happened to me after I updated chrome to the latest version.
I just updated chromedriver to 2.25 and it works again.
I'm having trouble debugging my code because I cannot understand the socket error being raised.
Here is the traceback.
Traceback (most recent call last):
File "clickpression.py", line 517, in <module> presser.main()
File "clickpression.py", line 391, in main
File "clickpression.py", line 121, in clickpress self.refresh_proxies(country=country)
File "clickpression.py", line 458, in refresh_proxies self.proxies = self.get_proxies(country=country)
File "helpers.py", line 72, in wrapper return func(*args, **kwargs)
File "clickpression.py", line 264, in get_proxies self.settings.SUPER_PROXY).read().decode('utf-8')
File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 161, in urlopen return opener.open(url, data, timeout)
File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 463, in open response = self._open(req, data)
File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 481, in _open '_open', req)
File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 441, in _call_chain result = func(*args)
File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 1210, in http_open return self.do_open(http.client.HTTPConnection, req)
File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 1185, in do_open r = h.getresponse()
File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 1171, in getresponse response.begin()
File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 351, in begin version, status, reason = self._read_status()
File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/http/client.py", line 313, in _read_status line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/local/Cellar/python3/3.4.3/Frameworks/Python.framework/Versions/3.4/lib/python3.4/socket.py", line 374, in readinto return self._sock.recv_into(b)
ConnectionResetError: [Errno 54] Connection reset by peer
According to the errno library Errno 54 is errno.EXFULL which in the python 3 documentation is explained as exchange full.
To my understanding the Connection reset by peer is Errno 104 i.e errno.ECONNRESET.
So what does errno.EXFULL mean? and why does socket raise the error with a connection reset by peer description instead of exchange full. And or how are the two errors errno.EXFULL and errno.ECONNRESET related?
PS: I read that the errno 54 might be related to http proxy (I'm using a proxy in my code). If so, how?
According to the errno library Errno 54 is errno.EXFULL
Did you determine that by examining errno.errorcode[54]? Anyway - this errno library might be at fault. You could verify the meaning of an error code on your system by looking into errno.h, e. g. with the help of gcc:
gcc -xc -imacros errno.h -Wp,-P -E <(echo ECONNRESET)
Also, the Python documentation says:
To translate a numeric error code to an error message, use
os.strerror().
It may well be that error number 54 is ECONNRESET on your system, and that os.strerror(54) will attest that.
Now that you have verified that os.strerror(54) returns 'Exchange full', I am puzzled why the error number 54 and the error string Connection reset by peer do not match. If that happens on a system with strace or something similar, I would further check which error is returned by the operating system through use of strace -e network on the affected process.
Regarding your question about EXFULL: Its meaning seems somewhat system dependent; e. g. on Linux, EXFULL is returned from only a handful places in the kernel, the only network-related place being in br_if.c concerning network bridges, when no available bridge port number is found (other places are in USB and SCSI drivers).
I tried to use python to crew coin market on OKEX.com using WebSocket,cause the url is an outer address,i used a vpn service provided by us,but it still can work. here is the code an traceback.
from ws4py.client.threadedclient import WebSocketClient
class DummyClient(WebSocketClient):
def opened(self):
# self.send("{'event': 'addChannel', 'channel': 'ok_sub_futureusd_btc_ticker_this_week'}") #发送请求数据格式
# self.send("www.baidu.com")
self.send("{'event':'addChannel','channel':'ok_sub_spot_bch_btc_ticker'}")
def closed(self, code, reason=None):
print("Closed down", code, reason)
#服务器返回消息
def received_message(self, m):
print("recv:", m)
if __name__ == '__main__':
try:
# 服务器连接地址wss://real.okex.com:10440/websocket/okexapi
# ws = DummyClient('wss://real.okcoin.cn:10440/websocket/okcoinapi', protocols=['chat'])
ws = DummyClient('wss://real.okex.com:10440/websocket/okexapi', protocols=['chat'])
ws.connect()
#ws.send("my test...")
ws.run_forever()
except KeyboardInterrupt:
ws.close()
You can try this code to your project:
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
if it not work,make sure the server open TLSv1 support.
I'm on version 1.9.9 of the SDK and I'm having issues with the devserver. I have a manually scaled module with 1 instance. I created a webapp2.RequestHandler for /_ah/start. In that handler I start a background thread. When I run my app in the devserver, the _ah/start handler returns a 200, but /_ah/background will randomly return 500 errors for a while. After sometime (usually a minute or two, but sometimes more), the 500 errors stop, but will randomly occur again every few hours. It also seems that everytime I open a new browser tab (Chrome), I get the same error. Anyone know what could be causing this?
Here is the RequestHandler for /_ah/start:
class StartupHandler(webapp2.RequestHandler):
def get(self):
runtime.set_shutdown_hook(shutdown_hook)
global foo
if foo is None:
foo = Foo()
background_thread.start_new_background_thread(do_foo, [])
self.response.http_status_message(200)
Here is the 500 error:
ERROR 2014-08-18 07:39:36,256 module.py:717] Request to '/_ah/background' failed
Traceback (most recent call last):
File "\appengine\tools\devappserver2\module.py", line 694, in _handle_request
environ, wrapped_start_response)
File "\appengine\tools\devappserver2\request_rewriter.py", line 311, in _rewriter_middleware
response_body = iter(application(environ, wrapped_start_response))
File "\appengine\tools\devappserver2\module.py", line 1672, in _handle_script_request
request_type)
File "\appengine\tools\devappserver2\module.py", line 1624, in _handle_instance_request
request_id, request_type)
File "\appengine\tools\devappserver2\instance.py", line 382, in handle
request_type))
File "\appengine\tools\devappserver2\http_proxy.py", line 190, in handle
response = connection.getresponse()
File "E:\Programing\Python27\lib\httplib.py", line 1030, in getresponse
response.begin()
File "E:\Programing\Python27\lib\httplib.py", line 407, in begin
version, status, reason = self._read_status()
File "E:\Programing\Python27\lib\httplib.py", line 365, in _read_status
line = self.fp.readline()
File "E:\Programing\Python27\lib\socket.py", line 430, in readline
data = recv(1)
error: [Errno 10054] An existing connection was forcibly closed by the remote host
INFO 2014-08-18 07:39:36,257 module.py:1890] Waiting for instances to restart
INFO 2014-08-18 07:39:36,262 module.py:642] lease: "GET /_ah/background HTTP/1.1" 500 -
Well this might be not the answer , but how long will it take to complete a specific task to assign to a backend? Seems like an issue with concurrency
Looks like the issue (as far as I can currently tell) is that I'm using PyCharm, which synchronizes the project's files when its window is entered or exited. This rewrites the project files even if there are no changes, which causes the devserver to restart all instances, leading to the 500 errors.
More info on PyCharm Synchronization
Link to issue at PyCharm