I am using the pandas.read_sql function with a Hive connection to extract a really large amount of data. I have a script like this:
df = pd.read_sql(query_big, hive_connection)
df2 = pd.read_sql(query_simple, hive_connection)
The big query takes a long time, and after it executes, Python raises the following error when trying to execute the second line:
raise NotSupportedError("Hive does not have transactions") # pragma: no cover
It seems there is something wrong with the connection.
Moreover, if I replace the second line with multiprocessing.Manager().Queue(), it raises the following error:
File "/usr/lib64/python3.6/multiprocessing/managers.py", line 662, in temp
token, exp = self._create(typeid, *args, **kwds)
File "/usr/lib64/python3.6/multiprocessing/managers.py", line 554, in _create
conn = self._Client(self._address, authkey=self._authkey)
File "/usr/lib64/python3.6/multiprocessing/connection.py", line 493, in Client
answer_challenge(c, authkey)
File "/usr/lib64/python3.6/multiprocessing/connection.py", line 732, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/usr/lib64/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib64/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/usr/lib64/python3.6/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError
It seems this kind of error is related to the exit handling in connection.py being disrupted. Moreover, when I changed the first query to extract a smaller dataset that doesn't take long, everything works fine. So I assume that because the first query takes so long to execute, something is improperly terminated, causing the two errors, which are quite different in nature but both related to a broken connection.
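Since the long first query appears to leave the shared connection in a broken state, one low-risk workaround is to give each query its own connection. A minimal sketch (`read_with_fresh_connection` is a hypothetical helper; in the question's case the factory would open a new Hive connection and `pd.read_sql` would replace the raw cursor, sqlite3 is used here only so the sketch is runnable):

```python
import sqlite3

def read_with_fresh_connection(connect, query):
    """Run the query on its own newly opened connection, then close it.

    `connect` is any zero-argument connection factory, so a stale session
    left behind by a previous long-running query cannot affect this one.
    """
    conn = connect()
    try:
        cur = conn.cursor()
        cur.execute(query)
        return cur.fetchall()
    finally:
        conn.close()

rows = read_with_fresh_connection(lambda: sqlite3.connect(":memory:"),
                                  "SELECT 1")
```

The same pattern works with pandas: pass a freshly created connection to each `pd.read_sql` call instead of reusing one connection object across very long queries.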
I am inserting via a Debezium connector into a MySQL database brought up in a Docker container.
Querying works fine for some number of hours, but after that, the same query throws the exception below.
export JAVA_HOME=/tmp/tests/artifacts/java-17/jdk-17; export PATH=$PATH:/tmp/tests/artifacts/java-17/jdk-17/bin; docker exec -i mysql_be1e6a mysql --user=demo --password=demo -D demo -e "select count(k) from test_cdc_f0bf84 where uuid = 'd1e5cd6d-8f7a-457c-b2ea-880c2be52f69'"
2023-01-02 16:27:43,812:ERROR: failed to execute query MySQL rows count by uuid:
Traceback (most recent call last):
File "/home/ubuntu/workspace/stress_tests/run_test_with_universe/src/env/lib/python3.11/site-packages/paramiko/channel.py", line 699, in recv
out = self.in_buffer.read(nbytes, self.timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/workspace/stress_tests/run_test_with_universe/src/env/lib/python3.11/site-packages/paramiko/buffered_pipe.py", line 164, in read
raise PipeTimeout()
paramiko.buffered_pipe.PipeTimeout
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/workspace/stress_tests/run_test_with_universe/src/suites/cdc/abstract.py", line 667, in try_query
res = query_function()
^^^^^^^^^^^^^^^^
File "/home/ubuntu/workspace/stress_tests/run_test_with_universe/src/suites/cdc/test_cdc.py", line 635, in <lambda>
query = lambda: self.mysql_query(
^^^^^^^^^^^^^^^^^
File "/home/ubuntu/workspace/stress_tests/run_test_with_universe/src/suites/cdc/abstract.py", line 544, in mysql_query
result = self.ssh.exec_on_host(host, [
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/workspace/stress_tests/run_test_with_universe/src/main/connection.py", line 335, in exec_on_host
return self._exec_on_host(host, commands, fetch, timeout=timeout, limit_output=limit_output)[host]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/workspace/stress_tests/run_test_with_universe/src/main/connection.py", line 321, in _exec_on_host
res = list(out)
^^^^^^^^^
File "/home/ubuntu/workspace/stress_tests/run_test_with_universe/src/env/lib/python3.11/site-packages/paramiko/file.py", line 125, in __next__
line = self.readline()
^^^^^^^^^^^^^^^
File "/home/ubuntu/workspace/stress_tests/run_test_with_universe/src/env/lib/python3.11/site-packages/paramiko/file.py", line 291, in readline
new_data = self._read(n)
^^^^^^^^^^^^^
File "/home/ubuntu/workspace/stress_tests/run_test_with_universe/src/env/lib/python3.11/site-packages/paramiko/channel.py", line 1361, in _read
return self.channel.recv(size)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ubuntu/workspace/stress_tests/run_test_with_universe/src/env/lib/python3.11/site-packages/paramiko/channel.py", line 701, in recv
raise socket.timeout()
TimeoutError
After some time, I logged in to the machine manually and tried the same read; it still works fine. I'm not sure what this issue means.
As explained, I tried querying the database via Python. I expected it to return the row count, which it did until a certain point, but after that it threw a timeout error and a socket error.
Querying works fine for some number of hours. But, after that, the same query is throwing the exception below.
The default value for interactive_timeout and wait_timeout is 28800 seconds (8 hours). You can raise these limits by setting the system variables to a larger value in your MySQL config.
source: Configuring session timeouts
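For example (an illustrative my.cnf fragment; the path and the 24-hour values are assumptions, adjust them to your setup):

```ini
# e.g. /etc/mysql/my.cnf -- raise idle-session timeouts to 24 hours
[mysqld]
wait_timeout        = 86400
interactive_timeout = 86400
```

The same variables can also be changed at runtime with `SET GLOBAL wait_timeout = 86400;`, though that only affects sessions opened after the change.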
Recently, I have been very frequently encountering the error pymongo.errors.AutoReconnect. What is strange is that I never really encountered this error before and it just started happening for some reason.
Here is the full stack trace:
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/command_cursor.py", line 259, in next
doc = self._try_next(True)
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/command_cursor.py", line 270, in _try_next
self._refresh()
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/command_cursor.py", line 196, in _refresh
self.__send_message(
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/command_cursor.py", line 139, in __send_message
response = client._run_operation_with_response(
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1342, in _run_operation_with_response
return self._retryable_read(
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1464, in _retryable_read
return func(session, server, sock_info, slave_ok)
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/mongo_client.py", line 1334, in _cmd
return server.run_operation_with_response(
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/server.py", line 117, in run_operation_with_response
reply = sock_info.receive_message(request_id)
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/pool.py", line 646, in receive_message
self._raise_connection_failure(error)
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/pool.py", line 643, in receive_message
return receive_message(self.sock, request_id,
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/network.py", line 196, in receive_message
_receive_data_on_socket(sock, 16))
File "/home/ubuntu/.local/lib/python3.9/site-packages/pymongo/network.py", line 261, in _receive_data_on_socket
raise AutoReconnect("connection closed")
pymongo.errors.AutoReconnect: connection closed
This error is raised by various queries in various situations. I'm really not certain why this issue started when it wasn't a problem before.
I have tried correcting the issue at one place where the error was most common by utilizing an exponential backoff.
retry_count = 0
while retry_count < 10:
    try:
        collection.remove({"index": {"$lt": (theTime - datetime.timedelta(minutes=60)).timestamp()}})
        break
    except:
        wait_t = 0.5 * pow(2, retry_count)
        self.myLogger.error(f"An error occurred while trying to delete entries. Retry : {retry_count}, Wait Time : {wait_t}")
        time.sleep(wait_t)
    finally:
        retry_count = retry_count + 1
So far, this appears to have worked. My question is as follows:
I have a lot of various MongoDB queries. It would be very tedious to track down each of these queries and wrap it into an exponential backoff block that I have above. Is there a way to apply this in all situations when MongoDB encounters this error, so that I don't have to manually add this code everywhere?
Adding this block of code to each MongoDB query would be excessively tedious.
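One common way to centralize the backoff instead of copying it everywhere is a retry decorator. A minimal sketch (`with_backoff` and `delete_old_entries` are hypothetical names; in practice you would pass `exceptions=(pymongo.errors.AutoReconnect,)`):

```python
import functools
import time

def with_backoff(max_retries=10, base_delay=0.5, exceptions=(Exception,)):
    """Retry the wrapped function with exponential backoff on `exceptions`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_retries - 1:
                        raise  # out of retries: re-raise the last error
                    time.sleep(base_delay * (2 ** attempt))
        return wrapper
    return decorator

@with_backoff(max_retries=5, exceptions=(ConnectionError,))
def delete_old_entries(collection, cutoff):
    # `collection` would be a pymongo Collection in practice
    return collection.delete_many({"index": {"$lt": cutoff}})
```

Each query still needs the decorator applied to its wrapping function, but the backoff logic itself lives in one place.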
I was again working with my PyMongo file (sorry I am asking so many questions; I am really new to PyMongo) and I get this error. Could you explain what it means?
ret = await coro(*args, **kwargs)
File "d:\python_projects\Jeli-Bot\cogs\warns.py", line 36, in warn
if collection.count_documents({"memberid":id}) == 0:
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pymongo\collection.py", line 1827, in count_documents
return self.__database.client._retryable_read(
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pymongo\mongo_client.py", line 1525, in _retryable_read
return func(session, server, sock_info, secondary_ok)
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pymongo\collection.py", line 1821, in _cmd
result = self._aggregate_one_result(
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pymongo\collection.py", line 1686, in _aggregate_one_result
result = self._command(
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pymongo\collection.py", line 238, in _command
return sock_info.command(
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pymongo\pool.py", line 726, in command
self._raise_connection_failure(error)
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pymongo\pool.py", line 710, in command
return command(self, dbname, spec, secondary_ok,
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pymongo\network.py", line 121, in command
request_id, msg, size, max_doc_size = message._op_msg(
File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\site-packages\pymongo\message.py", line 743, in _op_msg
return _op_msg_uncompressed(
bson.errors.InvalidDocument: cannot encode object: <property object at 0x000001AA06B13AE0>, of type: <class 'property'>
I would suggest copy-pasting the chunk of code you are running, along with details on what you were trying to accomplish.
With the information provided, I would guess you are passing an object to a function that shouldn't be receiving one.
When you run a PyMongo operation, the data you pass must match the Python types that PyMongo knows how to encode to BSON. The supported types are listed in the table here: https://pymongo.readthedocs.io/en/stable/api/bson/index.html
Your issue is that you are passing a type that isn't in the Python Type column of that table. For example, maybe you're passing a set, a datetime.date, or an arbitrary class object. If that's the case, you will get the InvalidDocument: cannot encode error you are seeing. In your traceback the offending value is a property object, which usually means an attribute was read from the class itself instead of from an instance.
Double-check the document structure you are passing into the PyMongo command.
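To illustrate, a pre-insert pass can convert the common offenders into encodable equivalents. This is a sketch (`to_bson_safe` is a hypothetical helper, not part of PyMongo): BSON has no set type, and it stores full datetimes rather than bare dates.

```python
import datetime

def to_bson_safe(value):
    """Recursively convert common non-BSON-encodable types before insertion."""
    if isinstance(value, set):
        return sorted(value)  # BSON has no set type; use a list
    if isinstance(value, datetime.date) and not isinstance(value, datetime.datetime):
        # BSON stores datetimes, not bare dates
        return datetime.datetime(value.year, value.month, value.day)
    if isinstance(value, dict):
        return {key: to_bson_safe(val) for key, val in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_bson_safe(item) for item in value]
    return value

doc = {"tags": {"b", "a"}, "day": datetime.date(2023, 1, 2)}
safe = to_bson_safe(doc)  # {"tags": ["a", "b"], "day": datetime.datetime(2023, 1, 2, 0, 0)}
```

For a property object specifically, no conversion makes sense; the fix is to read the attribute from an instance (or call the getter) before building the document.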
I have a Python script that downloads product feeds from multiple affiliates in different ways. This didn't give me any problems until last Wednesday, when it started throwing all kinds of timeout exceptions from different locations.
Examples: here I connect to an FTP service:
ftp = FTP(host=self.host)
threw:
Exception in thread Thread-7:
Traceback (most recent call last):
File "C:\Python27\Lib\threading.py", line 810, in __bootstrap_inner
self.run()
File "C:\Python27\Lib\threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "C:\Users\Administrator\Documents\Crawler\src\Crawlers\LDLC.py", line 23, in main
ftp = FTP(host=self.host)
File "C:\Python27\Lib\ftplib.py", line 120, in __init__
self.connect(host)
File "C:\Python27\Lib\ftplib.py", line 138, in connect
self.welcome = self.getresp()
File "C:\Python27\Lib\ftplib.py", line 215, in getresp
resp = self.getmultiline()
File "C:\Python27\Lib\ftplib.py", line 201, in getmultiline
line = self.getline()
File "C:\Python27\Lib\ftplib.py", line 186, in getline
line = self.file.readline(self.maxline + 1)
File "C:\Python27\Lib\socket.py", line 476, in readline
data = self._sock.recv(self._rbufsize)
timeout: timed out
Or downloading an XML File :
xmlFile = urllib.URLopener()
xmlFile.retrieve(url, self.feedPath + affiliate + "/" + website + '.' + fileType)
xmlFile.close()
throws:
File "C:\Users\Administrator\Documents\Crawler\src\Crawlers\FeedCrawler.py", line 106, in save
xmlFile.retrieve(url, self.feedPath + affiliate + "/" + website + '.' + fileType)
File "C:\Python27\Lib\urllib.py", line 240, in retrieve
fp = self.open(url, data)
File "C:\Python27\Lib\urllib.py", line 208, in open
return getattr(self, name)(url)
File "C:\Python27\Lib\urllib.py", line 346, in open_http
errcode, errmsg, headers = h.getreply()
File "C:\Python27\Lib\httplib.py", line 1139, in getreply
response = self._conn.getresponse()
File "C:\Python27\Lib\httplib.py", line 1067, in getresponse
response.begin()
File "C:\Python27\Lib\httplib.py", line 409, in begin
version, status, reason = self._read_status()
File "C:\Python27\Lib\httplib.py", line 365, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File "C:\Python27\Lib\socket.py", line 476, in readline
data = self._sock.recv(self._rbufsize)
IOError: [Errno socket error] timed out
These are just two examples, but there are other methods, like authenticate or other API-specific methods, where my script throws these timeout errors. It never showed this behavior until Wednesday, and it starts throwing them at random times: sometimes at the beginning of the crawl, sometimes later on. My script behaves this way on both my server and my local machine. I've been struggling with it for two days now but can't seem to figure it out.
This is what I know might have caused this:
On Wednesday one affiliate script broke down with the following error:
URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)>
I didn't change anything in my script, but suddenly it stopped crawling that affiliate and threw that error every time I tried to authenticate. I looked it up and found that it was due to an OpenSSL change (where did that come from?). I fixed it by adding the following before the authenticate method:
if hasattr(ssl, '_create_unverified_context'):
    ssl._create_default_https_context = ssl._create_unverified_context
Little did I know, this was just the start of my problems... At the same time, I upgraded from Python 2.7.8 to Python 2.7.9. It seems that this is the moment everything broke down and started throwing timeouts.
I tried changing my script in endless ways, but nothing worked, and like I said, it's not just one method that throws the error. I also switched back to Python 2.7.8, but that didn't do the trick either. Basically, everything that makes a request to an external source can throw an error.
Final note: my script is multithreaded. It downloads product feeds from different affiliates at the same time. It used to run 10 threads per affiliate without a problem. Now I've tried lowering it to 3 per affiliate, but it still throws these errors. Setting it to 1 is not an option because that would take ages. I don't think that's the problem anyway, because it used to work fine.
What could be wrong?
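One diagnostic worth trying (a sketch, not a root-cause fix; the timeout values are arbitrary) is to set explicit socket timeouts so hangs fail fast and predictably instead of at the library's default:

```python
import socket
from ftplib import FTP

# Process-wide default for sockets created without an explicit timeout.
socket.setdefaulttimeout(120)

def connect_ftp(host, timeout=60):
    """Connect with an explicit per-connection timeout in seconds.

    `host` stands in for self.host from the original script; ftplib.FTP
    accepts a `timeout` argument in both Python 2.7 and Python 3.
    """
    return FTP(host=host, timeout=timeout)
```

With explicit timeouts in place, you can at least distinguish a slow-but-working remote server from connections that genuinely stall.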
I am getting a lot of records from my MongoDB, and along the way I get an error:
File "C:\database\models\mongodb.py", line 228, in __iter__
for result in self.results:
File "C:\Python27\Lib\site-packages\pymongo\cursor.py", line 814, in next
if len(self.__data) or self._refresh():
File "C:\Python27\Lib\site-packages\pymongo\cursor.py", line 776, in _refresh
limit, self.__id))
File "C:\Python27\Lib\site-packages\pymongo\cursor.py", line 720, in __send_message
self.__uuid_subtype)
File "C:\Python27\Lib\site-packages\pymongo\helpers.py", line 99, in _unpack_response
cursor_id)
pymongo.errors.OperationFailure: cursor id '866472135294727793' not valid at server
Exception KeyError: KeyError(38556896,) in <module 'threading' from 'C:\Python27\lib\threading.pyc'> ignored
What does this mean and how do I fix it? I don't know if it matters, but I did use from gevent import monkey; monkey.patch_all() when I opened the connection.
When a cursor has been open for a long time with no operations on it, the server can time it out, which leads to this error.
You can pass timeout=False in your find query to turn the timeout off (in PyMongo 3 and later, the parameter is no_cursor_timeout=True).
reference