Google Cloud Storage API performance regressions - Python

I have a Python 3.5 app that makes calls to Google Cloud Storage using the Python SDK.
Every once in a while, for 10-30 minutes, all calls to the API fail with BrokenPipeError or ssl.SSLError. After some time they start working again; I have not noticed a pattern as to why.
Is this a known issue? Is it specific to the Python SDK, or is this a real performance regression on Google's side?
It should also be noted that these errors occur both when the code runs on my local machine and when it runs on a GCE machine.
The traceback for BrokenPipeError:
Traceback (most recent call last):
File "oauth2client/util.py", line 140, in positional_wrapper
return wrapped(*args, **kwargs)
File "googleapiclient/http.py", line 722, in execute
body=self.body, headers=self.headers)
File "oauth2client/client.py", line 596, in new_request
redirections, connection_type)
File "httplib2/__init__.py", line 1314, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "httplib2/__init__.py", line 1064, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "httplib2/__init__.py", line 988, in _conn_request
conn.request(method, request_uri, body, headers)
File "/usr/lib/python3.5/http/client.py", line 1083, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python3.5/http/client.py", line 1128, in _send_request
self.endheaders(body)
File "/usr/lib/python3.5/http/client.py", line 1079, in endheaders
self._send_output(message_body)
File "/usr/lib/python3.5/http/client.py", line 911, in _send_output
self.send(msg)
File "/usr/lib/python3.5/http/client.py", line 885, in send
self.sock.sendall(data)
File "/usr/lib/python3.5/ssl.py", line 886, in sendall
v = self.send(data[count:])
File "/usr/lib/python3.5/ssl.py", line 856, in send
return self._sslobj.write(data)
File "/usr/lib/python3.5/ssl.py", line 581, in write
return self._sslobj.write(data)
BrokenPipeError: [Errno 32] Broken pipe
The traceback for ssl.SSLError:
File "oauth2client/util.py", line 140, in positional_wrapper
return wrapped(*args, **kwargs)
File "googleapiclient/http.py", line 722, in execute
body=self.body, headers=self.headers)
File "oauth2client/client.py", line 596, in new_request
redirections, connection_type)
File "httplib2/__init__.py", line 1314, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "httplib2/__init__.py", line 1064, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "httplib2/__init__.py", line 1017, in _conn_request
response = conn.getresponse()
File "/usr/lib/python3.5/http/client.py", line 1174, in getresponse
response.begin()
File "/usr/lib/python3.5/http/client.py", line 282, in begin
version, status, reason = self._read_status()
File "/usr/lib/python3.5/http/client.py", line 243, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/lib/python3.5/socket.py", line 575, in readinto
return self._sock.recv_into(b)
File "/usr/lib/python3.5/ssl.py", line 924, in recv_into
return self.read(nbytes, buffer)
File "/usr/lib/python3.5/ssl.py", line 786, in read
return self._sslobj.read(len, buffer)
File "/usr/lib/python3.5/ssl.py", line 570, in read
v = self._sslobj.read(len, buffer)
ssl.SSLError: [SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac (_ssl.c:1974)

This definitely looks like an intermittent issue on Google's side.
The broken pipe issue relates to httplib2 being unable to re-establish an existing connection to their API; this is the error with the greatest impact on our services. On a few occasions we also received "503 Backend Error".
Our "solution" was basically to let the connections close themselves by releasing the client once done and creating a new one for the next request.
Bear in mind, though, that our requests are very sparse; services using Cloud Storage as primary storage probably want to keep their connections open for as long as possible.
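A minimal sketch of the "fresh client per request" workaround described above, plus a small retry on the connection errors from the question. It is stdlib-only and library-agnostic: `make_client` is a hypothetical factory (for example, a function that builds a new `httplib2.Http` and passes it to `discovery.build('storage', 'v1', http=...)`), and the retry count is an assumption, not something the answer prescribes.

```python
import ssl

# Errors seen in the question; treated here as "the underlying connection is dead".
TRANSIENT_ERRORS = (BrokenPipeError, ConnectionResetError, ssl.SSLError)


def call_with_fresh_client(make_client, operation, attempts=3):
    """Build a new client for each attempt, run operation(client), then release it.

    make_client: hypothetical zero-argument factory returning a new client.
    operation:   callable that takes the client and returns a result.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error = None
    for _ in range(attempts):
        client = make_client()
        try:
            return operation(client)
        except TRANSIENT_ERRORS as exc:
            last_error = exc  # discard this client; the next attempt builds a new one
        finally:
            # Release the client if it exposes close(); otherwise just drop the reference.
            close = getattr(client, "close", None)
            if callable(close):
                close()
    raise last_error
```

Because each attempt gets its own client, a dead pooled connection is never reused. The trade-off, as the answer notes, is a new TLS handshake per call, which suits sparse traffic better than heavy Cloud Storage use.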

Related

Error uploading file to Google Drive with Python

I wrote code to upload (create and update) a file to Google Drive.
On Windows 10 with Python 3.9 it works, but on Windows Server 2008 with Python 3.8 it gives me an error.
Note that 3.8 is the last Python version that supports Windows 2008.
If I try to list files from Google Drive it works; the problem is only when creating or updating the file.
I suspect it's related to Windows 2008 and SSL, maybe?
The error is:
C:\backupmgr>python drive.py
Traceback (most recent call last):
File "drive.py", line 112, in <module>
envia_zip('sexta.7z')
File "drive.py", line 104, in envia_zip
file = service.files().create(body=file_metadata, media_body=media).execute()
File "C:\Users\Administrador\AppData\Local\Programs\Python\Python38\lib\site-packages\googleapiclient\_helpers.py", line 130, in positional_wrapper
return wrapped(*args, **kwargs)
File "C:\Users\Administrador\AppData\Local\Programs\Python\Python38\lib\site-packages\googleapiclient\http.py", line 923, in execute
resp, content = _retry_request(
File "C:\Users\Administrador\AppData\Local\Programs\Python\Python38\lib\site-packages\googleapiclient\http.py", line 222, in _retry_request
raise exception
File "C:\Users\Administrador\AppData\Local\Programs\Python\Python38\lib\site-packages\googleapiclient\http.py", line 191, in _retry_request
resp, content = http.request(uri, method, *args, **kwargs)
File "C:\Users\Administrador\AppData\Local\Programs\Python\Python38\lib\site-packages\google_auth_httplib2.py", line 218, in request
response, content = self.http.request(
File "C:\Users\Administrador\AppData\Local\Programs\Python\Python38\lib\site-packages\httplib2\__init__.py", line 1720, in request
(response, content) = self._request(
File "C:\Users\Administrador\AppData\Local\Programs\Python\Python38\lib\site-packages\httplib2\__init__.py", line 1440, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "C:\Users\Administrador\AppData\Local\Programs\Python\Python38\lib\site-packages\httplib2\__init__.py", line 1363, in _conn_request
conn.request(method, request_uri, body, headers)
File "C:\Users\Administrador\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 1252, in request
self._send_request(method, url, body, headers, encode_chunked)
File "C:\Users\Administrador\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 1298, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "C:\Users\Administrador\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 1247, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "C:\Users\Administrador\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 1046, in _send_output
self.send(chunk)
File "C:\Users\Administrador\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 968, in send
self.sock.sendall(data)
File "C:\Users\Administrador\AppData\Local\Programs\Python\Python38\lib\ssl.py", line 1204, in sendall
v = self.send(byte_view[count:])
File "C:\Users\Administrador\AppData\Local\Programs\Python\Python38\lib\ssl.py", line 1173, in send
return self._sslobj.write(data)
socket.timeout: The write operation timed out
Well, it works now. As @DaImTo pointed out with issue #632 in the Google API GitHub repository, it is not a problem with the API. The problem is the low default socket timeout used by the client library. The PC with Windows Server 2008 that I was using was very slow and was hitting this default timeout, so I just had to raise the default timeout by inserting this code at the beginning of the script:
import socket
socket.setdefaulttimeout(600)
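`socket.setdefaulttimeout()` only applies to sockets created after the call, which is why it belongs at the top of the script, before the Drive service opens any connection. The stdlib-only sketch below shows that behaviour. As far as I can tell from the google-api-python-client source, its default HTTP object picks up this value and otherwise falls back to a shorter built-in timeout; treat that as an assumption to check against your installed version.

```python
import socket

before = socket.socket()        # created before the call below
socket.setdefaulttimeout(600)   # seconds, the value used in the answer above
after = socket.socket()         # created after the call, so it inherits the default

before_timeout = before.gettimeout()  # unaffected by the later call (None in a fresh process)
after_timeout = after.gettimeout()
before.close()
after.close()

print(after_timeout)  # 600.0
```

If you'd rather not change a process-wide default, the Pub/Sub question below shows the per-object alternative: pass `timeout=` to `httplib2.Http` and build the service with that object.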

Apache Beam, Dataflow: retrying _gcs_file_copy because we caught exception: socket.timeout

I'm trying to run a Dataflow job that used to run fine, but now I get a timeout exception.
The command that I run:
python -m dataflow --input_topic projects/xxx/subscriptions/yyy --events_table_name events --postgres_user postgres --postgres_password xxx --postgres_db test-db --postgres_host xx.xx.xx.xx --region us-central1 --temp_location gs://temp_test_356/tmp/ --project xxx --runner DataflowRunner --setup_file ./setup.py
The exception:
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.8 interpreter.
WARNING:apache_beam.utils.retry:Retry with exponential backoff: waiting for 4.58774193007135 seconds before retrying _gcs_file_copy because we caught exception: socket.timeout: The write operation timed out
Traceback for above exception (most recent call last):
File "/home/schnitzel/Visual Code Projects/python_scripts/QuestRewardConsumer/env/lib/python3.8/site-packages/apache_beam/utils/retry.py", line 253, in wrapper
return fun(*args, **kwargs)
File "/home/schnitzel/Visual Code Projects/python_scripts/QuestRewardConsumer/env/lib/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 565, in _gcs_file_copy
self.stage_file(to_folder, to_name, f, total_size=total_size)
File "/home/schnitzel/Visual Code Projects/python_scripts/QuestRewardConsumer/env/lib/python3.8/site-packages/apache_beam/runners/dataflow/internal/apiclient.py", line 641, in stage_file
response = self._storage_client.objects.Insert(request, upload=upload)
File "/home/schnitzel/Visual Code Projects/python_scripts/QuestRewardConsumer/env/lib/python3.8/site-packages/apache_beam/io/gcp/internal/clients/storage/storage_v1_client.py", line 1152, in Insert
return self._RunMethod(
File "/home/schnitzel/Visual Code Projects/python_scripts/QuestRewardConsumer/env/lib/python3.8/site-packages/apitools/base/py/base_api.py", line 728, in _RunMethod
http_response = http_wrapper.MakeRequest(
File "/home/schnitzel/Visual Code Projects/python_scripts/QuestRewardConsumer/env/lib/python3.8/site-packages/apitools/base/py/http_wrapper.py", line 348, in MakeRequest
return _MakeRequestNoRetry(
File "/home/schnitzel/Visual Code Projects/python_scripts/QuestRewardConsumer/env/lib/python3.8/site-packages/apitools/base/py/http_wrapper.py", line 397, in _MakeRequestNoRetry
info, content = http.request(
File "/home/schnitzel/Visual Code Projects/python_scripts/QuestRewardConsumer/env/lib/python3.8/site-packages/oauth2client/transport.py", line 173, in new_request
resp, content = request(orig_request_method, uri, method, body,
File "/home/schnitzel/Visual Code Projects/python_scripts/QuestRewardConsumer/env/lib/python3.8/site-packages/oauth2client/transport.py", line 280, in request
return http_callable(uri, method=method, body=body, headers=headers,
File "/home/schnitzel/Visual Code Projects/python_scripts/QuestRewardConsumer/env/lib/python3.8/site-packages/oauth2client/transport.py", line 173, in new_request
resp, content = request(orig_request_method, uri, method, body,
File "/home/schnitzel/Visual Code Projects/python_scripts/QuestRewardConsumer/env/lib/python3.8/site-packages/oauth2client/transport.py", line 280, in request
return http_callable(uri, method=method, body=body, headers=headers,
File "/home/schnitzel/Visual Code Projects/python_scripts/QuestRewardConsumer/env/lib/python3.8/site-packages/httplib2/__init__.py", line 1708, in request
(response, content) = self._request(
File "/home/schnitzel/Visual Code Projects/python_scripts/QuestRewardConsumer/env/lib/python3.8/site-packages/httplib2/__init__.py", line 1424, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/home/schnitzel/Visual Code Projects/python_scripts/QuestRewardConsumer/env/lib/python3.8/site-packages/httplib2/__init__.py", line 1347, in _conn_request
conn.request(method, request_uri, body, headers)
File "/usr/lib/python3.8/http/client.py", line 1256, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1302, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1251, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib/python3.8/http/client.py", line 1050, in _send_output
self.send(chunk)
File "/usr/lib/python3.8/http/client.py", line 972, in send
self.sock.sendall(data)
File "/usr/lib/python3.8/ssl.py", line 1204, in sendall
v = self.send(byte_view[count:])
File "/usr/lib/python3.8/ssl.py", line 1173, in send
return self._sslobj.write(data)
Like I said, this job used to run (it still runs fine locally). I drained the previous job and then wanted to start it again, but without success. I've spent countless hours on this issue. The last time I had this problem I had to create a new virtual machine just to get my jobs running, but now I want to get to the bottom of this. Things that I've tried:
Set GOOGLE_APPLICATION_CREDENTIALS to a service account that has the Owner role.
Configure gcloud. Remove gcloud.
Use neither of the above two and rely on the automatic login mechanism when running the script.
Change machine type.
Change job name.
Change region.
Use python, use python3.

BrokenPipeError when using Gmail API

I'm using the Gmail API to send an email with attachments in Python 3.
I'm trying the same code as in the Google Developers guide:
https://developers.google.com/gmail/api/guides/sending
The problem is that when attachments are 4.2 KB or 2.6 MB, the code works well, but when attachments are 3.0 MB, 9.6 MB or bigger, an error occurs:
Traceback (most recent call last):
File "quickstart2.py", line 184, in <module>
main()
File "quickstart2.py", line 170, in main
send_message(service, "me", message)
File "quickstart2.py", line 147, in send_message
message = (service.users().messages().send(userId=user_id, body=message).execute())
File "/home/yizhu/anaconda3/lib/python3.6/site-packages/oauth2client/_helpers.py", line 133, in positional_wrapper
return wrapped(*args, **kwargs)
File "/home/yizhu/anaconda3/lib/python3.6/site-packages/googleapiclient/http.py", line 837, in execute
method=str(self.method), body=self.body, headers=self.headers)
File "/home/yizhu/anaconda3/lib/python3.6/site-packages/googleapiclient/http.py", line 176, in _retry_request
raise exception
File "/home/yizhu/anaconda3/lib/python3.6/site-packages/googleapiclient/http.py", line 163, in _retry_request
resp, content = http.request(uri, method, *args, **kwargs)
File "/home/yizhu/anaconda3/lib/python3.6/site-packages/oauth2client/transport.py", line 175, in new_request
redirections, connection_type)
File "/home/yizhu/anaconda3/lib/python3.6/site-packages/oauth2client/transport.py", line 282, in request
connection_type=connection_type)
File "/home/yizhu/anaconda3/lib/python3.6/site-packages/httplib2/__init__.py", line 1322, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/home/yizhu/anaconda3/lib/python3.6/site-packages/httplib2/__init__.py", line 1072, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/home/yizhu/anaconda3/lib/python3.6/site-packages/httplib2/__init__.py", line 996, in _conn_request
conn.request(method, request_uri, body, headers)
File "/home/yizhu/anaconda3/lib/python3.6/http/client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/home/yizhu/anaconda3/lib/python3.6/http/client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/home/yizhu/anaconda3/lib/python3.6/http/client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/home/yizhu/anaconda3/lib/python3.6/http/client.py", line 1065, in _send_output
self.send(chunk)
File "/home/yizhu/anaconda3/lib/python3.6/http/client.py", line 986, in send
self.sock.sendall(data)
File "/home/yizhu/anaconda3/lib/python3.6/ssl.py", line 972, in sendall
v = self.send(byte_view[count:])
File "/home/yizhu/anaconda3/lib/python3.6/ssl.py", line 941, in send
return self._sslobj.write(data)
File "/home/yizhu/anaconda3/lib/python3.6/ssl.py", line 642, in write
return self._sslobj.write(data)
BrokenPipeError: [Errno 32] Broken pipe
What's the problem here?
Thanks
It seems the exception is raised in _retry_request.
I haven't encountered this error myself, but there's a thread on GitHub discussing the same error:
https://github.com/google/google-api-python-client/issues/218
Try using httplib2shim; it seems oauth2client has still not been replaced by google-auth.
Another suggestion I found was to use the media upload (/upload) option for files over 10 MB. Docs on how to use /upload: https://developers.google.com/gmail/api/v1/reference/users/messages/send
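To see why the failure depends on size: the guide's approach sends the whole MIME message as a base64url string in the JSON `raw` field. The attachment is base64-encoded inside the MIME message and then the whole message is encoded again, so the request body comes out at roughly 16/9 (about 1.8x) the attachment size; a 3.0 MB file becomes a body of over 5 MB. The stdlib part of the sketch below builds such a message and lets you measure it. `send_with_upload` illustrates the /upload alternative from the answer using google-api-python-client's `MediaIoBaseUpload` with a `message/rfc822` MIME type; it is untested here and assumes `service` was built with `discovery.build('gmail', 'v1', ...)`. Addresses and filenames are placeholders.

```python
import base64
import io
from email.message import EmailMessage


def build_message(sender, to, subject, body_text, attachment, filename):
    """Build a MIME message with one binary attachment (stdlib only)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body_text)
    msg.add_attachment(attachment, maintype="application",
                       subtype="octet-stream", filename=filename)
    return msg


def raw_body(msg):
    """JSON body in the style of the sending guide: the whole message, base64url-encoded."""
    return {"raw": base64.urlsafe_b64encode(msg.as_bytes()).decode("ascii")}


def send_with_upload(service, msg, user_id="me"):
    """Send via the /upload endpoint instead of the 'raw' JSON field.

    Assumes google-api-python-client is installed; imported lazily so the
    helpers above work without it.
    """
    from googleapiclient.http import MediaIoBaseUpload
    media = MediaIoBaseUpload(io.BytesIO(msg.as_bytes()),
                              mimetype="message/rfc822", resumable=True)
    return service.users().messages().send(
        userId=user_id, body={}, media_body=media).execute()
```

Comparing `len(raw_body(msg)["raw"])` with the attachment size is a quick way to see how close a given file gets to whatever request size limit applies.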

Pull timeout with google-api-python-client

I am trying to set a user-defined timeout on message pull with 'returnImmediately' = False:
import httplib2
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

PUBSUB_SCOPES = ['https://www.googleapis.com/auth/pubsub']

def create_pubsub_client(timeout):  # wrapper name is illustrative
    credentials = GoogleCredentials.get_application_default()
    if credentials.create_scoped_required():
        credentials = credentials.create_scoped(PUBSUB_SCOPES)
    http = httplib2.Http(timeout=timeout)  # client-side socket timeout, in seconds
    http = credentials.authorize(http)
    return discovery.build('pubsub', 'v1', http=http)
When the timeout is < 90 seconds I get the following errors:
resp = client.projects().subscriptions().pull(subscription=subscription, body=body).execute()
File "venv\lib\site-packages\oauth2client\util.py", line 137, in positional_wrapper
return wrapped(*args, **kwargs)
File "venv\lib\site-packages\googleapiclient\http.py", line 755, in execute
method=str(self.method), body=self.body, headers=self.headers)
File "venv\lib\site-packages\googleapiclient\http.py", line 93, in _retry_request
resp, content = http.request(uri, method, *args, **kwargs)
File "venv\lib\site-packages\oauth2client\client.py", line 622, in new_request
redirections, connection_type)
File "venv\lib\site-packages\httplib2\__init__.py", line 1609, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "venv\lib\site-packages\httplib2\__init__.py", line 1351, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "venv\lib\site-packages\httplib2\__init__.py", line 1307, in _conn_request
response = conn.getresponse()
File "C:\python27\Lib\httplib.py", line 1074, in getresponse
response.begin()
File "C:\python27\Lib\httplib.py", line 415, in begin
version, status, reason = self._read_status()
File "C:\python27\Lib\httplib.py", line 371, in _read_status
line = self.fp.readline(_MAXLINE + 1)
File "C:\python27\Lib\socket.py", line 476, in readline
data = self._sock.recv(self._rbufsize)
File "C:\python27\Lib\ssl.py", line 714, in recv
return self.read(buflen)
File "C:\python27\Lib\ssl.py", line 608, in read
v = self._sslobj.read(len or 1024)
SSLError: ('The read operation timed out',)
Thanks.
Unfortunately these client libraries do not support forwarding the timeout values to the server; however, we have just announced the gRPC client libraries, which correctly pass the deadline to the server.
As a workaround for the current libraries, either use returnImmediately=true, or set a deadline higher than 90 seconds, as your question implies.
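A stdlib-only sketch of the two workarounds from the answer: a pull body with `returnImmediately`, a helper that keeps the client timeout above the roughly 90-second server hold implied by the question (both the 90 s figure and the 10 s margin are assumptions), and a wrapper that treats a client-side read timeout as an empty pull rather than an error. On Python 3 that timeout surfaces as `socket.timeout`; on the question's Python 2.7 it came through as `ssl.SSLError`, so the except clause would need adjusting there.

```python
import socket

# Assumption based on the answer above: a returnImmediately=False pull may be
# held open by the server for up to ~90 seconds.
SERVER_HOLD_SECONDS = 90


def pull_body(max_messages=10, return_immediately=False):
    """Request body for projects.subscriptions.pull (Pub/Sub v1 REST field names)."""
    return {"returnImmediately": return_immediately, "maxMessages": max_messages}


def client_timeout_for_pull(requested, margin=10):
    """Keep the HTTP client timeout above the server hold time."""
    return max(requested, SERVER_HOLD_SECONDS + margin)


def pull_or_empty(do_pull):
    """Run do_pull(); treat a client-side read timeout as 'no messages yet'."""
    try:
        return do_pull()
    except socket.timeout:
        return {}
```

With the client factory from the question this would look like `http = httplib2.Http(timeout=client_timeout_for_pull(timeout))`, then `pull_or_empty(lambda: client.projects().subscriptions().pull(subscription=subscription, body=pull_body()).execute())`.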

Python BigQuery really strange timeout

I am building a service to stream data into BigQuery. The following code works flawlessly if I remove the part that takes 4-5 minutes to load (I am precaching some mappings):
import httplib2
from googleapiclient import discovery
from oauth2client.client import SignedJwtAssertionCredentials

## load email and key
credentials = SignedJwtAssertionCredentials(email, key, scope='https://www.googleapis.com/auth/bigquery')
if credentials is None or credentials.invalid:
    raw_input('invalid key')
    exit(0)
http = httplib2.Http()
http = credentials.authorize(http)
service = discovery.build('bigquery', 'v2', http=http)
## this does not hang, because it is before the long operation
service.tabledata().insertAll(...)
## some code that takes 5 minutes to execute (aka the long operation)
r = load_mappings()
## this hangs
service.tabledata().insertAll(...)
If I leave in the part that takes 5 minutes to execute, the Google API stops responding to the requests I make afterwards. It simply hangs there and doesn't even return an error. I even left it for 10-20 minutes to see what happens, and it just sits there. If I hit Ctrl+C, I get this:
^CTraceback (most recent call last):
File "./to_bigquery.py", line 116, in <module>
main(sys.argv)
File "./to_bigquery.py", line 101, in main
print service.tabledata().insertAll(projectId=p_n, datasetId="XXX", tableId="%s_XXXX" % str(shop), body=_mybody).execute()
File "/usr/local/lib/python2.7/dist-packages/oauth2client/util.py", line 132, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/http.py", line 716, in execute
body=self.body, headers=self.headers)
File "/usr/local/lib/python2.7/dist-packages/oauth2client/util.py", line 132, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/oauth2client/client.py", line 490, in new_request
redirections, connection_type)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1593, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1335, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1291, in _conn_request
response = conn.getresponse()
File "/usr/lib/python2.7/httplib.py", line 1030, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 407, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
line = self.fp.readline()
File "/usr/lib/python2.7/socket.py", line 430, in readline
data = recv(1)
File "/usr/lib/python2.7/ssl.py", line 241, in recv
return self.read(buflen)
File "/usr/lib/python2.7/ssl.py", line 160, in read
return self._sslobj.read(len)
I have managed to temporarily fix it by placing the big loading operation BEFORE the credentials authorization, but it seems like a bug to me. What am I missing?
EDIT: While waiting, I managed to get an error:
File "/usr/local/lib/python2.7/dist-packages/oauth2client/util.py", line 132, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/googleapiclient/http.py", line 716, in execute
body=self.body, headers=self.headers)
File "/usr/local/lib/python2.7/dist-packages/oauth2client/util.py", line 132, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/oauth2client/client.py", line 490, in new_request
redirections, connection_type)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1593, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1335, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1291, in _conn_request
response = conn.getresponse()
File "/usr/lib/python2.7/httplib.py", line 1030, in getresponse
response.begin()
File "/usr/lib/python2.7/httplib.py", line 407, in begin
version, status, reason = self._read_status()
File "/usr/lib/python2.7/httplib.py", line 365, in _read_status
line = self.fp.readline()
File "/usr/lib/python2.7/socket.py", line 430, in readline
data = recv(1)
File "/usr/lib/python2.7/ssl.py", line 241, in recv
return self.read(buflen)
File "/usr/lib/python2.7/ssl.py", line 160, in read
return self._sslobj.read(len)
socket.error: [Errno 110] Connection timed out
It is a timeout, and it seems to happen with cold tables. My workaround:
def refresh_bq(self):
    credentials = SignedJwtAssertionCredentials(email, key, scope='https://www.googleapis.com/auth/bigquery')
    if credentials is None or credentials.invalid:
        raw_input('invalid key')
        exit(0)
    http = httplib2.Http()
    http = credentials.authorize(http)
    service = discovery.build('bigquery', 'v2', http=http)
    self.service = service
I run self.refresh_bq() every time I do some inserts that do not require preprocessing, and it works flawlessly. It's a messy hack, but I needed to make it work ASAP. There is definitely a bug somewhere.
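A less blunt variant of the refresh_bq() hack: rebuild the service only when it has been idle longer than some threshold, since the hangs only appeared after the multi-minute pause. This is a sketch: `factory` stands for something like the body of refresh_bq() (it must create a new `httplib2.Http`, not reuse the old one), and the 240-second default is a hypothetical value, not a documented limit.

```python
import time


class IdleRefreshingClient:
    """Rebuild the wrapped client if it has been idle longer than max_idle seconds.

    factory:  hypothetical zero-argument callable returning a new service built
              on a new HTTP object (e.g. the body of refresh_bq()).
    max_idle: assumed threshold; pick something below the idle period after
              which your connections start to hang.
    clock:    injectable for testing; defaults to time.monotonic.
    """

    def __init__(self, factory, max_idle=240.0, clock=time.monotonic):
        self._factory = factory
        self._max_idle = max_idle
        self._clock = clock
        self._client = factory()
        self._last_used = clock()

    def get(self):
        now = self._clock()
        if now - self._last_used > self._max_idle:
            # Idle too long: replace the service (and, via the factory, its connection).
            self._client = self._factory()
        self._last_used = now
        return self._client
```

Usage would be `bq = IdleRefreshingClient(build_service)` once, then `bq.get().tabledata().insertAll(...)` for each insert, where `build_service` is your own factory.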
