I wrote a Python script to automatically download a file from a DB and upload it to an S3 bucket we own. The script works from my PC, and I can successfully ping Amazon S3 from within the Kubernetes cluster we are working on, but I'm getting a 503 when the script tries to upload/download a file from S3. I installed boto3 with: 'python3.6 -m pip install boto3'
and I'm getting the following error: "botocore.exceptions.ClientError: An error occurred (503) when calling the GetObject operation (reached max retries: 15): Service Unavailable"
I tried adding/removing SSL, changing the timeout and max retries, and nothing seems to help. I also tried different boto3 objects (client, session, etc.).
The code that crashes is the following (the failing line is marked with a ** comment):
import os
import boto3
from botocore.config import Config

def write_to_s3():
    s3 = get_s3()
    object1 = s3.Object(BUCKET_NAME, FILENAME)
    print(object1)
    test = object1.get()  # ** this line raises the 503 ClientError
    latest_num = int(str(object1.get()['Body'].read())[2:-1])
    print(str(latest_num))
...
def get_s3():
    my_config = Config(
        region_name=REGION,
        connect_timeout=25,
        retries={
            'max_attempts': 15,
            'mode': 'standard'
        }
    )
    return boto3.resource(
        's3',
        use_ssl=False,
        config=my_config,
        aws_access_key_id=os.environ.get("ACCESS_KEY_ID"),
        aws_secret_access_key=os.environ.get("SECRET_ACCESS_KEY")
    )
I really do not understand why this happens and found no answers or similar errors on the web. Please help!
I'm just starting out with boto3 and Lambda and was trying to run the function below via PyCharm.
import boto3

client = boto3.client('rds')
response = client.stop_db_instance(
    DBInstanceIdentifier='dummy-mysql-rds'
)
But I receive the below error:
botocore.errorfactory.DBInstanceNotFoundFault: An error occurred (DBInstanceNotFound) when calling the StopDBInstance operation: DBInstance dummy-mysql-rds not found.
Do you know what may be causing this?
For the record, I have the AWS Toolkit installed for PyCharm, I can run simple functions to list and describe EC2 instances, and my AWS profile has admin access.
By explicitly defining the profile name, the function below now works via PyCharm. Thank you @OleksiiDonoha for your help in getting this resolved.
import boto3

# Explicitly set the default session to the 'dev' profile before creating the client
boto3.setup_default_session(profile_name='dev')

client = boto3.client('rds')
response = client.stop_db_instance(
    DBInstanceIdentifier='dev-mysql-rds'
)
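An equivalent approach, if you would rather not modify the default session, is to create a dedicated Session bound to the profile. This is only a sketch of an alternative, not part of the original answer:
import boto3

# Alternative: a session explicitly bound to the 'dev' profile
session = boto3.Session(profile_name='dev')
client = session.client('rds')
response = client.stop_db_instance(
    DBInstanceIdentifier='dev-mysql-rds'
)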
I am trying to read a JSON file stored in an S3 bucket from Spark in local mode via PyCharm, but I'm getting the below error message:
"py4j.protocol.Py4JJavaError: An error occurred while calling o37.json.
: com.amazonaws.AmazonClientException: Unable to execute HTTP request: No such host is known (spark-tunes.s3a.ap-south-1.amazonaws.com)"
(spark-tunes is my S3 bucket name).
Below is the code I executed. Please help me to know if I'm missing something.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('DF Read').config('spark.master', 'local').getOrCreate()
spark._jsc.hadoopConfiguration().set("fs.s3a.access.key", "access_key")
spark._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "secret_key")
spark._jsc.hadoopConfiguration().set("fs.s3a.endpoint", "s3a.ap-south-1.amazonaws.com")
spark._jsc.hadoopConfiguration().set("com.amazonaws.services.s3a.enableV4", "true")
spark._jsc.hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
df = spark.read.json("s3a://bucket-name/folder_name/*.json")
df.show(5)
Try setting fs.s3a.path.style.access to true; instead of prefixing the bucket name to the host, the AWS S3 client will then use paths under the endpoint.
Also: drop the fs.s3a.impl line. That is superstition passed down across Stack Overflow examples. It's not needed, really.
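For reference, a rough sketch of what that configuration could look like (the regional endpoint shown is an assumption based on the standard ap-south-1 S3 hostname, not part of the original answer):
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('DF Read').config('spark.master', 'local').getOrCreate()
hconf = spark._jsc.hadoopConfiguration()
hconf.set("fs.s3a.access.key", "access_key")
hconf.set("fs.s3a.secret.key", "secret_key")
hconf.set("fs.s3a.endpoint", "s3.ap-south-1.amazonaws.com")  # assumed standard regional endpoint
hconf.set("fs.s3a.path.style.access", "true")  # put the bucket in the path, not the hostname
df = spark.read.json("s3a://bucket-name/folder_name/*.json")
df.show(5)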
We get this error when uploading a large file (more than 10 MB but less than 100 MB):
403 POST https://www.googleapis.com/upload/storage/v1/b/dm-scrapes/o?uploadType=resumable: ('Response headers must contain header', 'location')
Or this error when the file is more than 5Mb
403 POST https://www.googleapis.com/upload/storage/v1/b/dm-scrapes/o?uploadType=multipart: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>)
It seems that this API is looking at the file size and trying to upload it via the multipart or resumable method. I can't imagine that this is something that I, as a caller of this API, should be concerned with. Is the problem somehow related to permissions? Does the bucket need special permissions so that it can accept multipart or resumable uploads?
from google.cloud import storage

try:
    client = storage.Client()
    bucket = client.get_bucket('my-bucket')
    blob = bucket.blob('blob-name')
    blob.upload_from_filename(zip_path, content_type='application/gzip')
except Exception as e:
    print(f'Error in uploading {zip_path}')
    print(e)
We run this inside a Kubernetes pod, so the permissions get picked up automatically by the storage.Client() call.
We already tried these:
- We can't upload with gsutil because the container is Python 3 and gsutil does not run in Python 3.
- Tried this example: but it runs into the same error: ('Response headers must contain header', 'location')
- There is also this library, but it is basically alpha quality, with little activity and no commits for a year.
- Upgraded to google-cloud-storage==1.13.0
Thanks in advance
The problem was indeed the credentials. Somehow the error message was very misleading. When we loaded the credentials explicitly, the problem went away.
# Explicitly use service account credentials by specifying the private key file.
storage_client = storage.Client.from_service_account_json('service_account.json')
I found my node pools had been spec'd with
oauthScopes:
- https://www.googleapis.com/auth/devstorage.read_only
and changing it to
oauthScopes:
- https://www.googleapis.com/auth/devstorage.full_control
fixed the error. As described in this issue, the problem is an uninformative error message.
I have a couple of (Python) scripts running every 15 minutes. About 99% of the time they run without issues, but in the remaining 1% of cases they fail with the following error:
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://s3-eu-west-1.amazonaws.com/bucketname/path/file.txt"
or
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://sts.amazonaws.com/"
Multiple checks run at the same time; let's call them
check1.1
check1.2
check1.3
check1.4
In the past 30 days, 1.1 has failed once, 1.2 never, 1.3 has failed with the STS error twice, and 1.4 has failed twice.
The code is in all instances the same, the only difference is that they try to assume different roles.
A couple of threads I've read pointed to the config not being correct, but if that were the case, why does it fail only 1% of the time? Just in case, this is the file from which it pulls its profile settings:
{
    "key_id": "MYSECRETKEYID",
    "key_secret": "MYv3ryS3cur3K3y",
    "region": "eu-west-1"
}
These get used in the following code:
import json
import boto3

# open the credential file
credfile = open("myfile.json", "r").read()
json_obj_cred = json.loads(credfile)
awsaccesskeyid = json_obj_cred['key_id']
awssecretaccesskey = json_obj_cred['key_secret']
awsdefaultregion = json_obj_cred['region']

bucket = boto3.resource(
    's3',
    aws_access_key_id=awsaccesskeyid,
    aws_secret_access_key=awssecretaccesskey,
    region_name=awsdefaultregion)
Do you guys have any idea what this could be or where I should start looking?
I believe that if multiple scripts are reading the same S3 file, this may be the result.
I have created a small app that should download a file from AWS S3.
I can download the data correctly in this way:
import boto3

s3_client = boto3.resource('s3')
req = s3_client.meta.client.download_file(bucket, ob_key, dest)
But if I add this function to a Flask route, it does not work anymore. I obtain this error:
ClientError: An error occurred (400) when calling the HeadObject operation: Bad Request
I'm not able to figure out why it does not work inside the route. Any idea?
That is related to your AWS region. Pass the region name as an additional parameter.
Try it on your local machine, using
aws s3 cp s3://bucket-name/file.png file.png --region us-east-1
If you are able to download the file using this command, then it should work fine from your API also.
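In boto3 this would look something like the sketch below (not from the original answer; the region is just an example value):
import boto3

# Pass the region explicitly, mirroring the --region flag above
s3 = boto3.resource('s3', region_name='us-east-1')
s3.meta.client.download_file('bucket-name', 'file.png', 'file.png')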
The problem was that with Flask I needed to declare s3_client as a global variable instead of creating it inside the function.
Now it works perfectly!
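For illustration, a minimal sketch of that change (the route name and the bucket/key/destination values are made up for the example):
import boto3
from flask import Flask

app = Flask(__name__)

# Create the S3 resource once at module level, not inside the route
s3_client = boto3.resource('s3')

@app.route('/download')
def download():
    # example bucket/key/destination values
    s3_client.meta.client.download_file('my-bucket', 'path/to/file.png', '/tmp/file.png')
    return 'downloaded'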