Getting files from AWS CodeCommit - Python

I have a file on codecommit with the uri:
codecommit://FruitLoops/apples/granny_smith.json
And when I tried:
import boto3
session = boto3.Session(some_key, some_secret)
client = session.client('codecommit')
repo = "FruitLoops"
client.get_file(repositoryName=repo, filePath="apples/granny_smith.json")
It's throwing an error:
RepositoryDoesNotExistException: An error occurred (RepositoryDoesNotExistException) when calling the GetFile operation: FruitLoops does not exist
I've tried searching around on Google and found this example https://github.com/boto/boto3/issues/2329 and the documentation https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/codecommit.html
But when I use the CLI, the repository is there:
aws codecommit get-file --repository-name FruitLoops --file-path "apples/granny_smith.json" --query fileContent --output text
What is the right syntax in boto3 to access the file through codecommit?

You are not passing the region name when creating the client. Here is the documentation which explains how you can do that.


com.amazonaws.AmazonClientException: Unable to execute HTTP request: No such host is known (spark-tunes.s3a.ap-south-1.amazonaws.com)

I am trying to read a JSON file stored in an S3 bucket from Spark in local mode via PyCharm, but I'm getting the below error message:
"py4j.protocol.Py4JJavaError: An error occurred while calling o37.json.
: com.amazonaws.AmazonClientException: Unable to execute HTTP request: No such host is known (spark-tunes.s3a.ap-south-1.amazonaws.com)"
(spark-tunes is my S3 bucket name).
Below is the code I executed. Please help me to know if I'm missing something.
spark = SparkSession.builder.appName('DF Read').config('spark.master', 'local').getOrCreate()
spark._jsc.hadoopConfiguration().set("fs.s3a.access.key", "access_key")
spark._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "secret_key")
spark._jsc.hadoopConfiguration().set("fs.s3a.endpoint", "s3a.ap-south-1.amazonaws.com")
spark._jsc.hadoopConfiguration().set("com.amazonaws.services.s3a.enableV4", "true")
spark._jsc.hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
df = spark.read.json("s3a://bucket-name/folder_name/*.json")
df.show(5)
Try setting fs.s3a.path.style.access to true: instead of prefixing the bucket name to the host, the S3 client will use paths under the endpoint.
Also: drop the fs.s3a.impl line. That is superstition passed down across Stack Overflow examples. It's not needed, really.
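Put together, the relevant settings might look like the following spark-defaults.conf fragment. The keys are placeholders, and the endpoint host is an assumption based on the question: the regional endpoint is s3.ap-south-1.amazonaws.com — the "s3a" prefix belongs only to the URL scheme, not the endpoint host.

```
spark.hadoop.fs.s3a.access.key         access_key
spark.hadoop.fs.s3a.secret.key         secret_key
spark.hadoop.fs.s3a.endpoint           s3.ap-south-1.amazonaws.com
spark.hadoop.fs.s3a.path.style.access  true
# note: no fs.s3a.impl line
```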

botocore.exceptions.InvalidConfigError: The source profile "default" must have credentials

The code below fails at the line s3 = boto3.client('s3'), returning the error botocore.exceptions.InvalidConfigError: The source profile "default" must have credentials.
import os
import boto3

def connect_s3_boto3():
    try:
        os.environ["AWS_PROFILE"] = "a"
        s3 = boto3.client('s3')
        return s3
    except:
        raise
I have set up the key and secret using aws configure.
My ~/.aws/credentials file looks like:
[default]
aws_access_key_id = XXXXXXXXXXXXXXXXX
aws_secret_access_key = YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
My ~/.aws/config file looks like:
[default]
region = eu-west-1
output = json
[profile b]
region=eu-west-1
role_arn=arn:aws:iam::XX
source_profile=default
[profile a]
region=eu-west-1
role_arn=arn:aws:iam::YY
source_profile=default
[profile d]
region=eu-west-1
role_arn=arn:aws:iam::EE
source_profile=default
If I run aws-vault exec --no-session --debug a
it returns:
aws-vault: error: exec: Failed to get credentials for a9e: InvalidClientTokenId: The security token included in the request is invalid.
status code: 403, request id: 7087ea72-32c5-4b0a-a20e-fd2da9c3c747
I noticed you tagged this question with "docker". Is it possible that you're running your code from a Docker container that does not have your AWS credentials in it?
Use a Docker volume to pass your credential files into the container:
https://docs.docker.com/storage/volumes/
It is not a good idea to bake credentials into a container image, because anybody who uses the image will have, and can use, your credentials. This is considered bad practice.
For more information on how to properly deal with secrets, see https://docs.docker.com/engine/swarm/secrets/
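For example, a docker-compose sketch of the volume approach (the service and image names are made up):

```
services:
  app:
    image: my-app-image
    volumes:
      # mount the host's AWS credentials read-only into the container
      - ~/.aws:/root/.aws:ro
```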
I ran into this problem while trying to assume a role on an ECS container. It turned out that in such cases credential_source should be used instead of source_profile. It takes the value EcsContainer for a container, Ec2InstanceMetadata for an EC2 machine, or Environment for other cases.
Since the solution is not very intuitive, I thought it might save someone the trouble, despite the age of this question.
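Concretely, for the container case, profile a from the question's ~/.aws/config would become something like this sketch:

```
[profile a]
region = eu-west-1
role_arn = arn:aws:iam::YY
credential_source = EcsContainer
```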
In the end, the issue was that Docker didn't have the credentials, and even after connecting through bash and adding them, it didn't work.
So, in the Dockerfile I added:
ADD myfolder/aws/credentials /root/.aws/credentials
to copy the local credentials file (created with aws configure via the AWS CLI) into the container. Then I rebuilt the image and it works.

Error while Downloading file to my local device from S3

I am trying to download a file from an Amazon S3 bucket to my local device using the code below, but I get an error saying "Unable to locate credentials".
Given below is the code I have written:
import boto3
import botocore

BUCKET_NAME = 'my-bucket'
KEY = 'my_image_in_s3.jpg'

s3 = boto3.resource('s3')

try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'my_local_image.jpg')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
Could anyone help me with this? Thanks in advance.
AWS uses a shared credentials system for the AWS CLI and all of the AWS SDKs; this way there is no risk of leaking your AWS credentials into a code repository. AWS security best practice is to use a shared credentials file, which on Linux is usually located at
~/.aws/credentials
This file contains the access key and secret key used by all SDKs and the AWS CLI. The file can be created manually, or automatically with this command:
aws configure
It will ask a few questions and create the credentials file for you. Note that you need to create a user with appropriate permissions before accessing AWS resources.
For more information click on the link below -:
AWS cli configuration
You are not using the session you created to download the file; you're using the s3 client you created. If you want to use the client, you need to specify credentials.
your_bucket.download_file('k.png', '/Users/username/Desktop/k.png')
or
s3 = boto3.client('s3', aws_access_key_id=... , aws_secret_access_key=...)
s3.download_file('your_bucket','k.png','/Users/username/Desktop/k.png')

Uploading large files to Google Storage GCE from a Kubernetes pod

We get this error when uploading a large file (more than 10Mb but less than 100Mb):
403 POST https://www.googleapis.com/upload/storage/v1/b/dm-scrapes/o?uploadType=resumable: ('Response headers must contain header', 'location')
Or this error when the file is more than 5Mb
403 POST https://www.googleapis.com/upload/storage/v1/b/dm-scrapes/o?uploadType=multipart: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>)
It seems that this API looks at the file size and tries to upload it via the multipart or resumable method. I can't imagine that is something that I, as a caller of this API, should be concerned with. Is the problem somehow related to permissions? Does the bucket need special permission so it can accept multipart or resumable uploads?
from google.cloud import storage

try:
    client = storage.Client()
    bucket = client.get_bucket('my-bucket')
    blob = bucket.blob('blob-name')
    blob.upload_from_filename(zip_path, content_type='application/gzip')
except Exception as e:
    print(f'Error in uploading {zip_path}')
    print(e)
We run this inside a Kubernetes pod so the permissions get picked up by storage.Client() call automatically.
We already tried these:
Can't upload with gsutil because the container is Python 3 and gsutil does not run in Python 3.
Tried this example: but it runs into the same error: ('Response headers must contain header', 'location')
There is also this library, but it is basically alpha quality, with little activity and no commits for a year.
Upgraded to google-cloud-storage==1.13.0
Thanks in advance
The problem was indeed the credentials. Somehow the error message was very misleading. When we loaded the credentials explicitly, the problem went away.
# Explicitly use service account credentials by specifying the private key file.
storage_client = storage.Client.from_service_account_json('service_account.json')
I found my node pools had been spec'd with
oauthScopes:
- https://www.googleapis.com/auth/devstorage.read_only
and changing it to
oauthScopes:
- https://www.googleapis.com/auth/devstorage.full_control
fixed the error. As described in this issue the problem is an uninformative error message.
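For completeness, the scope can also be set when (re)creating the node pool with gcloud; a sketch, with made-up pool and cluster names:

```
gcloud container node-pools create full-storage-pool \
    --cluster=my-cluster \
    --scopes=https://www.googleapis.com/auth/devstorage.full_control
```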

AWS S3 download file from Flask

I have created a small app that should download files from an AWS S3 bucket.
I can download the data correctly in this way:
s3_client = boto3.resource('s3')
req = s3_client.meta.client.download_file(bucket, ob_key, dest)
but if I add this function in a flask route it does not work anymore. I obtain this error:
ClientError: An error occurred (400) when calling the HeadObject operation: Bad Request
I'm not able to figure out why it does not work inside the route. Any idea?
That is related to your AWS region. Pass the region name as an additional parameter.
Try it on your local machine, using:
aws s3 cp s3://bucket-name/file.png file.png --region us-east-1
If you are able to download the file using this command, then it should work fine from your API as well.
The problem was that with Flask I needed to declare s3_client as a global variable instead of creating it inside the function. Now it works perfectly!
