upload a csv to aws s3 - python

def store_to_10_rss(tempDF):
    s3 = boto3.resource('s3')
    try:
        s3.Object('abcData', 'cr/working-files/unstructured_data/news-links/10_rss.csv').load()
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == "404":
            print('\nThe object does not exist.')
            tempDF.to_csv('10_rss.csv')
            s3.Object('abcData/cr/working-files/unstructured_data/news-links/', '10_rss.csv').upload_file(Filename='10_rss.csv')
        else:
            print('\nSomething else has gone wrong.')
            raise
    else:
        print('\nThe object does exist.')
In my S3 account there are multiple buckets. I want to go to the abcData bucket, then into the cr folder, then into working-files, then unstructured_data, then news-links. In the news-links folder I want to check whether a CSV named '10_rss' exists or not. If it doesn't exist, I want to save the passed dataframe tempDF to a CSV and upload that file to the path 'abcData/cr/working-files/unstructured_data/news-links/'.
But this gives me an error:
ParamValidationError: Parameter validation failed:
Invalid bucket name "sgfr01data/credit-memorandum/working-files/unstructured_data/news-links/": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).*:s3:[a-z\-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9\-]{1,63}$|^arn:(aws).*:s3-outposts:[a-z\-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9\-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9\-]{1,63}$"
Any help would be appreciated!

Please correct the syntax of your AWS S3 upload_file call. Refer to the file upload examples from AWS, and to the AWS boto3 API guide for more details on the same.
In short: in your current error and sample code, you combined the S3 key (the "folder" path inside the bucket) with the S3 bucket name, which must not be the case for the upload_file boto3 API. The S3 bucket name, i.e. sgfr01data, must be a separate parameter.
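For example, a minimal sketch of the corrected function, keeping the bucket and key names from the question (untested, and assuming boto3/botocore are imported as in your snippet):

import boto3
import botocore

def store_to_10_rss(tempDF):
    s3 = boto3.resource('s3')
    bucket = 'abcData'  # bucket name only, no folders
    key = 'cr/working-files/unstructured_data/news-links/10_rss.csv'  # "folders" live in the key
    try:
        s3.Object(bucket, key).load()
        print('\nThe object does exist.')
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == "404":
            print('\nThe object does not exist.')
            tempDF.to_csv('10_rss.csv')
            s3.Object(bucket, key).upload_file(Filename='10_rss.csv')
        else:
            print('\nSomething else has gone wrong.')
            raise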

Related

Move all files in s3 bucket from s3 account to another using boto3

I'm trying to move the contents of a bucket in account-a to a bucket in account-b; I already have the credentials for both of them.
Here's the code I'm currently using:
import boto3
SRC_AWS_KEY = 'src-key'
SRC_AWS_SECRET = 'src-secret'
DST_AWS_KEY = 'dst-key'
DST_AWS_SECRET = 'dst-secret'
srcSession = boto3.session.Session(
    aws_access_key_id=SRC_AWS_KEY,
    aws_secret_access_key=SRC_AWS_SECRET
)
dstSession = boto3.session.Session(
    aws_access_key_id=DST_AWS_KEY,
    aws_secret_access_key=DST_AWS_SECRET
)
copySource = {
    'Bucket': 'src-bucket',
    'Key': 'test-bulk-src'
}
srcS3 = srcSession.resource('s3')
dstS3 = dstSession.resource('s3')
dstS3.meta.client.copy(CopySource=copySource, Bucket='dst-bucket', Key='test-bulk-dst', SourceClient=srcS3.meta.client)
print('success')
The problem is that when I specify a file's name in the Key field followed by /file.csv it works fine, but when I set it to copy the whole folder, as shown in the code, it fails and throws this exception:
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
What I need to do is to move the contents in one call, not by iterating through the contents of the src-folder, because this is time/money consuming, as I may have thousands of files to be moved.
There is no API call in Amazon S3 to copy folders. (Folders do not actually exist — the Key of each object includes its full path.)
You will need to iterate through each object and copy it (a sketch of that loop is shown below).
The AWS CLI (written in Python) provides some higher-level commands that will do this iteration for you:
aws s3 cp --recursive s3://source-bucket/folder/ s3://destination-bucket/folder/
If the buckets are in different accounts, I would recommend:
Use a set of credentials for the destination account (avoids problems with object ownership)
Modify the bucket policy on the source bucket to permit access by the credentials from the destination account (avoids the need to use two sets of credentials)
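A minimal, untested sketch of that per-object copy loop, reusing the sessions, bucket names and 'test-bulk-src'/'test-bulk-dst' prefixes from the question (the trailing '/' on the prefixes is an assumption about how the keys are laid out):

import boto3

srcSession = boto3.session.Session()   # source-account credentials, as in the question
dstSession = boto3.session.Session()   # destination-account credentials, as in the question

srcS3 = srcSession.resource('s3')
dstClient = dstSession.client('s3')

# Copy every object under the source prefix, rewriting the prefix on the destination key
for obj in srcS3.Bucket('src-bucket').objects.filter(Prefix='test-bulk-src/'):
    dstClient.copy(
        CopySource={'Bucket': 'src-bucket', 'Key': obj.key},
        Bucket='dst-bucket',
        Key=obj.key.replace('test-bulk-src/', 'test-bulk-dst/', 1),
        SourceClient=srcS3.meta.client
    )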

Download Files From AWS S3 Without Access and Secret Keys in Python

I have working code that downloads files from one of my S3 buckets and then does some conversion work in Python. I do not embed the access and secret keys in the code; the keys are in my AWS CLI configuration.
import boto3
import botocore
import pyarrow.parquet as pq  # needed for pq.read_pandas below
# import pandas  # needed only for the csv-to-parquet conversion below

BUCKET_NAME = 'converted-parquet-bucket'  # replace with your own bucket name
KEY = 'json-to-parquet/names.snappy.parquet'  # replace with your own path and object key

s3 = boto3.resource('s3')
try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'names.snappy.parquet')  # replace the local file name
except botocore.exceptions.ClientError as e:  # exception handling
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")  # printed if the object you are looking for does not exist
    else:
        raise

# Uncomment the two lines below to convert csv to parquet
# dataframe = pandas.read_csv('names.csv')
# dataframe.to_parquet('names.snappy.parquet', engine='auto', compression='snappy')

data = pq.read_pandas('names.snappy.parquet', columns=['Year of Birth', 'Gender', 'Ethnicity', "Child's First Name", 'Count', 'Rank']).to_pandas()
# print(data)  # this would print ALL the data in the parquet file
print(data.loc[data['Gender'] == 'MALE'])  # prints only the rows matching the query
Could someone help me get this code working without having the access and secret keys embedded in the code or in the AWS CLI configuration?
If you are running your function locally, you need to have your credentials on your local credentials/config file to interact with AWS resources.
One alternative would be to run on AWS Lambda (if your function runs periodically, you can set that up with CloudWatch Events) and use Environment Variables or AWS Security Token Service (STS) to generate temporary credentials.
If you do not want to use secret/access key, you should use roles and policies, then. Here's the deal:
Define a role (ex. RoleWithAccess) and be sure that your user (defined in your credentials) can assume this role
Set a policy for RoleWithAccess, giving read/write access to your buckets
If you are executing it in your local machine, run the necessary commands (AWS CLI) to create a profile that makes you assume RoleWithAccess (ex. ProfileWithAccess)
Execute your script using a session that passes this profile as an argument, which means you need to replace:
s3 = boto3.resource('s3')
with
session = boto3.session.Session(profile_name='ProfileWithAccess')
s3 = session.resource('s3')
The upside of this approach is that if you are running it inside an EC2 instance, you can tie your instance to a specific role when you build it (ex. RoleWithAccess). In that case, you can completely ignore session, profile, all the AWS CLI hocus pocus, and just run s3 = boto3.resource('s3').
You can also use AWS Lambda, setting a role and a policy with read/write permission to your bucket.
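As a rough illustration of the assume-role path described above (the role ARN and session name below are placeholders, not taken from the question), the temporary credentials returned by STS can be fed straight into a boto3 session:

import boto3

# Placeholder ARN -- substitute the ARN of your own RoleWithAccess
ROLE_ARN = 'arn:aws:iam::123456789012:role/RoleWithAccess'

# Ask STS for temporary credentials for the role
sts = boto3.client('sts')
creds = sts.assume_role(RoleArn=ROLE_ARN, RoleSessionName='s3-download')['Credentials']

# Build a session from the temporary credentials and use it for S3
session = boto3.session.Session(
    aws_access_key_id=creds['AccessKeyId'],
    aws_secret_access_key=creds['SecretAccessKey'],
    aws_session_token=creds['SessionToken']
)
s3 = session.resource('s3')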

Uploading File to a specific location in Amazon S3 Bucket using boto3?

I am trying to upload a few files to Amazon S3. I am able to upload the file into my bucket. However, the file needs to go into my-bucket-name under Folder1/Folder2.
import boto3
from boto.s3.key import Key
session = boto3.Session(aws_access_key_id=AWS_ACCESS_KEY_ID,
                        aws_secret_access_key=AWS_SECRET_ACCESS_KEY)
bucket_name = 'my-bucket-name'
prefix = 'Folder1/Folder2'
s3 = session.resource('s3')
bucket = s3.Bucket(bucket_name)
objs = bucket.objects.filter(Prefix=prefix)
I tried to upload into bucket using this code and succeeded:
s3.meta.client.upload_file('C:/hello.txt', bucket, 'hello.txt')
When I tried to upload the same file into the specified Folder2 using this code, it failed with this error:
s3.meta.client.upload_file('C:/hello.txt', objs, 'hello.txt')
ERROR>>>
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid bucket name "s3.Bucket.objectsCollection(s3.Bucket(name='my-bucket-name'), s3.ObjectSummary)": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$"
So, how can I upload the file into my-bucket-name,Folder1/Folder2?
s3.meta.client.upload_file('C:/hello.txt', objs, 'hello.txt')
What's happening here is that the bucket argument to client.upload_file needs to be the name of the bucket as a string.
For a specific folder, include the folder path in the key:
s3.meta.client.upload_file('C:/hello.txt', bucket_name, 'Folder1/Folder2/hello.txt')
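Putting it together with the names from the question, a sketch might look like this (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are assumed to be defined elsewhere, as in the question):

import boto3

# Credentials variables are defined elsewhere, as in the question
session = boto3.Session(aws_access_key_id=AWS_ACCESS_KEY_ID,
                        aws_secret_access_key=AWS_SECRET_ACCESS_KEY)
s3 = session.resource('s3')

bucket_name = 'my-bucket-name'     # the bucket name as a plain string
key = 'Folder1/Folder2/hello.txt'  # "folders" are just part of the object key

s3.meta.client.upload_file('C:/hello.txt', bucket_name, key)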

Error while Downloading file to my local device from S3

I am trying to download a file from an Amazon S3 bucket to my local device using the code below, but I got an error saying "Unable to locate credentials".
Given below is the code I have written:
import boto3
import botocore
BUCKET_NAME = 'my-bucket'
KEY = 'my_image_in_s3.jpg'
s3 = boto3.resource('s3')
try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'my_local_image.jpg')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
Could anyone help me on this. Thanks in advance.
AWS uses a shared credentials system for the AWS CLI and all other AWS SDKs; this way there is no risk of leaking your AWS credentials into a code repository. AWS security practices recommend using a shared credentials file, which on Linux is usually located at
~/.aws/credentials
This file contains an access key and secret key used by all the SDKs and the AWS CLI. The file can be created manually, or automatically using this command:
aws configure
It will ask a few questions and create the credentials file for you. Note that you need to create a user with appropriate permissions before accessing AWS resources.
For more information, see the link below:
AWS cli configuration
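Once aws configure has written that file, the code from the question should work unchanged. If you want to select a named profile explicitly, a sketch like this works (the profile name 'default' is just illustrative):

import boto3

# Credentials are picked up from ~/.aws/credentials; profile_name is optional
# ('default' is used when omitted) -- the name here is illustrative.
session = boto3.session.Session(profile_name='default')
s3 = session.resource('s3')
s3.Bucket('my-bucket').download_file('my_image_in_s3.jpg', 'my_local_image.jpg')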
You are not using the session you created to download the file; you're using the s3 client you created. If you want to use the client, you need to specify credentials.
your_bucket.download_file('k.png', '/Users/username/Desktop/k.png')
or
s3 = boto3.client('s3', aws_access_key_id=... , aws_secret_access_key=...)
s3.download_file('your_bucket','k.png','/Users/username/Desktop/k.png')

Downloading a file from a requester pays bucket in amazon s3

I need to download some replay files from an API that has the files stored on an amazon s3 bucket, with requester pays enabled.
The problem is, I set up my Amazon AWS account, created an AWSAccessKeyId and AWSSecretKey, but I still can't download a single file, since I'm getting an Access Denied response.
I want to automate all this inside a Python script, so I've been trying to do this with the boto3 package. Also, I installed the Amazon AWS CLI, and set up my access ID and secret key.
The file I've been trying to download (I want to download multiple ones, but for now I'm trying with just one) is this: http://hotsapi.s3-website-eu-west-1.amazonaws.com/18e8b4df-6dad-e1f5-bfc7-48899e6e6a16.StormReplay
From what I've found so far on SO, I've tried something like this:
import boto3
import botocore
BUCKET_NAME = 'hotsapi' # replace with your bucket name
KEY = '18e8b4df-6dad-e1f5-bfc7-48899e6e6a16.StormReplay' # replace with your object key
s3 = boto3.resource('s3')
try:
    s3.Bucket(BUCKET_NAME).download_file(KEY, 'test.StormReplay')
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
And this:
import boto3
s3_client = boto3.Session().client('s3')
response = s3_client.get_object(Bucket='hotsapi',
                                Key='18e8b4df-6dad-e1f5-bfc7-48899e6e6a16.StormReplay',
                                RequestPayer='requester')
response_content = response['Body'].read()

with open('./B01.StormReplay', 'wb') as file:
    file.write(response_content)
But I still can't manage to download the file.
Any help is welcome! Thanks!
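For reference, when comparing the two attempts: the resource-based download_file call in the first snippet does not send the requester-pays flag unless it is passed through ExtraArgs. A minimal, untested sketch against the same bucket and key would be:

import boto3

s3 = boto3.resource('s3')
# Requester-pays buckets reject requests that do not acknowledge the charge,
# so the flag has to be passed explicitly via ExtraArgs.
s3.Bucket('hotsapi').download_file(
    '18e8b4df-6dad-e1f5-bfc7-48899e6e6a16.StormReplay',
    'test.StormReplay',
    ExtraArgs={'RequestPayer': 'requester'}
)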
