Uploading File to a specific location in Amazon S3 Bucket using boto3? - python

I am trying to upload a few files to Amazon S3. I am able to upload a file to my bucket; however, the file needs to end up in my-bucket-name under Folder1/Folder2.
import boto3

session = boto3.Session(aws_access_key_id=AWS_ACCESS_KEY_ID,
                        aws_secret_access_key=AWS_SECRET_ACCESS_KEY)
bucket_name = 'my-bucket-name'
prefix = 'Folder1/Folder2'
s3 = session.resource('s3')
bucket = s3.Bucket(bucket_name)
objs = bucket.objects.filter(Prefix=prefix)
I tried to upload into the bucket using this code, and it succeeded:
s3.meta.client.upload_file('C:/hello.txt', bucket, 'hello.txt')
When I tried to upload the same file into the specified Folder2 using this code, it failed with an error:
s3.meta.client.upload_file('C:/hello.txt', objs, 'hello.txt')
ERROR>>>
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid bucket name "s3.Bucket.objectsCollection(s3.Bucket(name='my-bucket-name'), s3.ObjectSummary)": Bucket name must match the regex "^[a-zA-Z0-9.\-_]{1,255}$"
So, how can I upload the file into Folder1/Folder2 inside my-bucket-name?

s3.meta.client.upload_file('C:/hello.txt', objs, 'hello.txt')
What's happening here is that the bucket argument to client.upload_file needs to be the name of the bucket as a string.
For a specific folder:
upload_file('C:/hello.txt', bucket_name, 'Folder1/Folder2/hello.txt')
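Putting that together, a minimal corrected sketch of the question's snippet might look like this (the bucket name, prefix, credential variables and local path are the question's own placeholders):
import boto3

session = boto3.Session(aws_access_key_id=AWS_ACCESS_KEY_ID,
                        aws_secret_access_key=AWS_SECRET_ACCESS_KEY)
s3 = session.resource('s3')

bucket_name = 'my-bucket-name'
prefix = 'Folder1/Folder2'

# The second argument is the bucket name as a string; the "folder" goes into the key.
s3.meta.client.upload_file('C:/hello.txt', bucket_name, prefix + '/hello.txt')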

Related

Is it possible to upload a file to S3 from a URL in parts (file.aa, file.ab, file.ac)?

I would like to upload a really large file (3 TB) from a URL to S3 directly, but I want to upload it in parts, similar to splitting the file on my machine and uploading the parts to S3.
I have the following script that uploads the whole file from the URL to S3; how can I modify it to upload it in parts (file.aa, file.ab, file.ac, etc.)?
import requests
import boto3
url = "https://example.com/file.tar"
r = requests.get(url, stream=True)
session = boto3.Session()
s3 = session.resource('s3')
bucket_name = 'bucket-name'
key = 'file.tar'  # key is the name of the file in your bucket
bucket = s3.Bucket(bucket_name)
bucket.upload_fileobj(r.raw, key)
I would recommend reading this:
https://medium.com/@niyazi_erd/aws-s3-multipart-upload-with-python-and-boto3-9d2a0ef9b085
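For reference, here is a minimal sketch of the multipart approach that article describes, adapted to stream from the URL. The 500 MB part size, the variable names and the example URL/bucket are assumptions (each chunk is held in memory, and non-final parts must be at least 5 MB while the total must stay under S3's 10,000-part limit). If you really want separate objects named file.aa, file.ab, and so on, you could instead upload each chunk under its own key.
import boto3
import requests

url = "https://example.com/file.tar"
bucket_name = 'bucket-name'
key = 'file.tar'
part_size = 500 * 1024 * 1024  # >= 5 MB, and large enough to stay under 10,000 parts

s3_client = boto3.client('s3')
mpu = s3_client.create_multipart_upload(Bucket=bucket_name, Key=key)

parts = []
r = requests.get(url, stream=True)
for part_number, chunk in enumerate(r.iter_content(chunk_size=part_size), start=1):
    # Upload each streamed chunk as one part of the multipart upload.
    resp = s3_client.upload_part(Bucket=bucket_name, Key=key,
                                 UploadId=mpu['UploadId'],
                                 PartNumber=part_number, Body=chunk)
    parts.append({'ETag': resp['ETag'], 'PartNumber': part_number})

s3_client.complete_multipart_upload(Bucket=bucket_name, Key=key,
                                    UploadId=mpu['UploadId'],
                                    MultipartUpload={'Parts': parts})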

Upload a CSV to AWS S3

def store_to_10_rss(tempDF):
    s3 = boto3.resource('s3')
    try:
        s3.Object('abcData', 'cr/working-files/unstructured_data/news-links/10_rss.csv').load()
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == "404":
            print('\nThe object does not exist.')
            tempDF.to_csv('10_rss.csv')
            s3.Object('abcData/cr/working-files/unstructured_data/news-links/', '10_rss.csv').upload_file(Filename='10_rss.csv')
        else:
            print('\nSomething else has gone wrong.')
            raise
    else:
        print('\nThe object does exist.')
In my S3 there are multiple buckets. I want to go to the abcData bucket, then into the cr folder, then working-files, then unstructured_data, then news-links. In the news-links folder I want to check whether a CSV named 10_rss exists. If it doesn't, I want to store the passed dataframe tempDF as a CSV and upload that file to the path 'abcData/cr/working-files/unstructured_data/news-links/'.
But this gives me an error:
ParamValidationError: Parameter validation failed:
Invalid bucket name "sgfr01data/credit-memorandum/working-files/unstructured_data/news-links/": Bucket name must match the regex "^[a-zA-Z0-9.-_]{1,255}$" or be an ARN matching the regex "^arn:(aws).:s3:[a-z-0-9]+:[0-9]{12}:accesspoint[/:][a-zA-Z0-9-]{1,63}$|^arn:(aws).:s3-outposts:[a-z-0-9]+:[0-9]{12}:outpost[/:][a-zA-Z0-9-]{1,63}[/:]accesspoint[/:][a-zA-Z0-9-]{1,63}$"
Any help would be appreciated!
Please correct the syntax of the AWS S3 upload_file call. Refer to the file upload examples from AWS, and see the AWS boto3 API guide for more details.
In short, in your current code (and in the error) you combined the S3 object key, i.e. the "folder" path, with the bucket name, which is not how the boto3 upload_file API works. The S3 bucket name (sgfr01data in the error) must be a separate parameter.
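For example, the failing upload_file line could be rewritten so the bucket name and the object key stay separate (a sketch using the bucket and path from the question):
s3.Object('abcData',
          'cr/working-files/unstructured_data/news-links/10_rss.csv'
          ).upload_file(Filename='10_rss.csv')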

Upload file using pre-signed URL PUT method not working Python requests

I'm trying to read a file residing inside an S3 bucket and upload it to another S3 bucket through a pre-signed URL. Uploading a single file works fine, but when I try to upload a list of files in a bucket, the data can't be uploaded. The response code is always 200, yet no data appears in the target S3 location, especially with large files.
My Python code is given below.
import boto3
import requests
from io import BytesIO

for file in file_list:  # list of file names in the bucket
    pre_url = getPreSinedUrl()  # for each file the method returns a separate pre-signed URL to upload to
    data_in_bytes = s3.get_object(Bucket="my-bucket", Key=file)['Body'].read()
    res = requests.put(pre_url, data=BytesIO(data_in_bytes), headers=my_header)
    print(res.status_code)
Any help why the file in s3 is not getting uploaded?
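For reference, a pre-signed PUT URL of the kind getPreSinedUrl() is assumed to return can be generated with boto3's generate_presigned_url; this is only a sketch of a hypothetical helper, not the asker's actual code:
import boto3

s3_client = boto3.client('s3')

def get_presigned_put_url(bucket, key, expires=3600):
    # Hypothetical stand-in for the question's getPreSinedUrl(); any headers
    # included when signing must also be sent unchanged with the PUT request.
    return s3_client.generate_presigned_url(
        'put_object',
        Params={'Bucket': bucket, 'Key': key},
        ExpiresIn=expires,
    )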

Get content_type from Google Cloud file

I have two API endpoints, one that takes a file from an HTTP request and uploads it to a Google Cloud bucket using the Python API, and another that downloads it again. In the first view, I get the file's content type from the HTTP request and upload it to the bucket, setting that metadata:
from google.cloud import storage

file_obj = request.FILES['file']
client = storage.Client.from_service_account_json(path.join(
    path.realpath(path.dirname(__file__)),
    '..',
    'settings',
    'api-key.json'
))
bucket = client.get_bucket('storage-bucket')
blob = bucket.blob(filename)
blob.upload_from_string(
    file_text,
    content_type=file_obj.content_type
)
Then in another view, I download the file:
...
bucket = client.get_bucket('storage-bucket')
blob = bucket.blob(filename)
blob.download_to_filename(path)
How can I access the file metadata I set earlier (content_type)? It's not available on the blob object anymore since a new one was instantiated, but the bucket still holds the file.
You should try
blob = bucket.get_blob(blob_name)
blob.content_type
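bucket.blob() only builds a local reference without calling the API, whereas bucket.get_blob() actually fetches the object's metadata from the server, so content_type is populated. A minimal sketch of the download view, reusing the question's client, filename and path:
bucket = client.get_bucket('storage-bucket')
blob = bucket.get_blob(filename)   # issues a GET, so metadata is loaded
blob.download_to_filename(path)
print(blob.content_type)           # the content type set at upload time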

How to read a bucket that isn't my own with Python and boto3?

I know the bucket name, I have access to it, and I can browse it via the web console and the AWS CLI.
How do I access it with Python's boto3? All the examples assume accessing my own buckets:
import boto3

s3 = boto3.resource('s3')
for bucket in s3.buckets.all():
    print(bucket.name)
How do I reach someone else's bucket?
If you have access to someone else's bucket and you know its name, you can access it like this:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('some-bucket-i-have-access-to')
for obj in bucket.objects.all():
    print(obj.key)
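To actually read one of those objects rather than just list the keys, a sketch along these lines should work, assuming your credentials grant s3:GetObject on the bucket; 'some/key.txt' is a placeholder:
obj = s3.Object('some-bucket-i-have-access-to', 'some/key.txt')
body = obj.get()['Body'].read()   # raw bytes of the object
print(body[:100])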
