I am successfully authenticating with AWS and using the 'put_object' method on the Bucket object to upload a file. Now I want to use the multipart API to accomplish this for large files. I found the accepted answer in this question:
How to save S3 object to a file using boto3
But when I try to implement it, I get "unknown method" errors. What am I doing wrong? My code is below. Thanks!
# Get an AWS Session
self.awsSession = Session(aws_access_key_id=accessKey,
                          aws_secret_access_key=secretKey,
                          aws_session_token=session_token,
                          region_name=region_type)
...
# Upload the file to S3
s3 = self.awsSession.resource('s3')
s3.Bucket('prodbucket').put_object(Key=fileToUpload, Body=data)  # WORKS
#s3.Bucket('prodbucket').upload_file(dataFileName, 'prodbucket', fileToUpload)  # DOESN'T WORK
#s3.upload_file(dataFileName, 'prodbucket', fileToUpload)  # DOESN'T WORK
The upload_file method has not been ported over to the bucket resource yet. For now you'll need to use the client object directly to do this:
client = self.awsSession.client('s3')
client.upload_file(...)
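For reference, a minimal sketch of what that call might look like, reusing the names from the question (self.awsSession, dataFileName, fileToUpload, 'prodbucket') and an illustrative TransferConfig; upload_file switches to multipart automatically once the file size crosses the threshold:

from boto3.s3.transfer import TransferConfig

# Illustrative thresholds; multipart is used automatically above multipart_threshold
config = TransferConfig(multipart_threshold=8 * 1024 * 1024,
                        multipart_chunksize=8 * 1024 * 1024)

client = self.awsSession.client('s3')
client.upload_file(dataFileName,   # local path, as in the question
                   'prodbucket',   # bucket name
                   fileToUpload,   # object key
                   Config=config)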
The Libcloud S3 wrapper transparently handles all the splitting and uploading of the parts for you.
Use the upload_object_via_stream method to do so:
from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver
# Path to a very large file you want to upload
FILE_PATH = '/home/user/myfile.tar.gz'
cls = get_driver(Provider.S3)
driver = cls('api key', 'api secret key')
container = driver.get_container(container_name='my-backups-12345')
# This method blocks until all the parts have been uploaded.
extra = {'content_type': 'application/octet-stream'}
with open(FILE_PATH, 'rb') as iterator:
    obj = driver.upload_object_via_stream(iterator=iterator,
                                          container=container,
                                          object_name='backup.tar.gz',
                                          extra=extra)
For official documentation on the S3 multipart upload feature, refer to the AWS Official Blog.
I was trying to open a PDF using the Python library PyPDF2 in AWS Lambda,
but it's giving me access denied.
Code
from PyPDF2 import PdfFileReader
pdf = PdfFileReader(open('S3 FILE URL', 'rb'))
if pdf.isEncrypted:
    pdf.decrypt('')
width = int(pdf.getPage(0).mediaBox.getWidth())
height = int(pdf.getPage(0).mediaBox.getHeight())
My bucket permissions:
Block all public access: Off
Block public access to buckets and objects granted through new access control lists (ACLs): Off
Block public access to buckets and objects granted through any access control lists (ACLs): Off
Block public access to buckets and objects granted through new public bucket or access point policies: Off
Block public and cross-account access to buckets and objects through any public bucket or access point policies: Off
You're skipping a step by trying to use open() to fetch a URL: open() only works with files on the local filesystem - https://docs.python.org/3/library/functions.html#open
You'll need to use urllib3 (or similar) to fetch the file from S3 first (assuming the bucket is also publicly accessible, as Manish pointed out).
urllib3 usage suggestion: What's the best way to download file using urllib3
So combining the two:
pdf = PdfFileReader(open('S3 FILE URL', 'rb'))
becomes (something like)
import urllib3

def fetch_file(url, save_as, chunk_size=1024 * 1024):
    http = urllib3.PoolManager()
    r = http.request('GET', url, preload_content=False)
    with open(save_as, 'wb') as out:
        while True:
            data = r.read(chunk_size)
            if not data:
                break
            out.write(data)
    r.release_conn()

if __name__ == "__main__":
    # s3_file_url is the object's URL (the 'S3 FILE URL' from the question)
    # In Lambda, only /tmp is writable
    pdf_filename = "/tmp/my_pdf_from_s3.pdf"
    fetch_file(s3_file_url, pdf_filename)
    pdf = PdfFileReader(open(pdf_filename, 'rb'))
I believe you have to make changes in the public access settings section of your S3 bucket in the AWS console; that should solve your issue.
I'm using AWS S3 boto3 to upload files to my AWS bucket called uploadtesting. Here is an example implementation:
import boto3
...
s3 = boto3.resource('s3')
s3.meta.client.upload_file('files/report.pdf', 'uploadtesting', 'report.pdf')
Accessing the object from the AWS S3 console allows you to see the object URL; however, it is not a downloadable link. What I wanted to know is: how can I use Python to print out a downloadable link to the file I just uploaded?
It appears you are asking how to generate a URL that allows a private object to be downloaded.
This can be done by generating an Amazon S3 pre-signed URL, which provides access to a private S3 object for a limited time.
Basically, using credentials that have access to the object, you can create a URL that is 'signed'. When Amazon S3 receives this URL, it verifies the signature and provides access to the object if the expiry period has not ended.
From Presigned URLs — Boto3 documentation:
response = s3_client.generate_presigned_url('get_object',
                                            Params={'Bucket': bucket_name,
                                                    'Key': object_name},
                                            ExpiresIn=expiration)
The ExpiresIn parameter is expressed in seconds.
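Applied to the bucket and key from the question, a minimal sketch might look like this (the one-hour expiry is just an example):

import boto3

s3_client = boto3.client('s3')
url = s3_client.generate_presigned_url('get_object',
                                       Params={'Bucket': 'uploadtesting',
                                               'Key': 'report.pdf'},
                                       ExpiresIn=3600)  # link stays valid for one hour
print(url)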
The format of an object's URL is:
https://BUCKET-NAME.s3.amazonaws.com/OBJECT-KEY
So your object would be:
https://uploadtesting.s3.amazonaws.com/report.pdf
There is no supplied function to generate this string, so use:
url = f'https://{bucket_name}.s3.amazonaws.com/{key}'
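If the key can contain spaces or other special characters, it is safer to URL-encode it; note that a plain URL like this only works as a download link when the object itself is publicly readable:

from urllib.parse import quote

bucket_name = 'uploadtesting'
key = 'report.pdf'
url = f'https://{bucket_name}.s3.amazonaws.com/{quote(key)}'  # quote() percent-encodes special characters
print(url)  # https://uploadtesting.s3.amazonaws.com/report.pdf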
I'm using Apache Libcloud to upload files to a Google Cloud Storage bucket together with object metadata.
In the process, the keys in my metadata dict are being lowercased. I'm not sure whether this is due to Cloud Storage or whether this happens in Libcloud.
The issue can be reproduced following the example from the Libcloud docs:
from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver

cls = get_driver(Provider.GOOGLE_STORAGE)
driver = cls('SA-EMAIL', './SA.json')  # provide service account credentials here
container = driver.get_container(container_name='my-bucket')  # bucket name is a placeholder

FILE_PATH = '/home/user/file'
extra = {'meta_data': {'camelCase': 'foo'}}

# Upload with metadata
with open(FILE_PATH, 'rb') as iterator:
    obj = driver.upload_object_via_stream(iterator=iterator,
                                          container=container,
                                          object_name='file',
                                          extra=extra)
The file uploads successfully, but in the resulting metadata camelCase has been turned into camelcase.
I don't think GCS disallows camel case for object metadata keys, since it's possible to edit the metadata manually to use camel-cased keys.
I went through Libcloud's source code, but I don't see any explicit lowercasing going on. Any pointers on how to upload camelcased metadata with libcloud are most welcome.
I also checked the library and wasn't able to see anything obvious, but I guess opening a new issue there would be a great start.
As far as the Google Cloud Storage side is concerned, and as you verified yourself, it does admit camel case. I was able to successfully edit the metadata of a file by using the code offered in their public docs (but wasn't able to figure out something on libcloud itself):
from google.cloud import storage

def set_blob_metadata(bucket_name, blob_name):
    """Set a blob's metadata."""
    # bucket_name = 'your-bucket-name'
    # blob_name = 'your-object-name'
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)
    blob = bucket.get_blob(blob_name)
    metadata = {'camelCase': 'foo', 'NaMe': 'TeSt'}
    blob.metadata = metadata
    blob.patch()

    print("The metadata for the blob {} is {}".format(blob.name, blob.metadata))
So I believe that this could be a good workaround in your case if you are not able to work it out with libcloud. Do notice that the Cloud Storage client libraries base their authentication on environment variables, and the following docs should be followed.
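For example, a minimal way to point the client library at the same service account file used with libcloud (the './SA.json' path is assumed from the question):

import os
from google.cloud import storage

# GOOGLE_APPLICATION_CREDENTIALS tells the client library where to find credentials
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = './SA.json'  # path assumed from the question
client = storage.Client()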
Addition by question author: As hinted at in the comments, metadata can be added to a blob before uploading a file as follows:
from google.cloud import storage
gcs = storage.Client()
bucket = gcs.get_bucket('my-bucket')
blob = bucket.blob('document')
blob.metadata = {'camelCase': 'foobar'}
blob.upload_from_file(open('/path/to/document', 'rb'))
This allows setting metadata without having to patch an existing blob, and provides an effective workaround for the issue with libcloud.
I'm trying to print the available buckets on AWS but failed. I tried multiple tutorials online and I would get "cannot locate credentials" and "'s3.ServiceResource' object has no attribute" errors.
import boto3

s3 = boto3.resource('s3', aws_access_key_id="Random", aws_secret_access_key="Secret")
client = s3.client('s3')
response = client.list_buckets()
print(response)
Can you try:
for bucket in s3.buckets.all():
    print(bucket.name)
The problem is probably because you are defining s3 as a resource:
s3 = boto3.resource('s3')
But then you are trying to use it as a client:
client = s3.client('s3')
That won't work. If you want a client, create one with:
s3_client = boto3.client('s3')
Or, you can extract a client from the resource:
s3_resource = boto3.resource('s3')
response = s3_resource.meta.client.list_buckets()
Or, sticking with the resource, you can use:
s3_resource = boto3.resource('s3')
for bucket in s3_resource.buckets.all():
    ...  # Do something with bucket
Confused? Try to stick with one method. The client directly matches the underlying API calls made to S3 and is the same as in all other languages. The resource is a more "Pythonic" way of accessing resources; the calls get translated to client API calls. Resources can be a little more challenging when figuring out required permissions, since there isn't a one-to-one mapping to the actual API calls.
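To illustrate the difference, here is a small sketch that lists the objects of a bucket both ways (the bucket name is just an example); the resource loop is translated into the same ListObjectsV2 calls the client makes:

import boto3

# Client style: mirrors the S3 API directly
s3_client = boto3.client('s3')
response = s3_client.list_objects_v2(Bucket='example-bucket')
for item in response.get('Contents', []):
    print(item['Key'])

# Resource style: more "Pythonic", translated to the same API calls under the hood
s3_resource = boto3.resource('s3')
for obj in s3_resource.Bucket('example-bucket').objects.all():
    print(obj.key)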
I have two API endpoints, one that takes a file from an HTTP request and uploads it to a Google Cloud bucket using the Python API, and another that downloads it again. In the first view, I get the file's content type from the HTTP request and upload it to the bucket, setting that metadata:
from google.cloud import storage

file_obj = request.FILES['file']
client = storage.Client.from_service_account_json(path.join(
    path.realpath(path.dirname(__file__)),
    '..',
    'settings',
    'api-key.json'
))
bucket = client.get_bucket('storage-bucket')
blob = bucket.blob(filename)
blob.upload_from_string(
    file_text,
    content_type=file_obj.content_type
)
Then in another view, I download the file:
...
bucket = client.get_bucket('storage-bucket')
blob = bucket.blob(filename)
blob.download_to_filename(path)
How can I access the file metadata I set earlier (content_type)? It's not available on the blob object anymore, since a new one was instantiated, but the file is still stored in the bucket.
You should try:
blob = bucket.get_blob(blob_name)
blob.content_type
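A minimal sketch of the download view under that approach, reusing the names from the question (client, filename, path): bucket.get_blob() fetches the object's stored metadata, so the returned blob carries the content type that was set at upload time:

bucket = client.get_bucket('storage-bucket')
blob = bucket.get_blob(filename)  # get_blob() loads the stored metadata
print(blob.content_type)          # the content type set during upload
blob.download_to_filename(path)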