Can't download from AWS boto3 generated signed url - python

I'm trying to upload a file to an existing AWS s3 bucket, generate a public URL and use that URL (somewhere else) to download the file.
I'm closely following the example here:
import os
import boto3
import requests
import tempfile
s3 = boto3.client('s3')
with tempfile.NamedTemporaryFile(mode="w", delete=False) as outfile:
    outfile.write("dummycontent")
    file_name = outfile.name
with open(file_name, mode="r") as outfile:
    s3.upload_file(outfile.name, "twistimages", "filekey")
os.unlink(file_name)
url = s3.generate_presigned_url(
    ClientMethod='get_object',
    Params={
        'Bucket': 'twistimages',
        'Key': 'filekey'
    }
)
response = requests.get(url)
print(response)
I would expect to see a success return code (200) from the requests library.
Instead, stdout is: <Response [400]>
Also, if I navigate to the corresponding URL with a web browser, I get an XML file with the error code InvalidRequest and the error message:
The authorization mechanism you have provided is not supported. Please
use AWS4-HMAC-SHA256.
How can I use boto3 to generate a public URL, which can easily be downloaded by any user by just navigating to the corresponding URL, without generating complex headers?
Why does the example code from the official documentation not work in my case?

AWS S3 still supports the legacy v2 signature in the older regions, such as the US regions launched before 2014. Newer AWS regions accept only AWS4-HMAC-SHA256 (s3v4).
To use s3v4, you must specify it explicitly, either in the ~/.aws/config file or when instantiating the boto3 S3 resource/client, e.g.
# add this entry under ~/.aws/config
[default]
s3 =
    signature_version = s3v4
[profile other]
s3 =
    signature_version = s3v4
Or declare them explicitly:
s3client = boto3.client('s3', config=boto3.session.Config(signature_version='s3v4'))
s3resource = boto3.resource('s3', config=boto3.session.Config(signature_version='s3v4'))

I solved the issue. I'm using an S3 bucket in the eu-central-1 region and after specifying the region in the config file, everything worked as expected and the script stdout was <Response [200]>.
The configuration file (~/.aws/config) now looks like:
[default]
region=eu-central-1
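Both answers can also be applied in code instead of in ~/.aws/config. The following is a minimal sketch rather than the documented example the question refers to: make_s3v4_client and presign_get are hypothetical helper names, the default region is the one from the answer above, and the one-hour expiry is an assumption.

```python
def make_s3v4_client(region_name="eu-central-1"):
    """Build an S3 client that signs requests with SigV4 for the given region."""
    # Imported here so presign_get below can be exercised without boto3 installed.
    import boto3
    from botocore.client import Config

    return boto3.client(
        "s3",
        region_name=region_name,
        config=Config(signature_version="s3v4"),
    )


def presign_get(s3_client, bucket, key, expires_in=3600):
    """Return a time-limited download URL for s3://bucket/key."""
    return s3_client.generate_presigned_url(
        ClientMethod="get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # lifetime of the URL in seconds (assumed value)
    )
```

With credentials configured, presign_get(make_s3v4_client(), "twistimages", "filekey") should produce a URL whose query string carries X-Amz-Algorithm=AWS4-HMAC-SHA256, which is the mechanism the error message asks for.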

Related

use boto3 session when opening s3 url

I see a lot of code that uses an S3 bucket url to open a file. I would like to use smart open to open a compressed file
session = boto3.Session(ID, pass)
file = open("s3://bucket/file.txt.gz", transport_params=dict(session=session), encoding="utf-8")
However, the examples I have seen for smart open and other boto3-based tools never show how to use a session when reading data from a URL, only when writing data to a new bucket. Is there a way to use the URL and the session together, without creating a client from my session and accessing the bucket and key myself?
As mentioned, you just need to replace "wb" with "rb". I was mistaken in thinking this didn't work.
from smart_open import open
import boto3
url = 's3://bucket/your/keyz'
session = boto3.Session(aws_access_key_id,
                        aws_secret_access_key,
                        region_name)
with open(url, 'rb', transport_params={'client': session.client('s3')}) as fin:
    file = fin.read()
    print(file)

AWS lambda open pdf using PyPDF2

I was trying to open a PDF with the Python library PyPDF2 in AWS Lambda, but it gives me an access denied error.
Code:
from PyPDF2 import PdfFileReader
pdf = PdfFileReader(open('S3 FILE URL', 'rb'))
if pdf.isEncrypted:
    pdf.decrypt('')
width = int(pdf.getPage(0).mediaBox.getWidth())
height = int(pdf.getPage(0).mediaBox.getHeight())
My bucket permissions:
Block all public access: Off
Block public access to buckets and objects granted through new access control lists (ACLs): Off
Block public access to buckets and objects granted through any access control lists (ACLs): Off
Block public access to buckets and objects granted through new public bucket or access point policies: Off
Block public and cross-account access to buckets and objects through any public bucket or access point policies: Off
You're skipping a step by trying to use open() to fetch a URL: open() only works on files on the local filesystem - https://docs.python.org/3/library/functions.html#open
You'll need to use urllib3 or similar to fetch the file from S3 first (assuming the bucket is also publicly accessible, as Manish pointed out).
urllib3 usage suggestion: What's the best way to download file using urllib3
So combining the two:
pdf = PdfFileReader(open('S3 FILE URL', 'rb'))
becomes (something like)
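import urllib3
from PyPDF2 import PdfFileReader
def fetch_file(url, save_as, chunk_size=64 * 1024):
    # Stream the response to disk in chunks instead of loading it all into memory
    http = urllib3.PoolManager()
    r = http.request('GET', url, preload_content=False)
    with open(save_as, 'wb') as out:
        while True:
            data = r.read(chunk_size)
            if not data:
                break
            out.write(data)
    r.release_conn()
if __name__ == "__main__":
    s3_file_url = 'S3 FILE URL'  # placeholder, as in the question
    pdf_filename = "my_pdf_from_s3.pdf"  # inside Lambda, use a path under /tmp/
    fetch_file(s3_file_url, pdf_filename)
    pdf = PdfFileReader(open(pdf_filename, 'rb'))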
I believe you have to make changes in this section of your S3 bucket in the AWS console. I believe this should solve your issue.
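If the bucket should stay private, the object can also be fetched through boto3 rather than through a public URL. This is a minimal sketch under assumptions: the Lambda execution role is allowed s3:GetObject on the object, load_s3_object is a hypothetical helper name, and the bucket and key in the handler are placeholders.

```python
import io


def load_s3_object(s3_client, bucket, key):
    """Download s3://bucket/key into memory and return a seekable buffer."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # response["Body"] is a streaming object; read() returns the raw bytes.
    return io.BytesIO(response["Body"].read())


def lambda_handler(event, context):
    # Third-party imports are kept local so load_s3_object stays usable on its own.
    import boto3
    from PyPDF2 import PdfFileReader

    # "my-bucket" and "docs/file.pdf" are placeholder names.
    buffer = load_s3_object(boto3.client("s3"), "my-bucket", "docs/file.pdf")
    pdf = PdfFileReader(buffer)  # PdfFileReader accepts a file-like object
    page = pdf.getPage(0)
    return {
        "width": int(page.mediaBox.getWidth()),
        "height": int(page.mediaBox.getHeight()),
    }
```

Reading into memory avoids writing to /tmp/, at the cost of holding the whole PDF in the function's memory.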

using boto3 to get file list and download files

I was given this s3 url: s3://file.share.external.bdex.com/Offrs
In this url is a battery of files I need to download.
I have this code:
import boto3
s3_client = boto3.client('s3',
                         aws_access_key_id='<<ACCESS KEY>>',
                         aws_secret_access_key='<<SECRET_ACCESS_KEY>>'
                         )
object_listing = s3_client.list_objects_v2(Bucket='file.share.external.bdex.com/Offrs',
                                           Prefix='')
print(object_listing)
I have tried:
Bucket='file.share.external.bdex.com', Prefix='Offrs'
Bucket='s3://file.share.external.bdex.com/Offrs/'
Bucket='file.share.external.bdx.com/Offrs', Prefix='Offrs'
and several other configurations. All of them fail, either because the bucket name does not match the expected regex (due to the slash) or because the bucket is not found.
What am I missing?
Thank you.
Bucket = 'file.share.external.bdex.com'
Prefix = 'Offrs/'
You can test your access permissions via the AWS CLI:
aws s3 ls s3://file.share.external.bdex.com/Offrs/
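The split between bucket and prefix can be done mechanically. The sketch below assumes the s3:// URL form from the question; split_s3_url and list_keys are hypothetical helper names, and s3_client is a boto3 S3 client created as in the question.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/some/prefix' into (bucket, prefix)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("not an s3:// URL: {!r}".format(url))
    # Bucket names never contain '/', so everything after the host is the key prefix.
    return parsed.netloc, parsed.path.lstrip("/")


def list_keys(s3_client, url):
    """Yield every object key stored under the given s3:// URL."""
    bucket, prefix = split_s3_url(url)
    # list_objects_v2 returns at most 1000 keys per call;
    # the paginator follows the continuation tokens.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

Each key yielded by list_keys can then be passed to s3_client.download_file(bucket, key, local_path) to fetch the file.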

Uploading file to AWS S3 through Chalice API call

I'm trying to upload a file to my S3 bucket through Chalice (I'm playing around with it currently, still new to this). However, I can't seem to get it right.
I have AWS setup correctly, doing the tutorial successfully returns me some messages. Then I try to do some upload/download, and problem shows up.
s3 = boto3.resource('s3', region_name=<some region name, in this case oregon>)
BUCKET = 'mybucket'
UPLOAD_FOLDER = os.path.abspath('')  # the file I wanna upload is in the same folder as my app.py, so I simply get the current folder name
@app.route('/upload/{file_name}', methods=['PUT'])
def upload_to_s3(file_name):
    s3.meta.client.upload_file(UPLOAD_FOLDER+file_name, BUCKET, file_name)
    return Response(message='upload successful',
                    status_code=200,
                    headers={'Content-Type': 'text/plain'}
                    )
Please don't worry about how I set my file path, unless that's the issue, of course.
I got the error log:
No such file or directory: ''
in this case file_name is just mypic.jpg.
I'm wondering why the UPLOAD_FOLDER part is not being picked up. Also, for the reference, it seems like using absolute path will be troublesome with Chalice (while testing, I've seen the code being moved to /var/task/)
Does anyone know how to set it up correctly?
EDIT:
the complete script
from chalice import Chalice, Response
import boto3
app = Chalice(app_name='helloworld')  # I'm just modifying the script I used for the tutorial
s3 = boto3.client('s3', region_name='us-west-2')
BUCKET = 'chalicetest1'
@app.route('/')
def index():
    return {'status_code': 200,
            'message': 'welcome to test API'}
@app.route('/upload/{file_name}', methods=['PUT'], content_types=['application/octet-stream'])
def upload_to_s3(file_name):
    try:
        body = app.current_request.raw_body
        temp_file = '/tmp/' + file_name
        with open(temp_file, 'wb') as f:
            f.write(body)
        s3.upload_file(temp_file, BUCKET, file_name)
        return Response(message='upload successful',
                        headers={'Content-Type': 'text/plain'},
                        status_code=200)
    except Exception as e:
        app.log.error('error occurred during upload %s' % e)
        return Response(message='upload failed',
                        headers={'Content-Type': 'text/plain'},
                        status_code=400)
I got it running and this works for me as app.py in an AWS Chalice project:
from chalice import Chalice, Response
import boto3
app = Chalice(app_name='helloworld')
BUCKET = 'mybucket'  # bucket name
s3_client = boto3.client('s3')
@app.route('/upload/{file_name}', methods=['PUT'],
           content_types=['application/octet-stream'])
def upload_to_s3(file_name):
    # get raw body of PUT request
    body = app.current_request.raw_body
    # write body to tmp file
    tmp_file_name = '/tmp/' + file_name
    with open(tmp_file_name, 'wb') as tmp_file:
        tmp_file.write(body)
    # upload tmp file to s3 bucket
    s3_client.upload_file(tmp_file_name, BUCKET, file_name)
    return Response(body='upload successful: {}'.format(file_name),
                    status_code=200,
                    headers={'Content-Type': 'text/plain'})
You can test this with curl and its --upload-file directly from the command line with:
curl -X PUT https://YOUR_API_URL_HERE/upload/mypic.jpg --upload-file mypic.jpg --header "Content-Type:application/octet-stream"
To get this running, you have to manually attach a policy that allows writing to S3 to the role of your Lambda function. Chalice auto-generates this role. In the AWS IAM web interface, attach the policy (e.g. AmazonS3FullAccess) to the role created by your Chalice project, next to the existing policy.
Things to mention:
You cannot write to the working directory /var/task/ of the Lambda functions, but you have some space at /tmp/, see this answer.
You have to specify the accepted content-type 'application/octet-stream' for the @app.route (and upload the file accordingly via curl).
HTTP PUT puts a file or resource at a specific URI, so to use PUT this file has to be uploaded to the API via HTTP.
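Since the request body is already in memory, the temporary file is optional. The following is a sketch of an alternative, not part of the answer above: store_upload is a hypothetical helper that passes the bytes directly to put_object, and it assumes the Lambda role has the same write permission on the bucket.

```python
def store_upload(s3_client, bucket, key, body):
    """Write the raw request body straight to S3, skipping the /tmp/ file."""
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,  # bytes taken from app.current_request.raw_body
        ContentType="application/octet-stream",
    )
    return "upload successful: {}".format(key)
```

Inside the route this would be called as store_upload(s3_client, BUCKET, file_name, app.current_request.raw_body), and the returned string can serve as the Response body.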

Python Boto3 AWS Multipart Upload Syntax

I am successfully authenticating with AWS and using the 'put_object' method on the Bucket object to upload a file. Now I want to use the multipart API to accomplish this for large files. I found the accepted answer in this question:
How to save S3 object to a file using boto3
But when trying to implement I am getting "unknown method" errors. What am I doing wrong? My code is below. Thanks!
## Get an AWS Session
self.awsSession = Session(aws_access_key_id=accessKey,
                          aws_secret_access_key=secretKey,
                          aws_session_token=session_token,
                          region_name=region_type)
...
# Upload the file to S3
s3 = self.awsSession.resource('s3')
s3.Bucket('prodbucket').put_object(Key=fileToUpload, Body=data)  # WORKS
#s3.Bucket('prodbucket').upload_file(dataFileName, 'prodbucket', fileToUpload)  # DOESNT WORK
#s3.upload_file(dataFileName, 'prodbucket', fileToUpload)  # DOESNT WORK
The upload_file method has not been ported over to the bucket resource yet. For now you'll need to use the client object directly to do this:
client = self.awsSession.client('s3')
client.upload_file(...)
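The client's upload_file switches to multipart on its own for large files, and the part size can be tuned with a TransferConfig. The sketch below is an illustration under assumptions: choose_part_size and upload_large_file are hypothetical helpers, and the 8 MiB preferred part size is an arbitrary choice. The two limits encoded as constants are S3's documented multipart limits.

```python
import os

MIB = 1024 * 1024
MAX_PARTS = 10000        # S3 limit on the number of parts per multipart upload
MIN_PART_SIZE = 5 * MIB  # S3 minimum size for every part except the last


def choose_part_size(file_size, preferred=8 * MIB):
    """Return a part size that keeps the upload within MAX_PARTS parts."""
    needed = (file_size + MAX_PARTS - 1) // MAX_PARTS  # integer ceiling division
    return max(preferred, needed, MIN_PART_SIZE)


def upload_large_file(s3_client, filename, bucket, key):
    """Upload filename to s3://bucket/key, using multipart for large files."""
    # Imported lazily so choose_part_size can be used without boto3 installed.
    from boto3.s3.transfer import TransferConfig

    part_size = choose_part_size(os.path.getsize(filename))
    config = TransferConfig(
        multipart_threshold=part_size,  # size at which multipart kicks in
        multipart_chunksize=part_size,  # size of each uploaded part
    )
    s3_client.upload_file(filename, bucket, key, Config=config)
```

With the session from the question, this would be called as upload_large_file(self.awsSession.client('s3'), dataFileName, 'prodbucket', fileToUpload).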
Libcloud S3 wrapper transparently handles all the splitting and uploading of the parts for you.
Use upload_object_via_stream method to do so:
from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver
# Path to a very large file you want to upload
FILE_PATH = '/home/user/myfile.tar.gz'
cls = get_driver(Provider.S3)
driver = cls('api key', 'api secret key')
container = driver.get_container(container_name='my-backups-12345')
# This method blocks until all the parts have been uploaded.
extra = {'content_type': 'application/octet-stream'}
with open(FILE_PATH, 'rb') as iterator:
    obj = driver.upload_object_via_stream(iterator=iterator,
                                          container=container,
                                          object_name='backup.tar.gz',
                                          extra=extra)
For official documentation on S3 Multipart feature, refer to AWS Official Blog.
