Download files from public S3 bucket with boto3 - python

I cannot download a file or even get a listing of the public S3 bucket with boto3.
The code below works with my own bucket, but not with the public one:
def s3_list(bucket, s3path_or_prefix):
    bsession = boto3.Session(aws_access_key_id=settings.AWS['ACCESS_KEY'],
                             aws_secret_access_key=settings.AWS['SECRET_ACCESS_KEY'],
                             region_name=settings.AWS['REGION_NAME'])
    s3 = bsession.resource('s3')
    my_bucket = s3.Bucket(bucket)
    items = my_bucket.objects.filter(Prefix=s3path_or_prefix)
    return [ii.key for ii in items]
I get an AccessDenied error on this code. The bucket is not my own and I cannot set permissions there, but I am sure it is open to public read.

I had a similar issue in the past. I found the key to this problem in https://github.com/boto/boto3/issues/134 .
You can use an undocumented trick:
import botocore

def s3_list(bucket, s3path_or_prefix, public=False):
    bsession = boto3.Session(aws_access_key_id=settings.AWS['ACCESS_KEY'],
                             aws_secret_access_key=settings.AWS['SECRET_ACCESS_KEY'],
                             region_name=settings.AWS['REGION_NAME'])
    client = bsession.client('s3')
    if public:
        # Turn off request signing so the request goes out anonymously
        client.meta.events.register('choose-signer.s3.*', botocore.handlers.disable_signing)
    result = client.list_objects(Bucket=bucket, Delimiter='/', Prefix=s3path_or_prefix)
    return [obj['Prefix'] for obj in result.get('CommonPrefixes', [])]
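Equivalently, you can build an anonymous client with botocore's UNSIGNED signature version instead of registering the disable_signing handler. A minimal sketch, with the bucket and prefix names as placeholders:

import boto3
from botocore import UNSIGNED
from botocore.client import Config

# Anonymous client: no credentials are attached, so only public buckets work
s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))
result = s3.list_objects(Bucket='some-public-bucket', Delimiter='/', Prefix='some/prefix/')
print([obj['Prefix'] for obj in result.get('CommonPrefixes', [])])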

Related

How to retrieve AWS S3 objects URL using python

I need to write a Lambda function that retrieves an S3 object URL for object preview. I came across this solution, but I have a question about it. In my case, I would like to retrieve the URL of any object in my S3 bucket, hence there is no key name. How can I retrieve the URL of any future objects stored in my S3 bucket?
import json
import boto3

def lambda_handler(event, context):
    bucket_name = 'aaa'
    aws_region = boto3.session.Session().region_name
    object_key = 'aaa.png'
    s3_url = f"https://{bucket_name}.s3.{aws_region}.amazonaws.com/{object_key}"
    return {
        'statusCode': 200,
        'body': json.dumps({'s3_url': s3_url})
    }
You have some examples here. But what exactly would you like to do? What do you mean by future objects? You can put a creation event on your bucket that will trigger your Lambda each time a new object is uploaded into that bucket.
import boto3

def lambda_handler(event, context):
    print(event)
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    s3 = boto3.client('s3')
    obj = s3.get_object(
        Bucket=bucket,
        Key=key
    )
    print(obj['Body'].read().decode('utf-8'))
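To tie this back to the original question about URLs, here is a minimal sketch (the virtual-hosted URL format and region lookup are assumptions, not part of the answer above) that builds the uploaded object's URL inside the same event-triggered handler:

import json
import boto3

def lambda_handler(event, context):
    # Bucket and key come from the S3 put-event record
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    region = boto3.session.Session().region_name
    s3_url = f"https://{bucket}.s3.{region}.amazonaws.com/{key}"
    return {
        'statusCode': 200,
        'body': json.dumps({'s3_url': s3_url})
    }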

boto3 python - list objects

While trying to list objects with a prefix, the return is fetching only 1 object in my Lambda. Not sure what is missing.
import boto3

s3 = boto3.resource('s3')

def lambda_handler(event, context):
    try:
        ## Bucket to use
        bucket = s3.Bucket(mybucket)
        ## List objects within a given prefix
        for obj in bucket.objects.filter(Prefix='output/group1'):
            print(obj.key)
It's hard to know what the exact problem is when we can't see a valid function or any returned errors. This code works without issue for me:
import boto3

s3 = boto3.resource('s3')

def lambda_handler(event, context):
    bucket = s3.Bucket('your-bucket-name')
    for obj in bucket.objects.filter(Prefix='output/group1'):
        print(obj.key)

lambda_handler('event', 'context')
Make sure 'output/group1' actually has more than 1 file in it to return.
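If you want to check that directly, here is a small sketch (the bucket name and prefix are placeholders) that counts the keys under the prefix with a paginator:

import boto3

s3_client = boto3.client('s3')
paginator = s3_client.get_paginator('list_objects_v2')

# Count every key under the prefix, across all result pages
count = 0
for page in paginator.paginate(Bucket='your-bucket-name', Prefix='output/group1'):
    count += len(page.get('Contents', []))
print(f"{count} object(s) under the prefix")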

boto3 aws check if s3 bucket is encrypted

I have the following code posted below which gets all the S3 buckets on AWS, and I am trying to write code that checks whether the buckets are encrypted, but I am having trouble figuring out how to do that. Can anyone tell me how to modify my code to do that? I tried online examples and looked at the documentation.
my code is:
from __future__ import print_function
import boto3
import os
os.environ['AWS_DEFAULT_REGION'] = "us-east-1"
# Create an S3 client
s3 = boto3.client('s3')
# Call S3 to list current buckets
response = s3.list_buckets()
# Get a list of all bucket names from the response
buckets = [bucket['Name'] for bucket in response['Buckets']]
# Print out the bucket list
print("Bucket List: %s" % buckets)
I tried the following code snippets, but they don't work:
s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket-name')
for obj in bucket.objects.all():
    key = s3.Object(bucket.name, obj.key)
    print(key.server_side_encryption)
and
#!/usr/bin/env python
import boto3

s3_client = boto3.client('s3')
head = s3_client.head_object(
    Bucket="<S3 bucket name>",
    Key="<S3 object key>"
)
if 'ServerSideEncryption' in head:
    print(head['ServerSideEncryption'])
It's first worth understanding a few things about S3 and encryption.
1. When you enable default encryption on an S3 bucket, you're actually configuring a server-side encryption configuration rule on the bucket that will cause S3 to encrypt every object uploaded to the bucket after the rule was configured.
2. Unrelated to #1, you can apply an S3 bucket policy to a bucket, denying any uploads of objects that are not encrypted. This will prevent you from adding unencrypted data, but it will not automatically encrypt anything.
3. You can encrypt uploads on an object-by-object basis; encryption does not have to be bucket-wide.
So, to find out which buckets fall into category #1 (they will automatically encrypt anything uploaded to them), you can do this:
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3')
response = s3.list_buckets()

for bucket in response['Buckets']:
    try:
        enc = s3.get_bucket_encryption(Bucket=bucket['Name'])
        rules = enc['ServerSideEncryptionConfiguration']['Rules']
        print('Bucket: %s, Encryption: %s' % (bucket['Name'], rules))
    except ClientError as e:
        if e.response['Error']['Code'] == 'ServerSideEncryptionConfigurationNotFoundError':
            print('Bucket: %s, no server-side encryption' % (bucket['Name']))
        else:
            print("Bucket: %s, unexpected error: %s" % (bucket['Name'], e))
This will result in output like this:
Bucket: mycats, no server-side encryption
Bucket: mydogs, no server-side encryption
Bucket: mytaxreturn, Encryption: [{'ApplyServerSideEncryptionByDefault': {'SSEAlgorithm': 'AES256'}}]
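For category #3 (objects encrypted individually rather than via a bucket rule), a rough sketch along these lines should work; the bucket name is a placeholder, and head_object only reports ServerSideEncryption for objects that were stored encrypted:

import boto3

s3_client = boto3.client('s3')
bucket_name = 'my-bucket-name'

# Page through all keys and report the encryption of each object
paginator = s3_client.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket_name):
    for obj in page.get('Contents', []):
        head = s3_client.head_object(Bucket=bucket_name, Key=obj['Key'])
        sse = head.get('ServerSideEncryption', 'not encrypted')
        print('%s/%s: %s' % (bucket_name, obj['Key'], sse))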

Get a specific file from s3 bucket (boto3)

So I have a file.csv in my bucket 'test'. I'm creating a new session and I want to download the contents of this file:
session = boto3.Session(
    aws_access_key_id=KEY,
    aws_secret_access_key=SECRET_KEY
)
s3 = session.resource('s3')
obj = s3.Bucket('test').objects.filter(Prefix='file.csv')
This returns me a collection, but is there a way to fetch the file directly? Without any loops, I want to do something like:
s3.Bucket('test').objects.get(key='file.csv')
I could achieve the same result without passing credentials like this:
s3 = boto3.client('s3')
obj = s3.get_object(Bucket='test', Key='file.csv')
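If you then want the file's contents rather than just the response metadata, a small follow-up sketch (same placeholder bucket and key) reads the streaming Body:

import boto3

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='test', Key='file.csv')

# The Body is a streaming object; read() pulls the whole file into memory
data = obj['Body'].read().decode('utf-8')
print(data)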
If you take a look at the client method:
import boto3
s3_client = boto3.client('s3')
s3_client.download_file('mybucket', 'hello.txt', '/tmp/hello.txt')
and the resource method:
import boto3
s3 = boto3.resource('s3')
s3.meta.client.download_file('mybucket', 'hello.txt', '/tmp/hello.txt')
you'll notice that you can convert from the resource to the client with meta.client.
So, combine it with your code to get:
session = boto3.Session(aws_access_key_id=KEY, aws_secret_access_key=SECRET_KEY)
s3 = session.resource('s3')
obj = s3.meta.client.download_file('mybucket', 'hello.txt', '/tmp/hello.txt')
I like mpu.aws.s3_download, but I'm biased ;-)
It does it like this:
import os
import boto3

def s3_download(bucket_name, key, destination, profile_name, exists_strategy='raise'):
    session = boto3.Session(profile_name=profile_name)
    s3 = session.resource('s3')
    if os.path.isfile(destination):
        if exists_strategy == 'raise':
            raise RuntimeError('File \'{}\' already exists.'
                               .format(destination))
        elif exists_strategy == 'abort':
            return
    s3.Bucket(bucket_name).download_file(key, destination)
For authentication, I recommend using environment variables. See boto3: Configuring Credentials for details.
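For example, if AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION are exported in the environment, boto3 picks them up without any explicit credential arguments; the bucket, key, and destination path below are placeholders:

import boto3

# No credentials passed in code; boto3 reads them from the environment
s3 = boto3.client('s3')
s3.download_file('mybucket', 'hello.txt', '/tmp/hello.txt')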
You can use the following boto3 method:
download_file(Bucket, Key, Filename, ExtraArgs=None, Callback=None, Config=None)
s3 = boto3.resource('s3')
s3.meta.client.download_file('mybucket', 'hello.txt', '/tmp/hello.txt')
Find more details here: download_file()

copy file from gcs to s3 in boto3

I am looking to copy files from gcs to my S3 bucket. In boto2, it was easy:
conn = connect_gs(user_id, password)
gs_bucket = conn.get_bucket(gs_bucket_name)
for obj in gs_bucket:
    s3_key = key.Key(s3_bucket)
    s3_key.key = obj
    s3_key.set_contents_from_filename(obj)
However, in boto3 I am lost trying to find the equivalent code. Any takers?
If all you're doing is a copy:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket-name')

for obj in gcs:
    s3_obj = bucket.Object(obj.key)
    s3_obj.put(Body=obj.data)
Docs: s3.Bucket, s3.Bucket.Object, s3.Bucket.Object.put
Alternatively, if you don't want to use the resource model:
import boto3

s3_client = boto3.client('s3')

for obj in gcs:
    s3_client.put_object(Bucket='bucket-name', Key=obj.key, Body=obj.body)
Docs: s3_client.put_object
Caveat: The gcs bits are pseudocode, I am not familiar with their API.
EDIT:
So it seems gcs supports an old version of the S3 API, and with that, an old version of the signer. We still support that old signer, but you have to opt into it. Note that some regions don't support old signing versions (you can see a list of which S3 regions support which versions here), so if you're trying to copy to one of those regions you will need to use a different client.
import boto3
from botocore.client import Config

# Create a client with the old 's3' signer
resource = boto3.resource('s3', config=Config(signature_version='s3'))
gcs_bucket = resource.Bucket('phjordon-test-bucket')
s3_bucket = resource.Bucket('phjordon-test-bucket-tokyo')

for obj in gcs_bucket.objects.all():
    s3_bucket.Object(obj.key).copy_from(
        CopySource=obj.bucket_name + "/" + obj.key)
Docs: s3.Object.copy_from
This, of course, will only work assuming gcs is still S3 compliant.
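If the server-side copy turns out not to work across providers, one fallback (sketched below; the bucket names, the GCS interoperability endpoint, and the HMAC credentials are all assumptions) is to stream each object down from GCS's S3-compatible endpoint and re-upload it to AWS:

import boto3

# Client pointed at GCS's S3-interoperability endpoint (requires GCS HMAC keys)
gcs_client = boto3.client(
    's3',
    endpoint_url='https://storage.googleapis.com',
    aws_access_key_id='GCS_HMAC_KEY_ID',
    aws_secret_access_key='GCS_HMAC_SECRET',
)
# Regular AWS S3 client for the destination bucket
aws_client = boto3.client('s3')

paginator = gcs_client.get_paginator('list_objects')
for page in paginator.paginate(Bucket='source-gcs-bucket'):
    for obj in page.get('Contents', []):
        body = gcs_client.get_object(Bucket='source-gcs-bucket', Key=obj['Key'])['Body']
        # Stream the object straight into the destination bucket
        aws_client.upload_fileobj(body, 'destination-s3-bucket', obj['Key'])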
