Accessing nested buckets in S3 with Boto3 - python

I know the path of the bucket I want to access /bucket1/bucket2/etc/ but I can't figure out how to access it via boto3.
I can enumerate all buckets starting from the source, but can't get to the bucket I want.
For example I can do:
prod_bucket = s3.Bucket('prod')
But I cannot do:
prod_bucket = s3.Bucket('prod/prod2/')
TIA

S3 has no nested buckets, only buckets and objects. Everything after the bucket name is part of the object key:
import boto3
s3 = boto3.client('s3')
obj = s3.get_object(Bucket='prod', Key='prod2/..')
Or:
s3 = boto3.resource('s3')
bucket = s3.Bucket('prod')
obj = bucket.Object('prod2/..')
See: get_object
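To make the bucket/key split concrete, here is a minimal sketch that turns a path such as /prod/prod2/etc/ into a bucket name plus a key prefix and lists what is stored under it. The helper names split_s3_path and list_keys_under are hypothetical (they are not part of boto3), and client is assumed to be a regular boto3.client('s3').

```python
def split_s3_path(path):
    """Split '/bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    # Only the first path segment is the bucket; the rest is a key (or key prefix).
    bucket, _, key = path.lstrip("/").partition("/")
    return bucket, key


def list_keys_under(client, path):
    """Return every object key stored under an S3 'folder' path.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    bucket, prefix = split_s3_path(path)
    paginator = client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        keys.extend(item["Key"] for item in page.get("Contents", []))
    return keys
```

With real credentials, this would be called as list_keys_under(boto3.client('s3'), '/prod/prod2/').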

Related

How to retrieve AWS S3 objects URL using python

I need to write a Lambda function that retrieves an S3 object URL for object preview. I came across this solution, but I have a question about it. In my case, I would like to retrieve the URL of any object in my S3 bucket, so there is no fixed key name. How can I retrieve the URL of any future objects stored in my S3 bucket?
bucket_name = 'aaa'
aws_region = boto3.session.Session().region_name
object_key = 'aaa.png'
s3_url = f"https://{bucket_name}.s3.{aws_region}.amazonaws.com/{object_key}"
return {
    'statusCode': 200,
    'body': json.dumps({'s3_url': s3_url})
}
There are some examples here, but what exactly would you like to do, and what do you mean by future objects? You can add an object-creation event notification to your bucket that triggers your Lambda each time a new object is uploaded to that bucket.
import boto3
from urllib.parse import unquote_plus
def lambda_handler(event, context):
    print(event)
    bucket = event['Records'][0]['s3']['bucket']['name']
    # Object keys in S3 event notifications are URL-encoded
    key = unquote_plus(event['Records'][0]['s3']['object']['key'])
    s3 = boto3.client('s3')
    obj = s3.get_object(
        Bucket=bucket,
        Key=key
    )
    print(obj['Body'].read().decode('utf-8'))
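Since the question asks for the URL rather than the object body, the same event can be used to build it. The following is a rough sketch, assuming the virtual-hosted URL format from the question; object_urls_from_event is a hypothetical helper name, and the resulting URL only opens in a browser if the object is publicly readable. For private objects, the client's generate_presigned_url method is the usual alternative.

```python
from urllib.parse import quote, unquote_plus


def object_urls_from_event(event, region):
    """Build one https URL per object named in an S3 event notification."""
    urls = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in the event (spaces become '+'), so decode first.
        key = unquote_plus(record["s3"]["object"]["key"])
        # Re-encode for use in a URL; '/' is kept as the path separator.
        urls.append(f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}")
    return urls
```

Inside lambda_handler this could be returned as json.dumps({'s3_urls': object_urls_from_event(event, aws_region)}).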

boto3 python - list objects

When I try to list objects with a prefix, only 1 object is returned in my Lambda. I'm not sure what is missing.
import boto3
s3 = boto3.resource('s3')
def lambda_handler(event, context):
    try:
        ## Bucket to use
        bucket = s3.Bucket(mybucket)
        ## List objects within a given prefix
        for obj in bucket.objects.filter(Prefix='output/group1'):
            print(obj.key)
It's hard to know what the exact problem is when we can't see a valid function or any returned errors. This code works without issue for me:
import boto3
s3 = boto3.resource('s3')
def lambda_handler(event, context):
    bucket = s3.Bucket('your-bucket-name')
    for obj in bucket.objects.filter(Prefix='output/group1'):
        print(obj.key)
lambda_handler('event','context')
Make sure the 'output/group1' prefix actually contains more than one object to return.
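One possible cause of a single result is that the only key matching the prefix is the zero-byte "folder" placeholder that the S3 console creates. Here is a minimal sketch, assuming bucket is a boto3 resource Bucket; keys_in_folder is a hypothetical helper name.

```python
def keys_in_folder(bucket, folder):
    """List keys under `folder`, treating it as a directory-like prefix.

    `bucket` is assumed to be a boto3 resource Bucket, e.g.
    boto3.resource("s3").Bucket("your-bucket-name").
    """
    # Prefix is a plain string match, so "output/group1" would also match
    # "output/group10/...". A trailing slash limits it to one "folder".
    prefix = folder if folder.endswith("/") else folder + "/"
    return [
        obj.key
        for obj in bucket.objects.filter(Prefix=prefix)
        # Skip the zero-byte placeholder key that equals the prefix itself.
        if obj.key != prefix
    ]
```

If this returns an empty list for 'output/group1', the prefix holds no real objects and the single key seen earlier was most likely the placeholder.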

Get a specific file from s3 bucket (boto3)

So I have a file.csv on my bucket 'test', I'm creating a new session and I wanna download the contents of this file:
session = boto3.Session(
    aws_access_key_id=KEY,
    aws_secret_access_key=SECRET_KEY
)
s3 = session.resource('s3')
obj = s3.Bucket('test').objects.filter(Prefix='file.csv')
This returns me a collection but is there a way to fetch the file directly? Without any loops, I wanna do something like:
s3.Bucket('test').objects.get(key='file.csv')
I could achieve the same result without passing credentials like this:
s3 = boto3.client('s3')
obj = s3.get_object(Bucket='test', Key='file.csv')
If you take a look at the client method:
import boto3
s3_client = boto3.client('s3')
s3_client.download_file('mybucket', 'hello.txt', '/tmp/hello.txt')
and the resource method:
import boto3
s3 = boto3.resource('s3')
s3.meta.client.download_file('mybucket', 'hello.txt', '/tmp/hello.txt')
you'll notice that you can convert from the resource to the client with meta.client.
So, combine it with your code to get:
session = boto3.Session(aws_access_key_id=KEY, aws_secret_access_key=SECRET_KEY)
s3 = session.resource('s3')
# download_file writes to the local path and returns None, so there is nothing to assign
s3.meta.client.download_file('mybucket', 'hello.txt', '/tmp/hello.txt')
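If the goal is to get the contents in memory rather than a file on disk, the resource API can also address a single object directly, with no collection and no loop. This is a minimal sketch; read_object_text is a hypothetical helper name, and s3 is assumed to be the session.resource('s3') from the question.

```python
def read_object_text(s3, bucket_name, key, encoding="utf-8"):
    """Return the contents of one S3 object as a string, without looping.

    `s3` is assumed to be a boto3 S3 resource, e.g. session.resource("s3").
    """
    # Object(...) only builds a local reference; get() performs the request.
    response = s3.Object(bucket_name, key).get()
    # "Body" is a streaming object; read() pulls the whole payload into memory.
    return response["Body"].read().decode(encoding)
```

For the file in the question this would be read_object_text(s3, 'test', 'file.csv').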
I like mpu.aws.s3_download, but I'm biased ;-)
It does it like that:
import os
import boto3
def s3_download(bucket_name, key, destination, profile_name, exists_strategy='raise'):
    # destination is the local path the object is written to
    session = boto3.Session(profile_name=profile_name)
    s3 = session.resource('s3')
    if os.path.isfile(destination):
        if exists_strategy == 'raise':
            raise RuntimeError('File \'{}\' already exists.'
                               .format(destination))
        elif exists_strategy == 'abort':
            return
    s3.Bucket(bucket_name).download_file(key, destination)
For authentication, I recommend using environment variables. See boto3: Configuring Credentials for details.
You can use the following boto3 method:
download_file(Bucket, Key, Filename, ExtraArgs=None, Callback=None, Config=None)
s3 = boto3.resource('s3')
s3.meta.client.download_file('mybucket', 'hello.txt', '/tmp/hello.txt')
Find more details here: download_file()

Download files from public S3 bucket with boto3

I cannot download a file or even get a listing of the public S3 bucket with boto3.
The code below works with my own bucket, but not with public one:
def s3_list(bucket, s3path_or_prefix):
    bsession = boto3.Session(aws_access_key_id=settings.AWS['ACCESS_KEY'],
                             aws_secret_access_key=settings.AWS['SECRET_ACCESS_KEY'],
                             region_name=settings.AWS['REGION_NAME'])
    s3 = bsession.resource('s3')
    my_bucket = s3.Bucket(bucket)
    items = my_bucket.objects.filter(Prefix=s3path_or_prefix)
    return [ii.key for ii in items]
I get an AccessDenied error with this code. The bucket is not my own and I cannot set permissions on it, but I am sure it is open to public read.
I had a similar issue in the past. I found the key to this problem in https://github.com/boto/boto3/issues/134 .
You can use an undocumented trick:
import boto3
import botocore.handlers
def s3_list(bucket, s3path_or_prefix, public=False):
    bsession = boto3.Session(aws_access_key_id=settings.AWS['ACCESS_KEY'],
                             aws_secret_access_key=settings.AWS['SECRET_ACCESS_KEY'],
                             region_name=settings.AWS['REGION_NAME'])
    client = bsession.client('s3')
    if public:
        # Skip request signing so the bucket is accessed anonymously
        client.meta.events.register('choose-signer.s3.*', botocore.handlers.disable_signing)
    result = client.list_objects(Bucket=bucket, Delimiter='/', Prefix=s3path_or_prefix)
    # 'CommonPrefixes' is missing from the response when there are no sub-prefixes
    return [obj['Prefix'] for obj in result.get('CommonPrefixes', [])]
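As an alternative to registering the handler by hand, botocore also exposes an UNSIGNED marker that can be passed through the client Config. This is a minimal sketch; anonymous_s3_client and summarize_listing are hypothetical helper names, and no credentials are assumed.

```python
def anonymous_s3_client():
    """Build an S3 client that sends unsigned (anonymous) requests."""
    # Imported here so the parsing helper below stays usable without boto3.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    return boto3.client("s3", config=Config(signature_version=UNSIGNED))


def summarize_listing(response):
    """Split a list_objects response into (sub-prefixes, object keys)."""
    # Either entry can be missing, depending on what the prefix contains.
    prefixes = [p["Prefix"] for p in response.get("CommonPrefixes", [])]
    keys = [o["Key"] for o in response.get("Contents", [])]
    return prefixes, keys
```

A call would then look like summarize_listing(anonymous_s3_client().list_objects(Bucket=bucket, Delimiter='/', Prefix=s3path_or_prefix)), which returns the keys directly under the prefix as well as its sub-prefixes.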

copy file from gcs to s3 in boto3

I am looking to copy files from gcs to my s3 bucket. In boto2, easy as a button.
conn = connect_gs(user_id, password)
gs_bucket = conn.get_bucket(gs_bucket_name)
for obj in gs_bucket:
    s3_key = key.Key(s3_bucket)
    s3_key.key = obj
    s3_key.set_contents_from_filename(obj)
However in boto3, I am lost trying to find equivalent code. Any takers?
If all you're doing is a copy:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket-name')
for obj in gcs:  # pseudocode: iterate over the GCS objects
    s3_obj = bucket.Object(obj.key)
    s3_obj.put(Body=obj.data)
Docs: s3.Bucket, s3.Bucket.Object, s3.Bucket.Object.put
Alternatively, if you don't want to use the resource model:
import boto3
s3_client = boto3.client('s3')
for obj in gcs:  # pseudocode: iterate over the GCS objects
    s3_client.put_object(Bucket='bucket-name', Key=obj.key, Body=obj.data)
Docs: s3_client.put_object
Caveat: The gcs bits are pseudocode, I am not familiar with their API.
EDIT:
So it seems gcs supports an old version of the S3 API and with that an old version of the signer. We still have support for that old signer, but you have to opt into it. Note that some regions don't support old signing versions (you can see a list of which S3 regions support which versions here), so if you're trying to copy over to one of those you will need to use a different client.
import boto3
from botocore.client import Config
# Create a client with the s3v2 signer
resource = boto3.resource('s3', config=Config(signature_version='s3'))
gcs_bucket = resource.Bucket('phjordon-test-bucket')
s3_bucket = resource.Bucket('phjordon-test-bucket-tokyo')
for obj in gcs_bucket.objects.all():
    s3_bucket.Object(obj.key).copy_from(
        CopySource=obj.bucket_name + "/" + obj.key)
Docs: s3.Object.copy_from
This, of course, will only work assuming gcs is still S3 compliant.
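If a server-side copy between the two services is not possible, a fallback is to read each object from the source and write it to the destination with two separate clients. This is a rough sketch under stated assumptions: copy_objects is a hypothetical helper name, dst_client is a normal boto3 S3 client, and src_client is assumed to be a boto3 client created with an endpoint_url pointing at the S3-compatible endpoint of the other store and with that store's own access keys. Each object is read fully into memory, so this is only suitable for small files.

```python
def copy_objects(src_client, src_bucket, dst_client, dst_bucket, prefix=""):
    """Copy every object under `prefix` from one S3-compatible store to another."""
    copied = []
    paginator = src_client.get_paginator("list_objects")
    for page in paginator.paginate(Bucket=src_bucket, Prefix=prefix):
        # "Contents" is absent when a page has no matching objects.
        for item in page.get("Contents", []):
            key = item["Key"]
            # Download the whole object, then upload it under the same key.
            body = src_client.get_object(Bucket=src_bucket, Key=key)["Body"].read()
            dst_client.put_object(Bucket=dst_bucket, Key=key, Body=body)
            copied.append(key)
    return copied
```

The function returns the list of copied keys, which makes it easy to log or verify what was transferred.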
