I am working with Python and Jupyter Notebook, and would like to download files from an S3 bucket into my current Jupyter directory.
I have tried:
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket')
for obj in bucket.objects.all():
    key = obj.key
    body = obj.get()['Body'].read()
but I believe this is just reading them, and I would like to save them into this directory. Thank you!
You can use the AWS Command Line Interface (CLI), specifically the aws s3 cp command, to copy the files into your local directory.
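For example (assuming the bucket is named my-bucket and everything should be copied into the current notebook directory), the command would look something like:
aws s3 cp s3://my-bucket/ . --recursive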
A late response, but I was struggling with this earlier today and thought I'd throw in my solution. I needed to work with a bunch of PDFs stored on S3 using Jupyter Notebooks on SageMaker.
I used a workaround of downloading the files to my repo, which works a lot faster than uploading them and makes my code reproducible for anyone with access to S3.
Step 1
Create a list of all the objects to be downloaded, then split each element by '/' so that the file name can be extracted for iteration in Step 2.
import awswrangler as wr
objects = wr.s3.list_objects("s3://bucket/object_path/")  # your S3 URI
objects_list = [obj.split('/') for obj in objects]
Step 2
Make a local folder called data, then iterate through the object list and download each file into that folder from the notebook.
import boto3
import os
os.makedirs("./data")
s3_client = boto3.client('s3')
for obj in objects_list:
    s3_client.download_file('bucket',                   # can also use obj[2]
                            'object_path/' + obj[-1],   # object_path is everything after the bucket in your S3 URI
                            './data/' + obj[-1])        # save into the data folder created above
That's it! First time answering anything on this, so I hope it's useful to someone.
I am copying files from a website to an S3 bucket. Everything else is copying fine, even odd extensions that I haven't heard of before. The extension that I am having problems with is ".2D".
Currently using this code, and it is working for all but the .2D files. Might be a VERSACAD file.
Anyone work with this file or know how to figure out how to work with this? No, I can't include an example.
It is failing on the r.data.decode("utf-8") line. Using "utf-16" doesn't work either.
data=r.data.decode("utf-8")
key_path="downloaded_docs/{0}/{1}/{2}/{3}".format(year,str(month).zfill(2),str(day).zfill(2),docname)
To save to s3 bucket:
s3.Object('s3_bucket_name',key_path).put(Body=data)
WHAT WORKED
I ended up having to do the following, but it worked. Thanks!!
if extracted_link.endswith('.2D'):
    r = http.request("GET", url_to_file, preload_content=False)
    # to get binary data
    text_b = r.data
    filepath = <filepath>
    s3_client.upload_fileobj(io.BytesIO(text_b), Bucket=<bucketname>, Key=filepath)
Upload it in binary
from file path
import boto3
client = boto3.client("s3")
client.upload_file(filepath_2d, bucket, key)  # filepath_2d is the path to the local .2D file
from file object
import boto3, io
client = boto3.client("s3")
client.upload_fileobj(io.BytesIO(b"binary data"), bucket, key)
What I've been doing:
I've been reading a lot of the boto3 documentation, but I'm still struggling to get it working the way I want, as this is my first time using AWS.
I'm trying to use boto3 to access Microsoft Excel files that are uploaded to an S3 bucket. I'm able to use boto3.Session() to pass my hard-coded credentials and from there print the names of the files that are in my bucket.
However, I'm trying to figure out how to access the contents of an Excel file in that bucket.
My end goal/what I'm trying to do:
The end goal of this project is to have people upload Excel files (with ZIP codes organized in the cells) into the S3 bucket, and then have each file sent to an EC2 instance, where a program I wrote reads the file one ZIP code at a time and processes it.
Any help is greatly appreciated, as this is all new and overwhelming.
This is the code I am trying:
import boto3
session = boto3.Session(
    aws_access_key_id='put key here',
    aws_secret_access_key='put key here',
)
s3 = session.resource('s3')
bucket = s3.Bucket('bucket name')
for f in bucket.objects.all():
    print(f.key)
    f.download_file('testrun')
Since you are using the boto3 resource API instead of the client API, each f in the loop is an ObjectSummary, which has no download_file method of its own; convert it to an Object and call Object.download_file inside the for loop, like so:
for f in bucket.objects.all():
    print(f.key)
    f.Object().download_file('the name i want it to be downloaded as')
Try the following:
for f in bucket.objects.all():
    obj = bucket.Object(f.key)
    with open(f.key, 'wb') as data:
        obj.download_fileobj(data)
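To actually read the spreadsheet contents once a file has been downloaded (or fetched into memory), something like the following could work. This is only a sketch: pandas/openpyxl, the key name zip_codes.xlsx, the column name zip_code, and process() are assumptions, not anything from the question.
import io
import boto3
import pandas as pd  # needs openpyxl installed to parse .xlsx files

s3 = boto3.client('s3')
obj = s3.get_object(Bucket='bucket name', Key='zip_codes.xlsx')  # hypothetical key
df = pd.read_excel(io.BytesIO(obj['Body'].read()))

for zip_code in df['zip_code']:  # hypothetical column name
    process(zip_code)            # stand-in for the program running on the EC2 instance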
I have written code to delete older files and keep the latest one. My code works locally, but I want to apply the same logic to an AWS S3 bucket folder to perform a similar operation.
The code works fine when given a local path.
import os
import glob
path = r'C:\Desktop\MyFolder'
allfiles =[os.path.basename(file) for file in glob.glob(path + '\*.*')]
diff_pattern=set()
deletefile=[]
for file in allfiles:
    diff_pattern.add('_'.join(file.split('_', 2)[:2]))
print('Pattern Found - ', diff_pattern)

for pattern in diff_pattern:
    patternfiles = [os.path.basename(file) for file in glob.glob(path + '\\' + pattern + '_*.*')]
    patternfiles.sort()
    if len(patternfiles) > 1:
        deletefile = deletefile + patternfiles[:len(patternfiles) - 1]
print('Files Need to Delete - ', deletefile)

for file in deletefile:
    os.remove(path + '\\' + file)
    print('File Deleted')
I expect the same logic to work for AWS S3 buckets. Below is the file format and an example with each file's status (keep/delete) that I'm working with.
file format: file_name_yyyyMMdd.txt
v_xyz_20190501.txt Delete
v_xyz_20190502.txt keep
v_xyz_20190430.txt Delete
v_abc_20190505.txt Keep
v_abc_20190504.txt Delete
I don't think you can access S3 files like a local path.
You may need to use the boto3 library in Python to access S3 folders.
Here is a sample for you to see how it works:
https://dluo.me/s3databoto3
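As a rough sketch of what that adaptation could look like with the boto3 client API (the bucket name and prefix below are placeholders, and for more than 1,000 keys you would need a paginator):
import boto3

s3 = boto3.client('s3')
bucket = 'my-bucket'   # placeholder bucket name
prefix = 'MyFolder/'   # placeholder for the S3 "folder"

# list the file names under the prefix (the S3 equivalent of the glob.glob call)
response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
allfiles = [obj['Key'].split('/')[-1] for obj in response.get('Contents', [])]

# same pattern-grouping logic as the local version
diff_pattern = {'_'.join(f.split('_', 2)[:2]) for f in allfiles}
deletefile = []
for pattern in diff_pattern:
    patternfiles = sorted(f for f in allfiles if f.startswith(pattern + '_'))
    if len(patternfiles) > 1:
        deletefile += patternfiles[:-1]

# delete everything except the newest file in each group
for f in deletefile:
    s3.delete_object(Bucket=bucket, Key=prefix + f)
    print('File Deleted -', f)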
I have found many questions related to this with solutions using boto3; however, I am in a position where I have to use boto, running Python 2.38.
Now I can successfully transfer my files with their folder structure (not real folders, I know, as S3 doesn't have this concept), but I want them to be saved into a particular folder in my destination bucket.
from boto.s3.connection import S3Connection
def transfer_files():
    conn = S3Connection()
    srcBucket = conn.get_bucket("source_bucket")
    dstBucket = conn.get_bucket(bucket_name="destination_bucket")
    objectlist = srcBucket.list()
    for obj in objectlist:
        dstBucket.copy_key(obj.key, srcBucket.name, obj.key)
My srcBucket will contain keys like folder/subFolder/anotherSubFolder/file.txt, which when transferred will land in the dstBucket as destination_bucket/folder/subFolder/anotherSubFolder/file.txt.
I would like it to end up in destination_bucket/targetFolder so the final directory structure would look like
destination_bucket/targetFolder/folder/subFolder/anotherSubFolder/file.txt
Hopefully I have explained this well enough and it makes sense
The first parameter of copy_key is the name of the destination key.
Therefore, just use:
dstBucket.copy_key('targetFolder/' + obj.key, srcBucket.name, obj.key)
I am trying to traverse all objects inside a specific folder in my S3 bucket. The code I already have is as follows:
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket-name')
for obj in bucket.objects.filter(Prefix='folder/'):
    do_stuff(obj)
I need to use boto3.resource and not client. This code is not getting any objects at all although I have a bunch of text files in the folder. Can someone advise?
Try adding the Delimiter attribute, Delimiter='/', as you are filtering objects. The rest of the code looks fine.
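As code, that suggestion would look something like this (just the answer's idea spelled out, reusing the bucket and prefix from the question):
for obj in bucket.objects.filter(Prefix='folder/', Delimiter='/'):
    do_stuff(obj)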
I had to make sure to skip the first entry. For some reason the listing returns the folder name itself as the first object, and that may not be what you want.
for video_item in source_bucket.objects.filter(Prefix="my-folder-name/", Delimiter='/'):
    if video_item.key == 'my-folder-name/':
        continue
    do_something(video_item.key)