What I've been doing:
I've been reading a lot of the boto3 documentation, but I'm still struggling to get things working the way I want, as this is my first time using AWS.
I'm trying to use boto3 to access Microsoft Excel files that are uploaded to an S3 bucket... I'm able to use boto3.Session() to supply my "hard coded" credentials and, from there, print the names of the files in my bucket.
However, I'm trying to figure out how to access the contents of an Excel file in that bucket...
My end goal/what I'm trying to do:
The end goal of this project is to have people upload Excel files with zip codes in them (organized in the cells) into the S3 bucket... and then have that file sent to an EC2 instance, where a program I wrote reads the file one zip code at a time and processes certain things...
Any help is greatly appreciated, as this is all new and overwhelming.
This is the code I am trying:
import boto3
session = boto3.Session(
    aws_access_key_id='put key here',
    aws_secret_access_key='put key here',
)
s3 = session.resource('s3')
bucket = s3.Bucket('bucket name')
for f in bucket.objects.all():
    print(f.key)
    f.download_file('testrun')
Since you are using the boto3 resource API instead of the client API, you should be calling the Object.download_file method inside the for loop. Note that the loop yields ObjectSummary instances, which do not have download_file themselves, so get the full Object first, like so:
for f in bucket.objects.all():
    print(f.key)
    f.Object().download_file('the name i want it to be downloaded as')
Try the following:
for f in bucket.objects.all():
    obj = bucket.Object(f.key)
    with open(f.key, 'wb') as data:
        obj.download_fileobj(data)
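If the goal is to read the spreadsheet contents rather than just save the files, one option is to pull each object's body into memory and hand it to pandas. This is only a minimal sketch: it assumes pandas and openpyxl are installed, and that the zip codes sit in a column named 'zip' (a hypothetical name, not something stated in the question):
import io
import boto3
import pandas as pd  # assumes pandas + openpyxl are available
s3 = session.resource('s3')   # reuse the session holding your credentials
bucket = s3.Bucket('bucket name')
for f in bucket.objects.all():
    body = f.get()['Body'].read()          # raw bytes of the .xlsx file
    df = pd.read_excel(io.BytesIO(body))   # parse the workbook in memory
    for zip_code in df['zip']:             # 'zip' is a hypothetical column name
        print(zip_code)                    # process one zip code at a time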
Related
I am copying files from a website to an S3 bucket. Everything else is copying fine, even odd extensions that I hadn't heard of before. The extension that I am having problems with is ".2D".
I am currently using this code, and it is working for all but the .2D files. They might be VERSACAD files.
Has anyone worked with this file type, or know how to figure out how to work with it? No, I can't include an example.
It is failing on the r.data.decode("utf-8") line. Using "utf-16" doesn't work either.
data = r.data.decode("utf-8")
key_path = "downloaded_docs/{0}/{1}/{2}/{3}".format(year, str(month).zfill(2), str(day).zfill(2), docname)
To save to the S3 bucket:
s3.Object('s3_bucket_name', key_path).put(Body=data)
WHAT WORKED
I ended up having to do the following, but it worked. Thanks!
if extracted_link.endswith('.2D'):
    r = http.request("GET", url_to_file, preload_content=False)
    # keep the raw binary data instead of decoding it as text
    text_b = r.data
    filepath = <filepath>
    s3_client.upload_fileobj(io.BytesIO(text_b), Bucket=<bucketname>, Key=filepath)
Upload it in binary.
From a file path:
import boto3
client = boto3.client("s3")
client.upload_file(filepath_2d, bucket, key)  # filepath_2d: local path to the .2D file
From a file object:
import boto3, io
client = boto3.client("s3")
client.upload_fileobj(io.BytesIO(b"binary data"), bucket, key)
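Because preload_content=False leaves the urllib3 response as a file-like object, you can also stream it straight into S3 without buffering the whole file in memory. A rough sketch under that assumption, with the URL, bucket name, and key as placeholders:
import boto3
import urllib3
http = urllib3.PoolManager()
s3_client = boto3.client("s3")
url_to_file = "https://example.com/file.2D"  # placeholder URL
# the response body is a readable stream, so upload_fileobj can consume it directly
r = http.request("GET", url_to_file, preload_content=False)
s3_client.upload_fileobj(r, Bucket="bucket-name", Key="downloaded_docs/file.2D")
r.release_conn()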
I am working with Python and Jupyter Notebook, and would like to open files from an S3 bucket into my current Jupyter directory.
I have tried:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket')
for obj in bucket.objects.all():
    key = obj.key
    body = obj.get()['Body'].read()
but I believe this just reads them, and I would like to save them into this directory. Thank you!
You can use AWS Command Line Interface (CLI), specifically the aws s3 cp command to copy files to your local directory.
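If you would rather stay in Python, the loop from the question can also save each object locally with boto3. A minimal sketch, reusing the bucket name from the question; it assumes the keys contain no '/' separators, otherwise the matching local folders would need to exist first:
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('bucket')
for obj in bucket.objects.all():
    # download each object into the current directory under its key name
    bucket.download_file(obj.key, obj.key)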
Late response, but I was struggling with this earlier today and thought I'd throw in my solution. I needed to work with a bunch of PDFs stored on S3 using Jupyter Notebooks on SageMaker.
I used a workaround of downloading the files to my repo, which works a lot faster than pulling them from S3 each time and makes my code reproducible for anyone with access to S3.
Step 1
Create a list of all the objects to be downloaded, then split each element by '/' so that the file name can be extracted for iteration in Step 2:
import awswrangler as wr
objects = wr.s3.list_objects("s3://bucket-name/prefix/")  # the S3 URI to list
objects_list = [obj.split('/') for obj in objects]
Step 2
Make a local folder called data, then iterate through the object list to download each file into that folder:
import boto3
import os
os.makedirs("./data", exist_ok=True)
s3_client = boto3.client('s3')
for obj in objects_list:
    s3_client.download_file('bucket-name',             # can also use obj[2]
                            'object_path/' + obj[-1],  # object_path is everything that comes after the bucket in your S3 URI
                            './data/' + obj[-1])
That's it! First time answering anything on here, so I hope it's useful to someone.
I am trying to check all the keys in an S3 bucket: if an object already has a certain user-defined metadata header, move on to the next one, but if it does not have the user-defined metadata header, I want to add it with a certain key-value pair. I have been trying to do some kind of put command, but I am fairly new to this.
I have gotten this far:
Basically, I am able to print all the keys within my specified bucket:
import boto3
s3 = boto3.resource('s3')
# This will search ALL of the files within a specified bucket
my_bucket = s3.Bucket('client-data')
for file in my_bucket.objects.all():
    print(file.key)
If I could write a conditional statement that adds the metadata header whenever it does not already exist, that would be awesome. Any help is appreciated.
Thanks!
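One way to approach this with the resource API is to read each object's user-defined metadata and, when the header is missing, copy the object onto itself with MetadataDirective='REPLACE', since existing S3 objects are immutable and their metadata can only be changed through a copy. A rough sketch, where 'my-header' and 'my-value' are hypothetical placeholders:
import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('client-data')
for file in my_bucket.objects.all():
    obj = s3.Object(my_bucket.name, file.key)
    metadata = obj.metadata  # user-defined metadata, without the 'x-amz-meta-' prefix
    if 'my-header' not in metadata:
        metadata['my-header'] = 'my-value'
        # copying the object onto itself rewrites it with the new metadata;
        # REPLACE also resets fields like ContentType unless passed explicitly
        obj.copy_from(CopySource={'Bucket': my_bucket.name, 'Key': file.key},
                      Metadata=metadata,
                      MetadataDirective='REPLACE')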
I extracted an Excel file with multiple sheets from S3, and I am turning each sheet into CSV format and doing some simple cleansing before uploading it to another S3 bucket.
This is my code so far for my Lambda function, but I have no idea how to upload the CSV file for each sheet to S3.
I also want to change the empty cells in the Excel files to NaN, but I don't know how.
Update: I tried the solution from the answer below. I am getting "errorMessage": "'Records'", "errorType": "KeyError". My Lambda function is also not getting triggered by S3.
You can store files in Lambda's local file system within the /tmp/ directory. There is a limit of 512 MB, so delete those files once you have finished with them.
Therefore, when you are creating a file, put it in that directory:
with open("/tmp/data%s.csv" %(sheet6.name.replace(" ","")), "w", encoding='utf-8') as file:
You can then upload it to Amazon S3 by using upload_file(file, bucket, key):
s3.upload_file('/tmp/data1.csv', 'mybucket', 'data1.csv')
Here's some code I have for extracting the Bucket and Key that triggered a Lambda function:
import urllib
def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
    ...
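Putting those pieces together, a handler might look roughly like the sketch below. It assumes pandas and openpyxl are bundled with the deployment package or a layer (they are not in the default Lambda runtime), and 'output-bucket' is a placeholder name. Note that pandas.read_excel already turns empty cells into NaN by default, which covers the NaN question:
import urllib.parse
import boto3
import pandas as pd  # must be bundled with the function or provided via a layer
s3_client = boto3.client('s3')
def lambda_handler(event, context):
    # bucket/key of the Excel file that triggered the function
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
    # download the workbook to /tmp and read every sheet (empty cells become NaN)
    local_xlsx = '/tmp/input.xlsx'
    s3_client.download_file(bucket, key, local_xlsx)
    sheets = pd.read_excel(local_xlsx, sheet_name=None)
    # write each sheet to /tmp as CSV, then upload it to the output bucket
    for name, df in sheets.items():
        csv_path = '/tmp/data%s.csv' % name.replace(' ', '')
        df.to_csv(csv_path, index=False)
        s3_client.upload_file(csv_path, 'output-bucket', 'data%s.csv' % name.replace(' ', ''))
As for the KeyError on 'Records': that usually means the function was invoked with an event that is not an S3 notification (for example, a manual test event), so the S3 event trigger configuration on the bucket is worth double-checking.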
Say I have the following bucket set up on S3:
MyBucket/MySubBucket
I am planning on serving static media for a website out of MySubBucket (like images users have uploaded, etc.). The way the S3 configuration currently appears to be set up, there is no way to make the top-level bucket "MyBucket" public, so to mimic that you need to make every individual item in the bucket public after it has been inserted.
You can make a sub-bucket like "MySubBucket" public, but the problem I am currently having is figuring out how to call boto to insert a test image into that bucket. Can anyone provide an example?
Thanks
You actually just need to put the sub-bucket name and a / before the file name; "MySubBucket" is really just a key prefix within MyBucket rather than a separate bucket. The following code snippet will put a file called 'test.jpg' into MySubBucket. The k.make_public() call makes that file public, but not necessarily the bucket itself.
from boto.s3.connection import S3Connection
from boto.s3.key import Key
connection = S3Connection(AWSKEY, AWS_SECRET_KEY)
bucket = connection.get_bucket('MyBucket')
file = 'test.jpg'
k = Key(bucket)
k.key = 'MySubBucket/' + file
k.set_contents_from_filename(file)
k.make_public()
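For reference, if you are on boto3 rather than the legacy boto library, the rough equivalent would be something like the sketch below; note that on newer buckets ACLs are often disabled through Object Ownership, in which case a bucket policy is used instead of a public-read ACL:
import boto3
s3 = boto3.resource('s3')
# 'MySubBucket/' is just a key prefix inside MyBucket, not a separate bucket
s3.Bucket('MyBucket').upload_file('test.jpg',
                                  'MySubBucket/test.jpg',
                                  ExtraArgs={'ACL': 'public-read'})  # roughly what k.make_public() did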