In boto 2, you can write to an S3 object using these methods:
Key.set_contents_from_string()
Key.set_contents_from_file()
Key.set_contents_from_filename()
Key.set_contents_from_stream()
Is there a boto 3 equivalent? What is the boto3 method for saving data to an object stored on S3?
In boto 3, the 'Key.set_contents_from_' methods were replaced by
Object.put()
Client.put_object()
For example:
import boto3
some_binary_data = b'Here we have some data'
more_binary_data = b'Here we have some more data'
# Method 1: Object.put()
s3 = boto3.resource('s3')
object = s3.Object('my_bucket_name', 'my/key/including/filename.txt')
object.put(Body=some_binary_data)
# Method 2: Client.put_object()
client = boto3.client('s3')
client.put_object(Body=more_binary_data, Bucket='my_bucket_name', Key='my/key/including/anotherfilename.txt')
Alternatively, the binary data can come from reading a file, as described in the official docs comparing boto 2 and boto 3:
Storing Data
Storing data from a file, stream, or string is easy:
# Boto 2.x
from boto.s3.key import Key
key = Key('hello.txt')
key.set_contents_from_file('/tmp/hello.txt')
# Boto 3
s3.Object('mybucket', 'hello.txt').put(Body=open('/tmp/hello.txt', 'rb'))
boto3 also has a method for uploading a file directly:
s3 = boto3.resource('s3')
s3.Bucket('bucketname').upload_file('/local/file/here.txt','folder/sub/path/to/s3key')
http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Bucket.upload_file
You no longer have to convert the contents to binary before writing to the file in S3. The following example creates a new text file (called newfile.txt) in an S3 bucket with string contents:
import boto3
s3 = boto3.resource(
's3',
region_name='us-east-1',
aws_access_key_id=KEY_ID,
aws_secret_access_key=ACCESS_KEY
)
content="String content to write to a new S3 file"
s3.Object('my-bucket-name', 'newfile.txt').put(Body=content)
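If you prefer to control the encoding yourself rather than rely on the library's handling of strings, a minimal sketch could look like the following. The helper name put_text is made up for illustration, and client is assumed to be a boto3.client('s3') instance; Bucket, Key, Body and ContentType are regular put_object parameters.

```python
def put_text(client, bucket, key, text, encoding="utf-8"):
    """Encode text explicitly and store it as an S3 object.

    `client` is assumed to be boto3.client('s3'); put_text is a
    hypothetical helper, not part of boto3.
    """
    client.put_object(
        Bucket=bucket,
        Key=key,
        Body=text.encode(encoding),
        # Record the charset so that readers know how to decode the bytes
        ContentType="text/plain; charset={}".format(encoding),
    )
```

Usage would be something like put_text(boto3.client('s3'), 'my-bucket-name', 'newfile.txt', content).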
Here's a nice trick to read JSON from s3:
import json, boto3
s3 = boto3.resource("s3").Bucket("bucket")
json.load_s3 = lambda f: json.load(s3.Object(key=f).get()["Body"])
json.dump_s3 = lambda obj, f: s3.Object(key=f).put(Body=json.dumps(obj))
Now you can use json.load_s3 and json.dump_s3 with the same API as load and dump
data = {"test":0}
json.dump_s3(data, "key") # saves json to s3://bucket/key
data = json.load_s3("key") # read json from s3://bucket/key
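If you would rather not attach new attributes to the json module, the same idea can be sketched as two plain functions. The function names and the FakeS3Client class are invented for this example; client is assumed to behave like boto3.client('s3'), whose get_object response carries a readable "Body".

```python
import io
import json


def dump_json_to_s3(client, bucket, key, obj):
    """Serialize obj to JSON and store it with put_object."""
    body = json.dumps(obj).encode("utf-8")
    client.put_object(Bucket=bucket, Key=key, Body=body,
                      ContentType="application/json")


def load_json_from_s3(client, bucket, key):
    """Fetch an object with get_object and parse its body as JSON."""
    response = client.get_object(Bucket=bucket, Key=key)
    return json.loads(response["Body"].read().decode("utf-8"))


class FakeS3Client:
    """Hypothetical in-memory stand-in for boto3.client('s3'), for local tests only."""

    def __init__(self):
        self.store = {}

    def put_object(self, Bucket, Key, Body, **kwargs):
        self.store[(Bucket, Key)] = Body

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self.store[(Bucket, Key)])}
```

With a real client you would pass boto3.client('s3') instead of FakeS3Client().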
A cleaner, more concise version, which I use to upload files on the fly to a given S3 bucket and sub-folder:
import boto3
BUCKET_NAME = 'sample_bucket_name'
PREFIX = 'sub-folder/'
s3 = boto3.resource('s3')
# Creating an empty file called "_DONE" and putting it in the S3 bucket
s3.Object(BUCKET_NAME, PREFIX + '_DONE').put(Body="")
Note: You should ALWAYS keep your AWS credentials (aws_access_key_id and aws_secret_access_key) in a separate file, for example ~/.aws/credentials, rather than in your code.
After some research, I found this. A list of dictionaries can be written as CSV directly to an S3 bucket using a simple csv writer.
For example: data_dict = [{"Key1": "value1", "Key2": "value2"}, {"Key1": "value4", "Key2": "value3"}]
This assumes that all the dictionaries have the same keys.
import csv
from io import StringIO
import boto3
# Sample input: a list of dictionaries
data_dict = [{"Key1": "value1", "Key2": "value2"}, {"Key1": "value4", "Key2": "value3"}]
data_dict_keys = list(data_dict[0].keys())
# creating a file buffer
file_buff = StringIO()
# writing csv data to file buffer
writer = csv.DictWriter(file_buff, fieldnames=data_dict_keys)
writer.writeheader()
for data in data_dict:
    writer.writerow(data)
# creating s3 client connection
client = boto3.client('s3')
# placing file to S3, file_buff.getvalue() is the CSV body for the file
client.put_object(Body=file_buff.getvalue(), Bucket='my_bucket_name', Key='my/key/including/anotherfilename.txt')
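The CSV-building part can also be pulled out into a small function that returns bytes ready to be passed as Body. This is only a sketch: the name dicts_to_csv_bytes is made up, and it assumes, like the answer above, that every dictionary has the same keys.

```python
import csv
import io


def dicts_to_csv_bytes(rows, encoding="utf-8"):
    """Render a list of uniform dictionaries as CSV and return encoded bytes."""
    if not rows:
        return b""
    # newline="" leaves the csv module in charge of line endings
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode(encoding)
```

The result could then be uploaded with client.put_object(Body=dicts_to_csv_bytes(data_dict), Bucket=..., Key=...). Note that the csv module terminates lines with \r\n by default.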
It is worth mentioning smart-open, which uses boto3 as a back-end.
smart-open is a drop-in replacement for Python's open that can open files from S3, as well as FTP, HTTP and many other protocols.
For example:
from smart_open import open
import json
with open("s3://your_bucket/your_key.json", 'r') as f:
    data = json.load(f)
The aws credentials are loaded via boto3 credentials, usually a file in the ~/.aws/ dir or an environment variable.
You can use the code below to write, for example, an image to S3 (as of 2019). To connect to S3 you will have to install the AWS CLI with pip install awscli, then enter your credentials with aws configure:
import boto3
import urllib3
import uuid
from pathlib import Path
from io import BytesIO
from errors import custom_exceptions as cex
BUCKET_NAME = "xxx.yyy.zzz"
POSTERS_BASE_PATH = "assets/wallcontent"
CLOUDFRONT_BASE_URL = "https://xxx.cloudfront.net/"
class S3(object):
    def __init__(self):
        self.client = boto3.client('s3')
        self.bucket_name = BUCKET_NAME
        self.posters_base_path = POSTERS_BASE_PATH
    def __download_image(self, url):
        manager = urllib3.PoolManager()
        try:
            res = manager.request('GET', url)
        except Exception:
            print("Could not download the image from URL: ", url)
            raise cex.ImageDownloadFailed
        return BytesIO(res.data)  # any file-like object that implements read()
    def upload_image(self, url):
        try:
            image_file = self.__download_image(url)
        except cex.ImageDownloadFailed:
            raise cex.ImageUploadFailed
        extension = Path(url).suffix
        id = uuid.uuid1().hex + extension
        final_path = self.posters_base_path + "/" + id
        try:
            self.client.upload_fileobj(image_file,
                                       self.bucket_name,
                                       final_path
                                       )
        except Exception:
            print("Image Upload Error for URL: ", url)
            raise cex.ImageUploadFailed
        return CLOUDFRONT_BASE_URL + id
I am currently trying to load a pickled file from S3 into AWS lambda and store it to a list (the pickle is a list).
Here is my code:
import pickle
import boto3
s3 = boto3.resource('s3')
with open('oldscreenurls.pkl', 'rb') as data:
    old_list = s3.Bucket("pythonpickles").download_fileobj("oldscreenurls.pkl", data)
I get the following error even though the file exists:
FileNotFoundError: [Errno 2] No such file or directory: 'oldscreenurls.pkl'
Any ideas?
Super simple solution
import pickle
import boto3
s3 = boto3.resource('s3')
my_pickle = pickle.loads(s3.Bucket("bucket_name").Object("key_to_pickle.pickle").get()['Body'].read())
As shown in the documentation for download_fileobj, you need to open the file in binary write mode and save to the file first. Once the file is downloaded, you can open it for reading and unpickle.
import pickle
import boto3
s3 = boto3.resource('s3')
with open('oldscreenurls.pkl', 'wb') as data:
    s3.Bucket("pythonpickles").download_fileobj("oldscreenurls.pkl", data)
with open('oldscreenurls.pkl', 'rb') as data:
    old_list = pickle.load(data)
download_fileobj takes the name of an object in S3 plus a handle to a local file, and saves the contents of that object to the file. There is also a version of this function called download_file that takes a filename instead of an open file handle and handles opening it for you.
In this case it would probably be better to use S3Client.get_object though, to avoid having to write and then immediately read a file. You could also write to an in-memory BytesIO object, which acts like a file but doesn't actually touch a disk. That would look something like this:
import pickle
import boto3
from io import BytesIO
s3 = boto3.resource('s3')
with BytesIO() as data:
    s3.Bucket("pythonpickles").download_fileobj("oldscreenurls.pkl", data)
    data.seek(0)  # move back to the beginning after writing
    old_list = pickle.load(data)
This is the easiest solution. You can load the data without even downloading the file locally using S3FileSystem
from s3fs.core import S3FileSystem
s3_file = S3FileSystem()
data = pickle.load(s3_file.open('{}/{}'.format(bucket_name, file_path)))
In my implementation, the object at the given S3 path is fetched with get_object and then unpickled:
import pickle
import boto3
name = img_url.split('/')[::-1][0]
folder = 'media'
file_name = f'{folder}/{name}'
bucket_name = bucket_name
s3 = boto3.client('s3', aws_access_key_id=aws_access_key_id,aws_secret_access_key=aws_secret_access_key)
response = s3.get_object(Bucket=bucket_name, Key=file_name)
body = response['Body'].read()
data = pickle.loads(body)
I'm trying to do a "hello world" with new boto3 client for AWS.
The use-case I have is fairly simple: get object from S3 and save it to the file.
In boto 2.X I would do it like this:
import boto
key = boto.connect_s3().get_bucket('foo').get_key('foo')
key.get_contents_to_filename('/tmp/foo')
In boto 3, I can't find a clean way to do the same thing, so I'm manually iterating over the "Streaming" object:
import boto3
key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'wb') as f:
    chunk = key['Body'].read(1024*8)
    while chunk:
        f.write(chunk)
        chunk = key['Body'].read(1024*8)
or
import boto3
key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'wb') as f:
    for chunk in iter(lambda: key['Body'].read(4096), b''):
        f.write(chunk)
And it works fine. I was wondering is there any "native" boto3 function that will do the same task?
There is a customization that went into Boto3 recently which helps with this (among other things). It is currently exposed on the low-level S3 client, and can be used like this:
import boto3
s3_client = boto3.client('s3')
# Create a local file to upload
with open('hello.txt', 'w') as f:
    f.write('Hello, world!')
# Upload the file to S3
s3_client.upload_file('hello.txt', 'MyBucket', 'hello-remote.txt')
# Download the file from S3
s3_client.download_file('MyBucket', 'hello-remote.txt', 'hello2.txt')
with open('hello2.txt') as f:
    print(f.read())
These functions will automatically handle reading/writing files as well as doing multipart uploads in parallel for large files.
Note that s3_client.download_file won't create the destination directory. You can create it first with pathlib.Path('/path/to/file.txt').parent.mkdir(parents=True, exist_ok=True).
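As a rough sketch, the directory creation and the download can be combined in one helper. The name download_with_parents is made up for this example, and client is assumed to be a boto3.client('s3') instance.

```python
from pathlib import Path


def download_with_parents(client, bucket, key, destination):
    """Create the destination's parent directories, then download the object.

    `client` is assumed to be boto3.client('s3'); this helper is
    hypothetical and not part of boto3.
    """
    destination = Path(destination)
    # download_file does not create missing directories, so do it first
    destination.parent.mkdir(parents=True, exist_ok=True)
    client.download_file(bucket, key, str(destination))
    return destination
```

For example: download_with_parents(s3_client, 'MyBucket', 'hello-remote.txt', '/path/to/file.txt').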
boto3 now has a nicer interface than the client:
resource = boto3.resource('s3')
my_bucket = resource.Bucket('MyBucket')
my_bucket.download_file(key, local_filename)
By itself this isn't tremendously better than the client in the accepted answer (although the docs say that it does a better job of retrying uploads and downloads on failure). However, resources are generally more ergonomic (for example, the S3 bucket and object resources are nicer than the client methods), so this lets you stay at the resource layer without having to drop down to the client.
Resources generally can be created in the same way as clients, and they take all or most of the same arguments and just forward them to their internal clients.
For those of you who would like to simulate the set_contents_from_string like boto2 methods, you can try
import boto3
from cStringIO import StringIO
s3c = boto3.client('s3')
contents = 'My string to save to S3 object'
target_bucket = 'hello-world.by.vor'
target_file = 'data/hello.txt'
fake_handle = StringIO(contents)
# notice if you do fake_handle.read() it reads like a file handle
s3c.put_object(Bucket=target_bucket, Key=target_file, Body=fake_handle.read())
For Python3:
In python3 both StringIO and cStringIO are gone. Use the StringIO import like:
from io import StringIO
To support both versions:
try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO
# Preface: File is json with contents: {"name": "Android", "status": "ERROR"}
import boto3
import io
import json
s3 = boto3.resource('s3')
obj = s3.Object('my-bucket', 'key-to-file.json')
data = io.BytesIO()
obj.download_fileobj(data)
# data now holds the object's bytes; decode them and convert to a dict:
new_dict = json.loads(data.getvalue().decode("utf-8"))
print(new_dict['status'])
# Should print "ERROR"
Note: I'm assuming you have configured authentication separately. The code below downloads a single object from the S3 bucket.
import boto3
# create the S3 resource
s3 = boto3.resource('s3')
# download the object to a local file
s3.Bucket('mybucket').download_file('hello.txt', '/tmp/hello.txt')
If you wish to download a specific version of a file, you need to use get_object.
import boto3
bucket = 'bucketName'
prefix = 'path/to/file/'
filename = 'fileName.ext'
s3c = boto3.client('s3')
s3r = boto3.resource('s3')
if __name__ == '__main__':
    for version in s3r.Bucket(bucket).object_versions.filter(Prefix=prefix + filename):
        file = version.get()
        version_id = file.get('VersionId')
        obj = s3c.get_object(
            Bucket=bucket,
            Key=prefix + filename,
            VersionId=version_id,
        )
        with open(f"{filename}.{version_id}", 'wb') as f:
            for chunk in obj['Body'].iter_chunks(chunk_size=4096):
                f.write(chunk)
Ref: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/response.html
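The body returned by get_object can be read in pieces with read(n), so the chunked write can also be delegated to the standard library. The sketch below uses shutil.copyfileobj; the helper name save_stream and the default chunk size are choices made for this example, and body is assumed to be any object with a read(n) method, such as the response's 'Body'.

```python
import shutil


def save_stream(body, path, chunk_size=1024 * 1024):
    """Copy a readable stream to a local file in chunks.

    `body` is assumed to expose read(n), as the 'Body' of a
    get_object response does; save_stream is a hypothetical helper.
    """
    with open(path, "wb") as f:
        # copyfileobj reads chunk_size bytes at a time until the stream is empty
        shutil.copyfileobj(body, f, chunk_size)
```

In the loop above this would replace the inner with block, e.g. save_stream(obj['Body'], f"{filename}.{version_id}").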
When you want to read a file with a different configuration than the default one, feel free to use either mpu.aws.s3_download(s3path, destination) directly or the copy-pasted code:
import os
import boto3
def s3_download(source, destination,
                exists_strategy='raise',
                profile_name=None):
    """
    Copy a file from an S3 source to a local destination.
    Parameters
    ----------
    source : str
        Path starting with s3://, e.g. 's3://bucket-name/key/foo.bar'
    destination : str
    exists_strategy : {'raise', 'replace', 'abort'}
        What is done when the destination already exists?
    profile_name : str, optional
        AWS profile
    Raises
    ------
    botocore.exceptions.NoCredentialsError
        Botocore is not able to find your credentials. Either specify
        profile_name or add the environment variables AWS_ACCESS_KEY_ID,
        AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN.
        See https://boto3.readthedocs.io/en/latest/guide/configuration.html
    """
    exists_strategies = ['raise', 'replace', 'abort']
    if exists_strategy not in exists_strategies:
        raise ValueError('exists_strategy \'{}\' is not in {}'
                         .format(exists_strategy, exists_strategies))
    session = boto3.Session(profile_name=profile_name)
    s3 = session.resource('s3')
    bucket_name, key = _s3_path_split(source)
    if os.path.isfile(destination):
        # compare strings with ==, not with the identity operator "is"
        if exists_strategy == 'raise':
            raise RuntimeError('File \'{}\' already exists.'
                               .format(destination))
        elif exists_strategy == 'abort':
            return
    s3.Bucket(bucket_name).download_file(key, destination)
from collections import namedtuple
S3Path = namedtuple("S3Path", ["bucket_name", "key"])
def _s3_path_split(s3_path):
    """
    Split an S3 path into bucket and key.
    Parameters
    ----------
    s3_path : str
    Returns
    -------
    splitted : (str, str)
        (bucket, key)
    Examples
    --------
    >>> _s3_path_split('s3://my-bucket/foo/bar.jpg')
    S3Path(bucket_name='my-bucket', key='foo/bar.jpg')
    """
    if not s3_path.startswith("s3://"):
        raise ValueError(
            "s3_path is expected to start with 's3://', " "but was {}"
            .format(s3_path)
        )
    bucket_key = s3_path[len("s3://"):]
    bucket_name, key = bucket_key.split("/", 1)
    return S3Path(bucket_name, key)
Is there any feasible way to upload a dynamically generated file directly to Amazon S3, without first creating a local file and then uploading it to the S3 server? I use Python.
Here is an example downloading an image (using requests library) and uploading it to s3, without writing to a local file:
import boto
from boto.s3.key import Key
import requests
#setup the bucket
c = boto.connect_s3(your_s3_key, your_s3_key_secret)
b = c.get_bucket(bucket, validate=False)
#download the file
url = "http://en.wikipedia.org/static/images/project-logos/enwiki.png"
r = requests.get(url)
if r.status_code == 200:
    #upload the file
    k = Key(b)
    k.key = "image1.png"
    k.content_type = r.headers['content-type']
    k.set_contents_from_string(r.content)
You could use BytesIO from the Python standard library.
from io import BytesIO
bytesIO = BytesIO()
bytesIO.write(b'whee')  # BytesIO only accepts bytes
bytesIO.seek(0)
s3_file.set_contents_from_file(bytesIO)
The boto library's Key object has several methods you might be interested in:
send_file
set_contents_from_file
set_contents_from_string
set_contents_from_stream
For an example of using set_contents_from_string, see Storing Data section of the boto documentation, pasted here for completeness:
>>> from boto.s3.key import Key
>>> k = Key(bucket)
>>> k.key = 'foobar'
>>> k.set_contents_from_string('This is a test of S3')
I assume you're using boto. boto's Key.set_contents_from_file() will accept a StringIO object, and any code you have written to write data to a file should be easily adaptable to write to a StringIO object. Or if you generate a string, you can use set_contents_from_string().
def upload_to_s3(url, **kwargs):
    '''
    :param url: url of the image to upload
    :return: url of the image stored in the AWS S3 bucket
    '''
    r = requests.get(url)
    if r.status_code == 200:
        # credentials stored in settings AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
        conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, host=AWS_HOST)
        # Connect to bucket and create key
        b = conn.get_bucket(AWS_Bucket_Name)
        k = b.new_key("{folder_name}/{filename}".format(**kwargs))
        k.set_contents_from_string(r.content, replace=True,
                                   headers={'Content-Type': 'application/%s' % (FILE_FORMAT)},
                                   policy='authenticated-read',
                                   reduced_redundancy=True)
        # TODO Change AWS_EXPIRY
        return k.generate_url(expires_in=AWS_EXPIRY, force_http=True)
I had a dict object which I wanted to store as a JSON file on S3, without creating a local file. The code below worked for me:
import json
from smart_open import smart_open
with smart_open('s3://access-key:secret-key@bucket-name/file.json', 'wb') as fout:
    fout.write(json.dumps(dict_object).encode('utf8'))
In boto3, there is a simple way to upload file content without creating a local file, using the following code. I have modified JimJty's example code for boto3:
import boto3
import requests
from io import BytesIO
# set the values
aws_access_key_id=""
aws_secret_access_key=""
region_name=""
bucket=""
key=""
session = boto3.session.Session(aws_access_key_id=aws_access_key_id,aws_secret_access_key=aws_secret_access_key, region_name=region_name)
s3_client = session.client('s3')
#download the file
url = "http://en.wikipedia.org/static/images/project-logos/enwiki.png"
r = requests.get(url)
if r.status_code == 200:
    # wrap the content in a file-like object, since upload_fileobj requires one
    with BytesIO(r.content) as data:
        s3_client.upload_fileobj(data, bucket, key)
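The same pattern works for content your own code generates: write it into an in-memory buffer, rewind, and hand the buffer to upload_fileobj. The sketch below is only an illustration; upload_generated and the write_content callback are names invented here, and client is assumed to be a boto3 S3 client.

```python
import io


def upload_generated(client, bucket, key, write_content):
    """Let write_content(buffer) produce bytes in memory, then upload them.

    `client` is assumed to be boto3.client('s3'); `write_content` is a
    hypothetical callback that writes bytes to the buffer it receives.
    """
    buffer = io.BytesIO()
    write_content(buffer)
    # Rewind so that upload_fileobj reads from the start of the buffer
    buffer.seek(0)
    client.upload_fileobj(buffer, bucket, key)
```

For example: upload_generated(s3_client, bucket, key, lambda f: f.write(b'generated data')).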
You can try using smart_open (https://pypi.org/project/smart_open/). I used it exactly for that: writing files directly in S3.
Note that, as far as I know, smart_open does not support encryption at rest, which is now a much-desired data standard.
This implementation is an example of uploading a list of images (a list of NumPy arrays / OpenCV image objects) directly to S3.
Note: you need to convert the image objects (or buffers) to bytes when uploading; otherwise the uploaded files will be corrupted.
# Assume the images are in a list called img_array
import boto3
s3 = boto3.client('s3')
res_url = []
for i, img in enumerate(img_array):
    # use the index so that each image gets its own key
    s3_key = "fileName_on_s3_{}.png".format(i)
    response = s3.put_object(Body=img.tobytes(), Bucket='bucket_name', Key=s3_key, ACL='public-read', ContentType='image/png')
    s3_url = 'https://bucket_name.s3.ap-south-1.amazonaws.com/' + s3_key
    res_url.append(s3_url)
# res_url is the list of URLs of the uploaded objects
Update for boto3:
aws_session = boto3.Session('my_access_key_id', 'my_secret_access_key')
s3 = aws_session.resource('s3')
s3.Bucket('my_bucket').put_object(Key='file_name.txt', Body=my_file)
I am having a similar issue and was wondering if there was a final answer. With my code below, "starwars.json" keeps being saved locally, but I just want to push each looped .json file to S3 without storing any file locally.
for key, value in star_wars_actors.items():
    response = requests.get('http:starwarsapi/' + value)
    data = response.json()
    with open("starwars.json", "w+") as d:
        json.dump(data, d, ensure_ascii=False, indent=4)
    s3.upload_file('starwars.json', 'test-bucket',
                   '%s/%s' % ('test', str(key) + '.json'))
I want to write a Python script that will read and write files from s3 using their url's, eg:'s3:/mybucket/file'. It would need to run locally and in the cloud without any code changes. Is there a way to do this?
Edit: There are some good suggestions here but what I really want is something that allows me to do this:
myfile = open("s3://mybucket/file", "r")
and then use that file object like any other file object. That would be really cool. I might just write something like this for myself if it doesn't exist. I could build that abstraction layer on simples3 or boto.
For opening, it should be as simple as:
import urllib
opener = urllib.URLopener()
myurl = "https://s3.amazonaws.com/skyl/fake.xyz"
myfile = opener.open(myurl)
This will work with s3 if the file is public.
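The snippet above uses the Python 2 urllib API. On Python 3, a roughly equivalent sketch uses urllib.request.urlopen. The helper name read_public_url and the timeout value are assumptions made for this example, and as above it only works for objects that are publicly readable.

```python
from urllib.request import urlopen


def read_public_url(url, timeout=10):
    """Return the bytes behind a URL that needs no authentication.

    Hypothetical helper; for S3 this only works if the object is public.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

For example: data = read_public_url("https://s3.amazonaws.com/skyl/fake.xyz").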
To write a file using boto, it goes a little something like this:
from boto.s3.connection import S3Connection
conn = S3Connection(AWS_KEY, AWS_SECRET)
bucket = conn.get_bucket(BUCKET)
destination = bucket.new_key()
destination.name = filename
destination.set_contents_from_file(myfile)
destination.make_public()
lemme know if this works for you :)
Here's how they do it in awscli :
def find_bucket_key(s3_path):
    """
    This is a helper function that given an s3 path such that the path is of
    the form: bucket/key
    It will return the bucket and the key represented by the s3 path
    """
    s3_components = s3_path.split('/')
    bucket = s3_components[0]
    s3_key = ""
    if len(s3_components) > 1:
        s3_key = '/'.join(s3_components[1:])
    return bucket, s3_key
def split_s3_bucket_key(s3_path):
    """Split s3 path into bucket and key prefix.
    This will also handle the s3:// prefix.
    :return: Tuple of ('bucketname', 'keyname')
    """
    if s3_path.startswith('s3://'):
        s3_path = s3_path[5:]
    return find_bucket_key(s3_path)
Which you could just use with code like this
from awscli.customizations.s3.utils import split_s3_bucket_key
import boto3
client = boto3.client('s3')
bucket_name, key_name = split_s3_bucket_key(
's3://example-bucket-name/path/to/example.txt')
response = client.get_object(Bucket=bucket_name, Key=key_name)
This doesn't address the goal of interacting with an s3 key as a file like object but it's a step in that direction.
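If you would rather not depend on awscli internals, a similar split can be sketched with the standard library's urllib.parse. The function name split_s3_url is made up for this example, and it assumes keys that contain no '?' or '#', since urlparse would treat those as the start of a query string or fragment.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/key' into (bucket, key) using urllib.parse.

    Hypothetical helper; assumes the key contains no '?' or '#'.
    """
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("expected a URL of the form s3://bucket/key, got {!r}".format(url))
    # netloc is the bucket name; the path carries the key after a leading slash
    return parsed.netloc, parsed.path.lstrip("/")
```

The returned pair can be passed to client.get_object(Bucket=..., Key=...) in the same way as above.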
I haven't seen something that would work directly with S3 urls, but you could use an S3 access library (simples3 looks decent) and some simple string manipulation:
>>> url = "s3:/bucket/path/"
>>> _, path = url.split(":", 1)
>>> path = path.lstrip("/")
>>> bucket, path = path.split("/", 1)
>>> print bucket
'bucket'
>>> print path
'path/'
Try s3fs
First example on the docs:
>>> import s3fs
>>> fs = s3fs.S3FileSystem(anon=True)
>>> fs.ls('my-bucket')
['my-file.txt']
>>> with fs.open('my-bucket/my-file.txt', 'rb') as f:
... print(f.read())
b'Hello, world'
You can use the Boto Python API to access S3 from Python. It's a good library. After you install Boto, the following sample program will work for you:
>>> k = Key(b)
>>> k.key = 'yourfile'
>>> k.set_contents_from_filename('yourfile.txt')
You can find more information here http://boto.cloudhackers.com/s3_tut.html#storing-data
http://s3tools.org/s3cmd works pretty well and support the s3:// form of the URL structure you want. It does the business on Linux and Windows. If you need a native API to call from within a python program then http://code.google.com/p/boto/ is a better choice.