This code writes JSON to a file in S3.
What I want to achieve is this: instead of opening the data.json file and writing it to S3 (as sample.json), how do I pass the JSON directly and write it to a file in S3?
import boto3
s3 = boto3.resource('s3', aws_access_key_id='aws_key', aws_secret_access_key='aws_sec_key')
s3.Object('mybucket', 'sample.json').put(Body=open('data.json', 'rb'))
I'm not sure if I understand the question correctly. You just want to write JSON data to a file using Boto3? The following code writes a Python dictionary to a JSON file in S3.
import json
import boto3
s3 = boto3.resource('s3')
s3object = s3.Object('your-bucket-name', 'your_file.json')
json_data = {'key': 'value'}  # the dictionary you want to store as JSON
s3object.put(
    Body=json.dumps(json_data).encode('UTF-8')
)
I don't know if anyone is still following this thread, but I was attempting to upload JSON to S3 using the method above, which didn't quite work for me. Boto3 and S3 may have changed since 2018, but this achieved the result for me:
import json
import boto3
s3 = boto3.client('s3')
json_object = 'your_json_object here'
s3.put_object(
    Body=json.dumps(json_object),
    Bucket='your_bucket_name',
    Key='your_key_here'
)
Amazon S3 is an object store (a file store, in effect). The primary operations are PUT and GET. You cannot append data to an existing object in S3; you can only replace the entire object.
For a list of the operations you can perform on S3, see this link:
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectOps.html
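To illustrate what that means in practice, here is a minimal sketch of the resulting read-modify-write pattern (the bucket and key names are assumptions): download the object, change it locally, then PUT the whole object back.
import json
import boto3
s3 = boto3.client('s3')
# Download the existing object and parse it
obj = s3.get_object(Bucket='mybucket', Key='sample.json')
data = json.loads(obj['Body'].read())
# Modify the data locally
data['new_field'] = 'new_value'
# Replace the entire object in S3 with the updated content
s3.put_object(Bucket='mybucket', Key='sample.json', Body=json.dumps(data).encode('UTF-8'))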
An alternative to Joseph McCombs' answer can be achieved using s3fs:
import json
from s3fs import S3FileSystem
json_object = {'test': 3.14}
path_to_s3_object = 's3://your-s3-bucket/your_json_filename.json'
s3 = S3FileSystem()
with s3.open(path_to_s3_object, 'w') as file:
    json.dump(json_object, file)
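Reading the object back works the same way through s3fs; a short usage sketch with the same assumed bucket and key:
with s3.open(path_to_s3_object, 'r') as file:
    retrieved = json.load(file)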
Related
I would like to read .pdf files from an S3 bucket, but the problem is that it returns raw bytes, whereas if the file is a .csv or .txt this code works.
What's wrong with .pdf files?
The code:
import boto3
s3client = boto3.client('s3')
fileobj = s3client.get_object(
    Bucket=BUCKET_NAME,
    Key='file.pdf'
)
filedata = fileobj['Body'].read()
contents = filedata
print(contents)
It returns:
b'%PDF-1.4\n%\xd3\xeb\xe9\xe1\n1 0 obj\n<</Title (Architecture technique)\n/Producer (Skia/PDF m99 Google Docs Renderer)>>\nendobj\n3 0 obj\n<</ca 1\n/BM /Normal>>\nendobj\n6 0 obj\n<</Type /XObject\n/Subtype /Image\n/Width 1424\n/Height 500\n/ColorSpace /DeviceRGB\n/SMask 7 0 R\n/BitsPerComponent 8\n/Filter /FlateDecode\n/Length 26885>> stream\nx\x9c\xed\xdd\xeb\x93$Y\x99\xe7\xf7'
Another solution I tried that also didn't work:
import boto3
from PyPDF2 import PdfFileReader
from io import BytesIO
s3 = boto3.resource('s3')
obj = s3.Object(BUCKET_NAME,'file.pdf')
fs = obj.get()['Body'].read()
pdfFile = PdfFileReader(BytesIO(fs))
It returns:
<PyPDF2.pdf.PdfFileReader at 0x7efbc8aead00>
Start by writing some Python code to access a PDF file on your local disk (search for a Python PDF library on the web).
Once you have that working, then you can look at reading the file from Amazon S3.
When reading a file from S3, you have two options:
Use fileobj['Body'].read() (as you already are doing) to obtain the bytes from the file directly, or
Use download_file() to download the file from S3 to the local disk, then process the file from disk
Which method you choose will depend on the PDF library you decide to use.
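For example, the PdfFileReader in the second snippet was actually created successfully; the output shown is just the reader object's repr, not an error. A minimal sketch of extracting text from it with the older PyPDF2 API, assuming the same bucket and key as in the question:
import boto3
from io import BytesIO
from PyPDF2 import PdfFileReader
s3 = boto3.resource('s3')
obj = s3.Object(BUCKET_NAME, 'file.pdf')
pdf = PdfFileReader(BytesIO(obj.get()['Body'].read()))
# Walk the pages and pull out their text
for page_number in range(pdf.getNumPages()):
    print(pdf.getPage(page_number).extractText())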
I have a CSV file containing numerous UUIDs.
I'd like to write a Python script using boto3 which:
Connects to an AWS S3 bucket
Uses each UUID contained in the CSV to copy the corresponding file
Files are all contained in a file path like this: BUCKET/ORG/FOLDER1/UUID/DATA/FILE.PNG
However, the file contained in DATA/ can be a different file type.
Puts the copied file in a new S3 bucket
So far, I have successfully connected to the S3 bucket and checked its contents in Python using boto3, but I need help implementing the rest:
import boto3
# Create session
session = boto3.Session(
    aws_access_key_id='ACCESS_KEY_ID',
    aws_secret_access_key='SECRET_ACCESS_KEY',
)
# Initiate S3 resource
s3 = session.resource('s3')
your_bucket = s3.Bucket('BUCKET-NAME')
for s3_file in your_bucket.objects.all():
    print(s3_file.key)  # prints the contents of the bucket
To read the CSV file you can use the csv library (see: https://docs.python.org/fr/3.6/library/csv.html).
Example:
import csv
with open('file.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)
To push files to the new bucket, you can use the copy method (see: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.copy)
Example:
import boto3
s3 = boto3.resource('s3')
source = {
    'Bucket': 'BUCKET-NAME',
    'Key': 'mykey'
}
bucket = s3.Bucket('SECOND_BUCKET-NAME')
bucket.copy(source, 'mykey')  # the second argument is the destination object key
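Putting the two pieces together for the layout described in the question, a minimal sketch (the bucket names, the CSV filename, the column holding the UUID, and the ORG/FOLDER1 prefix are all assumptions):
import csv
import boto3
s3 = boto3.resource('s3')
source_bucket = s3.Bucket('BUCKET-NAME')
dest_bucket = s3.Bucket('SECOND_BUCKET-NAME')
with open('uuids.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        uuid = row[0]  # assumes the UUID is in the first column
        prefix = f'ORG/FOLDER1/{uuid}/DATA/'
        # Copy every object under this UUID's DATA/ prefix, whatever the file type
        for obj in source_bucket.objects.filter(Prefix=prefix):
            dest_bucket.copy({'Bucket': 'BUCKET-NAME', 'Key': obj.key}, obj.key)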
I need to write a file in ORC format directly to an S3 bucket. The file will be the result of a database query.
I know how to write a CSV file directly to S3, but couldn't find a way to write directly in ORC. Any recommendations?
Save the ORC content to a file
(using default values as per the linked documentation, since there is no code sample to work with):
df = spark.read.load("examples/src/main/resources/users.parquet")
df.select("name", "favorite_color").write.format("orc").save("namesAndFavColors.orc")
Upload the file:
import boto3
# Create an S3 client
s3 = boto3.client('s3')
filename = 'file.txt'
bucket_name = 'my-bucket'
# Uploads the given file using a managed uploader, which will split up large
# files automatically and upload parts in parallel.
s3.upload_file(filename, bucket_name, filename)
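If the Spark environment has the S3A connector available (hadoop-aws on the classpath plus credentials configured), Spark can also write the ORC output straight to S3 and skip the separate upload step; a minimal sketch, with the bucket name and path as assumptions:
df.select("name", "favorite_color") \
    .write.format("orc") \
    .save("s3a://my-bucket/namesAndFavColors.orc")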
I am currently trying to load a pickled file from S3 into AWS Lambda and store it in a list (the pickle is a list).
Here is my code:
import pickle
import boto3
s3 = boto3.resource('s3')
with open('oldscreenurls.pkl', 'rb') as data:
    old_list = s3.Bucket("pythonpickles").download_fileobj("oldscreenurls.pkl", data)
I get the following error even though the file exists:
FileNotFoundError: [Errno 2] No such file or directory: 'oldscreenurls.pkl'
Any ideas?
Super simple solution
import pickle
import boto3
s3 = boto3.resource('s3')
my_pickle = pickle.loads(s3.Bucket("bucket_name").Object("key_to_pickle.pickle").get()['Body'].read())
As shown in the documentation for download_fileobj, you need to open the file in binary write mode and save to the file first. Once the file is downloaded, you can open it for reading and unpickle.
import pickle
import boto3
s3 = boto3.resource('s3')
with open('oldscreenurls.pkl', 'wb') as data:
    s3.Bucket("pythonpickles").download_fileobj("oldscreenurls.pkl", data)
with open('oldscreenurls.pkl', 'rb') as data:
    old_list = pickle.load(data)
download_fileobj takes the name of an object in S3 plus a handle to a local file, and saves the contents of that object to the file. There is also a version of this function called download_file that takes a filename instead of an open file handle and handles opening it for you.
In this case it would probably be better to use S3Client.get_object though, to avoid having to write and then immediately read a file. You could also write to an in-memory BytesIO object, which acts like a file but doesn't actually touch a disk. That would look something like this:
import pickle
import boto3
from io import BytesIO
s3 = boto3.resource('s3')
with BytesIO() as data:
    s3.Bucket("pythonpickles").download_fileobj("oldscreenurls.pkl", data)
    data.seek(0)  # move back to the beginning after writing
    old_list = pickle.load(data)
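For comparison, the get_object route mentioned above avoids the file handle entirely; a short sketch using the same assumed bucket and key:
import pickle
import boto3
s3_client = boto3.client('s3')
response = s3_client.get_object(Bucket="pythonpickles", Key="oldscreenurls.pkl")
old_list = pickle.loads(response['Body'].read())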
This is the easiest solution. You can load the data without even downloading the file locally, using S3FileSystem:
import pickle
from s3fs.core import S3FileSystem
s3_file = S3FileSystem()
data = pickle.load(s3_file.open('{}/{}'.format(bucket_name, file_path)))
In my implementation, the S3 file path is read with pickle:
import pickle
import boto3
name = img_url.split('/')[::-1][0]  # img_url is the source URL; the last path segment is the file name
folder = 'media'
file_name = f'{folder}/{name}'
bucket_name = 'your-bucket-name'
s3 = boto3.client('s3', aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key)
response = s3.get_object(Bucket=bucket_name, Key=file_name)
body = response['Body'].read()
data = pickle.loads(body)
I would like to write a JSON object to S3 in Parquet using AWS Lambda (Python).
However, I cannot connect the fastparquet lib with boto3 in order to do it, since the first lib has a method to write into a file and boto3 expects an object to put into the S3 bucket.
Any suggestions?
fastparquet example
fastparquet.write('test.parquet', df, compression='GZIP', file_scheme='hive')
Boto3 example
client = authenticate_s3()
response = client.put_object(Body=Body, Bucket=Bucket, Key=Key)
The Body would correspond to the Parquet content, and it would allow writing into S3.
You can write any DataFrame to S3 by using the open_with argument of the write method (see fastparquet's docs):
import s3fs
from fastparquet import write
s3 = s3fs.S3FileSystem()
myopen = s3.open
write(
    'bucket-name/filename.parq.gzip',
    frame,
    compression='GZIP',
    open_with=myopen
)
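The write call above assumes frame is already a pandas DataFrame; since the question starts from a JSON object, a minimal sketch of that step (the structure of the JSON object is an assumption):
import json
import pandas as pd
json_object = {'name': ['Alice', 'Bob'], 'value': [1, 2]}  # assumed example structure
frame = pd.DataFrame(json_object)
# for a JSON string instead of a dict: frame = pd.DataFrame(json.loads(json_string))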