AWS Boto3 upload files issue - python

I am facing a weird issue.
I am trying to upload a few parquet files from my local PC to an S3 bucket. Below is the script I used.
It ran well for the first file, but as soon as I change the folder and try loading a different file to the same S3 bucket, it doesn't load. The code doesn't fail, but the second file is not visible in the S3 bucket. I have no clue why it's behaving this way.
import boto3

s3 = boto3.resource('s3', aws_access_key_id='*****', aws_secret_access_key='****')
bucket = s3.Bucket(BUCKET)  # BUCKET holds the target bucket name
bucket.upload_file("****.parquet", "****.parquet")  # (local file path, S3 object key)
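One way to narrow this down is to list the bucket's keys right after each upload and check that the second file landed under the key you expect; if both uploads reuse the same key, the second object simply overwrites the first. A minimal sketch, with placeholder names:
import boto3

s3 = boto3.resource('s3', aws_access_key_id='*****', aws_secret_access_key='****')
bucket = s3.Bucket(BUCKET)

# Upload, then print every key in the bucket to verify where the file landed.
bucket.upload_file("second_file.parquet", "second_file.parquet")  # placeholder names
for obj in bucket.objects.all():
    print(obj.key)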


How can I access the created folder in S3 to write a CSV file into it?

I have written the code that creates the folder, but how can I access the folder to write a CSV file into it?
import boto3

# Creating folder on S3 for unmatched data
client = boto3.client('s3')
# Variables
target_bucket = obj['source_and_destination_details']['s3_bucket_name']
subfolder = obj['source_and_destination_details']['s3_bucket_uri-new_folder_path'] + obj['source_and_destination_details']['folder_name_for_unmatched_column_data']
# Create the subfolder (a zero-byte object whose key acts as the folder prefix)
client.put_object(Bucket=target_bucket, Key=subfolder)
The folder is created successfully by the code above, but how do I write a CSV file into it?
Below is the code I tried, but it isn't working:
# Writing csv on AWS S3
df.reindex(idx).to_csv(obj['source_and_destination_details']['s3_bucket_uri-write'] + obj['source_and_destination_details']['folder_name_for_unmatched_column_data'] + obj['source_and_destination_details']['file_name_for_unmatched_column_data'], index=False)
An S3 bucket is not a file system.
I assume that the to_csv() method is supposed to write to some sort of file system, but this is not the way it works with S3. While there are solutions to mount S3 buckets as file systems, this is not the preferred way.
Usually, you would interact with S3 via the AWS REST APIs, the AWS CLI or a client library such as Boto, which you’re already using.
So in order to store your content on S3, you first create the file locally, e.g. in the system's /tmp folder. Then use Boto's put_object() method to upload the file. Remove it from your local storage afterwards.
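A minimal sketch of that flow, reusing the variables from the question (the local file name and the CSV's name under the prefix are placeholders):
import os
import boto3

client = boto3.client('s3')
target_bucket = obj['source_and_destination_details']['s3_bucket_name']
subfolder = obj['source_and_destination_details']['s3_bucket_uri-new_folder_path'] + obj['source_and_destination_details']['folder_name_for_unmatched_column_data']

# 1. Write the CSV to local, writable storage first.
local_path = "/tmp/unmatched_data.csv"  # placeholder file name
df.reindex(idx).to_csv(local_path, index=False)

# 2. Upload it under the "folder" prefix created earlier.
with open(local_path, "rb") as f:
    client.put_object(Bucket=target_bucket, Key=subfolder + "unmatched_data.csv", Body=f)

# 3. Remove the local copy afterwards.
os.remove(local_path)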

Automatically Upload New Files in SharePoint to S3 with Python

I'm very new to AWS, and relatively new to python. Please go easy on me.
I want to upload files from a SharePoint location to an S3 bucket. From there, I'll be able to perform analysis on those files.
The code below uploads a file from a local directory to an example S3 bucket. I'd like to modify this to upload only new files from the SharePoint location (and not re-upload files that are already in the bucket).
import boto3

BUCKET_NAME = "test_bucket"
s3 = boto3.client("s3")

with open("./burger.jpg", "rb") as f:
    s3.upload_fileobj(f, BUCKET_NAME, "burger_new_upload.jpg", ExtraArgs={"ACL": "public-read"})
Would AWS Lambda (with Python code) be useful here? Thank you for sharing your knowledge.
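For the "only new files" part, one hedged sketch, assuming "new" means "not already present in the bucket": check for the key with head_object() and skip the upload if it already exists. The file and key names are placeholders, and pulling the files out of SharePoint itself is a separate step not shown here.
import boto3
from botocore.exceptions import ClientError

BUCKET_NAME = "test_bucket"
s3 = boto3.client("s3")

def object_exists(bucket, key):
    """Return True if the key is already present in the bucket."""
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "404":
            return False
        raise

local_file = "./burger.jpg"          # placeholder: a file pulled down from SharePoint
key = "burger_new_upload.jpg"        # placeholder object key

if not object_exists(BUCKET_NAME, key):
    with open(local_file, "rb") as f:
        s3.upload_fileobj(f, BUCKET_NAME, key)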

Download blob from GCP storage bucket with Python Flask

I have a GCP bucket with some PDF files.
I am able to download the files from the bucket with Python (Flask) on localhost using the code below:
from google.cloud import storage

storage_client = storage.Client.from_service_account_json("ebillaae7f0224519.json")
bucket = storage_client.get_bucket("bill_storage1")
blob = bucket.blob(file_name)
blob.download_to_filename("file1.pdf")
But when deployed on GCP (App Engine) I get this error: "line 1282, in download_to_filename with open(filename, "wb") as file_obj: OSError: [Errno 30] Read-only file system: 'file1.pdf'"
Then I modified the code to:
storage_client = storage.Client.from_service_account_json("ebillaae7f0224519.json")
bucket = storage_client.get_bucket("bill_storage1")
blob = bucket.blob(file_name)
blob.download_to_filename("/tmp/file1.pdf")
This works after deployment, but I can't download the files from the bucket to a local folder through my app. How can I do that?
You cannot write or make modifications within Google App Engine's file system after deployment. You can only read files within the app. That is why storage solutions such as Cloud Storage or a database are the recommended way to store files or data.
Google App Engine is serverless. It can scale up quickly because it simply spins up instances based on the container image built from your source code and pre-built runtime. Once the container image is built during deployment, you can no longer write or edit its contents. You will have to redeploy to make changes.
One exception is the directory called /tmp where you can store temporary files. It's because this directory is mounted in the instance's RAM. Storing temporary data consumes RAM. Therefore, when the instance is deleted, files in the /tmp directory are deleted as well.
You may check this documentation for additional information.
You can use the tempfile.gettempdir() function to get the temp directory for your runtime environment.
For example:
import tempfile
from google.cloud import storage

storage_client = storage.Client.from_service_account_json("ebillaae7f0224519.json")
bucket = storage_client.get_bucket("bill_storage1")
blob = bucket.blob(file_name)
blob.download_to_filename("{}/file1.pdf".format(tempfile.gettempdir()))
Locally, the file will be downloaded to your OS's default temporary directory.
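To get a file from the bucket down to the user's machine through the app (the original question), one hedged sketch is to download the blob into the temp directory and then return it from a Flask route with send_file; the route path below is an assumption, while the bucket and credential names mirror the question.
import os
import tempfile

from flask import Flask, send_file
from google.cloud import storage

app = Flask(__name__)

@app.route("/download/<file_name>")  # hypothetical route
def download_blob(file_name):
    storage_client = storage.Client.from_service_account_json("ebillaae7f0224519.json")
    bucket = storage_client.get_bucket("bill_storage1")
    local_path = os.path.join(tempfile.gettempdir(), file_name)
    # Download into the writable temp directory on App Engine...
    bucket.blob(file_name).download_to_filename(local_path)
    # ...then stream it back so the browser saves it on the user's side.
    return send_file(local_path, as_attachment=True)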

Run a Python script on S3 Files

I want to run a Python script over my entire S3 bucket.
The script takes the files and inserts them into a CSV file.
How can I run it on the S3 files the way a local script runs on local files?
Using "python https://s3url/" doesn't work for me.
You can use boto3 to get the list of all the files in the S3 bucket:
import boto3

bucketName = "Your S3 BucketName"

# Create an S3 client
s3 = boto3.client('s3')

for key in s3.list_objects(Bucket=bucketName)['Contents']:
    print(key['Key'])
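If "inserts them into a csv file" means recording the bucket's contents, here is a minimal sketch extending the listing above; the output file name and columns are assumptions.
import csv
import boto3

bucketName = "Your S3 BucketName"
s3 = boto3.client('s3')

# Write one row per S3 object into a local CSV (hypothetical output file).
with open("s3_inventory.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["key", "size_bytes", "last_modified"])
    for obj in s3.list_objects(Bucket=bucketName)['Contents']:
        writer.writerow([obj['Key'], obj['Size'], obj['LastModified']])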
A good idea would be to use boto3. Here is a simple guide on how to use the module.

How to mount S3 bucket as local FileSystem?

I have a Python app running in a Jupyter notebook on AWS. I loaded a C library into my Python code which expects a path to a file.
I would like to access this file from the S3 bucket.
I tried to use s3fs:
import s3fs

s3 = s3fs.S3FileSystem(anon=False)
Using s3.ls('..') lists all my bucket files... this is OK so far. But the library I am using should actually use the s3 variable internally, where I have no access; I can only pass the path to the C library.
Is there a way to mount the S3 bucket so that I don't have to call s3.open(), and can just call open(/path/to/s3), where somewhere hidden the S3 bucket is really mounted as a local filesystem?
I think it should work like the second snippet below, without using s3, because I can't change the library I am using internally to use the s3 variable...
with s3.open("path/to/s3/file", 'w') as f:   # works, but requires the s3 object
    df.to_csv(f)

with open("path/to/s3/file", 'w') as f:      # what I would like to be able to do
    df.to_csv(f)
Or am I doing it completely wrong?
The C library I am using is loaded as a DLL in Python and I call a function:
lib.OpenFile(path/to/s3/file)
I have to pass the path to the S3 file into the library's OpenFile function.
If you're looking to mount the S3 bucket as part of the file system, then use s3fs-fuse
https://github.com/s3fs-fuse/s3fs-fuse
That will make it part of the file system, and the regular file system functions will work as you would expect.
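As a hedged sketch of what that buys you, assuming the bucket has been mounted with s3fs-fuse at a hypothetical mount point /mnt/s3, the C library can then be handed an ordinary-looking local path:
import ctypes

# Hypothetical: the DLL from the question, loaded the usual ctypes way.
lib = ctypes.CDLL("./mylib.dll")

# With the bucket mounted at /mnt/s3, the S3 object looks like a local file,
# so its path can be passed straight to the C function.
lib.OpenFile(b"/mnt/s3/path/to/file")

# Plain Python I/O works against the mount as well, no s3.open() needed.
with open("/mnt/s3/path/to/file", "rb") as f:
    data = f.read()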
If you are targeting Windows, it is possible to use rclone along with WinFsp to mount an S3 bucket as a local filesystem.
The simplified steps are:
rclone config to create a remote
rclone mount remote:bucket * to mount
https://github.com/rclone/rclone
https://rclone.org/
https://github.com/billziss-gh/winfsp
http://www.secfs.net/winfsp/
This might not be completely relevant to this question, but I am certain it will be useful to a lot of users coming here.
