Automatically Upload New Files in SharePoint to S3 with Python

I'm very new to AWS, and relatively new to python. Please go easy on me.
I want to upload files from a Sharepoint location to an S3 bucket. From there, I'll be able to perform analysis on those files.
The code below uploads a file from a local directory to an example S3 bucket. I'd like to modify it to upload only new files from a SharePoint location (and not re-upload files that are already in the bucket).
import boto3
BUCKET_NAME = "test_bucket"
s3 = boto3.client("s3")
with open("./burger.jpg", "rb") as f:
s3.upload_fileobj(f, BUCKET_NAME, "burger_new_upload.jpg", ExtraArgs={"ACL": "public-read"})
Would AWS Lambda (with Python) be a good fit for this? Thank you for sharing your knowledge.
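As a rough illustration of the "only new files" part, here is a minimal sketch that skips any file whose key already exists in the bucket. It assumes the SharePoint files have already been synced to a local folder; the sharepoint_sync path and the key_exists helper are placeholders for illustration, not from the original question, and fetching the files out of SharePoint itself is a separate step.
import os
import boto3
from botocore.exceptions import ClientError

BUCKET_NAME = "test_bucket"
LOCAL_DIR = "./sharepoint_sync"  # hypothetical local copy of the SharePoint library

s3 = boto3.client("s3")

def key_exists(bucket, key):
    # head_object raises a ClientError with code 404 if the key is not present
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "404":
            return False
        raise

for name in os.listdir(LOCAL_DIR):
    if not key_exists(BUCKET_NAME, name):
        with open(os.path.join(LOCAL_DIR, name), "rb") as f:
            s3.upload_fileobj(f, BUCKET_NAME, name)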

Related

How can I access the created folder in S3 to write a CSV file into it?

I have created the folder with the code below, but how can I access the folder to write a CSV file into it?
# Creating a "folder" (zero-byte object) on S3 for unmatched data
import boto3

client = boto3.client('s3')

# Variables
target_bucket = obj['source_and_destination_details']['s3_bucket_name']
subfolder = obj['source_and_destination_details']['s3_bucket_uri-new_folder_path'] + obj['source_and_destination_details']['folder_name_for_unmatched_column_data']

# Create the subfolder object
client.put_object(Bucket=target_bucket, Key=subfolder)
The folder is created successfully by the above code, but how do I write a CSV file into it?
Below is the code I have tried, but it's not working:
# Writing csv on AWS S3
df.reindex(idx).to_csv(obj['source_and_destination_details']['s3_bucket_uri-write'] + obj['source_and_destination_details']['folder_name_for_unmatched_column_data'] + obj['source_and_destination_details']['file_name_for_unmatched_column_data'], index=False)
An S3 bucket is not a file system.
I assume that the to_csv() method is supposed to write to some sort of file system, but this is not the way it works with S3. While there are solutions to mount S3 buckets as file systems, this is not the preferred way.
Usually, you would interact with S3 via the AWS REST APIs, the AWS CLI or a client library such as Boto, which you’re already using.
So in order to store your content on S3, you first create the file locally, e.g. in the system's /tmp folder. Then use Boto's put_object() method to upload the file. Remove it from your local storage afterwards.
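As a concrete illustration of that flow (a minimal sketch that reuses df, idx, target_bucket and subfolder from the question; the local path and the file name unmatched_data.csv are placeholder values):
import os
import boto3

# 1. Write the CSV to local storage first
local_path = "/tmp/unmatched_data.csv"
df.reindex(idx).to_csv(local_path, index=False)

# 2. Upload the local file to S3 under the desired key ("folder" prefix + file name)
client = boto3.client('s3')
with open(local_path, "rb") as f:
    client.put_object(Bucket=target_bucket, Key=subfolder + "unmatched_data.csv", Body=f)

# 3. Clean up the local copy
os.remove(local_path)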

AWS Boto3 upload files issue

I am facing a weird issue.
I am trying to upload a few parquet files from my local PC to an S3 bucket. Below is the script I used.
It ran well for the first file, but as soon as I change the folder and try loading a different file to the same S3 bucket, it doesn't load. The code doesn't fail, but the second file is not visible in the S3 bucket. I have no clue why it's behaving this way.
import boto3

s3 = boto3.resource('s3', aws_access_key_id='*****', aws_secret_access_key='****')
bucket = s3.Bucket(BUCKET)
bucket.upload_file("****.parquet", "****.parquet")
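One way to narrow this down (a minimal sketch, not from the original post; the key name is a placeholder) is to verify the upload right after it runs and list what the bucket actually contains, which usually reveals whether the second file landed under an unexpected key:
import boto3
from botocore.exceptions import ClientError

s3 = boto3.resource('s3')
bucket = s3.Bucket(BUCKET)

key = "second_file.parquet"  # placeholder key name
bucket.upload_file("second_file.parquet", key)

# Verify the object landed where expected
try:
    s3.Object(BUCKET, key).load()  # raises ClientError (404) if the key is missing
    print(f"{key} is present in {BUCKET}")
except ClientError:
    print(f"{key} was NOT found in {BUCKET}")

# List everything currently in the bucket to spot unexpected key names
for obj in bucket.objects.all():
    print(obj.key)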

Reading Data from AWS S3

I have some data in a very particular format (e.g., tdms files generated by NI systems) and I stored them in an S3 bucket. Typically, if the data were stored on my local computer, I would read them in Python with the npTDMS package. But how should I read these tdms files when they are stored in an S3 bucket? One solution is to download the data, for instance to an EC2 instance, and then use npTDMS to read it into Python, but that does not seem like a perfect solution. Is there any way I can read the data similar to reading CSV files from S3?
Some Python packages (such as Pandas) support reading data directly from S3, as it is the most popular location for data. See this question for example on the way to do that with Pandas.
If the package (npTDMS) doesn't support reading directly from S3, you should copy the data to the local disk of the notebook instance.
The simplest way to copy is to run the AWS CLI in a cell in your notebook
!aws s3 cp s3://bucket_name/path_to_your_data/ data/ --recursive
This command will copy all the files under the "folder" in S3 to the local folder data.
You can do a more fine-grained copy, filtering the files and handling other specific requirements, using boto3's richer capabilities. For example:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket')
objs = bucket.objects.filter(Prefix='myprefix')
for obj in objs:
    # ObjectSummary has no download_file(); download through the bucket instead
    bucket.download_file(obj.key, obj.key)
import boto3

s3 = boto3.resource('s3')
bucketname = "your-bucket-name"
filename = "the file you want to read"

# Read the object's contents directly into memory
obj = s3.Object(bucketname, filename)
body = obj.get()['Body'].read()
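If npTDMS accepts file-like objects (which, as far as I know, recent versions do), the bytes fetched above can be wrapped in a BytesIO buffer and read without touching the local disk. A minimal sketch, assuming the object read above is a tdms file:
import io
from nptdms import TdmsFile

# Wrap the bytes fetched from S3 in an in-memory file object
tdms_file = TdmsFile.read(io.BytesIO(body))
for group in tdms_file.groups():
    print(group.name)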
boto3 is the default option; as an alternative, awswrangler provides some nice wrappers.

Run a Python script on S3 Files

I want to run a Python script on my entire S3 bucket.
The script takes the files and inserts them into a CSV file.
How can I run it on the S3 files the way a local script runs on local files?
Using "python https://s3url/" doesn't work for me.
You can use boto3 to get the list of all the files in the S3 bucket:
import boto3

bucketName = "Your S3 BucketName"

# Create an S3 client
s3 = boto3.client('s3')

for key in s3.list_objects(Bucket=bucketName)['Contents']:
    print(key['Key'])
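To go from listing keys to actually feeding the files into a script, each object can be read into memory and handed to the existing logic. A minimal sketch, where process() is a hypothetical placeholder (not from the original post) standing in for whatever the local script does to build the CSV:
import boto3

bucketName = "Your S3 BucketName"
s3 = boto3.client('s3')

def process(name, data):
    # placeholder for the logic that builds the CSV from each file
    print(name, len(data), "bytes")

for key in s3.list_objects(Bucket=bucketName)['Contents']:
    # Fetch each object's contents and pass it to the script
    body = s3.get_object(Bucket=bucketName, Key=key['Key'])['Body'].read()
    process(key['Key'], body)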
A good idea would be to use boto3. Here is a simple guide on how to use the module.

File upload and download using python

I am looking for suggestions for my program:
One part of my program generates a .csv file that I need to upload to the cloud. Essentially, the program should upload the .csv file to the cloud and return the URL for that location (csv_url).
Another part of my program has to use that csv_url with wget to download this file.
How can I tackle this problem? Will uploading the file to an S3 bucket work for me? How do I return a consolidated URL in that case? Apart from an S3 bucket, is there any other medium where I could upload my file? Any suggestion would be very helpful.
Try the boto3 library from Amazon; it has all the functions you would want for S3: GET/POST/PUT/DELETE/LIST.
PUT example:
import boto3
s3 = boto3.resource('s3')

# Upload a new file
data = open('test.jpg', 'rb')
s3.Bucket('my-bucket').put_object(Key='test.jpg', Body=data)
Yes, uploading the file to AWS S3 will definitely work for you, and you need nothing else. If you want to do that with Python, it's quite easy:
import boto3
s3 = boto3.client('s3')
s3.upload_file('images/4.jpeg', 'mausamrest', 'test/jkl.jpeg', ExtraArgs={'ACL': 'public-read'})
where mausamrest is the bucket and test/jkl.jpeg is the key name (you could say the file name in S3).
This is how your URL will look:
https://s3.amazonaws.com/mausamrest/test/jkl.jpeg
The object URL follows the format s3.amazonaws.com/bucketname/keyname.
In my case the image opens in the browser because I made the object public-read; in your case the CSV will get downloaded.
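Putting the two halves of the question together, here is a minimal sketch following the public-read approach above. The bucket name reuses the example value, the CSV key and file names are placeholders, and requests is used in place of wget for the download step:
import boto3
import requests

BUCKET = 'mausamrest'
KEY = 'test/output.csv'  # hypothetical key for the generated CSV

# Upload the CSV and build its public URL
s3 = boto3.client('s3')
s3.upload_file('output.csv', BUCKET, KEY, ExtraArgs={'ACL': 'public-read'})
csv_url = f"https://s3.amazonaws.com/{BUCKET}/{KEY}"

# The other part of the program can now download it from that URL
response = requests.get(csv_url)
with open('downloaded.csv', 'wb') as f:
    f.write(response.content)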
