I am looking for suggestions for my program:
One part of my program generates a .csv file that I need to upload to the cloud. Essentially, the program should upload the .csv file and return the URL for that location (csv_url).
Another part of my program has to use that csv_url with wget to download the file.
How can I tackle this? Will uploading the file to an S3 bucket work for me? How would I get back a single URL in that case? Apart from an S3 bucket, is there any other service where I could upload my file? Any suggestion would be very helpful.
Try the boto3 library from Amazon; it has all the S3 operations you would want: GET/POST/PUT/DELETE/LIST.
PUT example:
import boto3

s3 = boto3.resource('s3')

# Upload a new file
with open('test.jpg', 'rb') as data:
    s3.Bucket('my-bucket').put_object(Key='test.jpg', Body=data)
Yes, uploading the file to AWS S3 will definitely work for you, and you need nothing else. If you want to do it in Python, it's quite easy:
import boto3

s3 = boto3.client('s3')
s3.upload_file('images/4.jpeg', 'mausamrest', 'test/jkl.jpeg', ExtraArgs={'ACL': 'public-read'})
where mausamrest is the bucket and test/jkl.jpeg is the key name (you can think of it as the filename in S3).
This is the URL you will get:
https://s3.amazonaws.com/mausamrest/test/jkl.jpeg
The object URL follows the format s3.amazonaws.com/bucketname/keyname.
In my case the image opens in the browser because I uploaded that kind of file; in your case the CSV will get downloaded.
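To tie the upload and the URL together, here is a minimal sketch; the bucket and key names are just the ones from this answer, and the function names are my own. The URL helper simply applies the s3.amazonaws.com/bucketname/keyname format described above:

```python
def object_url(bucket: str, key: str) -> str:
    # Public object URL in the format described above:
    # https://s3.amazonaws.com/<bucket>/<key>
    return f"https://s3.amazonaws.com/{bucket}/{key}"

def upload_csv_and_get_url(local_path: str, bucket: str, key: str) -> str:
    # Upload with a public-read ACL, then return the object's URL.
    import boto3  # imported here so the URL helper above stays dependency-free

    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key, ExtraArgs={"ACL": "public-read"})
    return object_url(bucket, key)
```

The returned string is exactly what the other part of the program would pass to wget.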
Related
I have created the folder with the code below, but how can I access it to write a CSV file into it?
# Creating folder on S3 for unmatched data
import boto3

client = boto3.client('s3')

# Variables
target_bucket = obj['source_and_destination_details']['s3_bucket_name']
subfolder = obj['source_and_destination_details']['s3_bucket_uri-new_folder_path'] + obj['source_and_destination_details']['folder_name_for_unmatched_column_data']

# Create subfolder (a zero-byte object whose key is the folder prefix)
client.put_object(Bucket=target_bucket, Key=subfolder)
The folder is created successfully by the code above, but how do I write a CSV file into it?
Below is the code I have tried, but it is not working:
# Writing csv on AWS S3
df.reindex(idx).to_csv(obj['source_and_destination_details']['s3_bucket_uri-write'] + obj['source_and_destination_details']['folder_name_for_unmatched_column_data'] + obj['source_and_destination_details']['file_name_for_unmatched_column_data'], index=False)
An S3 bucket is not a file system.
I assume that the to_csv() method is supposed to write to some sort of file system, but that is not how it works with S3. While there are solutions to mount S3 buckets as file systems, this is not the preferred way.
Usually, you would interact with S3 via the AWS REST APIs, the AWS CLI, or a client library such as Boto, which you're already using.
So in order to store your content on S3, first create the file locally, e.g. in the system's /tmp folder. Then use Boto's put_object() method to upload the file, and remove it from local storage afterwards.
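As a rough sketch of that flow (the helper names are my own, and the CSV writer uses only the standard library in place of pandas):

```python
import csv
import os
import tempfile

def write_rows_to_tmp_csv(rows):
    # Write rows to a temporary CSV file and return its path.
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path

def upload_then_remove(path, bucket, key):
    # Upload the local file to S3 with put_object(), then delete the local copy.
    import boto3  # imported here so the CSV helper works without boto3 installed

    s3 = boto3.client("s3")
    with open(path, "rb") as f:
        s3.put_object(Bucket=bucket, Key=key, Body=f)
    os.remove(path)
```

With a pandas DataFrame you would call df.to_csv(path) for the first step instead and keep the upload step as-is.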
I'm very new to AWS, and relatively new to python. Please go easy on me.
I want to upload files from a Sharepoint location to an S3 bucket. From there, I'll be able to perform analysis on those files.
The code below uploads a file from a local directory to an example S3 bucket. I'd like to modify this to upload only new files from a Sharepoint location (and not re-upload files that are already in the bucket).
import boto3
BUCKET_NAME = "test_bucket"
s3 = boto3.client("s3")
with open("./burger.jpg", "rb") as f:
    s3.upload_fileobj(f, BUCKET_NAME, "burger_new_upload.jpg", ExtraArgs={"ACL": "public-read"})
Would AWS Lambda (via Python code) be useful here? Thank you for sharing your knowledge.
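One hedged sketch of the "only new files" part: list the keys already in the bucket and skip any local file whose name is already there. The helper names are hypothetical, and for buckets with more than 1000 objects you would need a paginator:

```python
def keys_to_upload(local_names, existing_keys):
    # Pure helper: keep only names not already present as keys in the bucket.
    existing = set(existing_keys)
    return [name for name in local_names if name not in existing]

def upload_new_files(paths, bucket):
    # Upload only files whose names are not yet keys in the bucket.
    import boto3  # imported lazily so keys_to_upload() works without boto3

    s3 = boto3.client("s3")
    resp = s3.list_objects_v2(Bucket=bucket)  # returns at most 1000 keys per call
    existing = [o["Key"] for o in resp.get("Contents", [])]
    by_name = {p.rsplit("/", 1)[-1]: p for p in paths}
    for name in keys_to_upload(by_name, existing):
        with open(by_name[name], "rb") as f:
            s3.upload_fileobj(f, bucket, name)
```

The same skip-if-present check would apply whether the code runs locally or inside a Lambda function.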
I have a URL (https://example.com/myfile.txt) of a file and I want to upload it to my bucket (gs://my-sample-bucket) on Google Cloud Storage.
What I am currently doing is:
Downloading the file to my system using the requests library.
Uploading that file to my bucket using python function.
Is there any way I can upload the file directly from the URL?
You can use the requests library (or urllib.request from the standard library) to get the file over HTTP, then your existing Python code to upload it to Cloud Storage. Something like this should work:
import requests
from google.cloud import storage

client = storage.Client()
datatoupload = requests.get('http://example.com/myfile.txt').content
bucket = client.get_bucket('bucket-id-here')
blob = bucket.blob('myfile.txt')
blob.upload_from_string(datatoupload)
It still downloads the file into memory on your system, but I don't think there's a way to tell Cloud Storage to fetch it for you.
There is a way to do this using a Cloud Storage Transfer job, but depending on your use case it may or may not be worth it. You would need to create a transfer job that transfers a URL list.
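For the URL-list route, the list itself is just a text file whose first line is a TsvHttpData-1.0 header, followed by one URL per line (sketched below; the transfer job itself is then configured in the console or via the Storage Transfer API, and the list must be hosted at a publicly reachable address):

```python
def make_url_list(urls):
    # Build a Storage Transfer Service URL list: a TsvHttpData-1.0 header,
    # then one URL per line.
    return "\n".join(["TsvHttpData-1.0", *urls]) + "\n"
```

The resulting file's address is what you give the transfer job as its "URL list" source.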
I know how to upload a file to an S3 bucket in Python. I am looking for a way to upload data to a file in an S3 bucket directly, so I don't need to save my data to a local file and then upload that file. Any suggestions? Thanks!
AFAIK the standard Object.put() supports this:
import boto3

s3 = boto3.resource('s3')
resp = s3.Object('bucket_name', 'key/key.txt').put(Body=b'data')
Edit: it was pointed out that you might want the client method, which is just put_object() with the kwargs organized differently:
client = boto3.client('s3')
client.put_object(Body=b'data', Bucket='bucket_name', Key='key/key.txt')
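For the common CSV case, you can build the file content in memory and hand the bytes straight to put_object(). A sketch, with made-up helper names; the serializer uses only the standard library:

```python
import csv
import io

def rows_to_csv_bytes(rows):
    # Serialize rows to CSV entirely in memory -- no local file involved.
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode("utf-8")

def put_csv(rows, bucket, key):
    # Upload the in-memory CSV bytes directly as an S3 object.
    import boto3  # imported lazily so the serializer works without boto3

    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=rows_to_csv_bytes(rows))
```

The same pattern works for JSON or any other format you can produce as bytes.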
I'm making a small app to export data from BigQuery to Google Cloud Storage and then copy it into AWS S3, but I'm having trouble finding out how to do it in Python.
I have already written the code in Kotlin (because it was easiest for me and, for reasons outside the scope of my question, we want it to run in Python). In Kotlin the Google SDK lets me get an InputStream from the Blob object, which I can then inject into the Amazon S3 SDK's AmazonS3.putObject(String bucketName, String key, InputStream input, ObjectMetadata metadata).
With the Python SDK it seems I only have the options to download the blob to a file or as a string.
I would like (as I do in Kotlin) to pass some object returned from the Blob object into AmazonS3.putObject(), without having to save the content as a file first.
I am in no way a Python pro, so I might have missed an obvious way of doing this.
I ended up with the following solution: download_to_file() writes the data into a file-like object that the boto3 S3 client can then handle.
This works just fine for smaller files, but since it buffers everything in memory, it could be problematic for larger files.
from io import BytesIO

import boto3
from google.cloud import storage

def copy_data_from_gcs_to_s3(gcs_bucket, gcs_filename, s3_bucket, s3_filename):
    gcs_client = storage.Client(project="my-project")
    bucket = gcs_client.get_bucket(gcs_bucket)
    blob = bucket.blob(gcs_filename)

    # Download the blob into an in-memory buffer
    data = BytesIO()
    blob.download_to_file(data)
    data.seek(0)

    # Upload the buffer to S3
    s3 = boto3.client("s3")
    s3.upload_fileobj(data, s3_bucket, s3_filename)
If anyone knows of an alternative to BytesIO for handling the data (e.g. so I can stream it directly into S3 without buffering it all in memory on the host machine), it would be very much appreciated.
The google-resumable-media library can be used to download the file in chunks from GCS, and smart_open to upload them to S3. This way you don't need to download the whole file into memory. There is also a similar question that addresses this issue: Can you upload to S3 using a stream rather than a local file?