I want to read the size of a file before uploading it to S3 in order to check whether there is enough storage left. The following code works; however, the file is empty when it is uploaded to S3. If I delete the part that checks the size of the file, it is uploaded properly. Is there another way to get the file size? The file comes from an upload form on an HTML page and I'm uploading it directly to S3 without saving it to the server first.
availablestorage = getavailablestorage()  # gets the available storage in bytes
latestfile = request.files['filetoupload']  # get the file from the HTML form
latestfile.seek(0, 2)  # move to the end of the stream
latestsize = latestfile.tell()  # this gets the size of the file
if availablestorage < latestsize:
    return "No space available. Delete files."
bucketname = request.form.get('spaceforupload')
conn = boto3.client('s3')
conn.upload_fileobj(latestfile, bucketname, latestfile.filename)
return redirect(url_for('showspace', spacename=bucketname))
Of course: you just seeked to the end to get the size, so the latestfile handle's current position is now the end of the file.
Just do:
latestfile.seek(0)
before running conn.upload_fileobj. That should work.
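Since the question also asks whether there is another way to get the size, here is a minimal sketch of a small helper (not from the original answer) that measures a file-like object and restores whatever position the stream was at, so the subsequent upload is unaffected:

def filesize(fileobj):
    """Return the size in bytes of a file-like object without losing its position."""
    current = fileobj.tell()  # remember where the caller left the stream
    fileobj.seek(0, 2)        # jump to the end of the stream
    size = fileobj.tell()     # the end position equals the size in bytes
    fileobj.seek(current)     # restore the original position
    return size

With this, latestsize = filesize(latestfile) can replace the seek/tell pair in the question, and upload_fileobj will still read the file from its original position.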
I was trying to open a file/image in Python/Django and upload it to S3, but I get different errors depending on what I try. I can get it to work when I send the image using the front-end HTML form, but not when opening the file on the back end; I get errors such as "'bytes' object has no attribute 'file'". Any ideas how to open an image and upload it to S3? I wasn't sure if I was using the correct upload function, but it worked when I received the file from an HTML form instead of opening it directly.
image = open(fileURL, encoding="utf-8")
S3_BUCKET = settings.AWS_BUCKET
session = boto3.Session(
    aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
)
s3 = session.resource('s3')
s3.Bucket(S3_BUCKET).put_object(Key='folder/%s' % fileName, Body=image)
Thanks.
The open command returns a file object, so Body=image does not contain the actual contents of the object.
Since you want to upload an existing object, you could use:
Key = 'folder/' + fileName
s3.Object(S3_BUCKET, Key).upload_file(fileURL)
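For context, here is a fuller sketch under the same assumptions as the question (the settings.AWS_* values, with fileURL and fileName normally coming from the surrounding view; the values below are placeholders):

import boto3
from django.conf import settings

# Placeholder values; in the question these come from the surrounding view code
fileURL = '/path/to/image.png'
fileName = 'image.png'

S3_BUCKET = settings.AWS_BUCKET
session = boto3.Session(
    aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
)
s3 = session.resource('s3')

# upload_file reads the file from disk itself, so there is no need to open() it first
s3.Object(S3_BUCKET, 'folder/%s' % fileName).upload_file(fileURL)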
I am building files for download by exporting them from BigQuery into Google Cloud Storage. Some of these files are relatively large, so they have to be composed together after the BigQuery export shards them.
The user inputs a file name for the file that will be generated on the front-end.
On the backend, I am generating a random temporary name so that I can then compose the files together.
# Job configuration for the query job that writes the results to the destination table
job_config = bigquery.QueryJobConfig()
job_config.destination = table_ref
query_job = client.query(sql, job_config=job_config)
query_job.result()
# Generate a temporary name for the file
fname = "".join(random.choices(string.ascii_letters, k=10))
# Generate Destination URI
destinationURI = info["bucket_uri"].format(filename=f"{fname}*.csv")
extract_job = client.extract_table(table_ref, destination_uris=destinationURI, location="US")
extract_job.result()
# Delete the temporary table, if it doesn't exist ignore it
client.delete_table(f"{info['did']}.{info['tmpid']}", not_found_ok=True)
After the data export has completed, I unshard the files by composing the blobs together.
client = storage.Client(project=info["pid"])
bucket = client.bucket(info['bucket_name'])
all_blobs = list(bucket.list_blobs(prefix=fname))
blob_initial = all_blobs.pop(0)

prev_ind = 0
for i in range(31, len(all_blobs), 31):
    # Compose files in chunks of 32 blobs (GCS limit)
    blob_initial.compose([blob_initial, *all_blobs[prev_ind:i]])
    # Prevent GCS rate-limit errors for files exceeding ~100 GB
    time.sleep(1.0)
    prev_ind = i
else:
    # Compose all remaining files when fewer than 32 are left
    blob_initial.compose([blob_initial, *all_blobs[prev_ind:]])

for b in all_blobs:
    # Delete the sharded files
    b.delete()
After all the files have been composed into one file, I rename the blob to the user-provided filename. Then I generate a signed URL, which gets posted to Firebase so the front-end can offer the file for download.
# Rename the file to the user provided filename
bucket.rename_blob(blob_initial, data["filename"])
# Generate signed url to post to firebase
download_url = blob_initial.generate_signed_url(datetime.now() + timedelta(days=10000))
The issue I am encountering occurs because of the random filename used when the files are sharded. The reason I chose a random filename instead of the user-provided filename is that multiple users may submit requests using the same (default value) filenames, and those edge cases would cause issues with the file sharding.
When I try to download the file, I get the following response:
<Error>
<Code>NoSuchKey</Code>
<Message>The specified key does not exist.</Message>
<Details>No such object: project-id.appspot.com/YMHprgIqMe000000000000.csv</Details>
</Error>
Although I renamed the file, it seems that the download URL is still using the old file name.
Is there a way to inform GCS that the filename has changed when I generate the signed URL?
It appears as though all that was needed was to reload the blob!
bucket.rename_blob(blob_initial, data["filename"])
blob = bucket.get_blob(data["filename"])  # re-fetch the blob under its new name
# Generate signed url to post to Firebase
download_url = blob.generate_signed_url(datetime.now() + timedelta(days=10000))
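As a side note, google-cloud-storage's rename_blob returns the blob under its new name, so a slightly shorter variant (a sketch under the same assumptions as the code above) avoids the extra lookup:

# rename_blob returns the renamed blob, so the extra get_blob round trip can be skipped
renamed = bucket.rename_blob(blob_initial, data["filename"])
download_url = renamed.generate_signed_url(datetime.now() + timedelta(days=10000))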
I am trying to upload a file into a folder in Box using the code below:
folder_id = '22222'
new_file = client.folder(folder_id).upload('/home/me/document.pdf')
print('File "{0}" uploaded to Box with file ID {1}'.format(new_file.name, new_file.id))
This code is not replacing the existing document.pdf in the Box folder; rather, it keeps the older version of the file. I would like to replace the file in the target folder with the latest file. How can I achieve this?
Since your goal is to replace the original file, you can overwrite its existing contents. Here is an example; note that you will need to check whether the filename is already present in the Box folder.
folder_id = '22222'
file_path = '/home/me/document.pdf'

results = client.search().query(query='document', limit=1, ancestor_folder_ids=[folder_id], type='file', file_extensions=['pdf'])
file_id = None
for item in results:
    file_id = item.id

if file_id:
    updated_file = client.file(file_id).update_contents(file_path)
    print('File "{0}" has been updated'.format(updated_file.name))
else:
    new_file = client.folder(folder_id).upload(file_path)
    print('File "{0}" uploaded to Box with file ID {1}'.format(new_file.name, new_file.id))
It's not replacing the file because every upload creates a new file with a new ID, so the old file is never replaced.
This is what I found in the official docs.
Try giving it an explicit name and see if that works.
upload
Upload a file to the folder. The contents are taken from the given file path, and the file will have the given name. If file_name is not specified, the uploaded file takes its name from file_path.
Parameters:
file_path (unicode) – The file path of the file to upload to Box.
file_name (unicode) – The name to give the file on Box. If None, the leaf name of file_path is used.
preflight_check (bool) – If specified, a preflight check will be performed before actually uploading the file.
preflight_expected_size (int) – The size of the file to be uploaded in bytes, used for the preflight check. The default value is 0, which means the file size is unknown.
upload_using_accelerator (bool) – If specified, the upload will try to use Box Accelerator to speed up uploads of big files. It makes an extra API call before the actual upload to get the Accelerator upload URL, and then makes a POST request to that URL instead of the default Box upload URL. It falls back to the normal upload endpoint if it cannot get the Accelerator URL. Note that this is a premium feature, which might not be available to your app.
Returns:
The newly uploaded file.
Return type:
File
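For illustration, here is a minimal sketch of a call that passes the documented file_name and preflight_check parameters explicitly (folder ID and path reuse the question's example values):

folder_id = '22222'
file_path = '/home/me/document.pdf'

# Upload with an explicit name on Box and a preflight check before sending the bytes
new_file = client.folder(folder_id).upload(
    file_path,
    file_name='document.pdf',
    preflight_check=True,
)
print('File "{0}" uploaded to Box with file ID {1}'.format(new_file.name, new_file.id))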
I have a Flask view that generates data and saves it as a CSV file with Pandas, then displays the data. A second view serves the generated file. I want to remove the file after it is downloaded. My current code raises a permission error, maybe because after_request deletes the file before it is served with send_from_directory. How can I delete a file after serving it?
def process_data(data):
    tempname = str(uuid4()) + '.csv'
    data['text'].to_csv('samo/static/temp/{}'.format(tempname))
    return tempname  # hand back the generated filename so the download view can use it


@projects.route('/getcsv/<file>')
def getcsv(file):
    @after_this_request
    def cleanup(response):
        os.remove('samo/static/temp/' + file)
        return response

    return send_from_directory(directory=cwd + '/samo/static/temp/', filename=file, as_attachment=True)
after_request runs after the view returns but before the response is sent. Sending a file may use a streaming response; if you delete it before it's read fully you can run into errors.
This is mostly an issue on Windows; other platforms can mark a file as deleted and keep it around until it is no longer being accessed. However, it may still be useful to only delete the file once you're sure it's been sent, regardless of platform.
Read the file into memory and serve it, so that it's not still being read when you delete it later. If the file is too big to read into memory, use a generator to serve it and then delete it.
@app.route('/download_and_remove/<filename>')
def download_and_remove(filename):
    path = os.path.join(current_app.instance_path, filename)

    def generate():
        with open(path) as f:
            yield from f

        os.remove(path)

    r = current_app.response_class(generate(), mimetype='text/csv')
    r.headers.set('Content-Disposition', 'attachment', filename='data.csv')
    return r
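For the smaller-file case mentioned above, here is a sketch that reads the whole file into memory before deleting it (the route name and the data.csv download name are illustrative, and download_name requires Flask 2.0 or newer; older versions call it attachment_filename):

import os
from io import BytesIO

from flask import current_app, send_file

@app.route('/download_in_memory/<filename>')
def download_in_memory(filename):
    path = os.path.join(current_app.instance_path, filename)

    # Read everything up front so nothing is reading the file when it is removed
    with open(path, 'rb') as f:
        data = BytesIO(f.read())
    os.remove(path)

    return send_file(data, mimetype='text/csv', as_attachment=True, download_name='data.csv')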
In my Flask application, I am using a function to upload files to Amazon S3 using Boto.
It works fine in most cases, but sometimes it uploads files as zero-byte files with no extension.
Why is it failing sometimes?
I am validating the user's image file in the form:
FileField('Your photo', validators=[FileAllowed(['jpg', 'png'], 'Images only!')])
My image upload function:
def upload_image_to_s3(image_from_form):
    # Upload pic to Amazon
    source_file_name_photo = secure_filename(image_from_form.filename)
    source_extension = os.path.splitext(source_file_name_photo)[1]
    destination_file_name_photo = uuid4().hex + source_extension
    s3_file_name = destination_file_name_photo

    # Connect to S3 and upload the file.
    conn = boto.connect_s3('ASJHjgjkhSDJJHKJKLSDH', 'GKLJHASDJGFAKSJDGJHASDKJKJHbbvhjcKJHSD')
    b = conn.get_bucket('mybucket')

    sml = b.new_key("/".join(["myfolder", destination_file_name_photo]))
    sml.set_contents_from_string(image_from_form.read())

    acl = 'public-read'
    sml.set_acl(acl)

    return s3_file_name
How large are your assets? If an upload is too large, you may have to multipart/chunk it, otherwise it will time out.
bucketObject.initiate_multipart_upload('/local/object/as/file.ext')
This means you will not be using set_contents_from_string, but rather store and upload in parts. You may have to use something to chunk the file, like FileChunkIO.
An example is here if this applies to you : http://www.bogotobogo.com/DevOps/AWS/aws_S3_uploading_large_file.php
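If that is the case, here is a rough sketch of a boto (v2) multipart upload with FileChunkIO, along the lines of that example (bucket name, path, and chunk size are illustrative):

import math
import os

import boto
from filechunkio import FileChunkIO

conn = boto.connect_s3()  # credentials from the environment or a config file, not hard-coded
bucket = conn.get_bucket('mybucket')

source_path = '/local/object/as/file.ext'
source_size = os.stat(source_path).st_size

# Start the multipart upload, send 50 MB parts, then stitch them together
mp = bucket.initiate_multipart_upload(os.path.basename(source_path))
chunk_size = 52428800
chunk_count = int(math.ceil(source_size / float(chunk_size)))

for i in range(chunk_count):
    offset = chunk_size * i
    num_bytes = min(chunk_size, source_size - offset)
    with FileChunkIO(source_path, 'r', offset=offset, bytes=num_bytes) as fp:
        mp.upload_part_from_file(fp, part_num=i + 1)

mp.complete_upload()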
Also, you may want to edit your post above and alter your AWS keys.