Deleting S3 object - Python

I am storing user profile images on S3. When a user changes their profile image, I generate a new S3 key and store the newly returned URL as the user's profile image.
I delete the old key. However, I can still access the previous image via the old URL even though the key has been deleted. Here is my relevant code snippet:
import boto
conn = boto.connect_s3( AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY )
image_bucket = conn.get_bucket( IMAGE_BUCKET )
old_s3_key = user.get_old_key()
image_bucket.delete_key( old_s3_key )
Does S3 take time to remove the URL associated with the key?

It might take a short amount of time before consistency is achieved, but I don't think that would explain this behavior. I don't know what your get_old_key() method is returning, but the delete_key() method expects the name of a key. Is that what you are passing in?
The S3 service does not return an error if you try to delete a key that does not exist in the bucket, so if you are passing the wrong value to delete_key() it will fail silently.
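One quick way to check is to look the key up before deleting it. A minimal sketch, reusing image_bucket and user.get_old_key() from the question (the example key name in the comment is hypothetical):
# Sketch: verify that get_old_key() returns the key name as a string
old_key_name = user.get_old_key()
print(repr(old_key_name))  # e.g. 'avatars/123.jpg' (hypothetical name)

# Bucket.get_key() returns None when no object with that name exists,
# whereas delete_key() would silently do nothing in that case
if image_bucket.get_key(old_key_name) is None:
    print("No such key in the bucket - check what get_old_key() returns")
else:
    image_bucket.delete_key(old_key_name)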

Related

How to convert FileStorage object to b2sdk.v2.AbstractUploadSource in Python

I am using Backblaze B2 and b2sdk.v2 in Flask to upload files.
This is the code I tried, using the upload method:
# I am not showing authorization code...
def upload_file(file):
    bucket = b2_api.get_bucket_by_name(bucket_name)
    file = request.files['file']
    bucket.upload(
        upload_source=file,
        file_name=file.filename,
    )
This shows an error like this:
AttributeError: 'SpooledTemporaryFile' object has no attribute 'get_content_length'
I think it's because I am using a FileStorage instance for the upload_source parameter.
I want to know whether I am using the API correctly, and if not, how I should use it.
Thanks
You're correct: you can't use a Flask FileStorage instance as a B2 SDK UploadSource. What you need to do instead is use the upload_bytes method with the file's content:
def upload_file(file):
    bucket = b2_api.get_bucket_by_name(bucket_name)
    file = request.files['file']
    bucket.upload_bytes(
        data_bytes=file.read(),
        file_name=file.filename,
        # ...other parameters...
    )
Note that this reads the entire file into memory. The upload_bytes method may need to restart the upload if something goes wrong (with the network, usually), so the file can't really be streamed straight through into B2.
If you anticipate that your files will not fit into memory, you should look at using create_file_stream to upload the file in chunks.
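If holding the whole upload in memory is a concern, another workaround (a sketch, not necessarily the SDK's only streaming option) is to spool the Flask FileStorage to a temporary file and let upload_local_file read it from disk, which also lets the SDK retry from the file. b2_api and bucket_name are the same objects assumed by the question's authorization code:
import os
import tempfile

def upload_large_file(file):
    """Save the Flask FileStorage to disk, then upload from the path."""
    bucket = b2_api.get_bucket_by_name(bucket_name)
    fd, tmp_path = tempfile.mkstemp()
    os.close(fd)
    try:
        file.save(tmp_path)  # FileStorage.save() writes the request stream to disk
        bucket.upload_local_file(
            local_file=tmp_path,
            file_name=file.filename,
        )
    finally:
        os.remove(tmp_path)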

Copy Blobs from one storage account to another storage account using Python

I'm trying the following code to copy all blobs from one storage account into another, in the same resource group:
src_storage_client = BlobServiceClient.from_connection_string(src_storage['connectionString'])
src_container = src_storage_client.get_container_client(src_storage['containerName'])
dst_storage_client = BlobServiceClient.from_connection_string(dst_storage['connectionString'])
dst_container = dst_storage_client.get_container_client(dst_storage['containerName'])
try:
    for blob in src_container.list_blobs():
        src_blob = BlobClient.from_connection_string(src_storage['connectionString'], src_storage['containerName'], blob.name)
        new_blob = BlobClient.from_connection_string(dst_storage['connectionString'], dst_storage['containerName'], blob.name)
        new_blob.start_copy_from_url(src_blob.url)
I receive the following error:
azure.core.exceptions.ClientAuthenticationError: Operation returned an
invalid status 'Server failed to authenticate the request. Please
refer to the information in the www-authenticate header.'
I tried generate_blob_sas and generate_account_sas to no avail!
I checked out other examples, documentations, and repos but none matched my case, unless I missed something.
Using that same new_blob instance with the upload method works perfectly. However, I want to copy rather than download and re-upload.
The code in your question is correct. You could also use container.get_blob_client(blobName) instead in the loop:
for blob in src_container.list_blobs():
    src_blob = src_container.get_blob_client(blob.name)
    new_blob = dst_container.get_blob_client(blob.name)
    new_blob.start_copy_from_url(src_blob.url)
It's important to check whether src_blob.url or blob.name contains special characters or spaces; if so, they need to be encoded first. The error may be caused by such special blobs, so try with a single plain blob first. If you use a SAS token, also check its allowed permissions and expiry time.
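This ClientAuthenticationError commonly means the destination account cannot read src_blob.url because the source container is private. A minimal sketch of appending a read-only SAS to the source URL, assuming you have the source account name and account key available:
from datetime import datetime, timedelta

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

def copy_container(src_container, dst_container, src_account_name, src_account_key):
    """Copy every blob, granting the destination temporary read access to the source."""
    for blob in src_container.list_blobs():
        src_blob = src_container.get_blob_client(blob.name)
        new_blob = dst_container.get_blob_client(blob.name)
        sas = generate_blob_sas(
            account_name=src_account_name,
            container_name=src_container.container_name,
            blob_name=blob.name,
            account_key=src_account_key,
            permission=BlobSasPermissions(read=True),
            expiry=datetime.utcnow() + timedelta(hours=1),
        )
        new_blob.start_copy_from_url(f"{src_blob.url}?{sas}")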

Best way to overwrite Azure Blob in Python

If I try to overwrite an existing blob:
blob_client = BlobClient.from_connection_string(connection_string, container_name, blob_name)
blob_client.upload_blob('Some text')
I get a ResourceExistsError.
I can check if the blob exists, delete it, and then upload it:
try:
    blob_client.get_blob_properties()
    blob_client.delete_blob()
except ResourceNotFoundError:
    pass
blob_client.upload_blob('Some text')
Taking into account both what the Python Azure Blob Storage API has available and idiomatic Python style, is there a better way to overwrite the contents of an existing blob? I was expecting some sort of overwrite parameter that could optionally be set to True in the upload_blob method, but it doesn't appear to exist.
From this issue it seems that you can add overwrite=True to upload_blob and it will work.
If you upload the blob with the same name and pass in the overwrite=True param, then all the contents of that file will be updated in place.
blob_client.upload_blob(data, overwrite=True)
During the update, readers will continue to see the old data by default (until the new data is committed).
I think there is also an option to read uncommitted data if readers wish to.
Below is from the docs:
overwrite (bool) – Whether the blob to be uploaded should overwrite
the current data. If True, upload_blob will overwrite the existing
data. If set to False, the operation will fail with
ResourceExistsError. The exception to the above is with Append blob
types: if set to False and the data already exists, an error will not
be raised and the data will be appended to the existing blob. If set
overwrite=True, then the existing append blob will be deleted, and a
new one created. Defaults to False.
The accepted answer might work, but is potentially incorrect according to the documentation:
azure.storage.blob.ContainerClient.upload_blob() can take the parameter overwrite=True (see the docs for ContainerClient).
The documentation for azure.storage.blob.BlobClient.upload_blob() does not document an overwrite parameter (see the docs for BlobClient).
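For completeness, a minimal sketch of the ContainerClient route, reusing the connection_string, container_name, and blob_name from the question:
from azure.storage.blob import ContainerClient

container_client = ContainerClient.from_connection_string(connection_string, container_name)
# overwrite=True replaces the blob if it already exists
container_client.upload_blob(name=blob_name, data='Some text', overwrite=True)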

Using python class.instance after creation

I need to move images from several directories into one and capture metadata on the files before and after the move.
In each directory:
Read the index of jpg images from indexfile.csv, including metadata on each image
Upload the corresponding image file to google drive, with metadata
Add entry to uberindex.csv which includes the metadata from indexfile.csv and the file url from google drive after upload
My plan was to create an instance of the class ybpic() (definition below) for each row of indexfile.csv and use that instance to identify the actual file to be moved (its reference in the indexfile) and hold the metadata from indexfile.csv, then update that ybpic instance with the results of the Google Drive upload (the other metadata) before finally writing all of the instances out to uberindex.csv.
I know I’m going to kick myself when the answer comes ( real noob ).
I can csv.reader the indexfile.csv into ybpic instances, but I'm not able to refer to each instance individually to use or update it later.
I can just append the rows from indexfile.csv to indexlist[], and I'm able to return the updated list to the caller, but I don't know a good way to later update the list row for the corresponding image file with the new metadata.
Here's the ybpic class definition:
class ybpic():
    def __init__(self, FileID, PHOTO, Source, Vintage, Students, Folder, Log):
        self.GOBJ = " "
        self.PicID = " "
        self.FileID = FileID
        self.PHOTO = PHOTO
        self.Source = Source
        self.Students = Students
        self.Vintage = Vintage
        self.MultipleStudents = " "
        self.CurrentTeacher = " "
        self.Folder = Folder  ## This may be either the local folder or the drive folder attr
        self.Room = " "
        self.Log = Log  ## The source csvfile from which the FileID came
Here is the function populating the instance and list. The indexfile.csv is passed as photolog and cwd is just the working directory:
def ReadIndex(photolog, cwd, indexlist):
    """ Read the CSV log file into an instance of YBPic. """
    with open(photolog, 'r') as indexin:
        readout = csv.reader(indexin)
        for row in readout:
            indexrow = ybpic(row[0], row[1], row[2], row[3], row[4], cwd, photolog)
            indexlist.append(row)  ### THIS WORKS TO APPEND TO THE LIST
                                   ### THAT WAS PASSED TO ReadIndex
    return indexlist
Any and all help is greatly appreciated.
Instead of using a list, you could use a dictionary of objects with PhotoID as the key (assuming it's stored in row[0]).
def ReadIndex(photolog, cwd, indexlist):
    """ Read the CSV log file into an instance of YBPic. """
    ybpic_dict = {}
    with open(photolog, 'r') as indexin:
        readout = csv.reader(indexin)
        for row in readout:
            ybpic_dict[row[0]] = ybpic(row[0], row[1], row[2], row[3], row[4], cwd, photolog)
    return ybpic_dict
Then, when you need to update the attributes later:
ybpic_dict[PhotoID].update(...)
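Note that the ybpic class as posted has no update() method, so one hypothetical way to record the Drive results later is to assign the attributes directly (drive_file_url and drive_file_id are placeholder names for whatever your upload call returns):
pic = ybpic_dict[photo_id]     # photo_id is the FileID used as the dict key
pic.GOBJ = drive_file_url      # hypothetical: URL returned by the Drive upload
pic.PicID = drive_file_id      # hypothetical: Drive file ID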
Okay, since I found the answer myself, no kicks are in order....
Store the ybpic.instance object in a list.
The answer was: in the for loop that creates the ybpic instance from each row of the indexfile, rather than putting the associated values into a list to pass back to the caller, append the actual instance object to the list that is passed back. Once back in the calling function, I then have access to the objects (instances).
I'm not sure this is the best answer, but it's the one that gets me moving to the next.
New code:
def ReadIndex(photolog, cwd, indexlist):
    """ Read the CSV log file into an instance of YBPic. """
    with open(photolog, 'r') as indexin:
        readout = csv.reader(indexin)
        for row in readout:
            indexrow = ybpic(row[0], row[1], row[2], row[3], row[4], cwd, photolog)
            indexlist.append(indexrow)  ## Store the ybpic instance itself
    return indexlist
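With the instance objects in the list, the caller can then write uberindex.csv from their attributes. A minimal sketch, where the output filename and column order are assumptions:
import csv
import os

indexlist = ReadIndex('indexfile.csv', os.getcwd(), [])
with open('uberindex.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for pic in indexlist:
        writer.writerow([pic.FileID, pic.PHOTO, pic.Source, pic.Vintage,
                         pic.Students, pic.Folder, pic.GOBJ, pic.PicID])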

How to change metadata on an object in Amazon S3

If you have already uploaded an object to an Amazon S3 bucket, how do you change the metadata using the API? It is possible to do this in the AWS Management Console, but it is not clear how it could be done programmatically. Specifically, I'm using the boto API in Python, and from reading the source it is clear that using key.set_metadata only works before the object is created, as it only affects a local dictionary.
It appears you need to overwrite the object with itself, using a "PUT Object (Copy)" with an x-amz-metadata-directive: REPLACE header in addition to the metadata. In boto, this can be done like this:
k = k.copy(k.bucket.name, k.name, {'myKey':'myValue'}, preserve_acl=True)
Note that any metadata you do not include in the old dictionary will be dropped. So to preserve old attributes you'll need to do something like:
k.metadata.update({'myKey': 'myValue'})
k2 = k.copy(k.bucket.name, k.name, k.metadata, preserve_acl=True)
k2.metadata = k.metadata  # boto gives back an object without *any* metadata
k = k2
I almost missed this solution, which is hinted at in the intro to an incorrectly-titled question that's actually about a different problem than this question: Change Content-Disposition of existing S3 object
To set metadata on S3 files, copy the object onto itself; you don't need a different target location, since the source information plus the new metadata is enough:
final ObjectMetadata metadata = new ObjectMetadata();
metadata.addUserMetadata(metadataKey, value);
final CopyObjectRequest request = new CopyObjectRequest(bucketName, keyName, bucketName, keyName)
        .withSourceBucketName(bucketName)
        .withSourceKey(keyName)
        .withNewObjectMetadata(metadata);
s3.copyObject(request);
If you want your metadata stored remotely, use set_remote_metadata.
Example:
key.set_remote_metadata({'to_be': 'added'}, ['key', 'to', 'delete'], True)  # third argument is preserve_acl (True or False)
Implementation is here:
https://github.com/boto/boto/blob/66b360449812d857b4ec6a9834a752825e1e7603/boto/s3/key.py#L1875
You can change the metadata without re-uploading the object by using the copy command. See this question: Is it possible to change headers on an S3 object without downloading the entire object?
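If you are on boto3 rather than the legacy boto used above, the same copy-in-place approach looks roughly like this (a sketch; the bucket and key names are placeholders):
import boto3

s3 = boto3.client('s3')
s3.copy_object(
    Bucket='my-bucket',
    Key='my-key',
    CopySource={'Bucket': 'my-bucket', 'Key': 'my-key'},
    Metadata={'myKey': 'myValue'},
    MetadataDirective='REPLACE',  # replace the metadata instead of copying it
)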
For the first answer it's a good idea to include the original content type in the metadata, for example:
key.set_metadata('Content-Type', key.content_type)
In Java, you can copy the object to the same location. Metadata is not carried over automatically when copying an object, so you have to get the original's metadata and set it on the copy request. This is the recommended way to insert or update metadata on an Amazon S3 object:
ObjectMetadata metadata = amazonS3Client.getObjectMetadata(bucketName, fileKey);
ObjectMetadata metadataCopy = new ObjectMetadata();
metadataCopy.addUserMetadata("yourKey", "updateValue");
metadataCopy.addUserMetadata("otherKey", "newValue");
metadataCopy.addUserMetadata("existingKey", metadata.getUserMetaDataOf("existingKey"));
CopyObjectRequest request = new CopyObjectRequest(bucketName, fileKey, bucketName, fileKey)
        .withSourceBucketName(bucketName)
        .withSourceKey(fileKey)
        .withNewObjectMetadata(metadataCopy);
amazonS3Client.copyObject(request);
Here is the code that worked for me. I'm using aws-java-sdk-s3 version 1.10.15:
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentType(fileExtension.getMediaType());
s3Client.putObject(new PutObjectRequest(bucketName, keyName, tempFile)
        .withMetadata(metadata));
