Best way to overwrite Azure Blob in Python

If I try to overwrite an existing blob:
blob_client = BlobClient.from_connection_string(connection_string, container_name, blob_name)
blob_client.upload_blob('Some text')
I get a ResourceExistsError.
I can check if the blob exists, delete it, and then upload it:
try:
    blob_client.get_blob_properties()
    blob_client.delete_blob()
except ResourceNotFoundError:
    pass
blob_client.upload_blob('Some text')
Taking into account both what the python azure blob storage API has available as well as idiomatic python style, is there a better way to overwrite the contents of an existing blob? I was expecting there to be some sort of overwrite parameter that could be optionally set to true in the upload_blob method, but it doesn't appear to exist.

From this issue it seems that you can add overwrite=True to upload_blob and it will work.

If you upload the blob with the same name and pass in the overwrite=True param, then all the contents of that file will be updated in place.
blob_client.upload_blob(data, overwrite=True)
During the update, readers will continue to see the old data by default (until the new data is committed).
I think there is also an option for readers to read the uncommitted data if they wish to.
Below from the docs:
overwrite (bool) – Whether the blob to be uploaded should overwrite
the current data. If True, upload_blob will overwrite the existing
data. If set to False, the operation will fail with
ResourceExistsError. The exception to the above is with Append blob
types: if set to False and the data already exists, an error will not
be raised and the data will be appended to the existing blob. If set
overwrite=True, then the existing append blob will be deleted, and a
new one created. Defaults to False.
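As a rough illustration of the append-blob behavior described above (assuming the same blob_client as in the question, used as an append blob):
from azure.storage.blob import BlobType

# With the default overwrite=False, uploads to an existing append blob
# are appended rather than raising ResourceExistsError.
blob_client.upload_blob(b'first chunk', blob_type=BlobType.AppendBlob)
blob_client.upload_blob(b'second chunk', blob_type=BlobType.AppendBlob)

# With overwrite=True, the existing append blob is deleted and recreated.
blob_client.upload_blob(b'fresh start', blob_type=BlobType.AppendBlob, overwrite=True)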

The accepted answer might work, but is potentially incorrect according to the documentation:
azure.storage.blob.ContainerClient.upload_blob() can take the parameter overwrite=True (see the docs for ContainerClient).
The documentation for azure.storage.blob.BlobClient.upload_blob() does not document an overwrite parameter (see the docs for BlobClient).
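If you prefer to go through ContainerClient, a minimal sketch (reusing the connection_string, container_name, and blob_name from the question) would be:
from azure.storage.blob import ContainerClient

container_client = ContainerClient.from_connection_string(connection_string, container_name)
# upload_blob on the container client takes the blob name plus the data,
# and also accepts overwrite=True to replace existing content.
container_client.upload_blob(blob_name, 'Some text', overwrite=True)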

Related

Copy Blobs from one storage account to another storage account using Python

I'm trying the following code to copy all blobs from one storage account into another, in the same resource group:
src_storage_client = BlobServiceClient.from_connection_string(src_storage['connectionString'])
src_container = src_storage_client.get_container_client(src_storage['containerName'])
dst_storage_client = BlobServiceClient.from_connection_string(dst_storage['connectionString'])
dst_container = dst_storage_client.get_container_client(dst_storage['containerName'])
try:
    for blob in src_container.list_blobs():
        src_blob = BlobClient.from_connection_string(src_storage['connectionString'], src_storage['containerName'], blob.name)
        new_blob = BlobClient.from_connection_string(dst_storage['connectionString'], dst_storage['containerName'], blob.name)
        new_blob.start_copy_from_url(src_blob.url)
I receive the following error:
azure.core.exceptions.ClientAuthenticationError: Operation returned an
invalid status 'Server failed to authenticate the request. Please
refer to the information in the www-authenticate header.'
I tried generate_blob_sas and generate_account_sas to no avail.
I checked out other examples, documentation, and repos, but none matched my case, unless I missed something.
Using that same new_blob instance with the upload method works perfectly. However, I want to copy rather than download-then-upload-again.
The code in your question is correct, and you could use container.get_blob_client(blobName) in the loop instead:
for blob in src_container.list_blobs():
    src_blob = src_container.get_blob_client(blob.name)
    new_blob = dst_container.get_blob_client(blob.name)
    new_blob.start_copy_from_url(src_blob.url)
It's important to check whether src_blob.url or blob.name contains special characters or spaces; if so, they need to be URL-encoded first. The error seems to be caused by such special blobs, so please try with a plain blob first. When you use a SAS token, you should also check the allowed permissions and the expiry time.
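If the source account does not allow anonymous reads, the copy will fail to authenticate unless the source URL carries a SAS token. A minimal sketch, assuming the source account name and key are available (src_account_name and src_account_key here are illustrative names, not from the question):
from datetime import datetime, timedelta
from azure.storage.blob import generate_blob_sas, BlobSasPermissions

for blob in src_container.list_blobs():
    src_blob = src_container.get_blob_client(blob.name)
    new_blob = dst_container.get_blob_client(blob.name)
    # Read-only SAS on the source blob so the destination service can fetch it.
    sas_token = generate_blob_sas(
        account_name=src_account_name,        # assumed to be defined elsewhere
        container_name=src_storage['containerName'],
        blob_name=blob.name,
        account_key=src_account_key,          # assumed to be defined elsewhere
        permission=BlobSasPermissions(read=True),
        expiry=datetime.utcnow() + timedelta(hours=1),
    )
    new_blob.start_copy_from_url(src_blob.url + '?' + sas_token)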

Using python class.instance after creation

I need to move images from several directories into one and capture metadata on the files before and after the move.
In each directory:
Read the index of jpg images from indexfile.csv, including metadata on each image
Upload the corresponding image file to google drive, with metadata
Add entry to uberindex.csv which includes the metadata from indexfile.csv and the file url from google drive after upload
My plan was to create an instance of the class ybpic() (def below) for each row of indexfile.csv and use that instance to identify the actual file to be moved (its reference in the indexfile), hold the metadata from indexfile.csv, and then update that ybpic instance with the results of the Google Drive upload (the other metadata) before finally writing out all of the instances to uberindex.csv.
I know I’m going to kick myself when the answer comes ( real noob ).
I can csv.reader the indexfile.csv into a ybpic instance, but I'm not able to refer to each instance individually to use or update it later.
I can just append the rows from indexfile.csv to indexlist[], and I'm able to return the updated list back to the caller, but I don't know a good way to later update the list row for the corresponding image file with the new metadata.
Here's the ybpic def
class ybpic():
    def __init__(self, FileID, PHOTO, Source, Vintage, Students, Folder, Log):
        self.GOBJ=" "
        self.PicID=" "
        self.FileID=FileID
        self.PHOTO=PHOTO
        self.Source=Source
        self.Students=Students
        self.Vintage=Vintage
        self.MultipleStudents=" "
        self.CurrentTeacher=" "
        self.Folder=Folder ## This may be either the local folder or the drive folder attr
        self.Room=" "
        self.Log=Log ## The source csvfile from which the FileID came
Here is the function populating the instance and list. The indexfile.csv is passed as photolog and cwd is just the working directory:
def ReadIndex(photolog, cwd, indexlist):
    """ Read the CSV log file into an instance of YBPic. """
    with open(photolog, 'r') as indexin:
        readout = csv.reader(indexin)
        for row in readout:
            indexrow = ybpic(row[0], row[1], row[2], row[3], row[4], cwd, photolog)
            indexlist.append(row)  ### THIS WORKS TO APPEND TO THE LIST
                                   ### THAT WAS PASSED TO ReadIndex
    return(indexlist)
Any and all help is greatly appreciated.
Instead of using a list, you could use a dictionary of objects with PhotoID as the key (assuming it's stored in row[0]).
def ReadIndex(photolog, cwd, indexlist):
    """ Read the CSV log file into an instance of YBPic. """
    ybpic_dict = {}
    with open(photolog, 'r') as indexin:
        readout = csv.reader(indexin)
        for row in readout:
            ybpic_dict[row[0]] = ybpic(row[0], row[1], row[2], row[3], row[4], cwd, photolog)
    return ybpic_dict
Then when you need to update the attributes later
ybpic_dict[PhotoID].update(...)
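Note that the ybpic class shown above does not actually define an update() method, so in practice you would either add one or just assign the attributes directly; a small sketch, where drive_url and drive_file_id are placeholder names for the values returned by the Drive upload:
ybpic_dict[PhotoID].GOBJ = drive_url        # placeholder: URL returned by the Drive upload
ybpic_dict[PhotoID].PicID = drive_file_id   # placeholder: file ID returned by the Drive upload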
Okay, since I found the answer myself, no kicks are in order....
Store the ybpic instance objects in a list.
The answer was: in the for loop that creates the instance of ybpic from the row of the indexfile, rather than appending the row's values to the list passed back to the caller, append the actual instance object to that list. Once I'm back in the calling function, I then have access to the objects (instances).
I'm not sure this is the best answer, but it's the one that gets me moving to the next.
New code:
def ReadIndex(photolog, cwd, indexlist):
    """ Read the CSV log file into an instance of YBPic. """
    with open(photolog, 'r') as indexin:
        readout = csv.reader(indexin)
        for row in readout:
            indexrow = ybpic(row[0], row[1], row[2], row[3], row[4], cwd, photolog)
            indexlist.append(indexrow)  ## Store the ybpic indexrow instance
    return(indexlist)
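For completeness, a rough sketch of how the caller might then use the returned instances; the field names come from the ybpic class above, and upload_to_drive is a placeholder for whatever Drive upload routine is actually used:
instances = ReadIndex('indexfile.csv', cwd, [])
for pic in instances:
    # upload_to_drive is a placeholder; it is assumed to return the file's Drive URL.
    pic.GOBJ = upload_to_drive(pic.FileID)

with open('uberindex.csv', 'w', newline='') as out:  # newline='' is for Python 3's csv module
    writer = csv.writer(out)
    for pic in instances:
        writer.writerow([pic.FileID, pic.PHOTO, pic.Source, pic.Vintage, pic.Students, pic.GOBJ])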

Deleting s3 object

I am storing user profile images on S3. When a user changes his profile image, I generate a new S3 key and store the newly returned URL as the user's profile image.
I delete the old key. However, I can still access the previous image via the old URL even though the key has been deleted. The following is my relevant code snippet:
import boto
conn = boto.connect_s3( AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY )
image_bucket = conn.get_bucket( IMAGE_BUCKET )
old_s3_key = user.get_old_key()
image_bucket.delete_key( old_s3_key )
Does S3 take time to remove the URL associated with the key?
It might take a short amount of time before consistency is achieved, but I don't think that would explain this behavior. I don't know what your get_old_key() method is returning, but the delete_key() method expects the name of a key. Is that what you are passing in?
The S3 service does not return an error if you try to delete a key that does not exist in the bucket, so if you are passing the wrong value to delete_key() it will fail silently.
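As a rough way to check (the printed values are illustrative), confirm that you are passing the key's name string and that the key is really gone after the delete; with boto 2, get_key() returns None once the key no longer exists:
old_key_name = user.get_old_key()
print(type(old_key_name), old_key_name)    # should be the key name string, e.g. 'profiles/123.jpg'

image_bucket.delete_key(old_key_name)
print(image_bucket.get_key(old_key_name))  # None once the key is actually deleted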

Downloaded filename with Google App Engine Blobstore

I'm using the Google App Engine Blobstore to store a range of file types (PDF, XLS, etc.) and am trying to find a mechanism by which the original filename of the uploaded file, as stored in blob_info, can be used to name the downloaded file, i.e. so that the user sees 'some_file.pdf' in the save dialogue rather than 'very_long_db_key.pdf'.
I can't see anything in the docs that would allow this:
http://code.google.com/appengine/docs/python/blobstore/overview.html
I've seen hints in other posts that you could use the information in blob_info to set the content-disposition header. Is this the best approach to achieving the desired end?
There is an optional save_as parameter in the send_blob function. By default this is set to False. Setting it to True will cause the file to be treated as an attachment (i.e. it will trigger a 'Save/Open' download dialog) and the user will see the proper filename.
Example:
class ServeHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, resource):
        resource = str(urllib.unquote(resource))
        blob_info = blobstore.BlobInfo.get(resource)
        self.send_blob(blob_info, save_as=True)
It is also possible to overwrite the filename by passing in a string:
self.send_blob(blob_info,save_as='my_file.txt')
If you want some content (such as PDFs) to open rather than save, you could use the content_type to determine the behavior:
blob_info = blobstore.BlobInfo.get(resource)
type = blob_info.content_type
if type == 'application/pdf':
    self.response.headers['Content-Type'] = type
    self.send_blob(blob_info, save_as=False)
else:
    self.send_blob(blob_info, save_as=True)
For future reference, save_as and the BlobstoreDownloadHandler is documented here:
http://code.google.com/appengine/docs/python/tools/webapp/blobstorehandlers.html
It does seem like it should be a bit easier to find. Let's see if it can be improved.
Another option is to append the file name to the end of the download URL. For example:
/files/AMIfv95HJJY3F75v3lz2EeyvWIvGKxEcDagKtyDSgQSPWiMnE0C2iYTUxLZlFHs2XxnV_j1jdWmmKbSVwBj6lYT0-G_w5wENIdPKDULHqa8Q3E_uyeY1gFu02Iiw9xm523Rxk3LJnqHf9n8209t4sPEHhwVOKdDF2A/prezents-list.doc
If you use Jinja2 for templating, you can construct such a URL like this:
{{file.filename}}
then you should adapt your URL mapping accordingly to something like this:
('/files/([^/]+)/?.*', DownloadHandler)
If you have the blob key in the URL, you can ignore the file name in your server-side code.
The benefit of this approach is that content types like images or PDF open directly in the browser, which is convenient for quick viewing. Other content types will just be saved to disk.
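A rough sketch of the handler under that mapping (the class and parameter names are illustrative); only the first path segment, the blob key, is actually used:
class DownloadHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, blob_key):
        # The regex /files/([^/]+)/?.* captures only the blob key;
        # the trailing file name in the URL is ignored on the server side.
        blob_info = blobstore.BlobInfo.get(str(urllib.unquote(blob_key)))
        self.send_blob(blob_info)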
Yes, it is the best approach: just query the BlobInfo object using the given Blobstore key and use its content_type property.

How to change metadata on an object in Amazon S3

If you have already uploaded an object to an Amazon S3 bucket, how do you change the metadata using the API? It is possible to do this in the AWS Management Console, but it is not clear how it could be done programmatically. Specifically, I'm using the boto API in Python, and from reading the source it is clear that using key.set_metadata only works before the object is created, as it only affects a local dictionary.
It appears you need to overwrite the object with itself, using a "PUT Object (Copy)" with an x-amz-metadata-directive: REPLACE header in addition to the metadata. In boto, this can be done like this:
k = k.copy(k.bucket.name, k.name, {'myKey':'myValue'}, preserve_acl=True)
Note that any metadata you do not include in the old dictionary will be dropped. So to preserve old attributes you'll need to do something like:
k.metadata.update({'myKey':'myValue'})
k2 = k.copy(k.bucket.name, k.name, k.metadata, preserve_acl=True)
k2.metadata = k.metadata # boto gives back an object without *any* metadata
k = k2
I almost missed this solution, which is hinted at in the intro to an incorrectly-titled question that's actually about a different problem than this question: Change Content-Disposition of existing S3 object
To set metadata on S3 files, don't provide a different target location; copying the object onto itself (same bucket and key as the source) is enough to set the metadata:
final ObjectMetadata metadata = new ObjectMetadata();
metadata.addUserMetadata(metadataKey, value);
final CopyObjectRequest request = new CopyObjectRequest(bucketName, keyName, bucketName, keyName)
        .withSourceBucketName(bucketName)
        .withSourceKey(keyName)
        .withNewObjectMetadata(metadata);
s3.copyObject(request);
If you want your metadata stored remotely, use set_remote_metadata.
Example:
key.set_remote_metadata({'to_be': 'added'}, ['key', 'to', 'delete'], True)  # third argument is preserve_acl (True or False)
Implementation is here:
https://github.com/boto/boto/blob/66b360449812d857b4ec6a9834a752825e1e7603/boto/s3/key.py#L1875
You can change the metadata without re-uploading the object by using the copy command. See this question: Is it possible to change headers on an S3 object without downloading the entire object?
For the first answer, it's a good idea to include the original content type in the metadata, for example:
key.set_metadata('Content-Type', key.content_type)
In Java, you can copy an object to the same location. Note that the metadata is not copied when copying an object, so you have to get the original's metadata and set it on the copy request. This is the recommended way to insert or update the metadata of an Amazon S3 object:
ObjectMetadata metadata = amazonS3Client.getObjectMetadata(bucketName, fileKey);
ObjectMetadata metadataCopy = new ObjectMetadata();
metadataCopy.addUserMetadata("yourKey", "updateValue");
metadataCopy.addUserMetadata("otherKey", "newValue");
metadataCopy.addUserMetadata("existingKey", metadata.getUserMetaDataOf("existingKey"));

CopyObjectRequest request = new CopyObjectRequest(bucketName, fileKey, bucketName, fileKey)
        .withSourceBucketName(bucketName)
        .withSourceKey(fileKey)
        .withNewObjectMetadata(metadataCopy);

amazonS3Client.copyObject(request);
Here is the code that worked for me. I'm using aws-java-sdk-s3 version 1.10.15:
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentType(fileExtension.getMediaType());

s3Client.putObject(new PutObjectRequest(bucketName, keyName, tempFile)
        .withMetadata(metadata));
