Given the sample code:
import cloudstorage
from django.shortcuts import render

def list_files(request):
    file_list = []
    try:
        bucket_name = my_bucket
        gcs_list_obj = cloudstorage.listbucket('/' + bucket_name, delimiter="/")
        for item in gcs_list_obj:
            file_list.append(item)
    except Exception, e:
        raise e
    return render(request, 'default.htm', {'file_list': file_list,
                                           'bucket_name': bucket_name})
The expectation was to see a populated list of objects iterated from cloudstorage.listbucket. Instead, Django throws an InternalError with the message
5:
What are the common steps for troubleshooting Cloud Storage buckets in Django?
Did you want to add GCSFileStat objects to file_list (and are you handling them correctly in the template), or just the filenames?
If the latter, then you can append item.filename instead.
After much frustration, it seems the issue was related to my project NOT having a default bucket.
Navigating to:
https://console.developers.google.com/storage/browser/YOUR_PROJECT.appspot.com/
Then uploading a few files / folders seems to have resolved the issue, and the sample code works as expected.
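For reference, here is a minimal sketch of looking the default bucket up programmatically instead of hard-coding it; it assumes the App Engine app_identity API is available in your runtime:
import cloudstorage
from google.appengine.api import app_identity

# The default bucket is normally named "YOUR_PROJECT.appspot.com"
bucket_name = app_identity.get_default_gcs_bucket_name()
file_list = [stat.filename for stat in
             cloudstorage.listbucket('/' + bucket_name, delimiter='/')]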
How do you set content type on a file in a webhosting-enabled S3 account via the Python boto module?
I'm doing:
from boto.s3.connection import S3Connection
from boto.s3.key import Key
from boto.cloudfront import CloudFrontConnection
conn = S3Connection(access_key_id, secret_access_key)
bucket = conn.create_bucket('mybucket')
b = conn.get_bucket(bucket)
b.set_acl('public-read')
fn = 'index.html'
template = '<html>blah</html>'
k = Key(b)
k.key = fn
k.set_contents_from_string(template)
k.set_acl('public-read')
k.set_metadata('Content-Type', 'text/html')
However, when I access it from http://mybucket.s3-website-us-east-1.amazonaws.com/index.html my browser prompts me to download the file instead of simply serving it as a webpage.
Looking at the metadata in the S3 Management console shows the Content-Type has actually been set to "application/octet-stream". If I manually change it in the console, I can access the page normally, but if I run my script again, it resets it back to the wrong content type.
What am I doing wrong?
The set_metadata method is really for setting user metadata on S3 objects. Many of the standard HTTP metadata fields have first class attributes to represent them, e.g. content_type. Also, you want to set the metadata before you actually send the object to S3. Something like this should work:
import boto
conn = boto.connect_s3()
bucket = conn.get_bucket('mybucket') # Assumes bucket already exists
key = bucket.new_key('mykey')
key.content_type = 'text/html'
key.set_contents_from_string(mystring, policy='public-read')
Note that you can set canned ACL policies at the time you write the object to S3 which saves having to make another API call.
For people who need a one-liner for this:
import boto3
s3 = boto3.resource('s3')
s3.Bucket('bucketName').put_object(Key='keyName', Body='content or fileData', ContentType='contentType', ACL='check below')
Supported ACL values:
'private'|'public-read'|'public-read-write'|'authenticated-read'|'aws-exec-read'|'bucket-owner-read'|'bucket-owner-full-control'
The arguments supported by put_object can be found here: https://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.put_object
I wasn't able to get the above solution to actually persist my metadata changes.
Perhaps it was because I was uploading from a file and the content type was being reset from the file's mimetype? I am also uploading m3u8 and ts files for HLS encoding, so that could interfere as well.
Anyway, here's what worked for me.
import boto
from boto.s3.key import Key

conn = boto.connect_s3()
bucket = conn.get_bucket('mybucket')
key_m3u8 = Key(bucket)
key_m3u8.key = s3folder + "/" + s3keyname
key_m3u8.metadata = {"Content-Type": "application/x-mpegURL", "Cache-Control": "public,max-age=8"}
key_m3u8.set_contents_from_filename("path_to_my_file", policy="public-read")
If you use the AWS S3 Bitbucket Pipelines Python script, add the content_type parameter:
s3_upload.py
def upload_to_s3(bucket, artefact, bucket_key, content_type):
    ...

def main():
    ...
    parser.add_argument("content_type", help="Content Type File")
    ...
    if not upload_to_s3(args.bucket, args.artefact, args.bucket_key, args.content_type):
then modify bitbucket-pipelines.yml as follows:
...
- python s3_upload.py bucket_name file key content_type
...
where the content_type parameter can be any of the MIME types (IANA media types).
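For illustration, here is a minimal sketch of how the content_type argument could be passed through to boto3 inside upload_to_s3; the function signature just mirrors the snippet above, and the boto3 client usage is an assumption about how the script uploads:
import boto3

def upload_to_s3(bucket, artefact, bucket_key, content_type):
    client = boto3.client('s3')
    try:
        # ExtraArgs sets the Content-Type stored on the S3 object
        client.upload_file(artefact, bucket, bucket_key,
                           ExtraArgs={'ContentType': content_type})
        return True
    except Exception as e:
        print("Upload failed: %s" % e)
        return False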
Is there a python equivalent to the getPublicUrl PHP method?
$public_url = CloudStorageTools::getPublicUrl("gs://my_bucket/some_file.txt", true);
I am storing some files using the Google Cloud Client Library for Python, and I'm trying to figure out a way of programmatically getting the public URL of the files I am storing.
Please refer to https://cloud.google.com/storage/docs/reference-uris on how to build URLs.
For public URLs, there are two formats:
http(s)://storage.googleapis.com/[bucket]/[object]
or
http(s)://[bucket].storage.googleapis.com/[object]
Example:
bucket = 'my_bucket'
file = 'some_file.txt'
gcs_url = 'https://%(bucket)s.storage.googleapis.com/%(file)s' % {'bucket':bucket, 'file':file}
print gcs_url
Will output this:
https://my_bucket.storage.googleapis.com/some_file.txt
You need to use get_serving_url from the Images API. As that page explains, you need to call create_gs_key() first to get the key to pass to the Images API.
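A minimal sketch of that combination for an image stored in GCS (the bucket and file names are placeholders), assuming an App Engine runtime with the Blobstore and Images APIs enabled:
from google.appengine.ext import blobstore
from google.appengine.api import images

# Build a Blobstore-compatible key for the GCS object, then ask the
# Images API for a public serving URL (this works for image files).
gs_key = blobstore.create_gs_key('/gs/my_bucket/some_image.png')
serving_url = images.get_serving_url(gs_key)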
Daniel, Isaac - Thank you both.
It looks to me like Google deliberately steers you away from serving directly from GCS (bandwidth reasons? I don't know). So the two alternatives, according to the docs, are using either the Blobstore or the Images service (for images).
What I ended up doing is serving the files with blobstore over GCS.
To get the blobstore key from a GCS path, I used:
blobKey = blobstore.create_gs_key('/gs' + gcs_filename)
Then, I exposed this URL on the server -
Main.py:
app = webapp2.WSGIApplication([
    ...
    ('/blobstore/serve', scripts.FileServer.GCSServingHandler),
    ...
])
FileServer.py:
class GCSServingHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self):
        blob_key = self.request.get('id')
        if len(blob_key) > 0:
            self.send_blob(blob_key)
        else:
            self.response.write('no id given')
It's not available, but I've filed a bug. In the meantime, try this:
import urlparse

def GetGsPublicUrl(gsUrl, secure=True):
    u = urlparse.urlsplit(gsUrl)
    if u.scheme == 'gs':
        return urlparse.urlunsplit((
            'https' if secure else 'http',
            '%s.storage.googleapis.com' % u.netloc,
            u.path, '', ''))
For example:
>>> GetGsPublicUrl('gs://foo/bar.tgz')
'https://foo.storage.googleapis.com/bar.tgz'
I created a folder in S3 named "test" and pushed "test_1.jpg" and "test_2.jpg" into it.
How can I use boto to delete folder "test"?
Here is the 2018 (almost 2019) version:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket')
bucket.objects.filter(Prefix="myprefix/").delete()
There are no folders in S3. Instead, the keys form a flat namespace. However, a key with slashes in its name is displayed specially in some programs, including the AWS console (see for example Amazon S3 boto - how to create a folder?).
Instead of deleting "a directory", you can (and have to) list files by prefix and delete them. In essence:
for key in bucket.list(prefix='your/directory/'):
    key.delete()
However, the other accomplished answers on this page feature more efficient approaches.
Notice that the prefix is matched with a plain string search. If the prefix were your/directory, that is, without the trailing slash appended, the program would also happily delete your/directory-that-you-wanted-to-remove-is-definitely-not-this-one.
For more information, see S3 boto list keys sometimes returns directory key.
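A minimal guard against that pitfall, using the same bucket.list() call as above (the prefix is just a placeholder):
prefix = 'your/directory'
if not prefix.endswith('/'):
    prefix += '/'  # avoid also matching 'your/directory-something-else'
for key in bucket.list(prefix=prefix):
    key.delete()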
I feel that it's been a while and boto3 has a few different ways of accomplishing this goal. This assumes you want to delete the test "folder" and all of its objects. Here is one way:
import boto3

s3 = boto3.resource('s3')
objects_to_delete = s3.meta.client.list_objects(Bucket="MyBucket", Prefix="myfolder/test/")
delete_keys = {'Objects': []}
delete_keys['Objects'] = [{'Key': obj['Key']} for obj in objects_to_delete.get('Contents', [])]
s3.meta.client.delete_objects(Bucket="MyBucket", Delete=delete_keys)
This should make two requests, one to fetch the objects in the folder, the second to delete all objects in said folder.
https://boto3.readthedocs.org/en/latest/reference/services/s3.html#S3.Client.delete_objects
A slight improvement on Patrick's solution. As you might know, both list_objects() and delete_objects() have an object limit of 1000, so you have to paginate the listing and delete in chunks. This is pretty universal, and you can pass Prefix to paginator.paginate() to delete subdirectories/paths.
import boto3

client = boto3.client('s3', **credentials)
paginator = client.get_paginator('list_objects_v2')
pages = paginator.paginate(Bucket=bucket_name)

delete_us = dict(Objects=[])
for item in pages.search('Contents'):
    delete_us['Objects'].append(dict(Key=item['Key']))

    # flush once the AWS limit is reached
    if len(delete_us['Objects']) >= 1000:
        client.delete_objects(Bucket=bucket_name, Delete=delete_us)
        delete_us = dict(Objects=[])

# flush the rest
if len(delete_us['Objects']):
    client.delete_objects(Bucket=bucket_name, Delete=delete_us)
You can use bucket.delete_keys() with a list of keys (with a large number of keys I found this to be an order of magnitude faster than using key.delete()).
Something like this:
delete_key_list = []
for key in bucket.list(prefix='/your/directory/'):
    delete_key_list.append(key)
    if len(delete_key_list) > 100:
        bucket.delete_keys(delete_key_list)
        delete_key_list = []

if len(delete_key_list) > 0:
    bucket.delete_keys(delete_key_list)
If versioning is enabled on the S3 bucket:
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('mybucket')
bucket.object_versions.filter(Prefix="myprefix/").delete()
If you need to filter by object contents, as I did, the following is a blueprint for your logic:
import boto3

def get_s3_objects_batches(s3, **base_kwargs):
    kwargs = dict(MaxKeys=1000, **base_kwargs)
    while True:
        response = s3.list_objects_v2(**kwargs)
        # to yield each and every file: yield from response.get('Contents', [])
        yield response.get('Contents', [])
        if not response.get('IsTruncated'):  # at the end of the list?
            break
        continuation_token = response.get('NextContinuationToken')
        kwargs['ContinuationToken'] = continuation_token

def your_filter(b):
    raise NotImplementedError()

session = boto3.session.Session(profile_name=profile_name)
s3client = session.client('s3')
for batch in get_s3_objects_batches(s3client, Bucket=bucket_name, Prefix=prefix):
    to_delete = [{'Key': obj['Key']} for obj in batch if your_filter(obj)]
    if to_delete:
        s3client.delete_objects(Bucket=bucket_name, Delete={'Objects': to_delete})
Deleting files inside a folder in S3 using boto3:
import logging
import boto3

logger = logging.getLogger(__name__)

def delete_from_minio():
    """
    This function is used to delete files or folders inside another folder.
    """
    try:
        logger.info("Deleting from minio")
        aws_access_key_id = 'Your_aws_access_key'
        aws_secret_access_key = 'Your_aws_secret_key'
        host = 'your_aws_endpoint'
        s3 = boto3.resource('s3',
                            aws_access_key_id=aws_access_key_id,
                            aws_secret_access_key=aws_secret_access_key,
                            config=boto3.session.Config(signature_version='your_version'),
                            region_name="your_region",
                            endpoint_url=host,
                            verify=False)
        bucket = s3.Bucket('Your_bucket_name')
        for obj in bucket.objects.filter(Prefix='Directory/Sub_Directory'):
            s3.Object(bucket.name, obj.key).delete()
    except Exception as e:
        print(f"Error occurred while deleting from S3: {str(e)}")
Hope this helps :)
I use xmlrpclib and wsapi4plone to connect to Plone:
client = xmlrpclib.ServerProxy('http://user:password@blah.com/plone')
Is there a method to check whether a folder exists in Plone by its URL? Something like: client.exists('/sites/ng/path/to/folder')
What I did is a bit of cheating:
try:
    client.get_types('/sites/ng/path/to/folder')
except:
    # if there's an exception, that means there's no folder -> create it here
    client.post_object(folder)
I don't have admin rights, so I can't look at the methods list (which I was told is somewhere on the Plone site, but I would need to be an admin to see it). I don't want to keep asking questions on here about which methods are available; is there a list of Plone's methods anywhere on the web?
A fast solution is to query the catalog, like this:
client = xmlrpclib.ServerProxy('http://user:password@blah.com/plone')
completePath = '/'.join(client.getPhysicalPath()) + '/sites/ng/path/to/folder'
if len(client.portal_catalog.searchResults(path=completePath)):
    return True
Another solution could be to traverse the folders structure like this:
client = xmlrpclib.ServerProxy('http://user:password@blah.com/plone')
path = '/sites/ng/path/to/folder'
subdirs = path.split('/')[1:]
dir = client
for subdir in subdirs:
    if subdir in dir.objectIds():
        dir = dir[subdir]
    else:
        return False
return True
edit:
I have to amend my answer. I tried to interact with portal_catalog via XML-RPC, and it's actually not so easy. My two options are good, but not for use via XML-RPC. So, taking transmogrify.ploneremote as an example, a simple option (not very different from your implementation) for checking whether a remote folder exists is this:
try:
    path = 'http://user:password@blah.com/plone/sites/ng/path/to/folder'
    xmlrpclib.ServerProxy(path).getPhysicalPath()
    return True
except xmlrpclib.Fault, e:
    return False
I tried to use the jQuery plugin "uploadify" to upload multiple files to my app on Google App Engine and then save them with the Blobstore, but it failed. I traced the code into get_uploads; it seems field.type_options is empty and, of course, does not have 'blob-key'. Where does the key 'blob-key' come from?
The code looks like this:
def upload(request):
    for blob in blogstorehelper.get_uploads(request, 'Filedata'):
        file = File()
        file.blobref = blob
        file.save()
    return ……
But blogstorehelper.get_uploads(request, 'Filedata') is always empty. In fact, the request does contain the uploaded file (I printed the request). I debugged into blogstorehelper.get_uploads and found that field.type_options is empty. Can anyone tell me why? Thank you! Here is the source for get_uploads: http://appengine-cookbook.appspot.com/recipe/blobstore-get_uploads-helper-function-for-django-request/?id=ahJhcHBlbmdpbmUtY29va2Jvb2tyjwELEgtSZWNpcGVJbmRleCI4YWhKaGNIQmxibWRwYm1VdFkyOXZhMkp2YjJ0eUZBc1NDRU5oZEdWbmIzSjVJZ1pFYW1GdVoyOE0MCxIGUmVjaXBlIjphaEpoY0hCbGJtZHBibVV0WTI5dmEySnZiMnR5RkFzU0NFTmhkR1ZuYjNKNUlnWkVhbUZ1WjI4TTIxDA
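For what it's worth, the 'blob-key' type option is only added when the POST goes to a Blobstore upload URL, so a sketch like the following (with a hypothetical '/upload' path mapped to the view above) may be the missing piece for uploadify's target URL:
from google.appengine.ext import blobstore

# Blobstore only rewrites the upload (and adds the blob-key field option)
# when the form posts to a URL generated by create_upload_url().
upload_url = blobstore.create_upload_url('/upload')
# Point uploadify (or a plain form) at upload_url instead of the view's URL.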