TransformationError Reasons get_serving_url / Images API - python

I'm running into the following error using get_serving_url in order to serve images from my bucket.
Traceback (most recent call last):
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.5.2/webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 1087, in synctasklet_wrapper
return taskletfunc(*args, **kwds).get_result()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 1057, in tasklet_wrapper
result = func(*args, **kwds)
File "/base/data/home/apps/e~tileserve20171207t210519.406056180994857717/blob_upload.py", line 70, in post
bf.put_async()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 3473, in _put_async
self._pre_put_hook()
File "/base/data/home/apps/e~tileserve/20171207t210519.406056180994857717/blob_files.py", line 124, in _pre_put_hook
print images.get_serving_url(None, filename='/gs' + '/tileserve.appspot.com/user2test4test4RGB20170927.tif')
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/images/__init__.py", line 1868, in get_serving_url
return rpc.get_result()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
return self.__get_result_hook(self)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/images/__init__.py", line 1972, in get_serving_url_hook
raise _ToImagesError(e, readable_blob_key)
TransformationError
When I upload an image to my bucket it works, but when I create an image through processing and then try to expose it through get_serving_url I get TransformationError.
I tried two variants for serving images:
test = blobstore.create_gs_key('/gs' + '/path2object')
images.get_serving_url(test, secure_url=True)
images.get_serving_url(None, filename='/gs' + '/' + <bucket name> + '/' + <object name>)
I also set the bucket/object ACL permissions and gave the App Engine app default service account the Storage Admin role in IAM. Neither change made a difference, but both are needed in order to access objects in a bucket.
Has anybody experienced this issue? What could the cause be? I do not understand why it works for images I upload but not for images that are generated through processing.

The traceback suggests you may be trying to call images.get_serving_url() while an async operation may still be in progress.
If that operation is the one saving the transformed image to GCS, it could explain the failure: get_serving_url() checks that the file is a valid image, and that check would fail with TransformationError if the file is not yet fully saved.
If so, you can fix the problem by either:
switching to the sync version of saving the image (a sketch of this option follows after this list)
waiting until saving the image completes (in case you have some other work to do in the meantime) - get the result of the async call before invoking get_serving_url()
repeatedly calling get_serving_url() while catching TransformationError until it no longer fails that way. This is a bit risky, as it can end up in an infinite loop if TransformationError is raised for a reason other than the image save simply being incomplete.
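For illustration, a minimal sketch of the first option, assuming the transformed image is written with the App Engine cloudstorage client library; the object path is taken from the question's traceback and image_bytes is a placeholder for the processed image data:

import cloudstorage as gcs
from google.appengine.api import images
from google.appengine.ext import blobstore

object_path = '/tileserve.appspot.com/user2test4test4RGB20170927.tif'

# Synchronous write: the object is committed to GCS when the file is closed.
with gcs.open(object_path, 'w', content_type='image/tiff') as f:
    f.write(image_bytes)  # image_bytes: placeholder for the processed image data

# Only now is it safe to ask the Images API for a serving URL.
gs_key = blobstore.create_gs_key('/gs' + object_path)
url = images.get_serving_url(gs_key, secure_url=True)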

The issue is not with the get_serving_url() method itself; it is with how you are accessing the objects in the Google Cloud Storage bucket.
I switched the access control on my bucket from uniform to fine-grained, which fixed the problem for me.
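For reference, a minimal sketch of making that switch programmatically with the google-cloud-storage client library (the bucket name is a placeholder; the same change can be made in the Cloud Console under the bucket's Permissions tab):

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('<bucket name>')

# Disabling uniform bucket-level access switches the bucket back to fine-grained (ACL-based) control.
bucket.iam_configuration.uniform_bucket_level_access_enabled = False
bucket.patch()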

Related

How to use Flow-guided video completion (FGVC)?

How can you use Flow-guided video completion (FGVC) for a personal file?
The procedure is not spelled out in the official sources for those who would like to use FGVC from the Google Colab platform (https://colab.research.google.com/drive/1pb6FjWdwq_q445rG2NP0dubw7LKNUkqc?usp=sharing).
As a test, I uploaded a video to Google Drive (on the same account from which I was running the Google Colab scripts), split into individual frames and packed into a .zip archive called "demo1.zip".
I then ran the first script in the sequence, called "Prepare environment", enabled sharing of the video via a public link, and pasted the link into the second script (immediately after "wget --quiet"); in the first "rm" entry I put "demo1.zip", to match the name of my video file.
I proceeded like this after reading the description just above the run button of the second script: "We show a demo on a 15-frames sequence. To process your own data, simply upload the sequence and specify the path."
Running the second script also succeeds and my video file is loaded.
I then move to the fourth (and last) script, which processes the content through an AI model to obtain the final output with an enlarged field of view (FOV, i.e. a larger aspect ratio).
After a few seconds of running, the process ends with an error:
File "video_completion.py", line 613, in <module>
main (args)
File "video_completion.py", line 576, in main
video_completion_sphere (args)
File "video_completion.py", line 383, in video_completion_sphere
RAFT_model = initialize_RAFT (args)
File "video_completion.py", line 78, in initialize_RAFT
model.load_state_dict (torch.load (args.model))
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 594, in load
return _load (opened_zipfile, map_location, pickle_module, ** pickle_load_args)
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 853, in _load
result = unpickler.load ()
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 845, in persistent_load
load_tensor (data_type, size, key, _maybe_decode_ascii (location))
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 834, in load_tensor
loaded_storages [key] = restore_location (storage, location)
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 175, in default_restore_location
result = fn (storage, location)
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 151, in _cuda_deserialize
device = validate_cuda_device (location)
File "/usr/local/lib/python3.6/dist-packages/torch/serialization.py", line 135, in validate_cuda_device
raise RuntimeError ('Attempting to deserialize object on a CUDA'
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available () is False. If you are running on a CPU-only machine, please use torch.load with map_location = torch.device ('cpu') to map your storages to the CPU.
What is going wrong with the execution? Is there a way to fix it and let the process finish on Google Colab?
Let me know!
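For what it's worth, the RuntimeError itself suggests two ways out: either switch the Colab runtime to a GPU (Runtime > Change runtime type) or load the checkpoint onto the CPU via map_location. A minimal sketch of the latter, assuming the load_state_dict call shown in the traceback (model and args come from the surrounding initialize_RAFT function in video_completion.py):

import torch

# Map tensors that were saved on a CUDA device onto the CPU, as the error message suggests.
model.load_state_dict(torch.load(args.model, map_location=torch.device('cpu')))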

How to upload a blob into azure storage container with sub directories using the python sdk?

I was following along with this article: Quickstart: Manage blobs with Python v12 SDK, and the documentation for ContainerClient.upload_blob.
Here's the snippet to upload a blob with this directory structure: testcontainer / backup / HelloWorld.cab
bsc = BlobServiceClient.from_connection_string('<connection-string>')
cc = bsc.get_container_client('testcontainer')
cc.upload_blob(name='testcontainer/backup/HelloWorld.cab', data=open(r"\\network\path\to\backup\HelloWorld.cab", 'rb').read())
But I get the following error. Any ideas on what I'm doing wrong?
azure.storage.blob._generated.models._models_py3.StorageErrorException: Operation returned an invalid status 'The specifed resource name contains invalid characters.'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python3\lib\site-packages\azure\core\tracing\decorator.py", line 83, in wrapper_use_tracer
return func(*args, **kwargs)
File "C:\Python3\lib\site-packages\azure\storage\blob\_container_client.py", line 836, in upload_blob
blob.upload_blob(
File "C:\Python3\lib\site-packages\azure\core\tracing\decorator.py", line 83, in wrapper_use_tracer
return func(*args, **kwargs)
File "C:\Python3\lib\site-packages\azure\storage\blob\_blob_client.py", line 496, in upload_blob
return upload_block_blob(**options)
File "C:\Python3\lib\site-packages\azure\storage\blob\_upload_helpers.py", line 153, in upload_block_blob
process_storage_error(error)
File "C:\Python3\lib\site-packages\azure\storage\blob\_shared\response_handlers.py", line 147, in process_storage_error
raise error
azure.core.exceptions.HttpResponseError: The specifed resource name contains invalid characters.
RequestId:71cad76d-801e-0097-8068-1fc9e0000000
Time:2020-05-01T03:28:33.5320153Z
ErrorCode:InvalidResourceName
Error:None
Note: I also saw this answer to this question: Microsoft Azure: How to create sub directory in a blob container
I am able to reproduce this issue if I use an invalid resource name (which is what the error message is telling you).
For example, if I use testcontainer as my blob container name (which is correct), I am able to upload the blob.
However, if I use testContainer as my blob container name (which is invalid, notice the uppercase "C"), I get the same error as you're getting.
Please check the names of the blob container and the blob. See this link for the naming conventions for blob resources: https://learn.microsoft.com/en-us/rest/api/storageservices/naming-and-referencing-containers--blobs--and-metadata.
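As an aside on the "sub directories" part of the question: blob storage has no real directories, so the prefix in the blob name acts as the virtual folder. A minimal sketch, assuming the container is testcontainer and the intended path inside it is backup/HelloWorld.cab (there is no need to repeat the container name in the blob name):

from azure.storage.blob import BlobServiceClient

bsc = BlobServiceClient.from_connection_string('<connection-string>')
cc = bsc.get_container_client('testcontainer')  # container names must be lowercase

# The "backup/" prefix in the blob name creates the virtual sub directory.
with open(r"\\network\path\to\backup\HelloWorld.cab", 'rb') as data:
    cc.upload_blob(name='backup/HelloWorld.cab', data=data)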

How to access Google Cloud Storage Bucket from AI Platform job

My Google AI Platform / ML Engine training job doesn't seem to have access to the training file I put into a Google Cloud Storage bucket.
Google's AI Platform / ML Engine requires you to store training data files in one of their Cloud Storage buckets. Accessing the bucket locally from the CLI works fine. However, when I submit a training job (after ensuring the data is in the appropriate location in my Cloud Storage bucket), I get an error that seems to be due to not having access to the bucket's Link URL.
The error is from trying to read what looks to me like the contents of a web page that Google served up saying "Hey, you don't have access to this." I see this gaia.loginAutoRedirect.start(5000, and a URL with this flag at the end: noautologin=true.
I know permissions between AI Platform and Cloud Storage are a thing, but both are under the same project. The walkthroughs I'm using at the very least imply that no further action is required if they are under the same project.
I am assuming I need to use the Link URL provided in the bucket's Overview tab. I tried the gsutil link, but the Python code (from Google's CloudML Samples repo) was upset about using gs://.
I think Google's examples are proving insufficient since their example data is from a public URL rather than a private Cloud Storage bucket.
Ultimately, the error message I get is a Python error. But like I said, this is preceded by a bunch of gross INFO logs of HTML/CSS/JS from Google saying I don't have permission to get the file I'm trying to get. These logs are actually just because I added a print statement to the util.py file as well - right before read_csv() on the train file. (So the Python parse error is due to trying to parse HTML as a CSV).
...
INFO g("gaia.loginAutoRedirect.stop",function(){var b=n;b.b=!0;b.a&&(clearInterval(b.a),b.a=null)});
INFO gaia.loginAutoRedirect.start(5000,
INFO 'https:\x2F\x2Faccounts.google.com\x2FServiceLogin?continue=https%3A%2F%2Fstorage.cloud.google.com%2F<BUCKET_NAME>%2Fdata%2F%2Ftrain.csv\x26followup=https%3A%2F%2Fstorage.cloud.google.com%2F<BUCKET_NAME>%2Fdata%2F%2Ftrain.csv\x26service=cds\x26passive=1209600\x26noautologin=true',
ERROR Command '['python', '-m', u'trainer.task', u'--train-files', u'gs://<BUCKET_NAME>/data/train.csv', u'--eval-files', u'gs://<BUCKET_NAME>/data/test.csv', u'--batch-pct', u'0.2', u'--num-epochs', u'1000', u'--verbosity', u'DEBUG', '--job-dir', u'gs://<BUCKET_NAME>/predictor']' returned non-zero exit status 1.
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 137, in <module>
train_and_evaluate(args)
File "/root/.local/lib/python2.7/site-packages/trainer/task.py", line 80, in train_and_evaluate
train_x, train_y, eval_x, eval_y = util.load_data()
File "/root/.local/lib/python2.7/site-packages/trainer/util.py", line 168, in load_data
train_df = pd.read_csv(training_file_path, header=0, names=_CSV_COLUMNS, na_values='?')
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 446, in _read
data = parser.read(nrows)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1036, in read
ret = self._engine.read(nrows)
File "/usr/local/lib/python2.7/dist-packages/pandas/io/parsers.py", line 1848, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 876, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 891, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 945, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas/_libs/parsers.pyx", line 2112, in pandas._libs.parsers.raise_parser_error
ParserError: Error tokenizing data. C error: Expected 5 fields in line 205, saw 961
To get the data, I'm more or less trying to mimic this:
https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/census/tf-keras/trainer/util.py
Various ways I have tried to address my bucket in my copy of util.py:
https://console.cloud.google.com/storage/browser/<BUCKET_NAME>/data (think this was the "Link URL" back in May)
https://storage.cloud.google.com/<BUCKET_NAME>/data (this is the "Link URL" now - in July)
gs://<BUCKET_NAME>/data (this is the URI - which gives a different error about not liking gs as a url type)
Transferring the answer from a comment above:
Looks like the URL approach requires cookie-based authentication if it's not a public object. Instead of using a URL, I would suggest using tf.gfile with a gs:// path, as is used in the Keras sample (see the sketch below). If you need to download the file from GCS in a separate step, you can use the GCS client library.
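A minimal sketch of the gs:// approach, assuming a TensorFlow version that exposes tf.gfile (newer versions offer the same interface as tf.io.gfile); the column names and other read_csv arguments from util.py are omitted:

import pandas as pd
import tensorflow as tf

# Use the gs:// URI, not the console "Link URL"; tf.gfile handles GCS access directly.
training_file_path = 'gs://<BUCKET_NAME>/data/train.csv'
with tf.gfile.GFile(training_file_path, 'r') as f:
    train_df = pd.read_csv(f, header=0, na_values='?')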

Python PermissionError uploading to an Azure Datalake folder

I am trying to upload a file to Azure Data Lake using a Python script.
I am able to download a file from the Data Lake, but uploading raises a permission error, even though I checked all permissions at all levels (Read, Write, Execute, and the option for descendants).
## works fine
multithread.ADLDownloader(adls, lpath='C:\\Users\\User1\\file1.txt', rpath='/Test/', nthreads=64,
                          overwrite=True, buffersize=4194304, blocksize=4194304)

## Raise error
multithread.ADLUploader(adls, rpath='/Test', lpath='C:\\Users\\User1\\HC',
                        nthreads=64, chunksize=268435456, buffersize=4194304, blocksize=4194304,
                        client=None, run=True, overwrite=False, verbose=True)
the error:
File "C:\Users\Python37-32\test_azure.py", line 64, in <module>
overwrite=False, verbose=True)
File "C:\Users\Python37-32\lib\site-packages\azure\datalake\store\multithread.py", line 442, in __init__
self.run()
File "C:\Users\Python37-32\lib\site-packages\azure\datalake\store\multithread.py", line 548, in run
self.client.run(nthreads, monitor)
File "C:\Users\Python37-32\lib\site-packages\azure\datalake\store\transfer.py", line 525, in run
raise DatalakeIncompleteTransferException('One more more exceptions occured during transfer, resulting in an incomplete transfer. \n\n List of exceptions and errors:\n {}'.format('\n'.join(error_list)))
azure.datalake.store.exceptions.DatalakeIncompleteTransferException: One more more exceptions occured during transfer, resulting in an incomplete transfer.
List of exceptions and errors:
C:\Users\User1\HC\AC.TXT -> \Test\AC.TXT, chunk \Test\AC.TXT 0: errored, "PermissionError('/Test/AC.TXT')"
Does somebody have an idea of the problem?
It turned out that the Azure account I am using has all the privileges on the Data Lake, but the Azure application did not.
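To make the distinction concrete, here is a minimal sketch of how the uploader typically runs as the Azure AD application rather than as your user account (the tenant/client IDs and store name are placeholders); it is that application identity which needs Write and Execute ACLs on /Test:

from azure.datalake.store import core, lib, multithread

# Service-principal authentication: the ACLs that matter are those granted to this application.
token = lib.auth(tenant_id='<tenant-id>', client_id='<application-id>', client_secret='<application-secret>')
adls = core.AzureDLFileSystem(token, store_name='<datalake-store-name>')

multithread.ADLUploader(adls, rpath='/Test', lpath='C:\\Users\\User1\\HC', nthreads=64, overwrite=True)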

Using self.render() in a StaticFileHandler

I'm trying to extend StaticFileHandler in such a way that I can process file requests but call self.render(filename, **kwargs) on the file to actually serve it to the client. (Yes, I realize that at that point it's no longer a static file per se).
Here's the code I'm trying to run:
class MustacheFileHandler(tornado.web.StaticFileHandler):
    def get(self, filename):
        self.render(_STATIC_ROOT_ + '/' + path.split('/')[len(path.split('/'))-2], userLoginStatus='you are logged out!')
    # ...

class Application(tornado.web.Application):
    def __init__(self, **overrides):
        handlers = [(r'/(.*)', MustacheFileHandler, {'path' : _STATIC_ROOT_})]
        # ...
...where _STATIC_ROOT_ is a variable from my server's config file, loaded on startup.
The issue I've got is, whenever I try to do a GET on a file I know exists on the server, I get the following error:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/tornado/web.py", line 1332, in _execute
result = method(*self.path_args, **self.path_kwargs)
File "myfile.py", line 173, in get
self.render(_STATIC_ROOT_ + '/' + path.split('/')[len(path.split('/'))-2], userLoginStatus='you are logged out!')
File "/usr/local/lib/python2.7/dist-packages/tornado/web.py", line 747, in render
self.finish(html)
File "/usr/local/lib/python2.7/dist-packages/tornado/web.py", line 877, in finish
self.set_etag_header()
File "/usr/local/lib/python2.7/dist-packages/tornado/web.py", line 1257, in set_etag_header
etag = self.compute_etag()
File "/usr/local/lib/python2.7/dist-packages/tornado/web.py", line 2185, in compute_etag
version_hash = self._get_cached_version(self.absolute_path)
AttributeError: 'MustacheFileHandler' object has no attribute 'absolute_path'
I'm not sure what's causing this error or how I can handle it.
Why are you using StaticFileHandler if the response is not static? This is likely to break assumptions built into the class.
StaticFileHandler is designed to be subclassed in limited ways as described in its documentation. In particular, subclasses should not override get(), and attempting to do so results in the error you're seeing.
If you want to use the Tornado template engine as a kind of preprocessor for the files on disk, you could try overriding both get_content and get_content_size and making them call self.render_string() (and if your templates are not individually self-contained, you'll also need to change get_content_version to take all dependencies into account). However, this requires messy caching to avoid rendering the template multiple times. It's probably better to either (a) render the templates on the fly with an ordinary RequestHandler, or (b) write a little script that renders all your templates to disk, and serve the results as actual static files. A sketch of option (a) follows below.
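A minimal sketch of option (a), assuming _STATIC_ROOT_ holds the template directory as in the question; the handler name and route are illustrative:

import tornado.web

class MustachePageHandler(tornado.web.RequestHandler):
    def get(self, filename):
        # render() resolves the name against the application's template_path
        self.render(filename, userLoginStatus='you are logged out!')

class Application(tornado.web.Application):
    def __init__(self, **overrides):
        handlers = [(r'/(.*)', MustachePageHandler)]
        super(Application, self).__init__(handlers, template_path=_STATIC_ROOT_, **overrides)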
