I am using Backblaze B2 and b2sdk.v2 in Flask to upload files.
This is code I tried, using the upload method:
# I am not showing authorization code...
def upload_file(file):
    bucket = b2_api.get_bucket_by_name(bucket_name)
    file = request.files['file']
    bucket.upload(
        upload_source=file,
        file_name=file.filename,
    )
This raises the following error:
AttributeError: 'SpooledTemporaryFile' object has no attribute 'get_content_length'
I think it's because I am using a FileStorage instance for the upload_source parameter.
I want to know whether I am using the API correctly and, if not, how I should use it.
Thanks
You're correct: you can't use a Flask FileStorage instance as a B2 SDK upload source. Instead, use the upload_bytes method with the file's content:
def upload_file(file):
    bucket = b2_api.get_bucket_by_name(bucket_name)
    file = request.files['file']
    bucket.upload_bytes(
        data_bytes=file.read(),
        file_name=file.filename,
        ...other parameters...
    )
Note that this reads the entire file into memory. The upload_bytes method may need to restart the upload if something goes wrong (with the network, usually), so the file can't really be streamed straight through into B2.
If you anticipate that your files will not fit into memory, you should look at using create_file_stream to upload the file in chunks.
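If you would rather not hold the whole upload in memory but don't need full streaming, another option (just a sketch, assuming b2sdk's upload_local_file and a writable temp directory; upload_large_file is a made-up helper name) is to spool the FileStorage to a temporary file first, which also gives the SDK something it can re-read if a retry is needed:
import os
import tempfile

def upload_large_file(file):
    # 'file' is the werkzeug FileStorage from request.files['file']
    bucket = b2_api.get_bucket_by_name(bucket_name)
    # Spool the upload to disk so b2sdk can re-read it on retries.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        file.save(tmp)
        tmp_path = tmp.name
    try:
        bucket.upload_local_file(
            local_file=tmp_path,
            file_name=file.filename,
        )
    finally:
        os.remove(tmp_path)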
My intention is to have a large image stored on my S3 server and then get a lambda function to read/process the file and save the resulting output(s). I'm using a package called python-bioformats to work with a proprietary image file (which is basically a whole bunch of tiffs stacked together). When I use
def lambda_handler(event, context):
    import boto3
    key = event['Records'][0]['s3']['object']['key'].encode("utf-8")
    bucket = 'bucketname'
    s3 = boto3.resource('s3')
    imageobj = s3.Object(bucket, key).get()['Body'].read()
    bioformats.get_omexml_metadata(imageobj)
I have a feeling that the lambda function tries to download the entire file (5GB) when making imageobj. Is there a way I can just get the second function (which takes a filepath as argument) to refer to the s3 object in a filepath-like manner? I'd also like to not expose the s3 bucket/object publicly, so doing this server-side would be ideal.
If your bioformats.get_omexml_metadata() function requires a filepath as an argument, then you will need to have the object downloaded before calling the function.
This could be a problem in an AWS Lambda function, because there is a 512 MB limit on available disk space (and only in /tmp/).
If the data can instead be processed as a stream, you could read the data as it is required without saving to disk first. However, the python-bioformats documentation does not show this as an option. In fact, I would be surprised if your above code works, given that it is expecting a path while imageobj is the contents of the file.
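If the object does fit in the available ephemeral storage, a minimal sketch of the download-then-process approach looks like this (the /tmp path is just an example, and the bioformats call is taken from your question):
import boto3
import bioformats

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']  # may be URL-encoded; decode if needed
    local_path = '/tmp/' + key.split('/')[-1]
    # Download the object to Lambda's ephemeral storage, then pass a real
    # filepath to the library instead of the file's contents.
    boto3.resource('s3').Object(bucket, key).download_file(local_path)
    return bioformats.get_omexml_metadata(local_path)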
How to validate JSON format without loading the file? I am copying files from one S3 bucket to another S3 bucket. After the JSONL files are copied, I want to check that the file format is correct, in the sense that the curly braces and commas are fine.
I don't want to use json.load() because the files are large and numerous, and it would slow down the process. Besides, the file is already copied, so there is no need to parse it; validation is the only requirement.
There is no capability within Amazon S3 itself to validate the content of objects.
You could configure S3 to trigger an AWS Lambda function whenever a file is created in the S3 bucket. The Lambda function could then parse the file and perform some action (e.g. send a notification or move the object to another location) if the validation fails.
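A rough sketch of such a handler (the 'invalid/' prefix is made up for illustration, and it assumes a botocore recent enough to provide iter_lines() on the streamed body):
import json
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    body = s3.get_object(Bucket=bucket, Key=key)['Body']
    try:
        for line in body.iter_lines():
            if line.strip():
                json.loads(line)  # raises ValueError on a malformed JSON line
    except ValueError:
        # Validation failed: move the object aside (or send a notification instead).
        s3.copy_object(Bucket=bucket, Key='invalid/' + key,
                       CopySource={'Bucket': bucket, 'Key': key})
        s3.delete_object(Bucket=bucket, Key=key)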
Streaming the file seems to be the way to go: put it into a generator and yield line by line to check whether the JSON is valid. The requests library supports streaming a file.
The solution would look something like this:
import requests

def get_data():
    r = requests.get('s3_file_url', stream=True)
    yield from r.iter_lines()

def parse_data():
    # initialize the generator
    gd_gen = get_data()
    while True:
        try:
            line = next(gd_gen)
        except StopIteration:
            break
        # put your validation code here
Let me know if you need further clarification.
I have a file bound to a form in this manner:
(forms.py)
class UploadFileForm(forms.Form):
    wbfile = forms.FileField(label='Upload workbook', help_text='Please Ensure file is in .xlsx format')
Now I can access this in a view function using request.FILES['wbfile']. But I want to send this file to a template and then to another view function, so I bound it to a form like this:
f = form.fields['wbfile']
Now I want to save this file to disk, but how do I access it? This is what I am trying:
f = form.fields['file'].value()
with open('/tmp/xyz') as destination:
    contents = f.read()
    destination.write(contents)
But this throws an error saying: 'FileField' object has no attribute 'value'.
This is what form.fields['wbfile'] shows:
<django.forms.fields.FileField object at 0x7f91ff1c49d0>
Hence the file is definitely bound to the form.
Please help and forgive me if the doubt is too obvious, I am a beginner!
I think there is everything you need in the Django documentation.
For FileField: https://docs.djangoproject.com/en/dev/ref/models/fields/#filefield
Try using YourFileField.storage and YourFileField.path to have access to the file.
Then with this doc : https://docs.djangoproject.com/en/dev/ref/files/storage/
You can use open to open your file in memory. I guess it gives code something like this:
storage, path = YourFileField.storage, YourFileField.path
f = storage.open(path)
I don't really understand what you want to do next, but if you want to save the file somewhere else, I guess you can probably use:
f.save(path) #or something similar, haven't tested any code like this
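For what it's worth, the usual Django pattern for writing an uploaded file to disk is to take the UploadedFile from request.FILES (or from form.cleaned_data after is_valid()) and write it out in chunks. A minimal sketch, assuming the form above, a hypothetical view name, and an example /tmp path:
def handle_upload(request):
    form = UploadFileForm(request.POST, request.FILES)
    if form.is_valid():
        uploaded = request.FILES['wbfile']  # an UploadedFile, not a forms.FileField
        with open('/tmp/workbook.xlsx', 'wb') as destination:
            for chunk in uploaded.chunks():  # avoids loading the whole file into memory
                destination.write(chunk)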
I am converting code away from the deprecated Files API.
I have the following code that works fine in the SDK server but fails in production. Is what I am doing even correct? If yes what could be wrong, any ideas how to troubleshoot it?
# Code earlier writes the file bs_file_name. This works fine because I can see the file
# in the Cloud Console.
bk = blobstore.create_gs_key("/gs" + bs_file_name)
assert(bk)
if not isinstance(bk,blobstore.BlobKey):
bk = blobstore.BlobKey(bk)
assert isinstance(bk,blobstore.BlobKey)
# next line fails here in production only
assert(blobstore.get(bk)) # <----------- blobstore.get(bk) returns None
Unfortunately, as per the documentation, you can't get a BlobInfo object for GCS files.
https://developers.google.com/appengine/docs/python/blobstore/#Python_Using_the_Blobstore_API_with_Google_Cloud_Storage
Note: Once you obtain a blobKey for the GCS object, you can pass it around, serialize it, and otherwise use it interchangeably anywhere you can use a blobKey for objects stored in Blobstore. This allows for usage where an app stores some data in blobstore and some in GCS, but treats the data otherwise identically by the rest of the app. (However, BlobInfo objects are currently not available for GCS objects.)
I encountered this exact same issue today and it feels very much like a bug within the blobstore api when using google cloud storage.
Rather than leveraging the blobstore api I made use of the google cloud storage client library. The library can be downloaded here: https://developers.google.com/appengine/docs/python/googlecloudstorageclient/download
To access a file on GCS:
import cloudstorage as gcs

with gcs.open(GCSFileName) as f:
    blob_content = f.read()
    print blob_content
It sucks that GAE behaves differently with BlobInfo in local mode and in production; it took me a while to figure that out, but there is an easy solution:
You can use a BlobReader to access the data when you have the blob_key.
import logging
from google.appengine.ext import blobstore

def getBlob(blob_key):
    logging.info('getting blob(' + blob_key + ')')
    with blobstore.BlobReader(blob_key) as f:
        data_list = []
        chunk = f.read(1000)
        while chunk != "":
            data_list.append(chunk)
            chunk = f.read(1000)
        data = "".join(data_list)
    return data
https://developers.google.com/appengine/docs/python/blobstore/blobreaderclass
I'm banging my head against the wall with this one:
What I want to do is store a file that is returned from an API in the data store as a blob.
Here is the code that I use on my local machine (which of course works due to an existing file system):
client.convertHtml(html, open('html.pdf', 'wb'))
Since I cannot write to a file on App Engine I tried several ways to store the response, without success.
Any hints on how to do this? I was trying to do it with StringIO and managed to store the response, but then wasn't able to store it as a blob in the data store.
Thanks,
Chris
Found the error. Here is how it looks right now (simplified):
output = StringIO.StringIO()
try:
    client.convertURI("example.com", output)
    Report.pdf = db.Blob(output.getvalue())
    Report.put()
except pdfcrowd.Error, why:
    logging.error('PDF creation failed %s' % why)
I was trying to save the output without calling "getvalue()"; that was the problem. Perhaps this is of use to someone in the future :)