I was using this answer to upload a PNG to S3:
https://stackoverflow.com/a/6693577/815878
The file is uploaded to S3. However, whenever I double-click the image to display it, the URL is "about:blank" and the screen is blank.
When I download the image, it shows up on my computer as the image I saved. My last recourse was to manually test the URL. I made the photo public, then tried:
https://s3.amazonaws.com/BUCKET_NAME/IMAGE_NAME.png
which gives me an error instead of the image.
Is there another step from the answer above that is making the file upload improperly? I'm going to paste my code (which is very similar to the link above) just in case...
image = Image.open(self.image)
conn = S3Connection(settings.AWS_ACCESS_KEY_ID, settings.AWS_SECRET_ACCESS_KEY)
out_im2 = cStringIO.StringIO()
image.save(out_im2, 'PNG')
b = conn.get_bucket('new_test_bucket')
k = b.new_key(self.title+'.png')
k.set_contents_from_filename(out_im2.getvalue())
I'm more of a PHP than a Python guy, but from what I know, Amazon S3 requires you to define the type of the file.
You need to send a MIME type (e.g. image/png) for clients to recognize the file. Since S3 isn't an actual web server, it doesn't care much about the extension of your file. You could just as well call it "dipididoo.moo", and as long as the type is image/png, it would work.
Set the MIME type as below. Note that boto's set_contents_from_filename expects a path on disk; since you are uploading from an in-memory buffer, use set_contents_from_string and pass the headers:
k.set_contents_from_string(out_im2.getvalue(), headers={'Content-Type': 'image/png'})
I recently learned that the PDF files and images I uploaded to my Heroku website were removed whenever I updated the site. Because of this, I have been trying to store my PDFs in my MongoDB database using Mongoengine (with Flask and Python), then retrieve them and save them into the static folder (I was able to do this successfully with my images), but with no luck.
Below is the relevant code for my Mongoengine class:
class Article(Document):
    uploaded_content = FileField()         # field for storing the PDF
    uploaded_content_name = StringField()  # file name for the PDF
The relevant code for my Flask route that is trying to store the PDF:
data = Article()
if request.files['uploaded-article']:
    data.uploaded_content = request.files['uploaded-article']
    # uploaded_content_name is given a random name below and stored
    # in the database
And then here is my code that tries to retrieve the PDF from mongoengine, and save it to my blog folder:
articles = Article.objects()
for art in articles:
    path = os.path.join(app.config['BLOG_FOLDER'], art.uploaded_content_name)
    if not os.path.isfile(path):
        f = open(art.uploaded_content.read(), 'wb')  # this line gives the error
        f.save(os.path.join(app.config['BLOG_FOLDER'] + art.uploaded_content_name), "PDF")
The line that gives me the error is where I try to open the PDF file I stored in my database. I have tried many different approaches and have gotten various errors, but one I get is:
No such file or directory: b''. I can confirm that if I read() the database object, it's just an empty byte string.
I have also tried changing my Flask route to the code below, storing the open PDF from Flask's request object. This gave me the error ValueError: embedded null byte when I tried to open it, although the read() method did at least give me a really long byte string.
data = Article()
if request.files['uploaded-article']:
    # store the PDF in the blog folder
    article_pdf = request.files['uploaded-article']
    article_pdf.save(os.path.join(app.config['BLOG_FOLDER'], article_pdf_filename))
    # open the PDF just stored in the blog folder
    with open(os.path.join(app.config['BLOG_FOLDER'], article_pdf_filename), 'rb') as f:
        # store the opened PDF in the database
        data.uploaded_content.put(f)
    # uploaded_content_name is given a random name below and stored
    # in the database
Another thing I tried was opening the PDF via a BytesIO buffer, but that resulted in the same embedded null byte error as above.
Are there any suggestions for how I can properly store and retrieve my PDF from my Mongoengine database? Apologies for the complexity of the question; I can add more details if needed. If there is an alternative way of storing my PDFs so they do not get lost on Heroku, I would take that as a valid solution as well.
For future reference, it looks like this was not working because I did not set the content type correctly when putting the PDF in. My original code for saving the PDF to the data.uploaded_content field was:
data.uploaded_content.put(f)
However, I needed to define the mimetype correctly:
data.uploaded_content.put(f, content_type='application/pdf')
With this change it then worked, and I was able to successfully store the PDF in mongoengine. As far as storing the PDF to a folder after it was successfully uploaded, I used the following code:
if art.uploaded_content_name:
    extension = art.uploaded_content_name.rsplit('.', 1)[1].lower()
    path = os.path.join(app.config['BLOG_FOLDER'], art.uploaded_content_name)
    if not os.path.isfile(path):
        pdf = art.uploaded_content.read()
        with open(path, 'wb') as f:
            f.write(pdf)
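It is worth spelling out where the original No such file or directory: b'' message came from: open(art.uploaded_content.read(), 'wb') passes the file's bytes where open() expects a path. The retrieve-and-save step needs a path for open() and the bytes for write(), as in this minimal sketch (save_field_to_folder is a hypothetical helper, with io.BytesIO standing in for the FileField):

```python
import io
import os
import tempfile

def save_field_to_folder(field, folder, filename):
    """Write the bytes read from a file-like field to folder/filename."""
    path = os.path.join(folder, filename)
    if not os.path.isfile(path):
        with open(path, 'wb') as f:  # open() takes a *path*...
            f.write(field.read())    # ...and write() takes the bytes
    return path

# io.BytesIO stands in for art.uploaded_content
field = io.BytesIO(b'%PDF-1.4 fake pdf bytes')
folder = tempfile.mkdtemp()
saved = save_field_to_folder(field, folder, 'article.pdf')
with open(saved, 'rb') as f:
    round_trip = f.read()
```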
I am using Backblaze B2 and b2sdk.v2 in Flask to upload files.
This is code I tried, using the upload method:
# I am not showing authorization code...
def upload_file(file):
    bucket = b2_api.get_bucket_by_name(bucket_name)
    file = request.files['file']
    bucket.upload(
        upload_source=file,
        file_name=file.filename,
    )
This raises the following error:
AttributeError: 'SpooledTemporaryFile' object has no attribute 'get_content_length'
I think it's because I am using a FileStorage instance for the upload_source parameter.
I want to know whether I am using the API correctly or, if not, how should I use this?
Thanks
You're correct - you can't use a Flask FileStorage instance as a B2 SDK UploadSource. What you need to do instead is use the upload_bytes method with the file's content:
def upload_file(file):
    bucket = b2_api.get_bucket_by_name(bucket_name)
    file = request.files['file']
    bucket.upload_bytes(
        data_bytes=file.read(),
        file_name=file.filename,
        # ...other parameters...
    )
Note that this reads the entire file into memory. The upload_bytes method may need to restart the upload if something goes wrong (with the network, usually), so the file can't really be streamed straight through into B2.
If you anticipate that your files will not fit into memory, you should look at using create_file_stream to upload the file in chunks.
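Another memory-friendly pattern is to stream the werkzeug upload to a temporary file in chunks and hand the SDK a path. A sketch (spool_to_tempfile is a hypothetical helper; upload_local_file is b2sdk's path-based upload method):

```python
import io
import os
import tempfile

def spool_to_tempfile(stream, chunk_size=64 * 1024):
    """Copy a file-like object to a temporary file in fixed-size chunks.

    Returns the temp file's path; the caller is responsible for deleting it.
    """
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, 'wb') as out:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
    return path

# Usage sketch against b2sdk (bucket and request are assumed to exist):
#     path = spool_to_tempfile(request.files['file'])
#     try:
#         bucket.upload_local_file(local_file=path, file_name=file.filename)
#     finally:
#         os.remove(path)

# Self-contained demo with an in-memory stream standing in for the upload:
demo_path = spool_to_tempfile(io.BytesIO(b'x' * 100_000))
with open(demo_path, 'rb') as f:
    spooled_size = len(f.read())
os.remove(demo_path)
```

This trades disk I/O for memory, which usually matters only once uploads reach tens of megabytes.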
Why isn't the code below working? The email is received, and the file comes through with the correct filename (it's a .png file). But when I try to open the file, it doesn't open correctly (Windows Gallery reports that it can't open this photo or video and that the file may be unsupported, damaged or corrupted).
When I download the file using a subclass of blobstore_handlers.BlobstoreDownloadHandler (basically the exact handler from the GAE docs), and the same blob key, everything works fine and Windows reads the image.
One more bit of info - the binary files from the download and the email appear very similar, but have a slightly different length.
Anyone got any ideas on how I can get email attachments sending from GAE blobstore? There are similar questions on S/O, suggesting other people have had this issue, but there don't appear to be any conclusions.
from google.appengine.api import mail
from google.appengine.ext import blobstore

def send_forum_post_notification():
    blob_reader = blobstore.BlobReader('my_blobstore_key')
    blob_info = blobstore.BlobInfo.get('my_blobstore_key')
    value = blob_reader.read()
    mail.send_mail(
        sender='my.email@address.com',
        to='my.email@address.com',
        subject='this is the subject',
        body='hi',
        reply_to='my.email@address.com',
        attachments=[(blob_info.filename, value)]
    )

send_forum_post_notification()
I do not understand why you use a tuple for the attachment. I use:
message = mail.EmailMessage(sender=......)
message.attachments = [blob_info.filename, blob_reader.read()]
I found that this code doesn't work on dev_appserver but does work when pushed to production.
I ran into a similar problem using the blobstore on a Python Google App Engine application. My application handles PDF files instead of images, but I was also seeing a "the file may be unsupported, damaged or corrupted" error using code similar to your code shown above.
Try approaching the problem this way: Call open() on the BlobInfo object before reading the binary stream. Replace this line:
value = blob_reader.read()
... with these two lines:
bstream = blob_info.open()
value = bstream.read()
Then you can remove this line, too:
blob_reader = blobstore.BlobReader('my_blobstore_key')
... since bstream above will be of type BlobReader.
Relevant documentation from Google is located here:
https://cloud.google.com/appengine/docs/python/blobstore/blobinfoclass#BlobInfo_filename
I'm pulling back a PDF from the EchoSign API, which gives me the bytes of a file.
I'm trying to take those bytes and save them into a boto S3-backed FileField. I'm not having much luck.
This was the closest I got, but it errors on saving 'speaker', and the PDF, although written to S3, appears to be corrupt.
Here speaker is an instance of my model and fileData is the 'bytes' string returned from the EchoSign API.
afile = speaker.the_file = S3BotoStorageFile(filename, "wb", S3BotoStorage())
afile.write(fileData)
afile.close()
speaker.save()
I'm closer!
content = ContentFile(fileData)
speaker.profile_file.save(filename, content)
speaker.save()
Turns out the FileField is already an S3BotoStorage, and you can create a new file by passing the raw data in like that. What I don't know is how to make it binary (I'm assuming it's not). My file keeps coming up corrupted, despite having a good amount of data in it.
For reference here is the response from echosign:
https://secure.echosign.com/static/apiv14/sampleXml/getDocuments-response.xml
I'm essentially grabbing the bytes and passing it to ContentFile as fileData. Maybe I need to base64 decode. Going to try that!
Update
That worked!
It seems I have to ask the question here before I figure out the answer. Sigh. So the final code looks something like this:
content = ContentFile(base64.b64decode(fileData))
speaker.profile_file.save(filename, content)
speaker.save()
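The corruption makes sense in hindsight: the EchoSign response carries the PDF as base64 text, so writing the response body verbatim stores base64 characters rather than the PDF's actual bytes. A minimal stdlib illustration (pdf_bytes is made-up sample data):

```python
import base64

pdf_bytes = b'%PDF-1.4 example binary content'
encoded = base64.b64encode(pdf_bytes)  # what the API response body contains
decoded = base64.b64decode(encoded)    # what ContentFile should be given
```

Writing encoded to the FileField produces a file that is the right rough size but unreadable as a PDF, which matches the symptom above.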
As demonstrated in this PHP code (http://code.google.com/p/gdata-samples/source/browse/trunk/doclist/OCRDemo/ocr.php?r=194),
an image can be uploaded to Google Docs and automatically converted to text. I'm wondering how to do this in Python. There is an "upload" method, but I'm puzzled about how to enable the OCR function.
Assuming you've started here:
http://code.google.com/apis/documents/docs/3.0/developers_guide_python.html
you have an authenticated client object already created.
f = open('/path/to/your/test.pdf', 'rb')  # open in binary mode
ms = gdata.data.MediaSource(file_handle=f, content_type='application/pdf',
                            content_length=os.path.getsize(f.name))
folder = "https://docs.google.com/feeds/default/private/full"  # folder in Google Docs
entry = client.Upload(ms, f.name, folder_or_uri=folder + '?ocr=true')  # ?ocr=true is the kicker
Specifying the folder_or_uri with the trailing ?ocr=true param is what causes the conversion to happen.
After you create it, you can export it as a txt document.
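The ?ocr=true suffix is just a query-string parameter on the upload URI. If you ever need more than one parameter, building the string with the stdlib is less error-prone than concatenation. A sketch using Python 3's urllib (on the Python 2 stack that gdata runs on, the import is from urllib import urlencode):

```python
from urllib.parse import urlencode

feed = "https://docs.google.com/feeds/default/private/full"
params = {"ocr": "true"}  # additional query parameters would go here too
upload_uri = feed + "?" + urlencode(params)
```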