Uploading mp4 to Heroku -- file is not saved - python

I'm trying to upload a .mp4 video to be processed by a Python server on Heroku. In my server.py, I have the following code:
import os

from flask import Flask, request
from flask_cors import CORS

# Initialize the Flask application
MYDIR = os.path.dirname(__file__)
app = Flask(__name__)
app.config['UPLOAD_PATH'] = 'uploads'
cors = CORS(app, resources={r"/api/*": {"origins": "*"}})

# route HTTP POSTs to this method
@app.route('/api/uploadFile', methods=['POST'])
def uploadFile():
    r = request  # this is the incoming request
    uploaded_file = r.files['uploadedfile']
    # save the incoming video file locally
    if uploaded_file.filename != '':
        content = r.files['uploadedfile'].read()
        # rewind: .read() above consumed the stream, so save() would
        # otherwise write an empty file
        uploaded_file.stream.seek(0)
        uploaded_file.save(os.path.join(MYDIR, uploaded_file.filename))
However, when I run heroku run bash and do ls, I don't see the file at all. I've looked at previous posts regarding this and done what many have suggested, i.e., used an absolute path, os.path.join(MYDIR, uploaded_file.filename). But I still don't see the file. I know that Heroku is an ephemeral system and doesn't store files, but all I want to do is process the video file, and after it's done I'll delete it myself. Oh, by the way, the video is approximately 25 MB in size. I tried a low-quality version which is around 4 MB and the problem still persists, so my guess is that the size of the video is not the problem.
Also, when I read the content via content = r.files['uploadedfile'].read(), I see that content is approximately 25 MB long, suggesting that I'm indeed receiving the video correctly.
Ideally, I would love to convert the byte stream into something like a list of frames and then process it using OpenCV. But I thought that, as a first step, I could just save it locally, then read and process it. I don't care if it is deleted five minutes after my processing is done.
Any help would be greatly appreciated. I've banged my head against this for a day looking for a solution. Thanks!!!
Just want to add that the commands above give me no error. So it seems like everything worked, but then when I do heroku run bash, I see no file anywhere. It would've been great if the system had provided an error or a warning :-(.
Also, I tried running things locally on a localhost server and it does store the file. So it works locally on my machine; I just can't seem to save it on Heroku.
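(For reference, a minimal sketch of the byte-stream-to-frames idea mentioned above — assuming opencv-python is installed, which the original post does not state: spill the bytes to a temporary file under /tmp and let cv2.VideoCapture read frames from it.)
import os
import tempfile

import cv2  # assumption: opencv-python is in requirements.txt


def frames_from_upload(content):
    """Decode raw video bytes into a list of OpenCV frames."""
    # cv2.VideoCapture needs a real file path, so spill the bytes to a
    # temporary file first.
    with tempfile.NamedTemporaryFile(suffix='.mp4', delete=False) as tmp:
        tmp.write(content)
        path = tmp.name
    frames = []
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    os.remove(path)  # clean up; the dyno filesystem is ephemeral anyway
    return frames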

Related

Why is Docker able to save files to the source-directory?

I don't really know if this even qualifies as a problem, but it somehow surprised me, since I thought that Docker containers run in sandboxed environments.
I have a running Docker container that includes a django-rest server.
When making the appropriate requests from the frontend (which is also served through a separate NGINX container in the same Docker network), the request successfully triggers the Django server's response, which includes writing the contents of the request to a text file.
Now, the path to the text file points to a subdirectory of the folder where the script itself is located.
But now the problem: the file is saved in the directory on the host machine itself and (to my knowledge) not in the container.
Is that something Docker is capable of doing, or am I just not aware that Django itself (which is running in the WSGI development server, mind you) writes the file to the host machine?
from datetime import datetime
import json

# excerpt from the Django view
now = datetime.now()
timestamp = datetime.timestamp(now)
polygon = json.loads(data['polygon'])
string = ''
for key in polygon:
    string += f'{key}={polygon[key]}\r\n'
with open(f"./path/to/images/image_{timestamp}.txt", "w+") as f:
    f.write(string)
return 200
Is there something I am missing, or something I do not understand about Docker or Django yet?
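(A hedged guess at the likely mechanism, not stated in the original post: if the container was started with a bind mount, writes inside the container land directly in the mounted host directory. Paths and image name below are placeholders.)
# Anything the Django process writes under /app/path/to/images inside the
# container would then appear in /host/path/to/images on the host:
$ docker run -v /host/path/to/images:/app/path/to/images my-django-image
A `volumes:` entry in a docker-compose file has the same effect; checking how the container is started would confirm or rule this out.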

gcs client library stopped working with dev_appserver

The Google Cloud Storage client library is returning a 500 error when I attempt to upload via the development server.
ServerError: Expect status [200] from Google Storage. But got status 500.
I haven't changed anything in the project, and the code still works correctly in production.
I've attempted gcloud components update to get the latest dev_appserver, and I've updated to the latest Google Cloud Storage client library.
I've run gcloud init again to make sure credentials are loaded, and I've made sure I'm using the correct bucket.
The project is running on Windows 10.
Python version 2.7.
Any idea why this is happening?
Thanks
Turns out this has been a problem for a while.
It has to do with how blobstore filenames are generated.
https://issuetracker.google.com/issues/35900575
The fix is to monkeypatch this file:
google-cloud-sdk\platform\google_appengine\google\appengine\api\blobstore\file_blob_storage.py
def _FileForBlob(self, blob_key):
    """Calculate full filename to store blob contents in.

    This method does not check to see if the file actually exists.

    Args:
      blob_key: Blob key of blob to calculate file for.

    Returns:
      Complete path for file used for storing blob.
    """
    blob_key = self._BlobKey(blob_key)
    # Remove bad characters.
    import re
    blob_fname = re.sub(r"[^\w\./\\]", "_", str(blob_key))
    # Make sure it's a relative directory.
    if blob_fname and blob_fname[0] in "/\\":
        blob_fname = blob_fname[1:]
    return os.path.join(self._DirectoryForBlob(blob_key), blob_fname)
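(A minimal sketch of applying the patch at runtime instead of editing the SDK file in place — assuming the class in that module is FileBlobStorage, and that the fixed _FileForBlob above is defined in your startup code:)
from google.appengine.api.blobstore import file_blob_storage

# Replace the buggy method with the fixed version defined above.
file_blob_storage.FileBlobStorage._FileForBlob = _FileForBlob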

Best way to upload large csv files using python flask

Requirement: upload files using the Flask framework. Once a file is uploaded to the server, the user should be able to see it in the UI.
Current code: In order to meet the above requirement I wrote code to upload reasonably large files, and it works fine with a ~30 MB file (though of course not that fast). But when I try to upload a ~100 MB file, it takes too long and the process never completes.
This is what I am currently doing:
import os

from flask import request
from werkzeug.utils import secure_filename

UPLOAD_FOLDER = '/tmp'

file = request.files['filename']
description = request.form['desc']
filename = secure_filename(file.filename)
try:
    file.save(os.path.join(UPLOAD_FOLDER, filename))
    filepath = os.path.join(UPLOAD_FOLDER, filename)
except Exception as e:
    return e
data = None
try:
    with open(filepath) as file:
        data = file.read()
except Exception as e:
    log.exception(e)
So what I am doing is first saving the file to a temporary location on the server, then reading the data from there and putting it into our database. I think this is where I am struggling; I am not sure what the best approach is.
Should I take the input from the user, return a success message (obviously the user won't be able to access the file immediately then), and make putting the data into the database a background process using some kind of queue system, as sketched below? Or what else should be done to optimize the code?
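(For reference, a minimal sketch of the queue-based approach described above, assuming Celery with a Redis broker — both assumptions, not part of the original setup:)
from celery import Celery

celery = Celery('tasks', broker='redis://localhost:6379/0')


@celery.task
def ingest_csv(filepath):
    # Read the saved file and load it into the database in the background.
    with open(filepath) as f:
        data = f.read()
    # ... insert `data` into the database here ...

# In the Flask view: save the file, enqueue the task, return immediately.
#     ingest_csv.delay(filepath)
#     return 'Upload received; processing in background', 202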
On the Flask side, make sure you have the MAX_CONTENT_LENGTH config value set high enough:
app.config['MAX_CONTENT_LENGTH'] = 100 * 1024 * 1024 # 100MB limit
Also, you may want to look into the Flask-Uploads extension.
There is another SO post similar to this one: Large file upload in Flask.
Other than that, your problem may be a timeout somewhere along the line. What does the rest of your stack look like? Apache? Nginx and Gunicorn? Are you getting a "Connection reset" error, a "Connection timed out" error, or does it just hang?
If you are using Nginx, try setting proxy_read_timeout to a value high enough for the upload to finish. Apache may also have a default setting causing you trouble, if that is what you are using. It's hard to tell without knowing more about your stack, what the error is that you are getting, and what the logs are showing.
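(As a sketch, with example values to tune for your deployment; client_max_body_size is my addition — Nginx's default 1 MB body limit also commonly breaks large uploads with a 413 error:)
server {
    client_max_body_size 100m;   # allow request bodies up to 100 MB
    proxy_read_timeout   300s;   # give a slow 100 MB upload time to finish
}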

Preferred way of using Flask and S3 for large files

I know that this is a little bit open-ended, but I am confused as to what strategy/method to apply for a large file upload service developed using Flask and boto3. For smaller files it is fine, but it would be really nice to see what you think when the size exceeds 100 MB.
What I have in mind are following -
a) Stream the file to the Flask app using some kind of AJAX uploader (what I am trying to build is just a REST interface using Flask-RESTful; any examples using these components, e.g. Flask-RESTful, boto3, and streaming large files, are welcome). The upload app is going to be (I believe) part of a microservices platform that we are building. I do not know whether there will be an Nginx proxy in front of the Flask app or whether it will be served directly from a Kubernetes pod/service. In case it is served directly, is there something I have to change for large file uploads in either the Kubernetes and/or Flask layer?
b) Use a direct JS uploader (like http://www.plupload.com/) to stream the file directly into an S3 bucket, and when finished get the URL, pass it to the Flask API app, and store it in the DB. The problem with this is that the credentials need to be somewhere in the JS, which is a security threat. (Not sure if there are any other concerns.)
Which of these (or something different that I did not think about at all) do you think is the best way, and where can I find some code examples for it?
Thanks in advance.
[EDIT]
I have found this - http://blog.pelicandd.com/article/80/streaming-input-and-output-in-flask - where the author is dealing with a situation similar to mine, and he proposed a solution. But he is opening a file already present on disk. What if I want to directly upload the file as it comes in, as one single object in an S3 bucket? I feel that this can be the basis of a solution, but not the solution itself.
Alternatively, you can use the Minio-py client library; it's open source and compatible with the S3 API. It handles multipart uploads for you natively.
A simple put_object.py example:
import os

from minio import Minio
from minio.error import ResponseError

client = Minio('s3.amazonaws.com',
               access_key='YOUR-ACCESSKEYID',
               secret_key='YOUR-SECRETACCESSKEY')

# Put a file with the default content type.
try:
    file_stat = os.stat('my-testfile')
    file_data = open('my-testfile', 'rb')
    client.put_object('my-bucketname', 'my-objectname',
                      file_data, file_stat.st_size)
except ResponseError as err:
    print(err)

# Put a file with content type 'application/csv'.
try:
    file_stat = os.stat('my-testfile.csv')
    file_data = open('my-testfile.csv', 'rb')
    client.put_object('my-bucketname', 'my-objectname',
                      file_data, file_stat.st_size,
                      content_type='application/csv')
except ResponseError as err:
    print(err)
You can find a complete list of API operations with examples here.
Installing the Minio-Py library:
$ pip install minio
Hope it helps.
Disclaimer: I work for Minio
As far as I know, Flask can only use memory to hold the whole HTTP request body, so there is no built-in disk-buffering feature.
The Nginx upload module is a really good way to do large file uploads; the documentation is here.
You can also use HTML5 or Flash to send chunked file data and process the data in Flask, but it's complicated.
Also try to find out whether S3 offers a one-time token for uploads.
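(For example — my sketch, not from the original answer: S3's presigned POST mechanism lets the browser upload directly to the bucket without shipping long-lived credentials in JS. Bucket and key below are placeholders.)
import boto3

# Assumption: credentials come from the environment / an IAM role.
s3 = boto3.client('s3', region_name='us-east-1')

# Create a short-lived upload policy the browser can use directly.
presigned = s3.generate_presigned_post(
    Bucket='my-bucketname',
    Key='uploads/my-objectname',
    ExpiresIn=3600,  # valid for one hour
)

# Hand presigned['url'] and presigned['fields'] to the JS uploader, which
# POSTs the file straight to S3; no AWS secret ever reaches the client.
print(presigned['url'], presigned['fields'])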
Using the link I posted above, I finally ended up doing the following. Please tell me if you think it is a good solution.
import boto3
from flask import Flask, request
.
.
.
@app.route('/upload', methods=['POST'])
def upload():
    s3 = boto3.resource('s3', aws_access_key_id="key",
                        aws_secret_access_key='secret',
                        region_name='us-east-1')
    # note: read(CHUNK_SIZE) returns at most CHUNK_SIZE bytes
    s3.Object('bucket-name', 'filename').put(Body=request.stream.read(CHUNK_SIZE))
.
.
.
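(One caveat with the snippet above: request.stream.read(CHUNK_SIZE) reads at most CHUNK_SIZE bytes, so anything larger is truncated. A hedged alternative sketch using boto3's managed multipart upload — route name and credential handling are my assumptions:)
import boto3
from flask import Flask, request

app = Flask(__name__)
s3 = boto3.client('s3')  # assumption: credentials come from the environment


@app.route('/upload', methods=['POST'])
def upload():
    # upload_fileobj reads the (non-seekable) request stream in chunks and
    # performs a managed multipart upload, so nothing is truncated and the
    # whole body never sits in memory at once.
    s3.upload_fileobj(request.stream, 'bucket-name', 'filename')
    return 'OK', 200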

Flask project works in local environment, throws weird error on heroku about missing files (that are dynamically added anyway!)

I'm developing a Flask application for image editing, and up until a couple of commits ago it had been working great on both my local environment and Heroku. I committed a tiny change and now I'm seeing a weird error. Yes, rolling back fixes the error, but based on the change I made, I can't explain why Heroku would be giving me trouble.
The app pulls images from an API and stores them in a tmp directory on the server*, where edits are made to the image. The tmp directory is part of the git repo, and the images should be pulled from the API every time the page loads. The error I'm getting on Heroku now is:
File "/app/app/views.py", line 71, in edit_result
Jan 17 22:05:23 *** app/web.1: i.save('%s/%s' % (MEDIA_FOLDER, filename))
Jan 17 22:05:23 *** app/web.1: File "/app/.heroku/python/lib/python2.7/site-packages/PIL/Image.py", line 1459, in save
Jan 17 22:05:23 *** app/web.1: fp = builtins.open(fp, "wb")
Jan 17 22:05:23 *** app/web.1: IOError: [Errno 2] No such file or directory: u'/app/tmp/5cfae63a182ff935fe0fb142_640.jpg'
The relevant lines in my views file are:
from StringIO import StringIO  # Python 2, per the traceback above

import requests
from flask import request, render_template
from PIL import Image

@app.route('/edit/', methods=['GET', 'POST'])
def edit_result():
    url = request.args.get('url')
    filename = url.split('/')[-1]
    i = Image.open(StringIO(requests.get(url).content))
    i.save('%s/%s' % (MEDIA_FOLDER, filename))
    return render_template("edit.html",
                           image_filename=filename,
                           )
Looking at the working git commit vs. the next one that broke things, this is all that was committed:
<option value="Default">Choose Font</option> was added to a Jinja2 template, @cache.cached(timeout=50) was removed from a view, and "Default": "/ttf-bitstream-vera/Vera.ttf", was added to a dictionary.
A couple more notes:
I tried to pull up the image listed in the error message, and it actually does come up on the Heroku instance.
I thought maybe I was confusing the commits, so I looked over the last few prior to the one I thought was the working commit, and the next few following the broken commit. I can't see anything that would cause just Heroku to fail and not my local environment.
I, probably stupidly, continued working on the app in my local environment. I hadn't finished coding logging into the app, so I was just getting error 500 on Heroku and thought it was a fluke. I'm a dozen commits ahead of the Heroku deploy now and a lot has changed, including those commits that broke things being removed entirely for a better user experience.
Things are still working perfectly locally.
Permissions haven't changed on the tmp directory.
I'm new to Heroku; this is my first app deployed there. Maybe I'm missing something, which is why I'm coming to you! Any advice would be appreciated.
*I know Heroku doesn't want things like this stored locally, and I'm in the process of rewriting the views to use S3 instead...
You can read more about my answer in the comment left by dirn on the first post, but essentially my tmp directory was NOT included in my git repo as I thought (git does not track empty directories), so it wasn't on the Heroku dyno after it cleared. I added a condition in my views.py to check for the directory and create it if it doesn't exist, which fixed the problem:
if not os.path.exists(MEDIA_FOLDER):
    os.makedirs(MEDIA_FOLDER)
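(Alternatively, a common convention for keeping an otherwise-empty directory in the repo — a sketch, not from the original answer — since git only tracks files:)
$ mkdir -p tmp
$ touch tmp/.gitkeep   # placeholder file so git tracks the directory
$ git add tmp/.gitkeep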
