How to upload large files on Streamlit using st.file_uploader - python

I am trying to upload a 10 GB binary file using st.file_uploader, but I get the error message below. In fact, I get the same error for pretty much any file above 2 GB; for example, I did not get it when I uploaded a 1.2 GB file.
I have set the maximum upload size to 10 GB in the config file.
The upload progress displays as it should and the file appears to be uploaded for a very short time (maybe a second), but then the attachment disappears along with the file_uploader widget itself, and the following message pops up.

By default, uploaded files are limited to 200 MB. You can raise that limit with the server.maxUploadSize config option (the value is in megabytes).
Set the config option in .streamlit/config.toml:
[server]
maxUploadSize=10000
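
With that config in place, a minimal Streamlit sketch for accepting and saving a large upload might look like the following; the chunked copy to disk is illustrative and not part of the original question:

import shutil

import streamlit as st

# Requires .streamlit/config.toml with server.maxUploadSize set high enough
uploaded = st.file_uploader("Upload a large binary file")

if uploaded is not None:
    # uploaded is a file-like object; copy it to disk in chunks instead of
    # reading the whole thing into memory at once
    with open(uploaded.name, "wb") as out:
        shutil.copyfileobj(uploaded, out)
    st.success(f"Saved {uploaded.name}")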

https://github.com/streamlit/streamlit/issues/5938
A bug covering uploads of this size has been raised in the Streamlit repo (see the issue above) and is still to be fixed.

Related

How to read and write to a text file using python while deploying on heroku

I use a bot that writes the file IDs of files sent by users to a text file, then reads those file IDs back from the text file and sends them to the user. The method worked, but once I deployed it to Heroku I could no longer see, process, or download the text file.
Is there a way to view the text files that we deploy to Heroku? Or is there a way to upload the text files to a cloud website and have the bot open (read & write) them via a URL (though I think this would allow any user on the internet to access and modify my text files, which means it is not safe)? Or should I create an SQL database, upload the text files, and link each text file with its own URL (but I'm new to SQL)?
Is there any other simple method to solve this problem? What do you advise me to do in this case?
https://github.com/zieadshabkalieh/a
NOTE: The text file in my code is named first.txt.
Heroku has an ephemeral filesystem: every file created by the application is removed (as is any change to existing files deployed with the application) whenever there is a new deployment or an application restart.
Heroku dynos also restart every 24 hours.
It is a good idea to persist data to remote storage (like S3) or to a database (always a good option, but it requires a little more work).
For reading/writing simple files you can check the HerokuFiles repository for some Python examples and options. I would suggest S3 (using the Python boto module), as it is easy to use even if the number or size of files grows one day.
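
As a rough illustration of that approach, here is a minimal sketch using boto3 (the current successor to the boto module) to read and append to a small text file in S3. The bucket name and key are placeholders, and AWS credentials are assumed to come from Heroku config vars:

import boto3

s3 = boto3.client("s3")  # picks up AWS credentials from environment variables
BUCKET = "my-bot-bucket"  # placeholder bucket name
KEY = "first.txt"

def read_ids():
    # Download the text file and return its contents as a string
    obj = s3.get_object(Bucket=BUCKET, Key=KEY)
    return obj["Body"].read().decode("utf-8")

def append_id(file_id):
    # Read, append, and write back the whole file (fine for small files)
    try:
        current = read_ids()
    except s3.exceptions.NoSuchKey:
        current = ""
    s3.put_object(Bucket=BUCKET, Key=KEY,
                  Body=(current + file_id + "\n").encode("utf-8"))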

video upload on tweepy

I was trying to update my own status with only a video, but for some reason it doesn't work. All of the files are .mp4 and less than 1 MB.
When I try upload_result = api.media_upload('file.mp4') it raises a TweepError saying that the file type could not be determined.
When I try upload_result = api.media_upload(filename="path of the file", file="file.mp4") it raises [Errno 13] Permission denied.
My IDE is PyCharm Community Edition and I am using Python 3.9.
If you want the full code, just ask for it in the comments.
According to the Tweepy documentation, you can only upload images:
Use this endpoint to upload images to Twitter.
Video upload is not currently supported; there is an open issue, so it will probably come at some point.
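
For reference, this is roughly how an image upload works with Tweepy's v1.1 API; the file name is a placeholder, and api is assumed to be an already authenticated tweepy.API instance:

# Assumes `api` is an authenticated tweepy.API object
media = api.media_upload("picture.png")  # upload the image first
api.update_status(status="Here is my picture", media_ids=[media.media_id])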

Check if a file in SharePoint has finished uploading

I am building a pipeline in which SharePoint will be the source for data ingestion. I want to use Azure Logic Apps with a trigger that runs when a file is created or modified. When a file is uploaded to SharePoint, Logic Apps should copy it to Blob Storage. The problem I am facing is that the trigger can fire even if the file is not 100% uploaded yet, which leads to copying empty or incomplete files.
I tried several SharePoint triggers to see whether the problem is specific to one of them, but they all have the same issue.
I decided to use Python with Office365-REST-Python-Client, deployed in Azure Functions, to handle copying the files to Azure Blob Storage. I have the following code:
from office365.sharepoint.files.file import File

def download_file(context, sharepoint_file_path, local_file_path):
    # Fetch the file contents from SharePoint as a binary response
    response = File.open_binary(context, sharepoint_file_path)
    response.raise_for_status()
    # Write the bytes to a local file
    with open(local_file_path, 'wb') as f:
        f.write(response.content)
I checked the response's status_code, and even for incomplete files it returns 200, which still does not help with checking whether the file is complete.
How can I solve this? Thank you.
In my opinion, it is difficult to check whether a file in SharePoint has finished uploading, but you can avoid copying a file that hasn't finished uploading with a workaround.
Estimate how long the upload will take based on the size of your files, and then add a "Delay" action after the trigger.
For example, with a delay of 5 minutes, the file has finished uploading by the time the copy runs and is copied to Blob Storage successfully.
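
If you handle the copy in the Azure Function rather than in Logic Apps, a similar workaround is to wait until the downloaded size stops changing before writing the file out. This is only a sketch built on the File.open_binary call from the question; the polling interval and retry count are arbitrary assumptions:

import time

from office365.sharepoint.files.file import File

def download_when_stable(context, sharepoint_file_path, local_file_path,
                         wait_seconds=60, max_checks=10):
    # Download repeatedly until two consecutive downloads have the same size,
    # which suggests the upload to SharePoint has finished
    previous_size = -1
    for _ in range(max_checks):
        response = File.open_binary(context, sharepoint_file_path)
        response.raise_for_status()
        current_size = len(response.content)
        if current_size == previous_size and current_size > 0:
            with open(local_file_path, 'wb') as f:
                f.write(response.content)
            return True
        previous_size = current_size
        time.sleep(wait_seconds)
    return False  # the file never stabilised within the allotted checks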

Size Limit On Download From Cloud Storage With App Engine

tldr: Is there a file size limit to send a file from Cloud Storage to my user's web browser as a download? Am I using the Storage Python API wrong, or do I need to increase the resources set by my App Engine YAML file?
This is just an issue with downloads. Uploads work great up to any file size, using chunking.
The symptoms
I created a file transfer application with the App Engine Python 3.7 Standard environment. Users are able to upload files of any size, and that is working well. But users are running into what appears to be a size limit on downloading the resulting file from Cloud Storage.
The largest file I have successfully sent and received through my entire upload/download process was 29 megabytes. Then I sent myself a 55 megabyte file, but when I attempted to receive it as a download, Flask gave me this error:
Error: Server Error
The server encountered an error and could not complete your request.
Please try again in 30 seconds.
Application Structure
To create my file transfer application, I used Flask to set up two Services, internal and external, each with its own Flask routing file, its own webpage/domain, and its own YAML file.
In order to test the application, I visit the internal webpage which I created. I use it to upload a file in chunks to my application, which successfully composes the chunks in Cloud Storage. I then log into Google Cloud Platform Console as an admin, and when I look at Cloud Storage, it will show me the 55 megabyte file which I uploaded. It will allow me to download it directly through the Cloud Platform Console, and the file is good.
(Up to that point of the process, this even worked for a 1.5 gigabyte file.)
Then I go to my external webpage as a non-admin user. I use the form to attempt to receive that same file as a download. I get the above error. However, this whole process encounters no errors for my 29 megabyte test file, or smaller.
Stackdriver logs for this service reveal:
logMessage: "The process handling this request unexpectedly died. This is likely to cause a new process to be used for the next request to your application. (Error code 203)"
Possible solutions
I added the following lines to my external service YAML file:
resources:
  memory_gb: 100
  disk_size_gb: 100
The error remained the same. Apparently this is not a limit of system resources?
Perhaps I'm misusing the Python API for Cloud Storage. I'm importing storage from google.cloud. Here is where my application responds to the user's POST request by sending the user the file they requested:
@app.route('/download', methods=['POST'])
def provide_file():
    return external_download()
This part is in external_download:
storage_client = storage.Client()
bucket = storage_client.get_bucket(current_app.cloud_storage_bucket)
bucket_filename = request.form['filename']
blob = bucket.blob(bucket_filename)
# Reads the whole object into memory, then streams it to the user
return send_file(io.BytesIO(blob.download_as_string()),
                 mimetype="application/octet-stream",
                 as_attachment=True,
                 attachment_filename=bucket_filename)
Do I need to implement chunking for downloads, not just uploads?
I wouldn’t recommend using Flask's send_file() method for managing large file transfers; Flask's file handling methods were intended for use by developers or APIs, mostly to exchange light objects such as logs, cookies, and other system messages.
Besides, the download_as_string() method might indeed conceal a buffer limitation. I reproduced your scenario and got the same error message with files larger than 30 MB, but I couldn’t find more information about such a constraint. It might be intentional, given the method’s purpose (download content as a string, not suited for large objects).
Proven ways to handle file transfer efficiently with Cloud Storage and Python:
Use methods of the Cloud Storage API directly to download and upload objects without going through Flask. As @FridayPush mentioned, this offloads your application, and you can control access with Signed URLs (see the sketch after this list).
Use the Blobstore API, a straightforward, lightweight, and simple solution for handling file transfers, fully integrated with GCS buckets and intended for this type of situation.
Use the Python Requests module, which requires creating your own handlers to communicate with GCS.
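
As a rough sketch of the Signed URL option, the google-cloud-storage client can generate a time-limited download link that the browser fetches directly from Cloud Storage, so the file bytes never pass through your Flask process. The bucket name and expiry here are placeholders:

import datetime

from google.cloud import storage

def make_download_url(bucket_name, blob_name):
    # Generate a V4 signed URL that allows a GET on the object for 15 minutes
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    return blob.generate_signed_url(
        version="v4",
        expiration=datetime.timedelta(minutes=15),
        method="GET",
    )

# The Flask route can then simply redirect the user to the signed URL
# instead of streaming the bytes itself.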

Partial download of file (Just what changed)

I do a URL fetch to get info from an online txt file. It's a big file (around 2 MB and growing) that is modified all the time, automatically.
I'm using memcache from Google App Engine to keep the data for a while. But with each new request the incoming bandwidth increases, and I started to get an Over Quota error.
I need a way to do a partial download of this file, downloading only what has changed instead of the whole file.
Any ideas? :)
Only if you know what part of the file has been changed.
For example, if you know that the file is only appended to, then you could use an HTTP Range request to fetch only the end of the file.
If you have no way of knowing where the file has changed, then this would only work if the server sent you a patch or delta against a previous version.
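
A minimal sketch of that append-only case, using the requests library (on App Engine you would send the same header through urlfetch); the URL is a placeholder and it assumes you track the previously seen length yourself, for example in memcache:

import requests

def fetch_new_bytes(url, known_length):
    # Ask the server only for the bytes after the portion we already have
    headers = {"Range": "bytes=%d-" % known_length}
    response = requests.get(url, headers=headers)
    if response.status_code == 206:  # Partial Content: server honoured the Range
        return response.content
    # 200 means the server ignored the Range header and sent the whole file
    return response.content[known_length:]

new_data = fetch_new_bytes("https://example.com/big.txt", known_length=2000000)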
