I'm trying to upload a huge file from my Nokia N95 mobile to my webserver using PyS60 Python code. However, the code crashes because I load the entire file into memory before posting it to an HTTP URL. Any idea how to upload huge files (> 120 MB) to a webserver using PyS60?
Following is the code I use to send the HTTP request.
f = open(soundpath + audio_filename, 'rb')
fields = [('timestamp', str(audio_start_time)), ('test_id', str(test_id)), ('tester_name', tester_name), ('sensor_position', str(sensor_position)), ('sensor', 'audio')]
files = [('data', audio_filename, f.read())]
post_multipart(MOBILE_CONTEXT_HOST, MOBILE_CONTEXT_SERVER_PORT, '/MobileContext/AudioServlet', fields, files)
f.close()
Where does this post_multipart() function come from?
If it is from here, then it should be easy to adapt the code so that it takes a file object as an argument rather than the full content of the file, so that post_multipart reads small chunks of data while posting instead of loading the whole file into memory before posting.
This is definitely possible.
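For illustration, here is a minimal sketch of such a streaming post, assuming the PyS60-era Python 2 httplib module and the host/port/path constants from the question; the boundary string, chunk size, and helper name are made up for this example, and error handling is left out.

import httplib, os

BOUNDARY = '----pys60boundary'
CHUNK = 8192

def post_file_streaming(host, port, path, fields, field_name, filename, filepath):
    # build the multipart head and tail as plain strings
    head = ''
    for name, value in fields:
        head += '--%s\r\nContent-Disposition: form-data; name="%s"\r\n\r\n%s\r\n' % (BOUNDARY, name, value)
    head += ('--%s\r\nContent-Disposition: form-data; name="%s"; filename="%s"\r\n'
             'Content-Type: application/octet-stream\r\n\r\n' % (BOUNDARY, field_name, filename))
    tail = '\r\n--%s--\r\n' % BOUNDARY
    # Content-Length can be computed from the file size without reading the file
    length = len(head) + os.path.getsize(filepath) + len(tail)
    conn = httplib.HTTPConnection(host, port)
    conn.putrequest('POST', path)
    conn.putheader('Content-Type', 'multipart/form-data; boundary=%s' % BOUNDARY)
    conn.putheader('Content-Length', str(length))
    conn.endheaders()
    conn.send(head)
    f = open(filepath, 'rb')
    try:
        while True:
            chunk = f.read(CHUNK)  # only CHUNK bytes are held in memory at a time
            if not chunk:
                break
            conn.send(chunk)
    finally:
        f.close()
    conn.send(tail)
    return conn.getresponse()

The key point is that Content-Length is computed from os.path.getsize(), so the body can be sent in small pieces instead of being assembled in memory first.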
You can't. It's pretty much physically impossible. You'll need to split the file into small chunks and upload it bit by bit, which is very difficult to do quickly and efficiently on that sort of platform.
Jamie
You'll need to craft client code that splits your source file into small chunks and rebuilds the pieces server-side.
I'm trying to download a .doc file using a requests.get() request (though I've heard about other methods, they all require saving the file too).
Is there any method I could use to extract the text from it (or even convert it into a .txt, for example) straight away, without saving it to a file?
I've tried passing the response's .raw stream into various converters (docx2txt.process(), for example), but I assume they all work with files, not with streams.
While the script is running, memory allocation is handled by the Python interpreter, but if you save the content to a file the memory is allocated differently. This article can be helpful to you.
Link: article
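If the file is actually a .docx (the modern zip-based format), one way to avoid touching the disk is to wrap the downloaded bytes in an in-memory buffer. This is only a sketch: the URL is a placeholder, and it relies on python-docx's Document() accepting a file-like object; a legacy binary .doc would still need a different converter.

import io
import requests
from docx import Document  # pip install python-docx

resp = requests.get("https://example.com/report.docx")  # placeholder URL
resp.raise_for_status()

buf = io.BytesIO(resp.content)  # keep the bytes in memory, nothing written to disk
doc = Document(buf)             # Document() accepts a file-like object
text = "\n".join(p.text for p in doc.paragraphs)
print(text[:500])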
I want to upload some files to sharepoint via office365 REST Python client.
On documentation on github, I found two examples:
one for larger files where this is executed:
uploaded_file = target_folder.files.create_upload_session(local_path, size_chunk, print_upload_progress).execute_query()
one for small files:
target_file = target_folder.upload_file(name, file_content).execute_query()
In my case, I want to be able to upload files that are small as well as files that are very large.
For testing, I wanted to see if the method for larger files works with smaller files too.
With a small file and size_chunk set to 1 MB, the upload completed, but the uploaded file was empty (0 B), so I lost the content during the upload.
I wanted to know if someone can suggest something more generic that works for any file size. Also, I don't understand what the chunk size is for in the large-file case. Do you know how one should choose it?
Thank you so much!
This problem is solved by installing office365-rest-python-client instead of office365-rest-client.
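To make the upload generic over file size, one option is a small wrapper that dispatches on the size of the local file. This is only a sketch built around the two calls quoted above; the 4 MB threshold, the 1 MB chunk size, and the function name are assumptions rather than values from the library's documentation, and the progress callback from the original example is omitted on the assumption that it is optional.

import os

SMALL_FILE_LIMIT = 4 * 1024 * 1024  # assumed threshold, not an official limit
CHUNK_SIZE = 1024 * 1024            # 1 MB chunks for the upload session

def upload_any_size(target_folder, local_path):
    size = os.path.getsize(local_path)
    name = os.path.basename(local_path)
    if size <= SMALL_FILE_LIMIT:
        # small file: read it once and push it in a single request
        with open(local_path, "rb") as f:
            return target_folder.upload_file(name, f.read()).execute_query()
    # large file: chunked upload session, as in the example for larger files
    return target_folder.files.create_upload_session(local_path, CHUNK_SIZE).execute_query()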
Good morning all.
I have a generic question about the best approach to handle large files with Django.
I created a python project where the user is able to read a binary file (usually the size is between 30-100MB). Once the file is read, the program processes the file and shows relevant metrics to the user. Basically it outputs the max, min, average, std of the data.
At the moment, you can only run this project from the command line. I'm trying to create a user interface so that anyone can use it, so I decided to create a web page using Django. The page is very simple: the user uploads files, then selects which file to process, and the page shows the metrics.
Working on my local machine I was able to implement it: I upload the files (they are saved locally and then processed). I then created an S3 account, and now the files are all uploaded to S3. The problem I'm having is that when I try to get the file (I'm using smart_open, https://pypi.org/project/smart-open/), reading it is really slow (for a 30 MB file it takes 300 s), but if I download the file and then read it, it only takes 8 s.
My question is: what is the best approach to retrieve files from S3 and process them? I'm thinking of simply downloading the file to my server, processing it, and then deleting it. I've tried this on my localhost and it's fast: downloading from S3 takes 5 s and processing takes 4 s.
Would this be a good approach? I'm a bit afraid that, for instance, if I have 10 users at the same time and each one creates a report, then I'll need 10 * 30 MB = 300 MB of space on the server. Is this practical, or will I fill up the server?
Thank you for your time!
Edit
To give a bit more context, what's making it slow is the f.read() call. Due to the format of the binary file, I have to read the file in the following way:
name = f.read(30)
unit = f.read(5)
data_length = f.read(2)
data = f.read(data_length)  # <- this is the part that takes a long time when reading directly from S3; after downloading the file it is very fast
All,
After some experimenting, I found a solution that works for me.
import os, boto3

s3 = boto3.client('s3')
with open('temp_file_name', 'wb') as data:
    s3.download_fileobj(Bucket='YOURBUCKETNAME', Key='YOURKEY', Fileobj=data)
read_file('temp_file_name')
os.remove('temp_file_name')
I don't know if this is the best approach or what its possible downsides are. I'll use it and come back to this post if I end up going with a different solution.
The problem with my previous approach was that f.read() was taking too long; it seems that every time I need to read another chunk, the program has to make a round trip to S3 (or something like that), and that is what is slow. What ended up working for me was to download the file to my server, read it, and then delete it once I'm done. Using this solution I get the same speeds I was getting on a local server (reading directly from my laptop).
If you are working with medium-sized files (30-50 MB), this approach seems to work. My only concern is that, if we try to download a really large file, the server could run out of disk space.
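For what it's worth, a small variation that sidesteps the fixed file name (so concurrent users don't overwrite each other's temporary file) is to use tempfile. This is only a sketch, assuming a boto3 client and the same read_file() parser used above.

import os
import tempfile
import boto3

s3 = boto3.client("s3")

def process_s3_object(bucket, key):
    fd, path = tempfile.mkstemp()  # unique temporary file per request
    try:
        with os.fdopen(fd, "wb") as data:
            s3.download_fileobj(Bucket=bucket, Key=key, Fileobj=data)
        return read_file(path)     # the existing parser from the project
    finally:
        os.remove(path)            # free the disk space whether or not parsing succeeds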
I created a text-scraping program in which the user enters a word and the program searches through a large text file (250 MB and growing) on my computer, but now I want to deploy it through Heroku.
Is there a workaround that I need to implement or is there a (rather elusive) way to accomplish this? As far as I can tell, there is no way to upload my text file to Heroku as is.
Here's my suggestion.
Host the text file on a site like pastebin as long as it doesn't contain any confidential information. This allows you to update it freely without needing to re-deploy your app each time you add to it.
Once you've uploaded/pasted the text into a "paste" and saved it, you'll be able to get the "raw" link that returns the content of the file when requested.
Use requests to fetch the file from your app & parse it however you need to.
import requests

resp = requests.get("https://pastebin.com/raw/LjcPg3UL")

# if all entries are on individual lines (decode_unicode=True yields str rather than bytes)
mywords = [line for line in resp.iter_lines(decode_unicode=True)]
# if comma-separated or otherwise
#mywords = resp.text.split(",")
Now you have all your content in a list to work with in your app.
Edit:
Since you want to do this with larger files, you could host the file on Dropbox and follow the instructions from here to get the raw link. However, if you're dealing with a file that large you're going to notice significant overhead, so I'd suggest the added precaution of using requests' stream parameter (details), so that the request line becomes
resp = requests.get("https://www.dropbox.com/s/FILE_ID/filename.extension?raw=1", stream=True)
This reads the file in chunks instead of reading the entire file at once, which helps cut down on memory consumption.
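Putting that together for the word-search use case, a sketch along these lines keeps only one line in memory at a time (the Dropbox URL is still a placeholder):

import requests

def find_word(word, url="https://www.dropbox.com/s/FILE_ID/filename.extension?raw=1"):
    matches = []
    with requests.get(url, stream=True) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines(decode_unicode=True):  # streamed, line by line
            if line and word in line:
                matches.append(line)
    return matches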
I'm trying to store 30-second user MP3 recordings as Blobs in my App Engine datastore. However, in order to enable this feature (App Engine has a 1 MB limit per upload) and to keep costs down, I would like to compress the file before upload and decompress it every time it is requested. How would you suggest I accomplish this? (It can happen in the background via a task queue, by the way, but an efficient solution is always good.)
Based on my own tests and research, I see two possible approaches to accomplish this:
Zlib
For this I need to compress a certain number of blocks at a time using a while loop. However, App Engine doesn't allow you to write to the file system. I thought about using a temporary file to accomplish this, but I haven't had luck with that approach when trying to decompress the content from a temporary file.
Gzip
From reading around the web, it appears that App Engine's URL Fetch function already requests content gzipped and then decompresses it. Is there a way to stop it from decompressing the content, so that I can just put it in the datastore in gzipped form and decompress it when I need to play it back to a user on demand?
Let me know how you would suggest using zlib, gzip, or some other solution to accomplish this. Thanks.
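For the zlib route specifically, the block-by-block compression doesn't actually need the file system at all. Here is a minimal sketch that compresses a file-like object chunk by chunk, entirely in memory (keeping in mind that MP3 data is already compressed and will barely shrink):

import io
import zlib

def compress_stream(fileobj, chunk_size=64 * 1024):
    # compress a file-like object one block at a time, entirely in memory
    compressor = zlib.compressobj()
    out = io.BytesIO()
    while True:
        block = fileobj.read(chunk_size)
        if not block:
            break
        out.write(compressor.compress(block))
    out.write(compressor.flush())
    return out.getvalue()

# usage: compressed = compress_stream(io.BytesIO(mp3_bytes))
#        original   = zlib.decompress(compressed)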
"Compressing before upload" implies doing it in the user's browser -- but no text in your question addresses that! It seems to be about compression in your GAE app, where of course the data will only be after the upload. You could do it with a Firefox extension (or other browsers' equivalents), if you can develop those and convince your users to install them, but that has nothing much to do with GAE!-) Not to mention that, as #RageZ's comment mentions, MP3 is, essentially, already compressed, so there's little or nothing to gain (though maybe you could, again with a browser extension for the user, reduce the MP3's bit rate and thus the file's dimension, that could impact the audio quality, depending on your intended use for those audio files).
So, overall, I have to second #jldupont's suggestion (also in a comment) -- use a different server for storage of large files (S3, Amazon's offering, is surely a possibility though not the only one).
While the technical limitations (mentioned in other answers) on compressing MP3 files with standard compression or re-encoding at a lower bitrate are correct, your aim is to store 30 seconds of MP3-encoded data. Assuming that you can enforce that on your users, you should be all right without applying additional compression techniques if the MP3 bitrate is 256 kbit/s constant bitrate (CBR) or lower. At 256 kbit/s CBR, 30 seconds of audio would require:
(((256 * 1000) / 8) * 30) / 1048576 = 0.91 MB
The maximum standard bitrate is 320 kbit/s, which equates to 1.14 MB, so you'd have to use 256 kbit/s or less. The most commonly used bitrate in the wild is 128 kbit/s.
There are additional overheads that will increase the final file size, such as ID3 tags and framing, but you should be OK. If not, drop down to 224 kbit/s as your maximum (30 s = 0.80 MB). There are other complexities, such as variable bitrate encoding, for which the file size is not so predictable, and I am ignoring those here.
So your problem is no longer how to compress MP3 files, but how to ensure that your users are aware that they cannot upload more than 30 seconds encoded at 256 kbit/s CBR or lower, and how to enforce that policy.
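As a quick sanity check, the arithmetic above can be wrapped in a couple of lines (the 1 MB limit and the 256 kbit/s / 30-second policy values are the ones discussed in this answer):

def estimated_mp3_size_mb(bitrate_kbps, seconds):
    # kbit/s -> bytes/s -> total bytes -> MiB
    return (bitrate_kbps * 1000 / 8) * seconds / 1048576

def fits_upload_limit(bitrate_kbps, seconds, limit_mb=1.0):
    return estimated_mp3_size_mb(bitrate_kbps, seconds) <= limit_mb

# estimated_mp3_size_mb(256, 30) -> ~0.91 MB, just under the 1 MB limit
# estimated_mp3_size_mb(320, 30) -> ~1.14 MB, over the limit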
You could try the new Blobstore API, which allows the storage and serving of files of up to 50 MB:
http://www.cloudave.com/link/the-new-google-app-engine-blobstore-api-first-thoughts
http://code.google.com/appengine/docs/python/blobstore/
http://code.google.com/appengine/docs/java/blobstore/
As Aneto mentions in a comment, you will not be able to compress MP3 data with a standard compression library like gzip or zlib. However, you could re-encode the MP3 at a MUCH lower bitrate, possibly with LAME.
You can store up to 10 MB with a list of Blobs. Search for the Google file service.
It's much more versatile than the Blobstore in my opinion; I just started using the Blobstore API yesterday and I'm still figuring out whether it is possible to access the data byte-wise, e.g. for converting DOC to PDF or JPEG to GIF.
You can store Blobs of 1 MB * 10 = 10 MB (the maximum entity size, I think), or you can use the Blobstore API and get the same 10 MB, or 50 MB if you enable billing (you can enable it, and if you don't exceed the free quota you don't pay).