file I/O with google app engine - python

I want to provide a field in my HTML file so that people can upload their XML files to be imported into the datastore. How can I read and process this file inside App Engine once it is uploaded? (I don't want to store the file with the Blobstore; I just want to read it, process it, and throw it away.) Thanks

Use a StringIO (in Python 3, io.StringIO for text or io.BytesIO for bytes) when you need a file-like object for use with libraries that act on files. (Although I believe most XML parsers will happily accept a string instead of requiring a file-like object.)
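A minimal Python 3 sketch of both options, assuming your upload handler has already read the posted file into a bytes object (how you get at it depends on the framework). The <items>/<item> layout is a hypothetical schema used only for illustration.

```python
import io
import xml.etree.ElementTree as ET


def parse_uploaded_xml(data: bytes) -> list:
    """Parse uploaded XML bytes without ever writing them to disk.

    Returns the text of every <item> element (hypothetical schema).
    """
    # ElementTree accepts the raw bytes directly...
    root = ET.fromstring(data)
    return [item.text for item in root.iter("item")]


def parse_uploaded_xml_filelike(data: bytes) -> list:
    # ...but if a library insists on a file-like object, wrap the bytes
    # in an in-memory buffer instead of creating a temporary file.
    tree = ET.parse(io.BytesIO(data))
    return [item.text for item in tree.getroot().iter("item")]


upload = b"<items><item>alpha</item><item>beta</item></items>"
print(parse_uploaded_xml(upload))           # ['alpha', 'beta']
print(parse_uploaded_xml_filelike(upload))  # ['alpha', 'beta']
```

Once parsed, you can create your datastore entities from the extracted values and simply let the bytes go out of scope.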

Related

Python Flask Uploading Files

I'm trying to upload a user-selected image to my Firebase storage.
When I get the file with
file = request.files['inputFile']
and then try
storage.child("images/examples.jpg").put(file)
I get an error:
io.UnsupportedOperation: fileno
How do I fix this? I just want the user to select a file so that I can use the .jpg and upload it.
The put method takes a path to a local file (and an optional user token).
request.files[key] returns a custom object that represents the uploaded file. Flask documentation links: file uploads quickstart, incoming request data API, FileStorage class.
You need to save the uploaded file data to a local file, and then pass that file name to the put method:
request.files['inputFile'].save("some_filename.ext")
storage.child("images/examples.jpg").put("some_filename.ext")
Look into the tempfile module to generate random temporary file names (instead of using the hard-coded some_filename.ext, which is obviously not a good idea with concurrent requests).
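A sketch of the tempfile approach, assuming the uploaded object has a .save(path) method (Flask's FileStorage does) and that the upload function takes a local path, as the put method above does. The helper name upload_via_tempfile is hypothetical.

```python
import os
import tempfile


def upload_via_tempfile(file_storage, upload, suffix=""):
    """Save an uploaded file to a unique temporary path, hand that path to
    `upload`, and delete the temporary file afterwards.

    `file_storage` is anything with a .save(path) method (such as
    request.files['inputFile']); `upload` is a callable taking a local path.
    """
    # mkstemp creates a new file with a unique name, so concurrent
    # requests never collide on the same filename.
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # .save() reopens the path itself
    try:
        file_storage.save(path)
        return upload(path)
    finally:
        os.remove(path)  # throw the local copy away


# Usage inside a Flask view (storage is the object from the question):
# upload_via_tempfile(request.files['inputFile'],
#                     lambda p: storage.child("images/examples.jpg").put(p),
#                     suffix=".jpg")
```

The try/finally guarantees the temporary file is removed even if the upload raises.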

Python API: Tweet with media without file

I'm using Twitter from Python in an environment where I can't store files.
I receive an HTTP POST with text and an image and want to create a tweet from this data without writing a local file (it's Zappa in an AWS API environment).
Tweepy only accepts filenames, which does not work for me.
python-twitter seems to support something like that, but I can't find any documentation for it.
Should I just send POST requests to twitter for uploading the images? Is there a simpler way?
Try passing an io.BytesIO to Tweepy's API.update_with_media as file.
filename – The filename of the image to upload. This will automatically be opened unless file is specified
...
file – A file object, which will be used instead of opening filename. filename is still required, for MIME type detection and to use as a form field in the POST data
Edit:
It looks like you have the image data base64 encoded. You can use base64.b64decode to decode it before creating the io.BytesIO:
file = io.BytesIO(base64.b64decode(base64_data))
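A slightly more defensive version of that decoding step, assuming the POST body carries the image as a base64 string. The helper name is hypothetical, and the Tweepy call is shown only as a comment, mirroring the Tweepy 3.x-era update_with_media usage quoted above; "upload.jpg" is a placeholder filename used for MIME detection.

```python
import base64
import binascii
import io


def image_buffer_from_base64(base64_data: str) -> io.BytesIO:
    """Turn base64-encoded image data into an in-memory file object."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        raw = base64.b64decode(base64_data, validate=True)
    except binascii.Error as exc:
        raise ValueError("image payload is not valid base64") from exc
    return io.BytesIO(raw)  # positioned at offset 0, ready to be read


# Hypothetical usage with the call quoted above:
# buf = image_buffer_from_base64(payload["image"])
# api.update_with_media(filename="upload.jpg", status=payload["text"], file=buf)
```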

What is the difference between upload_file() and put_object() when uploading files to S3 using boto3?

I'm using boto3 to upload files. Could anyone explain the exact difference between the upload_file() and put_object() S3 methods in boto3?
Is there any performance difference?
Does either of them handle multipart uploads behind the scenes?
What are the best use cases for each?
The upload_file method is handled by the S3 Transfer Manager; this means that it will automatically handle multipart uploads behind the scenes for you, if necessary.
The put_object method maps directly to the low-level S3 API request. It does not handle multipart uploads for you. It will attempt to send the entire body in one request.
One other difference worth noting is that the upload_file() API lets you track upload progress with a callback function (its Callback parameter).
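A sketch of such a callback, loosely modelled on the progress example in the boto3 documentation. boto3 calls it with the number of bytes sent since the previous call, possibly from several threads during a multipart upload, hence the lock. The class name and the bucket/key in the commented usage are hypothetical.

```python
import threading


class UploadProgress:
    """Callback object for s3.upload_file(..., Callback=UploadProgress(size))."""

    def __init__(self, total_bytes: int):
        self.total = total_bytes
        self.seen = 0
        self._lock = threading.Lock()

    def __call__(self, bytes_amount: int) -> None:
        # Each call reports only the newly transferred bytes, so accumulate.
        with self._lock:
            self.seen += bytes_amount
            pct = 100.0 * self.seen / self.total if self.total else 100.0
            print(f"{self.seen}/{self.total} bytes ({pct:.1f}%)")


# Usage (needs `import os, boto3`; names are hypothetical):
# path = "/tmp/my_file.json"
# s3 = boto3.client("s3")
# s3.upload_file(path, "my-bucket", "my_file.json",
#                Callback=UploadProgress(os.path.getsize(path)))
```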
Also, as already mentioned by boto's creator @garnaat, upload_file() uses multipart uploads behind the scenes, so it's not straightforward to check end-to-end file integrity (although there is a way), whereas put_object() uploads the whole file in one shot (capped at 5 GB), which makes it easier to check integrity by passing Content-MD5, already available as a parameter of the put_object() API.
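Content-MD5 is the base64 encoding of the raw 16-byte MD5 digest of the body (not of its hex string). A minimal sketch of computing it; the helper name and the bucket/key in the commented call are hypothetical.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return the Content-MD5 value for `body`: base64 of the binary MD5 digest."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


# S3 rejects the upload if the body it receives does not match this digest:
# body = b'{"Name": "Sally"}'
# s3.put_object(Body=body, Bucket="my-bucket", Key="my_records",
#               ContentMD5=content_md5(body))
```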
One other thing to mention is that put_object() takes the data itself as Body (bytes, a string, or a file-like object), whereas upload_file() takes the path of the file to upload. For example, if I had a JSON file already stored locally, I would use s3.upload_file(Filename='/tmp/my_file.json', Bucket=my_bucket, Key='my_file.json').
Whereas if I had a dict within my job, I could transform the dict into JSON and use put_object() like so:
import json
import boto3
s3 = boto3.client('s3')
records_to_update = {'Name': 'Sally'}
records_to_update_json = json.dumps(records_to_update, default=str)
s3.put_object(Body=records_to_update_json, Bucket=my_bucket, Key='my_records')

Security around user uploaded files

I have a client's Python website that runs a Dropbox-like feature allowing users to upload files.
I want to make sure that uploading files does not open up the server to vulnerabilities.
So I store all uploaded files as blobs in a Postgres database, and I don't trust the file name and extension of the file; I let the application determine that for itself.
I ran into problems when trying to let the application decide the file format itself, so my question boils down to:
Is it necessary, for security, to limit which file formats are allowed to be uploaded?
If so, how can I best determine the file format without using something like libmagic?
Are there other measures I need to take in order to remain safe when allowing publicly uploaded files?
Thanks.
The referenced "bug" question (which chains to this) doesn't refer to a bug; it says that some MS Office file types are, like Java JARs, packaged and compressed as zip files. If you rename a .xlsx file to .zip, you can view the contents; I found 13 .xml files and a .bin printer-settings file in a simple example.
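You don't even need to rename the file: Python's zipfile module opens an Office Open XML file directly. A small sketch; the helper name and "report.xlsx" are hypothetical.

```python
import zipfile


def list_package_parts(path_or_file) -> list:
    """List the members of an Office Open XML file (.docx/.xlsx/.pptx),
    which is an ordinary zip archive under a different extension."""
    with zipfile.ZipFile(path_or_file) as zf:
        return zf.namelist()


# for name in list_package_parts("report.xlsx"):
#     print(name)
```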
For security you can't "trust" the MIME type and file extension provided by the user, but you can in principle use them to validate that the contents are valid for the claimed file type. The first level of checking would ensure that claimed Office files are in fact valid zip files; the second would check that the contents conform to what the Office application expects. Not being an Office developer, I don't know of a process to inspect a zip archive, determine which Office application it is for, and validate that the application can open it, but I'm sure it exists somewhere on MSDN.
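A sketch of validating content against the claimed extension without libmagic, assuming a small allowlist. It checks well-known leading "magic" bytes for a few formats and, for Office files, performs the two levels described above in a heuristic form: the bytes must open as a zip, and the archive must contain [Content_Types].xml plus the main part Office normally writes. Those default part names are an assumption (the OOXML format locates the main part via _rels/.rels), and passing this check does not make a file safe to open.

```python
import io
import zipfile

# Leading "magic" bytes for a few common formats.
SIGNATURES = {
    ".png": [b"\x89PNG\r\n\x1a\n"],
    ".jpg": [b"\xff\xd8\xff"],
    ".gif": [b"GIF87a", b"GIF89a"],
    ".pdf": [b"%PDF-"],
}

# Default main-part names written by Office (heuristic, see above).
OOXML_MAIN_PART = {
    ".docx": "word/document.xml",
    ".xlsx": "xl/workbook.xml",
    ".pptx": "ppt/presentation.xml",
}


def content_matches_claim(data: bytes, claimed_ext: str) -> bool:
    """Check uploaded bytes against the extension the user claimed.

    Extensions not listed above return False, so this acts as an allowlist.
    """
    ext = claimed_ext.lower()
    if ext in SIGNATURES:
        return any(data.startswith(sig) for sig in SIGNATURES[ext])
    if ext in OOXML_MAIN_PART:
        try:
            # Level 1: it must be a real zip archive.
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                names = set(zf.namelist())
        except zipfile.BadZipFile:
            return False
        # Level 2: it must contain the parts an Office package should have.
        return "[Content_Types].xml" in names and OOXML_MAIN_PART[ext] in names
    return False
```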
More fundamentally, what do you mean by "security is important to my application"? Security prevents unwanted events; you need to define what you want to prevent. Do you want users to only be able to upload files for whitelisted applications? Should they be prevented from uploading blacklisted file types (like .exe)? Is it OK with you if a user uploads 10MB of random bits and calls it a .xyz file?

What's a Django/Python solution for providing a one-time url for people to download files?

I'm looking for a way to sell someone a card at an event with a unique code that they can use later to download a file (mp3, pdf, etc.) only once, and to mask the true file location so that a savvy person downloading the file won't be able to download it more than once. It would be nice to host the file on Amazon S3 to save on bandwidth where our server is co-located.
My thought for the codes would be to pre-generate the unique codes that will get printed on the cards and store those in a database that could also have a field that stores the number of times the file was downloaded. This way we could set how many attempts we would allow the user for downloading the file.
The part that I need direction on is how do I hide/mask the original file location so people can't steal that url and then download the file as many times as they want. I've done Google searches and I'm either not searching using the right keywords or there aren't very many libraries or snippets out there already for this type of thing.
I'm guessing that I might be able to rig something up using django.views.static.serve that acts as a sort of proxy between the actual file and the user downloading it. The only drawback I can think of is that I would need to serve the file from our own web server and wouldn't be able to store it on Amazon S3.
Any suggestions or thoughts are greatly appreciated.
Neat idea. However, I would warn against the single-download method, because there is no guarantee that their first download attempt will be successful. Perhaps use a time-expiration method instead?
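A sketch of the time-expiration idea using only the standard library: the link carries the file id and an expiry time, signed with HMAC so users cannot edit either. The secret key, function names, and token layout are hypothetical; in Django, django.core.signing.TimestampSigner (its unsign method takes a max_age) offers a ready-made equivalent.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"replace-me"  # hypothetical; keep the real key out of source control


def make_token(file_id: str, ttl_seconds: int = 3600, now=None) -> str:
    """Create a token granting access to `file_id` until now + ttl_seconds."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    payload = f"{file_id}:{expires}"
    sig = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{sig}"


def check_token(token: str, now=None):
    """Return the file id if the token is authentic and unexpired, else None."""
    try:
        file_id, expires, sig = token.rsplit(":", 2)
        expires_at = int(expires)
    except ValueError:
        return None  # malformed token
    payload = f"{file_id}:{expires}"
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None  # tampered with, or signed with another key
    if (time.time() if now is None else now) > expires_at:
        return None  # expired
    return file_id
```

hmac.compare_digest is used instead of == so the comparison time does not leak how much of the signature matched.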
But it is certainly possible to do this with Django. Here is an outline of the basic approach:
Set up a django url for serving these files
Use a GET parameter which is a unique string to identify which file to get.
Keep a database table which has a FileField for the file to download. This table maps the unique strings to the location of the file on the file system.
To serve the file as a download, set the response headers in the view like this:
(path is the location of the file to serve)
from django.http import HttpResponse
with open(path, 'rb') as f:
    response = HttpResponse(f.read(), content_type='application/octet-stream')
response['Content-Disposition'] = 'attachment; filename="%s"' % 'insert_filename_here'
return response
Since we are using this Django page to serve the file, the user cannot find out the original file location.
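The lookup table from the outline, plus the per-code download limit from the question, can be sketched with the standard library's secrets and sqlite3 modules so it runs standalone. Table and column names (download_codes, file_key, downloads) are hypothetical; with the Django ORM the same atomic step would be a filtered QuerySet.update() with an F() expression, whose return value is the number of rows changed.

```python
import secrets
import sqlite3


def create_schema(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS download_codes ("
        " code TEXT PRIMARY KEY,"
        " file_key TEXT NOT NULL,"
        " downloads INTEGER NOT NULL DEFAULT 0)"
    )


def generate_codes(conn, file_key, count, nbytes=8):
    """Pre-generate `count` unpredictable codes to print on cards."""
    codes = [secrets.token_urlsafe(nbytes) for _ in range(count)]
    with conn:  # commit all inserts in one transaction
        conn.executemany(
            "INSERT INTO download_codes (code, file_key) VALUES (?, ?)",
            [(c, file_key) for c in codes],
        )
    return codes


def redeem(conn, code, max_downloads=3):
    """Return the file key if `code` still has attempts left, else None.

    The check and the increment happen in a single UPDATE, so two
    simultaneous requests cannot both use the last remaining attempt.
    """
    with conn:
        cur = conn.execute(
            "UPDATE download_codes SET downloads = downloads + 1"
            " WHERE code = ? AND downloads < ?",
            (code, max_downloads),
        )
        if cur.rowcount == 0:
            return None  # unknown code or no attempts left
        row = conn.execute(
            "SELECT file_key FROM download_codes WHERE code = ?", (code,)
        ).fetchone()
    return row[0]
```

token_urlsafe codes may contain "-" and "_"; if people must type them from a card, a restricted alphabet built with secrets.choice may be friendlier.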
You can just use something simple such as mod_xsendfile. This functionality is also available in other popular web servers such as lighttpd or nginx.
It works like this: when it is enabled, your application (e.g. a trivial PHP script) can send a special response header that causes the web server to serve a static file.
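A framework-agnostic sketch of building those headers in Python. The function name and server labels are hypothetical, and each server must be configured for it: Apache needs mod_xsendfile installed and enabled, lighttpd needs X-Sendfile allowed for the backend, and nginx uses its own header, X-Accel-Redirect, which must point at a location declared internal. download_name is assumed to be already sanitized (no quotes).

```python
def offload_headers(server: str, internal_path: str, download_name: str) -> dict:
    """Headers that tell the front-end web server to send the file itself.

    The application decides whether the user may download; the web server
    then streams the file, and the real path never reaches the client.
    """
    headers = {
        "Content-Type": "application/octet-stream",
        "Content-Disposition": f'attachment; filename="{download_name}"',
    }
    if server in ("apache", "lighttpd"):
        headers["X-Sendfile"] = internal_path  # filesystem path
    elif server == "nginx":
        headers["X-Accel-Redirect"] = internal_path  # URI of an internal location
    else:
        raise ValueError(f"unsupported server: {server}")
    return headers
```

In a Django view you would create an empty HttpResponse and copy each key/value pair onto it before returning it.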
If you want it to work with S3 you will need to handle each and every request this way, meaning the traffic will go through your site, from there to AWS, back to your site and back to the client. Does S3 support symbolic links / aliases? If so you might just redirect a valid user to one of the symbolic URLs and delete that symlink after a couple of hours.
