File management - python

I am working with Python and Biopython right now. I have a file upload form, and whatever file is uploaded (say abc.fasta), I want to pass that same name to the execute() function parameter and the corresponding name (abc.aln) to the display() function parameter. Right now I am changing the file name manually, but I want it to happen automatically.
The workflow goes like this:
--- If submit is not true, display only the header and form part.
--- If submit is true, call execute() and get the file name from the form input.
--- Then the displayed result file name is the same as the executed file name, only with a different extension.
My raw code is here -- http://pastebin.com/FPUgZSSe
Any suggestions, changes, or algorithms are appreciated.
Thanks

You need to read the uploaded file out of the cgi.FieldStorage() and save it onto the server. Usually a temp directory (/tmp on Linux) is used for this. You should remove these files after processing, or on some schedule, to clean up the drive.
def main():
    import os
    import cgi
    import cgitb; cgitb.enable()

    f1 = cgi.FieldStorage()
    if "dfile" in f1:
        fileitem = f1["dfile"]
        pathtoTmpFile = os.path.join("path/to/temp/directory", fileitem.filename)
        # copy the upload to disk in chunks
        fout = open(pathtoTmpFile, 'wb')
        while 1:
            chunk = fileitem.file.read(100000)
            if not chunk:
                break
            fout.write(chunk)
        fout.close()
        execute(pathtoTmpFile)
        os.remove(pathtoTmpFile)
    else:
        header()
        form()
This modifies execute() to take the path to the newly saved file:
cline = ClustalwCommandline("clustalw", infile=pathToFile)
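To derive the result file name from the uploaded one (the abc.fasta to abc.aln part of the question), you can swap the extension. A minimal sketch, assuming ClustalW's default behaviour of writing the alignment next to the input with an .aln extension, and using the question's own display() function:
import os

base, _ = os.path.splitext(pathtoTmpFile)   # "/tmp/abc.fasta" -> "/tmp/abc"
alnpath = base + ".aln"                     # ClustalW's default output name
display(alnpath)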
For the result file, you could also stream it back so the user gets a "Save as..." dialog. That might be a little more usable than displaying it in HTML.
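If you go that route, here is a sketch in the same CGI style as the code above (alnpath is the generated .aln file; the getattr() picks the binary stdout on Python 3):
import os
import sys

def stream_result(alnpath):
    out = getattr(sys.stdout, "buffer", sys.stdout)  # binary stdout on Python 3
    sys.stdout.write("Content-Type: application/octet-stream\r\n")
    sys.stdout.write('Content-Disposition: attachment; filename="%s"\r\n'
                     % os.path.basename(alnpath))
    sys.stdout.write("\r\n")
    sys.stdout.flush()
    with open(alnpath, "rb") as f:
        out.write(f.read())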

Related

file upload from container using webdav results into empty file upload

I'm trying to wrap my brain around this, because identical code generates two different outcomes, which implies there must be a fundamental difference between the environments in which it runs.
This is the code I use:
from webdav3.client import Client

if __name__ == "__main__":
    client = Client(
        {
            "webdav_hostname": "http://some_address"
            + "project"
            + "/",
            "webdav_login": "somelogin",
            "webdav_password": "somepass",
        }
    )
    ci = "someci"
    version = "someversion"
    directory = f'release-{ci.replace("/", "-")}-{version}'
    client.webdav.disable_check = (
        True  # Need to be disabled as the check can not be performed on the root
    )
    f = "a.rst"
    with open(f, "r") as fh:
        contents = fh.read()
    print(contents)
    evaluated = contents.replace("#PIPELINE_URL#", "DUMMY PIPELINE URL")
    with open(f, "w") as fh:
        fh.write(evaluated)
    print(contents)
    client.upload(local_path=f, remote_path=f)
The file a.rst contains some text like:
Please follow instruction link below
#####################################
`Click here for instructions <https://some_website>`_
When I execute this code from macOS, a file with the same contents as a.rst appears on my website.
When I execute this script from within a container (base image Python 3.9 plus the webdav dependencies), it creates a file on my website, but the content is always empty. I'm not sure why, but it could have something to do with the fact that I'm running it from within a Docker container, which on top of that can't handle the special characters in the file (plain text seems to work, though)?
Does anyone have any idea why this is happening and how to fix it?
EDIT:
It seems that the character ":" is creating the problem.
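One thing that may be worth ruling out (purely an assumption on my part, prompted by the ":" observation above, not a confirmed fix): do the read/replace/write on bytes, so the result cannot depend on the container's default locale or text encoding:
# diagnostic sketch: same logic as the script above, but on bytes rather than text
f = "a.rst"
with open(f, "rb") as fh:
    contents = fh.read()

evaluated = contents.replace(b"#PIPELINE_URL#", b"DUMMY PIPELINE URL")

with open(f, "wb") as fh:
    fh.write(evaluated)

client.upload(local_path=f, remote_path=f)  # client configured exactly as above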

File corrupted when using send_file() from flask, data from pymongo gridfs

Well, my English is not good, and the title may look weird.
Anyway, I'm now using Flask to build a website that can store files, and MongoDB is the database.
The file upload and document insert functions have no problems; the weird thing is that the file sent by Flask's send_file() is truncated for no apparent reason. Here's my code:
from flask import ..., send_file, ...
import pymongo
import gridfs
# ...

@app.route("/record/download/<record_id>")
def api_softwares_record_download(record_id):
    try:
        # ...
        file = files_gridfs.find_one({"_id": record_id})
        file_ext = filetype.guess_extension(file.read(2048))
        filename = "{}-{}{}".format(
            app["name"],
            record["version"],
            ".{}".format(file_ext) if file_ext else "",
        )
        response = send_file(file, as_attachment=True, attachment_filename=filename)
        return response
    except ...
The original image file, for example, is 553KB, but the response body returns 549.61KB and the image is broken. However, if I just write the file directly to disk:
# ...
with open('test.png', 'wb+') as file:
    file.write(files_gridfs.find_one({"_id": record_id}).read())
The image file size is 553KB and the image is readable.
When I compared the two files in VS Code's text editor, I found that the correct file starts with �PNG, but the corrupted file starts with �ϟ8���>�L�y.
(Screenshot in the original post: searching for the corrupted file's first bytes inside the correct file.)
I also tried using a Blob object and downloading it from the browser. No difference.
Is there anything wrong with my code, or did I misuse send_file()? Or should I use flask_pymongo?
Interestingly, I have since found what was wrong with my code.
This is how I solved it:
...file.read(2048)
file.seek(0)
...
file.read(2048)
file.seek(0)
...
response = send_file(file, ...)
return response
And here's why:
For some reason, I use filetype to detect the file's extension and MIME type, so I send 2048 bytes to filetype for detection:
file_ext = filetype.guess_extension(file.read(2048))
file_mime = filetype.guess_mime(file.read(2048)) #this line wasn't copied in my question. My fault.
I have just learned from the PyMongo API docs that the file is read through a cursor (whether that is Python, PyMongo, or GridFS doing it, I had no idea before). When I checked the cursor's position, it was at 4096. So when send_file() calls file.read() again, the read starts 4096 bytes past the file head: 549 + 4 = 553, and there's the problem.
Finally, I set the cursor back to position 0 after every read() operation, and it now returns the correct file.
Hope this helps if you make the same mistake I did.
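For reference, a minimal sketch of the fixed handler (assuming the same files_gridfs, record, app and filetype objects as in the question; the two seek(0) calls are the fix):
@app.route("/record/download/<record_id>")
def api_softwares_record_download(record_id):
    file = files_gridfs.find_one({"_id": record_id})

    file_ext = filetype.guess_extension(file.read(2048))
    file.seek(0)  # rewind after the detection read
    file_mime = filetype.guess_mime(file.read(2048))
    file.seek(0)  # rewind again so send_file() streams from the very first byte

    filename = "{}-{}{}".format(
        app["name"],
        record["version"],
        ".{}".format(file_ext) if file_ext else "",
    )
    return send_file(file, as_attachment=True,
                     attachment_filename=filename, mimetype=file_mime)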

File upload at web2py

I am using the web2py framework.
I have uploaded a txt file via SQLFORM and the file is stored in the "uploads" folder. Now I need to read this txt file from the controller; what file path should I use in the function defined in default.py?
def readthefile(uploaded_file):
    file = open(uploaded_file, "rb")
    file.read()
    ....
You can join the application directory and the uploads folder to build the path to the file.
Do something like this:
import os
filepath = os.path.join(request.folder, 'uploads', uploaded_file_name)
file = open(filepath, "rb")
request.folder: the application directory. For example, if the application is "welcome", request.folder is set to the absolute path "/path/to/welcome". In your programs, you should always use this variable and the os.path.join function to build paths to the files you need to access.
See the web2py documentation on request.folder.
The transformed name of the uploaded file is stored in the upload field of your database table, so you need a way to query the specific record that was inserted via the SQLFORM submission in order to get the name of the stored file. Here is how it would look assuming you know the record ID:
stored_filename = db.mytable(record_id).my_upload_field
original_filename, stream = db.mytable.my_upload_field.retrieve(stored_filename)
stream.read()
When you pass a filename to the .retrieve method of an upload field, it will return a tuple containing the original filename as well as the open file object (called stream in the code above).
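Putting the two pieces together, a controller action might look roughly like this (a sketch only; mytable, my_upload_field, and the record lookup are placeholders for your own names):
import os

def readthefile():
    record_id = request.args(0, cast=int)
    stored_filename = db.mytable(record_id).my_upload_field

    # Option 1: build the path from request.folder and the uploads folder
    filepath = os.path.join(request.folder, 'uploads', stored_filename)
    with open(filepath, 'rb') as f:
        data = f.read()

    # Option 2: let web2py resolve the file via the field's .retrieve() method
    # original_filename, stream = db.mytable.my_upload_field.retrieve(stored_filename)
    # data = stream.read()
    # stream.close()

    return dict(filename=stored_filename, size=len(data))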

Delete an uploaded file after downloading it from Flask

I am currently working on a small web interface which allows different users to upload files, convert the files they have uploaded, and download the converted files. The details of the conversion are not important for my question.
I am currently using flask-uploads to manage the uploaded files, and I am storing them in the file system. Once a user uploads and converts a file, there are all sorts of pretty buttons to delete the file, so that the uploads folder doesn't fill up.
I don't think this is ideal. What I really want is for the files to be deleted right after they are downloaded. I would settle for the files being deleted when the session ends.
I've spent some time trying to figure out how to do this, but I have yet to succeed. It doesn't seem like an uncommon problem, so I figure there must be some solution out there that I am missing. Does anyone have a solution?
There are several ways to do this.
send_file and then immediately delete (Linux only)
Flask has an after_this_request decorator which could work for this use case:
@app.route('/files/<filename>/download')
def download_file(filename):
    file_path = derive_filepath_from_filename(filename)
    file_handle = open(file_path, 'r')

    @after_this_request
    def remove_file(response):
        try:
            os.remove(file_path)
            file_handle.close()
        except Exception as error:
            app.logger.error("Error removing or closing downloaded file handle", error)
        return response

    return send_file(file_handle)
The issue is that this will only work on Linux (which lets the file be read even after deletion if there is still an open file pointer to it). It also won't always work (I've heard reports that sometimes send_file won't wind up making the kernel call before the file is already unlinked by Flask). It doesn't tie up the Python process to send the file though.
Stream file, then delete
Ideally, though, you'd have the file cleaned up after you know the OS has streamed it to the client. You can do this by streaming the file back through Python, creating a generator that streams the file and then closes it, as suggested in this answer:
def download_file(filename):
    file_path = derive_filepath_from_filename(filename)
    file_handle = open(file_path, 'r')

    # This *replaces* the `remove_file` + @after_this_request code above
    def stream_and_remove_file():
        yield from file_handle
        file_handle.close()
        os.remove(file_path)

    return current_app.response_class(
        stream_and_remove_file(),
        headers={'Content-Disposition': 'attachment', 'filename': filename}
    )
This approach is nice because it is cross-platform. It isn't a silver bullet however, because it ties up the Python web process until the entire file has been streamed to the client.
Clean up on a timer
Run another process on a timer (using cron, perhaps), or use an in-process scheduler like APScheduler, and clean up files that have been on disk in the temporary location beyond your timeout (e.g. half an hour, one week, thirty days, or after they've been marked "downloaded" in your RDBMS).
This is the most robust way, but it requires additional complexity (cron, an in-process scheduler, a work queue, etc.).
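For the APScheduler variant, a minimal sketch (assuming APScheduler is installed; UPLOAD_DIR and MAX_AGE_SECONDS are placeholder settings, not anything from the question):
import os
import time

from apscheduler.schedulers.background import BackgroundScheduler

UPLOAD_DIR = "/tmp/converted"   # wherever the uploaded/converted files live
MAX_AGE_SECONDS = 30 * 60       # e.g. half an hour

def cleanup_old_files():
    now = time.time()
    for name in os.listdir(UPLOAD_DIR):
        path = os.path.join(UPLOAD_DIR, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > MAX_AGE_SECONDS:
            os.remove(path)

scheduler = BackgroundScheduler()
scheduler.add_job(cleanup_old_files, "interval", minutes=10)
scheduler.start()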
You can also store the file's data in memory, delete it, then serve what you have in memory.
For example, if you were serving a PDF:
import io
import os

@app.route('/download')
def download_file():
    file_path = get_path_to_your_file()
    return_data = io.BytesIO()
    with open(file_path, 'rb') as fo:
        return_data.write(fo.read())
    # (after writing, cursor will be at last byte, so move it to start)
    return_data.seek(0)

    os.remove(file_path)

    return send_file(return_data, mimetype='application/pdf',
                     attachment_filename='download_filename.pdf')
(above I'm just assuming it's PDF, but you can get the mimetype programmatically if you need)
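For instance, a small sketch using the standard library (file_path as in the code above):
import mimetypes

mimetype = mimetypes.guess_type(file_path)[0] or 'application/octet-stream'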
Flask has an after_request decorator which could work in this case:
@app.route('/', methods=['POST'])
def upload_file():
    uploaded_file = request.files['file']
    file_path = secure_filename(uploaded_file.filename)
    uploaded_file.save(file_path)

    @app.after_request
    def delete(response):
        os.remove(file_path)
        return response

    return send_file(file_path, as_attachment=True)
Based on @Garrett's comment, the better approach is not to block send_file while removing the file. IMHO, it is better to remove it in the background; something like the following:
import io
import os

from flask import send_file
from multiprocessing import Process

@app.route('/download')
def download_file():
    file_path = get_path_to_your_file()
    return_data = io.BytesIO()
    with open(file_path, 'rb') as fo:
        return_data.write(fo.read())
    return_data.seek(0)

    background_remove(file_path)

    return send_file(return_data, mimetype='application/pdf',
                     attachment_filename='download_filename.pdf')

def background_remove(path):
    # pass the callable and its argument separately so rm() runs in the child
    # process, instead of being called immediately in the request handler
    task = Process(target=rm, args=(path,))
    task.start()

def rm(path):
    os.remove(path)

how to get (txt) file content from FileField?

I have a problem with opening a text file in Django that is stored in my database. I want to access it via the FileField of my model. The model looks something like this:
class MyModel(models.Model):
    saved_file = FileField()
I upload a test file via the admin interface, which works OK. In my view I want to access this file. If I open it with the standard Python open() it works fine:
f = open(path, 'r')
a = f.readlines()
return render_to_response('base.html', {'content': a}, context_instance=RequestContext(request))
This displays the lines of the file correctly.
According to https://docs.djangoproject.com/en/dev/ref/models/fields/#filefield, one gets a FieldFile proxy when accessing a FileField on a model, so
f = MyModel.objects.all().get(id=0).saved_file
should store a FieldFile in f. Furthermore, the documentation states that one opens the file from the model by calling .open(mode='rb') on the FieldFile, so
file = f.open(mode='rb')
should work like Python's open(), as stated in the documentation. So to get the lines I do
file.readlines()
which should return a list of lines. What happens instead is that I get an error saying that the .readlines() attribute does not exist. I do not need to display the file; this is just a way to test whether opening it works, but I do need the file content in a variable in my view so I can use it in my business logic.
Could anyone suggest a way to get the file content out of a FileField from a model?
The documentation states that files are opened in 'rb' mode by default, but you would want to open in 'r' to treat the file as a text file:
my_object = MyModel.objects.get(pk=1)
try:
    my_object.saved_file.open('r')
    lines = my_object.saved_file.readlines()
finally:
    my_object.saved_file.close()
Even better, you can use a context manager in Django v2.0+
my_object = MyModel.objects.get(pk=1)
with my_object.saved_file.open('r') as f:
    lines = f.readlines()
Since Django 2.0, File.open() returns the file itself, so the suggested way to work with it is as a context manager:
saved_file = MyModel.objects.all().get(id=0).saved_file
with saved_file.open() as f:
    data = f.readlines()
This applies to old versions of Django (< 2.0):
FieldFile.open opens the file, but doesn't return anything. So in your example file is None.
You should call readlines on FieldFile. In your example it would be:
f = MyModel.objects.all().get(id=0).saved_file
try:
    f.open(mode='rb')
    lines = f.readlines()
finally:
    f.close()
UPD: I added a try/finally block, as it is good practice to always close the resource, even if an exception happens.
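Tying it back to the original view, a minimal sketch (assuming Django 2.0+ and the question's MyModel; the template name is simply the one from the question):
from django.shortcuts import render

def show_file(request, pk):
    my_object = MyModel.objects.get(pk=pk)
    # open('r') treats the stored file as text; use 'rb' for binary content
    with my_object.saved_file.open('r') as f:
        lines = f.readlines()
    return render(request, 'base.html', {'content': lines})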
