I would like to realize the following with twisted web:
The user clicks on a link on a page
This is bring the user to a temporary page where the user is informed that the PDF is generated
When the PDF generation is finished, the download starts automatically.
This is my example code:
from twisted.web.server import Site
from twisted.web.resource import Resource
from twisted.internet import reactor
from twisted.web.server import NOT_DONE_YET
class DownloadResource(Resource):
isLeaf = True
def render_GET(self, request: Resource):
# Start generating the PDF file in the background
reactor.callInThread(generate_pdf, request)
request.write(b"<html><body>Please wait while the PDF file is generated and download will start soon...</body></html>")
return NOT_DONE_YET
def generate_pdf(request: Resource):
# Generate the PDF data
# for the time being, just open it from the disk.
# the real application will generate it.
filename = r'/path/to/pdf'
with open(filename, 'rb') as f:
pdf_data = f.read()
# Serve the PDF data
request.setHeader(b"content-type", b"application/pdf")
request.setHeader(b"content-length", str(len(pdf_data)).encode('utf-8'))
request.write(pdf_data)
request.finish()
resource = DownloadResource()
This works almost ok, the point is that the pdf data are written as bytes next to the text message. Something like this:
Please wait while the PDF file is generated and download will start soon...%PDF-1.4 %ª«¬ 1 0 obj << /Title....
Instead I would like the pdf file to be downloaded. Ideally I would need a kind of send_file. Does it exist?
Can you help me in solving this issue?
Related
I am trying to have my server, in python 3, go grab files from URLs. Specifically, I would like to pass a URL into a function, I would like the function to go grab an audio file(of many varying formats) and save it as an MP3, probably using ffmpeg or ffmpy. If the URL also has a PDF, I would also like to save that, as a PDF. I haven't done much research on the PDF yet, but I have been working on the audio piece and wasn't sure if this was even possible.
I have looked at several questions here, but most notably;
How do I download a file over HTTP using Python?
It's a little old but I tried several methods in there and always get some sort of issue. I have tried using the requests library, urllib, streamripper, and maybe one other.
Is there a way to do this and with a recommended library?
For example, most of the ones I have tried do save something, like the html page, or an empty file called 'file.mp3' in this case.
Streamripper received a try changing user agents error.
I am not sure if this is possible, but I am sure there is something I'm not understanding here, could someone point me in the right direction?
This isn't necessarily the code I'm trying to use, just an example of something I have used that doesn't work.
import requests
url = "http://someurl.com/webcast/something"
r = requests.get(url)
with open('file.mp3', 'wb') as f:
f.write(r.content)
# Retrieve HTTP meta-data
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
**Edit
import requests
import ffmpy
import datetime
import os
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE AUDIO/MPEG, THE FILE WILL
## BE SAVED AS THE CURRENT-DATE-AND-TIME.MP3
##
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE application/pdf, THE FILE WILL
## BE SAVED AS THE CURRENT-DATE-AND-TIME.PDF
##
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE other than application/pdf, OR
## audio/mpeg, THE FILE WILL NOT BE SAVED
def BordersPythonDownloader(url):
print('Beginning file download requests')
r = requests.get(url, stream=True)
contype = r.headers['content-type']
if contype == "audio/mpeg":
print("audio file")
filename = '[{}].mp3'.format(str(datetime.datetime.now()))
with open('file.mp3', 'wb+') as f:
f.write(r.content)
ff = ffmpy.FFmpeg(
inputs={'file.mp3': None},
outputs={filename: None}
)
ff.run()
if os.path.exists('file.mp3'):
os.remove('file.mp3')
elif contype == "application/pdf":
print("pdf file")
filename = '[{}].pdf'.format(str(datetime.datetime.now()))
with open(filename, 'wb+') as f:
f.write(r.content)
else:
print("URL DID NOT RETURN AN AUDIO OR PDF FILE, IT RETURNED {}".format(contype))
# INSERT YOUR URL FOR TESTING
# OR CALL THIS SCRIPT FROM ELSEWHERE, PASSING IT THE URL
#DEFINE YOUR URL
#url = 'http://archive.org/download/testmp3testfile/mpthreetest.mp3'
#CALL THE SCRIPT; PASSING IT YOUR URL
#x = BordersPythonDownloader(url)
#ANOTHER EXAMPLE WITH A PDF
#url = 'https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst6500/ios/12-2SY/configuration/guide/sy_swcg/etherchannel.pdf'
#x = BordersPythonDownloader(url)
Thanks Richard, this code works and helps me understand this better. Any suggestions for improving the above working example?
Preface
This is my first post on stackoverflow so I apologize if I mess up somewhere. I searched the internet and stackoverflow heavily for a solution to my issues but I couldn't find anything.
Situation
What I am working on is creating a digital photo frame with my raspberry pi that will also automatically download pictures from my wife's facebook page. Luckily I found someone who was working on something similar:
https://github.com/samuelclay/Raspberry-Pi-Photo-Frame
One month ago this gentleman added the download_facebook.py script. This is what I needed! So a few days ago I started working on this script to get it working in my windows environment first (before I throw it on the pi). Unfortunately there is no documentation specific to that script and I am lacking in python experience.
Based on the from urllib import urlopen statement, I can assume that this script was written for Python 2.x. This is because Python 3.x is now from urlib import request.
So I installed Python 2.7.9 interpreter and I've had fewer issues than when I was attempting to work with Python 3.4.3 interpreter.
Problem
I've gotten the script to download pictures from the facebook account; however, the pictures are corrupted.
Here is pictures of the problem: http://imgur.com/a/3u7cG
Now, I originally was using Python 3.4.3 and had issues with my method urlrequest(url) (see code at bottom of post) and how it was working with the image data. I tried decoding with different formats such as utf-8 and utf-16 but according to the content headers, it shows utf-8 format (I think).
Conclusion
I'm not quite sure if the problem is with downloading the image or with writing the image to the file.
If anyone can help me with this I'd be forever grateful! Also let me know what I can do to improve my posts in the future.
Thanks in advance.
Code
from urllib import urlopen
from json import loads
from sys import argv
import dateutil.parser as dateparser
import logging
# plugin your username and access_token (Token can be get and
# modified in the Explorer's Get Access Token button):
# https://graph.facebook.com/USER_NAME/photos?type=uploaded&fields=source&access_token=ACCESS_TOKEN_HERE
FACEBOOK_USER_ID = "**USER ID REMOVED"
FACEBOOK_ACCESS_TOKEN = "** TOKEN REMOVED - GET YOUR OWN **"
def get_logger(label='lvm_cli', level='INFO'):
"""
Return a generic logger.
"""
format = '%(asctime)s - %(levelname)s - %(message)s'
logging.basicConfig(format=format)
logger = logging.getLogger(label)
logger.setLevel(getattr(logging, level))
return logger
def urlrequest(url):
"""
Make a url request
"""
req = urlopen(url)
data = req.read()
return data
def get_json(url):
"""
Make a url request and return as a JSON object
"""
res = urlrequest(url)
data = loads(res)
return data
def get_next(data):
"""
Get next element from facebook JSON response,
or return None if no next present.
"""
try:
return data['paging']['next']
except KeyError:
return None
def get_images(data):
"""
Get all images from facebook JSON response,
or return None if no data present.
"""
try:
return data['data']
except KeyError:
return []
def get_all_images(url):
"""
Get all images using recursion.
"""
data = get_json(url)
images = get_images(data)
next = get_next(data)
if not next:
return images
else:
return images + get_all_images(next)
def get_url(userid, access_token):
"""
Generates a useable facebook graph API url
"""
root = 'https://graph.facebook.com/'
endpoint = '%s/photos?type=uploaded&fields=source,updated_time&access_token=%s' % \
(userid, access_token)
return '%s%s' % (root, endpoint)
def download_file(url, filename):
"""
Write image to a file.
"""
data = urlrequest(url)
path = 'C:/photos/%s' % filename
f = open(path, 'w')
f.write(data)
f.close()
def create_time_stamp(timestring):
"""
Creates a pretty string from time
"""
date = dateparser.parse(timestring)
return date.strftime('%Y-%m-%d-%H-%M-%S')
def download(userid, access_token):
"""
Download all images to current directory.
"""
logger = get_logger()
url = get_url(userid, access_token)
logger.info('Requesting image direct link, please wait..')
images = get_all_images(url)
for image in images:
logger.info('Downloading %s' % image['source'])
filename = '%s.jpg' % create_time_stamp(image['created_time'])
download_file(image['source'], filename)
if __name__ == '__main__':
download(FACEBOOK_USER_ID, FACEBOOK_ACCESS_TOKEN)
Answering the question of why #Alastair's solution from the comments worked:
f = open(path, 'wb')
From https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files:
On Windows, 'b' appended to the mode opens the file in binary mode, so
there are also modes like 'rb', 'wb', and 'r+b'. Python on Windows
makes a distinction between text and binary files; the end-of-line
characters in text files are automatically altered slightly when data
is read or written. This behind-the-scenes modification to file data
is fine for ASCII text files, but it’ll corrupt binary data like that
in JPEG or EXE files. Be very careful to use binary mode when reading
and writing such files. On Unix, it doesn’t hurt to append a 'b' to
the mode, so you can use it platform-independently for all binary
files.
(I was on a Mac, which explains why the problem wasn't reproduced for me.)
Alastair McCormack posted something that worked!
He said Try setting binary mode when you open the file for writing: f = open(path, 'wb')
It is now successfully downloading the images correctly. Does anyone know why this worked?
I am currently working on a small web interface which allows different users to upload files, convert the files they have uploaded, and download the converted files. The details of the conversion are not important for my question.
I am currently using flask-uploads to manage the uploaded files, and I am storing them in the file system. Once a user uploads and converts a file, there are all sorts of pretty buttons to delete the file, so that the uploads folder doesn't fill up.
I don't think this is ideal. What I really want is for the files to be deleted right after they are downloaded. I would settle for the files being deleted when the session ends.
I've spent some time trying to figure out how to do this, but I have yet to succeed. It doesn't seem like an uncommon problem, so I figure there must be some solution out there that I am missing. Does anyone have a solution?
There are several ways to do this.
send_file and then immediately delete (Linux only)
Flask has an after_this_request decorator which could work for this use case:
#app.route('/files/<filename>/download')
def download_file(filename):
file_path = derive_filepath_from_filename(filename)
file_handle = open(file_path, 'r')
#after_this_request
def remove_file(response):
try:
os.remove(file_path)
file_handle.close()
except Exception as error:
app.logger.error("Error removing or closing downloaded file handle", error)
return response
return send_file(file_handle)
The issue is that this will only work on Linux (which lets the file be read even after deletion if there is still an open file pointer to it). It also won't always work (I've heard reports that sometimes send_file won't wind up making the kernel call before the file is already unlinked by Flask). It doesn't tie up the Python process to send the file though.
Stream file, then delete
Ideally though you'd have the file cleaned up after you know the OS has streamed it to the client. You can do this by streaming the file back through Python by creating a generator that streams the file and then closes it, like is suggested in this answer:
def download_file(filename):
file_path = derive_filepath_from_filename(filename)
file_handle = open(file_path, 'r')
# This *replaces* the `remove_file` + #after_this_request code above
def stream_and_remove_file():
yield from file_handle
file_handle.close()
os.remove(file_path)
return current_app.response_class(
stream_and_remove_file(),
headers={'Content-Disposition': 'attachment', 'filename': filename}
)
This approach is nice because it is cross-platform. It isn't a silver bullet however, because it ties up the Python web process until the entire file has been streamed to the client.
Clean up on a timer
Run another process on a timer (using cron, perhaps) or use an in-process scheduler like APScheduler and clean up files that have been on-disk in the temporary location beyond your timeout (e. g. half an hour, one week, thirty days, after they've been marked "downloaded" in RDMBS)
This is the most robust way, but requires additional complexity (cron, in-process scheduler, work queue, etc.)
You can also store the file's data in memory, delete it, then serve what you have in memory.
For example, if you were serving a PDF:
import io
import os
#app.route('/download')
def download_file():
file_path = get_path_to_your_file()
return_data = io.BytesIO()
with open(file_path, 'rb') as fo:
return_data.write(fo.read())
# (after writing, cursor will be at last byte, so move it to start)
return_data.seek(0)
os.remove(file_path)
return send_file(return_data, mimetype='application/pdf',
attachment_filename='download_filename.pdf')
(above I'm just assuming it's PDF, but you can get the mimetype programmatically if you need)
Flask has an after_request decorator which could work in this case:
#app.route('/', methods=['POST'])
def upload_file():
uploaded_file = request.files['file']
file = secure_filename(uploaded_file.filename)
#app.after_request
def delete(response):
os.remove(file_path)
return response
return send_file(file_path, as_attachment=True, environ=request.environ)
Based on #Garrett comment, the better approach is to not blocking the send_file while removing the file. IMHO, the better approach is to remove it in the background, something like the following is better:
import io
import os
from flask import send_file
from multiprocessing import Process
#app.route('/download')
def download_file():
file_path = get_path_to_your_file()
return_data = io.BytesIO()
with open(file_path, 'rb') as fo:
return_data.write(fo.read())
return_data.seek(0)
background_remove(file_path)
return send_file(return_data, mimetype='application/pdf',
attachment_filename='download_filename.pdf')
def background_remove(path):
task = Process(target=rm(path))
task.start()
def rm(path):
os.remove(path)
I am working on python and biopython right now. I have a file upload form and whatever file is uploaded suppose(abc.fasta) then i want to pass same name in execute (abc.fasta) function parameter and display function parameter (abc.aln). Right now i am changing file name manually, but i want to have it automatically.
Workflow goes like this.
----If submit is not true then display only header and form part
--- if submit is true then call execute() and get file name from form input
--- Then displaying result file name is same as executed file name but only change in extension
My raw code is here -- http://pastebin.com/FPUgZSSe
Any suggestions, changes and algorithm is appreciated
Thanks
You need to read the uploaded file out of the cgi.FieldStorage() and save it onto the server. Ususally a temp directory (/tmp on Linux) is used for this. You should remove these files after processing or on some schedule to clean up the drive.
def main():
import cgi
import cgitb; cgitb.enable()
f1 = cgi.FieldStorage()
if "dfile" in f1:
fileitem = f1["dfile"]
pathtoTmpFile = os.path.join("path/to/temp/directory", fileitem.filename)
fout = file(pathtoTmpFile, 'wb')
while 1:
chunk = fileitem.file.read(100000)
if not chunk: break
fout.write (chunk)
fout.close()
execute(pathtoTmpFile)
os.remove(pathtoTmpFile)
else:
header()
form()
This modified the execute to take the path to the newly saved file.
cline = ClustalwCommandline("clustalw", infile=pathToFile)
For the result file, you could also stream it back so the user gets a "Save as..." dialog. That might be a little more usable than displaying it in HTML.
I have a small project that would be perfect for Google App Engine. Implementing it hinges on the ability to generate a ZIP file and return it.
Due to the distributed nature of App Engine, from what I can tell, the ZIP file couldn't be created "in-memory" in the traditional sense. It would basically have to be generated and and sent in a single request/response cycle.
Does the Python zip module even exist in the App Engine environment?
zipfile is available at appengine and reworked example follows:
from contextlib import closing
from zipfile import ZipFile, ZIP_DEFLATED
from google.appengine.ext import webapp
from google.appengine.api import urlfetch
def addResource(zfile, url, fname):
# get the contents
contents = urlfetch.fetch(url).content
# write the contents to the zip file
zfile.writestr(fname, contents)
class OutZipfile(webapp.RequestHandler):
def get(self):
# Set up headers for browser to correctly recognize ZIP file
self.response.headers['Content-Type'] ='application/zip'
self.response.headers['Content-Disposition'] = \
'attachment; filename="outfile.zip"'
# compress files and emit them directly to HTTP response stream
with closing(ZipFile(self.response.out, "w", ZIP_DEFLATED)) as outfile:
# repeat this for every URL that should be added to the zipfile
addResource(outfile,
'https://www.google.com/intl/en/policies/privacy/',
'privacy.html')
addResource(outfile,
'https://www.google.com/intl/en/policies/terms/',
'terms.html')
import zipfile
import StringIO
text = u"ABCDEFGHIJKLMNOPQRSTUVWXYVabcdefghijklmnopqqstuvweyxáéöüï东 廣 広 广 國 国 国 界"
zipstream=StringIO.StringIO()
file = zipfile.ZipFile(file=zipstream,compression=zipfile.ZIP_DEFLATED,mode="w")
file.writestr("data.txt.zip",text.encode("utf-8"))
file.close()
zipstream.seek(0)
self.response.headers['Content-Type'] ='application/zip'
self.response.headers['Content-Disposition'] = 'attachment; filename="data.txt.zip"'
self.response.out.write(zipstream.getvalue())
From What is Google App Engine:
You can upload other third-party
libraries with your application, as
long as they are implemented in pure
Python and do not require any
unsupported standard library modules.
So, even if it doesn't exist by default you can (potentially) include it yourself. (I say potentially because I don't know if the Python zip library requires any "unsupported standard library modules".