I have a front-end web server written in Twisted Web, that interfaces with another web server. Clients upload files to my front-end server, which then sends the files along to the back-end server. I'd like to receive the uploaded file, and then send an immediate response to the client before sending the file on to the back-end server. That way the client doesn't have to wait for both uploads to occur before getting a response.
I'm trying to do this by starting the upload to the back-end server in a separate thread. The problem is, after sending a response to the client, I'm no longer able to access the uploaded file from the Request object. Here's an example:
import cgi
import thread  # Python 2; on Python 3 this module is named _thread

from twisted.web.resource import Resource
from twisted.web.util import redirectTo

class PubDir(Resource):
    def render_POST(self, request):
        if request.args["t"][0] == 'upload':
            thread.start_new_thread(self.upload, (request,))
            ### Send response to client while the file gets uploaded to the back-end server:
            # redirectTo needs the request as its second argument
            return redirectTo('http://example.com/uploadpage', request)

    def upload(self, request):
        postheaders = request.getAllHeaders()
        try:
            postfile = cgi.FieldStorage(
                fp = request.content,
                headers = postheaders,
                environ = {'REQUEST_METHOD': 'POST',
                           'CONTENT_TYPE': postheaders['content-type'],
                          }
            )
        except Exception as e:
            print('something went wrong: ' + str(e))
            return  # postfile is undefined past this point
        filename = postfile["file"].filename
        file = request.args["file"][0]
        #code to upload file to back-end server goes here...
When I try this, I get an error: I/O operation on closed file.
You need to actually copy the file into a buffer in memory or into a tempfile on disk before you finish the request object (which is what happens when you redirect).
So you start your thread and hand it the request object; it may be opening a connection to your back-end server and beginning to copy when you redirect. The redirect finishes the request, which closes any associated tempfiles, and from then on the thread is reading from a closed file.
Instead of passing the whole request to your thread, a quick test would be to pass just the content of the request (upload would then receive the raw body bytes instead of the request object):
thread.start_new_thread(self.upload, (request.content.read(),))
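A minimal, standard-library-only sketch of the "copy it first" advice: snapshot request.content into a tempfile.SpooledTemporaryFile (kept in memory for small bodies, rolled over to disk past a size threshold) before render_POST returns, and give the worker thread only that copy. snapshot_body and upload_in_background are hypothetical helper names, and send stands in for whatever code actually pushes the file to the back-end server. Inside Twisted, twisted.internet.threads.deferToThread is an alternative to starting a raw thread.

```python
import shutil
import tempfile
import threading


def snapshot_body(content, max_in_memory=1024 * 1024):
    """Copy a request body stream into a spooled temporary file.

    Bodies up to max_in_memory bytes stay in memory; larger ones are
    rolled over to a real file on disk. The copy remains readable after
    the original stream (e.g. request.content) has been closed.
    """
    content.seek(0)
    copy = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    shutil.copyfileobj(content, copy)
    copy.seek(0)
    return copy


def upload_in_background(body_copy, send):
    """Hypothetical helper: call send(body_copy) on a worker thread,
    then close the copy."""
    def worker():
        try:
            send(body_copy)
        finally:
            body_copy.close()

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```

In render_POST this would mean calling snapshot_body(request.content) first, starting the upload with the copy, and only then returning the redirect; any multipart parsing (such as the cgi.FieldStorage call) would then read from the copy rather than from request.content.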
Related
In a Flask application I am uploading a file to a server. I receive the file from a client that sends files sequentially, so to make the upload faster I want to return a success response to the client as soon as I make the upload API request, without waiting for the response from the server.
I need to achieve asynchronous behaviour in my app, similar to how Node.js works.
What I need to be able to do is:
async def upload_file(f):
    result = await upload_api_call(f)
    ## do stuff based on result, in the background

def scp(data):
    file = data.file
    # upload file
    upload_file(file)
    # return success to client without waiting for upload to finish
    return 'success'
I have tried using asyncio and am able to get asynchronous behaviour. However, I still can't send a response to the client before the entire execution completes, because I need to use the following at the end of the scp function, before the return:
try:
    loop.run_forever()
finally:
    loop.close()
This defeats my reason for wanting asynchronous behaviour, as I am essentially waiting for the upload to finish before returning to the client.
I'm looking for some advice, or a relevant tutorial regarding the following:
My task is to set up a Flask route that POSTs to API endpoint X, receives a new endpoint Y in X's response, then GETs from endpoint Y repeatedly until it receives a certain status message in the body of Y's response, and then returns Y's response.
The code below (irrelevant data redacted) accomplishes that goal in, I think, a very stupid way. It returns the appropriate data occasionally, but not reliably (it times out 60% of the time). When I log very thoroughly, it seems as though I have bogged down my server with multiple while loops running constantly and interfering with each other.
I'll also receive this error occasionally:
SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) on request /book
import time

import requests
from flask import Flask

# create the Flask app
app = Flask(__name__)

# main booking route
@app.route('/book', methods=['POST'])  # GET requests will be blocked
def book():
    # defining the api-endpoints
    PRICING_ENDPOINT = ...
    # data to be sent to api
    data = {...}
    # sending post request and saving response as response object
    try:
        r_pricing = requests.post(url=PRICING_ENDPOINT, data=data)
    except requests.exceptions.RequestException as e:
        # a view must return a response, not an exception object, and
        # sys.exit() here would stop the worker rather than the request
        return str(e), 502
    # the poll endpoint comes back in the Location header
    POLL_ENDPOINT = r_pricing.headers['location']
    # setting data for poll
    data_for_poll = {...}
    r_poll = requests.get(POLL_ENDPOINT, data=data_for_poll)
    # poll loop, looking for 'UpdatesComplete'
    j = 1  # number of polls made so far
    poll_json = r_poll.json()
    update_status = poll_json['Status']
    while update_status == 'UpdatesPending':
        time.sleep(2)
        j += 1
        r_poll = requests.get(POLL_ENDPOINT, data=data_for_poll)
        poll_json = r_poll.json()
        update_status = poll_json['Status']
    return r_poll.text
This is more of an architectural issue than a Flask issue. Long-running tasks in Flask views are always a poor design choice. In this case, the route's response depends on two endpoints of another server, so on top of your own app's responsibilities you are also carrying the responsibility of another server.
Since the application is essentially a proxy for another service, I would recommend building the proxy the right way. Just as book() proxies the PRICING_ENDPOINT POST request, create another route that proxies the POLL_ENDPOINT GET request, and move the polling logic into the client code (JS).
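A minimal sketch of that split, written as plain functions so the logic can be tried without Flask or a network. It assumes `post`/`get` callables that behave like requests.post/requests.get; start_booking, check_booking and a second status route are hypothetical names. Each function makes exactly one upstream call and returns, so no server request ever sits in a loop.

```python
def start_booking(post, pricing_endpoint, data):
    """Forward the booking POST once and return the poll URL.

    book() would return this to the client instead of polling itself.
    """
    response = post(pricing_endpoint, data=data, timeout=10)
    response.raise_for_status()
    return response.headers["location"]


def check_booking(get, poll_endpoint, data_for_poll):
    """Forward exactly one status GET and return the parsed JSON body.

    A hypothetical second route would call this; the browser repeats
    that call until Status is no longer 'UpdatesPending'.
    """
    response = get(poll_endpoint, data=data_for_poll, timeout=10)
    response.raise_for_status()
    return response.json()
```

In Flask, book() would return start_booking(requests.post, PRICING_ENDPOINT, data), and the second route would return check_booking(requests.get, ...) as JSON; the JS side decides how often to ask again.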
Update:
If for some reason you cannot trust the client (browser -> JS) with the POLL_ENDPOINT information, as in a hidden-proxy situation, then consider moving the polling to a task runner such as Celery or Python RQ. Although this introduces additional components into your application, it is the right way to go.
You probably get that error because the HTTP connection times out while your server is looping on the API server. Servers limit how long an HTTP connection may stay open, and your loop takes longer than that limit allows. The most direct solution is to adjust your Apache configuration and increase the HTTP connection timeout for your WSGI application. You can also open a socket connection, check the update status over it, and close it once the goal is achieved. Or you can move your logic to the client side.
I'm trying to build a client-server app in Python.
My client uses the requests module to connect to the server and upload JSON and files.
The server uses the Tornado framework. When the server receives data from the client, it starts processing and sends the result back to the client in parts.
An example of my POST handler:
import json

import tornado.web

class PostAd(tornado.web.RequestHandler):
    def post(self):
        jdata = self.get_body_arguments('json', False)[0]
        jdata = json.loads(jdata)
        # insert_ad, save_files and insert_file_path are my own helper
        # methods (not shown here)
        id = self.insert_ad(jdata)
        fpath_list = self.save_files(self.request.files.values(), id)
        self.insert_file_path(id, fpath_list)
        self.write("Successfully posted into SQL with sql id: {0}".format(id))
        self.flush()
        self.write("Are u there?")
        self.finish()
In the client, requests is used to post the data:
r=agent.post("http://localhost:8888/api/v1/add-ad", data={"json": thread_data}, files=files)
This way I cannot receive the data piece by piece, because r=agent.post waits until the server closes the connection. I need to check the returned values each time the Tornado server sends me data with the self.flush() command (in my example I expect two answers: first "Successfully posted into SQL with sql id: 100" and then "Are u there?").
Is it possible to do this with the requests module, or do I need to use something else here?
I don't know what agent.post() is, but you can do this with tornado's HTTP client and the streaming_callback option. You'll have to format the request body yourself, though, since Tornado doesn't have built-in client-side support for multipart file uploads.
await AsyncHTTPClient().fetch(url, body=encoded_body, streaming_callback=print)
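Since Tornado's client won't build a multipart body for you, here is a minimal hand-rolled multipart/form-data encoder that could produce encoded_body. It is a sketch: encode_multipart is a hypothetical helper, every file part is labelled application/octet-stream, and field names and filenames are assumed not to contain quotes or CR/LF characters.

```python
import uuid


def encode_multipart(fields, files):
    """Build a multipart/form-data request body by hand.

    fields: dict of name -> str
    files:  dict of name -> (filename, bytes)
    Returns (content_type_header_value, body_bytes).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = (f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{name}"\r\n\r\n')
        parts.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    for name, (filename, data) in files.items():
        header = (f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{name}"; '
                  f'filename="{filename}"\r\n'
                  "Content-Type: application/octet-stream\r\n\r\n")
        parts.append(header.encode("utf-8") + data + b"\r\n")
    # Closing delimiter: the boundary followed by two dashes.
    parts.append(f"--{boundary}--\r\n".encode("ascii"))
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)
```

Usage would look like `content_type, encoded_body = encode_multipart({"json": thread_data}, files_dict)` followed by `await AsyncHTTPClient().fetch(url, method="POST", headers={"Content-Type": content_type}, body=encoded_body, streaming_callback=print)`, where files_dict is a hypothetical dict of `name -> (filename, bytes)`.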
There is no guarantee that the chunks observed by streaming_callback will align with the calls to flush, so you should format the data so that the client can determine where messages begin or end.
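One simple framing scheme, assuming you control both ends: have the handler end every message with a newline (`self.write(msg + "\n")` before each `self.flush()`) and reassemble complete lines on the client. LineFramer is a hypothetical helper whose feed method can be passed as streaming_callback; each message must not itself contain a newline (JSON-encoding each message with json.dumps guarantees that).

```python
class LineFramer:
    """Reassemble newline-delimited messages from arbitrary byte chunks.

    Pass an instance's feed method as streaming_callback; on_message is
    called once per complete line, however the bytes were split.
    """

    def __init__(self, on_message):
        self._buffer = b""
        self._on_message = on_message

    def feed(self, chunk):
        self._buffer += chunk
        # Everything before the last b"\n" is complete; keep the tail.
        *complete, self._buffer = self._buffer.split(b"\n")
        for line in complete:
            self._on_message(line.decode("utf-8"))
```

Buffering raw bytes and decoding only whole lines also keeps multi-byte UTF-8 characters intact when a chunk boundary falls in the middle of one.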
As a follow-up to another question I asked, I have a basic question about the easiest way to get a webapp2 Python server to provide JSON data that is too large (about 100 KB) to send to a client as a Channel API message.
The webapp2 server generates several data files over several minutes based on a client request. I would like the Channel API to send the client a message with the URL of each file when its data is ready, so that the client (a GWT app) can perform an HTTP GET request to fetch the data. Each data file is unique to the client, so the server will need a request handler that serves the appropriate data file for that client.
Can you write a request handler that, when called, provides the correct data file for a particular client directly from another request handler? Or do I need to store the data in Cloud SQL or the Datastore first until the client asks for it? Here is some incomplete sample code of what I would like to do:
import webapp2

class MainPage(webapp2.RequestHandler):
    def get(self):
        ## This opens the GWT app
        pass

class Service_handler(webapp2.RequestHandler):
    def get(self, parameters):
        ## This is called by the GWT app and generates the data to be
        ## sent to the client.
        ## A channel API message is sent to the client with the url
        ## for each data file generated.
        pass

class kml_handler(webapp2.RequestHandler):
    def get(self, client_id):
        ## I would like to return the correct data here when it is
        ## called by the client. Do I need to store the data in
        ## Cloud SQL or the Data Store and then retrieve it
        ## or can this handler take the results directly from the
        ## Service_handler as soon as it is generated?
        pass

class connection_handler(webapp2.RequestHandler):
    ## placeholder: the real channel connection handler is not shown
    pass

app = webapp2.WSGIApplication([
    webapp2.Route(r'/', handler=MainPage),
    webapp2.Route(r'/Service/', handler=Service_handler),
    webapp2.Route(r'/_ah/channel/<connected>/', handler=connection_handler),
    webapp2.Route(r'/kml/<client_id>', handler=kml_handler),
], debug=True)
You can write files to the blobstore and serve those files from the blobstore.
Here is an example:
https://developers.google.com/appengine/docs/python/blobstore/overview#Complete_Sample_App
Given that when a user requests /foo on my server, I send the following HTTP response (without closing the connection):
Content-Type: multipart/x-mixed-replace; boundary=-----------------------
-----------------------
Content-Type: text/html
foo
When the user goes to /bar (which sends 204 No Content so the view doesn't change), I want to send the following data over the initial, still-open response:
-----------------------
Content-Type: text/html
bar
How would I get the second request to trigger this in the initial response? I'm planning on possibly creating a fancy email webapp, only for engines that support multipart/x-mixed-replace (currently only Gecko), that does server push and Ajax effects without any JavaScript, just for fun.
No complete answer, but:
In your question, you're describing a Comet-style architecture. Regarding support for Comet-style techniques in Python/WSGI, there is a Stack Overflow question that discusses various Python servers supporting long-running requests à la Comet.
Also interesting is this mail thread in the Python Web-SIG: "Could WSGI handle Asynchronous response?". In May 2008, there was a broad discussion in the Web-SIG about the topic of asynchronous requests in WSGI.
A recent development is evserver, a lightweight WSGI server, which implements the Asynchronous WSGI extension proposed by Christopher Stawarz in the Web-SIG in May 2008.
Finally, the Tornado web server supports non-blocking asynchronous requests. It has a chat example application using long polling, which has similarities with your requirements.
If the problem is to pass a command from the /bar application to the /foo application, and you are using a servlet-like approach (the Python code is loaded once rather than for each request, as in CGI), you can simply change a class property of the /foo application and have the /foo instance react to the change (by checking the property's state).
Obviously the /foo application should not return right after the first request; it should keep the response open and yield content line by line.
This is just theory, though; I have not tried it myself.
I have created some small example (just for fun, you know :))
import threading

num = 0
cond = threading.Condition()

def app(environ, start_response):
    global num
    # every new request bumps the counter and wakes all waiting responses
    with cond:
        num += 1
        cond.notify_all()
    start_response("200 OK", [("Content-Type", "multipart/x-mixed-replace; boundary=xxx")])
    while True:
        n = num
        s = "--xxx\r\nContent-Type: text/html\r\n\r\n%s\n" % n
        yield s.encode("ascii")  # WSGI response bodies are bytes
        # wait for num change:
        with cond:
            while num == n:
                cond.wait()

# Older CherryPy API; newer CherryPy versions ship this server as cheroot.wsgi.Server
from cherrypy.wsgiserver import CherryPyWSGIServer
server = CherryPyWSGIServer(("0.0.0.0", 3000), app)
try:
    server.start()
except KeyboardInterrupt:
    server.stop()
# Now whenever you visit http://127.0.0.1:3000/, the number increases.
# It also automatically increases in all previously opened windows/tabs.
The idea of a shared variable plus thread synchronization (using a condition variable) relies on the fact that the WSGI server provided by CherryPyWSGIServer is threaded.
Not sure if this is quite what you're looking for, but there is a fairly old way of doing server push using the MIME content type multipart/x-mixed-replace.
Basically, you compose the response as a MIME object with content type multipart/x-mixed-replace and send the first "version" of a document down. The browser keeps the socket open.
Then, whenever the server decides to push more data, it sends a new "version" of the document, and the browser replaces the content (within whatever frame/iframe contains it).
This was an early way of doing webcams, where the server would send down (push) image after image, and the browser would just keep replacing the image in the document over and over. This is also a way of doing a "Loading..." message over a single HTTP request.
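To make the wire format concrete, here is a minimal sketch of a WSGI app that uses this technique for the "Loading..." case: each yielded part replaces the previous one in the browser. mixed_replace_stream, response_headers, loading_app and slow_work are hypothetical names, the boundary "frame" is arbitrary, and whether each part reaches the browser immediately depends on the server not buffering the response.

```python
import time


def mixed_replace_stream(versions, boundary="frame"):
    """Yield the bytes of a multipart/x-mixed-replace body.

    Each (content_type, body_bytes) item in versions becomes one part;
    the closing delimiter is sent once versions is exhausted.
    """
    delimiter = b"--" + boundary.encode("ascii")
    for content_type, body in versions:
        yield (delimiter + b"\r\n"
               + b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
               + body + b"\r\n")
    yield delimiter + b"--\r\n"


def response_headers(boundary="frame"):
    return [("Content-Type", "multipart/x-mixed-replace; boundary=" + boundary)]


def slow_work():
    """Placeholder for the real computation."""
    time.sleep(0.1)
    return b"Done"


def loading_app(environ, start_response):
    """Hypothetical WSGI app: show 'Loading...', then the finished page."""
    def versions():
        yield "text/html", b"<p>Loading...</p>"
        yield "text/html", b"<p>" + slow_work() + b"</p>"

    start_response("200 OK", response_headers())
    return mixed_replace_stream(versions())
```

Because versions() is a generator, the first part can be sent before slow_work() even starts; a webcam would do the same with an endless generator of image/jpeg frames.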