Gevent joinall blocking error - python

A bit of background on the application:
Clients make web requests
Server has to handle web requests (each request takes 20 seconds)
Send response to clients.
My approach was to first parallelize the request serving part using web.py and mod_wsgi. Even if I start the code with 20 or so threads, since each request takes 20 seconds to complete, request-level multi-threading alone is of limited use. So I had to cut down the 20 seconds themselves, which I did by spawning greenlets. Something like this:
File wordprocess_gevent.py

import gevent
import web

urls = ('/', 'A')

class A:
    output = {}

    def POST(self):
        # words is a list of words
        self.A(words)

    def A(self, words):
        threads = []
        for word in words:
            i_thread = gevent.spawn(self.B, word)
            threads.append(i_thread)
        gevent.joinall(threads, timeout=2)

    def B(self, word):
        # process() takes 20 seconds per word
        result = process(word)
        self.output[word] = result

application = web.application(urls, globals()).wsgifunc()
I start this code using the mod_wsgi-express as given below:
mod_wsgi-express start-server wordprocess_gevent.py --processes 5 --server-root wsgi_logs/ --with-wdb &
When simultaneous POST requests arrive, I get this error:
LoopExit: This operation would block forever
at the line
gevent.joinall(threads, timeout=2)
BUT if I send a single POST request, I get the required results. Can someone help me out here, please?

So I solved this by removing mod_wsgi altogether. Even with one process I was getting the same error when multiple requests came in.
Added the following and it's now working like a charm :)

from gevent.pywsgi import WSGIServer
import web

if __name__ == "__main__":
    application = web.application(urls, globals()).wsgifunc()
    appserver = WSGIServer(('', 8000), application)
    appserver.serve_forever()

Related

Python HTTP Request/Response very slow speed

I want to test how many HTTP requests an easy ping-pong service can handle in a certain amount of time. I have already implemented it in Java and Go, and it works fine. But when testing it with Python, one single ping-pong cycle needs a bit more than 2 s on my machine, and that's enormously long. In the other languages I get nearly one per millisecond, more than 2000 times what I get with Python. How can that be? I think the HTTP connection needs to be built up again each time with Python, but is it really that, and how could I fix it?
And of course the code to my little python scripts is down below:
import requests

add = 'aaaaaaaaaa'
sum = ''

def getTime(i):
    site_request = requests.post("http://localhost:8082/ping", data=i)
    site_response = str(site_request.content)
    return site_response

def printToFile():
    return

for x in range(5):
    print(getTime(sum))
    sum += add
from flask import Flask
from flask import request
import requests
import time

app = Flask(__name__)

@app.route("/ping", methods=['POST'])
def region():
    data = request.get_data()
    start = time.time()
    site_request = requests.post("http://localhost:8080/pong", data=data)
    site_response = str(site_request.content)
    end = time.time()
    cl = int(site_request.headers['content-length'])
    if cl != len(data):
        return str(end - start) + "," + str(cl) + "," + str(500)
    return str(end - start) + "," + str(cl) + "," + str(200)

app.run(host='127.0.0.1', port=8082)
from flask import Flask
from flask import request

app = Flask(__name__)

@app.route("/pong", methods=['POST'])
def region():
    data = request.get_data()
    return data

app.run(host='127.0.0.1', port=8080)
The problem you are facing here is that the Python requests library is a synchronous library.
While your Flask app (if set up correctly) will be capable of handling many requests at once, your code used to send the requests will only send one at a time and will block until each one is finished in sequence.
A better test of the speed of your server would be to use Python's somewhat newer async features, utilizing a library like asyncio.
Give that a try as the method for firing off your test requests and see if it is still slow. If so, at least you will have ruled out one potential issue!
As an example of an excellent library you could try that utilizes asyncio, look at AIOHTTP.
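As a minimal, self-contained sketch of the asyncio approach (using asyncio.sleep as a stand-in for the real aiohttp call, so it runs without a server), the point is that gather() fires all the requests concurrently instead of one at a time:

```python
import asyncio
import time

async def fake_request(i):
    # Stand-in for an HTTP call (with aiohttp this would be an awaited
    # session.post); each "request" takes 0.2 s, mimicking a slow round trip
    await asyncio.sleep(0.2)
    return i

async def main():
    start = time.monotonic()
    # gather() runs all five coroutines concurrently on one event loop
    results = await asyncio.gather(*(fake_request(i) for i in range(5)))
    elapsed = time.monotonic() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results)        # [0, 1, 2, 3, 4]
print(elapsed < 0.5)  # True: five 0.2 s "requests" overlap instead of summing to 1 s
```

With a synchronous loop the same five calls would take roughly 1 s; here they finish in about the time of one, which is the behavior you want from your load-testing client.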

Issue where Flask app request handling is choked while a re.search is happening

Setup:
The language I am using is Python.
I am running a Flask app with threaded=True, and inside the Flask app, when an endpoint is hit, it starts a thread and returns "thread started successfully" with status code 200.
Inside the thread, there is a re.search happening, which takes 30-40 seconds (worst case), and once the thread completes it hits a callback URL, completing one request.
Ideally, the Flask app should be able to handle concurrent requests.
Issue:
When the re.search is happening inside the thread, the Flask app does not accept concurrent requests. I am assuming some kind of thread locking is happening and am unable to figure out what.
Question :
Is it OK to do threading (as opposed to multiprocessing) inside a Flask app which has threaded=True?
When a regex is running, does it do any thread locking?
Code snippet: hit_callback does a POST request to another API, which is not relevant to this issue.
import threading
from flask import *
import re

app = Flask(__name__)

@app.route("/text", methods=["POST"])
def temp():
    text = request.json["text"]

    def extract_company(text):
        attrib_list = ["llc", "ltd"]  # real one is 700 in length
        entity_attrib = r"((\s|^)" + r"(.)?(\s|$)|(\s|^)".join(attrib_list) + r"(\s|$))"
        raw_client = re.search("(.*(?:" + entity_attrib + "))", text, re.I)
        hit_callback(raw_client)

    extract_thread = threading.Thread(target=extract_company, args=(text,))
    extract_thread.start()
    return jsonify({"Response": True}), 200

if __name__ == "__main__":
    app.run(debug=True, host='0.0.0.0', port=4557, threaded=True)
Please read up on the GIL - basically, Python can only execute ONE piece of Python code at the same time, even in threads. So your re.search runs and blocks all other threads until it runs to completion.
Solutions: use Gunicorn etc. to run multiple Flask processes - do not attempt to do everything in one Flask process.
To add something: your design is also problematic - 40 seconds or so for an HTTP answer is way too long. A better design would have at least two services: a web server and a search service. The first would register a search and hand back an ID. Your search service would communicate asynchronously with the web service and return result + ID when the result is ready. Your clients could poll the web service until they get a result.
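One way to sidestep the GIL without restructuring into two services is to push the CPU-bound re.search into a worker process. This is a sketch, not the asker's code: the endpoint name is gone, the 700-entry attribute list is replaced by a two-entry stand-in, the regex is simplified, and handle_text is a hypothetical helper standing in for the Flask view:

```python
import re
from concurrent.futures import ProcessPoolExecutor

def extract_company(text):
    # CPU-bound regex; run in a worker process it holds that process's GIL,
    # not the web process's, so the Flask app keeps accepting requests
    attrib_list = ["llc", "ltd"]  # simplified stand-in for the real 700-entry list
    pattern = r"(\s|^)(" + "|".join(attrib_list) + r")(\s|$)"
    match = re.search("(.*(?:" + pattern + "))", text, re.I)
    return match.group(0) if match else None

# One pool for the whole app; workers are started lazily on first submit
executor = ProcessPoolExecutor(max_workers=2)

def handle_text(text):
    # submit() returns immediately, so a Flask view can hand the text off
    # and answer 200 right away; the hypothetical hit_callback could be
    # wired up with future.add_done_callback(lambda f: hit_callback(f.result()))
    future = executor.submit(extract_company, text)
    return future
```

With threading.Thread the search still competes for the one GIL of the web process; with a process pool it does not, at the cost of pickling the text to the worker.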

Python flask server taking long time to boot

I have an odd problem: when I run my code below in PyCharm or through the console (python script.py), the Flask server takes an extremely long time to boot, meaning that when trying to access it, it shows no content for a good few minutes.
import threading
from flask import render_template, request, logging, Flask, redirect

def setupFlask():
    appn = Flask(__name__)
    log = logging.getLogger('werkzeug')
    log.setLevel(logging.ERROR)

    @appn.route('/')
    def page():
        return render_template('index.html')

    @appn.route('/submit', methods=['POST'])
    def submit():
        token = request.form['ID']
        ID = token
        return redirect('/')

    appn.run()

a = threading.Thread(target=setupFlask)
a.daemon = True
a.start()

while True:
    pass
The odd thing is that when I run the same code above in the PyCharm debugger, the Flask server takes about 5 seconds to boot, massively quicker than the few minutes it takes when run in the console. I would love that kind of speed when running the script normally and can't find a solution, because the problem fixes itself in the debugger!
This code snippet is part of a larger application, however I have adapted it to be run on its own and the same problem occurs.
I am not running in a virtualenv.
All help appreciated.
EDIT: The index.html document is very basic and only contains a few scripts and elements therefore I could not see it taking a long time to load.
The problem was with your Flask installation, but there is another one: you should not wait for your thread with a busy while loop. The better way is to join your thread, like this:
a = threading.Thread(target=setupFlask)
a.daemon = True
a.start()
a.join()

Does bottle handle requests with no concurrency?

At first, I thought Bottle would handle requests concurrently, so I wrote the test code below:
import json
from bottle import Bottle, run, request, response, get, post
import time

app = Bottle()
NUMBERS = 0

@app.get("/test")
def test():
    id = request.query.get('id', 0)
    global NUMBERS
    n = NUMBERS
    time.sleep(0.2)
    n += 1
    NUMBERS = n
    return id

@app.get("/status")
def status():
    return json.dumps({"numbers": NUMBERS})

run(app, host='0.0.0.0', port=8000)
Then I used JMeter to request the /test URL with 10 threads, looping 20 times.
After that, /status gives me {"numbers": 200}, which looks as if Bottle does not handle requests concurrently: with true concurrency, the unsynchronized read-sleep-write on NUMBERS should have lost some increments and ended below 200.
Did I misunderstand anything?
UPDATE
I did another test which I think proves that Bottle deals with requests one by one (with no concurrency). I made a little change to the test function:
@app.get("/test")
def test():
    t1 = time.time()
    time.sleep(5)
    t2 = time.time()
    return {"t1": t1, "t2": t2}
And when I access /test twice in a browser I get:
{
"t2": 1415941221.631711,
"t1": 1415941216.631761
}
{
"t2": 1415941226.643427,
"t1": 1415941221.643508
}
Concurrency isn't a function of your web framework -- it's a function of the web server you use to serve it. Since Bottle is WSGI-compliant, it means you can serve Bottle apps through any WSGI server:
wsgiref (reference server in the Python stdlib) will give you no concurrency.
CherryPy dispatches through a thread pool (number of simultaneous requests = number of threads it's using).
nginx + uwsgi gives you multiprocess dispatch and multiple threads per process.
Gevent gives you lightweight coroutines that, in your use case, can easily achieve C10K+ with very little CPU load (on Linux -- on Windows it can only handle 1024 simultaneous open sockets) if your app is mostly IO- or database-bound.
The latter two can serve massive numbers of simultaneous connections.
According to http://bottlepy.org/docs/dev/api.html, when given no specific instructions, bottle.run uses wsgiref to serve your application, which explains why it handles only one request at a time.
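The server-not-framework point can be demonstrated with the standard library alone. This sketch (not Bottle-specific; app, the port choice, and the 0.5 s sleep are illustrative) serves the same WSGI app through wsgiref with ThreadingMixIn mixed in, so two slow requests overlap instead of queuing:

```python
import threading
import time
import urllib.request
from socketserver import ThreadingMixIn
from wsgiref.simple_server import WSGIServer, make_server

class ThreadingWSGIServer(ThreadingMixIn, WSGIServer):
    # Each request gets its own thread; without the mixin, wsgiref
    # serves strictly one request at a time
    daemon_threads = True

def app(environ, start_response):
    time.sleep(0.5)  # simulate the slow handler from the question
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]

# Port 0 lets the OS pick a free port
server = make_server("127.0.0.1", 0, app, server_class=ThreadingWSGIServer)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Fire two requests at once and time them
start = time.monotonic()
threads = [threading.Thread(target=urllib.request.urlopen,
                            args=("http://127.0.0.1:%d/" % port,))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
print(elapsed < 0.9)  # True: the two 0.5 s requests ran concurrently
```

Swap ThreadingWSGIServer for the plain WSGIServer and the same two requests take about a second, which is exactly the one-by-one behavior the question observed with bottle.run's default.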

Start backend with async urlfetch on Google App Engine

I am experimenting with several of GAE's features.
I've built a dynamic backend, but I am having several issues getting this thing to work without task queues.
Backend code:
class StartHandler(webapp2.RequestHandler):
    def get(self):
        #... do stuff...
        pass

if __name__ == '__main__':
    _handlers = [(r'/_ah/start', StartHandler)]
    run_wsgi_app(webapp2.WSGIApplication(_handlers))
The backend is dynamic, so whenever it receives a call it does its stuff and then stops.
Everything is working fine when I use, inside my handlers:
url = backends.get_url('worker') + '/_ah/start'
urlfetch.fetch(url)
But I want this call to be async, because the backend might take up to 10 minutes to finish its work.
So I changed the above code to:
url = backends.get_url('worker') + '/_ah/start'
rpc = urlfetch.create_rpc()
urlfetch.make_fetch_call(rpc, url)
But then the backend does not start. I am not interested in the completion of the request or in getting any data out of it.
What am I missing - what am I implementing wrong?
Thank you all
Using RPC for an async call without calling get_result() on the RPC object does not guarantee that the urlfetch will actually be made. Once your handler's code exits, any pending async calls that have not completed are aborted.
The only way to make the handler async is to queue the work in a push queue.
