Can I improve the response time of this login service application by making parts of it asynchronous? - python

I have just written user login and logout, but I'm trying to figure out the most correct way of doing this. From the documentation, there seem to be ways to make some of the code asynchronous; do I really need to do this? I've included the hashing functions (which I got from Stack Overflow) so this is complete and can be built upon for very simple applications.
import os
import sqlite3
import hashlib
import binascii
from tornado.ioloop import IOLoop
from tornado.web import Application, RequestHandler
from tornado.options import define, options
define('port', default=80, help='port to listen on')
settings = dict(
    template_path=os.path.join(os.path.dirname(__file__), "templates"),
    static_path=os.path.join(os.path.dirname(__file__), "static"),
    debug=True,
    cookie_secret="changethis",
    login_url="/login",
    # xsrf_cookies=True,
)
def hash_password(password):
    """Hash a password for storing."""
    salt = hashlib.sha256(os.urandom(60)).hexdigest().encode('ascii')
    pwdhash = hashlib.pbkdf2_hmac('sha512', password.encode('utf-8'),
                                  salt, 100000)
    pwdhash = binascii.hexlify(pwdhash)
    return (salt + pwdhash).decode('ascii')

def verify_password(stored_password, provided_password):
    """Verify a stored password against one provided by user"""
    salt = stored_password[:64]
    stored_password = stored_password[64:]
    pwdhash = hashlib.pbkdf2_hmac('sha512',
                                  provided_password.encode('utf-8'),
                                  salt.encode('ascii'), 100000)
    pwdhash = binascii.hexlify(pwdhash).decode('ascii')
    return pwdhash == stored_password
try:
    db = sqlite3.connect('file:aaa.db?mode=rw', uri=True)
except sqlite3.OperationalError:
    db = sqlite3.connect("aaa.db")
    db.execute("CREATE TABLE Users (id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE, password TEXT NOT NULL);")
class BaseHandler(RequestHandler):
    def get_current_user(self):
        return self.get_secure_cookie("session")

class IndexHandler(BaseHandler):
    def get(self):
        if not self.current_user:
            self.write("not logged in")
            return
        count = db.execute("SELECT COUNT(*) FROM Users;").fetchone()
        self.write('{} users so far!'.format(count[0]))

class LoginHandler(BaseHandler):
    def get(self):
        if self.current_user:
            self.write("already logged in")
            return
        self.render("login.html")

    def post(self):
        if self.current_user:
            self.write("already logged in")
            return
        name = self.get_body_argument("username")
        query = db.execute("SELECT COUNT(*) FROM Users WHERE username = ?;", (name,)).fetchone()
        if query[0] == 0:
            self.write("user does not exist")
        else:
            hashed_password = db.execute("SELECT (password) FROM Users WHERE username = ?;", (name,)).fetchone()[0]
            if verify_password(hashed_password, self.get_body_argument("password")):
                self.set_secure_cookie("session", name)
                self.write("cookie set, logged in")
            else:
                self.write("wrong password")

class SignupHandler(BaseHandler):
    def get(self):
        if self.current_user:
            self.write("already logged in")
            return
        self.render("signup.html")

    def post(self):
        if self.current_user:
            self.write("already logged in")
            return
        name = self.get_body_argument("username")
        password = self.get_body_argument("password")
        try:
            with db:
                db.execute("INSERT INTO Users(username,password) VALUES (?,?);", (name, hash_password(password)))
        except sqlite3.IntegrityError:
            self.write("user exists")
            return
        self.write("user added")

class LogoutHandler(BaseHandler):
    def get(self):
        self.clear_cookie("session")
        self.write("logged out")

def main():
    routes = (
        (r'/', IndexHandler),
        (r'/login', LoginHandler),
        (r'/logout', LogoutHandler),
        (r'/signup', SignupHandler),
    )
    app = Application(routes, **settings)
    app.listen(options.port)
    IOLoop.current().start()

if __name__ == "__main__":
    main()

Short answer: No.
Async code doesn't make things "faster". Async code is just regular sync code with the extra ability to pause and resume operations; an individual operation takes just as long. The purpose of async code is not speed, it's to achieve concurrency without the overhead of threads.
See these two functions in the following code:
def func1():
    data = get_data_from_database()
    return data

async def func2():
    data = await get_data_from_database()
    return data
func1 is synchronous and func2 is asynchronous. Both functions will have the same speed because they both have to wait for the database to return the data.
So, you can make your code async, but it won't result in any speed gain, because the database will return the data at its regular speed and only after that will your code be able to perform further actions.
And don't use SQLite with Tornado. It runs in the same Python process as your Tornado code, and since it reads and writes data to/from disk, its calls are blocking: they will block the whole Tornado server and result in poor performance. See below for an explanation of "Blocking Code".
Now, yes, you can make it asynchronous by running it in a separate thread, but then why not just use a standalone database like PostgreSQL or MySQL in the first place?
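For completeness, here is a rough sketch of what "running it in a separate thread" could look like with Tornado's run_in_executor. The handler name and the executor are mine, not from the question, and the sqlite3 connection above would need check_same_thread=False to be usable from worker threads:

from concurrent.futures import ThreadPoolExecutor
from tornado.ioloop import IOLoop

# Hypothetical: a small pool for blocking SQLite calls. The connection created
# earlier would have to be opened with check_same_thread=False for this to work.
executor = ThreadPoolExecutor(max_workers=4)

class AsyncIndexHandler(BaseHandler):
    async def get(self):
        if not self.current_user:
            self.write("not logged in")
            return
        # The blocking query runs in a worker thread, so the IOLoop can keep
        # serving other requests in the meantime.
        count = await IOLoop.current().run_in_executor(
            executor,
            lambda: db.execute("SELECT COUNT(*) FROM Users;").fetchone())
        self.write('{} users so far!'.format(count[0]))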
Blocking Code
Code which stops the program from moving forward or doing anything else is called blocking code.
Blocking code can be of any of the following types:
Network-bound operations. For example, making an HTTP request or a database request. This results in blocking code because the code can't move forward until it gets the response over the network.
Disk-bound operations. For example, reading a file from disk. If the disk is busy or slow, your code can't move forward until it gets the data from the disk.
CPU-bound operations. For example, doing really heavy calculations which take up significant CPU time. The code can't move forward until it gets the result of the calculation from the CPU.
Asynchronous code is useful for network-bound operations. For disk- and CPU-bound operations, synchronous code is better.
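Applied to the code in the question: PBKDF2 with 100,000 iterations is exactly such a CPU-bound operation, so hash_password and verify_password will stall the event loop for a moment on every signup or login. If that ever matters, one common workaround is to push the hashing into an executor; a hedged sketch (the pool and the *_async wrapper names are my own, not part of the original code):

from concurrent.futures import ProcessPoolExecutor
from tornado.ioloop import IOLoop

# Hypothetical helper: run the expensive key derivation outside the IOLoop.
# A separate process keeps CPU work from tying up the web process.
hash_pool = ProcessPoolExecutor(max_workers=2)

async def hash_password_async(password):
    return await IOLoop.current().run_in_executor(hash_pool, hash_password, password)

async def verify_password_async(stored_password, provided_password):
    return await IOLoop.current().run_in_executor(
        hash_pool, verify_password, stored_password, provided_password)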

Related

Does my lambda function go inside my main python script?

I don't know how to write a Lambda. Here is my main_script.py, which executes two stored procedures. It inserts records every day, then finds the difference between yesterday's and today's records and writes them to a table.
import logging
import pymysql as pm
import os
import json

class className:
    env = None
    config = None

    def __init__(self, env_filename):
        self.env = env_filename
        self.config = self.get_config()

    def get_config(self):
        with open(self.env) as file_in:
            return json.load(file_in)

    def DB_connection(self):
        config = className.get_config(self)
        username = config["exceptions"]["database-secrets"]["aws_secret_username"]
        password = config["exceptions"]["database-secrets"]["aws_secret_password"]
        host = config["exceptions"]["database-secrets"]["aws_secret_host"]
        port = config["exceptions"]["database-secrets"]["aws_secret_port"]
        database = config["exceptions"]["database-secrets"]["aws_secret_db"]
        return pm.connect(
            user=username,
            password=password,
            host=host,
            port=port,
            database=database
        )

    def run_all(self):
        def test_function(self):
            test_function_INSERT_QUERY = "CALL sp_test_insert();"
            test_function_EXCEPTIONS_QUERY = "CALL sp_test_exceptions();"
            test = self.config["exceptions"]["functions"]["test_function"]
            if test:
                with self.DB_connection() as cnxn:
                    with cnxn.cursor() as cur:
                        try:
                            cur.execute(test_function_INSERT_QUERY)
                            print("test_function_INSERT_QUERY insertion query ran successfully, {} records updated.".format(cur.rowcount))
                            cur.execute(test_function_EXCEPTIONS_QUERY)
                            print("test_function_EXCEPTIONS_QUERY exceptions query ran successfully, {} exceptions updated.".format(cur.rowcount))
                        except pm.Error as e:
                            print(f"Error: {e}")
                        except Exception as e:
                            logging.exception(e)
                        else:
                            cnxn.commit()

        test_function(self)

def main():
    cwd = os.getcwd()
    vfc = cwd + "\\_config" + ".json"
    ve = className(vfc)
    ve.run_all()

if __name__ == "__main__":
    main()
Would I write my lambda_handler function inside my script above or have it as a separate script?
def lambda_handler(event, context):
    # some code
I would treat lambda_handler(event, context) as the equivalent of main(), with the exception that you do not need the if __name__ ... clause, because you never run a Lambda function from the console.
You would also need to use the boto3 library to abstract away AWS services and their functions. Have a look at the tutorial to get started.
As the first order of business, I would move the DB credentials out of the file system and into a secure datastore. You can of course configure Lambda environment variables, but Systems Manager Parameter Store is more secure and super easy to call from the code, e.g.:
import boto3

ssm = boto3.client('ssm', region_name='us-east-1')

def lambda_handler(event, context):
    password = ssm.get_parameters(Names=['/pathto/password'], WithDecryption=True)['Parameters'][0]['Value']
    return {"password": password}
There is a more advanced option, the Secrets Manager, which for a little money will even rotate passwords for you (because it is fully integrated with Relational Database Service).
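If you do opt for Secrets Manager, retrieving a secret from the handler looks much the same; a minimal sketch (the secret name and its JSON layout are assumptions, not part of the original answer):

import json
import boto3

secrets = boto3.client('secretsmanager', region_name='us-east-1')

def lambda_handler(event, context):
    # 'my-db-credentials' is a hypothetical secret name holding a JSON document
    secret = secrets.get_secret_value(SecretId='my-db-credentials')
    creds = json.loads(secret['SecretString'])
    return {"username": creds.get("username")}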

How to test a Python Tornado application that uses MongoDB with the Motor client

I want to test my Tornado Python application with pytest.
For that purpose, I want to have a mock DB for Mongo and to use a "fake" Motor client to simulate the calls to MongoDB.
I found a lot of solutions for pymongo but not for Motor.
Any ideas?
I do not clearly understand your problem; why not just use hard-coded JSON data?
If you just want to have a class that would mock the following:
from motor.motor_tornado import MotorClient
client = MotorClient(MONGODB_URL)
my_db = client.my_db
result = await my_db['my_collection'].insert_one(my_json_load)
then I recommend creating a class:
class Collection:
    def __init__(self):
        self.database = []

    async def insert_one(self, data):
        data['_id'] = "5063114bd386d8fadbd6b004"  # you may make it random or sequential
        self.database.append(data)
        # You may also pickle self.database to disk to preserve data between runs
        return data

    async def find_one(self, query):
        # Return the first stored document matching all key/value pairs in the query
        for doc in self.database:
            if all(doc.get(k) == v for k, v in query.items()):
                return doc
        return None

    async def delete_one(self, data_id):
        self.database = [doc for doc in self.database if doc.get('_id') != data_id]
        class Result:  # minimal stand-in for Motor's DeleteResult
            deleted_count = 1
        return Result()
## Then create a collection:
my_db = {}
my_db['my_collection'] = Collection()
### The following is the part of 'views.py'
from tornado.web import RequestHandler, authenticated
from tornado.escape import xhtml_escape

class UserHandler(RequestHandler):
    async def post(self, name):
        getusername = xhtml_escape(self.get_argument("user_variable"))
        my_json_load = {'username': getusername}
        result = await my_db['my_collection'].insert_one(my_json_load)
        ...
        return self.write(result)
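With a fake like this in place, a pytest test can exercise it directly; a rough sketch, assuming pytest-asyncio is installed and the Collection class above lives in an importable module:

import pytest
# from myapp.fakes import Collection  # hypothetical module path for the fake above

@pytest.mark.asyncio
async def test_insert_and_find():
    collection = Collection()
    doc = await collection.insert_one({'username': 'alice'})
    assert doc['_id'] == "5063114bd386d8fadbd6b004"
    found = await collection.find_one({'username': 'alice'})
    assert found['username'] == 'alice'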
If you would clarify your question, I will develop my answer.

Flask Celery task locking

I am using Flask with Celery and I am trying to lock a specific task so that it can only be run one at a time. The Celery docs give an example of doing this: Ensuring a task is only executed one at a time. That example was written for Django, however, and I am using Flask. I have done my best to convert it to work with Flask, but I still see that myTask1, which has the lock, can be run multiple times.
One thing that is not clear to me is whether I am using the cache correctly; I have never used it before, so all of it is new to me. One thing from the docs that is mentioned but not explained is this:
Doc Notes:
In order for this to work correctly you need to be using a cache backend where the .add operation is atomic. memcached is known to work well for this purpose.
I'm not truly sure what that means. Should I be using the cache in conjunction with a database, and if so, how would I do that? I am using MongoDB. In my code I just have this setup for the cache, cache = Cache(app, config={'CACHE_TYPE': 'simple'}), as that is what was mentioned in the Flask-Cache docs.
Another thing that is not clear to me is whether there is anything different I need to do, as I am calling myTask1 from within my Flask route task1.
Here is an example of my code that I am using.
from flask import (Flask, render_template, flash, redirect,
                   url_for, session, logging, request, g, render_template_string, jsonify)
from flask_caching import Cache
from contextlib import contextmanager
from celery import Celery
from Flask_celery import make_celery
from celery.result import AsyncResult
from celery.utils.log import get_task_logger
from celery.five import monotonic
from flask_pymongo import PyMongo
from hashlib import md5
import pymongo
import time

app = Flask(__name__)
cache = Cache(app, config={'CACHE_TYPE': 'simple'})
app.config['SECRET_KEY'] = 'super secret key for me123456789987654321'

######################
# MONGODB SETUP
#####################
app.config['MONGO_HOST'] = 'localhost'
app.config['MONGO_DBNAME'] = 'celery-test-db'
app.config["MONGO_URI"] = 'mongodb://localhost:27017/celery-test-db'
mongo = PyMongo(app)

##############################
# CELERY ARGUMENTS
##############################
app.config['CELERY_BROKER_URL'] = 'amqp://localhost//'
app.config['CELERY_RESULT_BACKEND'] = 'mongodb://localhost:27017/celery-test-db'
app.config['CELERY_RESULT_BACKEND'] = 'mongodb'
app.config['CELERY_MONGODB_BACKEND_SETTINGS'] = {
    "host": "localhost",
    "port": 27017,
    "database": "celery-test-db",
    "taskmeta_collection": "celery_jobs",
}
app.config['CELERY_TASK_SERIALIZER'] = 'json'

celery = Celery('task', broker='mongodb://localhost:27017/jobs')
celery = make_celery(app)

LOCK_EXPIRE = 60 * 2  # Lock expires in 2 minutes
@contextmanager
def memcache_lock(lock_id, oid):
    timeout_at = monotonic() + LOCK_EXPIRE - 3
    # cache.add fails if the key already exists
    status = cache.add(lock_id, oid, LOCK_EXPIRE)
    try:
        yield status
    finally:
        # memcache delete is very slow, but we have to use it to take
        # advantage of using add() for atomic locking
        if monotonic() < timeout_at and status:
            # don't release the lock if we exceeded the timeout
            # to lessen the chance of releasing an expired lock
            # owned by someone else
            # also don't release the lock if we didn't acquire it
            cache.delete(lock_id)
@celery.task(bind=True, name='app.myTask1')
def myTask1(self):
    self.update_state(state='IN TASK')
    lock_id = self.name
    with memcache_lock(lock_id, self.app.oid) as acquired:
        if acquired:
            # do work if we got the lock
            print('acquired is {}'.format(acquired))
            self.update_state(state='DOING WORK')
            time.sleep(90)
            return 'result'
    # otherwise, the lock was already in use
    raise self.retry(countdown=60)  # redeliver message to the queue, so the work can be done later

@celery.task(bind=True, name='app.myTask2')
def myTask2(self):
    print('you are in task2')
    self.update_state(state='STARTING')
    time.sleep(120)
    print('task2 done')
@app.route('/', methods=['GET', 'POST'])
def index():
    return render_template('index.html')

@app.route('/task1', methods=['GET', 'POST'])
def task1():
    print('running task1')
    result = myTask1.delay()
    # get async task id
    taskResult = AsyncResult(result.task_id)
    # push async taskid into db collection job_task_id
    mongo.db.job_task_id.insert({'taskid': str(taskResult), 'TaskName': 'task1'})
    return render_template('task1.html')

@app.route('/task2', methods=['GET', 'POST'])
def task2():
    print('running task2')
    result = myTask2.delay()
    # get async task id
    taskResult = AsyncResult(result.task_id)
    # push async taskid into db collection job_task_id
    mongo.db.job_task_id.insert({'taskid': str(taskResult), 'TaskName': 'task2'})
    return render_template('task2.html')

@app.route('/status', methods=['GET', 'POST'])
def status():
    taskid_list = []
    task_state_list = []
    TaskName_list = []
    allAsyncData = mongo.db.job_task_id.find()
    for doc in allAsyncData:
        try:
            taskid_list.append(doc['taskid'])
        except:
            print('error with db connection in asyncJobStatus')
        TaskName_list.append(doc['TaskName'])
    # PASS TASK ID TO ASYNC RESULT TO GET TASK RESULT FOR THAT SPECIFIC TASK
    for item in taskid_list:
        try:
            task_state_list.append(myTask1.AsyncResult(item).state)
        except:
            task_state_list.append('UNKNOWN')
    return render_template('status.html', data_list=zip(task_state_list, TaskName_list))
Final Working Code
from flask import (Flask, render_template, flash, redirect,
                   url_for, session, logging, request, g, render_template_string, jsonify)
from flask_caching import Cache
from contextlib import contextmanager
from celery import Celery
from Flask_celery import make_celery
from celery.result import AsyncResult
from celery.utils.log import get_task_logger
from celery.five import monotonic
from flask_pymongo import PyMongo
from hashlib import md5
import pymongo
import time
import redis
from flask_redis import FlaskRedis

app = Flask(__name__)

# ADDING REDIS
redis_store = FlaskRedis(app)
# POINTING CACHE_TYPE TO REDIS
cache = Cache(app, config={'CACHE_TYPE': 'redis'})

app.config['SECRET_KEY'] = 'super secret key for me123456789987654321'

######################
# MONGODB SETUP
#####################
app.config['MONGO_HOST'] = 'localhost'
app.config['MONGO_DBNAME'] = 'celery-test-db'
app.config["MONGO_URI"] = 'mongodb://localhost:27017/celery-test-db'
mongo = PyMongo(app)

##############################
# CELERY ARGUMENTS
##############################
# CELERY USING REDIS
app.config['CELERY_BROKER_URL'] = 'redis://localhost:6379/0'
app.config['CELERY_RESULT_BACKEND'] = 'mongodb://localhost:27017/celery-test-db'
app.config['CELERY_RESULT_BACKEND'] = 'mongodb'
app.config['CELERY_MONGODB_BACKEND_SETTINGS'] = {
    "host": "localhost",
    "port": 27017,
    "database": "celery-test-db",
    "taskmeta_collection": "celery_jobs",
}
app.config['CELERY_TASK_SERIALIZER'] = 'json'

celery = Celery('task', broker='mongodb://localhost:27017/jobs')
celery = make_celery(app)

LOCK_EXPIRE = 60 * 2  # Lock expires in 2 minutes
@contextmanager
def memcache_lock(lock_id, oid):
    timeout_at = monotonic() + LOCK_EXPIRE - 3
    print('in memcache_lock and timeout_at is {}'.format(timeout_at))
    # cache.add fails if the key already exists
    status = cache.add(lock_id, oid, LOCK_EXPIRE)
    try:
        yield status
        print('memcache_lock and status is {}'.format(status))
    finally:
        # memcache delete is very slow, but we have to use it to take
        # advantage of using add() for atomic locking
        if monotonic() < timeout_at and status:
            # don't release the lock if we exceeded the timeout
            # to lessen the chance of releasing an expired lock
            # owned by someone else
            # also don't release the lock if we didn't acquire it
            cache.delete(lock_id)
@celery.task(bind=True, name='app.myTask1')
def myTask1(self):
    self.update_state(state='IN TASK')
    print('dir is {} '.format(dir(self)))
    lock_id = self.name
    print('lock_id is {}'.format(lock_id))
    with memcache_lock(lock_id, self.app.oid) as acquired:
        print('in memcache_lock and lock_id is {} self.app.oid is {} and acquired is {}'.format(lock_id, self.app.oid, acquired))
        if acquired:
            # do work if we got the lock
            print('acquired is {}'.format(acquired))
            self.update_state(state='DOING WORK')
            time.sleep(90)
            return 'result'
    # otherwise, the lock was already in use
    raise self.retry(countdown=60)  # redeliver message to the queue, so the work can be done later

@celery.task(bind=True, name='app.myTask2')
def myTask2(self):
    print('you are in task2')
    self.update_state(state='STARTING')
    time.sleep(120)
    print('task2 done')
@app.route('/', methods=['GET', 'POST'])
def index():
    return render_template('index.html')

@app.route('/task1', methods=['GET', 'POST'])
def task1():
    print('running task1')
    result = myTask1.delay()
    # get async task id
    taskResult = AsyncResult(result.task_id)
    # push async taskid into db collection job_task_id
    mongo.db.job_task_id.insert({'taskid': str(taskResult), 'TaskName': 'myTask1'})
    return render_template('task1.html')

@app.route('/task2', methods=['GET', 'POST'])
def task2():
    print('running task2')
    result = myTask2.delay()
    # get async task id
    taskResult = AsyncResult(result.task_id)
    # push async taskid into db collection job_task_id
    mongo.db.job_task_id.insert({'taskid': str(taskResult), 'TaskName': 'task2'})
    return render_template('task2.html')

@app.route('/status', methods=['GET', 'POST'])
def status():
    taskid_list = []
    task_state_list = []
    TaskName_list = []
    allAsyncData = mongo.db.job_task_id.find()
    for doc in allAsyncData:
        try:
            taskid_list.append(doc['taskid'])
        except:
            print('error with db connection in asyncJobStatus')
        TaskName_list.append(doc['TaskName'])
    # PASS TASK ID TO ASYNC RESULT TO GET TASK RESULT FOR THAT SPECIFIC TASK
    for item in taskid_list:
        try:
            task_state_list.append(myTask1.AsyncResult(item).state)
        except:
            task_state_list.append('UNKNOWN')
    return render_template('status.html', data_list=zip(task_state_list, TaskName_list))

if __name__ == '__main__':
    app.secret_key = 'super secret key for me123456789987654321'
    app.run(port=1234, host='localhost')
I ran myTask1 two times and myTask2 a single time, and now I have the expected behavior for myTask1: it is run by a single worker, and if another worker attempts to pick it up it will just keep retrying based on whatever I define.
In your question, you point out this warning from the Celery example you used:
In order for this to work correctly you need to be using a cache backend where the .add operation is atomic. memcached is known to work well for this purpose.
And you mention that you don't really understand what this means. Indeed, the code you show demonstrates that you've not heeded that warning, because your code uses an inappropriate backend.
Consider this code:
with memcache_lock(lock_id, self.app.oid) as acquired:
    if acquired:
        # do some work
What you want here is for acquired to be true only for one thread at a time. If two threads enter the with block at the same time, only one should "win" and have acquired be true. This thread that has acquired true can then proceed with its work, and the other thread has to skip doing the work and try again later to acquire the lock. In order to ensure that only one thread can have acquired true, .add must be atomic.
Here's some pseudo code of what .add(key, value) does:
1. if <key> is already in the cache:
2.     return False
3. else:
4.     set the cache so that <key> has the value <value>
5.     return True
If the execution of .add is not atomic, this could happen if two threads A and B execute .add("foo", "bar"). Assume an empty cache at the start.
Thread A executes 1. if "foo" is already in the cache and finds that "foo" is not in the cache, and jumps to line 3, but then the thread scheduler switches control to thread B.
Thread B also executes 1. if "foo" is already in the cache, and also finds that "foo" is not in the cache. So it jumps to line 3 and then executes lines 4 and 5, which set the key "foo" to the value "bar" and return True.
Eventually, the scheduler gives control back to thread A, which continues executing lines 3, 4 and 5, also sets the key "foo" to the value "bar", and also returns True.
What you have here is two .add calls that return True. If these .add calls are made within memcache_lock, then two threads can have acquired be true. So two threads could do work at the same time, and your memcache_lock is not doing what it should be doing, which is to allow only one thread to work at a time.
You are not using a cache that ensures that .add is atomic. You initialize it like this:
cache = Cache(app, config={'CACHE_TYPE': 'simple'})
The simple backend is scoped to a single process, has no thread-safety, and has an .add operation which is not atomic. (This does not involve Mongo at all, by the way. If you wanted your cache to be backed by Mongo, you'd have to specify a backend specifically made to send data to a Mongo database.)
So you have to switch to another backend, one that guarantees that .add is atomic. You could follow the lead of the Celery example and use the memcached backend, which does have an atomic .add operation. I don't use Flask, but I've done essentially what you are doing with Django and Celery, and used the Redis backend successfully to provide the kind of locking you're using here.
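With Flask-Caching, that switch is mostly a configuration change; a rough sketch (the connection details are assumptions, point them at your own Redis or memcached instance):

# Redis-backed cache: .add becomes a test-and-set against a server shared by
# all worker processes, instead of an in-process dictionary.
cache = Cache(app, config={
    'CACHE_TYPE': 'redis',
    'CACHE_REDIS_URL': 'redis://localhost:6379/1',
})

# Or, following the Celery example, memcached:
# cache = Cache(app, config={
#     'CACHE_TYPE': 'memcached',
#     'CACHE_MEMCACHED_SERVERS': ['127.0.0.1:11211'],
# })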
I also found this to be a surprisingly hard problem. Inspired mainly by Sebastian's work on implementing a distributed locking algorithm in redis I wrote up a decorator function.
A key point to bear in mind about this approach is that we lock tasks at the level of the task's argument space, e.g. we allow multiple game update/process order tasks to run concurrently, but only one per game. That's what argument_signature achieves in the code below. You can see documentation on how we use this in our stack at this gist:
import base64
from contextlib import contextmanager
import json
import pickle as pkl
import uuid
from backend.config import Config
from redis import StrictRedis
from redis_cache import RedisCache
from redlock import Redlock
rds = StrictRedis(Config.REDIS_HOST, decode_responses=True, charset="utf-8")
rds_cache = StrictRedis(Config.REDIS_HOST, decode_responses=False, charset="utf-8")
redis_cache = RedisCache(redis_client=rds_cache, prefix="rc", serializer=pkl.dumps, deserializer=pkl.loads)
dlm = Redlock([{"host": Config.REDIS_HOST}])
TASK_LOCK_MSG = "Task execution skipped -- another task already has the lock"
DEFAULT_ASSET_EXPIRATION = 8 * 24 * 60 * 60 # by default keep cached values around for 8 days
DEFAULT_CACHE_EXPIRATION = 1 * 24 * 60 * 60 # we can keep cached values around for a shorter period of time
REMOVE_ONLY_IF_OWNER_SCRIPT = """
if redis.call("get",KEYS[1]) == ARGV[1] then
return redis.call("del",KEYS[1])
else
return 0
end
"""
@contextmanager
def redis_lock(lock_name, expires=60):
    # https://breadcrumbscollector.tech/what-is-celery-beat-and-how-to-use-it-part-2-patterns-and-caveats/
    random_value = str(uuid.uuid4())
    lock_acquired = bool(
        rds.set(lock_name, random_value, ex=expires, nx=True)
    )
    yield lock_acquired
    if lock_acquired:
        rds.eval(REMOVE_ONLY_IF_OWNER_SCRIPT, 1, lock_name, random_value)

def argument_signature(*args, **kwargs):
    arg_list = [str(x) for x in args]
    kwarg_list = [f"{str(k)}:{str(v)}" for k, v in kwargs.items()]
    return base64.b64encode(f"{'_'.join(arg_list)}-{'_'.join(kwarg_list)}".encode()).decode()

def task_lock(func=None, main_key="", timeout=None):
    def _dec(run_func):
        def _caller(*args, **kwargs):
            with redis_lock(f"{main_key}_{argument_signature(*args, **kwargs)}", timeout) as acquired:
                if not acquired:
                    return TASK_LOCK_MSG
                return run_func(*args, **kwargs)
        return _caller
    return _dec(func) if func is not None else _dec
Implementation in our task definitions file:
@celery.task(name="async_test_task_lock")
@task_lock(main_key="async_test_task_lock", timeout=UPDATE_GAME_DATA_TIMEOUT)
def async_test_task_lock(game_id):
    print(f"processing game_id {game_id}")
    time.sleep(TASK_LOCK_TEST_SLEEP)
How we test against a local celery cluster:
from backend.tasks.definitions import async_test_task_lock, TASK_LOCK_TEST_SLEEP
from backend.tasks.redis_handlers import rds, TASK_LOCK_MSG

class TestTaskLocking(TestCase):
    def test_task_locking(self):
        rds.flushall()
        res1 = async_test_task_lock.delay(3)
        res2 = async_test_task_lock.delay(5)
        self.assertFalse(res1.ready())
        self.assertFalse(res2.ready())
        res3 = async_test_task_lock.delay(5)
        res4 = async_test_task_lock.delay(5)
        self.assertEqual(res3.get(), TASK_LOCK_MSG)
        self.assertEqual(res4.get(), TASK_LOCK_MSG)
        time.sleep(TASK_LOCK_TEST_SLEEP)
        res5 = async_test_task_lock.delay(3)
        self.assertFalse(res5.ready())
(As a bonus, there's also a quick example of how to set up a redis_cache.)
With this setup, you should still expect to see workers receiving the task, since the lock is checked inside of the task itself. The only difference will be that the work won't be performed if the lock is acquired by another worker.
In the example given in the docs, this is the desired behavior; if a lock already exists, the task will simply do nothing and finish as successful. What you want is slightly different; you want the work to be queued up instead of ignored.
In order to get the desired effect, you would need to make sure that the task will be picked up by a worker and performed some time in the future. One way to accomplish this would be with retrying.
@task(bind=True, name='my-task')
def my_task(self):
    lock_id = self.name
    with memcache_lock(lock_id, self.app.oid) as acquired:
        if acquired:
            # do work if we got the lock
            print('acquired is {}'.format(acquired))
            return 'result'
    # otherwise, the lock was already in use
    raise self.retry(countdown=60)  # redeliver message to the queue, so the work can be done later

How to trigger a function after return statement in Flask

I have two functions. The first function stores the data received in a list and the second function writes the data into a CSV file.
I'm using Flask. Whenever the web service is called, it should store the data and send a response, and as soon as it has sent the response it should trigger the second function.
My code:
from flask import Flask, flash, request, redirect, url_for, session
import json
import pandas as pd

app = Flask(__name__)

arr = []

@app.route("/test", methods=['GET', 'POST'])
def check():
    arr.append(request.form['a'])
    arr.append(request.form['b'])
    res = {'Status': True}
    return json.dumps(res)

def trigger():
    df = pd.DataFrame({'x': arr})
    df.to_csv("docs/xyz.csv", index=False)
    return
Obviously the second function is not called.
Is there a way to achieve this?
P.S.: My real-life problem is different; the trigger function is time consuming and I don't want the user to wait for it to finish execution.
One solution would be to have a background thread that watches a queue. You put your CSV data in the queue and the background thread consumes it. You can start such a thread before the first request:
import threading
import time
from queue import Queue  # a thread-safe queue with task_done()/join() support

import pandas as pd

class CSVWriterThread(threading.Thread):
    def __init__(self, *args, **kwargs):
        threading.Thread.__init__(self, *args, **kwargs)
        self.input_queue = Queue()

    def send(self, item):
        self.input_queue.put(item)

    def close(self):
        self.input_queue.put(None)
        self.input_queue.join()

    def run(self):
        while True:
            csv_array = self.input_queue.get()
            if csv_array is None:
                break
            # Do something here ...
            df = pd.DataFrame({'x': csv_array})
            df.to_csv("docs/xyz.csv", index=False)
            self.input_queue.task_done()
            time.sleep(1)
        # Done
        self.input_queue.task_done()
        return

@app.before_first_request
def activate_job_monitor():
    thread = CSVWriterThread()
    app.csvwriter = thread
    thread.start()
And in your code put the message in the queue before returning:
@app.route("/test", methods=['GET', 'POST'])
def check():
    arr.append(request.form['a'])
    arr.append(request.form['b'])
    res = {'Status': True}
    app.csvwriter.send(arr)
    return json.dumps(res)
P.S.: My real-life problem is different; the trigger function is time consuming and I don't want the user to wait for it to finish execution.
Consider using Celery, which is made for the very problem you're trying to solve. From the docs:
Celery is a simple, flexible, and reliable distributed system to process vast amounts of messages, while providing operations with the tools required to maintain such a system.
I recommend you integrate Celery with your Flask app as described here. Your trigger method would then become a straightforward Celery task that you can execute without having to worry about long response times.
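A rough sketch of what that could look like, ignoring the Flask/Celery context integration from the linked guide (the broker URL is an assumption):

import json
import pandas as pd
from celery import Celery
from flask import Flask, request

app = Flask(__name__)
celery = Celery(app.name, broker='redis://localhost:6379/0')  # broker URL is an assumption

@celery.task
def write_csv(rows):
    # Runs in a Celery worker, so the HTTP response is not delayed
    pd.DataFrame({'x': rows}).to_csv("docs/xyz.csv", index=False)

@app.route("/test", methods=['GET', 'POST'])
def check():
    rows = [request.form['a'], request.form['b']]
    write_csv.delay(rows)  # enqueue and return immediately
    return json.dumps({'Status': True})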
I'm actually working on another interesting case on my side where I pass the work off to a Python worker that picks the job up from a Redis queue. There are some great blog posts on using Redis with Flask; you basically need to ensure Redis is running (able to connect on port 6379).
The worker would look something like this:
import os

import redis
from rq import Worker, Queue, Connection

listen = ['default']
redis_url = os.getenv('REDISTOGO_URL', 'redis://localhost:6379')
conn = redis.from_url(redis_url)

if __name__ == '__main__':
    with Connection(conn):
        worker = Worker(list(map(Queue, listen)))
        worker.work()
In my example I have a function that queries a database for usage, and since it might be a lengthy process I pass it off to the worker (running as a separate script):
def post(self):
    data = Task.parser.parse_args()
    job = q.enqueue_call(
        func=migrate_usage, args=(my_args),
        result_ttl=5000
    )
    print("Job ID is: {}".format(job.get_id()))
    job_key = job.get_id()
    print(str(Job.fetch(job_key, connection=conn).result))
    if job:
        return {"message": "Job : {} added to queue".format(job_key)}, 201
Credit due to the following article:
https://realpython.com/flask-by-example-implementing-a-redis-task-queue/#install-requirements
You can try using streaming. See the next example:
import time
from flask import Flask, Response

app = Flask(__name__)

@app.route('/')
def main():
    return '''<div>start</div>
    <script>
        var xhr = new XMLHttpRequest();
        xhr.open('GET', '/test', true);
        xhr.onreadystatechange = function(e) {
            var div = document.createElement('div');
            div.innerHTML = '' + this.readyState + ':' + this.responseText;
            document.body.appendChild(div);
        };
        xhr.send();
    </script>
    '''

@app.route('/test')
def test():
    def generate():
        app.logger.info('request started')
        for i in range(5):
            time.sleep(1)
            yield str(i)
        app.logger.info('request finished')
        yield ''
    return Response(generate(), mimetype='text/plain')

if __name__ == '__main__':
    app.run('0.0.0.0', 8080, True)
All the magic in this example is in the generator: you can start sending response data, then do some other work, and finally yield empty data to end your stream.
For details look at http://flask.pocoo.org/docs/patterns/streaming/.
You can defer route-specific actions with limited context by combining after_this_request and response.call_on_close. Note that the request and response context won't be available, but the route function context remains available, so you'll need to copy any request/response data you need into local variables for deferred access.
I moved your array to a local variable to show how the function context is preserved. You could change your CSV write function to an append so you're not pushing data endlessly into memory.
from flask import Flask, flash, request, redirect, url_for, session
import json
app = Flask(__name__)
#app.route("/test", methods=['GET','POST'])
def check():
arr = []
arr.append(request.form['a'])
arr.append(request.form['b'])
res = {'Status': True}
#flask.after_this_request
def add_close_action(response):
#response.call_on_close
def process_after_request():
df = pd.DataFrame({'x': arr})
df.to_csv("docs/xyz.csv", index=False)
return response
return json.dumps(res)

Am I using aiohttp together with psycopg2 correctly?

I'm very new to using asyncio/aiohttp, but I have a Python script that reads a batch of URLs from a Postgres table, downloads the URLs, runs a processing function on each download (not relevant for the question), and saves the result of the processing back to the table.
In simplified form it looks like this:
import asyncio
import psycopg2
from aiohttp import ClientSession, TCPConnector

BATCH_SIZE = 100

def _get_pgconn():
    return psycopg2.connect()

def db_conn(func):
    def _db_conn(*args, **kwargs):
        with _get_pgconn() as conn:
            with conn.cursor() as cur:
                return func(cur, *args, **kwargs)
            conn.commit()
    return _db_conn

async def run():
    async with ClientSession(connector=TCPConnector(ssl=False, limit=100)) as session:
        while True:
            count = await run_batch(session)
            if count == 0:
                break

async def run_batch(session):
    tasks = []
    for url in get_batch():
        task = asyncio.ensure_future(process_url(url, session))
        tasks.append(task)
    await asyncio.gather(*tasks)
    results = [task.result() for task in tasks]
    save_batch_result(results)
    return len(results)

async def process_url(url, session):
    try:
        async with session.get(url, timeout=15) as response:
            body = await response.read()
            return process_body(body)
    except:
        return {...}

@db_conn
def get_batch(cur):
    sql = "SELECT id, url FROM db.urls WHERE processed IS NULL LIMIT %s"
    cur.execute(sql, (BATCH_SIZE,))
    return cur.fetchall()

@db_conn
def save_batch_result(cur, results):
    sql = "UPDATE db.urls SET a = %(a)s, processed = true WHERE id = %(id)s"
    cur.executemany(sql, tuple(results))

loop = asyncio.get_event_loop()
loop.run_until_complete(run())
But I have the feeling that I must be missing something here. The script runs, but it seems to become slower and slower with each batch. In particular, the call to the process_url function seems to become slower over time. Also, the memory used keeps growing, so I'm guessing there might be something that I fail to clean up properly between runs?
I also have problems increasing the batch size much: if I go much over 200 I seem to get a much higher proportion of exceptions from the call to session.get. I have tried playing with the limit argument to the TCPConnector, setting it both higher and lower, but I can't see that it helps much. I have also tried running it on a few different servers, but it seems to be the same. Is there some way to think about how to set these values more effectively?
I would be grateful for some pointers to what I might be doing wrong here!
The problem with your code is that it mixes the asynchronous aiohttp library with the synchronous psycopg2 client.
As a consequence, calls to the DB block the event loop entirely, affecting all other parallel tasks.
To solve it you need to use an asynchronous DB client: aiopg (a wrapper around psycopg2's async mode) or asyncpg (it has a different API but works faster).
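For example, with aiopg the two DB helpers from the question could become coroutines along these lines. This is only a sketch, reusing names from the question: the DSN is a placeholder, run_batch would need to accept the pool as an extra argument, and error handling is omitted.

import aiopg

async def run():
    # One pool for the whole run; aiopg issues queries without blocking the loop
    async with aiopg.create_pool(dsn='dbname=db') as pool:
        async with ClientSession(connector=TCPConnector(ssl=False, limit=100)) as session:
            while True:
                count = await run_batch(session, pool)
                if count == 0:
                    break

async def get_batch(pool):
    async with pool.acquire() as conn:
        async with conn.cursor() as cur:
            await cur.execute(
                "SELECT id, url FROM db.urls WHERE processed IS NULL LIMIT %s",
                (BATCH_SIZE,))
            return await cur.fetchall()

async def save_batch_result(pool, results):
    async with pool.acquire() as conn:
        async with conn.cursor() as cur:
            # aiopg runs in autocommit mode, so no explicit commit is needed
            for row in results:
                await cur.execute(
                    "UPDATE db.urls SET a = %(a)s, processed = true WHERE id = %(id)s",
                    row)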
