Saving a model in a separate thread - python

In my simple webapp I have a model called Document. When the document is created it is empty. The user can then request to generate it, which means that its content is filled with data. Since this generating step can take some time, it is an asynchronous request: the server starts a thread to generate the document, the user obtains a quick response saying that the generation process started, and after some time the generation is over and the database is updated.
This is the code that describes the model:
import time
from threading import Thread

from django.db import models

STATE_EMPTY = 0
STATE_GENERATING = 1
STATE_READY = 2


class Document(models.Model):
    text = models.TextField(blank=True, null=True)
    state = models.IntegerField(default=STATE_EMPTY, choices=(
        (STATE_EMPTY, 'empty'),
        (STATE_GENERATING, 'generating'),
        (STATE_READY, 'ready'),
    ))

    def generate(self):
        def generator():
            time.sleep(5)
            self.state = STATE_READY
            self.text = 'This is the content of the document'

        self.state = STATE_GENERATING
        self.save()
        t = Thread(target=generator, name='GeneratorThread')
        t.start()
As you can see, the generate function changes the state, saves the document and spawns a thread. The thread works for a while (well, sleeps for a while), then changes the state and the content.
This is the corresponding test:
def test_document_can_be_generated_asynchronously(self):
    doc = Document()
    doc.save()
    self.assertEqual(STATE_EMPTY, doc.state)
    doc.generate()
    self.assertEqual(STATE_GENERATING, doc.state)
    time.sleep(8)
    self.assertEqual(STATE_READY, doc.state)
    self.assertEqual('This is the content of the document', doc.text)
This test passes. The document object correctly undergoes all expected changes.
Unfortunately, the code is wrong: after changing the content of the document, it is never saved, so the changes are not persistent. This can be verified by adding the following line to the test:
self.assertEqual(STATE_READY, Document.objects.first().state)
This assertion fails:
self.assertEqual(STATE_READY, Document.objects.first().state)
AssertionError: 2 != 1
The solution is simple: just add self.save() at the end of the generator function. But this results in a different kind of problem:
Destroying test database for alias 'default'...
Traceback (most recent call last):
File ".../virtualenvs/DjangoThreadTest-elBGAiyX/lib/python3.7/site-packages/django/db/backends/utils.py", line 82, in _execute
return self.cursor.execute(sql)
psycopg2.errors.ObjectInUse: database "test_postgres" is being accessed by other users
DETAIL: There is 1 other session using the database.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
...
File ".../virtualenvs/DjangoThreadTest-elBGAiyX/lib/python3.7/site-packages/django/db/backends/utils.py", line 82, in _execute
return self.cursor.execute(sql)
django.db.utils.OperationalError: database "test_postgres" is being accessed by other users
DETAIL: There is 1 other session using the database.
The problem seems related to the save() call made in a different thread. The database engine does not seem to affect the result: I get almost identical error messages with PostgreSQL (as shown) and SQLite (in that case the error is along the lines of "The database table is locked").
Some similar questions get replies such as "Just use Celery to manage heavy processing tasks". I would rather understand what I'm doing wrong and how to solve it using Django tools. In fact, there is no heavy processing, nor any need to scale to many users (the webapp is to be used by one user at a time).

When you spawn a new thread, Django creates a new database connection for that thread. Normally, connections are closed at the start and end of the request cycle and at the end of a test run. But when a thread is spawned manually, nothing closes its connection: the thread ends and its local data is destroyed, but the connection is not properly closed on the database side (connections are stored in a thread-local object, if you are interested).
So, to solve the issue, you have to close the connection manually at the end of the thread.
from django.db import connection

def generate(self):
    def generator():
        time.sleep(5)
        self.state = STATE_READY
        self.text = 'This is the content of the document'
        self.save()
        connection.close()  # close this thread's connection before the thread exits

    self.state = STATE_GENERATING
    self.save()
    t = Thread(target=generator, name='GeneratorThread')
    t.start()
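The per-thread connection behaviour described in this answer can be imitated with the standard library alone. The following is a minimal sketch, not Django's implementation: it assumes a hypothetical `get_connection` / `close_connection` pair backed by `threading.local` and SQLite, and a made-up `document` table. It shows the worker opening its own connection and closing it in a `finally` block, so the connection is released even if the work raises.

```python
import os
import sqlite3
import tempfile
import threading

# Hypothetical stand-in for per-thread connection storage.
_local = threading.local()
events = []  # (thread name, action) pairs, recorded for inspection


def get_connection(path):
    """Return the calling thread's connection, opening one on first use."""
    conn = getattr(_local, "conn", None)
    if conn is None:
        conn = sqlite3.connect(path)
        _local.conn = conn
        events.append((threading.current_thread().name, "open"))
    return conn


def close_connection():
    """Close the calling thread's connection, if it has one."""
    conn = getattr(_local, "conn", None)
    if conn is not None:
        conn.close()
        _local.conn = None
        events.append((threading.current_thread().name, "close"))


def worker(path):
    try:
        conn = get_connection(path)  # a new connection, private to this thread
        conn.execute("UPDATE document SET state = 2 WHERE id = 1")
        conn.commit()
    finally:
        close_connection()  # runs even if the update above fails


db_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")
main_conn = get_connection(db_path)
main_conn.execute("CREATE TABLE document (id INTEGER PRIMARY KEY, state INTEGER)")
main_conn.execute("INSERT INTO document (id, state) VALUES (1, 1)")
main_conn.commit()

t = threading.Thread(target=worker, args=(db_path,), name="GeneratorThread")
t.start()
t.join()

final_state = main_conn.execute(
    "SELECT state FROM document WHERE id = 1").fetchone()[0]
close_connection()
```

In the Django code above, the same idea would mean wrapping the body of generator() in try/finally so that connection.close() also runs when save() raises.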

Related

Python multithreading is mixing the data of different requests in Django

I am using Python multithreading for a task that takes about 2 to 3 minutes. I have made one API endpoint in a Django project.
Here is my code:
from threading import Thread

from django.http import HttpResponse


def myendpoint(request):
    print("hello")
    lis = [*args]  # placeholder from the question: the items to process
    obj = Model.objects.get(name=" jax")
    T1 = MyThreadClass(lis, obj)
    T1.daemon = True  # must be set before start()
    T1.start()
    return HttpResponse("successful", status=200)


class MyThreadClass(Thread):
    def __init__(self, lis, obj):
        Thread.__init__(self)
        self.lis = lis
        self.obj = obj

    def run(self):
        for i in self.lis:  # the instance's own list, not a global name
            res = Func1(i)
            self.obj.someattribute = res
            self.obj.save()


def Func1(i):
    '''Some big codes'''
    context = func2(*args)
    return context


def func2(*args):
    '''some codes'''
    return res
With this multithreading I get a quick response from the Django server when calling the endpoint: the big task is handed to another thread, and the endpoint returns without keeping track of the spawned thread.
This works correctly if I hit the URL once. But if I hit the URL a second time just after the first execution starts, I can see the second request on the console but never get a response from it.
And if I hit the same URL from two different clients at the same time, the data of the two requests gets mixed up: I see a few records from one client's request in the other client's data.
I am testing this on my local Django runserver.
Please help. I know about Celery, so please don't recommend it; my task is not long enough to justify it. Just tell me why this is happening and whether it can be fixed. I want to achieve this with multithreading.
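The question does not show enough code to pinpoint the cause, so the following is only a sketch of one thing worth ruling out: state shared between threads. It assumes a hypothetical `Worker` class in which every request gets its own thread object, its own copy of the input list and its own result list, with `time.sleep` standing in for the slow work. Under those assumptions two concurrent workers cannot see each other's records.

```python
import threading
import time


class Worker(threading.Thread):
    """Hypothetical per-request worker that owns its inputs and results."""

    def __init__(self, client, items):
        # daemon has to be set before start(); passing it here is the simplest way.
        super().__init__(daemon=True)
        self.client = client
        self.items = list(items)  # private copy, not a list shared with other requests
        self.results = []

    def run(self):
        for item in self.items:  # self.items, never a module-level name
            time.sleep(0.01)     # stand-in for the slow work
            self.results.append((self.client, item))


workers = [Worker("client-a", [1, 2, 3]), Worker("client-b", [10, 20, 30])]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

If the real helper functions read or write module-level variables, or if two requests load and save the same model row, the data can still interleave; keeping everything on the thread instance only removes the first of those causes.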

tornado reverse url with post request

I have a report service in tornado application.
I would like to reuse the function that creates reports from a report JSON.
That is, in the new handler that regenerates an existing report, I would like to reuse an existing handler that knows how to create reports from JSON.
server.py:
import tornado.web
from tornado.web import url


def create_server():
    return tornado.web.Application([
        (r"/task", generator.GenHandler),
        (r"/task/(.+)", generator.GenHandler),
        url(r"/regenerate_task", generator.GenHandler, name="regenerate_task"),
        url(r"/regenerate_task/(.+)", generator.GenHandler, name="regenerate_task"),
        (r"/report_status/regenerate", report_status.Regenerate),
    ])
The GenHandler class:
class GenHandler(tornado.web.RequestHandler):
    async def post(self):
        try:
            LOGGER.info(str(self.request.body))
            gen_args = self.parsed_body
            # create here report using the parsed body
And this is the handler I am trying to create.
It will take a saved JSON from the DB and create a completely new report with the original report logic.
class Regenerate(tornado.web.RequestHandler):
    async def post(self):
        rep_id = self.request.arguments.get('rep_id')[0].decode("utf-8") if self.request.arguments.get('rep_id') \
            else 0
        try:
            report = db_handler.get_report_by_id(rep_id)
            if *REPORT IS VALID*:
                return self.reverse_url("regenerate_task", report)
            else:
                report = dict(success=True, report_id=rep_id, report=[])
        except Exception as ex:
            report = dict(success=False, report_id=rep_id, report=[], error=str(ex))
        finally:
            self.write(report)
Right now, nothing happens. I just get the JSON I needed, but GenHandler is never entered and no report is regenerated.
reverse_url returns the URL for the specified alias, but doesn't invoke it.
You need to invoke another handler only because the code is poorly organised. Keeping report generation code (i.e. business logic) in a handler is bad practice. Move it to a separate class (usually called a Controller in the MVC pattern, with the handler acting as the View), or at least to a separate method, and then reuse it in your Regenerate handler.
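The refactoring suggested here can be sketched without Tornado. In this minimal, framework-free sketch every name is hypothetical: `create_report` stands for the extracted business logic, `handle_generate` and `handle_regenerate` stand for what the two `post` methods would do, and `SAVED_REPORTS` replaces the database lookup. The point is only that both entry points call the same coroutine instead of one handler trying to invoke the other.

```python
import asyncio

# Hypothetical stand-in for the saved report JSON in the database.
SAVED_REPORTS = {"42": {"title": "weekly", "rows": 3}}


async def create_report(gen_args):
    """Business logic shared by both handlers (placeholder body)."""
    return {"success": True, "title": gen_args["title"], "rows": gen_args["rows"]}


async def handle_generate(body):
    """What GenHandler.post would do: build a report from the request body."""
    return await create_report(body)


async def handle_regenerate(rep_id):
    """What Regenerate.post would do: load the saved JSON, reuse the same logic."""
    saved = SAVED_REPORTS.get(rep_id)
    if saved is None:
        return {"success": False, "report_id": rep_id, "error": "unknown report"}
    return await create_report(saved)


fresh = asyncio.run(handle_generate({"title": "weekly", "rows": 3}))
again = asyncio.run(handle_regenerate("42"))
missing = asyncio.run(handle_regenerate("nope"))
```

In the real application each handler's post method would await the shared coroutine and pass the result to self.write.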

Failed ndb transaction attempt not rolling back all changes?

I have some trouble understanding a sequence of events causing a bug in my application, which can only be seen intermittently in the app deployed on GAE and never when running with the local devserver.py.
All the related code snippets below (trimmed for MCV, hopefully I didn't lose anything significant) are executed during handling of the same task queue request.
The entry point:
def job_completed_task(self, _):
    # running outside transaction as query is made
    if not self.all_context_jobs_completed(self.context.db_key, self):
        # this will transactionally enqueue another task
        self.trigger_job_mark_completed_transaction()
    else:
        # this is transactional
        self.context.jobs_completed(self)
The corresponding self.context.jobs_completed(self) is:
@ndb.transactional(xg=True)
def jobs_completed(self, job):
    if self.status == QAStrings.status_done:
        logging.debug('%s jobs_completed %s NOP' % (self.lid, job.job_id))
        return
    # some logic computing step_completed here
    if step_completed:
        self.status = QAStrings.status_done  # includes self.db_data.put()
        # this will transactionally enqueue another task
        job.trigger_job_mark_completed_transaction()
The self.status setter, hacked to obtain a traceback for debugging this scenario:
@status.setter
def status(self, new_status):
    assert ndb.in_transaction()
    status = getattr(self, self.attr_status)
    if status != new_status:
        traceback.print_stack()
        logging.info('%s status change %s -> %s' % (self.name, status, new_status))
        setattr(self, self.attr_status, new_status)
The job.trigger_job_mark_completed_transaction() eventually enqueues a new task like this:
task = taskqueue.add(queue_name=self.task_queue_name, url=url, params=params,
transactional=ndb.in_transaction(), countdown=delay)
The GAE log for the occurrence, split as it doesn't fit into a single screen:
My expectation from the jobs_completed transaction is to either see the ... jobs_completed ... NOP debug message and no task enqueued or to at least see the status change running -> done info message and a task enqueued by job.trigger_job_mark_completed_transaction().
What I'm actually seeing is both messages and no task enqueued.
The logs appear to indicate that the transaction is attempted twice:
1st time it finds the status not done, so it executes the logic, sets the status to done (and displays the traceback and the info msg) and should transactionally enqueue the new task, but it doesn't.
2nd time it finds the status done and just prints the debug message.
My question is - if the 1st transaction attempt fails shouldn't the status change be rolled back as well? What am I missing?
I found a workaround: specifying no retries to the jobs_completed() transaction:
@ndb.transactional(xg=True, retries=0)
def jobs_completed(self, job):
This prevents the automatic repeated execution, instead causing an exception:
TransactionFailedError(The transaction could not be committed. Please
try again.)
Which is acceptable as I already have in place a back-off/retry safety net for the entire job_completed_task(). Things are OK now.
As for why the rollback didn't happen, the only thing that crosses my mind is that somehow the entity was read (and cached in my object attribute) prior to entering the transaction, thus not being considered part of the (same) transaction. But I couldn't find a code path that would do that, so it's just speculation.
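The speculation above (state held on a Python object surviving a failed attempt) can be illustrated with a small simulation. This sketch does not use the ndb API: `FakeStore`, `run_in_transaction` and `Context` are hypothetical, and the first commit is made to fail on purpose. It only shows that if the status is read from an attribute that lives outside the transaction's data, a retry sees the value written by the failed attempt, which reproduces the reported symptom of both log messages and no enqueued task.

```python
class FakeStore:
    """Hypothetical in-memory datastore with commit/rollback semantics."""

    def __init__(self):
        self.committed = {"status": "running"}
        self.commit_failures = 1  # the first commit fails, like a contention error


def run_in_transaction(store, func, retries=3):
    """Call func on a scratch copy of the data; retry when the commit fails."""
    for _ in range(retries + 1):
        pending = dict(store.committed)  # every attempt starts from committed data
        tasks = []                       # transactional tasks, kept only on commit
        func(pending, tasks)
        if store.commit_failures > 0:
            store.commit_failures -= 1   # failed commit: pending and tasks are dropped
            continue
        store.committed = pending
        return tasks
    raise RuntimeError("transaction failed")


class Context:
    def __init__(self):
        self.cached_status = "running"  # plain attribute, outside the store
        self.log = []

    def jobs_completed(self, pending, tasks):
        if self.cached_status == "done":
            self.log.append("NOP")
            return
        self.cached_status = "done"     # not undone when the commit fails
        pending["status"] = "done"
        tasks.append("mark_completed")
        self.log.append("running -> done")


store = FakeStore()
ctx = Context()
enqueued = run_in_transaction(store, ctx.jobs_completed)
```

Whether this is what happens in the real application depends on where the status attribute is loaded, which the question itself leaves open.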

Session data not being stored during testing in Django

I am currently writing tests for our project, and I ran into an issue. We have this section of a view, which will redirect the user back to the page where they came from including an error message (that's being stored in the session):
if request.GET.get('error_code'):
    """
    Something went wrong or the call was cancelled
    """
    errorCode = request.GET.get('error_code')
    if errorCode == 4201:
        request.session['errormessage'] = _('Action cancelled by the user')
    return HttpResponseRedirect('/socialMedia/manageAccessToken')
Once the HttpResponseRedirect kicks in, the first thing that the new view does is scan the session, to see if any error messages are stored in the session. If there are, we place them in a dictionary and then delete it from the session:
def manageAccessToken(request):
    """
    View that handles all things related to the access tokens for Facebook,
    Twitter and Linkedin.
    """
    contextDict = {}
    try:
        contextDict['errormessage'] = request.session['errormessage']
        contextDict['successmessage'] = request.session['successmessage']
        del request.session['errormessage']
        del request.session['successmessage']
    except KeyError:
        pass
We should now have the error message in a dictionary, but after printing the dictionary the error message is not there. I also printed the session just before the HttpResponseRedirect, but the session is an empty dictionary there as well.
This is the test:
class oauthCallbacks(TestCase):
    """
    Class to test the different oauth callbacks
    """
    def setUp(self):
        self.user = User.objects.create(
            email='test@django.com'
        )
        self.c = Client()

    def test_oauthCallbackFacebookErrorCode(self):
        """
        Tests the Facebook oauth callback view
        This call contains an error code, so we will be redirected to the
        manage accesstoken page. We check if we get the error message
        """
        self.c.force_login(self.user)
        response = self.c.get('/socialMedia/oauthCallbackFacebook/',
                              data={'error_code': 4201},
                              follow=True,
                              )
        self.assertEqual('Action cancelled by the user', response.context['errormessage'])
It looks like the session can not be accessed or written to directly from the views during testing. I can, however, access a value in the session by manually setting it in the test by using the following bit of code:
session = self.c.session
session['errormessage'] = 'This is an error message'
session.save()
This is however not what I want, because I need the session to be set by the view as there are many different error messages in the entire view. Does anyone know how to solve this? Thanks in advance!
After taking a closer look I found the issue, it is in the view itself:
errorCode = request.GET.get('error_code')
if errorCode == 4201:
    request.session['errormessage'] = _('Action cancelled by the user')
The errorCode variable is a string, and I was comparing it to an integer. I fixed it by changing the second line to:
if int(errorCode) == 4201:
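The type mismatch is easy to reproduce outside Django. The sketch below uses `urllib.parse.parse_qs` as a stand-in for `request.GET` (an assumption made so the example runs on its own); in both cases query-string values arrive as strings. The `parse_error_code` helper is hypothetical: it guards the `int()` conversion, which would otherwise raise on a missing or non-numeric value.

```python
from urllib.parse import parse_qs

params = parse_qs("error_code=4201")
error_code = params["error_code"][0]  # "4201", a str

matches_int = error_code == 4201      # False: a str never equals an int
matches_str = error_code == "4201"    # True


def parse_error_code(raw):
    """Return the code as an int, or None if it is missing or not numeric."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        return None
```

Comparing against the string '4201' directly is an equally valid fix and avoids the conversion altogether.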

Django Test Client does not create database entries

I'm creating unit tests for my views using Django's built-in Test Client to create mock requests.
The view I'm calling should create an object in the database. However, when I query the database from within the test method the object isn't there - it either hasn't been created or has been discarded on returning from the view.
Here's the view:
def apply_to_cmp(request, campaign_id):
    """ Creates a new Application to 'campaign_id' for request.user """
    campaign = Campaign.objects.get(pk = campaign_id)
    if not Application.objects\
            .filter(campaign = campaign, user = request.user)\
            .exists():
        application = Application(**{'campaign' : campaign,
                                     'user' : request.user})
        application.save()
    return HttpResponseRedirect(request.META.get('HTTP_REFERER'))
This is the test that calls it:
def test_create_campaign_app(self):
    """ Calls the method apply_to_cmp in .views """
    c = Client()
    c.login(username = self.username, password = self.password)
    url = '/campaign/' + str(self.campaign.id) + '/apply/'
    response = c.get(url)
    # Check whether request was successful (should return 302: redirect)
    self.assertEqual(response.status_code, 302)
    # Verify that an Application object was created
    app_count = Application.objects\
        .filter(user = self.user, campaign = self.campaign)\
        .count()
    self.assertEqual(app_count, 1)
This is the output from the running the test:
Traceback (most recent call last):
File "/test_views.py", line 40, in test_create_campaign_app
self.assertEqual(app_count, 1)
AssertionError: 0 != 1
The method apply_to_cmp is definitely being called, since response.status_code == 302, but still the Application object is not created. What am I doing wrong?
Edit: Solution
Client.login failed because the login system was not properly initialised in the setUp method. I fixed this by calling call_command('loaddata', 'initial_data.json') with initial_data.json containing the setup for the login system. Also, HttpResponseRedirect(request.META.get('HTTP_REFERER')) didn't work for obvious reasons. I changed that bit to
if request.META.get('HTTP_REFERER'):
    return HttpResponseRedirect(request.META.get('HTTP_REFERER'))
return HttpResponse()
And therefore the test to
self.assertEqual(response.status_code, 200)
Thanks for your help!
Nothing stands out as particularly wrong with your code, but clearly either your test case or the code you are testing is not working the way you think. It is now time to question your assumptions.
The method apply_to_cmp is definitely being called, since response.status_code == 302
This is your first assumption, and it may not be correct. You might get a better picture of what is happening if you examine other details in the response object. For example, check the response.redirect_chain and confirm that it actually redirects where you expect it to:
response = c.get(url, follow=True)
self.assertEqual(response.redirect_chain, [<expected output here>])
What about other details? I can't see where self.username and self.password are defined from the code you provided. Are you 100% sure that your test code to login worked? c.login() returns 'True' or 'False' to indicate if the login was successful. In my test cases, I like to confirm that the login succeeds.
login_success = c.login(username = self.username, password = self.password)
self.assertTrue(login_success)
You can also be a bit more general. You find nothing if you check Application.objects.filter(user=self.user, campaign=self.campaign), but what about checking Application.objects.all()? You know that a specific item isn't in your database, but do you know what is stored in the database (if anything at all) in the test code at that time? Do you expect other items in there? Check to confirm that what you expect is true.
I think you can solve this one, but you'll need to be a bit more aggressive in your analysis of your test case, rather than just seeing that your app_count variable doesn't equal 1. Examine your response object, put in some debug statements, and question every assumption.
First of all, if you are subclassing django.test.TestCase, take into consideration that each test is wrapped in a transaction (official docs).
Then, you can add database logging to your project to see whether the database was hit at all (official docs).
And finally, make sure that you are using the correct lookups on this line: filter(user = self.user, campaign = self.campaign)
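What wrapping a test in a transaction means can be shown with SQLite from the standard library. This is an analogy under stated assumptions, not Django's test runner: the `application` table is made up, and the explicit `rollback()` plays the role of the cleanup that happens when a test finishes. A row written inside the open transaction is visible to queries on the same connection and disappears once the transaction is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE application (id INTEGER PRIMARY KEY, user TEXT)")
conn.commit()

# The insert implicitly opens a transaction on this connection.
conn.execute("INSERT INTO application (user) VALUES ('alice')")
count_inside = conn.execute("SELECT COUNT(*) FROM application").fetchone()[0]

# Rolling back discards everything written since the last commit.
conn.rollback()
count_after = conn.execute("SELECT COUNT(*) FROM application").fetchone()[0]
conn.close()
```

So a row created by the view should be visible to a query made later in the same test; a count of 0 there suggests the view never wrote it.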
