I am working on a Django web application that needs to query a PostgreSQL database. When I add concurrency using Python's threading module, I get DoesNotExist errors for the queried items. These errors do not occur when the same queries are performed sequentially.
Let me show a unit test which I wrote to demonstrate the unexpected behavior:
import threading

from django.test import TestCase


class ThreadingTest(TestCase):
    fixtures = ['demo_city']

    def test_sequential_requests(self):
        """
        A very simple request to the database, made sequentially.
        A fixture for the cities has been loaded above, so there should be
        six cities in the test database now. We make a request for each
        of the cities sequentially.
        """
        for number in range(1, 7):
            c = City.objects.get(pk=number)
            self.assertEqual(c.pk, number)

    def test_threaded_requests(self):
        """
        Now, to test the threaded behavior, we spawn a thread for
        retrieving each city from the database.
        """
        threads = []
        cities = []

        def do_requests(number):
            cities.append(City.objects.get(pk=number))

        for n in range(1, 7):
            threads.append(threading.Thread(target=do_requests, args=(n,)))
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        self.assertNotEqual(cities, [])
As you can see, the first test performs some database requests sequentially, and they work without any problem. The second test performs exactly the same requests, but each one in its own thread. This one fails, raising a DoesNotExist exception.
Running these unit tests produces the following output:
test_sequential_requests (cesta.core.tests.threadbase.ThreadingTest) ... ok
test_threaded_requests (cesta.core.tests.threadbase.ThreadingTest) ...
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/usr/lib/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/jose/Work/cesta/trunk/src/cesta/core/tests/threadbase.py", line 45, in do_requests
cities.append(City.objects.get(pk=number))
File "/home/jose/Work/cesta/trunk/parts/django/django/db/models/manager.py", line 132, in get
return self.get_query_set().get(*args, **kwargs)
File "/home/jose/Work/cesta/trunk/parts/django/django/db/models/query.py", line 349, in get
% self.model._meta.object_name)
DoesNotExist: City matching query does not exist.
... the other threads produce similar output ...
Exception in thread Thread-6:
Traceback (most recent call last):
File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/usr/lib/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/jose/Work/cesta/trunk/src/cesta/core/tests/threadbase.py", line 45, in do_requests
cities.append(City.objects.get(pk=number))
File "/home/jose/Work/cesta/trunk/parts/django/django/db/models/manager.py", line 132, in get
return self.get_query_set().get(*args, **kwargs)
File "/home/jose/Work/cesta/trunk/parts/django/django/db/models/query.py", line 349, in get
% self.model._meta.object_name)
DoesNotExist: City matching query does not exist.
FAIL
======================================================================
FAIL: test_threaded_requests (cesta.core.tests.threadbase.ThreadingTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/jose/Work/cesta/trunk/src/cesta/core/tests/threadbase.py", line 52, in test_threaded_requests
self.assertNotEqual(cities, [])
AssertionError: [] == []
----------------------------------------------------------------------
Ran 2 tests in 0.278s
FAILED (failures=1)
Destroying test database for alias 'default' ('test_cesta')...
Remember that all of this is happening on a PostgreSQL database, which is supposed to be thread-safe, not on SQLite or similar. The tests were also run against PostgreSQL.
At this point, I am totally lost about what can be failing. Any idea or suggestion?
Thanks!
EDIT: I wrote a small view just to check whether it works outside the tests. Here is the code of the view:
import Queue
import threading

from django.shortcuts import render_to_response
from django.template import RequestContext

def get_cities(request):
    queue = Queue.Queue()

    def get_async_cities(q, n):
        city = City.objects.get(pk=n)
        q.put(city)

    threads = [threading.Thread(target=get_async_cities, args=(queue, number))
               for number in range(1, 5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    cities = list()
    while not queue.empty():
        cities.append(queue.get())
    return render_to_response('async/cities.html', {'cities': cities},
                              context_instance=RequestContext(request))
(Please ignore the folly of writing application logic inside the view. This is only a proof of concept and would never be in the real app.)
The result is that the code works fine: the requests are made successfully in threads, and the view shows the cities when its URL is requested.
So I think that making queries from threads is only a problem when testing the code; in production, it works without any problem.
Any useful suggestions to test this kind of code successfully?
Try using TransactionTestCase:
from django.test import TransactionTestCase

class ThreadingTest(TransactionTestCase):
TestCase wraps each test in a transaction and never issues a COMMIT to the database. Each thread opens its own database connection, so the threads cannot see the fixture data, which has not been committed yet. See the description here:
https://docs.djangoproject.com/en/dev/topics/testing/?from=olddocs#django.test.TransactionTestCase
TransactionTestCase and TestCase are identical except for the manner
in which the database is reset to a known state and the ability for
test code to test the effects of commit and rollback. A
TransactionTestCase resets the database before the test runs by
truncating all tables and reloading initial data. A
TransactionTestCase may call commit and rollback and observe the
effects of these calls on the database.
This becomes clearer from the docstring of LiveServerTestCase:
class LiveServerTestCase(TransactionTestCase):
    """
    ...
    Note that it inherits from TransactionTestCase instead of TestCase because
    the threads do not share the same transactions (unless if using in-memory
    sqlite) and each thread needs to commit all their transactions so that the
    other thread can see the changes.
    """
Inside a TestCase the transaction is never committed, hence the changes are not visible to the other threads.
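To see this visibility rule in isolation, here is a small sketch using Python's standard sqlite3 module with a file-backed database as a stand-in for PostgreSQL (an assumption made only so it runs without a server; the table and city name are made up). One connection inserts a row without committing, the way fixture data sits inside a TestCase transaction, and a thread with its own connection counts the rows before and after the commit.

```python
import os
import sqlite3
import tempfile
import threading

# File-backed SQLite stands in for PostgreSQL (assumption: only commit
# visibility matters here). Table and row are made up.
db_path = os.path.join(tempfile.mkdtemp(), "cities.sqlite3")
writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT)")
writer.commit()

# Like fixture data inside a TestCase: inserted, but not committed yet.
writer.execute("INSERT INTO city (id, name) VALUES (1, 'Sevilla')")


def count_cities(results):
    # Each thread opens its own connection, as Django does per thread.
    conn = sqlite3.connect(db_path)
    results.append(conn.execute("SELECT COUNT(*) FROM city").fetchone()[0])
    conn.close()


def count_in_thread():
    results = []
    t = threading.Thread(target=count_cities, args=(results,))
    t.start()
    t.join()
    return results[0]


before_commit = count_in_thread()  # uncommitted row is invisible to the thread
writer.commit()                    # committed data, as in a TransactionTestCase
after_commit = count_in_thread()   # now the thread sees it
writer.close()
print(before_commit, after_commit)  # 0 1
```

The thread only sees the row once the writing connection commits, which is what TransactionTestCase allows and TestCase prevents.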
This sounds like an issue with transactions. If you're creating elements within the current request (or test), they're almost certainly in an uncommitted transaction that isn't visible from the separate connection used by the other thread. You probably need to manage your transactions manually to get this to work.
Related
I am writing unit tests for a program in which most functions are boilerplate code that runs some MySQL queries and has no real return value. To test these, I have written tests that check the query passed to the cursor:
@mock.patch('mysql.connector.connect')
def test_query1(self, mock_conn):
    test_query_data = 100
    import app
    a = app.query1(test_query_data)
    mock_cursor = mock_conn.return_value.cursor.return_value
    self.assertEqual(mock_cursor.execute.call_args[0], ('SELECT id FROM table WHERE data=%s limit 1;', (100,)))
This test works fine on its own, but when I have other tests structured exactly the same way, the patching of the MySQL connection breaks, causing an exception in the assert statement:
Traceback (most recent call last):
File "c:\users\sirwill\appdata\local\programs\python\python38\lib\site-packages\mock\mock.py", line 1346, in patched
return func(*newargs, **newkeywargs)
File "C:\Users\sirwill\python_project\tests.py", line 69, in test_insert_event
self.assertEqual(mock_cursor.execute.call_args[0], ('SELECT id FROM table WHERE data=%s limit 1;', (100,)))
TypeError: 'NoneType' object is not subscriptable
I have tried deleting the module and re-importing it, with no change in the result.
For anyone else having this issue: the answer was to reload the module after importing it in the test, using
importlib.reload(app)
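The answer does not say why the reload helps. One plausible cause, assuming app opens its MySQL connection at module level when it is first imported, is that only the first test's mock ever gets wired into the module: later imports return the cached module, whose connection still belongs to the first mock, so the new mock's execute.call_args stays None. The sketch below reproduces that with two hypothetical throwaway modules (fakedb standing in for mysql.connector, app_demo for app) and the standard-library unittest.mock, which the mock package backports.

```python
import importlib
import os
import sys
import tempfile
import textwrap
from unittest import mock

# Hypothetical throwaway modules: "fakedb" plays mysql.connector, "app_demo" plays app.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "fakedb.py"), "w") as f:
    f.write("def connect():\n    raise RuntimeError('no real database here')\n")
with open(os.path.join(tmp, "app_demo.py"), "w") as f:
    f.write(textwrap.dedent("""\
        import fakedb

        # Assumption: the connection is opened once, at import time.
        conn = fakedb.connect()


        def query1(data):
            cursor = conn.cursor()
            cursor.execute('SELECT id FROM table WHERE data=%s limit 1;', (data,))
    """))
sys.path.insert(0, tmp)


def run_test(reload_app):
    """Mimic test_query1: patch connect, call query1, return execute's call_args."""
    with mock.patch("fakedb.connect") as mock_conn:
        import app_demo
        if reload_app:
            # Re-run the module-level code so conn comes from *this* mock.
            app_demo = importlib.reload(app_demo)
        app_demo.query1(100)
        mock_cursor = mock_conn.return_value.cursor.return_value
        return mock_cursor.execute.call_args


first = run_test(reload_app=False)   # first import runs under this patch
second = run_test(reload_app=False)  # cached module still uses the first mock
third = run_test(reload_app=True)    # reload wires in the current mock
print(first is not None, second is None, third is not None)
```

The second call returns None, which is exactly what makes call_args[0] fail with "'NoneType' object is not subscriptable"; the reloaded run records the query again.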
This question already has answers here:
Django related objects are missing from celery task (race condition?)
(3 answers)
Closed 5 years ago.
The assets Django app I'm working on runs well with SQLite, but I am facing performance issues with deletes/updates of large sets of records, so I am making the transition to a PostgreSQL database.
To do so, I am starting fresh: I update theapp/settings.py to configure PostgreSQL, create a fresh database and delete the assets/migrations/ directory. I then run:
./manage.py makemigrations assets
./manage.py migrate --run-syncdb
./manage.py createsuperuser
I have a function called within a registered post_create signal. It runs a scan when a Scan object is created. Within the class assets.models.Scan:
@classmethod
def post_create(cls, sender, instance, created, *args, **kwargs):
    if not created:
        return
    from celery.result import AsyncResult
    # get the domains for the project, from scan
    print("debug: task = tasks.populate_endpoints.delay({})".format(instance.pk))
    task = tasks.populate_endpoints.delay(instance.pk)
The offending code:
from celery import shared_task
....
import datetime

@shared_task
def populate_endpoints(scan_pk):
    from .models import Scan, Project
    from anotherapp.plugins.sensual import subdomains
    scan = Scan.objects.get(pk=scan_pk)  # <<<<<<<< django no like
    new_entries_count = 0
    project = Project.objects.get(id=scan.project.id)
    ....
The resulting DoesNotExist exception:
debug: task = tasks.populate_endpoints.delay(2)
[2017-09-14 23:18:34,950: ERROR/ForkPoolWorker-8] Task assets.tasks.populate_endpoints[4555d329-2873-4184-be60-55e44c46a858] raised unexpected: DoesNotExist('Scan matching query does not exist.',)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 374, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 629, in __protected_call__
return self.run(*args, **kwargs)
File "/usr/src/app/theapp/assets/tasks.py", line 12, in populate_endpoints
scan = Scan.objects.get(pk=scan_pk)
However, interacting through ./manage.py shell shows that a Scan object with pk == 2 exists:
>>> from assets.models import Scan
>>> Scan.objects.all()
<QuerySet [<Scan: ACME Web Test Scan>]>
>>> s = Scan.objects.all().first()
>>> s.pk
2
My only guess is that at the time the post_create function is called, the Scan object still does not exist in the PostgreSQL database, despite save() having been called.
SQLite does not exhibit this problem.
Also, I haven't found a relevant, related problem on stackoverflow as the DoesNotExist exception looks to be fairly generic and caused by many things.
Any ideas on this would be much appreciated.
This is a well-known problem caused by transactions and the isolation level: sometimes the transaction has not been committed yet when the task is executed, and if your isolation level is READ COMMITTED (PostgreSQL's default), you indeed can't read this record from another process. Django 1.9 introduced the on_commit hook as a solution.
NB: technically this question is a duplicate of Django related objects are missing from celery task (race condition?), but the accepted answer there uses django-transaction-hooks, which has since been merged into Django.
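In the post_create handler above, the Django fix would be to dispatch the task with transaction.on_commit(lambda: tasks.populate_endpoints.delay(instance.pk)) instead of calling delay() directly. The sketch below models the race without Django or Celery: a hypothetical populate_endpoints function runs in a thread on its own sqlite3 connection (standing in for the Celery worker), first while the row is still uncommitted and then from a callback list that is run only after the commit, which is the idea behind on_commit. It is a toy model, not Django's implementation, and file-backed SQLite stands in for PostgreSQL.

```python
import os
import sqlite3
import tempfile
import threading

# File-backed SQLite stands in for PostgreSQL (assumption for a runnable demo).
db_path = os.path.join(tempfile.mkdtemp(), "scans.sqlite3")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE scan (id INTEGER PRIMARY KEY, name TEXT)")


def populate_endpoints(scan_pk, results):
    """Hypothetical stand-in for the Celery task: uses its own connection."""
    worker = sqlite3.connect(db_path)
    row = worker.execute("SELECT name FROM scan WHERE id = ?", (scan_pk,)).fetchone()
    worker.close()
    results.append(row)  # None plays the role of DoesNotExist


def delay(scan_pk, results):
    """Run the 'task' in another thread and wait, to make the race deterministic."""
    t = threading.Thread(target=populate_endpoints, args=(scan_pk, results))
    t.start()
    t.join()


# 1) Dispatch inside the open transaction, as post_create does.
eager = []
conn.execute("INSERT INTO scan (id, name) VALUES (2, 'ACME Web Test Scan')")
delay(2, eager)
conn.commit()

# 2) Queue the dispatch and run it only after COMMIT (the on_commit idea).
deferred, on_commit_callbacks = [], []
conn.execute("INSERT INTO scan (id, name) VALUES (3, 'Second scan')")
on_commit_callbacks.append(lambda: delay(3, deferred))
conn.commit()
for callback in on_commit_callbacks:
    callback()

conn.close()
print(eager, deferred)  # [None] [('Second scan',)]
```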
I have built a plugin-based application where "plugins" (Python modules) are loaded with imp and then scheduled for later execution by APScheduler. I was able to integrate them successfully, but I want to add persistence in case of crashes or application restarts, so I changed the default memory job store to the SQLAlchemyJobStore. It works quite well the first time the program runs: tasks are loaded, scheduled, saved in the database and executed at the right time.
The problem is that when I start the application again, I get this traceback:
ERROR:apscheduler.jobstores.default:Unable to restore job "d3e0f0068df54d15986e9b7b6757f665" -- removing it
Traceback (most recent call last):
File "/home/jesus/.local/lib/python2.7/site-packages/apscheduler/jobstores/sqlalchemy.py", line 126, in _get_jobs
jobs.append(self._reconstitute_job(row.job_state))
File "/home/jesus/.local/lib/python2.7/site-packages/apscheduler/jobstores/sqlalchemy.py", line 114, in _reconstitute_job
job.__setstate__(job_state)
File "/home/jesus/.local/lib/python2.7/site-packages/apscheduler/job.py", line 228, in __setstate__
self.func = ref_to_obj(self.func_ref)
File "/home/jesus/.local/lib/python2.7/site-packages/apscheduler/util.py", line 257, in ref_to_obj
raise LookupError('Error resolving reference %s: could not import module' % ref)
LookupError: Error resolving reference __init__:run: could not import module
So there is clearly a problem when the scheduler attempts to import the function again.
Here is my scheduler initialization:
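from apscheduler.executors.pool import ThreadPoolExecutor
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore
from apscheduler.schedulers.background import BackgroundScheduler
executors = {'default': ThreadPoolExecutor(5)}
jobstores = {'default': SQLAlchemyJobStore(url='sqlite:///jobs.sqlite')}
self.scheduler = BackgroundScheduler(executors=executors, jobstores=jobstores)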
I have a "tests" dictionary containing the "plugins" that should be loaded, plus some parameters. "load_plugin" uses imp to load a plugin by its name.
for test, parameters in tests.items():
    if test in pluggins:
        module = load_plugin(pluggins[test])
        self.jobs[test] = self.scheduler.add_job(module.run, "interval", seconds=parameters["interval"], name=test)
Any idea how I can handle reconstituting jobs?
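Something in the automatic detection of the module name is going wrong: the stored reference is __init__:run, so the function's module was recorded as "__init__", which cannot be imported again when the job is restored. The alternative is to give APScheduler the proper lookup path manually, as a string (e.g. "package.module:function"). If you can do this, you can avoid this problem.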
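APScheduler 3.x's add_job() accepts either a callable or a textual reference to one, so the loop could pass something like "plugins.ping:run" instead of module.run, assuming the plugins can live in an importable package. The sketch below shows why the automatically derived reference fails, using a hypothetical plugins/ping package in a temporary directory, importlib.util in place of imp, and a simplified resolve_ref() helper that imports the module part and walks the attribute part. resolve_ref() only illustrates the idea; it is not APScheduler's own ref_to_obj.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def resolve_ref(ref):
    """Simplified "module:attribute" resolver (illustrative, not APScheduler's code)."""
    module_name, _, attr_path = ref.partition(":")
    obj = importlib.import_module(module_name)
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj


# Hypothetical plugin layout: <root>/plugins/__init__.py and <root>/plugins/ping/__init__.py
root = tempfile.mkdtemp()
plugin_dir = os.path.join(root, "plugins", "ping")
os.makedirs(plugin_dir)
open(os.path.join(root, "plugins", "__init__.py"), "w").close()
with open(os.path.join(plugin_dir, "__init__.py"), "w") as f:
    f.write("def run():\n    return 'ping ok'\n")
sys.path.insert(0, root)

# Loading the plugin file by path under the name "__init__" (assumed to be
# what load_plugin ends up doing) gives functions whose __module__ is "__init__".
spec = importlib.util.spec_from_file_location(
    "__init__", os.path.join(plugin_dir, "__init__.py"))
loose = importlib.util.module_from_spec(spec)
spec.loader.exec_module(loose)
loose_ref = "%s:%s" % (loose.run.__module__, loose.run.__qualname__)

try:
    resolve_ref(loose_ref)  # what a restart has to do with the stored reference
    loose_ok = True
except (ImportError, AttributeError):
    loose_ok = False

# An explicit reference into an importable package resolves fine.
explicit_ref = "plugins.ping:run"
print(loose_ref, loose_ok, resolve_ref(explicit_ref)())
```

With that layout, the scheduling call would become something like self.scheduler.add_job("plugins.ping:run", "interval", seconds=parameters["interval"], name=test).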
I have a Flask project on GAE and I'd like to start adding try/except blocks around database writes in case the datastore has problems. Those blocks will fire when there's a real error, but I'd like to mimic that error in a unit test so I can be confident about what will really happen during an outage.
For example, my User model:
class User(ndb.Model):
    guser = ndb.UserProperty()
    user_handle = ndb.StringProperty()
and in other view/controller code:
def do_something():
    try:
        User(guser=users.get_current_user(), user_handle='barney').put()
    except CapabilityDisabledError:
        flash('Oops, database is down, try again later', 'danger')
    return redirect(url_for('registration_done'))
Here's a gist of my test code: https://gist.github.com/iandouglas/10441406
In a nutshell, GAE allows us to use capabilities to temporarily disable the stubs for memcache, datastore_v3, etc., and in the main test method:
def test_stuff(self):
    # this test ALWAYS passes, making me believe the datastore is temporarily down
    self.assertFalse(capabilities.CapabilitySet('datastore_v3').is_enabled())
    # but this write to the datastore always SUCCEEDS, so the exception never gets
    # thrown, therefore this "assertRaises" always fails
    self.assertRaises(CapabilityDisabledError,
                      lambda: User(guser=self.guser, pilot_handle='foo').put())
Another post recommended wrapping the User.put() call in a lambda, which results in this traceback:
Traceback (most recent call last):
File "/home/id/src/project/tests/integration/views/test_datastore_offline.py", line 28, in test_stuff
self.assertRaises(CapabilityDisabledError, lambda: User(
AssertionError: CapabilityDisabledError not raised
If I remove the lambda: portion, I get this traceback instead:
Traceback (most recent call last):
File "/home/id/src/project/tests/integration/views/test_datastore_offline.py", line 31, in test_stuff
pilot_handle_lower='foo'
File "/usr/lib/python2.7/unittest/case.py", line 475, in assertRaises
callableObj(*args, **kwargs)
TypeError: 'Key' object is not callable
Google's tutorials show you how to turn these capabilities on and off for unit testing, and in other tutorials they show you which exceptions could get thrown if their services are offline or experiencing intermittent issues, but they have no tutorials showing how they might work together in a unit test.
Thanks for any ideas.
The datastore stub does not support returning a CapabilityDisabledError, so enabling the error in the capabilities stub will not affect calls to the datastore.
As a separate note, if you are using the High Replication Datastore, you'll never experience the CapabilityDisabledError because it does not have scheduled downtime.
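Since the stub will not raise the error, one workaround (a different technique from the capabilities stub) is to patch put() itself so that it raises. The sketch below uses unittest.mock with stand-in User and CapabilityDisabledError classes and a simplified do_something(), all hypothetical so it runs without the App Engine SDK; flash() and redirect() are replaced by a list and a returned string. It uses assertRaises as a context manager: passing User(...).put() without a lambda runs put() immediately and hands its return value, a Key, to assertRaises, which is why the traceback says "'Key' object is not callable".

```python
import unittest
from unittest import mock


# Hypothetical stand-ins so the sketch runs without the App Engine SDK.
class CapabilityDisabledError(Exception):
    pass


class User(object):
    def __init__(self, **props):
        self.props = props

    def put(self):
        return "key"  # a real ndb put() would write and return a Key


flashed = []  # replaces Flask's flash() in this sketch


def do_something():
    try:
        User(user_handle="barney").put()
    except CapabilityDisabledError:
        flashed.append(("Oops, database is down, try again later", "danger"))
    return "registration_done"  # stands in for redirect(url_for(...))


class DatastoreOfflineTest(unittest.TestCase):
    def test_put_raises_when_patched(self):
        # Force put() to raise, since the stub will not do it for us.
        with mock.patch.object(User, "put", side_effect=CapabilityDisabledError):
            # Context-manager form: put() is only called inside the block.
            with self.assertRaises(CapabilityDisabledError):
                User(user_handle="foo").put()

    def test_view_handles_outage(self):
        with mock.patch.object(User, "put", side_effect=CapabilityDisabledError):
            self.assertEqual(do_something(), "registration_done")
        self.assertEqual(flashed[-1][1], "danger")
```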
The code I have written so far works fine: I can even print the deserialized objects without any errors, so I know exactly what they contain.
@staticmethod
def receiveData(self):
    '''
    This method has to be static, as it is the argument of a Thread.
    It receives wrapper objects from the server (as yet containing only a player)
    and resets the local positions accordingly.
    '''
    logging.getLogger(__name__).info("Now receiving server information")
    from modules.logic import game
    sock = self.sock
    time.sleep(10)
    self.myPlayer = game.get_player()
    while True:
        try:
            wrapPacked = sock.recv(4096)
            self.myList = cPickle.loads(wrapPacked)
            # self.setData(self.myList)
        except Exception as eload:
            print eload
However, if I actually use the line that is commented out here (self.setData(self.myList)), I get
I get
unpickling stack underflow
and
invalid load key, ' '.
Just for the record, the code of setData is:
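def setData(self, data):
    # len() rather than __sizeof__(): __sizeof__() returns the object's size
    # in bytes, which is never 0, so that check was always true
    if len(data) > 0:
        first = data[0]
        self.myPlayer.setPos(first[1])
        self.myPlayer.setVelocity(first[2])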
I have been on this for 3 days now, and really, I have no idea what is wrong.
Can you help me?
Full Traceback:
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 504, in run
self.__target(*self.__args, **self.__kwargs)
File "mypath/client.py", line 129, in receiveData
self.myList = cPickle.loads(wrapPacked)
UnpicklingError: unpickling stack underflow
The fact that your exceptions always happen when you try to access the pickled data seems to indicate that you are instead hitting a bug in the cPickle library.
What can happen is that a C library forgets to handle an exception. The exception info is stored, not handled, and is sitting there in the interpreter until another exception happens or another piece of C code does check for an exception. At this point the old, unhandled exception is thrown instead.
Your error is clearly cPickle related, it is very unhappy about the data you feed it, but the exception itself is thrown in unrelated locations. This could be threading related, it could be a regular non-threading-related bug.
You need to see whether you can load the data in a test setting. Write wrapPacked to a file for later testing. Load that file in an interpreter shell session with cPickle.loads() and see what happens. Do the same with the pickle module.
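A minimal sketch of that experiment in Python 3, where cPickle no longer exists: pickle.loads uses the C accelerator, and pickle._loads is the pure-Python implementation (a private name, so treat it as a debugging aid only). save_packet() and try_both_loaders() are hypothetical helpers; in the client you would call save_packet(wrapPacked, some_dir) right after sock.recv(4096). The usage lines feed in one complete pickle and one deliberately cut short.

```python
import os
import pickle
import tempfile


def save_packet(data, directory):
    """Write raw received bytes (e.g. wrapPacked) to a new file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".bin", dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


def try_both_loaders(path):
    """Unpickle a saved packet with the C and the pure-Python unpicklers."""
    with open(path, "rb") as f:
        data = f.read()
    results = {}
    for name, loads in (("C", pickle.loads), ("pure-Python", pickle._loads)):
        try:
            results[name] = loads(data)
        except Exception as exc:  # keep the error text instead of crashing
            results[name] = "%s: %s" % (type(exc).__name__, exc)
    return results


workdir = tempfile.mkdtemp()
payload = pickle.dumps([("player", (1, 2), (0, 1))])
good = try_both_loaders(save_packet(payload, workdir))
bad = try_both_loaders(save_packet(payload[:10], workdir))
print(good)
print(bad)
```

If a saved packet also fails with both loaders in a fresh session, the bytes themselves are the problem rather than the threading code around them.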
If you do run into similar problems in this test session, and you can reproduce it (weird exceptions being thrown at a later point in the session) you need to file a bug with the Python project to have this looked at.