This question already has answers here:
Django related objects are missing from celery task (race condition?)
(3 answers)
Closed 5 years ago.
The assets Django app I'm working on runs well with SQLite, but I am facing performance issues with deletes/updates of large sets of records, so I am making the transition to a PostgreSQL database.
To do so, I am starting fresh by updating theapp/settings.py to configure PostgreSQL, starting with a fresh db and deleting the assets/migrations/ directory. I am then running:
./manage.py makemigrations assets
./manage.py migrate --run-syncdb
./manage.py createsuperuser
I have a function registered as a post_save signal handler. It runs a scan when a Scan object is created. Within the class assets.models.Scan:
    @classmethod
    def post_create(cls, sender, instance, created, *args, **kwargs):
        if not created:
            return
        from celery.result import AsyncResult
        # get the domains for the project, from scan
        print("debug: task = tasks.populate_endpoints.delay({})".format(instance.pk))
        task = tasks.populate_endpoints.delay(instance.pk)
The offending code:

from celery import shared_task
....
import datetime

@shared_task
def populate_endpoints(scan_pk):
    from .models import Scan, Project
    from anotherapp.plugins.sensual import subdomains

    scan = Scan.objects.get(pk=scan_pk)  # <<<<<<<< django no like
    new_entries_count = 0
    project = Project.objects.get(id=scan.project.id)
    ....
The resulting DoesNotExist exception:
debug: task = tasks.populate_endpoints.delay(2)
[2017-09-14 23:18:34,950: ERROR/ForkPoolWorker-8] Task assets.tasks.populate_endpoints[4555d329-2873-4184-be60-55e44c46a858] raised unexpected: DoesNotExist('Scan matching query does not exist.',)
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 374, in trace_task
R = retval = fun(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 629, in __protected_call__
return self.run(*args, **kwargs)
File "/usr/src/app/theapp/assets/tasks.py", line 12, in populate_endpoints
scan = Scan.objects.get(pk=scan_pk)
Interacting through ./manage.py shell, however, indicates that the Scan object with pk == 2 exists:
>>> from assets.models import Scan
>>> Scan.objects.all()
<QuerySet [<Scan: ACME Web Test Scan>]>
>>> s = Scan.objects.all().first()
>>> s.pk
2
My only guess is that at the time the post_create function is called, the Scan object does not yet exist in the PostgreSQL database, despite save() having been called.
SQLite does not exhibit this problem.
Also, I haven't found a relevant, related problem on stackoverflow as the DoesNotExist exception looks to be fairly generic and caused by many things.
Any ideas on this would be much appreciated.
This is a well-known problem resulting from transactions and isolation level: sometimes the transaction has not been committed yet when the task executes, and if your isolation level is READ COMMITTED then you indeed can't read this record from another process. Django 1.9 introduced the on_commit hook as a solution.
NB: technically this question is a duplicate of Django related objects are missing from celery task (race condition?), but the accepted answer there uses django-transaction-hooks, which has since been merged into Django.
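For completeness, a minimal sketch of what the on_commit fix could look like in the signal handler from the question (same names as in the question; this is an illustration, not a drop-in patch):

```python
from django.db import transaction

@classmethod
def post_create(cls, sender, instance, created, *args, **kwargs):
    if not created:
        return
    # Queue the Celery task only after the surrounding transaction commits,
    # so the worker's SELECT is guaranteed to see the new Scan row.
    transaction.on_commit(lambda: tasks.populate_endpoints.delay(instance.pk))
```

If no transaction is in progress (autocommit), on_commit runs the callback immediately, so the handler behaves the same in both cases.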
I've spent 3+ hours on this on 18 of the last 21 days. Please, someone, tell me what I'm misunderstanding!
TL;DR: My code is repeatedly sending the db charset to PyMySQL as a string, while PyMySQL expects an object with an attribute called "encoding".
Background
This is Python code running on a docker container. A second container houses the database. The database address is stored in a .env variable called ENGINE_URL:
ENGINE_URL=mysql+pymysql://root:#database/starfinder_development?charset=utf8
I'm firing off Alembic and Flask-Alembic commands using click commands in the CLI. All of the methods below are used in CLI commands.
Models / Database Setup (works)
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

flask_app = Flask(__name__)
db_engine = SQLAlchemy(flask_app)

from my_application import models

def create_database():
    db_engine.create_all()
At this point I can open up the database container and use the MySQL CLI to see that all of my models have now been converted into tables with columns and relationships.
Attempt 1: Alembic
Create Revision Files with Alembic (works)
import os

from alembic.config import Config
from alembic import command as alembic_command

def main():  # fires prior to any CLI command
    filepath = os.path.join(os.path.dirname(__file__), "alembic.ini")
    alembic_config = Config(file_=filepath)
    alembic_config.set_main_option("sqlalchemy.url", ENGINE_URL)
    alembic_config.set_main_option("script_location", SCRIPT_LOCATION)
    migrate_cli(obj={"alembic_config": alembic_config})

def revision(ctx, message):
    alembic_config = ctx.obj["alembic_config"]
    alembic_command.revision(alembic_config, message)
At this point I have a migration file that was created exactly as expected. Then I need to upgrade the database using that migration...
Running Migrations with Alembic (fails)
def upgrade(ctx, migration_revision):
    alembic_config = ctx.obj["alembic_config"]
    migration_revision = migration_revision.lower()
    _dispatch_alembic_cmd(alembic_config, "upgrade",
                          revision=migration_revision)
Firing this off with cli_command upgrade head causes a failure, which I've included at the bottom because its stack trace is identical to that of my second attempt.
Attempt 2: Flask-Alembic
This attempt finds me completely rewriting my main and revision commands, but it doesn't get as far as using upgrade.
Create Revision Files with Flask-Alembic (fails)
def main():  # fires prior to any CLI command
    alembic_config = Alembic()
    alembic_config.init_app(flask_app)
    migrate_cli(obj={"alembic_config": alembic_config})

def revision(ctx, message):
    with flask_app.app_context():
        alembic_config = ctx.obj["alembic_config"]
        print(alembic_config.revision(message))
This results in an error that is identical to the error from my previous attempt.
The stack trace in both cases:
(Identical failure using alembic upgrade & flask-alembic revision)
File "/Users/MyUser/.pyenv/versions/3.6.2/envs/sf/lib/python3.6/site-packages/pymysql/connections.py", line 678, in __init__
self.encoding = charset_by_name(self.charset).encoding
AttributeError: 'NoneType' object has no attribute 'encoding'
In response, I went into the above file and added a print on line 677, immediately prior to the error:
print(self.charset)
utf8
Note: If I modify my ENGINE_URL to use a different ?charset=xxx, that change is reflected here.
So now I'm stumped
PyMySQL expects self.charset to have an attribute encoding, but self.charset is simply a string. How can I change this to behave as expected?
Help?
A valid answer would be an alternative process, though the "most correct" answer would be to help me resolve the charset/encoding problem.
My primary goal here is simply to get migrations working on my flask app.
I create a new Process from a Django app. Can I create a new record in the database from this process?
My code throws exception:
django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet.
UPD_1
def post(self, request):
    v = Value('b', True)
    proc = Process(target=start, args=(v, request.user,
                                       request.data['stock'],
                                       request.data['pair'], '1111'))
    proc.start()

def start(v, user, stock_exchange, pair, msg):
    MyModel.objects.create(user=user, stock_exchange=stock_exchange,
                           pair=pair, date=datetime.now(), message=msg)
You need to initialise the project first. You don't usually have to do this when going through manage.py, because it does it automatically, but a new process won't have had this done for it. So you need to put something like the following at the top of your code:
import os

import django

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
django.setup()
myproject.settings needs to be importable from wherever this code is running, so if it isn't, you may need to add the appropriate directory to sys.path first.
Once this is done you can access your project's models, and use them to access your database, just like you normally would.
I was having a similar problem (also starting process from view), and what eventually helped me solve it was this answer.
The solution indicated there is to close your DB connections just before you fork your new Process, so Django can recreate the connection when a query is needed in the new process. Adapted to your code it would be:
def post(self, request):
    v = Value('b', True)

    # close db connections here
    from django import db
    db.connections.close_all()

    # create and fork your process
    proc = Process(target=start, args=(v, request.user,
                                       request.data['stock'],
                                       request.data['pair'], '1111'))
    proc.start()
Calling django.setup() did not help in my case; after reading the linked answer, that is probably because forked processes already share the file descriptors, etc. of their parent process (so Django is already set up).
I have a celery task that's declared on my Django project that I'm trying to call from the same module it's declared in. Right now, it looks like the following:
# myapp/admin.py
import time

from myproject.celery import app as celery_app

@celery_app.task(name='myapp.admin.add')
def add(x, y):
    time.sleep(10000)
    return x + y

def my_custom_admin_action(modeladmin, request, queryset):
    add.delay(2, 4)

# action later declared in the ModelAdmin
Knowing that Celery sometimes has trouble with relative imports, I've specified the task name explicitly. I even added the following to my settings.py:
CELERY_IMPORTS = ('myapp.admin', )
But when I try to use the admin action, I get the following message in my manage.py celeryd output:
[2014-09-18 14:58:25,413: ERROR/MainProcess] Received unregistered task of type 'myapp.admin.add'.
The message has been ignored and discarded.
Did you remember to import the module containing this task?
Or maybe you are using relative imports?
Please see http://bit.ly/gLye1c for more information.
Traceback (most recent call last):
File "/Users/JJ/.virtualenvs/TCJ/lib/python2.7/site-packages/celery/worker/consumer.py", line 455, in on_task_received
strategies[name](message, body,
KeyError: 'myapp.admin.add'
What am I doing wrong here? I even tried importing within the action as from . import add, but that didn't seem to help.
Celery is not picking up your add task. One alternative way to solve this is to modify your Celery instance.
In myproject/celery.py, change the Celery instance from

app = Celery('name', backend='your_backend', broker='your_broker')

to

app = Celery('name', backend='your_backend', broker='your_broker',
             include=['myapp.admin'])
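Alternatively — a sketch assuming a conventional Django/Celery layout, not code from the question — you can move the task into myapp/tasks.py and let Celery find it with autodiscover_tasks(), which scans each installed app for a tasks module (it will not find tasks defined in admin.py):

```python
# myproject/celery.py (sketch)
import os

from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myproject.settings')

app = Celery('myproject', backend='your_backend', broker='your_broker')
# Look for a tasks.py module in every installed Django app.
app.autodiscover_tasks()
```

On older Celery 3.x versions the app list is passed explicitly, e.g. app.autodiscover_tasks(lambda: settings.INSTALLED_APPS).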
I am working on a Django web application which needs to query a PostgreSQL database. When implementing concurrency using Python's threading interface, I am getting DoesNotExist errors for the queried items. Of course, these errors do not occur when performing the queries sequentially.
Let me show a unit test which I wrote to demonstrate the unexpected behavior:
class ThreadingTest(TestCase):

    fixtures = ['demo_city']

    def test_sequential_requests(self):
        """
        A very simple request to the database, made sequentially.

        A fixture for the cities has been loaded above. There are supposed
        to be six cities in the testing database now. We will make a
        request for each one of the cities sequentially.
        """
        for number in range(1, 7):
            c = City.objects.get(pk=number)
            self.assertEqual(c.pk, number)

    def test_threaded_requests(self):
        """
        Now, to test the threaded behavior, we will spawn a thread for
        retrieving each city from the database.
        """
        threads = []
        cities = []

        def do_requests(number):
            cities.append(City.objects.get(pk=number))

        [threads.append(threading.Thread(target=do_requests, args=(n,)))
         for n in range(1, 7)]
        [t.start() for t in threads]
        [t.join() for t in threads]

        self.assertNotEqual(cities, [])
As you can see, the first test performs some database requests sequentially, which are indeed working with no problem. The second test, however, performs exactly the same requests but each request is spawned in a thread. This is actually failing, returning a DoesNotExist exception.
The output of the execution of this unit tests is like this:
test_sequential_requests (cesta.core.tests.threadbase.ThreadingTest) ... ok
test_threaded_requests (cesta.core.tests.threadbase.ThreadingTest) ...
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/usr/lib/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/jose/Work/cesta/trunk/src/cesta/core/tests/threadbase.py", line 45, in do_requests
cities.append(City.objects.get(pk=number))
File "/home/jose/Work/cesta/trunk/parts/django/django/db/models/manager.py", line 132, in get
return self.get_query_set().get(*args, **kwargs)
File "/home/jose/Work/cesta/trunk/parts/django/django/db/models/query.py", line 349, in get
% self.model._meta.object_name)
DoesNotExist: City matching query does not exist.
... other threads returns a similar output ...
Exception in thread Thread-6:
Traceback (most recent call last):
File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
self.run()
File "/usr/lib/python2.6/threading.py", line 484, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/jose/Work/cesta/trunk/src/cesta/core/tests/threadbase.py", line 45, in do_requests
cities.append(City.objects.get(pk=number))
File "/home/jose/Work/cesta/trunk/parts/django/django/db/models/manager.py", line 132, in get
return self.get_query_set().get(*args, **kwargs)
File "/home/jose/Work/cesta/trunk/parts/django/django/db/models/query.py", line 349, in get
% self.model._meta.object_name)
DoesNotExist: City matching query does not exist.
FAIL
======================================================================
FAIL: test_threaded_requests (cesta.core.tests.threadbase.ThreadingTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/jose/Work/cesta/trunk/src/cesta/core/tests/threadbase.py", line 52, in test_threaded_requests
self.assertNotEqual(cities, [])
AssertionError: [] == []
----------------------------------------------------------------------
Ran 2 tests in 0.278s
FAILED (failures=1)
Destroying test database for alias 'default' ('test_cesta')...
Remember that all this is happening against a PostgreSQL database, which is supposed to be thread-safe, not SQLite or similar. The test was also run using PostgreSQL.
At this point, I am totally lost about what can be failing. Any idea or suggestion?
Thanks!
EDIT: I wrote a little view just to check whether it works outside the tests. Here is the code of the view:
def get_cities(request):
    queue = Queue.Queue()

    def get_async_cities(q, n):
        city = City.objects.get(pk=n)
        q.put(city)

    threads = [threading.Thread(target=get_async_cities, args=(queue, number))
               for number in range(1, 5)]
    [t.start() for t in threads]
    [t.join() for t in threads]

    cities = list()
    while not queue.empty():
        cities.append(queue.get())

    return render_to_response('async/cities.html', {'cities': cities},
                              context_instance=RequestContext(request))
(Please do not take into account the folly of writing application logic inside the view code. Remember that this is only a proof of concept and would never be in the real app.)
The result is that the code works fine: the requests are made successfully in threads, and the view finally shows the cities after its URL is called.
So, I think making queries using threads will only be a problem when you need to test the code. In production, it will work without any problem.
Any useful suggestions to test this kind of code successfully?
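As a side note, separate from the transaction question: part of why the failing test only reports an empty list is that an exception raised inside a threading.Thread is printed to stderr, not propagated to the caller. A small plain-Python helper (all names here are mine, not from the question) that re-raises worker exceptions makes such tests fail with the real error:

```python
import threading
import queue

def run_in_threads(func, args_list):
    """Run func(*args) in one thread per argument tuple and collect results.

    Any exception raised in a worker thread is re-raised in the caller
    instead of being silently printed by the Thread machinery.
    """
    results, errors = queue.Queue(), queue.Queue()

    def worker(args):
        try:
            results.put(func(*args))
        except Exception as exc:  # capture for the calling thread
            errors.put(exc)

    threads = [threading.Thread(target=worker, args=(a,)) for a in args_list]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    if not errors.empty():
        raise errors.get()  # surface the first worker failure loudly
    return list(results.queue)
```

Wrapping City.objects.get in such a helper would have surfaced the DoesNotExist directly instead of an AssertionError about an empty list.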
Try using TransactionTestCase:
class ThreadingTest(TransactionTestCase):
TestCase keeps data in memory and doesn't issue a COMMIT to the database. Probably the threads are trying to connect directly to the DB while the data has not been committed there yet. See the description here:
https://docs.djangoproject.com/en/dev/topics/testing/?from=olddocs#django.test.TransactionTestCase
TransactionTestCase and TestCase are identical except for the manner
in which the database is reset to a known state and the ability for
test code to test the effects of commit and rollback. A
TransactionTestCase resets the database before the test runs by
truncating all tables and reloading initial data. A
TransactionTestCase may call commit and rollback and observe the
effects of these calls on the database.
It becomes clearer from this part of the code:
class LiveServerTestCase(TransactionTestCase):
    """
    ...
    Note that it inherits from TransactionTestCase instead of TestCase because
    the threads do not share the same transactions (unless if using in-memory
    sqlite) and each thread needs to commit all their transactions so that the
    other thread can see the changes.
    """
Now, the transaction has not been committed inside a TestCase, hence the changes are not visible to the other thread.
This sounds like an issue with transactions. If you're creating elements within the current request (or test), they're almost certainly in an uncommitted transaction that isn't accessible from the separate connection in the other thread. You probably need to manage your transactions manually to get this to work.
I'm using Nick Johnson's Bulk Update library on google appengine (http://blog.notdot.net/2010/03/Announcing-a-robust-datastore-bulk-update-utility-for-App-Engine). It works wonderfully for other tasks, but for some reason with the following code:
from google.appengine.ext import db
from myapp.main.models import Story, Comment
import bulkupdate

class Migrate(bulkupdate.BulkUpdater):
    DELETE_COMPLETED_JOBS_DELAY = 0
    DELETE_FAILED_JOBS = False
    PUT_BATCH_SIZE = 1
    DELETE_BATCH_SIZE = 1
    MAX_EXECUTION_TIME = 10

    def get_query(self):
        return Story.all().filter("hidden", False).filter("visible", True)

    def handle_entity(self, entity):
        comments = entity.comment_set
        for comment in comments:
            s = Story()
            s.parent_story = comment.story
            s.user = comment.user
            s.text = comment.text
            s.submitted = comment.submitted
            self.put(s)

job = Migrate()
job.start()
I get the following error in my logs:
Permanent failure attempting to execute task
Traceback (most recent call last):
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/deferred/deferred.py", line 258, in post
run(self.request.body)
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/deferred/deferred.py", line 122, in run
raise PermanentTaskFailure(e)
PermanentTaskFailure: 'module' object has no attribute 'Migrate'
It seems quite bizarre to me. That class is right above the job, they're in the same file, and job.start() is clearly being called. Why can't it see my Migrate class?
EDIT: I added this update job in a newer version of the code, which isn't the default. I invoke the job with the correct URL (http://version.myapp.appspot.com/migrate). Is it possible this is related to the fact that it isn't the 'default' version served by App Engine?
It seems likely that your declaration of the Migrate class is in the handler script (e.g., the one directly invoked by app.yaml). A limitation of deferred is that you can't use it to call functions defined in the handler module.
Incidentally, my bulk update library is deprecated in favor of App Engine's mapreduce support; you should probably use that instead.
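A sketch of that restructuring (file names here are assumed for illustration, not from the question): define Migrate in its own module and import it from the handler script, so deferred can serialise a reference that the task-queue worker is able to resolve:

```python
# migrate_job.py -- a separate module, NOT the script app.yaml points at
import bulkupdate
from myapp.main.models import Story

class Migrate(bulkupdate.BulkUpdater):
    def get_query(self):
        return Story.all().filter("hidden", False).filter("visible", True)

    def handle_entity(self, entity):
        # ... same per-entity logic as in the question ...
        pass

# handler.py -- the request handler invoked via app.yaml
from migrate_job import Migrate

def start_migration():
    # deferred pickles a reference to migrate_job.Migrate, which the
    # task-queue worker can now import and resolve by name.
    Migrate().start()
```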