In a Django view I spawn a thread (a subclass of threading.Thread) which in turn creates a multiprocessing pool of 5 workers.
Yes, I know a task queue like Celery is usually the accepted way of doing things, but in this case we needed threads/multiprocessing.
Both the thread and each of the multiprocessing workers access items in the database. However, any call to a Django model in the thread or in a worker raises a "django.core.exceptions.AppRegistryNotReady: Models aren't loaded yet" exception.
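For context, the pattern looks roughly like this (a simplified sketch with hypothetical model and function names, not the actual views.py linked further down):

import threading
import multiprocessing

from django.http import HttpResponse


def run_iteration(iteration):
    # each worker reads/writes Django models (hypothetical model name)
    from Results.models import SimulationResult
    SimulationResult.objects.create(iteration=iteration)


class Simulation(threading.Thread):
    def run(self):
        # the thread touches the ORM as well, then fans out to 5 workers
        from Results.models import SimulationResult
        SimulationResult.objects.count()
        pool = multiprocessing.Pool(5)
        pool.map(run_iteration, range(5))
        pool.close()
        pool.join()


def start_simulation(request):
    Simulation().start()
    return HttpResponse('simulation started')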
Here is the full stack trace:
Process SpawnPoolWorker-2:
Traceback (most recent call last):
  File "C:\Python34\lib\multiprocessing\process.py", line 254, in _bootstrap
    self.run()
  File "C:\Python34\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Python34\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
  File "C:\Python34\lib\multiprocessing\queues.py", line 357, in get
    return ForkingPickler.loads(res)
  File "d:\bryan\Documents\Projects\spreadmodel_3.4_venv\lib\site-packages\django\db\models\fields\__init__.py", line 59, in _load_field
    return apps.get_model(app_label, model_name)._meta.get_field_by_name(field_name)[0]
  File "d:\bryan\Documents\Projects\spreadmodel_3.4_venv\lib\site-packages\django\apps\registry.py", line 199, in get_model
    self.check_models_ready()
  File "d:\bryan\Documents\Projects\spreadmodel_3.4_venv\lib\site-packages\django\apps\registry.py", line 131, in check_models_ready
    raise AppRegistryNotReady("Models aren't loaded yet.")
django.core.exceptions.AppRegistryNotReady: Models aren't loaded yet.
It is odd to me that I don't see any part of my code in the stack trace.
I've tried calling django.setup() in the thread's __init__ and at the beginning of the method the workers run, still with no success.
At no point in my models do I try to read anything from the database, like the common issue of pointing a foreign key at the user model.
EDIT:
I can get the database queries in the thread to work by putting django.setup() just under the Simulation class instead of having it in the __init__ method.
But I'm still having issues with the queries in the workers.
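Roughly, that change looks like this (a simplified sketch, not the real class):

import threading

import django


class Simulation(threading.Thread):
    def run(self):
        ...  # ORM queries made in the thread now work


django.setup()  # module level, just under the Simulation class, instead of in __init__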
EDIT 2:
If I modify Python's queue.Queue source and put the django.setup() call in the get method, everything works great.
However, this is not a valid solution. Any ideas?
EDIT 3:
If I run the tests inside PyCharm, the test associated with this problem passes. Running the test from the normal command line outside of PyCharm (or serving the view with the Django test server or CherryPy) results in the above error.
If it helps, here is a link to the views.py on GitHub.
https://github.com/NAVADMC/SpreadModel/blob/b4bbbcf7020a3e4df0d021942ddcc5039234bd88/Results/views.py
For future reference (after we fix the bug), you can see the odd behaviour on commit b4bbbcf7 (linked above).
Related
Hey guys, I'm working on a prototype for a project at my school (I'm a research assistant, so this isn't a graded project). I'm running Celery on a server cluster (with 48 workers/cores) which is already set up and working. In a nutshell, the project uses Celery for some number crunching over a rather large number of files/tasks.
Because of this it is very important that we save results to actual files; we have gigs upon gigs of data and it WON'T fit in RAM with the traditional task queue/result backend.
Anyways...
My prototype (with a trivial multiplication task):
task.py
from celery import Celery

app = Celery()

@app.task
def mult(x, y):
    return x * y
And this works great when I execute: $ celery worker -A task -l info
But if I try and add a new backend:
from celery import Celery

app = Celery()
app.conf.update(CELERY_RESULT_BACKEND='file://~/Documents/results')

@app.task
def mult(x, y):
    return x * y
I get a rather large error:
[2017-08-04 13:22:18,133: CRITICAL/MainProcess] Unrecoverable error:
AttributeError("'NoneType' object has no attribute 'encode'",)
Traceback (most recent call last):
  File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/kombu/utils/objects.py", line 42, in __get__
    return obj.__dict__[self.__name__]
KeyError: 'backend'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/worker/worker.py", line 203, in start
    self.blueprint.start(self)
  File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/bootsteps.py", line 115, in start
    self.on_start()
  File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/apps/worker.py", line 143, in on_start
    self.emit_banner()
  File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/apps/worker.py", line 158, in emit_banner
    ' \n', self.startup_info(artlines=not use_image))),
  File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/apps/worker.py", line 221, in startup_info
    results=self.app.backend.as_uri(),
  File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/kombu/utils/objects.py", line 44, in __get__
    value = obj.__dict__[self.__name__] = self.__get(obj)
  File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/app/base.py", line 1183, in backend
    return self._get_backend()
  File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/app/base.py", line 902, in _get_backend
    return backend(app=self, url=url)
  File "/home/bartolucci/anaconda3/lib/python3.6/site-packages/celery/backends/filesystem.py", line 45, in __init__
    self.path = path.encode(encoding)
AttributeError: 'NoneType' object has no attribute 'encode'
I am only 2 days into this project and have never worked with Celery (or a similar library) before (I come from the algorithmic, mathy side of the fence). I'm currently wrangling with Celery's user guide docs, but they're honestly pretty sparse on this detail.
Any help is much appreciated, and thank you.
Looking at the celery code for the filesystem-backed result backend here:
https://github.com/celery/celery/blob/master/celery/backends/filesystem.py#L54
Your path needs to start with file:/// (3 slashes).
Your settings have it starting with file:// (2 slashes).
You might also want to use an absolute path instead of the ~.
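For example, the configuration from the question would become something like this (assuming ~ expands to /home/bartolucci, as in the traceback; adjust the directory to taste):

from celery import Celery

app = Celery()
# file:/// (three slashes) followed by an absolute path to the results directory
app.conf.update(CELERY_RESULT_BACKEND='file:///home/bartolucci/Documents/results')

@app.task
def mult(x, y):
    return x * y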
I have a few Twitter bots that I run on my Raspberry Pi. I have most functions wrapped in a try/except to ensure that if something errors out, it doesn't break the program and execution continues.
I'm also using Tweepy's streaming API to monitor the tags that I want the bot to retweet.
Here is an issue that kills the program even though I have the main function wrapped in a try/except:
Unhandled exception in thread started by <function startBot5 at 0x762fbed0>
Traceback (most recent call last):
  File "TwitButter.py", line 151, in startBot5
    '<botnamehere>'
  File "/home/pi/twitter/bots/TwitBot.py", line 49, in __init__
    self.startFiltering(trackList)
  File "/home/pi/twitter/bots/TwitBot.py", line 54, in startFiltering
    self.myStream.filter(track=tList)
  File "/usr/local/lib/python3.4/dist-packages/tweepy/streaming.py", line 445, in filter
    self._start(async)
  File "/usr/local/lib/python3.4/dist-packages/tweepy/streaming.py", line 361, in _start
    self._run()
  File "/usr/local/lib/python3.4/dist-packages/tweepy/streaming.py", line 294, in _run
    raise exception
  File "/usr/local/lib/python3.4/dist-packages/tweepy/streaming.py", line 263, in _run
    self._read_loop(resp)
  File "/usr/local/lib/python3.4/dist-packages/tweepy/streaming.py", line 313, in _read_loop
    line = buf.read_line().strip()
AttributeError: 'NoneType' object has no attribute 'strip'
My setup:
I have a parent class, TwitButter.py, that creates objects from TwitBot.py. These objects are the bots, and each is started on its own thread so they can run independently.
I have a function in the TwitBot that runs the startFiltering() function. It is wrapped in a try/except, but my except code is never triggered.
My guess is that the error is occurring within the streaming library. Maybe that library is poorly coded and breaks on the line specified at the bottom of the traceback.
Any help would be awesome, and I wonder if others have experienced this issue?
I can provide extra details if needed.
Thanks!!!
This is actually a problem in tweepy that was fixed by GitHub #870 in April 2017. So it should be resolved by updating your local copy to the latest master.
What I did to discover that:
Did a web search to find the tweepy source repo.
Looked at streaming.py for context on the last traceback lines.
Noticed that the most recent change to the file fixed this very problem.
I'll also note that most of the time you get a traceback from deep inside a Python library, the problem comes from the code calling it incorrectly, rather than a bug in the library. But not always. :)
Apologies if the title is not descriptive; I struggled to find a good title for this question.
My question involves Python, CouchDB (to a lesser degree), multiprocessing and networking. It started when I was trying to debug a co-worker's program that uses Python's multiprocessing module to parallelize requests to a CouchDB database via couchdb-python. I created a minimal program to exhibit the bug and eventually solved the issue, but the solution raised another question which I was not able to answer to the best of my knowledge. I'm hoping experts on SO could help me with this, so here it goes.
The premise of the problem is pretty simple. We have n resources, all of which can be retrieved concurrently. Instead of making n serial requests, my co-worker is using the multiprocessing module to fetch all n resources in parallel. Here's a program I wrote to demonstrate the issue:
The Script (bug.py)
import couchdb
import multiprocessing

server = couchdb.Server(SERVER)  # SERVER is the CouchDB base URL, e.g. 'http://localhost:5984/'

try:
    database = server.create('test')
except:
    # the 'test' database may already exist from a previous run; recreate it
    server.delete('test')
    database = server.create('test')

database.save({'_id': '1', 'type': 'dog', 'name': 'chase'})
database.save({'_id': '2', 'type': 'dog', 'name': 'rubble'})
database.save({'_id': '3', 'type': 'cat', 'name': 'kali'})


def query_id(id):
    print(dict(database[id]))


def main():
    args = [
        ['dog', 'chase'],
        ['dog', 'rubble'],
        ['cat', 'kali'],
    ]

    print('-' * 80)

    # fetch each document in its own process
    processes = []
    for id_ in ['1', '2', '3']:
        proc = multiprocessing.Process(target=query_id, args=(id_,))
        processes.append(proc)
        proc.start()

    for proc in processes:
        proc.join()


if __name__ == '__main__':
    main()
Pretty innocent code, right? Well, running it on the latest couchdb and couchdb-python gives the following error:
The output
--------------------------------------------------------------------------------
Process Process-2:
Process Process-1:
Traceback (most recent call last):
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
self._target(*self._args, **self._kwargs)
File "bug.py", line 25, in query_id
File "bug.py", line 25, in query_id
print(dict(database[id]))
File "/home/kevin/src/couchdb-python/couchdb/client.py", line 418, in __getitem__
print(dict(database[id]))
File "/home/kevin/src/couchdb-python/couchdb/client.py", line 418, in __getitem__
return Document(data)
TypeError: 'ResponseBody' object is not iterable
return Document(data)
TypeError: 'ResponseBody' object is not iterable
After some digging, I finally found out that couchdb-python's implementation of ConnectionPool is not multiprocess safe. See this PR for more details. Basically, all processes share the same ConnectionPool object and are given the same httplib.HTTPConnection object, and when they all simultaneously try to read from the connection, the data being returned is garbled, and the bug ensues. You can see evidence of this if you put print(os.getpid(), line) inside the httplib.HTTPResponse._read_status method. Here's a sample output after the print statement is added:
(26490, 'TP1.120 O\r\n')
(26489, 'T/ 0KServer: CouchDB/1.6.1 (Erlang OTP/17)\r\n')
Process Process-2:
Process Process-3:
Traceback (most recent call last):
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "bug.py", line 25, in query_id
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
print(dict(database[id]))
File "/home/kevin/src/couchdb-python/couchdb/client.py", line 418, in __getitem__
self._target(*self._args, **self._kwargs)
File "bug.py", line 25, in query_id
print(dict(database[id]))
File "/home/kevin/src/couchdb-python/couchdb/client.py", line 418, in __getitem__
return Document(data)
TypeError: 'ResponseBody' object is not iterable
return Document(data)
TypeError: 'ResponseBody' object is not iterable
As seen here, the first lines read by the sub-processes are only partial, indicating a race condition. If I further inspect the HTTPConnection object, I can see all three processes are sharing the same connection object, the same socket to the server, and the same file descriptor from the socket that's being used for reading.
Puzzle
So far so good. I've identified the root cause of the problem and put together a fix. However, a complication arises when I put the couchdb instance behind a reverse proxy. In this case, I'm using haproxy. Here's a sample config:
global
    ...
defaults
    ...
listen couchdb
    bind *:9999
    mode http
    stats enable
    option httpclose
    option forwardfor
    server couchdb-1 127.0.0.1:5984 check
If I then point the couchdb server URL to http://localhost:9999 in the bug script and rerun it, everything is fine! I also inspected the connection object, the socket and the file descriptor, and they were likewise shared among all processes.
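That change amounts to a single line in bug.py, something like:

# point the couchdb client at haproxy instead of at CouchDB directly
server = couchdb.Server('http://localhost:9999/')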
This got me puzzled. I brought up mitmproxy and inspected what's going on in the two cases: with or without haproxy.
Without haproxy
When the parallel requests are made without haproxy, the mitmproxy details tab shows the request timing (I captured a single request, but the timing sequence is the same for all 3 concurrent requests).
The event sequence there suggests a blocking, synchronous request.
With haproxy
The sequence here is different from the one without haproxy: the request is considered complete before the server connection is even initiated.
Question
I'm not used to working at this low a level, so I know my knowledge in this area is pretty lacking. I want to understand what difference putting haproxy in front of the database made that subverted the multiprocessing bug in couchdb-python. haproxy is event-based, so I suspect that has something to do with it, but I would really appreciate someone explaining the difference!
Thanks a bunch in advance!
I am trying to use the example Pika async consumer (http://pika.readthedocs.io/en/0.10.0/examples/asynchronous_consumer_example.html) as a multiprocessing process (by making the ExampleConsumer class a subclass of multiprocessing.Process). However, I'm running into some issues with gracefully shutting everything down.
Let's say for example I have defined my procs as below:
for k, v in queues_callbacks.iteritems():
    proc = ExampleConsumer(queue, k, v, rabbit_user, rabbit_pw, rabbit_host, rabbit_port)
"queues_callbacks" is basically just a dictionary of exchange : callback_function (ideally I'd like to be able to connect to several exchanges with this architecture).
Then I do the normal python way of dealing with starting processes:
try:
    for proc in self.consumers:
        proc.start()
    for proc in self.consumers:
        proc.join()
except KeyboardInterrupt:
    for proc in self.consumers:
        proc.terminate()
        proc.join(1)
The issue comes when I try to stop everything. Let's say I've overridden the "terminate" method to call the consumer's "stop" method and then continue on with the normal terminate of Process.
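Roughly like this (a simplified sketch, not my exact code):

import multiprocessing

class ExampleConsumer(multiprocessing.Process):
    # ... connection setup, run(), stop(), etc. as in the pika async consumer example ...

    def terminate(self):
        # ask the consumer to shut its connection/ioloop down cleanly first,
        # then fall back to the normal Process terminate
        self.stop()
        super(ExampleConsumer, self).terminate()

With this structure, I am getting some strange attribute errors: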
Traceback (most recent call last):
  File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 154, in <module>
    main()
  File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 150, in main
    mybot.start()
  File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 71, in start
    self.stop()
  File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 53, in stop
    self.__stop_consumers__()
  File "/Users/christopheralexander/PycharmProjects/new_bot/abstract_bot.py", line 130, in __stop_consumers__
    self.consumers[0].terminate()
  File "/Users/christopheralexander/PycharmProjects/new_bot/rabbit_consumer.py", line 414, in terminate
    self.stop()
  File "/Users/christopheralexander/PycharmProjects/new_bot/rabbit_consumer.py", line 399, in stop
    self._connection.ioloop.start()
AttributeError: 'NoneType' object has no attribute 'ioloop'
It's as if these attributes somehow disappear at some point. In the particular case above, _connection is initialized as None, but then gets set when the consumer is started. However, by the time the "stop" method is called, it has reverted to None (with nothing explicitly setting it back). I'm also observing other strange behavior, such as times when things appear to be called twice (even though "stop" is called once). Any ideas as to what is going on here, or is this not the proper way of architecting this?
Thanks!
I am trying to set up an application based on Google App Engine using the Managed VM feature.
I am using a shared library written in C++, loaded using ctypes:
cdll.LoadLibrary('./mylib.so')
which registers a callback function
CB_FUNC_TYPE = CFUNCTYPE(None, eSubscriptionType)
cbFuncType = CB_FUNC_TYPE(scrptCallbackHandler)
in which I want to save data to the ndb datastore:
def scrptCallbackHandler(arg):
    model = Model(name=str(arg.data))
    model.put()
I am registering a callback function in which I want to take the data from the C++ program and put it into the ndb datastore. This results in an error. On the dev server it behaves slightly differently; here is the error from a production server:
suspended generator _put_tasklet(context.py:343) raised BadRequestError(Application Id (app) format is invalid: '_')
LOG 2 1429698464071045 suspended generator put(context.py:810) raised BadRequestError(Application Id (app) format is invalid: '_')
Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 314, in 'calling callback function'
  File "/home/vmagent/app/isw_cloud_client.py", line 343, in scrptCallbackHandler
    node.put()
  File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/model.py", line 3380, in _put
    return self._put_async(**ctx_options).get_result()
  File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/tasklets.py", line 325, in get_result
    self.check_success()
  File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/tasklets.py", line 368, in _help_tasklet_along
    value = gen.throw(exc.__class__, exc, tb)
  File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/context.py", line 810, in put
    key = yield self._put_batcher.add(entity, options)
  File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/tasklets.py", line 368, in _help_tasklet_along
    value = gen.throw(exc.__class__, exc, tb)
  File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/context.py", line 343, in _put_tasklet
    keys = yield self._conn.async_put(options, datastore_entities)
  File "/home/vmagent/python_vm_runtime/google/appengine/ext/ndb/tasklets.py", line 454, in _on_rpc_completion
    result = rpc.get_result()
  File "/home/vmagent/python_vm_runtime/google/appengine/api/apiproxy_stub_map.py", line 613, in get_result
    return self.__get_result_hook(self)
  File "/home/vmagent/python_vm_runtime/google/appengine/datastore/datastore_rpc.py", line 1827, in __put_hook
    self.check_rpc_success(rpc)
  File "/home/vmagent/python_vm_runtime/google/appengine/datastore/datastore_rpc.py", line 1342, in check_rpc_success
    raise _ToDatastoreError(err)
google.appengine.api.datastore_errors.BadRequestError: Application Id (app) format is invalid: '_'
The C++ program is started by a call to a request handler, but it runs in the background and accepts incoming data which should be processed in the callback.
Update: As Tim already pointed out, it seems that the context of the WSGI handler is lost. Most likely the solution here would be to recreate the application context somehow.
I am only guessing at what my problem was, but I want to describe what I did to solve it.
The execution context of the callback function is somewhat different from the rest of the Python application. Any asynchronous operation in the callback fails. I tried making an HTTP call and saving to the datastore; the operations never finish, and after 60s the application shows an error that they crashed. I guess this has to do with how Python manages the execution and the corresponding memory allocation.
I was able to execute the callback in an object's context by wrapping it in a closure within a class. This wasn't really the problem, but the technique can be found in this answer: How can I get methods to work as callbacks with python ctypes?
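A minimal sketch of that technique (using c_int here instead of the eSubscriptionType from above; all names are illustrative):

from ctypes import CFUNCTYPE, c_int

CB_FUNC_TYPE = CFUNCTYPE(None, c_int)

class CallbackOwner(object):
    def handle(self, arg):
        print('callback got %s' % (arg,))

    def as_callback(self):
        # the closure captures self, so the bound method can be used where
        # ctypes expects a plain function; keep a reference to the returned
        # object for as long as the C library may call it
        def _callback(arg):
            self.handle(arg)
        return CB_FUNC_TYPE(_callback)

owner = CallbackOwner()
cbFunc = owner.as_callback()  # pass cbFunc to the C library's register function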
For my solution I am now using a combination of cloud endpoints on another module and background threads on the ctypes module.
Within the C callback I start a background thread, which is able to do the asynchronous work:
# Start a background thread using the background thread service from GAE
from google.appengine.api import background_thread
background_thread.start_new_background_thread(putData, [name, value])
And here the simple task it executes:
# Here i call my cloud-endpoints
def putData(name, value):
body = {
'name' : 'name',
'value' : int(value)
}
res = service.objects().create(body=body).execute()
Of course I need to do error handling and additional stuff, but for me this is a good solution.
Note: Adding models to the datastore in the background thread failed because the environment in the background thread is different from the application's and the app ID was not set.