I was testing out RQ (Redis-Queue) when after running the command rqworker and testing some things out I ended up rebooting my computer without gracefull shuting down the worker.
Now whether or not I have rqworker running there is a constant worker in the background named Ubuntu.4497 in idle state.
Can someone tell me how to gracefully shutdown this worker that seems to be running in the background ?
44497 is not the PID because I cant find anything with the PID 4497
RQ main dev stated that its a bug. https://github.com/nvie/rq/issues/55
Related
suppose I am running a worker dyno with following python code in heroku
import time
time.sleep(60*60)
exit()
I want to stop the worker completely, this code ends the program, but it starts again having the same effect as heroku ps:restart worker what should I write in the code to have the same effect as heroku ps:scale worker=0, is this possible ? If not what are my alternatives ?
The worker processes defined in your Procfile should be always-on workers, same as the web-processes you define in there.
For one-off tasks that exit afterwards you can use a one-off worker dyno. You can also easily define them in the Heroku scheduler.
I'm running a celery worker as a systemd daemon which serves a lot of long-running agents.
When I restart the worker all the agents hang and stop running new tasks waiting for pending ones.
Restarting agents is not an acceptable solution for me.
I'd also avoid using task timeouts
Is there a way to restart the worker gracefully to not impact already running agents?
All the agents are python scripts.
I start the worker by executing the following in the terminal:
celery -A cel_test worker --loglevel=INFO --concurrency=10 -n worker1.%h
Then I get a long looping error message stating that celery has received an unregistered task and has triggered:
KeyError: 'cel_test.grp_all_w_codes.mk_dct' #this is the name of the task
The problem with this is that cel_test.grp_all_w_codes.mk_dct doesn't exist. In fact there isn't even a module cel_test.grp_all_w_codes let alone the task mk_dct. There was once a few days ago but I've since deleted it. I thought maybe there was a .pyc file floating around but there isn't. I also can't find a single reference in my code to the task that's throwing the error. I shut down my computer and restarted the rabbitmq server thinking maybe a reference to something was just stuck in memory but it did not help.
Does anyone have any idea what could be the problem here or what I'm missing?
Well, without knowing your conf files, I can see two reasons that would provoke this:
the mk_dct task wasn't completed when you stopped the worker and delete the module. If you're running with CELERY_ACKS_LATE, it will try to relaunch the task everytime you re run the worker. Try remove this setting, or launch the worker with the purge option.
celery -A cel_test worker --loglevel=INFO --concurrency=10 -n worker1.%h --purge
the mk_dct task is launched by your celery beat. If so, try relaunching celery beat and clearing it's database backend if you had a custom one.
If it does not solve the problem, please post your celery conf, and make sure you have cleaned all the .pyc of your project and restarted everything.
Updated post:
I have a python web application running on a port. It is used to monitor some other processes and one of its features is to allow users to restart his own processes. The restart is done through invoking a bash script, which will proceed to restart those processes and run them in the background.
The problem is, whenever I kill off the python web application after I have used it to restart any user's processes, those processes will take take over the port used by the python web application in a round-robin fashion, so I am unable to restart the python web application due to the port being bounded. As a result, I must kill off the processes involved in the restart until nothing occupies the port the python web application uses.
Everything is ok except for those processes occupying the port. That is really undesirable.
Processes that may be restarted:
redis-server
newrelic-admin run-program (which spawns another web application)
a python worker process
UPDATE (6 June 2013): I have managed to solve this problem. Look at my answer below.
Original Post:
I have a python web application running on a port. This python program has a function that calls a bash script. The bash script spawns a few background processes, then exits.
The problem is, whenever I kill the python program, the background processes spawned by the bash script will take over and occupy that same port.
Specifically the subprocesses are:
a redis server (with daemonize = true in the configuration file)
newrelic-admin run-program (spawns a web application)
a python worker process
Update 2: I've tried running these with nohup. Only the python worker process doesnt attempt to take over the port after I kill the python web application. The redis server and newrelic-admin still do.
I observed this problem when I was using subprocess.call in the python program to run the bash script. I've tried a double fork method in the python program before running the bash script, but it results in the same problem.
How can I prevent any processes spawned from the bash script from taking over the port?
Thank you.
Update: My intention is that, those processes spawned by the bash script should continue running if the python application is killed off. Currently, they do continue running after I kill off the python application. The problem is, when I kill off the python application, the processes spawned by the bash script start to take over the port in a round-robin fashion.
Update 3: Based on the output I see from 'pstree' and 'ps -axf', processes 1 and 2 (the redis server and the web app spawned by newrelic-admin run-program) are not child processes of the python web application. This makes it even weirder that they take over the port that the python web application occupies when I kill it... Anyone knows why?
Just some background on the methods I've tried to solve my above problem, before I go on to the answer proper:
subprocess.call
subprocess.Popen
execve
the double fork method along with one of the above (http://code.activestate.com/recipes/278731-creating-a-daemon-the-python-way/)
By the way, none of the above worked for me. Whenever I killed off the web application that executes the bash script (which in turns spawns some background processes we shall denote as Q now), the processes in Q will in a round-robin fashion, take over the port occupied by the web application, so I had to kill them one by one before I could restart my web application.
After many days of living with this problem and moving on to other parts of my project, I thought of some random Stack Overflow posts and other articles on the Internet and from my own experience, recalled my experience of ssh'ing into a remote and starting a detached screen session, then logging out, and logging in again some time later to discover the screen session still alive.
So I thought, hey, what the heck, nothing works so far, so I might as well try using screen to see if it can solve my problem. And to my great surprise and joy it does! So I am posting this solution hopefully to help those who are facing the same issue.
In the bash script, I simply started the processes using a named screen process. For instance, for the redis application, I might start it like this:
screen -dmS redisScreenName redis-server redis.conf
So those processes will keep running on those detached screen sessions they were started with. In this case, I did not daemonize the redis process.
To kill the screen process, I used:
screen -S redisScreenName -X quit
However, this does not kill the redis-server. So I had to kill it separately.
Now, in the python web application, I can just use subprocess.call to execute the bash script, which will spawn detached screen sessions (using 'screen -dmS') which run the processes I want to spawn. And when I kill off the python web application, none of the spawned processes take over its port. Everything works smoothly.
I have written an Upstart job to run celery in my Ubuntu server. Here's my configuration file called celeryd.conf
# celeryd - runs the celery daemon
#
# This task is run on startup to run the celery daemon
description "run celery daemon"
start on startup
expect fork
respawn
exec su - trakklr -c "/app/trakklr/src/trakklr celeryd --events --beat --loglevel=debug --settings=production"
When I execute sudo service celeryd start, the celeryd process starts just fine and all the x number of worker process start fine.
..but when I execute, sudo service celeryd stop, it stops most of the processes but a few processes are left hanging.
Why is this happening? I'm using Celery 2.5.3.
Here's an issue from the Github tracker.
https://github.com/celery/django-celery/issues/142
I still use init.d to run celery so this may not apply. With that in mind, stopping the celery service sends the TERM signal to celery. This tells the workers not to accept new tasks but it does not terminate existing tasks. Therefore, depending on how long your tasks take to execute you may see tasks for some time after telling celery to stop. Eventually, they will all shut down unless you have some other problem.
I wasn't able to figure this out but it seemed to be an issue with my older celery version. I found this issue mentioned on their issue-tracker and I guess it points to the same issue:
https://github.com/celery/django-celery/issues/142
I upgraded my celery and django-celery to the 3.x.x versions and this issue was gone.