How to identify the source of frequent process startup - Python

Some months ago I used to play around with Python and Django, eventually setting up a Django web service running python manage.py ... on a Raspberry Pi. Now I'd like to use the Linux device for other things. Unfortunately, some process seems to be started frequently (every couple of seconds) and eats up the available processing power, and I cannot remember or see WHO is starting this process or WHERE it is started.
The following picture shows an htop output. The process shown right below the title row uses 83% of the CPU power and seems to be invoked by the following command line (run_gunicorn seems to be part of the Python / Django environment):
/home/pi/.virtualenvs/ENV_python27/bin/python /home/pi/examples/django__test/manage.py run_gunicorn -w 4 .
The fact that the PID of the odd process changes every couple of seconds makes it impossible for me, as a Linux novice, to investigate its source and details any further. In the picture the process has the PID 24296.
Is there a way to find the place within the Linux file system and its files from which this process is repeatedly started? Can I somehow remove the respective command so it stops wasting so much processing power? Are there a handful of possible places from which Linux can start up processes automatically (like cron, which I have already checked)?
Please ask for more details and I will try to provide them.
Thanks.

The gunicorn process is probably being run by supervisor. Look at your /etc/supervisor/supervisord.conf file or the /etc/supervisor/conf.d directory.
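If you want to confirm who is actually spawning those short-lived processes before digging through configs, here is a rough sketch using the psutil library (an assumption: psutil is installed, e.g. via pip install psutil; 24296 is just the PID from your screenshot and will differ on your machine). It walks up the parent chain and prints each ancestor's command line:
import psutil

proc = psutil.Process(24296)  # the busy PID currently shown in htop
while proc is not None:
    # print this process, then move one level up the parent chain
    print(proc.pid, proc.name(), " ".join(proc.cmdline()))
    proc = proc.parent()
In your case the parent will most likely be the gunicorn master started by manage.py run_gunicorn (the -w 4 workers are respawned by the master, which would explain the changing PIDs), and that master in turn is probably started by supervisor.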

Related

Python & Django: Working on a chroot jail to run a single bash script

I am facing the following problem and I am not sure if my approach is anywhere near 'right'.
I've built a Django application that handles students' assignments for a programming subject at university. The original version of this application (https://github.com/elcoya/seal) used a chroot'd daemon to fetch the code delivered by the students, place a bash script alongside that code and execute the bash script, which could contain any kind of operations, like building and testing the students' code. So far... so good. However, running this daemon was a bit of a headache. Since it ran within a jail, the bind-mounted /proc within that jail became obsolete every time the server was restarted (it was restarted from time to time :( ), and whenever some error occurred in the daemon, the process died or was killed and therefore stopped doing its job of "correcting" the students' deliveries.
To prevent these errors from happening, and to have a more trustworthy automatic correction service, I would like to install a 'django-kronos' task (which runs from the crontab on the server) to do the same job. This would be great, but it would mean that from my Django stack code I would need to move into the chroot to run the mentioned bash script.
SO suggests this post, but it is from 2012, and it kind of advises against what I am trying to do. Am I missing something here? Is os.chroot('/path/to/jail') the way to go?
You could run your user scripts inside a Docker container. Docker gives you all the benefits of a jail and much more. For instance, it can restart a container for you if the host running it were to be rebooted: https://docs.docker.com/engine/admin/start-containers-automatically/
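As a rough illustration only (the image name, paths and script name below are placeholders, and you need the Docker SDK for Python, pip install docker), a django-kronos task could launch such a container like this:
import docker

client = docker.from_env()
container = client.containers.run(
    "ubuntu:16.04",                          # placeholder base image for the jail
    "/bin/bash /work/run_tests.sh",          # the per-delivery bash script
    volumes={"/srv/deliveries/123": {"bind": "/work", "mode": "rw"}},
    mem_limit="256m",                        # containment the chroot never gave you
    detach=True,
)
print(container.id)
The linked restart-policy documentation is what keeps long-running containers alive across host reboots; for a one-shot grading script like this you would instead just wait for the container to exit and collect its output.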

Scheduling tasks using Python's Schedule module

From the docs:
import schedule
import time

def job():
    print("I'm working...")

schedule.every(10).minutes.do(job)
schedule.every().hour.do(job)
schedule.every().day.at("10:30").do(job)

while True:
    schedule.run_pending()
    time.sleep(1)
I understand that while the program is running, it will execute the function you tell it to run. What I don't understand is how you would go about making this an automated task that runs every day. Is the idea that you would call this from the command line and always leave it open? If I shut off my computer, I would have to start it again, wouldn't I?
I feel there is something I am missing when creating an automated Python task in this case. I am on a Windows environment.
Here is the overview: running tasks as startup items means different things on each OS, which has nothing to do with Python specifically.
On Windows you could set it up as a Windows service by wrapping your script with the Python library PyInstaller (which turns your script into an .exe file) and then running your.exe install --startup='auto'.
On Linux-based OSes you would need to check where to put the script, because the startup sequence has changed in the last few years. There are even management software packages to make this easier.
On Mac there are GUI tools for controlling startup services, as well as launchctl: http://www.macworld.com/article/2047747/take-control-of-startup-and-login-items.html
You can take a look at the processes currently on your computer by going to:
Windows: Task Manager (press Ctrl-Alt-Delete and select Task Manager)
(depending on your Windows version) click the Details tab. The User name column will be blank or show "System" if it's run as a system process.
Linux or Mac: in a terminal type ps -Al
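If you would rather do that check from Python itself, a small cross-platform sketch with the psutil library (an extra install) lists the processes together with their owners:
import psutil

# one line per process: pid, executable name, and the user it runs as
for proc in psutil.process_iter(attrs=["pid", "name", "username"]):
    print(proc.info)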
Responding to comments:
System level - if nobody is logged in, what is your computer doing? (your script? web server? protein folding? dreaming of electric sheep?)
Yes, Python would be taking up resources each time you run a separate script. I have gigs of RAM and Python takes <30 MB to run each script (depending on the size of the libraries + the size of the program + IO-bound + CPU-bound problems). Your system is currently running >100 processes and is able to run thousands. Don't worry about optimizing your program on the system till it's a problem.

Best way to make a Python scheduler

I am working with scrapy 0.20 on python 2.7
Question
What is the best Python scheduler?
My need
I need to run my spider, which is a Python script, every 3 hours.
What I have thought
I tried using the scheduler features that come with Windows 7 and it works well. I am able to run a Python script every 3 hours, but I may deploy my script on a Linux server, so I may not be able to use this option.
I created a Java application using Quartz-Scheduler. It works well, but this is a third-party library, which my manager may refuse.
I created a Windows service and made it fire the script every three hours. It works, but I may deploy my script on a Linux server, so I may not be able to use this option.
I am asking about the best practice for firing a Python script periodically.
I tried using the scheduler features that come with Windows 7 and it works well.
So that already works fine for you. Good, no need to change your script to do the scheduling work itself.
but I may deploy my python script on Linux server so I may not be able to use this option.
On Linux, you can use cron jobs to achieve this.
The other way would be to simply keep your script running the whole time, but have it pause for the three hours in which it does nothing. That way you don't need to set anything up on the target machine; you just run the script in the background, and it will keep running and doing its job.
This is exactly how job schedulers work, by the way. They are launched early when the operating system starts, and then they just keep running forever; every short time interval (a minute or so) they check whether any job on their list needs to run now. If that's the case, they spawn a new process and run the job.
So if you wanted to make such a scheduler in Python, you would just keep it running forever, and once every time interval (in your case 3 hours, because you only have a single job anyway), you start your job. That can be in a separate process, in a separate thread, or indirectly in a separate thread using asynchronous functions.
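A minimal sketch of such a single-job scheduler, assuming the spider is started with scrapy crawl myspider from a project directory (both names are placeholders for your setup):
import subprocess
import time

INTERVAL = 3 * 60 * 60  # three hours, in seconds

while True:
    # run the spider as a child process, then sleep until the next run
    subprocess.call(["scrapy", "crawl", "myspider"], cwd="/path/to/project")
    time.sleep(INTERVAL)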
The best way to deploy/schedule your scrapy project is to use the scrapyd server.
You should install scrapyd:
sudo apt-get install scrapyd
You change your project config file to something like this:
[deploy:somename]
url = http://localhost:6800/ ## this is the default
project = scrapy_project
You deploy your project under the scrapyd server:
scrapy deploy somename
You change your poll interval in /etc/scrapyd/conf.d/default-000 to 3 hours (the default is 5 seconds):
poll_interval = 10800
You schedule your spider with something like:
curl http://localhost:6800/schedule.json -d project=scrapy_project -d spider=myspider
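If you would rather trigger that schedule call from Python instead of curl, a short sketch with the requests library (assuming it is installed) does the same thing:
import requests

resp = requests.post("http://localhost:6800/schedule.json",
                     data={"project": "scrapy_project", "spider": "myspider"})
print(resp.json())  # scrapyd answers with the status and a job id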
You can use the web service to monitor your jobs:
http://localhost:6800/
PS: I have only tested this under Ubuntu, so I am not sure whether a Windows version exists. If not, you can install a VM with Ubuntu to launch the spiders.
Well, there's always the charming sched module (docs), which provides a generic scheduling interface.
Give it a time function and a sleep function, and it'll give you back a pretty nice and extensible scheduler.
It's not system-level, but if you can run it as a service, it should suffice.
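A rough sketch of a repeating three-hour job with sched (the re-scheduling inside the callback is what keeps it going; the job body is a placeholder):
import sched
import time

scheduler = sched.scheduler(time.time, time.sleep)

def run_job():
    print("running the spider...")                # placeholder for the real work
    scheduler.enter(3 * 60 * 60, 1, run_job, ())  # schedule the next run in 3 hours

scheduler.enter(0, 1, run_job, ())  # first run right away
scheduler.run()                     # blocks and keeps dispatching jobs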

Daemon vs Upstart for python script

I have written a module in Python and want it to run continuously once started; I also need to stop it when I need to update other modules. I will likely be using monit to restart it if the module has crashed or is otherwise not running.
I was going through different techniques like Daemon, Upstart and many others.
Which is the best way to go, so that I can use the same approach throughout all my new modules to keep them running forever?
From your mention of Upstart I will assume that this question is for a service being run on an Ubuntu server.
On an Ubuntu server an upstart job is really the simplest and most convenient option for creating an always-on service that starts up at the right time and can be stopped or reloaded with familiar commands.
To create an upstart service you need to add a single file to /etc/init, called <service-name>.conf. An example script looks like this:
description "My chat server"
author "your#email-address.com"
start on runlevel [2345]
stop on runlevel [!2345]
env AN_ENVIRONMENTAL_VARIABLE=i-want-to-set
respawn
exec /srv/applications/chat.py
This means that every time the machine is started, it will start the chat.py program. If the program dies for whatever reason, upstart will restart it. You don't have to worry about double forking or otherwise daemonizing your code. That's handled for you by upstart.
If you want to stop or start your process you can do so with
service chat start
service chat stop
The name chat is automatically derived from the name of the .conf file inside /etc/init.
I'm only covering the basics of upstart here. There are lots of other features to make it even more useful, all available by running man upstart.
This method is much more convenient than writing your own daemonization code. A 4-8 line config file for a built-in Ubuntu component is much less error-prone than making your code safely double fork and then having another process monitor it to make sure it doesn't go away.
Monit is a bit of a red herring. If you want downtime alerts you will need to run a monitoring program on a separate server anyway. Rely on upstart to keep the process always running on the server; then have a different service that makes sure the server is actually running. Downtime happens for many different reasons. A process running on the same server will tell you precisely nothing if the server itself goes down. You need a separate machine (or a third-party provider like Pingdom) to alert you about that condition.
You could check out supervisor. It is capable of starting a process at system startup and then keeping it alive until shutdown.
The simplest configuration file would be:
[program:my_script]
command = /home/foo/bar/venv/bin/python /home/foo/bar/scripts/my_script.py
environment = MY_ENV_VAR=FOO, MY_OTHER_ENV_VAR=BAR
autostart = True
autorestart = True
Then you could link it into /etc/supervisord/conf.d, run sudo supervisorctl to enter the management console of supervisor, type reread so that supervisor notices the new config entry, and update to display the new program on the status list.
To start/restart/stop a program you could execute sudo supervisorctl start/restart/stop my_script.
I used an old-style init script with the start-stop-daemon utility. Look at the skeleton in /etc/init.d.

Tornadoweb webapp cannot be managed via upstart

A few days ago I found out that my webapp, written on top of the tornadoweb framework, doesn't stop or restart via upstart. Upstart just hangs and doesn't do anything.
I investigated the issue and found that upstart receives the wrong PID, so it can only start my webapp daemon once and can't do anything else with it.
Strace shows that my daemon makes 4 (!) clone() calls instead of 2.
A week ago everything was fine and the webapp was fully and correctly managed by upstart.
OS is Ubuntu 10.04.03 LTS (as it was weeks ago).
Do you have any ideas how to fix it?
PS: I know about the "expect fork|daemon" directive; it changes nothing ;)
Sorry for my silence.
Investigation of the issue ended with the discovery that the uuid Python library adds 2 forks to my daemon. I got rid of this lib and the Tornado daemon now works properly.
An alternative answer was supervisord, which can run as a daemon any console tool that can't daemonize by itself.
There are two often-used solutions.
The first one is to let your application honestly report its pid. If you can force your application to write its actual pid into the pidfile, then you can get its pid from there.
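For the first option, the application itself can write its pid after any daemonizing forks have happened; a minimal sketch (the path is a placeholder and must match what upstart expects):
import os

with open("/var/run/mywebapp.pid", "w") as f:
    f.write(str(os.getpid()))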
The second one is a little more complicated. You may add a specific environment variable to the script invocation. This environment variable will stay with all the forks (as long as the forks don't clear the environment), and then you can find all of your processes by parsing the /proc/*/environ files.
There should be an easier way to find processes by their environment, but I'm not sure of one.
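A rough sketch of the second approach: launch the daemon with a marker variable (the name and value below are made up) and then scan /proc for it:
import os

MARKER = b"MYAPP_MARKER=tornado-chat"  # hypothetical variable exported at launch time

for pid in os.listdir("/proc"):
    if not pid.isdigit():
        continue
    try:
        with open("/proc/%s/environ" % pid, "rb") as f:
            env_vars = f.read().split(b"\0")  # entries are NUL-separated
    except (IOError, OSError):
        continue  # the process exited or we lack permission
    if MARKER in env_vars:
        print("matching process:", pid)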
