python script process stat stay on Sl and stop running

python script process stat stay on Sl and stop running - python

I run a python script and need to run a long time，but when I run a few hours it will stop，and I type ps aux the result is:
root 10371 0.9 10.4 273236 52232 ? Sl 09:35 6:23 python my_programe.py
then I try to use kill -18 10371 to call it , but useless, how can I continue to call it to run again?

The process state S doesn't mean that the process has stopped (and thus trying to use SIGCONT to continue is of course useless), rather it means Interruptible sleep (waiting for an event to complete). You should be able to see how long the process sleeps or what it waits for by using strace -p.

Related

Know if subprocess is not stuck by it's prints to stdout

I have subprocess that I am running by:
proc = subprocess.Popen("python -u my_script.py", shell=True)
my_script.py should print regularly to stdout and I have other non related process that is listening to this output so I can't change the output to be printed to somewhere else.
I want to ensure that the process is really regularly printing and not got stuck in some loop .etc, do I have way to check if stdout was wroten for some amount of time?
any other options to reach this goal?
EDIT
I am using windows

you can create a named pipe with mkfifo and use tee to output your script's data to both the process listening for it and the pipe.
mkfifo blarg
my_script.py | tee blarg | your_greedy_data_processing_instance
tail -f blarg
instead of tail you can use an arbitrarly complicated script to study the output and the state of the process generating it (timers, pid checks)

It appears that the access time and modification time of /dev/stdout is updated regularly. Note, however, that /dev/stdout will always be a soft link -- er, a symbolic link, I mean -- to the file handle of stdout for the process that's checking /dev/stdout. I.e., /dev/stdout links to /proc/self/fd/1.
So it seems that you could check the first file descriptor of your process to see if its modification time has changed, e.g.:
$ stat -c %y -L /proc/10830/fd/1
2021-05-13 02:34:00.367857061
-L means act on the target of the soft link, not the soft link itself; -c %y is just asking for the modification time. This Python script is running as process 10830 on my system right now, and it's occasionally updating the modification time (about every 8 seconds):
>>> import time
>>> while True: time.sleep(1); print("still alive")
still alive
still alive
still alive
....
You should Google this answer to be sure that the behavior I'm seeing is reliable, though, because I've never read anything about it before.
Alternatively, you could either (a) trust that the script is fine -- which it will, of course, always be (unless it's catching exceptions and refusing to exit even if it can no longer do anything useful, in which case you should change it to die the way it should), or (b) set up a daemon to do something like send a signal to the script, at which point the script could send a signal to the daemon to say "I'm still alive." There's literally no reason to do that, in my opinion, but how you write your programs is up to you.
So assuming that you want to press forward with this, here's a trivial example of the daemon that would monitor the script you want to make sure isn't stuck in a loop or something:
import time
import signal
import os
import sys
# keep a timestamp of when we receive a response
response_timestamp = time.time()
# add code here to get the process ID of the other script
other_pid = 0
def sig_handler(signum, frame):
global response_timestamp
response_timestamp = time.time()
if __name__ == '__main__':
# make sure that when we receive SIGBREAK, sig_handler() gets called
signal.signal(signal.SIGBREAK, sig_handler)
while True:
# send SIGBREAK to "other_pid"
os.kill(other_pid, signal.SIGBREAK)
time.sleep(15)
if time.time() - 20 > response_timestamp:
print("the other process is frozen")
sys.exit(os.EX_SOFTWARE)
Then you add this to the other script that you're monitoring:
import signal
import os
# add code here to get the process ID
other_pid = 0
def sig_handler(signum, frame):
os.kill(other_pid, signal.SIGBREAK)
...
...
(rest of your script)
Now be aware that the only thing this will do, is make sure that the process isn't completely frozen. Regrettably, Windows doesn't have a great deal of options when it comes to signals: SIGBREAK was the best one that I saw, but note that it's the signal received by a process when you hit CTRL+C to interrupt the program (so if you manually hit CTRL+C in the window running the Python program, it won't kill it, it will just make it call sig_handler()).
I would also be remiss if I did not inform you that even though this will probably work just fine, it is not safe to do almost anything inside of a signal handler function. It's bad form and may blow up on you unexpectedly, but in practice, it's pretty safe.

bash script launching many processes and blocking computer

I have written a bash script with the aim to run a .py template Python script 15,000 times, each time using a slightly modified version of this .py.
After each run of one .py, the bash script logs what happened into a file.
The bash script, which works on my laptop and computes the 15,000 things.
N_SIM=15000
for ((j = 1; j <= $N_SIM; j++))
do
index=$((j))
a0=$(awk "NR==${index} { print \$1 }" values_a0_in_Xenon_10_20_10_25_Wcm2_RRon.txt)
dirname="a0_${a0}"
mkdir -p $dirname
cd $dirname
awk -v s="a0=${a0}" 'NR==6 {print s} 1 {print}' ../integration.py > integrationa0included.py
mpirun -n 1 python3 integrationa0included.py 2&> integration_Xenon.log &
cd ..
done
It launches processes and the terminal looks like (or something along these lines, the numbers are only there for illustrative purposes, they are not exact):
[1]
[2]
[3]
...
...
[45]
[1]: exit, success (a message along these lines)
[4]: exit, success
[46]
[47]
[48]
[2]: exit, success
...
And the pattern of finishing some launched processes and continuously launching new ones repeats up until the 15,000 processes are launched and completed.
I want to run this on a very old computer.
The problem is that it launches almost instantly 300 such processes and then the computer freezes. It basically crashes. I cannot do CTRL+Z or CTRL+C or type. It's frozen.
I want to ask you if there's a modification to the bash script which launches only 2 processes, waits for 1 to finish, launches the 3rd, waits for the 2nd to finish, launches the 4th, and so on.
So that there aren't so many processes waiting at any given time. Maybe this doesn't block the old computer.

Inside your loop, add the following code to the beginning of the loop body:
# insert this after `do`
[ "$(jobs -pr | wc -l)" -ge 2 ] && wait -n
If there are already two or more background jobs running this waits till at least one of the running jobs terminated.

Issues with python scripts running simultaneously

I have two python scripts that use two different cameras for a project I am working on and I am trying to run them both inside a different script or within each other, either way is fine.
import os
os.system('python 1.py')
os.system('python 2.py')
My problem however is that they don't run at the same time, I have to quit the first one for the next to open. I also tried doing it with bash as well with the & shell operator
python 1.py &
python 2.py &
And this does in fact make them both run however the issue is that they both run endlessly in the background and I need to close them rather easily. Any suggestion what I can do to avoid the issues with these implementations

You could do it with multiprocessing
import os
import time
import psutil
from multiprocessing import Process
def run_program(cmd):
# Function that processes will run
os.system(cmd)
# Initiating Processes with desired arguments
program1 = Process(target=run_program, args=('python 1.py',))
program2 = Process(target=run_program, args=('python 2.py',))
# Start our processes simultaneously
program1.start()
program2.start()
def kill(proc_pid):
process = psutil.Process(proc_pid)
for proc in process.children(recursive=True):
proc.kill()
process.kill()
# Wait 5 seconds and kill first program
time.sleep(5)
kill(program1.pid)
program1.join()
# Wait another 1 second and kill second program
time.sleep(1)
kill(program2.pid)
program2.join()
# Print current status of our programs
print('1.py alive status: {}'.format(program1.is_alive()))
print('2.py alive status: {}'.format(program2.is_alive()))

One possible method is to use systemd to control your process (i.e. treat them as daemons).
This is how I control my Python servers since they need to run in the background and be completely detached from the current tty so I can exit my connection to the machine and the continue processes continue. You can then also stop the server later using systemctl, as explained below.
Instructions:
Create a .service file and save it in /etc/systemd/system, with contents along the lines of:
[Unit]
Description=daemon one
[Service]
ExecStart=/path/to/1.py
and repeat with one going to 2.py.
Then you can use systemctl to control your daemons.
First reload all config files with:
systemctl daemon-reload
then start either of your daemons (where my_daemon.service is one of your unit files):
systemctl start my_daemon
it should now be running and you should find it in:
systemctl list-units
You can also check its status with:
systemctl status my_daemon
and stop/restart them with:
systemctl stop|restart my_daemon

Use subprocess.Popen. This will create a child process and return its pid.
pid = Popen("python 1.py").pid
And then check out these functions for communicating with the child process and checking if it is still running.

gdb.execute blocks all the threads in python scripts

I am scripting GDB with Python 2.7.
I am simply stepping instructions with gdb.execute("stepi"). If the debugged program is idling and waiting for user interaction, gdb.execute("stepi") doesn't return. If there is such a situation, I want to stop the debugging session without terminating gdb.
To do so, I create a thread that will kill the debugged process if the current instruction ran for more than x seconds:
from ctypes import c_ulonglong, c_bool
from os import kill
from threading import Thread
from time import sleep
import signal
# We need mutable primitives in order to update them in the thread
it = c_ulonglong(0) # Instructions counter
program_exited = c_bool(False)
t = Thread(target=check_for_idle, args=(pid,it,program_exited))
t.start()
while not program_exited.value:
gdb.execute("si") # Step instruction
it.value += 1
# Threaded function that will kill the loaded program if it's idling
def check_for_idle(pid, it, program_exited):
delta_max = 0.1 # Max delay between 2 instructions, seconds
while not program_exited.value:
it_prev = c_ulonglong(it.value) # Previous value of instructions counter
sleep(delta_max)
# If previous instruction lasted for more than 'delta_max', kill debugged process
if (it_prev.value == it.value):
# Process pid has been retrieved before
kill(pid, signal.SIGTERM)
program_exited.value = True
print("idle_process_end")
However, gdb.execute is pausing my thread... Is there another way to kill the debugged process if it is idling?

However, gdb.execute is pausing my thread
What is happening here is that gdb.execute does not release Python's global lock when calling into gdb. So, while the gdb command executes, other Python threads are stuck.
This is just an oversight in gdb. I've filed a bug for it.
Is there another way to kill the debugged process if it is idling?
There is one other technique you can try -- I am not certain it will work. Unfortunately this part of gdb is not fully fleshed out (at the present moment); so also feel free to file bug reports.
The main idea is to run gdb commands on the main thread -- but not from Python. So, try writing your stepping loop using the gdb CLI, maybe like:
(gdb) while 1
> stepi
> end
Then your thread should be able to kill the inferior. Another approach might be for your thread to inject a gdb command into the main loop using gdb.post_event.

Parallel processing from a command queue on Linux (bash, python, ruby... whatever)

I have a list/queue of 200 commands that I need to run in a shell on a Linux server.
I only want to have a maximum of 10 processes running (from the queue) at once. Some processes will take a few seconds to complete, other processes will take much longer.
When a process finishes I want the next command to be "popped" from the queue and executed.
Does anyone have code to solve this problem?
Further elaboration:
There's 200 pieces of work that need to be done, in a queue of some sort. I want to have at most 10 pieces of work going on at once. When a thread finishes a piece of work it should ask the queue for the next piece of work. If there's no more work in the queue, the thread should die. When all the threads have died it means all the work has been done.
The actual problem I'm trying to solve is using imapsync to synchronize 200 mailboxes from an old mail server to a new mail server. Some users have large mailboxes and take a long time tto sync, others have very small mailboxes and sync quickly.

On the shell, xargs can be used to queue parallel command processing. For example, for having always 3 sleeps in parallel, sleeping for 1 second each, and executing 10 sleeps in total do
echo {1..10} | xargs -d ' ' -n1 -P3 sh -c 'sleep 1s' _
And it would sleep for 4 seconds in total. If you have a list of names, and want to pass the names to commands executed, again executing 3 commands in parallel, do
cat names | xargs -n1 -P3 process_name
Would execute the command process_name alice, process_name bob and so on.

I would imagine you could do this using make and the make -j xx command.
Perhaps a makefile like this
all : usera userb userc....
usera:
imapsync usera
userb:
imapsync userb
....
make -j 10 -f makefile

Parallel is made exatcly for this purpose.
cat userlist | parallel imapsync
One of the beauties of Parallel compared to other solutions is that it makes sure output is not mixed. Doing traceroute in Parallel works fine for example:
(echo foss.org.my; echo www.debian.org; echo www.freenetproject.org) | parallel traceroute

For this kind of job PPSS is written: Parallel processing shell script. Google for this name and you will find it, I won't linkspam.

GNU make (and perhaps other implementations as well) has the -j argument, which governs how many jobs it will run at once. When a job completes, make will start another one.

Well, if they are largely independent of each other, I'd think in terms of:
Initialize an array of jobs pending (queue, ...) - 200 entries
Initialize an array of jobs running - empty
while (jobs still pending and queue of jobs running still has space)
take a job off the pending queue
launch it in background
if (queue of jobs running is full)
wait for a job to finish
remove from jobs running queue
while (queue of jobs is not empty)
wait for job to finish
remove from jobs running queue
Note that the tail test in the main loop means that if the 'jobs running queue' has space when the while loop iterates - preventing premature termination of the loop. I think the logic is sound.
I can see how to do that in C fairly easily - it wouldn't be all that hard in Perl, either (and therefore not too hard in the other scripting languages - Python, Ruby, Tcl, etc). I'm not at all sure I'd want to do it in shell - the wait command in shell waits for all children to terminate, rather than for some child to terminate.

In python, you could try:
import Queue, os, threading
# synchronised queue
queue = Queue.Queue(0) # 0 means no maximum size
# do stuff to initialise queue with strings
# representing os commands
queue.put('sleep 10')
queue.put('echo Sleeping..')
# etc
# or use python to generate commands, e.g.
# for username in ['joe', 'bob', 'fred']:
# queue.put('imapsync %s' % username)
def go():
while True:
try:
# False here means no blocking: raise exception if queue empty
command = queue.get(False)
# Run command. python also has subprocess module which is more
# featureful but I am not very familiar with it.
# os.system is easy :-)
os.system(command)
except Queue.Empty:
return
for i in range(10): # change this to run more/fewer threads
threading.Thread(target=go).start()
Untested...
(of course, python itself is single-threaded. You should still get the benefit of multiple threads in terms of waiting for IO, though.)

If you are going to use Python, I recommend using Twisted for this.
Specifically Twisted Runner.

https://savannah.gnu.org/projects/parallel (gnu parallel)
and pssh might help.

Python's multiprocessing module would seem to fit your issue nicely. It's a high-level package that supports threading by process.

Simple function in zsh to parallelize jobs in not more than 4 subshells, using lock files in /tmp.
The only non trivial part are the glob flags in the first test:
#q: enable filename globbing in a test
[4]: returns the 4th result only
N: ignore error on empty result
It should be easy to convert it to posix, though it would be a bit more verbose.
Do not forget to escape any quotes in the jobs with \".
#!/bin/zsh
setopt extendedglob
para() {
lock=/tmp/para_$$_$((paracnt++))
# sleep as long as the 4th lock file exists
until [[ -z /tmp/para_$$_*(#q[4]N) ]] { sleep 0.1 }
# Launch the job in a subshell
( touch $lock ; eval $* ; rm $lock ) &
# Wait for subshell start and lock creation
until [[ -f $lock ]] { sleep 0.001 }
}
para "print A0; sleep 1; print Z0"
para "print A1; sleep 2; print Z1"
para "print A2; sleep 3; print Z2"
para "print A3; sleep 4; print Z3"
para "print A4; sleep 3; print Z4"
para "print A5; sleep 2; print Z5"
# wait for all subshells to terminate
wait

Can you elaborate what you mean by in parallel? It sounds like you need to implement some sort of locking in the queue so your entries are not selected twice, etc and the commands run only once.
Most queue systems cheat -- they just write a giant to-do list, then select e.g. ten items, work them, and select the next ten items. There's no parallelization.
If you provide some more details, I'm sure we can help you out.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

python script process stat stay on Sl and stop running - python

I run a python script and need to run a long time，but when I run a few hours it will stop，and I type ps aux the result is: root 10371 0.9 10.4 273236 52232 ? Sl 09:35 6:23 python my_programe.py then I try to use kill -18 10371 to call it , but useless, how can I continue to call it to run again?

The process state S doesn't mean that the process has stopped (and thus trying to use SIGCONT to continue is of course useless), rather it means Interruptible sleep (waiting for an event to complete). You should be able to see how long the process sleeps or what it waits for by using strace -p.

Related

Know if subprocess is not stuck by it's prints to stdout

bash script launching many processes and blocking computer

Issues with python scripts running simultaneously

gdb.execute blocks all the threads in python scripts

Parallel processing from a command queue on Linux (bash, python, ruby... whatever)

Categories

Resources