EDIT: I found the issue. It was a problem with PyCharm. I ran the .py outside of PyCharm and it worked as expected. In PyCharm I enabled "Emulate terminal in output console" and it now also works there...
Expectations:
Apscheduler spawns a thread that checks a website for something.
If that something is found (possibly several of them), the thread spawns one process per item to download it.
After five seconds the next check thread spawns, while earlier downloads may continue in the background.
Problem:
The spawned processes never cease to exist, which breaks other parts of the code (not included), because I need to check whether the processes are done, etc. (a sketch of such a check follows the code below).
If I use a simple time.sleep(5) instead (see code), it works as expected.
No, I cannot set max_instances to 1, because that would stop the scheduled job from running whenever one download process is still active.
Code:
import datetime
import multiprocessing

from apscheduler.schedulers.background import BackgroundScheduler


class DownloadThread(multiprocessing.Process):
    def __init__(self):
        super().__init__()
        print("Process started")


def main():
    print(multiprocessing.active_children())
    # prints: [<DownloadThread name='DownloadThread-1' pid=3188 parent=7088 started daemon>,
    #          <DownloadThread name='DownloadThread-3' pid=12228 parent=7088 started daemon>,
    #          <DownloadThread name='DownloadThread-2' pid=13544 parent=7088 started daemon>,
    #          ...
    #         ]

    new_process = DownloadThread()
    new_process.daemon = True
    new_process.start()
    new_process.join()


if __name__ == '__main__':
    sched = BackgroundScheduler()
    sched.add_job(main, 'interval', args=(), seconds=5, max_instances=999,
                  next_run_time=datetime.datetime.now())
    sched.start()

    while True:
        # main()         # works. Processes despawn.
        # time.sleep(5)
        input()
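For the "check whether the processes are done" part, here is a minimal sketch (independent of APScheduler; the bookkeeping names are invented for illustration) of reaping finished download processes without blocking the scheduler. Note that multiprocessing.active_children() has the same reaping side effect, so calling it periodically is another way to collect exited children.

import multiprocessing

running_downloads = []  # hypothetical list, appended to wherever a DownloadThread is start()ed


def reap_finished_downloads():
    # join() on an already-exited process returns immediately and reaps it,
    # so finished children disappear from the process table.
    still_running = []
    for p in running_downloads:
        if p.is_alive():
            still_running.append(p)
        else:
            p.join()
    running_downloads[:] = still_running
    return still_running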
Related
I have a script that I need to be running 24/7. However, I cannot seem to get EC2 to stop killing the process
I've tried daemonizing it with python-daemon,
I've tried nohup,
I've tried adding & at the end of the command to make it a background process,
I've tried screen to assign it to a virtual session.
All of these work temporarily, but when I check an hour later with ps aux | grep python, the process is no longer there / no longer running.
I've looked at the output and at the nohup.out file to see if it's crashing because of an error, but there is no error and no output.
I've used signal handlers:
signal.signal(signal.SIGINT, exit_gracefully)
signal.signal(signal.SIGTERM, exit_gracefully)
Still nothing. I suspected there might be an error escaping my tests, but I ran it in an open console for an hour and it worked perfectly, so something unexpected must be happening that is unrelated to my code.
An excerpt:
import signal
import time

from daemon import DaemonContext
import schedule

global exit_now


def exit_gracefully(*args):
    global exit_now
    exit_now = True


def run_server():
    global exit_now

    schedule.every(3).seconds.do(lambda: a_function(param1, param2))
    schedule.every(3).hours.do(lambda: another_function(param1, param2))
    schedule.every(10).minutes.do(function_with_no_params)

    signal.signal(signal.SIGINT, exit_gracefully)
    signal.signal(signal.SIGTERM, exit_gracefully)

    exit_now = False
    while not exit_now:
        schedule.run_pending()
        time.sleep(1)

    my_model.backup()
    print("Processes successfully stopped")


if __name__ == "__main__":
    with DaemonContext():
        run_server()
Edit: I tried adding a log file to my daemon and to my nohup to catch and print whatever is breaking it, but upon exit there is no output at all. Additionally, I tried disowning the background process, but that didn't work either.
The OS is Amazon Linux. Here's a link to the codebase in case you would like to reproduce it: https://github.com/DavidTeju/Tweet-Generator
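One thing that might help diagnose this is making sure the daemon can write a traceback anywhere at all. A hedged sketch (the log paths are illustrative, not from the repository) that points DaemonContext at real files and enables faulthandler so even a hard crash leaves a trace:

import faulthandler
from daemon import DaemonContext

out = open('/home/ec2-user/server.out', 'a+')  # hypothetical paths
err = open('/home/ec2-user/server.err', 'a+')

with DaemonContext(stdout=out, stderr=err):
    faulthandler.enable(file=err)  # dumps a traceback even on fatal signals
    run_server()

If the files stay empty and the process still vanishes, the cause is likely external to the code (for example the kernel's OOM killer, which dmesg would show) rather than an uncaught exception.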
I have a script that runs every 5 minutes and performs some actions (it checks whether products are back in stock and notifies me when they are).
I only want a single instance of apscheduler running, because I do not want a website being checked multiple times within a 5-minute window.
Here is my code:
import time
from datetime import datetime

import requests
from apscheduler.schedulers.background import BackgroundScheduler

sched = BackgroundScheduler()


def check1():
    requests.get("https://somewebsite.com/product-i-want")
    # check if item in stock
    # notify me


def check2():
    requests.get("https://someotherwebsite.com/another-product-i-want")
    # check if item in stock
    # notify me


def main():
    # Schedule jobs to run every 5 min
    sched.add_job(check1, 'interval', minutes=5, max_instances=1)
    sched.add_job(check2, 'interval', minutes=5, max_instances=1)

    # Also run jobs on start
    for job in sched.get_jobs():
        job.modify(next_run_time=datetime.now())

    # Start jobs
    sched.start()

    # Keep-alive
    try:
        while True:
            time.sleep(2)
    except (KeyboardInterrupt, SystemExit):
        sched.shutdown()


if __name__ == '__main__':
    main()
And then I have a shell script that is run:
screen -X -S scrape quit # quit screen with name 'scrape'
screen -dmS scrape python3 Scrapers.py # create screen with name 'scrape' to run python script
I am constantly adding jobs to this script, so I have a cronjob that calls the above shell script every hour to kill the currently running script and restart it.
But having a cronjob call a shell script to refresh this Python script is a little counterintuitive. My original thought was to give each job an id, but it seems like sched.get_jobs() returns empty after you run sched.start().
Is my understanding of BackgroundScheduler completely incorrect? Is there a better way to ensure that only a single instance of certain apscheduler jobs runs, even if the script crashes? I am using apscheduler v3.9.1 (latest).
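Not from the original code, but one way to avoid the hourly kill-and-restart dance is to give each job an explicit id together with replace_existing=True and a persistent job store; restarting the script then updates the stored jobs instead of scheduling duplicates. A sketch under those assumptions (the sqlite path is illustrative, and the sqlalchemy extra must be installed):

from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore

sched = BackgroundScheduler(
    jobstores={'default': SQLAlchemyJobStore(url='sqlite:///jobs.sqlite')}
)

# Re-adding a job with the same id updates the stored job instead of
# creating a second copy, so restarts never schedule duplicate checks.
sched.add_job(check1, 'interval', minutes=5, id='check1',
              max_instances=1, replace_existing=True)
sched.add_job(check2, 'interval', minutes=5, id='check2',
              max_instances=1, replace_existing=True)

sched.start()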
I have a long-running process, and I want to keep track of which state it is currently in. There are N processes running at the same time, hence the multiprocessing issue.
I pass a Queue into each process so it can report messages about its state, and this Queue is then read (if not empty) in a thread every couple of seconds.
I'm using Spyder on Windows as my environment, and the behavior described below occurs in its console. I did not try it in a different environment.
from multiprocessing import Process, Queue, Lock
import time

from tqdm import tqdm


def test(process_msg: Queue):
    try:
        process_msg.put('Inside process message')
        # process...
        return  # to have exit code 0
    except Exception as e:
        process_msg.put(e)


callback_msg = Queue()

if __name__ == '__main__':
    p = Process(target=test,
                args=(callback_msg,))
    p.start()

    time.sleep(5)
    print(p)

    while not callback_msg.empty():
        msg = callback_msg.get()
        if type(msg) != Exception:
            tqdm.write(str(msg))
        else:
            raise msg
The problem is that whatever I do with the code, it never reads what is inside the Queue (and also nothing ever gets put into it). It only works when I switch to the dummy version, which runs similarly to threading on only 1 CPU: from multiprocessing.dummy import Process, Queue, Lock.
Apparently the test function has to be in a separate file (see the sketch below).
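For reference, a minimal sketch of that "separate file" workaround (the file names are made up): under Windows' spawn start method the child process re-imports the module that defines the target, so the target has to live in an importable file rather than only in the interactive console.

# worker.py (hypothetical module)
from multiprocessing import Queue

def test(process_msg: Queue):
    process_msg.put('Inside process message')

# main.py (hypothetical script, run as a file rather than pasted into the console)
from multiprocessing import Process, Queue
from worker import test

if __name__ == '__main__':
    q = Queue()
    p = Process(target=test, args=(q,))
    p.start()
    p.join()
    print(q.get())  # -> 'Inside process message'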
I use Tornado as a web server. Users can submit a task through the front-end page, and after auditing they can start the submitted task. In this situation, I want to start an asynchronous subprocess to handle the task, so I wrote the following code in a request handler:
import multiprocessing

def task_handler():
    # handle task here
    ...

def start_a_process_for_task():
    p = multiprocessing.Process(target=task_handler, args=())
    p.start()
    return 0
I don't care about the subprocess; I just want to start a process for the task, return to the front-end page, and tell the user the task has started. The task itself runs in the background and records its status or results to the database, so the user can view them on the web page later. So I don't want to use p.join(), which is blocking, but without p.join() the subprocess becomes a defunct process after the task finishes, and since Tornado runs as a daemon and never exits, the defunct process never disappears.
Does anyone know how to fix this problem? Thanks.
The proper way to avoid defunct children is for the parent to gracefully clean up and close all resources of the exited child. This is normally done by join(), but if you want to avoid that, another approach could be to set up a global handler for the SIGCHLD signal on the parent.
SIGCHLD will be emitted whenever a child exits, and in the handler function you should either call Process.join() if you still have access to the process object, or use os.wait() to "wait" for any child process to terminate and properly reap it. The wait is effectively instantaneous here, as you know for sure a child process has just exited. You will also get the process's exit code / termination signal, so this can also be a useful way to handle and log child process crashes.
Here's a quick example of doing this:
from __future__ import print_function

import os
import signal
import time
from multiprocessing import Process


def child_exited(sig, frame):
    pid, exitcode = os.wait()
    print("Child process {pid} exited with code {exitcode}".format(
        pid=pid, exitcode=exitcode
    ))


def worker():
    time.sleep(5)
    print("Process {pid} has completed its work".format(pid=os.getpid()))


def parent():
    children = []
    # Comment out the following line to see zombie children
    signal.signal(signal.SIGCHLD, child_exited)
    for i in range(5):
        c = Process(target=worker)
        c.start()
        print("Parent forked out worker process {pid}".format(pid=c.pid))
        children.append(c)
        time.sleep(1)

    print("Forked out {c} workers, hit Ctrl+C to end...".format(c=len(children)))
    while True:
        time.sleep(5)


if __name__ == '__main__':
    parent()
One caveat is that I am not sure whether this approach works on non-Unix operating systems. It should work on Linux, macOS, and other Unixes; signal.SIGCHLD is not available on Windows.
You need to join your subprocesses if you do not want to create zombies, and you can do the joining in threads.
This is a dummy example. After 10 seconds, all your subprocesses are gone instead of lingering as zombies. It launches a thread for every subprocess. The threads do not need to be joined or waited on: each thread starts a subprocess, joins it, and exits as soon as the subprocess completes.
import multiprocessing
import threading
from time import sleep


def task_processor():
    sleep(10)


class TaskProxy(threading.Thread):
    def __init__(self):
        super(TaskProxy, self).__init__()

    def run(self):
        p = multiprocessing.Process(target=task_processor, args=())
        p.start()
        p.join()


def task_handler():
    t = TaskProxy()
    t.daemon = True
    t.start()
    return

for _ in xrange(0, 20):
    task_handler()

sleep(60)
I am using Python 2.7, and a Python thread doesn't kill its process after the main program exits (I am checking this with the ps -ax command on an Ubuntu machine).
I have the below thread class,
import os
import threading


class captureLogs(threading.Thread):
    '''
    initialize the constructor
    '''
    def __init__(self, deviceIp, fileTag):
        threading.Thread.__init__(self)
        super(captureLogs, self).__init__()
        self._stop = threading.Event()
        self.deviceIp = deviceIp
        self.fileTag = fileTag

    def stop(self):
        self._stop.set()

    def stopped(self):
        return self._stop.isSet()

    '''
    define the run method
    '''
    def run(self):
        '''
        Make the thread capture logs
        '''
        cmdTorun = "adb logcat > " + self.deviceIp + '_' + self.fileTag + '.log'
        os.system(cmdTorun)
And I am creating a thread in another file sample.py,
import logCapture
import os
import time
c = logCapture.captureLogs('100.21.143.168','somefile')
c.setDaemon(True)
c.start()
print "Started the log capture. now sleeping. is this a dameon?", c.isDaemon()
time.sleep(5)
print "Sleep tiime is over"
c.stop()
print "Calling stop was successful:", c.stopped()
print "Thread is now completed and main program exiting"
I get the below output from the command line:
Started the log capture. now sleeping. is this a daemon? True
Sleep time is over
Calling stop was successful: True
Thread is now completed and main program exiting
And the sample.py exits.
But when I use below command on a terminal,
ps -ax | grep "adb"
I still see the process running. (I am killing them manually now using the kill -9 17681 17682)
Not sure what I am missing here.
My question is,
1) Why is the process still alive when I have already killed it in my program?
2) Will it create any problems if I don't bother about it?
3) Is there any better way to capture logs using a thread and monitor them?
EDIT: As suggested by #bug Killer, I added the below method in my thread class,
def getProcessID(self):
    return os.getpid()
and used os.kill(c.getProcessID(), SIGTERM) in my sample.py. The program doesn't exit at all.
It is likely because you are using os.system in your thread. The process spawned by os.system will stay alive even after the thread is killed. In fact, it will stay alive forever unless you explicitly terminate it in your code or by hand (which it sounds like you are ultimately doing), or until the spawned process exits on its own. You can do this instead:
import atexit
import subprocess
deviceIp = '100.21.143.168'
fileTag = 'somefile'
# this is spawned in the background, so no threading code is needed
cmdTorun = "adb logcat > " + deviceIp +'_'+fileTag+'.log'
proc = subprocess.Popen(cmdTorun, shell=True)
# or register proc.kill if you feel like living on the edge
atexit.register(proc.terminate)
# Here is where all the other awesome code goes
Since all you are doing is spawning a process, creating a thread to do it is overkill and only complicates your program logic. Just spawn the process in the background as shown above and then let atexit terminate it when your program exits. And/or call proc.terminate explicitly; it should be fine to call repeatedly (much like close on a file object) so having atexit call it again later shouldn't hurt anything.
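For example, an explicit shutdown at the end of the script (placement is up to you) could look like the following, with wait() added so the adb process is also reaped rather than left as a zombie:

proc.terminate()  # ask the adb logcat process to exit
proc.wait()       # collect its exit status so no zombie remains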