I have a script that runs every 5 minutes and performs some actions (it checks whether products are back in stock and notifies me when they are).
I only want a single instance of apscheduler running, because I do not want a website being checked multiple times within a 5-minute window.
Here is my code:
import time
from datetime import datetime

import requests
from apscheduler.schedulers.background import BackgroundScheduler

sched = BackgroundScheduler()

def check1():
    requests.get("https://somewebsite.com/product-i-want")
    # check if item in stock
    # notify me

def check2():
    requests.get("https://someotherwebsite.com/another-product-i-want")
    # check if item in stock
    # notify me

def main():
    # Schedule jobs to run every 5 min
    sched.add_job(check1, 'interval', minutes=5, max_instances=1)
    sched.add_job(check2, 'interval', minutes=5, max_instances=1)

    # Also run the jobs once on start
    for job in sched.get_jobs():
        job.modify(next_run_time=datetime.now())

    # Start the scheduler
    sched.start()

    # Keep-alive
    try:
        while True:
            time.sleep(2)
    except (KeyboardInterrupt, SystemExit):
        sched.shutdown()

if __name__ == '__main__':
    main()
And then I have a shell script that is run:
screen -X -S scrape quit # quit screen with name 'scrape'
screen -dmS scrape python3 Scrapers.py # create screen with name 'scrape' to run python script
I am constantly adding jobs to this script, so I have a cronjob that calls the shell script above every hour to kill the currently running script and restart it.
But having a cronjob call a shell script to refresh this Python script is a little counterintuitive. My original thought was to give each job an id, but sched.get_jobs() seems to return an empty list once sched.start() has been called.
Is my understanding of BackgroundScheduler completely incorrect? Is there a better way to ensure that only a single instance of certain apscheduler jobs runs, even if the script crashes? I am using apscheduler v3.9.1 (latest).
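One pattern I'm considering (a sketch, untested; the SQLite URL is just a placeholder): give each job an explicit id, persist jobs in a job store, and pass replace_existing=True so a restarted script overwrites its old jobs instead of scheduling duplicates:

from apscheduler.schedulers.background import BackgroundScheduler
from apscheduler.jobstores.sqlalchemy import SQLAlchemyJobStore

# the sqlite file name is a placeholder; any SQLAlchemy URL works
sched = BackgroundScheduler(
    jobstores={'default': SQLAlchemyJobStore(url='sqlite:///jobs.sqlite')}
)

# an explicit id plus replace_existing=True means a restarted script
# overwrites the stored job rather than scheduling a second copy;
# coalesce=True collapses any runs that were missed while it was down
sched.add_job(check1, 'interval', minutes=5, id='check1',
              replace_existing=True, max_instances=1, coalesce=True)
sched.start()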
I have a script that I need to be running 24/7, but I cannot seem to stop EC2 from killing the process.
I've tried daemonizing it with python-daemon,
I've tried nohup,
I've tried adding & at the end of the command to make it a background process,
I've tried screen to assign it to a virtual session.
All of these work temporarily, but when I check an hour later with ps aux | grep python, the process is no longer there and no longer running.
I've looked at the console output and the nohup.out file to see if it's crashing because of an error, but there is no error and no output.
I've used signal handlers:
signal.signal(signal.SIGINT, exit_gracefully)
signal.signal(signal.SIGTERM, exit_gracefully)
Still nothing. I suspected there might be an error escaping my tests, but I ran the script in an open console for an hour and it worked perfectly, so something unexpected must be happening that is unrelated to my code.
An excerpt:
import signal
import time

from daemon import DaemonContext
import schedule

exit_now = False

def exit_gracefully(*args):
    global exit_now
    exit_now = True

def run_server():
    global exit_now
    # a_function, another_function, param1/param2, function_with_no_params,
    # and my_model are defined elsewhere in the codebase
    schedule.every(3).seconds.do(lambda: a_function(param1, param2))
    schedule.every(3).hours.do(lambda: another_function(param1, param2))
    schedule.every(10).minutes.do(function_with_no_params)

    signal.signal(signal.SIGINT, exit_gracefully)
    signal.signal(signal.SIGTERM, exit_gracefully)

    exit_now = False
    while not exit_now:
        schedule.run_pending()
        time.sleep(1)
    my_model.backup()
    print("Processes successfully stopped")

if __name__ == "__main__":
    with DaemonContext():
        run_server()
Edit: I tried adding a log file to both my daemon and my nohup invocation to catch and print whatever is breaking it, but upon exit there is nothing, no output at all. Additionally, I tried disowning the background process, but that didn't work either.
The OS is Amazon Linux. Here's a link to the codebase in case you would like to reproduce it: https://github.com/DavidTeju/Tweet-Generator
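For completeness, here is a sketch (untested) of how I could point DaemonContext's stdout and stderr at files so a crash inside the daemon leaves a trace; the paths are placeholders:

from daemon import DaemonContext

# placeholder log paths; DaemonContext accepts open file objects
out = open('/tmp/tweet-generator.out', 'a+')
err = open('/tmp/tweet-generator.err', 'a+')

with DaemonContext(stdout=out, stderr=err):
    run_server()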
EDIT: I found the issue. It was a problem with PyCharm: the .py worked as expected when run outside of PyCharm. After enabling "Emulate terminal in output console" in PyCharm, it now also works there...
Expectations:
APScheduler spawns a thread that checks a website for something.
If that something is found (possibly multiple instances of it), the thread spawns one or more processes to download it.
After five seconds the next check thread spawns, while earlier downloads may continue in the background.
Problem:
The spawned processes never cease to exist, which breaks other parts of the code (not included), because I need to check whether the processes are done, etc.
If I use a simple time.sleep(5) instead (see code), it works as expected.
No, I cannot set max_instances to 1, because that would stop the scheduled job from running whenever a download process is still active.
Code:
import datetime
import multiprocessing

from apscheduler.schedulers.background import BackgroundScheduler

class DownloadThread(multiprocessing.Process):
    def __init__(self):
        super().__init__()
        print("Process started")

def main():
    print(multiprocessing.active_children())
    # prints: [<DownloadThread name='DownloadThread-1' pid=3188 parent=7088 started daemon>,
    #          <DownloadThread name='DownloadThread-3' pid=12228 parent=7088 started daemon>,
    #          <DownloadThread name='DownloadThread-2' pid=13544 parent=7088 started daemon>,
    #          ...
    #         ]
    new_process = DownloadThread()
    new_process.daemon = True
    new_process.start()
    new_process.join()

if __name__ == '__main__':
    sched = BackgroundScheduler()
    sched.add_job(main, 'interval', args=(), seconds=5,
                  max_instances=999, next_run_time=datetime.datetime.now())
    sched.start()
    while True:
        # main()        # works: processes despawn
        # time.sleep(5)
        input()
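One direction I'm considering (a sketch, untested): multiprocessing.active_children() joins any already-finished processes as a side effect, so polling it instead of calling join() would keep the check thread unblocked while still reaping finished downloads:

import multiprocessing

def reap_finished():
    # active_children() joins already-finished processes as a side effect,
    # so finished downloads disappear from the list on the next call
    alive = multiprocessing.active_children()
    print(f"{len(alive)} download process(es) still running")
    return alive

Calling reap_finished() at the start of each scheduled check, instead of join() (which blocks the check for the whole download), would let the five-second interval keep firing.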
I'm trying to make a graphing website that tracks the progress (XP) of Discord users with a bot called MEE6; here is my repl. Right now I'm using threading to create two separate threads: one for a web server, and one containing a while loop with the function inside:
def func():
    while True:
        backend.get_details()
        time.sleep(86400)
This should make the function run every 24 hours, but as evidenced by the time stamps in the database:
"05-November-2021 00:02:58": 2106855,
"05-November-2021 00:52:48": 2106855,
"05-November-2021 01:23:21": 2106855,
"05-November-2021 03:48:13": 2106874,
"05-November-2021 07:13:40": 2106874
It is not. How can I fix this?
Here is my threading code:
from threading import Thread

def keep_alive():
    server = Thread(target=run)
    data = Thread(target=func)
    server.start()
    data.start()

def run():
    # app is the Flask app defined elsewhere in the repl
    app.run(host="0.0.0.0", port=8000)

if __name__ == '__main__':
    # s.run()
    # os.system('cls')
    keep_alive()
    # print('i')
Have you tried fixing it using the schedule package? For an example see this post:
Python script to do something at the same time every day
For running a scheduler in the background (i.e. while running an app) see this excellent post:
How to schedule a function to run every hour on Flask?
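A minimal sketch of that approach (assuming backend.get_details from the question is importable, and that running the scheduler loop in a daemon thread beside the Flask server is acceptable):

import threading
import time

import schedule
import backend  # the question's module, assumed importable

def poll():
    # run the existing job once every 24 hours
    schedule.every(24).hours.do(backend.get_details)
    while True:
        schedule.run_pending()
        time.sleep(60)  # checking once a minute is plenty at this granularity

# daemon=True so the loop dies together with the Flask process
threading.Thread(target=poll, daemon=True).start()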
My class gets work from different threads/interfaces, and it has to process that work with a configured delay time:
def getJob(self, job):
    work = self._getNextWorkToRun(job)
    if work is None:
        return {}
    # proceed to do work
Jobs are sent to this class by a different package. I want to call the _getNextWorkToRun() method only once every five minutes, but jobs arrive every second or even faster, so I have to wait five minutes before calling _getNextWorkToRun() again with a new job. Every job has a reference (JOB1, JOB2, etc.), and all the jobs have to be completed with a delay of 5 minutes between them.
What is the best way to achieve this?
Below is an example using threads: jobs can be added to a job queue at any time from any other function, and a get_job() function runs continuously, monitoring the queue and processing jobs at a fixed interval until it receives a stop flag.
from threading import Thread
from queue import Queue
import time
from random import random

jobs = Queue()  # a Queue is safe to share between threads
run_flag = True

def job_feeder():
    for i in range(10):
        # add a job to the queue; a job could be anything,
        # here we just add a string for simplicity
        jobs.put(f'job-{i}')
        print(f'adding job-{i}')
        time.sleep(random())  # simulate jobs arriving at random times
    print('job_feeder() finished')

def get_job():
    while run_flag:
        if jobs.qsize():  # check if there are any jobs in the queue first
            job = jobs.get()  # get the job
            print(f'executing {job}')
            time.sleep(3)  # fixed interval between processed jobs
    print('get_job finished')

t1 = Thread(target=job_feeder)
t2 = Thread(target=get_job)
t1.start()
t2.start()

# we can make the get_job() thread quit at any time by clearing run_flag
time.sleep(20)
run_flag = False

# wait for the threads to quit
t1.join()
t2.join()
print('all clear')
output:
adding job-0
executing job-0
adding job-1
adding job-2
adding job-3
adding job-4
adding job-5
adding job-6
adding job-7
executing job-1
adding job-8
adding job-9
job_feeder() finished
executing job-2
executing job-3
executing job-4
executing job-5
executing job-6
get_job finished
all clear
Note: get_job() processed only 6 jobs because we sent the quit signal after 20 seconds.
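A small variation (just a suggestion, not part of the example above) avoids polling qsize() by blocking on the queue with a timeout and using a threading.Event as the stop flag:

import queue
import threading

stop = threading.Event()

def get_job():
    # jobs is the same Queue as above
    while not stop.is_set():
        try:
            job = jobs.get(timeout=1)  # block instead of polling qsize()
        except queue.Empty:
            continue
        print(f'executing {job}')
        stop.wait(3)  # fixed delay between jobs, interruptible by stop.set()
    print('get_job finished')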
I want to make Python's apscheduler run in the background; here is my code:
from datetime import datetime
import logging
import sys

from apscheduler.schedulers.background import BackgroundScheduler

logging.basicConfig(level=logging.DEBUG, stream=sys.stdout)

def singleton(cls, *args, **kw):
    instances = {}
    def _singleton(*args, **kw):
        if cls not in instances:
            instances[cls] = cls(*args, **kw)
        return instances[cls]
    return _singleton

@singleton
class MyScheduler(BackgroundScheduler):
    pass

def simple_task(timestamp):
    logging.info("RUNNING simple_task: %s" % timestamp)

scheduler = MyScheduler()
scheduler.start()
scheduler.add_job(simple_task, 'interval', seconds=5, args=[datetime.utcnow()])
when I run the command:
look:Python look$ python itger.py
I just got this:
INFO:apscheduler.scheduler:Scheduler started
DEBUG:apscheduler.scheduler:Looking for jobs to run
DEBUG:apscheduler.scheduler:No jobs; waiting until a job is added
INFO:apscheduler.scheduler:Added job "simple_task" to job store "default"
And ps:
ps -e | grep python
I just got 54615 ttys000 0:00.00 grep python
My problem is how to make the code run in the background so that I can see it is running, i.e. so it prints its log line every 5 seconds.
BackgroundScheduler runs in a background thread, which in this case, I guess, doesn't prevent the application's main thread from terminating.
Try adding this at the end of your application:
import time

print("Waiting to exit")
while True:
    time.sleep(1)
... and then terminate your application with CTRL+C.
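If you also want a clean shutdown, catch the interrupt and stop the scheduler (a sketch, mirroring the keep-alive pattern from the first question):

import time

try:
    while True:
        time.sleep(1)
except (KeyboardInterrupt, SystemExit):
    # shutdown() waits for currently running jobs to finish by default
    scheduler.shutdown()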
Threads are not a magical way of making code run and be managed by the OS. They are local to your process only, so if that process terminates or dies unexpectedly, so does your thread.
So the answer to your question is: don't use threads. Write your program in a normal fashion so you can invoke it on your command line, and then use an OS-based scheduler such as cron to schedule it.
Alternatively, if your program needs to run continuously because it e.g. builds up caches that are expensive to re-compute every 5 minutes, use a process supervisor such as supervisord to ensure the program continues executing even after a reboot or crash.
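For illustration, sketches of both routes; every path and file name below is a placeholder:

# crontab entry: let cron invoke a run-once version of the checks every 5 minutes
*/5 * * * * /usr/bin/python3 /home/user/check_stock.py

# /etc/supervisor/conf.d/scrape.conf: keep a long-running script alive,
# restarting it automatically after a crash or reboot
[program:scrape]
command=/usr/bin/python3 /home/user/Scrapers.py
autostart=true
autorestart=true
stdout_logfile=/var/log/scrape.out.log
stderr_logfile=/var/log/scrape.err.log

With the supervisord route, the Python script keeps its own keep-alive loop and the supervisor handles restarts, so the hourly cron-plus-screen restart dance becomes unnecessary.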