How to schedule a script to run periodically? - python

I have a task to write a Python script which has to parse a web page once a week. I wrote the script but do not know how to make it run once a week. Could someone share some advice and suggest a possible solution?

Have a look at cron. It's not Python, but in my opinion it fits the job much better. For example, this crontab entry runs the script once a week:
@weekly python path/to/your/script
A similar question was discussed here.

Whether the script itself should repeat a task periodically usually depends on how frequently the task should repeat. Once a week is usually better left to a scheduling tool like cron or at.
However, a simple method inside the script is to wrap your main logic in a loop that sleeps until the next desired starting time, and let the script run continuously. (Note that a script cannot reliably restart itself, or at least showing how to do so is beyond the scope of this question. Prefer an external solution.)
Instead of
def main():
    ...

if __name__ == '__main__':
    main()
use
import time

one_week = 7 * 24 * 3600  # Seconds in a week

def main():
    ...

if __name__ == '__main__':
    while True:
        start = time.time()
        main()
        stop = time.time()
        elapsed = stop - start
        # Never pass a negative value to sleep if main() overran a week
        time.sleep(max(0, one_week - elapsed))
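One caveat with the loop above: small overheads accumulate, and one overlong run shifts every later run. A minimal sketch of one way around that, assuming the runs should stay aligned to a fixed first start time. The names next_start and run_on_schedule, and the max_runs, clock and sleep parameters, are my own and exist only for illustration and testing:

```python
import time

ONE_WEEK = 7 * 24 * 3600  # seconds in a week


def next_start(first_start, now, period=ONE_WEEK):
    """Return the first scheduled start time strictly after `now`."""
    periods_done = (now - first_start) // period + 1
    return first_start + periods_done * period


def run_on_schedule(task, first_start, period=ONE_WEEK, max_runs=None,
                    clock=time.time, sleep=time.sleep):
    """Call `task` once per period, aligned to `first_start`."""
    runs = 0
    while max_runs is None or runs < max_runs:
        task()
        runs += 1
        # Sleep until the next slot on the fixed timetable, so a slow
        # run does not push every later run back.
        now = clock()
        sleep(max(0, next_start(first_start, now, period) - now))
```

With the defaults, run_on_schedule(main, first_start=time.time()) behaves like the loop above but keeps each start on the same weekly grid. An external scheduler such as cron is still the more robust choice.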

Are you planning to run it locally? Are you working with a virtual environment?
Task scheduler option
If you are running it locally, you can use the Windows Task Scheduler. I found setting up the task a bit tricky, so here is an overview:
Open Task Scheduler > Create Task (in the Actions menu on the right).
In the "General" tab, name the task.
In the "Triggers" tab, define your triggers (i.e. when you want the task to run).
In the "Actions" tab, click New > Start a program. Under Program/script, enter the full path of your Python executable (python.exe). If you are working with a virtual environment, it is typically in venv\Scripts\python.exe, so the full path would be C:\your_workspace_folder\venv\Scripts\python.exe. Otherwise it will most likely be in your Program Files.
In the same tab, under Add arguments, enter the full path to your Python script, for instance "C:\your_workspace_folder\main.py" (note that the quotation marks are needed).
Press OK and save your task.
Debugging
To test whether your schedule works, you could right-click the task in Task Scheduler and press Run. However, you then don't see the logs of what is happening. I therefore recommend opening a terminal (e.g. cmd) and typing the following:
C:\your_workspace_folder\venv\Scripts\python.exe "C:\your_workspace_folder\main.py"
This lets you see the full traceback of your code and whether it is running properly. Typical errors are related to file paths (e.g. using a relative path instead of the full path).
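Since relative paths are the usual culprit, it can help to build paths from the script's own location and to log to a file, so that runs started by Task Scheduler leave a trace. A sketch, assuming a log file next to the script. The names resource_path and setup_logging, and the logger name, are hypothetical and only for illustration:

```python
import logging
from pathlib import Path


def resource_path(name, base=None):
    """Return a path for `name` inside the folder that holds this script."""
    if base is None:
        # __file__ is the path of the running script, independent of
        # the working directory that the scheduler happens to use.
        base = Path(__file__).resolve().parent
    return Path(base) / name


def setup_logging(log_file):
    """Send log records to a file so scheduled runs leave a trace."""
    logger = logging.getLogger("scheduled_task")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_file, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

In main.py this could be used as log = setup_logging(resource_path("task.log")), followed by log.info(...) calls at the points you want to trace.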
Sleep mode
It can happen that some tasks do not run because you don't have administrator privileges and your computer goes into sleep mode. A workaround I found is to keep the computer from going to sleep using a .vbs script. Open Notepad and create a new file named idle.vbs (the extension must be .vbs, so make sure you select "All Files" as the file type when saving). Paste the following code into it:
Dim objResult
Set objShell = WScript.CreateObject("WScript.Shell")
Do While True
    objResult = objShell.sendkeys("{NUMLOCK}{NUMLOCK}")
    Wscript.Sleep (60000)
Loop

Related

How to run a python script after an x amount of time it has just finished?

I want to be able to run my Python script, which scans for something related to cryptocurrencies, 1 minute after the same script has just completed. Is this possible? Or could I maybe set a limit before the loop repeats itself or something?
The code I have isn't something I am willing to share due to its sensitive nature. It is a trading bot.
You have a few solutions:
Use a cron job if you are on a unix like platform (you can use this for the syntax, or man cron in the terminal to learn more about it)
Create a long running python script that sleeps for one minute before executing your logic again. Something like this:
import time

while True:
    # execute code here
    time.sleep(60)
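Because time.sleep(60) runs after the work finishes, the loop above already waits one minute after each completed run, which is what the question asks for. A slightly more defensive sketch, assuming you want the loop to survive an exception in a single run. The names run_repeatedly, max_runs and the injectable sleep are illustrative and not from any library:

```python
import time


def run_repeatedly(task, delay=60, max_runs=None, sleep=time.sleep):
    """Run `task`, wait `delay` seconds after it finishes, then repeat."""
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            task()
        except Exception as exc:
            # Keep the loop alive after a failed run; report and carry on.
            print(f"run failed: {exc!r}")
        runs += 1
        sleep(delay)
    return runs
```

Calling run_repeatedly(scan) with your own scan function loops forever. The max_runs argument is only there to make the loop easy to try out.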
If you are running it on Windows, you can create a batch file that runs your script in cmd using the following command:
start "" "path_to_python.exe" "path_to_python_script"
Then create a task in Windows Task Scheduler that runs the batch file.
You can refer to https://dev.to/tharindadilshan/running-a-python-script-every-x-minutes-in-windows-10-3nm9 . It might help.

best practice to deploy a python script

I have a script for a bot deployed on Azure that has to be running at all times. It's a Python bot that tracks Twitter mentions in real time by opening a stream listener.
The script fails every once in a while for reasons not directly related to the script (timeouts, connection errors, etc). After searching for answers around here I found this piece of code as the best workaround for restarting the script every time it fails.
#!/usr/bin/env python3.7
import os

def run_bot():
    while True:
        try:
            os.system("test_bot.py start")
        except:
            pass

if __name__ == "__main__":
    run_bot()
I am logging all error messages to learn the reasons why it fails, but I think there must be a better way to achieve the same. I would very much appreciate some hints.
This is the wrong way to run a script: you are running it in a while loop forever.
A better way is to schedule your main script as a cron job: Execute Python script via crontab
You can schedule this job to run every minute, every hour, or at a specific time of day; it's up to you.
If you want something to run all the time, like a system monitor, then running that part inside a while True loop is fine. An example is a loop that checks the temperature, writes it to a file, and sleeps for 5 seconds.
Sample pseudocode for the script, prog.py:
import time

while True:
    log_temp()  # placeholder: read the temperature and write it to a file
    time.sleep(5)  # wait 5 seconds
But if the script fails, schedule something to restart it. Don't start the script inside another while loop.
Something like this: https://unix.stackexchange.com/questions/107939/how-to-restart-the-python-script-automatically-if-it-is-killed-or-dies
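For an external restarter to notice a failure, the script has to exit with a non-zero status when it crashes. Note also that os.system returns the exit status instead of raising, so the try/except in the question never sees a failed run. A minimal sketch, assuming the bot's entry point is a plain function (run_once and start_bot are hypothetical names):

```python
import logging
import sys


def run_once(task):
    """Run `task`; return 0 on success and 1 on any unhandled error."""
    try:
        task()
    except Exception:
        # Log the full traceback, then report failure to the caller.
        logging.exception("bot crashed")
        return 1
    return 0


# In the real script (start_bot is a placeholder for your entry point):
#     sys.exit(run_once(start_bot))
```

Whatever tool supervises the process can then decide whether and when to restart it based on that status.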

Python restart file on windows or linux

I currently have a big long file that queries a webpage to build a dictionary. I'd like this file to restart at 4am every day as the webpage will have updated with fresh info. What do I need to put inside my while True: loop?
Current status:
##Various Imports
##Selenium code to get details
##Dictionary Compile
while True:
    now = datetime.datetime.now()
    current_time = now.strftime("%H:%M:%S")
    if current_time == ("04:00:00"):
        ##The Code to Restart the process goes here
    else:
        #Other Stuff happens with the dictionary
Building and testing on windows but will ultimately run on a Raspberry Pi
I would suggest you make the script simply get the information from the web page, do whatever it needs to do with your dictionary, and exit, running only once. Then you can schedule this script to run at 4:00 AM every day with Windows Task Scheduler or with a cron job on Linux. Here is a link on how to set up a cron job to run a python script.
If you want the functionality of a cron job in Python you could use the schedule library. It looks like it has support to keep the script running and then restart at 4AM. More can be read on this StackOverflow Question.
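If the script does have to stay running, comparing the clock string to "04:00:00" is fragile, because the loop can skip that exact second when its body takes longer than a second. A sketch using only the standard library (not the schedule library mentioned above) that computes how long to sleep instead. seconds_until is a hypothetical helper, and it assumes naive local datetimes, so it ignores daylight saving changes:

```python
from datetime import datetime, time, timedelta


def seconds_until(target, now):
    """Seconds from `now` until the next occurrence of wall-clock `target`."""
    candidate = datetime.combine(now.date(), target)
    if candidate <= now:
        # Today's slot has already passed, so use tomorrow's.
        candidate += timedelta(days=1)
    return (candidate - now).total_seconds()
```

The loop would then sleep for seconds_until(time(4, 0), datetime.now()) seconds and rebuild the dictionary when it wakes up.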

Methods to schedule a task prior to runtime

What are the best methods to set a .py file to run at one specific time in the future? Ideally, I'd like to do everything within a single script.
Details: I often travel for business so I built a program to automatically check me in to my flights 24 hours prior to takeoff so I can board earlier. I currently am editing my script to input my confirmation number and then setting up cron jobs to run said script at the specified time. Is there a better way to do this?
Options I know of:
• current method
• put code in the script to delay until x time. Run the script immediately after booking the flight and it would stay open until the specified time, then check me in and close. This would prevent me from shutting down my computer, though, and my machine is prone to overheating.
Ideal method: input my confirmation number & flight date, run the script, have it set up whatever cron automatically, be done with it. I want to make sure whatever method I use doesn't include keeping a script open and running in the background.
cron is best for jobs that you want to repeat periodically. For one-time jobs, use at or batch.
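To get closer to the "run one script and be done" workflow, the script could queue itself with at. A rough sketch, assuming a POSIX at that accepts -t with a [[CC]YY]MMDDhhmm timestamp in local time. The helper at_command and the 24-hour lead are illustrative names and values:

```python
from datetime import datetime, timedelta


def at_command(departure, lead=timedelta(hours=24)):
    """Build an `at -t` invocation for `lead` before `departure`."""
    run_at = departure - lead
    # `at -t` takes a timestamp in [[CC]YY]MMDDhhmm[.SS] form.
    return ["at", "-t", run_at.strftime("%Y%m%d%H%M")]
```

The resulting list can be passed to subprocess.run, with the command to execute supplied on standard input, for example subprocess.run(cmd, input="python3 /path/to/checkin.py\n", text=True, check=True), where the script path is a placeholder. The machine still has to be on, with the at daemon running, at the scheduled time.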

Python ( or maybe linux in general) file operation flow control or file lock

I am using a cluster of computers to do some parallel computation. My home directory is shared across the cluster. On one machine, I have Ruby code that creates bash scripts containing the computation commands and writes them to, say, the ~/q/ directory. The scripts are named *.worker1.sh, *.worker2.sh, etc.
On 20 other machines, I have Python code running (one instance per machine) that constantly checks the ~/q/ directory and looks for jobs that belong to that machine, using Python code like this:
jobs = glob.glob('q/*.worker1.sh')
[os.system('sh ' + job + ' &') for job in jobs]
For some additional control, the Ruby code creates an empty file like workeri.start (i = 1..20) in the q directory after it writes the bash script there, and the Python code checks for that 'start' file before it runs the above code. In the bash script, if the command finishes successfully, the script creates an empty file like 'workeri.success'. The Python code checks for this file after it runs the above code to make sure the computation finished successfully. If it did, Python removes the 'start' file in the q directory, so the Ruby code knows that the job finished successfully. After all 20 bash scripts have finished, the Ruby code creates new bash scripts, Python reads and executes them, and so on.
I know this is not an elegant way to coordinate the computation, but I haven't figured out a better way to communicate between different machines.
Now the question: I expect the 20 jobs to run more or less in parallel, so the total time to finish all 20 should not be much longer than the time to finish one. However, the jobs seem to run sequentially, and the total time is much longer than I expected.
I suspect that part of the reason is that multiple processes are reading and writing the same directory at once, but the Linux system or Python locks the directory and only allows one process to operate on it at a time. This would make the code execute one job at a time.
I am not sure if this is the case. If I split the bash scripts into different directories, and let the Python code on different machines read and write different directories, will that solve the problem? Or are there other reasons that could cause the problem?
Thanks a lot for any suggestions! Let me know if I didn't explain anything clearly.
Some additional info:
my home directory is at /home/my_group/my_home, here is the mount info for it
:/vol/my_group on /home/my_group type nfs (rw,nosuid,nodev,noatime,tcp,timeo=600,retrans=2,rsize=65536,wsize=65536,addr=...)
By constantly checking the q directory, I mean a Python loop like this:
while True:
    if 'start' file exists:
        find the scripts and execute them as I mentioned above
"I know this is not an elegant way to coordinate the computation, but I haven't figured out a better way to communicate between different machines."
While this isn't directly what you asked, you should really, really consider fixing your problem at this level. Using some sort of shared message queue is likely to be a lot simpler to manage and debug than relying on the locking semantics of a particular networked filesystem.
In my experience, the simplest solution to set up and run is Redis on the machine currently running the Ruby script that creates the jobs. It should literally be as simple as downloading the source, compiling it and starting it up. Once the Redis server is up and running, you change your code to append your computation commands to one or more Redis lists. In Ruby you would use the redis-rb library like this:
require "redis"
redis = Redis.new
# Your other code to build up command lists...
redis.lpush 'commands', command1, command2...
If the computations need to be handled by certain machines, use a list per-machine like this:
redis.lpush 'jobs:machine1', command1
# etc.
Then in your Python code, you can use redis-py to connect to the Redis server and pull jobs off the list like so:
import os
from redis import Redis

# decode_responses=True makes lpop return str instead of bytes
r = Redis(host="hostname-of-machine-running-redis", decode_responses=True)
while r.llen('jobs:machine1'):
    job = r.lpop('jobs:machine1')
    os.system('sh ' + job + ' &')
Of course, you could just as easily pull jobs off the queue and execute them in Ruby:
require 'redis'
redis = Redis.new(:host => 'hostname-of-machine-running-redis')
while redis.llen('jobs:machine1') > 0
  job = redis.lpop('jobs:machine1')
  `sh #{job} &`
end
With some more details about the needs of the computation and the environment it's running in, it would be possible to recommend even simpler approaches to managing it.
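On the worker side, whichever queue is used, exit codes can stand in for the 'success' marker files. A minimal sketch with the standard library's subprocess module, assuming each job is given as an argument list such as ['sh', 'path/to/job.sh'] (run_jobs is a hypothetical helper):

```python
import subprocess


def run_jobs(commands):
    """Start every command at once, then wait and collect exit codes."""
    # Popen returns immediately, so all jobs run concurrently.
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]
```

An exit code of 0 means the job succeeded. Anything else can be reported back or pushed onto a retry list.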
Try a while loop? If that doesn't work, on the Python side try using a try statement like so:
try:
    with open("myfile.whatever", "r") as f:
        f.read()
except OSError:
    # Do something if it doesn't work, perhaps just pass
    # (this must run inside a loop to keep checking).
    pass
else:
    # Execute your code if the read succeeded.
    pass
