Good day.
I have a question about the correct way of implementing code that needs to run every 5 minutes.
Is it better to:
A - have a loop inside the code that sleeps for 5 minutes and then executes,
B - have a script that runs every 5 minutes and executes your application, or
C - something else?
Background: this will be running on Windows Server 2022, to send mail every 5 minutes if certain conditions are met.
Thank you.
B.) The "script" is called Windows Task Scheduler and comes with permission management etc. A Windows server admin can tell you about it.
Why?
Your app might have memory leaks (well, Python not so much), and it runs more stably when it is restarted every time.
An app that sleeps still uses memory, which may be swapped to disk and read back when it wakes up. If the app terminates, the memory is freed and never swapped to disk.
Your app may crash and then no longer do what you expect at every interval.
The user may (accidentally?) terminate your app, with the same effect.
Why not / when not?
If initializing the app (e.g. reading data from a database or disk) takes a long time (especially longer than the sleep interval).
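If you go with B, the script that Task Scheduler launches can simply do one pass and exit. A minimal sketch, assuming the condition check, the addresses and the SMTP host are placeholders you would replace with your own logic:

import smtplib
from email.message import EmailMessage

def check_conditions():
    # Placeholder: query a database, scan a log file, etc.
    return False

def send_mail():
    msg = EmailMessage()
    msg["Subject"] = "Condition met"
    msg["From"] = "alerts@example.com"       # placeholder addresses
    msg["To"] = "admin@example.com"
    msg.set_content("The monitored condition was met.")
    with smtplib.SMTP("localhost") as smtp:  # placeholder SMTP host
        smtp.send_message(msg)

if __name__ == "__main__":
    # One pass, then exit; Task Scheduler launches this every 5 minutes.
    if check_conditions():
        send_mail()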
Related
I want to run a script, or at least the block of code in it, every 12 hours. How bad is it / how many resources will I be wasting by using:
while True:
    # My Code
    time.sleep(43200)
Is there an easier and more efficient way to accomplish this?
I'd recommend using apscheduler if you need to run the code once an hour (or less often):
from apscheduler.schedulers.blocking import BlockingScheduler

def main():
    # Do something
    pass

scheduler = BlockingScheduler()
scheduler.add_job(main, "interval", hours=12)
scheduler.start()
apscheduler provides more controlled and consistent timing of when the operation will be executed. For example, if you want to run something every 12 hours but the processing takes 11 hours, then a sleep-based approach would end up executing every 23 hours (11 hours running + 12 hours sleeping).
This timing is not exact: sleep only guarantees a minimum delay, and the interval also drifts by however long the job itself takes.
At the very least, you can wake up every few seconds and check whether the target time has arrived.
This is also not a great solution because your process is less reliable than the system cron: it may hang due to unknown bugs or under high CPU/memory utilization.
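A minimal sketch of that "wake every few seconds and check" idea (run_at and my_job are illustrative names, not from any answer):

import datetime
import time

def run_at(target, job, poll_seconds=5):
    # Wake up every few seconds and check whether the target time has
    # arrived, instead of doing one long sleep.
    while datetime.datetime.now() < target:
        time.sleep(poll_seconds)
    job()

def my_job():
    print("running the 12-hour task")

run_at(datetime.datetime.now() + datetime.timedelta(hours=12), my_job)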
How can I transfer a session to another compute node with Python in the following cases?
Case 1: using Kubernetes
Case 2: using autoscaling
Case 3: using Amazon (AWS)
How can the session be transferred to another compute node with Python, so that the program can run forever?
Nope, none of those things can transfer a process with all of its in-memory and on-disk state across hosts.
If you’re looking at Kubernetes already, I’d encourage you to design your application so that it doesn’t have any local state. Everything it knows about lives in a database that’s maintained separately (if you’re into AWS, it could be an RDS hosted database or something else). Then you can easily run multiple copies of it (maybe multiple replicas in a Kubernetes ReplicaSet or Deployment) and easily kill one off to restart it somewhere else.
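As a rough sketch of that stateless pattern (the table, column names and connection URL are assumptions, not from the question), every piece of state is written to the shared database immediately, so any replica can be killed and another can carry on:

import os
import psycopg2  # assuming a PostgreSQL database, e.g. hosted on RDS

def record_progress(item_id, status):
    # No state is kept in process memory; everything lives in the database.
    conn = psycopg2.connect(os.environ["DATABASE_URL"])  # assumed env var
    with conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO work_items (id, status) VALUES (%s, %s) "
            "ON CONFLICT (id) DO UPDATE SET status = EXCLUDED.status",
            (item_id, status),
        )
    conn.close()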
One of the high-end virtualization solutions might be able to do what you're asking, but keeping a program running forever is pretty hard, particularly in a scripting language like Python. (How do you update the program? How do you update the underlying OS when it needs to reboot to take a kernel update?)
I have a Python program which runs in a loop, downloading 20k RSS feeds using feedparser and inserting the feed data into an RDBMS.
I have observed that it starts at 20-30 feeds a minute and gradually slows down. After a couple of hours it comes down to 4-5 feeds an hour. If I kill the program and restart it from where it left off, the throughput is again 20-30 feeds a minute.
It certainly is not MySQL which is slowing down.
What could be potential issues with the program?
In all likelihood the issue has to do with memory. You are probably holding the feeds in memory or somehow accumulating memory that isn't getting garbage collected. To diagnose:
Look at the size of your process (Task Manager on Windows, top on Unix/Linux) and monitor how it grows as feeds are processed.
Then you can use a memory profiler to figure out what exactly is consuming the memory.
Once you have found that, you can perhaps restructure the code.
A few tips:
Do an explicit garbage collection call (gc.collect()) after setting any relevant unused data structures to empty.
Use a multiprocessing scheme where you spawn multiple processes that each handle a smaller number of feeds (see the sketch after this list).
Maybe move to a 64-bit system if you are using a 32-bit one.
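A minimal sketch of the multiprocessing tip (parse_and_store and the URL list are stand-ins for your feedparser + RDBMS code): each worker is recycled after a batch of feeds, so whatever memory it accumulated is returned to the OS when it exits.

import multiprocessing
import feedparser

def parse_and_store(url):
    # Placeholder: parse one feed and insert its entries into the RDBMS.
    feed = feedparser.parse(url)
    # ... database inserts would go here ...
    return url

if __name__ == "__main__":
    urls = ["http://example.com/feed.xml"]  # your 20k feed URLs
    # maxtasksperchild recycles each worker process after 50 feeds.
    pool = multiprocessing.Pool(processes=4, maxtasksperchild=50)
    pool.map(parse_and_store, urls)
    pool.close()
    pool.join()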
Some suggestions for memory profilers:
https://pypi.python.org/pypi/memory_profiler
This one is quite good and the decorators are helpful
https://stackoverflow.com/a/110826/559095
I want to implement scheduled (timed) tasks without using the Unix cron command.
The situation is as follows:
About one million users in total right now, growing to about 3 million within a year;
Task types: notifications, calculations, data uploads and so on;
Timing intervals: from several minutes to one month;
Different tasks may have different logic and parameters.
The requirements are:
Better if it can be done in Python, since the server code is Python;
The timing tolerance can be within 5 seconds: if a task should be executed at 2015-01-01T00:00:00, it's ok for it to run anywhere from 2014-12-31T23:59:55 to 2015-01-01T00:00:05;
Log details for each task for each user, so it can be debugged in the future;
The task details must be persisted, because the server may go down for some reason;
If the server goes down, the tasks can be restarted after the server is fired up again.
Thanks a lot.
You could check out Fantail.
You can create multiple Fantails to accommodate your different requirements, and also see the Pickers.
var sch = new Fantail({
    debug: false,     // Expose queues and handlers.
    throttle: 200,    // Run handlers (at most once) every 200 milliseconds.
    immediate: false  // .start() immediately.
});
The schedule module is what you are looking for:

import schedule
import time

def foo():
    print("Hello world!")

schedule.every(1).minutes.do(foo)

# You can run the following loop in another thread.
while True:
    schedule.run_pending()
    time.sleep(1)
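A small sketch of the "another thread" comment above, so the scheduling loop does not block the rest of the program (run_scheduler is just an illustrative name):

import threading
import time
import schedule

def run_scheduler():
    # Run pending jobs forever; lives in a daemon thread so it does not
    # keep the process alive on its own.
    while True:
        schedule.run_pending()
        time.sleep(1)

threading.Thread(target=run_scheduler, daemon=True).start()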
I've got a very nice machine to play with over at Azure. It's got 16 cores and memory up the wazoo.
Running on it is an app I wrote that does a LOT of crunching. Basically dividing up about 100,000 text documents into ngrams and creating a document index.
I recently moved this app over from a pretty small AWS instance with about 1/20th of the processing power. I couldn't even do 40,000 records without running out of memory. It took about 30 minutes to index 30,000 records.
So now, even with all that processing power, I'm still sitting here waiting 30 minutes to crunch 30,000 records. Is it just the nature of this type of process? Or am I not really taking advantage of my resources properly?
EDIT (THE CODE EXPLANATION):
The part of the app taking the most time is a loop that uses the NLTK library to look for named entities within the text of each document. I am running the 100k documents through a process very similar to this example:
https://gist.github.com/onyxfish/322906
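For context, a rough sketch of what that gist-style named-entity loop looks like (this is an assumption about the questioner's code, not the actual app):

import nltk

def extract_entities(text):
    # Tokenize, POS-tag, and chunk named entities for one document.
    entities = []
    for sentence in nltk.sent_tokenize(text):
        tokens = nltk.word_tokenize(sentence)
        tagged = nltk.pos_tag(tokens)
        tree = nltk.ne_chunk(tagged)
        for subtree in tree:
            if hasattr(subtree, "label"):  # only the named-entity chunks
                entities.append(" ".join(word for word, tag in subtree.leaves()))
    return entities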
Some stats:
Windows Azure VM
Python 2.7 (32 bit) (Enthought Canopy Environment)
Numpy 1.7.0
If your process takes 0.3% of CPU time and takes a long time to execute, it clearly isn't CPU-bound.
If I had to guess based on the limited information provided, I'd guess that the code is I/O-bound. Write a little program that simply reads the 100,000 files and time it in the exact same execution environment. If that too is slow, you might want to consider merging the many files into few; it should improve things considerably.
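A minimal sketch of that I/O test (the directory and glob pattern are placeholders; point them at the real corpus):

import glob
import time

start = time.time()
total_bytes = 0
for path in glob.glob("documents/*.txt"):  # placeholder corpus path
    with open(path, "rb") as f:
        total_bytes += len(f.read())
print("Read %d bytes in %.1f seconds" % (total_bytes, time.time() - start))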