I'm trying out threads in python. I want a spinning cursor to display while another method runs (for 5-10 mins). I've done out some code but am wondering is this how you would do it? i don't like to use globals, so I assume there is a better way?
c = True
def b():
for j in itertools.cycle('/-\|'):
if (c == True):
sys.stdout.write(j)
sys.stdout.flush()
time.sleep(0.1)
sys.stdout.write('\b')
else:
return
def a():
global c
#code does stuff here for 5-10 minutes
#simulate with sleep
time.sleep(2)
c = False
Thread(target = a).start()
Thread(target = b).start()
EDIT:
Another issue now is that when the processing ends the last element of the spinning cursor is still on screen. so something like \ is printed.
You could use events:
http://docs.python.org/2/library/threading.html
I tested this and it works. It also keeps everything in sync. You should avoid changing/reading the same variables in different threads without synchronizing them.
#!/usr/bin/python
from threading import Thread
from threading import Event
import time
import itertools
import sys
def b(event):
for j in itertools.cycle('/-\|'):
if not event.is_set():
sys.stdout.write(j)
sys.stdout.flush()
time.sleep(0.1)
sys.stdout.write('\b')
else:
return
def a(event):
#code does stuff here for 5-10 minutes
#simulate with sleep
time.sleep(2)
event.set()
def main():
c = Event()
Thread(target = a, kwargs = {'event': c}).start()
Thread(target = b, kwargs = {'event': c}).start()
if __name__ == "__main__":
main()
Related to 'kwargs', from Python docs (URL in the beginning of the post):
class threading.Thread(group=None, target=None, name=None, args=(), kwargs={})
...
kwargs is a dictionary of keyword arguments for the target invocation. Defaults to {}.
You're on the right track mostly, except for the global variable. Normally you'd needed to coordinate access to shared data like that with a lock or semaphore, but in this special case you can take a short-cut and just use whether one of the threads is running or not instead. This is what I mean:
from threading import Thread
from threading import Event
import time
import itertools
import sys
def monitor_thread(watched_thread):
chars = itertools.cycle('/-\|')
while watched_thread.is_alive():
sys.stdout.write(chars.next())
sys.stdout.flush()
time.sleep(0.1)
sys.stdout.write('\b')
def worker_thread():
# code does stuff here - simulated with sleep
time.sleep(2)
if __name__ == "__main__":
watched_thread = Thread(target=worker_thread)
watched_thread.start()
Thread(target=monitor_thread, args=(watched_thread,)).start()
This is not properly synchronized. But I will not try to explain it all to you right now because it's a whole lot of knowledge. Try to read this: http://effbot.org/zone/thread-synchronization.htm
But in your case it's not that bad that things aren't synchronized correctyl. The only thing that could happen, is that the spining bar spins a few ms longer than the background task actually needs.
Related
I have a python script which calls a series of sub-processes. They need to run "for ever" - but they occasionally die, or get killed. When this happens I need to restart the process using the same arguments as the one which died.
This is a very simplified version:
[edit: this is the less simplified version, which includes "restart" code]
import multiprocessing
import time
import random
def printNumber(number):
print("starting :", number)
while random.randint(0, 5) > 0:
print(number)
time.sleep(2)
if __name__ == '__main__':
children = [] # list
args = {} # dictionary
for processNumber in range(10,15):
p = multiprocessing.Process(
target=printNumber,
args=(processNumber,)
)
children.append(p)
p.start()
args[p.pid] = processNumber
while True:
time.sleep(1)
for n, p in enumerate(children):
if not p.is_alive():
#get parameters dead child was started with
pidArgs = args[p.pid]
del(args[p.pid])
print("n,args,p: ",n,pidArgs,p)
children.pop(n)
# start new process with same args
p = multiprocessing.Process(
target=printNumber,
args=(pidArgs,)
)
children.append(p)
p.start()
args[p.pid] = pidArgs
I have updated the example to illustrate how I want the processes to be restarted if one crashes/killed/etc - keeping track of which pid was started with which args.
Is this the "best" way to do this, or is there a more "python" way of doing this?
I think I would create a separate thread for each Process and use a ProcessPoolExecutor. Executors have a useful function, submit, which returns a Future. You can wait on each Future and re-launch the Executor when the Future is done. Arguments to the function are tracked as class variables, so restarting is just a simple loop.
import threading
from concurrent.futures import ProcessPoolExecutor
import time
import random
import traceback
def printNumber(number):
print("starting :", number)
while random.randint(0, 5) > 0:
print(number)
time.sleep(2)
class KeepRunning(threading.Thread):
def __init__(self, func, *args, **kwds):
self.func = func
self.args = args
self.kwds = kwds
super().__init__()
def run(self):
while True:
with ProcessPoolExecutor(max_workers=1) as pool:
future = pool.submit(self.func, *self.args, **self.kwds)
try:
future.result()
except Exception:
traceback.print_exc()
if __name__ == '__main__':
for process_number in range(10, 15):
keep = KeepRunning(printNumber, process_number)
keep.start()
while True:
time.sleep(1)
At the end of the program is a loop to keep the main thread running. Without that, the program will attempt to exit while your Processes are still running.
For the example you provided I would just remove the exit condition from the while loop and change it to True.
As you said though the actual code is more complicated (why didn't you post that?). So if the process gets terminated by lets say an exception just put the code inside a try catch block. You can then put said block in an infinite loop.
I hope this is what you are looking for but that seems to be the right way to do it provided the goal and information you provided.
Instead of just starting the process immediately, you can save the list of processes and their arguments, and create another process that checks they are alive.
For example:
if __name__ == '__main__':
process_list = []
for processNumber in range(5):
process = multiprocessing.Process(
target=printNumber,
args=(processNumber,)
)
process_list.append((process,args))
process.start()
while True:
for running_process, process_args in process_list:
if not running_process.is_alive():
new_process = multiprocessing.Process(target=printNumber, args=(process_args))
process_list.remove(running_process, process_args) # Remove terminated process
process_list.append((new_process, process_args))
I must say that I'm not sure the best way to do it is in python, you may want to look at scheduler services like jenkins or something like that.
I am trying to run some threads using a thread limiter to keep number of threads to 10. I had an example to use as guide but I need to pass in some arguments to the function when the calling the thread. I am struggling with the passing of arguments. I marked with ### the areas where I am not sure the syntax and where I think my problem is.
I am trying to use this The right way to limit maximum number of threads running at once? as a guide. Here is the sample code I am trying to follow. My example is below that. Any time I try to pass in all the arguments I get back TypeError: __ init__() takes 1 to 6 arguments but 17 were passed In my example below I cut the arguments down to 4 to make it easier to read but I have 17 arguments in my live code. In example I keep the arguments down to run, main_path, target_path, jiranum to keep reading easier
threadLimiter = threading.BoundedSemaphore(maximumNumberOfThreads)
class MyThread(threading.Thread):
def run(self):
threadLimiter.acquire()
try:
self.Executemycode()
finally:
threadLimiter.release()
def Executemycode(self):
print(" Hello World!")
# <your code here>
My code
import os
import sys
import threading
threadLimiter = threading.BoundedSemaphore(10)
class MyThread(threading.Thread):
def run(self): ### I also tried (run, main_path, target_path, jiranum)
threadLimiter.acquire()
try:
self.run_compare(run, main_path, target_path, jiranum) #### I also tried self
finally:
threadLimiter.release()
def run_compare(run, main_path, target_path, jiranum): #### ???
os.chdir(target_path)
os.system(main_path + ', ' + target_path + ',' + jiranum + ',' + run)
if __name__ == '__main__':
#set the needed variables
threads = []
for i in range (1, int(run)+1):
process=threading.Thread(target=MyThread, args=(str(i), main_path, target_path, jiranum)) #### Is this defined right?
process.start()
threads.append(process)
for process in threads:
process.join()
This would probably be a simpler task with concurrent.futures but I like getting my hands dirty, so here we go. A few suggestions:
I find classes as thread targets often complicate things, so if there's no compelling reason, keep it simple
It's easier to use a with block to acquire and release a semaphore, and a regular semaphore usually suffices in that case
17 arguments can get messy; I would build a tuple of the arguments outside the call to threading.Thread() so it's easier to read, then unpack the tuple in the thread
This should work as a simple example; os.system() just echoes something and sleeps, so you can see the thread count is limited by the semaphore.
import os
import threading
from random import randint
threadLimiter = threading.Semaphore(10)
def run_config(*args):
run, arg1, arg2 = args # unpack the 17 args by name
with threadLimiter:
seconds = randint(2, 7)
os.system(f"echo run {run}, args {arg1} {arg2} ; sleep {seconds}")
if __name__ == '__main__':
threads = []
run = "20" # I guess this is a string because of below?
for i in range (1, int(run)+1):
thr_args = (str(i), "arg1",
"arg2") # put the 17 args here
thr = threading.Thread(target=run_config, args=thr_args)
thr.start()
threads.append(thr)
for thr in threads:
thr.join()
I want test_func to work separately. It should print every N seconds, I've tried this, but time.sleep() just stops my program.
import time
import threading
def test_func():
time.sleep(10)
print("Printing stuff....")
t = threading.Timer(10, test_func).start()
test_func()
# something goes here constantly
So I though of doing something like this, but that doesn't work either.
import time
import threading
def test_func():
time.sleep(10)
print("Printing stuff....")
# something goes here constantly
while True:
t = threading.Timer(10, test_func).start()
So, how can it be fixed?
You don't need a timer for this. Just put your function in a thread, so:
def test_func():
while True:
print("Printing stuff....")
time.sleep(10)
threading.Thread(target=test_func).start()
Do not span more threads within while loop. Instead change code to
def test_func():
while True:
time.sleep(10)
print("Printing stuff....")
Call the test_func() here in thread.
I want to know how can I stop my program in console with CTRL+C or smth similar.
The problem is that there are two threads in my program. Thread one crawls the web and extracts some data and thread two displays this data in a readable format for the user. Both parts share same database. I run them like this :
from threading import Thread
import ResultsPresenter
def runSpider():
Thread(target=initSpider).start()
Thread(target=ResultsPresenter.runPresenter).start()
if __name__ == "__main__":
runSpider()
how can I do that?
Ok so I created my own thread class :
import threading
class MyThread(threading.Thread):
"""Thread class with a stop() method. The thread itself has to check
regularly for the stopped() condition."""
def __init__(self):
super(MyThread, self).__init__()
self._stop = threading.Event()
def stop(self):
self._stop.set()
def stopped(self):
return self._stop.isSet()
OK so I will post here snippets of resultPresenter and crawler.
Here is the code of resultPresenter :
# configuration
DEBUG = False
DATABASE = database.__path__[0] + '/database.db'
app = Flask(__name__)
app.config.from_object(__name__)
app.config.from_envvar('CRAWLER_SETTINGS', silent=True)
def runPresenter():
url = "http://127.0.0.1:5000"
webbrowser.open_new(url)
app.run()
There are also two more methods here that I omitted - one of them connects to the database and the second method loads html template to display result. I repeat this until conditions are met or user stops the program ( what I am trying to implement ). There are also two other methods too - one get's initial link from the command line and the second valitated arguments - if arguments are invalid I won't run crawl() method.
Here is short version of crawler :
def crawl(initialLink, maxDepth):
#here I am setting initial values, lists etc
while not(depth >= maxDepth or len(pagesToCrawl) <= 0):
#this is the main loop that stops when certain depth is
#reached or there is nothing to crawl
#Here I am popping urls from url queue, parse them and
#insert interesting data into the database
parser.close()
sock.close()
dataManager.closeConnection()
Here is the init file which starts those modules in threads:
import ResultsPresenter, MyThread, time, threading
def runSpider():
MyThread.MyThread(target=initSpider).start()
MyThread.MyThread(target=ResultsPresenter.runPresenter).start()
def initSpider():
import Crawler
import database.__init__
import schemas.__init__
import static.__init__
import templates.__init__
link, maxDepth = Crawler.getInitialLink()
if link:
Crawler.crawl(link, maxDepth)
killall = False
if __name__ == "__main__":
global killall
runSpider()
while True:
try:
time.sleep(1)
except:
for thread in threading.enumerate():
thread.stop()
killall = True
raise
Killing threads is not a good idea, since (as you already said) they may be performing some crucial operations on database. Thus you may define global flag, which will signal threads that they should finish what they are doing and quit.
killall = False
import time
if __name__ == "__main__":
global killall
runSpider()
while True:
try:
time.sleep(1)
except:
/* send a signal to threads, for example: */
killall = True
raise
and in each thread you check in a similar loop whether killall variable is set to True. If it is close all activity and quit the thread.
EDIT
First of all: the Exception is rather obvious. You are passing target argument to __init__, but you didn't declare it in __init__. Do it like this:
class MyThread(threading.Thread):
def __init__(self, *args, **kwargs):
super(MyThread, self).__init__(*args, **kwargs)
self._stop = threading.Event()
And secondly: you are not using my code. As I said: set the flag and check it in thread. When I say "thread" I actually mean the handler, i.e. ResultsPresenter.runPresenter or initSpide. Show us the code of one of these and I'll try to show you how to handle stopping.
EDIT 2
Assuming that the code of crawl function is in the same file (if it is not, then you have to import killall variable), you can do something like this
def crawl(initialLink, maxDepth):
global killall
# Initialization.
while not killall and not(depth >= maxDepth or len(pagesToCrawl) <= 0):
# note the killall variable in while loop!
# the other code
parser.close()
sock.close()
dataManager.closeConnection()
So basically you just say: "Hey, thread, quit the loop now!". Optionally you can literally break a loop:
while not(depth >= maxDepth or len(pagesToCrawl) <= 0):
# some code
if killall:
break
Of course it will still take some time before it quits (has to finish the loop and close parser, socket, etc.), but it should quit safely. That's the idea at least.
Try this:
ps aux | grep python
copy the id of the process you want to kill and:
kill -3 <process_id>
And in your code (adapted from here):
import signal
import sys
def signal_handler(signal, frame):
print 'You killed me!'
sys.exit(0)
signal.signal(signal.SIGQUIT, signal_handler)
print 'Kill me now'
signal.pause()
I'm new to Python and I'm writing a script that
includes some timed routines.
My current approach is to instantiate a class
that includes those Timers (from: threading.Timer),
but I don't want the script to return when it gets to the
end of the function:
import mytimer
timer = mytimer()
Suppose I have a imple script like that one. All it
does is instantiate a mytimer object which performs a series
of timed activities.
In order for the application not to exit, I could use Qt like this:
from PyQt4.QtCore import QCoreApplication
import mytimer
import sys
def main():
app = QCoreApplication(sys.argv)
timer = mytimer()
sys.exit(app.exec_())
if __name__ == '__main__':
main()
This way, the sys.exit() call won't return immediately, and the
timer would just keep doing its thing 'forever' in background.
Although this is a solution I've used before, using Qt just for this doesn't
fell right to me.
So my question is, Is there any way to accomplish this using standard Python?
Thanks
Create a function in your script which tests a select or poll object to terminate a loop. Check out serve_forever in SocketServer.py from the standard library as an example.
A Google search for "python timer" finds:
http://docs.python.org/library/sched.html
http://docs.python.org/release/2.5.2/lib/timer-objects.html
The sched module seems to be exactly what you need.
Example:
>>> import sched, time
>>> s = sched.scheduler(time.time, time.sleep)
>>> def print_time(): print "From print_time", time.time()
...
>>> def print_some_times():
... print time.time()
... s.enter(5, 1, print_time, ())
... s.enter(10, 1, print_time, ())
... s.run()
... print time.time()
...
>>> print_some_times()
930343690.257
From print_time 930343695.274
From print_time 930343700.273
930343700.276
Once you have built your queue of times for things to happen, you just call the .run() method on your sched instance, and it will automatically wait until the queue is emptied, then will complete. So you can just put s.run() as the last thing in your script, and it will automatically exit only when the timed tasks are all done.
import mytimer
import sys
from threading import Lock
lock = Lock()
lock.acquire() # put lock into locked state
def main():
timer = mytimer()
lock.acquire() # blocks until someone calls lock.release()
if __name__ == '__main__':
main()
If you want a clean exit, you can just make mytimer() call lock.release() at some point.