Check how long a method runs in real time - Python

I have a specific task: I need to know how long a method runs in real time, and if it runs too long, raise an exception.
This is my test method:
@timer
def get_mails(self):
    print("start method")
    max_sec1 = 7
    current_sec1 = 0
    while max_sec1 != current_sec1:
        time.sleep(1)
        current_sec1 += 1
    print("method is finish")
Below I wrote a decorator that controls how long the function runs:
class TickTack(threading.Thread):
    def __init__(self, func):
        super(TickTack, self).__init__()
        self._stop_event = threading.Event()
        self.func = func

    def run(self):
        self.run_function()

    def run_function(self):
        self.func(self)

    def terminate(self):
        self._stop_event.set()


def timer(func):
    @functools.wraps(func)
    def wrapper(*args):
        print("start wrapper")
        max_sec = 3  # max work sec
        current_sec = 0
        tick_tack_thread = TickTack(func)
        tick_tack_thread.start()
        while max_sec != current_sec:
            time.sleep(1)
            current_sec += 1
            print(current_sec)
            if max_sec == current_sec:
                tick_tack_thread.terminate()  # stop the thread!!! But it does not work
                raise Exception("Method work so long")
        print("end wrapper")
    return wrapper
This code does not stop the get_mails function. But I need it stopped, that is, an exception raised out of the long-running method (in my case, get_mails).
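For what it's worth, a thread in CPython cannot be killed safely from the outside, so the usual workaround is to abandon the worker instead: run the function in a daemon thread and join it with a timeout. A minimal sketch of that approach (not your exact decorator; the timed-out thread keeps running in the background until the process exits):

import threading

def timer(func):
    def wrapper(*args, **kwargs):
        worker = threading.Thread(target=func, args=args, kwargs=kwargs)
        worker.daemon = True  # a daemon thread will not keep the process alive
        worker.start()
        worker.join(3)  # wait at most 3 seconds
        if worker.is_alive():
            # the worker is abandoned, not killed
            raise Exception("Method work so long")
    return wrapper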

Python threading.join() hangs

My problem is as follows:
I have a class that inherits from threading.Thread that I want to be able to stop gracefully. This class also has a Queue it gets its work from.
Since quite a few classes in my project should have this behaviour, I've created some superclasses to reduce duplicate code, like this:
Thread related behaviour:
class StoppableThread(Thread):
    def __init__(self):
        Thread.__init__(self)
        self._stop = Event()

    def stop(self):
        self._stop.set()

    def stopped(self):
        return self._stop.isSet()
Queue related behaviour:
class Queueable():
    def __init__(self):
        self._queue = Queue()

    def append_to_job_queue(self, job):
        self._queue.put(job)
Combining the two above and adding queue.join() to the stop() call:

class StoppableQueueThread(StoppableThread, Queueable):
    def __init__(self):
        StoppableThread.__init__(self)
        Queueable.__init__(self)

    def stop(self):
        super(StoppableQueueThread, self).stop()
        self._queue.join()
A base class for a datasource:
class DataSource(StoppableThread, ABC):
    def __init__(self, data_parser):
        StoppableThread.__init__(self)
        self.setName("DataSource")
        ABC.__init__(self)
        self._data_parser = data_parser

    def run(self):
        while not self.stopped():
            record = self._fetch_data()
            self._data_parser.append_to_job_queue(record)

    @abstractmethod
    def _fetch_data(self):
        """implement logic here for obtaining a data piece
        should return the fetched data"""
An implementation for a datasource:
class CSVDataSource(DataSource):
    def __init__(self, data_parser, file_path):
        DataSource.__init__(self, data_parser)
        self.file_path = file_path
        self.csv_data = Queue()
        print('loading csv')
        self.load_csv()
        print('done loading csv')

    def load_csv(self):
        """Loops through csv and adds data to a queue"""
        with open(self.file_path, 'r') as f:
            self.reader = reader(f)
            next(self.reader, None)  # skip header
            for row in self.reader:
                self.csv_data.put(row)

    def _fetch_data(self):
        """Returns next item of the queue"""
        item = self.csv_data.get()
        self.csv_data.task_done()
        print(self.csv_data.qsize())
        return item
Suppose there is a CSVDataSource instance called ds. If I want to stop the thread, I call:
ds.stop()
ds.join()
The ds.join() call, however, never returns. I'm not sure why this is, because the run() method does check whether the stop event is set.
Any Ideas?
Update
A little more clarity, as requested: the application is built up out of several threads. The RealStrategy thread (below) is the owner of all the other threads and is responsible for starting and terminating them. I haven't set the daemon flag for any of the threads, so they should be non-daemonic by default.
The main thread looks like this:
if __name__ == '__main__':
    def exit_handler(signal, frame):
        rs.stop_engine()
        rs.join()
        sys.exit(0)
    signal.signal(signal.SIGINT, exit_handler)

    rs = RealStrategy()
    rs.run_engine()
And here are the rs.run_engine() and rs.stop_engine() methods that are called in main:
class RealStrategy(Thread):
    .....

    def run_engine(self):
        self.on_start()
        self._order_handler.start()
        self._data_parser.start()
        self._data_source.start()
        self.start()

    def stop_engine(self):
        self._data_source.stop()
        self._data_parser.stop()
        self._order_handler.stop()
        self._data_source.join()
        self._data_parser.join()
        self._order_handler.join()
        self.stop()
If you want to use queue.Queue.join, then you must also use queue.Queue.task_done. You can read the linked documentation or see the following copied from information available online:
Queue.task_done()
Indicate that a formerly enqueued task is complete.
Used by queue consumer threads. For each get() used to fetch a task, a
subsequent call to task_done() tells the queue that the processing on
the task is complete.
If a join() is currently blocking, it will resume when all items have
been processed (meaning that a task_done() call was received for every
item that had been put() into the queue).
Raises a ValueError if called more times than there were items placed
in the queue.
Queue.join()
Blocks until all items in the queue have been gotten and processed.
The count of unfinished tasks goes up whenever an item is added to the
queue. The count goes down whenever a consumer thread calls
task_done() to indicate that the item was retrieved and all work on it
is complete. When the count of unfinished tasks drops to zero, join()
unblocks.
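In short, every successful get() must eventually be matched by exactly one task_done(), or join() blocks forever. A minimal sketch of the pairing (my own example, not from the question):

import queue
import threading

q = queue.Queue()

def worker():
    while True:
        item = q.get()   # each successful get() owes one task_done()
        print(item)
        q.task_done()    # mark this item as fully processed

threading.Thread(target=worker, daemon=True).start()
for i in range(5):
    q.put(i)
q.join()  # returns only once task_done() has been called for all five items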
To test your problem, an example implementation was created to find out what was going on. It is slightly different from how your program works but demonstrates a method of solving your problem:
#! /usr/bin/env python3
import abc
import csv
import pathlib
import queue
import sys
import threading
import time


def main():
    source_path = pathlib.Path(r'C:\path\to\file.csv')
    data_source = CSVDataSource(source_path)
    data_source.start()
    processor = StoppableThread(target=consumer, args=[data_source])
    processor.start()
    time.sleep(0.1)
    data_source.stop()


def consumer(data_source):
    while data_source.empty:
        time.sleep(0.001)
    while not data_source.empty:
        task = data_source.get_from_queue(True, 0.1)
        print(*task.data, sep=', ', flush=True)
        task.done()


class StopThread(StopIteration):
    pass

threading.SystemExit = SystemExit, StopThread


class StoppableThread(threading.Thread):
    def _bootstrap(self, stop=False):
        # noinspection PyProtectedMember
        if threading._trace_hook:
            raise RuntimeError('cannot run thread with tracing')

        def terminate():
            nonlocal stop
            stop = True
        self.__terminate = terminate

        # noinspection PyUnusedLocal
        def trace(frame, event, arg):
            if stop:
                raise StopThread
        sys.settrace(trace)
        super()._bootstrap()

    def terminate(self):
        try:
            self.__terminate()
        except AttributeError:
            raise RuntimeError('cannot terminate thread '
                               'before it is started') from None


class Queryable:
    def __init__(self, maxsize=1 << 10):
        self.__queue = queue.Queue(maxsize)

    def add_to_queue(self, item):
        self.__queue.put(item)

    def get_from_queue(self, block=True, timeout=None):
        return self.__queue.get(block, timeout)

    @property
    def empty(self):
        return self.__queue.empty()

    @property
    def full(self):
        return self.__queue.full()

    def task_done(self):
        self.__queue.task_done()

    def join_queue(self):
        self.__queue.join()


class StoppableQueryThread(StoppableThread, Queryable):
    def __init__(self, target=None, name=None, args=(), kwargs=None,
                 *, daemon=None, maxsize=1 << 10):
        super().__init__(None, target, name, args, kwargs, daemon=daemon)
        Queryable.__init__(self, maxsize)

    def stop(self):
        self.terminate()
        self.join_queue()


class DataSource(StoppableQueryThread, abc.ABC):
    @abc.abstractmethod
    def __init__(self, maxsize=1 << 10):
        super().__init__(None, 'DataSource', maxsize=maxsize)

    def run(self):
        while True:
            record = self._fetch_data()
            self.add_to_queue(record)

    @abc.abstractmethod
    def _fetch_data(self):
        pass


class CSVDataSource(DataSource):
    def __init__(self, source_path):
        super().__init__()
        self.__data_parser = self.__build_data_parser(source_path)

    @staticmethod
    def __build_data_parser(source_path):
        with source_path.open(newline='') as source:
            parser = csv.reader(source)
            next(parser, None)
            yield from parser

    def _fetch_data(self):
        try:
            return Task(next(self.__data_parser), self.task_done)
        except StopIteration:
            raise StopThread from None


class Task:
    def __init__(self, data, callback):
        self.__data = data
        self.__callback = callback

    @property
    def data(self):
        return self.__data

    def done(self):
        self.__callback()


if __name__ == '__main__':
    main()

Exiting a Thread in Python

I'm trying to write a program that crawls through a website and downloads all the videos it has. I'm facing a problem where the number of threads continuously increases, even after the downloading of individual videos is done.
Here is the code for the individual Worker object, which is queued and then joined later. This is the only part of the code in which I generate a Thread. What I don't understand is how there can be remaining threads, given that the object implements the self.stop() method and the while loop breaks.
class Worker(Thread):
    def __init__(self, thread_pool):
        Thread.__init__(self)
        self.tasks = thread_pool.tasks
        self.tasks_info = thread_pool.tasks_info
        self.daemon = True
        self._is_running = True
        self.start()

    def stop(self):
        self._is_running = False

    def run(self):
        while self._is_running:
            func, args, kargs = self.tasks.get()
            try:
                func(*args, **kargs)
            except Exception:
                print("\nError: Threadpool error.")
                sys.exit(1)
            self.tasks_info['num_tasks_complete'] += 1
            self.tasks.task_done()
        self.stop()
I've used the threading functions to check which threads are alive, and it turns out that it is indeed mostly the Worker threads, as well as other objects called Thread(SockThread) and _MainThread, which I do not know how to close.
Please advise on 1. why the Worker thread is not ending and 2. how to get rid of the Thread(SockThread) as well as the _MainThread.
Thank you!
edit 1
class ThreadPool:
    def __init__(self, name, num_threads, num_tasks):
        self.tasks = Queue(num_threads)
        self.num_threads = num_threads
        self.tasks_info = {
            'name': name,
            'num_tasks': num_tasks,
            'num_tasks_complete': 0
        }
        for _ in range(num_threads):
            Worker(self)
        print(threading.active_count)  # note: no parentheses here, so this prints the function object, not the count

    def add_task(self, func, *args, **kwargs):
        self.tasks.put((func, args, kwargs))

    def wait_completion(self):
        print("at the beginning of wait_completion:")
        print(threading.active_count())
Looking at your code, it seems you initialize the thread, which calls the run() method for processing, and on top of that you call the start() method from within __init__, which is not the proper way. Your code should be as follows:
from threading import Event

class Worker(Thread):
    def __init__(self, thread_pool):
        self.tasks = thread_pool.tasks
        self.tasks_info = thread_pool.tasks_info
        self.exit = Event()
        super(Worker, self).__init__()  # fixed: super(Thread, self) would skip Thread.__init__

    def shutdown(self):
        self.exit.set()

    def run(self):
        while not self.exit.is_set():
            func, args, kargs = self.tasks.get()
            try:
                func(*args, **kargs)
            except Exception:
                print("\nError: Threadpool error.")
                # use shutdown method for error
                self.shutdown()
                sys.exit(1)
            self.tasks_info['num_tasks_complete'] += 1
            self.tasks.task_done()
        self.shutdown()
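One caveat worth adding: the Event check alone cannot wake a worker that is already parked in the blocking tasks.get() call; the flag is only re-tested after the next task arrives. A common complement (my addition, not part of the answer above) is a sentinel, or poison pill, pushed once per worker so that get() returns and the loop exits:

from queue import Queue
from threading import Thread

SENTINEL = None  # hypothetical poison-pill value; anything the workers can recognise

class Worker(Thread):
    def __init__(self, tasks):
        super(Worker, self).__init__()
        self.tasks = tasks

    def run(self):
        while True:
            task = self.tasks.get()  # blocks until something arrives
            if task is SENTINEL:     # poison pill: unblock and exit cleanly
                self.tasks.task_done()
                break
            func, args, kwargs = task
            try:
                func(*args, **kwargs)
            finally:
                self.tasks.task_done()

Shutdown then becomes tasks.put(SENTINEL) once per worker, followed by join() on each worker.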

Why doesn't SIGVTALRM trigger inside time.sleep()?

I'm trying to use SIGVTALRM to snapshot profile my Python code, but it doesn't seem to be firing inside blocking operations like time.sleep() and socket operations.
Why is that? And is there any way to address that, so I can collect samples while I'm inside blocking operations?
I've also tried using ITIMER_PROF/SIGPROF and ITIMER_REAL/SIGALRM and both seem to produce similar results.
The code I'm testing with follows, and the output is something like:
$ python profiler-test.py
<module>(__main__:1);test_sampling_profiler(__main__:53): 1
<module>(__main__:1);test_sampling_profiler(__main__:53);busyloop(__main__:48): 1509
Note that the timesleep function isn't shown at all.
Test code:
import time
import signal
import collections


class SamplingProfiler(object):
    def __init__(self, interval=0.001, logger=None):
        self.interval = interval
        self.running = False
        self.counter = collections.Counter()

    def _sample(self, signum, frame):
        if not self.running:
            return
        stack = []
        while frame is not None:
            formatted_frame = "%s(%s:%s)" % (
                frame.f_code.co_name,
                frame.f_globals.get('__name__'),
                frame.f_code.co_firstlineno,
            )
            stack.append(formatted_frame)
            frame = frame.f_back
        formatted_stack = ';'.join(reversed(stack))
        self.counter[formatted_stack] += 1
        signal.setitimer(signal.ITIMER_VIRTUAL, self.interval, 0)

    def start(self):
        if self.running:
            return
        signal.signal(signal.SIGVTALRM, self._sample)
        signal.setitimer(signal.ITIMER_VIRTUAL, self.interval, 0)
        self.running = True

    def stop(self):
        if not self.running:
            return
        self.running = False
        signal.signal(signal.SIGVTALRM, signal.SIG_IGN)

    def flush(self):
        res = self.counter
        self.counter = collections.Counter()
        return res


def busyloop():
    start = time.time()
    while time.time() - start < 5:
        pass


def timesleep():
    time.sleep(5)


def test_sampling_profiler():
    p = SamplingProfiler()
    p.start()
    busyloop()
    timesleep()
    p.stop()
    print "\n".join("%s: %s" % x for x in sorted(p.flush().items()))


if __name__ == "__main__":
    test_sampling_profiler()
Not sure why time.sleep works that way (could it be using SIGALRM internally to know when to resume?), but Popen.wait does not block signals, so in the worst case you can call out to the OS sleep command.
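A sketch of that last workaround, assuming a POSIX sleep binary is on the PATH:

import subprocess

def timesleep():
    # unlike blocking in time.sleep() with ITIMER_VIRTUAL, waiting on a child
    # process leaves the parent's signal handlers free to fire
    subprocess.Popen(['sleep', '5']).wait()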
Another approach is to use a separate thread to trigger the sampling:
import sys
import threading
import time
import collections


class SamplingProfiler(object):
    def __init__(self, interval=0.001):
        self.interval = interval
        self.running = False
        self.counter = collections.Counter()
        self.thread = threading.Thread(target=self._sample)

    def _sample(self):
        while self.running:
            next_wakeup_time = time.time() + self.interval
            for thread_id, frame in sys._current_frames().items():
                if thread_id == self.thread.ident:
                    continue
                stack = []
                while frame is not None:
                    formatted_frame = "%s(%s:%s)" % (
                        frame.f_code.co_name,
                        frame.f_globals.get('__name__'),
                        frame.f_code.co_firstlineno,
                    )
                    stack.append(formatted_frame)
                    frame = frame.f_back
                formatted_stack = ';'.join(reversed(stack))
                self.counter[formatted_stack] += 1
            sleep_time = next_wakeup_time - time.time()
            if sleep_time > 0:
                time.sleep(sleep_time)

    def start(self):
        if self.running:
            return
        self.running = True
        self.thread.start()

    def stop(self):
        if not self.running:
            return
        self.running = False

    def flush(self):
        res = self.counter
        self.counter = collections.Counter()
        return res


def busyloop():
    start = time.time()
    while time.time() - start < 5:
        pass


def timesleep():
    time.sleep(5)


def test_sampling_profiler():
    p = SamplingProfiler()
    p.start()
    busyloop()
    timesleep()
    p.stop()
    print "\n".join("%s: %s" % x for x in sorted(p.flush().items()))


if __name__ == "__main__":
    test_sampling_profiler()
When doing it this way the result is:
$ python profiler-test.py
<module>(__main__:1);test_sampling_profiler(__main__:62);busyloop(__main__:54): 2875
<module>(__main__:1);test_sampling_profiler(__main__:62);start(__main__:37);start(threading:717);wait(threading:597);wait(threading:309): 1
<module>(__main__:1);test_sampling_profiler(__main__:62);timesleep(__main__:59): 4280
Still not totally fair, but better than no samples at all during sleep.
The absence of SIGVTALRM during a sleep() doesn't surprise me, since ITIMER_VIRTUAL "runs only when the process is executing."
(As an aside, CPython on non-Windows platforms implements time.sleep() in terms of select().)
With a plain SIGALRM, however, I expect a signal interruption and indeed I observe one:
<module>(__main__:1);test_sampling_profiler(__main__:62);busyloop(__main__:54): 4914
<module>(__main__:1);test_sampling_profiler(__main__:62);timesleep(__main__:59): 1
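A stripped-down check of that interruption (my own snippet, Python 3; since PEP 475 the sleep resumes after the handler returns, but the handler does fire mid-sleep):

import signal
import time

def handler(signum, frame):
    print('SIGALRM fired inside sleep')

signal.signal(signal.SIGALRM, handler)
signal.setitimer(signal.ITIMER_REAL, 0.5)  # real-time timer: ticks even while the process is blocked
time.sleep(2)  # the handler runs at the 0.5 s mark, then the sleep resumes
print('woke up')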
I changed the code somewhat, but you get the idea:
class SamplingProfiler(object):
    TimerSigs = {
        signal.ITIMER_PROF: signal.SIGPROF,
        signal.ITIMER_REAL: signal.SIGALRM,
        signal.ITIMER_VIRTUAL: signal.SIGVTALRM,
    }

    def __init__(self, interval=0.001, timer=signal.ITIMER_REAL):  # CHANGE
        self.interval = interval
        self.running = False
        self.counter = collections.Counter()
        self.timer = timer  # CHANGE
        self.signal = self.TimerSigs[timer]  # CHANGE
    ....

Externally stop a running while loop

I have a cmd.Cmd command line interpreter class that, for example, initializes self.counter = Counter().
After calling 'start', do_start() will call self.counter.start() and self.counter starts a while loop that counts from 0 to infinity.
Pseudocode example of Counter:
class Counter(object):
    def __init__(self):
        self.number = 0
        self.running = False

    def start(self):
        self.running = True
        while self.running:
            self.number += 1

    def status(self):
        return self.number

    def stop(self):
        self.running = False
How can I call 'status' in my cmd.Cmd class (which calls do_status()) to get self.counter.status() which will give the current number that has been incremented?
And how can I call 'stop' in my cmd.Cmd class to get self.counter.stop() to stop the while loop.
If you want to do something in parallel, you must use threads or multiple processes, like this:
import threading
from time import sleep


class Counter(object):
    def __init__(self):
        self.number = 0
        self.running = False

    def start(self):
        self.running = True
        while self.running:
            self.number += 1
            # add sleep to prevent this loop from blocking the main thread
            sleep(0.1)

    def status(self):
        return self.number

    def stop(self):
        self.running = False


class Cmd(object):
    t = None
    counter = None

    def start(self):
        self.counter = Counter()
        self.t = threading.Thread(target=self.counter.start)
        self.t.start()

    def do_status(self):
        return self.counter.status()

    def stop(self):
        self.counter.stop()
        # wait until the thread with Counter finishes
        self.t.join()


if __name__ == "__main__":
    cmd = Cmd()
    print "Starting counter"
    cmd.start()
    sleep(5)
    print cmd.do_status()
    sleep(2)
    print cmd.do_status()
    cmd.stop()
    print "Counter was stopped"
Output will be:
Starting counter
50
70
Counter was stopped
But if you want to be able to communicate with Counter from a different application, then you must learn about sockets.
If cmd is an instance of Cmd and you're using an instance method, send the instance to Counter:
def __init__(self, cmd):
    self.number = 0
    # self.running = False  # removed - use self.cmd.status() for control
    self.cmd = cmd
Control the while loop using self.cmd:

def start(self):
    while self.cmd.status():
        self.number += 1
I expect self.cmd.status() to be blocking (expecting user input, or something like that).

python time instance inside class not updating on call

I have a small script that polls a database to look for the status of certain jobs. I decided to use APScheduler to handle the looping call, and I created a decorator to time out a function if it takes too long. The issue I am having is that the decorated methods live inside a class, and even though I create two instances of the class, inside two different functions, they always have the same start_time. I thought that if I moved the decorator inside my class and initialized start_time in __init__, it would update start_time per instance of the class. But when I moved the decorator inside the class and assigned self.start_time = datetime.now(), the start time updates on each call of the class and thus it will never time out. The example of the decorator inside of the class is also below.
def timeout(start, min_to_wait):
    def decorator(func):
        def _handle_timeout():
            scheduler.shutdown(wait=False)

        @wraps(func)
        def wrapper(*args, **kwargs):
            expire = start + timedelta(minutes=min_to_wait)
            now = datetime.now()
            if now > expire:
                _handle_timeout()
            return func(*args, **kwargs)
        return wrapper
    return decorator
class Job(object):
    def __init__(self, name, run_id, results):
        self.name = name
        self.run_id = run_id
        self.results = results
        self.pack_id = None
        self.status = None

    start_time = datetime.now()

    @timeout(start_time, config.WAIT_TIME)
    def wait_for_results(self):
        if self.results:
            self.pack_id = self.results[0].get('parcel_id')
            self.status = self.results[0].get('status')
            return self.results[0]
        else:
            return False

    @timeout(start_time, config.WORK_TIME)
    def is_done(self):
        status = self.results[0].get('status')
        status_map = {'done': True,
                      'failed': FailedError,
                      'lost': LostError}

        def _get_or_throw(s, map_obj):
            value = map_obj.get(s)
            if s in ['failed', 'lost']:
                raise value(s)
            else:
                self.status = s
                return s
        return _get_or_throw(status, status_map)
def job_one(mssql, postgres, runid):
    res = get_results(mssql, config.MSSQL, first_query_to_call)
    first_job = Job('first_job', runid, res)
    step_two = first_job.wait_for_results()
    if step_two:
        try:
            logger.info(first_job)
            if first_job.is_done() == 'done':
                scheduler.remove_job('first_job')
                scheduler.add_job(lambda: job_two(mssql, postgres, first_job.run_id, runid),
                                  'interval', seconds=config.POLL_RATE, id='second_job')
        except LostError as e:
            logger.error(e, exc_info=True)
            scheduler.shutdown(wait=False)
        except FailedError as e:
            logger.error(e, exc_info=True)
            scheduler.shutdown(wait=False)


def job_two(mssql, postgres, object_id, runid):
    res = get_results(mssql, config.MSSQL, some_other_query_to_run, object_id)
    second_job = Job('second_job', runid, res)
    step_two = second_job.wait_for_results()
    if step_two:
        try:
            logger.info(second_job)
            if second_job.is_done() == 'done':
                scheduler.remove_job('second_job')
        except LostError as e:
            logger.error(e, exc_info=True)
            scheduler.shutdown(wait=False)
        except FailedError as e:
            logger.error(e, exc_info=True)
            scheduler.shutdown(wait=False)


if __name__ == '__main__':
    runid = sys.argv[1:]
    if runid:
        runid = runid[0]
    scheduler = BlockingScheduler()
    run_job = scheduler.add_job(lambda: job_one(pymssql, psycopg2, runid),
                                'interval', seconds=config.POLL_RATE, id='first_job')
My attempt to move the decorator inside the class:
class Job(object):
    def __init__(self, name, run_id, results):
        self.name = name
        self.run_id = run_id
        self.results = results
        self.pack_id = None
        self.status = None
        self.start_time = datetime.now()

    def timeout(min_to_wait):
        def decorator(func):
            def _handle_timeout():
                scheduler.shutdown(wait=False)

            @wraps(func)
            def wrapper(self, *args, **kwargs):
                print '**'
                print self.start_time
                print ''
                expire = self.start_time + timedelta(minutes=min_to_wait)
                now = datetime.now()
                if now > expire:
                    _handle_timeout()
                return func(self, *args, **kwargs)
            return wrapper
        return decorator
Here is example output from when I use the above decorator:
**
self start time: 2014-10-28 08:57:11.947026
**
self start time: 2014-10-28 08:57:16.976828
**
self start time: 2014-10-28 08:57:21.989064
The start_time needs to stay the same, or else I can't time out the function.
In the first example, your start time is initialised when the class statement is executed, which in your case is when the module is first imported into the interpreter.
In the second example, the start time is initialized when the class is instantiated. It should not change from one method call to another for the same Job instance. Of course, if you keep on creating new instances, the start time will be different for each instance.
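A quick demo of the difference (a toy class, not your Job):

from datetime import datetime
import time

class Job(object):
    class_start = datetime.now()  # evaluated once, when the class statement runs

    def __init__(self):
        self.instance_start = datetime.now()  # evaluated again for every new instance

a = Job()
time.sleep(1)
b = Job()
print(a.class_start == b.class_start)        # True: shared by all instances
print(a.instance_start == b.instance_start)  # False: one per instance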
Now, you didn't post the code using your Job class, so it's hard to tell what the right solution would be.
