How to make my code stoppable? (Not killing/interrupting) - Python

I'm asking this question broadly because I'm not facing this specific issue right now, but I'm wondering how to handle it in the future.
Say I have a long-running Python script that is supposed to do something all the time (it could be an infinite loop, if that helps). The code is started by running the python main.py command in a terminal.
The code doesn't have an ending, so there will be no sys.exit().
I don't want to use KeyboardInterrupt and I don't want to kill the task, because those options are abrupt and you can't predict precisely at what point you are stopping the code.
Is there a way to 'softly' terminate the code when I eventually decide to do it? For example, using another command, preparing a class, or running another script?
What would be the best practice for this?
PS: Please bear in mind that I'm a novice coder.
EDIT:
I'm adding some generic code, in order to make my question clearer.
import time, csv
import GenericAPI

class GenericDataCollector:
    def __init__(self):
        self.generic_api = GenericAPI()

    def collect_data(self):
        while True: # Maybe this could be a var that is changed from outside of the class?
            data = self.generic_api.fetch_data() # Returns a JSON with some data
            self.write_on_csv(data)
            time.sleep(1)

    def write_on_csv(self, data):
        with open('file.csv', 'wt') as f:
            writer = csv.writer(f)
            writer.writerow(data)

def run():
    obj = GenericDataCollector()
    obj.collect_data()

if __name__ == "__main__":
    run()
In this particular case, the class is collecting data from some generic API (which comes in JSON) and writing it to a csv file in an infinite loop. How could I code a way (a method?) to stop it on demand, at an unpredictable time, without abruptly interrupting it (Ctrl+C or killing the task)?

I would recommend using the signal module. This allows you to handle signal interrupts (SIGINT) and clean up the program before it exits. Take the following code for example:
import signal

running = True

def handle(signum, frame):
    # signal handlers receive the signal number and the current stack frame
    global running
    running = False

# catch the SIGINT signal and call handle() when the process
# receives it
signal.signal(signal.SIGINT, handle)

# your code here
while running:
    pass
You can still exit with Ctrl+C, but whatever you put in the while loop will not be cut off halfway through.

Based on @Calder White's answer, how about this (not tested):
import signal
import time, csv
import GenericAPI

class GenericDataCollector:
    def __init__(self):
        self.generic_api = GenericAPI()
        self.cont = True
        # register the handler once, before the loop starts
        signal.signal(signal.SIGINT, self.handle)

    def collect_data(self):
        while self.cont:
            data = self.generic_api.fetch_data() # Returns a JSON with some data
            self.write_on_csv(data)
            time.sleep(1)

    def handle(self, signum, frame):
        # signal handlers are called with the signal number and stack frame
        self.cont = False

    def write_on_csv(self, data):
        with open('file.csv', 'wt') as f:
            writer = csv.writer(f)
            writer.writerow(data)

def run():
    obj = GenericDataCollector()
    obj.collect_data()

if __name__ == "__main__":
    run()
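One note on top of this: SIGINT only covers Ctrl+C in the same terminal. If you also want a plain kill <pid> from another shell to stop the loop gracefully, you can register the same handler for SIGTERM (the default signal kill sends). A minimal, self-contained sketch of the idea:

import signal
import time

running = True

def handle(signum, frame):
    # shared shutdown logic for Ctrl+C (SIGINT) and `kill <pid>` (SIGTERM)
    global running
    running = False

signal.signal(signal.SIGINT, handle)
signal.signal(signal.SIGTERM, handle)

while running:
    time.sleep(1)  # stand-in for the real work

print("clean shutdown")  # cleanup code runs here, after the loop ends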

Related

Threading with Bottle.py Server

I'm having an issue with threading that I can't solve in any way I've tried. I searched in StackOverflow too, but all I could find was cases that didn't apply to me, or explanations that I didn't understand.
I'm trying to build an app with BottlePy, and one of the features I want requires a function to run in the background. For this, I'm trying to make it run in a thread. However, when I start the thread, it runs twice.
I've read in some places that it would be possible to check if the function was in the main script or in a module using if __name__ == '__main__':, however I'm not able to do this, since __name__ is always returning the name of the module.
Below is an example of what I'm doing right now.
The main script:
# main.py
from MyClass import *
from bottle import *

arg = something
myObject = MyClass(arg1)

app = Bottle()
app.run('''bottle args''')
The class:
# MyClass.py
import threading
import time

class MyClass:
    def check_list(self, theList, arg1):
        a_list = something()
        time.sleep(5)
        self.check_list(a_list, arg1)

    def __init__(self, arg1):
        if __name__ == '__main__':
            self.a_list = arg.returnAList()
            t = threading.Thread(target=self.check_list, args=(a_list, arg1))
So what I intend here is to have check_list running in a thread all the time, doing something and waiting some seconds to run again. All this so I can have the list updated, and be able to read it with the main script.
Can you explain to me what I'm doing wrong, why the thread is running twice, and how can I avoid this?
This works fine:
import threading
import time

class MyClass:
    def check_list(self, theList, arg1):
        keep_going = True
        while keep_going:
            print("check list")
            # do stuff
            time.sleep(1)

    def __init__(self, arg1):
        self.a_list = ["1", "2"]
        t = threading.Thread(target=self.check_list, args=(self.a_list, arg1))
        t.start()

myObject = MyClass("something")
Figured out what was wrong thanks to the user Weeble's comment. When he said 'something is causing your main.py to run twice', I remembered that Bottle has an argument called reloader. When set to True, it makes the application load twice, and thus the thread creation runs twice as well.
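If you want to keep the reloader during development, one workaround (a sketch; it relies on BOTTLE_CHILD, the environment variable Bottle's reloader sets in the child process that actually serves requests) is to start the background thread only in that child process:

# main.py - sketch: only start the thread in the reloader's child process
import os
import threading
from bottle import Bottle

app = Bottle()

def background_job():
    pass  # the periodic work goes here

# With reloader=True the script runs twice: once as the file watcher
# and once as the serving child, which Bottle marks with BOTTLE_CHILD.
if os.environ.get('BOTTLE_CHILD'):
    threading.Thread(target=background_job).start()

app.run(host='localhost', port=8080, reloader=True)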

Different way to implement this threading?

I'm trying out threads in Python. I want a spinning cursor to display while another method runs (for 5-10 mins). I've written some code, but I'm wondering if this is how you would do it. I don't like to use globals, so I assume there is a better way?
c = True

def b():
    for j in itertools.cycle('/-\\|'):
        if c == True:
            sys.stdout.write(j)
            sys.stdout.flush()
            time.sleep(0.1)
            sys.stdout.write('\b')
        else:
            return

def a():
    global c
    # code does stuff here for 5-10 minutes
    # simulate with sleep
    time.sleep(2)
    c = False

Thread(target=a).start()
Thread(target=b).start()
EDIT:
Another issue now is that when the processing ends, the last character of the spinning cursor is still on screen, so something like \ is left printed.
You could use events:
http://docs.python.org/2/library/threading.html
I tested this and it works. It also keeps everything in sync. You should avoid changing/reading the same variables in different threads without synchronizing them.
#!/usr/bin/python
from threading import Thread
from threading import Event
import time
import itertools
import sys

def b(event):
    for j in itertools.cycle('/-\\|'):
        if not event.is_set():
            sys.stdout.write(j)
            sys.stdout.flush()
            time.sleep(0.1)
            sys.stdout.write('\b')
        else:
            return

def a(event):
    # code does stuff here for 5-10 minutes
    # simulate with sleep
    time.sleep(2)
    event.set()

def main():
    c = Event()
    Thread(target=a, kwargs={'event': c}).start()
    Thread(target=b, kwargs={'event': c}).start()

if __name__ == "__main__":
    main()
Related to 'kwargs', from Python docs (URL in the beginning of the post):
class threading.Thread(group=None, target=None, name=None, args=(), kwargs={})
...
kwargs is a dictionary of keyword arguments for the target invocation. Defaults to {}.
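Regarding the EDIT about the leftover spinner character: before the spinner function returns, you can erase the last drawn character by overwriting it with a space. A sketch of b() with that one change:

def b(event):
    for j in itertools.cycle('/-\\|'):
        if not event.is_set():
            sys.stdout.write(j)
            sys.stdout.flush()
            time.sleep(0.1)
            sys.stdout.write('\b')
        else:
            # the cursor already sits on the last drawn character,
            # so overwrite it with a space and back up again
            sys.stdout.write(' \b')
            sys.stdout.flush()
            return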
You're on the right track mostly, except for the global variable. Normally you'd need to coordinate access to shared data like that with a lock or semaphore, but in this special case you can take a shortcut and simply check whether the worker thread is still running. This is what I mean:
from threading import Thread
import time
import itertools
import sys

def monitor_thread(watched_thread):
    chars = itertools.cycle('/-\\|')
    while watched_thread.is_alive():
        sys.stdout.write(next(chars))
        sys.stdout.flush()
        time.sleep(0.1)
        sys.stdout.write('\b')

def worker_thread():
    # code does stuff here - simulated with sleep
    time.sleep(2)

if __name__ == "__main__":
    watched_thread = Thread(target=worker_thread)
    watched_thread.start()
    Thread(target=monitor_thread, args=(watched_thread,)).start()
This is not properly synchronized, but I won't try to explain it all right now because it's a whole lot of knowledge. Try to read this: http://effbot.org/zone/thread-synchronization.htm
In your case, though, the lack of synchronization isn't that bad. The only thing that could happen is that the spinning bar spins a few ms longer than the background task actually needs.

QProcess.readAllStandardOutput() doesn't seem to read anything - PyQt

Here is the code sample:
class RunGui(QtGui.QMainWindow):
    def __init__(self, parent=None):
        ...
        QtCore.QObject.connect(self.ui.actionNew, QtCore.SIGNAL("triggered()"), self.new_select)
        ...

    def normal_output_written(self, qprocess):
        self.ui.text_edit.append("caught outputReady signal") # works
        self.ui.text_edit.append(str(qprocess.readAllStandardOutput())) # doesn't work

    def new_select(self):
        ...
        dialog_np = NewProjectDialog()
        dialog_np.exec_()
        if dialog_np.is_OK:
            section = dialog_np.get_section()
            project = dialog_np.get_project()
            ...
            np = NewProject()
            np.outputReady.connect(lambda: self.normal_output_written(np.qprocess))
            np.errorReady.connect(lambda: self.error_output_written(np.qprocess))
            np.inputNeeded.connect(lambda: self.input_from_line_edit(np.qprocess))
            np.params = partial(np.create_new_project, section, project, otherargs)
            np.start()

class NewProject(QtCore.QThread):
    outputReady = QtCore.pyqtSignal(object)
    errorReady = QtCore.pyqtSignal(object)
    inputNeeded = QtCore.pyqtSignal(object)
    params = None
    message = ""

    def __init__(self):
        super(NewProject, self).__init__()
        self.qprocess = QtCore.QProcess()
        self.qprocess.moveToThread(self)
        self._inputQueue = Queue()

    def run(self):
        self.params()

    def create_new_project(self, section, project, otherargs):
        ...
        # PyDev for some reason skips the breakpoints inside the thread
        self.qprocess.start(command)
        self.qprocess.waitForReadyRead()
        self.outputReady.emit(self.qprocess) # works - I'm getting the signal in RunGui.normal_output_written()
        print(str(self.qprocess.readAllStandardOutput())) # prints an empty line
        ... # other actions inside the method requiring "command" to finish properly
The idea is beaten to death: get the GUI to run scripts and communicate with the processes. The challenge in this particular example is that the script started in QProcess as command runs an app that requires user input (confirmation) along the way. Therefore I have to be able to start the script, capture and parse all of its output, wait for the question to appear in the output, communicate back the answer, allow it to finish, and only then proceed with the other actions inside create_new_project().
I don't know if this will fix your overall issue, but there are a few design issues I see here.
You are passing around the qprocess between threads instead of just emitting your custom signals with the results of the qprocess
You are using class-level attributes that should probably be instance attributes
Technically you don't even need the QProcess, since you are running it in your thread and actively using blocking calls. It could easily be a subprocess.Popen...but anyways, I might suggest changes like this:
class RunGui(QtGui.QMainWindow):
    ...
    def normal_output_written(self, msg):
        self.ui.text_edit.append(msg)

    def new_select(self):
        ...
        np = NewProject()
        np.outputReady.connect(self.normal_output_written)
        np.params = partial(np.create_new_project, section, project, otherargs)
        np.start()

class NewProject(QtCore.QThread):
    outputReady = QtCore.pyqtSignal(object)
    errorReady = QtCore.pyqtSignal(object)
    inputNeeded = QtCore.pyqtSignal(object)

    def __init__(self):
        super(NewProject, self).__init__()
        self._inputQueue = Queue()
        self.params = None

    def run(self):
        self.params()

    def create_new_project(self, section, project, otherargs):
        ...
        qprocess = QtCore.QProcess()
        qprocess.start(command)
        if not qprocess.waitForStarted():
            # handle a failed command here
            return
        if not qprocess.waitForReadyRead():
            # handle a timeout or error here
            return
        msg = str(qprocess.readAllStandardOutput())
        self.outputReady.emit(msg)
Don't pass around the QProcess. Just emit the data, and create the QProcess from within the thread's method so that it is automatically owned by that thread. Your outside classes should really not have any knowledge of that QProcess object. It doesn't even need to be a member attribute since it's only needed during the operation.
Also make sure you are properly checking that your command both successfully started, and is running and outputting data.
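For the non-interactive case, the thread method could drop QProcess entirely and use subprocess instead, as suggested above; a rough sketch of what that might look like (command stands in for the real script invocation):

import subprocess

def create_new_project(self, section, project, otherargs):
    # `command` is a placeholder for the real project-creation script
    proc = subprocess.Popen(command, shell=True,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = proc.communicate()  # blocks until the process exits
    if err:
        self.errorReady.emit(str(err))
    self.outputReady.emit(str(out))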
Update
To clarify some problems you might be having (per the comments), I wanted to suggest that QProcess might not be the best option if you need interactive control of processes that expect periodic user input. It should work fine for running scripts that just produce output from start to finish (though really, using subprocess would be much easier). For scripts that need user input over time, your best bet may be to use pexpect. It allows you to spawn a process and then watch for patterns that you know indicate the need for input:
foo.py
import time

i = raw_input("Please enter something: ")
print "Output:", i
time.sleep(.1)
print "Another line"
time.sleep(.1)
print "Done"
test.py
import pexpect
import time

child = pexpect.spawn("python foo.py")
child.setecho(False)

ret = -1
while ret < 0:
    time.sleep(.05)
    ret = child.expect("Please enter something: ")

child.sendline('FOO')

while True:
    line = child.readline()
    if not line:
        break
    print line.strip()

# Output: FOO
# Another line
# Done
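As a side note (not from the original answer): expect() can watch several patterns at once, including pexpect.EOF and pexpect.TIMEOUT, which makes the waiting more robust than polling in a loop; a sketch:

import pexpect

child = pexpect.spawn("python foo.py")
# expect() returns the index of whichever pattern matched first,
# so EOF and timeouts can be handled explicitly instead of raising
index = child.expect(["Please enter something: ", pexpect.EOF, pexpect.TIMEOUT],
                     timeout=10)
if index == 0:
    child.sendline('FOO')
elif index == 1:
    pass  # the process ended before asking for input
else:
    pass  # no prompt appeared within 10 seconds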

Python how to stop threading operations

I want to know how I can stop my program in the console with Ctrl+C or something similar.
The problem is that there are two threads in my program. Thread one crawls the web and extracts some data, and thread two displays this data in a readable format for the user. Both parts share the same database. I run them like this:
from threading import Thread
import ResultsPresenter

def runSpider():
    Thread(target=initSpider).start()
    Thread(target=ResultsPresenter.runPresenter).start()

if __name__ == "__main__":
    runSpider()
How can I do that?
OK, so I created my own thread class:
import threading

class MyThread(threading.Thread):
    """Thread class with a stop() method. The thread itself has to check
    regularly for the stopped() condition."""
    def __init__(self):
        super(MyThread, self).__init__()
        self._stop = threading.Event()

    def stop(self):
        self._stop.set()

    def stopped(self):
        return self._stop.isSet()
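For what it's worth, a thread built on this class needs a run() method that polls stopped(); a minimal sketch of how a subclass would use it (the Worker name and the sleep are just illustration):

import time

class Worker(MyThread):
    def run(self):
        while not self.stopped():
            # do one unit of work, then check the flag again
            time.sleep(1)

w = Worker()
w.start()
# ... later, from the main thread:
w.stop()  # run() notices on its next check and returns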
OK, so I will post snippets of the resultPresenter and crawler here.
Here is the code of resultPresenter:
# configuration
DEBUG = False
DATABASE = database.__path__[0] + '/database.db'

app = Flask(__name__)
app.config.from_object(__name__)
app.config.from_envvar('CRAWLER_SETTINGS', silent=True)

def runPresenter():
    url = "http://127.0.0.1:5000"
    webbrowser.open_new(url)
    app.run()
There are also two more methods here that I omitted: one connects to the database, and the second loads an HTML template to display the result. I repeat this until conditions are met or the user stops the program (what I am trying to implement). There are two other methods too: one gets the initial link from the command line and the second validates arguments; if the arguments are invalid, the crawl() method won't run.
Here is a short version of the crawler:
def crawl(initialLink, maxDepth):
    # here I am setting initial values, lists etc.
    while not(depth >= maxDepth or len(pagesToCrawl) <= 0):
        # this is the main loop that stops when a certain depth is
        # reached or there is nothing left to crawl
        # Here I am popping urls from the url queue, parsing them and
        # inserting interesting data into the database
    parser.close()
    sock.close()
    dataManager.closeConnection()
Here is the init file which starts those modules in threads:
import ResultsPresenter, MyThread, time, threading

def runSpider():
    MyThread.MyThread(target=initSpider).start()
    MyThread.MyThread(target=ResultsPresenter.runPresenter).start()

def initSpider():
    import Crawler
    import database.__init__
    import schemas.__init__
    import static.__init__
    import templates.__init__
    link, maxDepth = Crawler.getInitialLink()
    if link:
        Crawler.crawl(link, maxDepth)

killall = False

if __name__ == "__main__":
    runSpider()
    while True:
        try:
            time.sleep(1)
        except:
            for thread in threading.enumerate():
                thread.stop()
            killall = True
            raise
Killing threads is not a good idea, since (as you already said) they may be performing crucial operations on the database. Instead, you can define a global flag that signals the threads that they should finish what they are doing and quit.
killall = False

import time

if __name__ == "__main__":
    runSpider()
    while True:
        try:
            time.sleep(1)
        except:
            # send a signal to the threads, for example:
            killall = True
            raise
and in each thread you check, in a similar loop, whether the killall variable is set to True. If it is, close all activity and quit the thread.
EDIT
First of all: the exception is rather obvious. You are passing a target argument to __init__, but you didn't declare it in __init__. Do it like this:
class MyThread(threading.Thread):
    def __init__(self, *args, **kwargs):
        super(MyThread, self).__init__(*args, **kwargs)
        self._stop = threading.Event()
And secondly: you are not using my code. As I said: set the flag and check it in the thread. When I say "thread" I actually mean the handler, i.e. ResultsPresenter.runPresenter or initSpider. Show us the code of one of these and I'll try to show you how to handle stopping.
EDIT 2
Assuming the code of the crawl function is in the same file (if it is not, you have to import the killall variable), you can do something like this:
def crawl(initialLink, maxDepth):
    global killall
    # Initialization.
    while not killall and not(depth >= maxDepth or len(pagesToCrawl) <= 0):
        # note the killall variable in the while condition!
        # the other code
    parser.close()
    sock.close()
    dataManager.closeConnection()
So basically you just say: "Hey, thread, quit the loop now!". Optionally, you can literally break the loop:
while not(depth >= maxDepth or len(pagesToCrawl) <= 0):
    # some code
    if killall:
        break
Of course it will still take some time before the thread quits (it has to finish the loop iteration and close the parser, socket, etc.), but it will quit safely. That's the idea, at least.
Try this:
ps aux | grep python
copy the id of the process you want to kill and:
kill -3 <process_id>
And in your code (adapted from here):
import signal
import sys

def signal_handler(signal, frame):
    print 'You killed me!'
    sys.exit(0)

signal.signal(signal.SIGQUIT, signal_handler)
print 'Kill me now'
signal.pause()

Is this a thread-safe module, and actually how do you write one?

I'm actually working on a multi-threaded program that involves lots of MySQL operations, and basically it's quite a pain, as you have to come up with a smart way to make all the queries work. This got me thinking about how you can make a module thread-safe.
Anyway, let me try to ask my question this way: say you need to constantly append new content to a txt file from lots of different threads. main.py would definitely work as below:
import threading

lock = threading.RLock()

def AppendStr(the_str):
    write_thread = threading.Thread(target=RealAppending, args=(the_str,))
    write_thread.start()

def RealAppending(the_str):
    lock.acquire()
    the_file = open("test.txt", "a")
    the_file.append(the_str)
    the_file.close()
    lock.release()

def WorkerThread(some_arg):
    # do stuff
    AppendStr("whatever you like")

for counter in range(100):
    new_thread = threading.Thread(target=WorkerThread, args=(some_arg,))
    new_thread.start()
Well, the thing is, if I'm trying to keep the code neat and easy to maintain, does it still work if I put the code below into write.py:
import threading

lock = threading.RLock()

def AppendStr(the_str):
    write_thread = threading.Thread(target=RealAppending, args=(the_str,))
    write_thread.start()

def RealAppending(the_str):
    lock.acquire()
    the_file = open("test.txt", "a")
    the_file.append(the_str)
    the_file.close()
    lock.release()
and do it like this in main.py: (I don't truly understand how import works in python)
import threading
import write

def WorkerThread(some_arg):
    # do stuff
    write.AppendStr("whatever you like")

for counter in range(100):
    new_thread = threading.Thread(target=WorkerThread, args=(some_arg,))
    new_thread.start()
And also, what if there are lots of other modules using write.py in a multi-threaded way, and you then import those modules in main.py and call different functions from there? Would everything work out as expected? If not, what should I do to design an ultimately thread-safe module that can be used like this?
If write.py is imported in lots of other modules, do they all share the same lock? What's the scope of variables in such modules?
This looks like a thread-safe module, but there are mistakes. Here is a better version:
def RealAppending(the_str):
    lock.acquire()
    try:
        the_file = open("test.txt", "a")
        try:
            the_file.write(the_str) # write, not append
        finally:
            the_file.close()
    finally: # runs even if an error occurs
        lock.release()
because:
if an error occurs while writing to the file, the lock must still be released
the try/finally syntax above also works on old Python versions that predate the with statement
here is the improved version that does exactly the same:
def RealAppending(the_str):
    with lock:
        with open("test.txt", "a") as the_file:
            the_file.write(the_str) # write
So yes, your module is thread-safe. There is some stuff that can be added to prevent users from using it in a non-thread-safe manner:
# write.py
__all__ = ['AppendStr'] # list only the functions other modules shall see,
# so if you do `from write import *` nothing else will be imported
But still you cannot know in which order the strings will be written to the file.
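If the ordering matters, one common pattern (a sketch, not from the original answer, assuming you control write.py) is to funnel all strings through a queue.Queue into a single writer thread, so writes happen strictly in enqueue order:

# write.py - ordered, thread-safe writer (Python 3 module names)
import threading
import queue

_queue = queue.Queue()

def AppendStr(the_str):
    # called from any thread; just enqueue, never touch the file directly
    _queue.put(the_str)

def _writer():
    # the only thread that ever opens the file, so the strings are
    # written strictly in the order they were enqueued
    while True:
        the_str = _queue.get()
        with open("test.txt", "a") as the_file:
            the_file.write(the_str)
        _queue.task_done()

threading.Thread(target=_writer, daemon=True).start()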
