Run a long process continuously using Tkinter (Python 2.7)

After some time trying to figure out how the Tkinter library works, I have run into a problem. The script I wrote uses multiprocessing, and because the script needs to be as fast as possible I minimized the traffic between the processes. This means that it takes about a minute to complete an enormous number of tasks.
(If this task gets aborted halfway through, the files being used will get corrupted.)
The problem is that I want a stop button in my GUI that stops the script the proper way. After some research I haven't made any progress in finding a solution, so maybe some of you could help. I basically need a way to tell the script, halfway through a task, that it has to stop, after which the script will continue until that task is finished.
Edit:
The way my script is set up:
(This is missing the Tkinter part, because I don't know the solution to it yet.)
from multiprocessing import Pool

def Setup():
    # defines all paths of the files that are edited (and a whole lot more)

def Calculation(x, y, Primes):
    # takes an x and y value, calculates the value of that coordinate and determines
    # if the value is prime. Returns True or False, and the calculated value.

def Quadrant(List):
    # takes a huge list of coordinates that have to be calculated. These
    # coordinates (x and y) are passed to the 'Calculation' function, one by one.
    # Returns all the calculated coordinates and whether they are prime (boolean)

if __name__ == "__main__":
    Filenames = Setup()
    Process = Pool(4)
    while True:
        # Loop the main bit of the code to keep expanding the generated image
        Input = [list of all coordinates, split evenly into 4 quadrants (separate lists)]
        Output = Process.map(Quadrant, Input)
        # Combine all data and update the list of primes
        # Detect if escape is pressed; stop if true.
I am basically looking for a way to stop the while loop above, or an alternative to this loop.

To clarify: I meant that the task has to stop without being aborted abruptly. The script has to wait until its current task is finished, and then check whether a button has been pressed to decide whether it should continue.
We have no code from you to respond to, so assuming you are using a while loop, you can exit it when a shared flag changes (note that you can also just issue a return from the function if some condition is True/False). For example:
import time
from multiprocessing import Process, Manager

def test_f(test_d):
    """ first process to run
        exit this process when dictionary's 'QUIT' == True
    """
    while not test_d["QUIT"]:
        print " test_f", test_d["QUIT"]
        time.sleep(1.0)

def test_f2(name):
    """ second process to run. Runs until the for loop exits
    """
    for j in range(0, 10):
        print name, j
        time.sleep(0.5)
    print "second process finished"

if __name__ == '__main__':
    ##--- create a dictionary via Manager
    manager = Manager()
    test_d = manager.dict()
    test_d["QUIT"] = False

    ##--- start first process and send dictionary
    p = Process(target=test_f, args=(test_d,))
    p.start()

    ##--- start second process
    p2 = Process(target=test_f2, args=('P2',))
    p2.start()

    ##--- sleep 2 seconds and then change dictionary
    ##    to exit first process
    time.sleep(2.0)
    print "\nterminate first process"
    test_d["QUIT"] = True
    print "test_d changed"
    print "data from first process", test_d

    ##--- may not be necessary, but I always terminate to be sure
    time.sleep(5.0)
    p.terminate()
    p2.terminate()

    """ Thanks Doug Hellmann
        Note: It is important to join() the process after terminating it,
        in order to give the background machinery time to update the
        status of the object to reflect the termination.
    """
    p.join()
    p2.join()
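Since the question specifically asked about Tkinter, here is a minimal sketch of how the same shared-flag idea could be wired to a stop button. The names worker_loop and flags are mine, and the sleep stands in for one complete Process.map() batch; this is an illustration, not the asker's actual code:

import time
import Tkinter as tk
from multiprocessing import Process, Manager

def worker_loop(flags):
    # keep processing batches until the GUI sets the flag
    while not flags["QUIT"]:
        time.sleep(1.0)  # stand-in for one complete batch of tasks
        print "finished one batch cleanly"
    print "worker stopped between batches, files left intact"

if __name__ == '__main__':
    manager = Manager()
    flags = manager.dict()
    flags["QUIT"] = False

    p = Process(target=worker_loop, args=(flags,))
    p.start()

    root = tk.Tk()
    # the button only sets the flag; the worker finishes its current
    # batch before checking the flag, so nothing is aborted mid-task
    tk.Button(root, text="Stop",
              command=lambda: flags.update(QUIT=True)).pack()
    root.mainloop()

    flags["QUIT"] = True  # also stop if the window is simply closed
    p.join()              # wait for the worker to finish its last batch

Because the flag is only checked between batches, pressing Stop never interrupts a task halfway through, which is the "stop the proper way" behaviour asked for.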

Related

Execute threads in a certain order

When we launch threads, is it known for sure which thread will be executed first, or is it unpredictable?
I ask because in my test the square is always printed first and then the cube:
import threading

def print_cube(num):
    # function to print cube of given num
    print("Cube: {}".format(num * num * num))

def print_square(num):
    # function to print square of given num
    print("Square: {}".format(num * num))

if __name__ == "__main__":
    # creating threads
    cuadrado = threading.Thread(target=print_square, args=(10,))
    cubo = threading.Thread(target=print_cube, args=(10,))

    # starting thread 1
    cuadrado.start()
    # starting thread 2
    cubo.start()

    print("Done!")
I would like to understand the method Thread.start().
Does the order of calling Thread.start() matter?
But if I make both threads sleep for the same amount of time, the output order becomes random:
import threading
import time

def print_cube(num):
    # function to print cube of given num
    time.sleep(3)
    print("Cube: {}".format(num * num * num))

def print_square(num):
    # function to print square of given num
    time.sleep(3)
    print("Square: {}".format(num * num))

if __name__ == "__main__":
    # creating threads
    cuadrado = threading.Thread(target=print_square, args=(10,))
    cubo = threading.Thread(target=print_cube, args=(10,))

    # starting thread 1
    cuadrado.start()
    # starting thread 2
    cubo.start()

    # both threads completely executed
    print("Done!")
A Python threading.Thread object is not the same thing as a thread. A thread is an object in the operating system—separate from your code. I like to think of a thread as an agent who executes your target function.
The purpose of the Python Thread class is to provide a platform-independent interface to the various thread APIs of different operating systems. One peculiarity of Python's Thread is that it does not actually create the operating-system thread until you call its start() method. That's what start() does: it creates the underlying OS thread.
Does the order of calling Thread.start() matter?
Depends what you mean. Your program definitely always starts the cuadrado thread before it starts the cubo thread, but the whole point of threads is to provide a means to achieve concurrency in your program; and what "concurrency" means is that the things happening in different threads are not required to happen in any definite order. By calling print_cube() and print_square() in different threads, you effectively are telling Python (and the OS) that you don't care which one prints first.
Maybe print_square() will always be called first on your computer. Maybe print_cube() will always be called first on somebody else's computer. Maybe it will be unpredictable which one goes first on a third computer.
Sounds a little chaotic, but the reason why we like concurrency is that it gives the OS and the Python system more freedom to get things done in the most efficient order. E.g., if one thread is waiting for some network packet to arrive, some other thread can be allowed to do some useful work. So long as the "useful work" doesn't need the packet that the other thread was waiting for, that's a Good Thing.
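If a definite order is what you actually want, the usual way is not to rely on start() order at all but to make the dependency explicit with join(). A minimal sketch reusing the question's own functions:

import threading

def print_cube(num):
    print("Cube: {}".format(num * num * num))

def print_square(num):
    print("Square: {}".format(num * num))

if __name__ == "__main__":
    cuadrado = threading.Thread(target=print_square, args=(10,))
    cubo = threading.Thread(target=print_cube, args=(10,))

    cuadrado.start()
    cuadrado.join()  # wait until the square thread has finished...
    cubo.start()     # ...before the cube thread even starts
    cubo.join()
    print("Done!")   # now guaranteed to print last

Of course, serializing the threads like this gives up the concurrency; it only makes sense when one step genuinely depends on the other.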

Python Multiprocessing: The child process finished but did not join

I am trying to implement multiprocessing code to generate a dictionary I am interested in.
Here is my logic:
import time
from multiprocessing import Manager, Queue, Process

input_list = Queue()
for x in my_input:  # my_input is my input data
    input_list.put(x)

output = Manager().dict()

def job():
    while input_list.qsize() > 0:
        x = input_list.get()
        result = my_func(x)  # Do something here
        output[x] = result

def monitor():
    while True:
        if input_list.qsize() > 0:
            time.sleep(1)
        else:
            break
    print("Item List is Empty")
    print("Does all the result being save?", len(output.keys()) == len(my_input))

job_list = [Process(target=monitor)]
for _ in range(num_of_worker):
    job_list.append(Process(target=job))

for j in job_list:
    j.start()
for j in job_list:
    j.join()

print("The script is finished")
The logic of my code is quite simple.
Initialize a queue and put my input in.
Define two functions: job (do some work and save the result to a dict) and monitor (print when everything in the queue has been processed, and whether all the results have been saved).
Then standard multiprocessing start and join.
The output I am getting:
Item List is Empty
Does all the result being save? True
...
Some child processes finished but never joined.
The script gets stuck at this point and never prints "The script is finished".
My script gets stuck at the join statements, despite the monitor telling me that everything is finished (judging by the number of items left in input_list and the number of results stored in output).
Moreover, the hang is not reproducible. If I see the script stuck for more than 5 minutes, I terminate it manually and restart it; I found that it could finish properly maybe 3 times out of 10.
What could be happening?
Remark:
Since I suspected the error was that some child process did not join, I tried something with Event: when the monitor found that input_list was empty and output was completely filled, it would kill all the processes. But the script also got stuck at the event triggering. (And, as above, the code does not get stuck every time; it works about 3 times out of 10.)
Homer512's comment gave me insight into where the mistake in my code is.
Switch from

def job():
    while input_list.qsize() > 0:
        x = input_list.get()
        ...

to

from queue import Empty

def job():
    while input_list.qsize() > 0:
        try:
            x = input_list.get(True, 5)
            ...
        except Empty:
            return 0
The reason my script got stuck at join is that when input_list had only one element left, more than one worker could pass the while test of job, but only one of them could actually get something from the queue. The other workers were stuck in get() without a suitable timeout.
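An alternative that sidesteps the qsize() race entirely (not from the original post, just a common pattern) is to enqueue one sentinel per worker and let each worker exit when it sees one, so no worker ever blocks on an empty queue:

# sketch of the sentinel pattern, reusing input_list/output/my_func from above;
# the name SENTINEL is my own
SENTINEL = None

def job():
    while True:
        x = input_list.get()  # blocks until an item (or sentinel) arrives
        if x is SENTINEL:
            return            # exactly one sentinel per worker
        output[x] = my_func(x)

# after putting the real work items on the queue:
# for _ in range(num_of_worker):
#     input_list.put(SENTINEL)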

How can I remove duplicates while multiprocessing?

I am very new to multiprocessing, and I am only using it to find an image on the screen. The problem is that the code produces duplicates, which slow it down. I have tried using a "not in" check to only place proc into processes if it is not already in it, but this did not work. Any help or optimization would be welcome; I have no idea what I am doing, as this is just a personal project to learn multiprocessing.
from multiprocessing.context import Process
import pyautogui as auto

screenWidth, screenHeight = auto.size()
currentMouseX, currentMouseY = auto.position()

def bot(aim):
    while True:
        for aim in auto.locateAllOnScreen(r"dot.png", confidence=0.9795):
            auto.click(aim)
            print(aim)

def bot2(aim):
    while True:
        for aim in auto.locateAllOnScreen(r"dot.png", confidence=0.9795):
            auto.click(aim)
            print(aim)

def bot3(aim):
    while True:
        for aim in auto.locateAllOnScreen(r"dot.png", confidence=0.9795):
            auto.click(aim)
            print(aim)

if __name__ == "__main__":
    processes = []
    for t in auto.locateAllOnScreen(r"dot.png", confidence=0.9795):
        proc = Process(target=bot, args=(t,))
        processes.append(proc)
        proc.start()
    for z in auto.locateAllOnScreen(r"dot.png", confidence=0.9795):
        proc = Process(target=bot2, args=(z,))
        processes.append(proc)
        proc.start()
    for x in auto.locateAllOnScreen(r"dot.png", confidence=0.9795):
        proc = Process(target=bot3, args=(x,))
        processes.append(proc)
        proc.start()
    for p in processes:
        p.join()
Unless my eyes deceive me, you have three functions bot, bot2 and bot3 that appear to be identical. You have to ask yourself why you need three identical functions that differ only in a name. I certainly don't have an answer.
Presumably auto.locateAllOnScreen returns the locations of all occurrences of "dot.png" on your screen, and you would like to print out information on each occurrence in parallel. Your main process iterates over all of these occurrences 3 times and starts a new process for each occurrence. Each process then totally ignores the occurrence argument, aim, that is passed to it and instead iterates over all the occurrences itself. So if there were 5 occurrences on the screen, you would be creating 3 * 5 = 15 processes, and each process would print 5 lines of output (one for each occurrence) for a total of 15 * 5 = 75 lines of output, when in reality you should only be getting 5 lines of output if you were doing this correctly (I am ignoring that there is a while True: loop in which all the output is then repeated). You are also potentially creating more processes than the number of CPU cores on your computer, so they would not truly be running in parallel, on the assumption that the bot function(s) are CPU-intensive, which may not be the case.
I am not sure whether this problem is a candidate for multiprocessing since there is a fair amount of overhead just to create processes and to pass arguments and results to and from one process to another. So you might not gain any improvement in performance. But if the idea is to see how you would solve this using multiprocessing, then I would suggest that if you do not know in advance how many elements the call to auto.locateAllOnScreen might return and recognizing that there is no point in creating more processes than the number of processors you actually have, then it is probably best to use a multiprocessing pool of fixed size.
What you want to do is have your worker function bot (and you only need one of these) be passed a single occurrence to process. You then create a pool of processes whose size is the smaller of the number of CPUs you have and the number of tasks you actually have to submit. You then submit the tasks to the pool, where each task specifies the worker function and the argument(s) it requires.
In the code below I have removed from function bot the while True: loop that never terminates. You can put it back in if you want.
from multiprocessing import Pool, cpu_count
import pyautogui as auto

def bot(aim):
    # do the work for the single occurrence of aim
    auto.click(aim)
    print(aim)

if __name__ == "__main__":
    aims = list(auto.locateAllOnScreen(r"dot.png", confidence=0.9795))
    # choose an appropriate pool size:
    pool = Pool(min(len(aims), cpu_count()))
    # bot will be called for each element returned by the call to auto.locateAllOnScreen
    pool.map(bot, aims)

How to start multiple jobs in Python and communicate with the main job

I am a novice user of python multithreading/multiprocessing, so please bear with me.
I would like to solve the following problem and I need some help/suggestions in this regard.
Let me describe in brief:
I would like to start a Python script which does something sequentially in the beginning.
After the sequential part is over, I would like to start some jobs in parallel.
Assume that there are four parallel jobs I want to start.
I would like to start these jobs on some other machines in the computing cluster using "lsf". My initial script is also running on an "lsf" machine.
The four jobs started on the four machines will perform two logical steps, A and B, one after the other.
When the jobs start, they each begin with logical step A and finish it.
After every job (4 jobs) has finished step A, they should notify the first job that started them. In other words, the main job is waiting for confirmation from these four jobs.
Once the main job receives confirmation from these four jobs, it should notify all four jobs to do logical step B.
Logical step B will automatically terminate the jobs after finishing the task.
The main job waits for all the jobs to finish, and then it should continue with the sequential part.
An example scenario would be:
A Python script running on an "lsf" machine in the cluster starts four tcl shells on four "lsf" machines.
In each tcl shell, a script is sourced to do logical step A.
Once step A is done, they should somehow inform the Python script, which is waiting for the acknowledgement.
Once the acknowledgement is received from all four, the Python script informs them to do logical step B.
Logical step B is also a script sourced in their tcl shell; this script will also close the tcl shell at the end.
Meanwhile, the Python script is waiting for all four jobs to finish.
After all four jobs are finished, it should continue with the sequential part again and finish later on.
Here are my questions:
I am confused about whether I should use multithreading or multiprocessing. Which one suits this better?
In fact, what is the difference between these two? I read about them but wasn't able to reach a conclusion.
What is the Python GIL? I also read somewhere that, at any one point in time, only one thread will execute.
I need some explanation here. It gives me the impression that I can't use threads.
Any suggestions on how I could solve my problem systematically and in a more Pythonic way?
I am looking for some verbal step by step explanation and some pointers to read on each step.
Once the concepts are clear, I would like to code it myself.
Thanks in advance.
In addition to roganjosh's answer, I would include some signaling to start step B after A has finished:
import multiprocessing as mp
import time
import random
import sys

def func_A(process_number, queue, proceed):
    print "Process {} has been created".format(process_number)
    print "Process {} has ended step A".format(process_number)
    sys.stdout.flush()
    queue.put((process_number, "done"))
    proceed.wait()  # wait for the signal to do the second part
    print "Process {} has ended step B".format(process_number)
    sys.stdout.flush()

def multiproc_master():
    queue = mp.Queue()
    proceed = mp.Event()
    processes = [mp.Process(target=func_A, args=(x, queue, proceed))
                 for x in range(4)]
    for p in processes:
        p.start()
    # block=True waits until there is something available
    results = [queue.get(block=True) for p in processes]
    proceed.set()  # set continue-flag
    for p in processes:  # wait for all to finish (also on Windows)
        p.join()
    return results

if __name__ == '__main__':
    split_jobs = multiproc_master()
    print split_jobs
1) From the options you listed in your question, you should probably use multiprocessing in this case to leverage multiple CPU cores and compute things in parallel.
2) Going further from point 1: the Global Interpreter Lock (GIL) means that only one thread can actually execute code at any one time.
A simple example for multithreading that pops up often here is having a prompt for user input for, say, an answer to a maths problem. In the background, they want a timer to keep incrementing at one second intervals to register how long the person took to respond. Without multithreading, the program would block whilst waiting for user input and the counter would not increment. In this case, you could have the counter and the input prompt run on different threads so that they appear to be running at the same time. In reality, both threads are sharing the same CPU resource and are constantly passing an object backwards and forwards (the GIL) to grant them individual access to the CPU. This is hopeless if you want to properly process things in parallel. (Note: In reality, you'd just record the time before and after the prompt and calculate the difference rather than bothering with threads.)
3) I have made a really simple example using multiprocessing. In this case, I spawn 4 processes that compute the sum of squares for a randomly chosen range. These processes do not have a shared GIL and therefore execute independently unlike multithreading. In this example, you can see that all processes start and end at slightly different times, but we can aggregate the results of the processes into a single queue object. The parent process will wait for all 4 child processes to return their computations before moving on. You could then repeat the code for func_B (not included in the code).
import multiprocessing as mp
import time
import random
import sys

def func_A(process_number, queue):
    start = time.time()
    print "Process {} has started at {}".format(process_number, start)
    sys.stdout.flush()
    my_calc = sum([x**2 for x in xrange(random.randint(1000000, 3000000))])
    end = time.time()
    print "Process {} has ended at {}".format(process_number, end)
    sys.stdout.flush()
    queue.put((process_number, my_calc))

def multiproc_master():
    queue = mp.Queue()
    processes = [mp.Process(target=func_A, args=(x, queue)) for x in xrange(4)]
    for p in processes:
        p.start()

    # Uncomment the below if you run on Linux (Windows and Linux treat
    # multiprocessing differently, as Windows lacks os.fork())
    #for p in processes:
    #    p.join()

    results = [queue.get() for p in processes]
    return results

if __name__ == '__main__':
    split_jobs = multiproc_master()
    print split_jobs

Python: run one function until another function finishes

I have two functions, draw_ascii_spinner and findCluster(companyid).
I would like to:
Run findCluster(companyid) in the background, and while it's processing...
Run draw_ascii_spinner until findCluster(companyid) finishes.
How do I begin to solve this (Python 2.7)?
Use threads:
import threading, time

def wrapper(func, args, res):
    res.append(func(*args))

res = []
t = threading.Thread(target=wrapper, args=(findCluster, (companyid,), res))
t.start()
while t.is_alive():
    # print next iteration of ASCII spinner
    t.join(0.2)
print res[0]
You can use multiprocessing. Or, if findCluster(companyid) has sensible stopping points, you can turn it into a generator along with draw_ascii_spinner, to do something like this:
for tick in findCluster(companyid):
    ascii_spinner.next()
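A minimal sketch of what that could look like (the yield placement and the spinner characters are assumptions; Python 2.7 to match the question):

import itertools
import sys

def findCluster(companyid):
    # assumed structure: yield at each sensible stopping point
    for step in range(10):  # stand-in for the real clustering work
        # ... do one chunk of work for companyid here ...
        yield step

ascii_spinner = itertools.cycle('|/-\\')

for tick in findCluster('some_company'):
    sys.stdout.write(ascii_spinner.next() + '\r')  # redraw spinner in place
    sys.stdout.flush()

This keeps everything in one thread: the spinner advances exactly once per unit of work.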
Generally, you will use threads. Here is a simplistic approach which assumes that there are only two threads: 1) the main thread executing a task, 2) the spinner thread:
#!/usr/bin/env python
import time
import thread

def spinner():
    while True:
        print '.'
        time.sleep(1)

def task():
    time.sleep(5)

if __name__ == '__main__':
    thread.start_new_thread(spinner, ())
    # as soon as task finishes (and so the program),
    # the spinner will be gone as well
    task()
This can be done with threads. findCluster runs in a separate thread, and when done it can simply signal another thread that is polling for a reply.
You'll want to do some research on threading; the general form is going to be this:
Create a new thread for findCluster, and create some way for the program to know the method is running; the simplest in Python is just a global boolean (see the sketch after the tutorial link below).
Run draw_ascii_spinner in a while loop conditioned on whether findCluster is still running; you'll probably want this thread to sleep for a short period between iterations.
Here's a short tutorial in Python - http://linuxgazette.net/107/pai.html
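A sketch of that shape (Python 2.7; the flag name, wrapper, and sleep interval are illustrative, with findCluster and draw_ascii_spinner taken from the question):

import threading
import time

still_running = True  # global flag the spinner polls

def run_find_cluster(companyid):
    global still_running
    findCluster(companyid)  # the long-running call from the question
    still_running = False   # tell the spinner loop we are done

t = threading.Thread(target=run_find_cluster, args=(companyid,))
t.start()
while still_running:
    draw_ascii_spinner()  # draw the next frame
    time.sleep(0.2)       # short pause between frames
t.join()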
Run findCluster() in a thread (the threading module makes this very easy), and then run draw_ascii_spinner until some condition is met.
Instead of using sleep() to set the pace of the spinner, you can wait on the thread's join() with a timeout.
Is it possible to have a working example? I am new to Python. I have six tasks to run in one Python program. These six tasks should work in coordination, meaning that one should start when another finishes. I saw the answers, but I couldn't adapt the code you shared to my program.
I used time.sleep, but I know that it is not good, because I cannot know how much time each task takes.
# Sending commands
for i in range(0, len(cmdList)):  # sending commands to the port
    cmd = cmdList[i]
    cmdFull = convert(cmd)
    port.write(cmd.encode('ascii'))
    # s = port.read(10)
    print(cmd)
    # Terminate the command
    port.write(cmdFull.encode('ascii'))
    print('Termination')
    # time.sleep(1*60)

# close serial port
port.close()
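If each task is an ordinary Python function and one must start only when the previous one finishes, you don't need time.sleep at all: either call them in order, or, if each runs in its own thread, join each thread before starting the next. A sketch with hypothetical task functions task1 ... task6:

import threading

def run_in_order(tasks):
    for task_func in tasks:
        t = threading.Thread(target=task_func)
        t.start()
        t.join()  # block until this task finishes before starting the next

# run_in_order([task1, task2, task3, task4, task5, task6])

For the serial-port loop specifically, reading the device's reply (the commented-out port.read(10)) instead of sleeping a fixed time would pace the loop by actual completion, assuming your device sends one.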
