Learning the Python Thread Module

I am trying to learn more about the thread module. I've come up with a quick script but am getting an error when I run it. The docs show the format as:
thread.start_new_thread ( function, args[, kwargs] )
My method only has one argument.
#!/usr/bin/python
import ftplib
import thread

sites = ["ftp.openbsd.org", "ftp.ucsb.edu", "ubuntu.osuosl.org"]

def ftpconnect(target):
    ftp = ftplib.FTP(target)
    ftp.login()
    print "File list from: %s" % target
    files = ftp.dir()
    print files

for i in sites:
    thread.start_new_thread(ftpconnect(i))
The error I am seeing occurs after one iteration of the for loop:
Traceback (most recent call last):
  File "./ftpthread.py", line 16, in <module>
    thread.start_new_thread(ftpconnect(i))
TypeError: start_new_thread expected at least 2 arguments, got 1
Any suggestions for this learning process would be appreciated. I also looked into using threading, but I am unable to import threading since it's apparently not installed, and I haven't found any documentation for installing that module yet.
Thank You!
The error I get when trying to import threading on my Mac is:
>>> import threading
# threading.pyc matches threading.py
import threading # precompiled from threading.pyc
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "threading.py", line 7, in <module>
    class WorkerThread(threading.Thread) :
AttributeError: 'module' object has no attribute 'Thread'
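Side note (an inference from the traceback, not something confirmed in the thread): the import appears to be picking up a local file called threading.py, which shadows the standard-library module; that is why the half-initialized module has no Thread attribute. A minimal sketch to confirm which file is actually being found:
import imp
# Locate the file that "import threading" would use, without executing it
# (executing it is what blows up above).
print(imp.find_module("threading")[1])
# If this prints a path inside your own project rather than the standard
# library, rename that file and delete any stale threading.pyc.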

The thread.start_new_thread function is really low-level and doesn't give you a lot of control. Take a look at the threading module, more specifically the Thread class: http://docs.python.org/2/library/threading.html#thread-objects
You then want to replace the last 2 lines of your script with:
# This line should be at the top of your file, obviously :p
from threading import Thread

threads = []

for i in sites:
    t = Thread(target=ftpconnect, args=[i])
    threads.append(t)
    t.start()

# Wait for all the threads to complete before exiting the program.
for t in threads:
    t.join()
Your code was failing, by the way, because in your for loop, you were calling ftpconnect(i), waiting for it to complete, and then trying to use its return value (that is, None) to start a new thread, which obviously doesn't work.
In general, you start a thread by giving it a callable object (a function or method -- the callable itself, not the result of a call: my_function, not my_function()), plus any arguments for that callable (in our case, [i], because ftpconnect takes one positional argument and you want it to be i), and then calling the Thread object's start method.
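A tiny illustration of that distinction, using a made-up function rather than anything from the question:
def greet(name):
    return "hello " + name

print(greet)           # the function object itself -- this is what target= wants
print(greet("world"))  # the *result* of calling it -- just a string, not a callable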

Now that you can import threading, start with best practices at once ;-)
import threading

threads = [threading.Thread(target=ftpconnect, args=(s,))
           for s in sites]

for t in threads:
    t.start()

for t in threads:  # shut down cleanly
    t.join()

What you want is to pass the function object and its arguments to thread.start_new_thread, not to execute the function yourself.
Like this:
for i in sites:
    thread.start_new_thread(ftpconnect, (i,))

Related

Trying to make a one hour loop in python

from notifypy import Notify
import schedule

def remember_water():
    notification = Notify()
    notification.title = "XXX"
    notification.message = "XXX"
    notification.send()

schedule.every().hour.do(remember_water())

while True:
    schedule.run_pending()
    time.sleep(1)
I tried to make a notification to drink water every hour, but I get a TypeError when I execute it; maybe there are other modules that are better suited.
I've never worked with these modules before, please help :)
Running your code produces:
Traceback (most recent call last):
  File "/home/lars/tmp/python/drink.py", line 11, in <module>
    schedule.every().hour.do(remember_water())
  File "/home/lars/.local/share/virtualenvs/lars-rUjSNCQn/lib/python3.10/site-packages/schedule/__init__.py", line 625, in do
    self.job_func = functools.partial(job_func, *args, **kwargs)
TypeError: the first argument must be callable
Your problem is here:
schedule.every().hour.do(remember_water())
The argument to the .do method must be a callable (like a function), but you are calling your function here and passing the result. You want to pass the function itself:
schedule.every().hour.do(remember_water)
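Putting it together, a minimal corrected sketch (the notification title and message are placeholder text, not from the original post):
from notifypy import Notify
import schedule
import time

def remember_water():
    notification = Notify()
    notification.title = "Reminder"        # placeholder text
    notification.message = "Drink water!"  # placeholder text
    notification.send()

# Pass the function object itself; schedule calls it every hour.
schedule.every().hour.do(remember_water)

while True:
    schedule.run_pending()
    time.sleep(1)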

How to share string between processes

I am having the hardest time sharing a string between processes. I looked at the following Q1, Q2, Q3 but still fail in an actual example. If I understood the docs correctly, locking should be done automatically, so there should be no race condition with setting/reading. Here's what I've got:
#!/usr/bin/env python3
import multiprocessing
import ctypes
import time

SLEEP = 0.1
CYCLES = 20

def child_process_fun(share):
    for i in range(CYCLES):
        time.sleep(SLEEP)
        share.value = str(time.time())

if __name__ == '__main__':
    share = multiprocessing.Value(ctypes.c_wchar_p, '')
    process = multiprocessing.Process(target=child_process_fun, args=(share,))
    process.start()
    for i in range(CYCLES):
        time.sleep(SLEEP)
        print(share.value)
which produces:
Traceback (most recent call last):
  File "test2.py", line 23, in <module>
    print(share.value)
  File "<string>", line 5, in getvalue
ValueError: character U+e479b7b0 is not in range [U+0000; U+10ffff]
Edit: `id(share.value)` is different for each process. However, if I try a double as the shared variable instead, the ids are the same and it works like a charm. Could this be a Python bug?
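Not an answer from this thread, but one commonly used workaround is to share the text as a fixed-size byte buffer (multiprocessing.Array of c_char) rather than a c_wchar_p, since the latter shares only a pointer that is meaningful solely inside the process that set it. A minimal sketch:
#!/usr/bin/env python3
import multiprocessing
import ctypes
import time

SLEEP = 0.1
CYCLES = 20

def child_process_fun(share):
    for i in range(CYCLES):
        time.sleep(SLEEP)
        share.value = str(time.time()).encode()  # bytes are copied into shared memory

if __name__ == '__main__':
    share = multiprocessing.Array(ctypes.c_char, 64)  # 64-byte shared buffer
    process = multiprocessing.Process(target=child_process_fun, args=(share,))
    process.start()
    for i in range(CYCLES):
        time.sleep(SLEEP)
        print(share.value.decode())
    process.join()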

Failures with Python multiprocessing.Pool when maxtasksperchild is set

I am using Python 2.7.8 on Linux and am seeing a consistent failure in a program that uses multiprocessing.Pool(). When I set maxtasksperchild to None, all is well across a variety of values for processes. But if I set maxtasksperchild=n (n>=1), I invariably end up with an uncaught exception. Here is the main block:
if __name__ == "__main__":
    options = parse_cmdline()
    subproc = Sub_process(options)
    lock = multiprocessing.Lock()
    [...]
    pool = multiprocessing.Pool(processes=options.processes,
                                maxtasksperchild=options.maxtasksperchild)
    imap_it = pool.imap(recluster_block, subproc.input_block_generator())
    #import pdb; pdb.set_trace()
    for count, result in enumerate(imap_it):
        print "Count = {}".format(count)
        if result is None or len(result) == 0:
            # presumably error was reported
            continue
        (interval, block_id, num_hpcs, num_final, retlist) = result
        for c in retlist:
            subproc.output_cluster(c, lock)
    print "About to close_outfile."
    subproc.close_outfile()
    print "About to close pool."
    pool.close()
    print "About to join pool."
    pool.join()
For debugging I have added a print statement showing the number of times through the loop. Here are a couple runs:
$ $prog --processes=2 --maxtasksperchild=2
Count = 0
Count = 1
Count = 2
Traceback (most recent call last):
  File "[...]reclustering.py", line 821, in <module>
    for count, result in enumerate(imap_it):
  File "[...]/lib/python2.7/multiprocessing/pool.py", line 659, in next
    raise value
TypeError: 'int' object is not callable
$ $prog --processes=2 --maxtasksperchild=1
Count = 0
Count = 1
Traceback (most recent call last):
[same message as above]
If I do not set maxtasksperchild, the program runs to completion successfully. Also, if I uncomment the "import pdb; pdb.set_trace()" line and enter the debugger, then the problem does not appear (Heisenbug). So, am I doing something wrong in the code here? Are there conditions on the code that generates the input (subproc.input_block_generator) or the code that processes it (recluster_block), that are known to cause issues like this? Thanks!
maxtasksperchild causes multiprocessing to respawn child processes. The idea is to get rid of any cruft that is building up. The problem is, you can also pick up new cruft from the parent: when a child respawns, it gets the current state of the parent process, which is different from the original spawn. You are doing your work in the script's global namespace, so you are changing the environment the child will see quite a bit. Specifically, you use a variable called 'count' that masks a previous 'from itertools import count' statement.
To fix this:
use namespaces (itertools.count, like you said in the comment) to reduce name collisions
do your work in a function so that local variables aren't propagated to the child (a sketch follows below)
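A minimal sketch of that second point (illustrative names, not the original script): keeping the driver loop inside a function means its loop variable stays local and can no longer mask a module-level from itertools import count.
import multiprocessing
from itertools import count  # still usable: nothing below shadows it

def recluster_block(block):
    # stand-in for the real worker function
    return block

def main():
    pool = multiprocessing.Pool(processes=2, maxtasksperchild=2)
    for n, result in enumerate(pool.imap(recluster_block, range(10))):
        print("Count = {}".format(n))  # 'n' is local to main(), not a global
    pool.close()
    pool.join()

if __name__ == "__main__":
    main()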

Python multiprocessing, ValueError: I/O operation on closed file

I'm having a problem with the Python multiprocessing package. Below is a simple example code that illustrates my problem.
import multiprocessing as mp
import time

def test_file(f):
    f.write("Testing...\n")
    print f.name
    return None

if __name__ == "__main__":
    f = open("test.txt", 'w')
    proc = mp.Process(target=test_file, args=[f])
    proc.start()
    proc.join()
When I run this, I get the following error.
Process Process-1:
Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\Python27\lib\multiprocessing\process.py", line 114, in run
    self.target(*self._args, **self._kwargs)
  File "C:\Users\Ray\Google Drive\Programming\Python\tests\follow_test.py", line 24, in test_file
    f.write("Testing...\n")
ValueError: I/O operation on closed file
Press any key to continue . . .
It seems that somehow the file handle is 'lost' during the creation of the new process. Could someone please explain what's going on?
I had similar issues in the past. I'm not sure whether it is done within the multiprocessing module or whether open sets the close-on-exec flag by default, but I do know that file handles opened in the main process are closed in the multiprocessing children.
The obvious workaround is to pass the filename as a parameter to the child process' init function and open it once within each child (if using a pool), or to pass it as a parameter to the target function and open/close it on each invocation. The former requires the use of a global to store the file handle (not a good thing) - unless someone can show me how to avoid that :) - and the latter can incur a performance hit (but can be used with multiprocessing.Process directly).
Example of the former:
filehandle = None
def child_init(filename):
global filehandle
filehandle = open(filename,...)
../..
def child_target(args):
../..
if __name__ == '__main__':
# some code which defines filename
proc = multiprocessing.Pool(processes=1,initializer=child_init,initargs=[filename])
proc.apply(child_target,args)
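And a sketch of the latter approach (pass the filename to the target and open/close on each invocation; the names mirror the question, not the original answer):
import multiprocessing as mp

def test_file(filename):
    # Each invocation opens (and closes) its own handle inside the child.
    with open(filename, 'a') as f:
        f.write("Testing...\n")

if __name__ == "__main__":
    proc = mp.Process(target=test_file, args=("test.txt",))
    proc.start()
    proc.join()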

Python multiprocessing: synchronizing file-like object

I'm trying to make a file-like object which is meant to be assigned to sys.stdout/sys.stderr during testing to provide deterministic output. It's not meant to be fast, just reliable. What I have so far almost works, but I need some help getting rid of the last few edge-case errors.
Here is my current implementation.
try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO

from os import getpid

class MultiProcessFile(object):
    """
    helper for testing multiprocessing

    multiprocessing poses a problem for doctests, since the strategy
    of replacing sys.stdout/stderr with file-like objects then
    inspecting the results won't work: the child processes will
    write to the objects, but the data will not be reflected
    in the parent doctest-ing process.

    The solution is to create file-like objects which will interact with
    multiprocessing in a more desirable way.

    All processes can write to this object, but only the creator can read.
    This allows the testing system to see a unified picture of I/O.
    """
    def __init__(self):
        # per advice at:
        # http://docs.python.org/library/multiprocessing.html#all-platforms
        from multiprocessing import Queue
        self.__master = getpid()
        self.__queue = Queue()
        self.__buffer = StringIO()
        self.softspace = 0

    def buffer(self):
        if getpid() != self.__master:
            return
        from Queue import Empty
        from collections import defaultdict
        cache = defaultdict(str)
        while True:
            try:
                pid, data = self.__queue.get_nowait()
            except Empty:
                break
            cache[pid] += data
        for pid in sorted(cache):
            self.__buffer.write('%s wrote: %r\n' % (pid, cache[pid]))

    def write(self, data):
        self.__queue.put((getpid(), data))

    def __iter__(self):
        "getattr doesn't work for iter()"
        self.buffer()
        return self.__buffer

    def getvalue(self):
        self.buffer()
        return self.__buffer.getvalue()

    def flush(self):
        "meaningless"
        pass
... and a quick test script:
#!/usr/bin/python2.6
from multiprocessing import Process
from mpfile import MultiProcessFile

def printer(msg):
    print msg

processes = []
for i in range(20):
    processes.append(Process(target=printer, args=(i,), name='printer'))

print 'START'
import sys
buffer = MultiProcessFile()
sys.stdout = buffer

for p in processes:
    p.start()
for p in processes:
    p.join()

for i in range(20):
    print i,
print

sys.stdout = sys.__stdout__
sys.stderr = sys.__stderr__
print
print 'DONE'
print
buffer.buffer()
print buffer.getvalue()
This works perfectly 95% of the time, but it has three edge-case problems. I have to run the test script in a fast while-loop to reproduce these.
1. 3% of the time, the parent process output isn't completely reflected. I assume this is because the data is being consumed before the Queue-flushing thread can catch up. I haven't thought of a way to wait for the thread without deadlocking.
2. 0.5% of the time, there's a traceback from the multiprocessing.Queue implementation.
3. 0.01% of the time, the PIDs wrap around, and so sorting by PID gives the wrong ordering.
In the very worst case (odds: one in 70 million), the output would look like this:
START
DONE
302 wrote: '19\n'
32731 wrote: '0 1 2 3 4 5 6 7 8 '
32732 wrote: '0\n'
32734 wrote: '1\n'
32735 wrote: '2\n'
32736 wrote: '3\n'
32737 wrote: '4\n'
32738 wrote: '5\n'
32743 wrote: '6\n'
32744 wrote: '7\n'
32745 wrote: '8\n'
32749 wrote: '9\n'
32751 wrote: '10\n'
32752 wrote: '11\n'
32753 wrote: '12\n'
32754 wrote: '13\n'
32756 wrote: '14\n'
32757 wrote: '15\n'
32759 wrote: '16\n'
32760 wrote: '17\n'
32761 wrote: '18\n'
Exception in thread QueueFeederThread (most likely raised during interpreter shutdown):
Traceback (most recent call last):
File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
File "/usr/lib/python2.6/threading.py", line 484, in run
File "/usr/lib/python2.6/multiprocessing/queues.py", line 233, in _feed
<type 'exceptions.TypeError'>: 'NoneType' object is not callable
In python2.7 the exception is slightly different:
Exception in thread QueueFeederThread (most likely raised during interpreter shutdown):
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
File "/usr/lib/python2.7/threading.py", line 505, in run
File "/usr/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
<type 'exceptions.IOError'>: [Errno 32] Broken pipe
How do I get rid of these edge cases?
The solution came in two parts. I've successfully run the test program 200 thousand times without any change in output.
The easy part was to use multiprocessing.current_process()._identity to sort the messages. This is not a part of the published API, but it is a unique, deterministic identifier of each process. This fixed the problem with PIDs wrapping around and giving a bad ordering of output.
The other part of the solution was to use multiprocessing.Manager().Queue() rather than the multiprocessing.Queue. This fixes problem #2 above because the manager lives in a separate Process, and so avoids some of the bad special cases when using a Queue from the owning process. #3 is fixed because the Queue is fully exhausted and the feeder thread dies naturally before python starts shutting down and closes stdin.
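For reference, a small self-contained sketch (not the poster's final class) showing the two ingredients named above: a Manager-backed queue plus current_process()._identity as the sort key.
from multiprocessing import Manager, Process, current_process

def worker(queue):
    # _identity is a small tuple such as (1,), (2,), ... -- unique and deterministic
    queue.put((current_process()._identity, 'hello\n'))

if __name__ == '__main__':
    manager = Manager()    # keep a reference so the manager process stays alive
    queue = manager.Queue()  # lives in the manager's process, no feeder thread here
    procs = [Process(target=worker, args=(queue,)) for _ in range(5)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    messages = []
    while not queue.empty():
        messages.append(queue.get())
    for identity, data in sorted(messages):
        print('%s wrote: %r' % (identity, data))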
I have encountered far fewer multiprocessing bugs with Python 2.7 than with Python 2.6. Having said this, the solution I used to avoid the "Exception in thread QueueFeederThread" problem is to sleep momentarily, possibly for 0.01s, in each process in which the Queue is used. It is true that using sleep is not desirable or even reliable, but the specified duration was observed to work sufficiently well in practice for me. You can also try 0.1s.
