I have two threads, Reader and Writer.
The Writer gets data from the network and then sends it over a socket to an executable. Once that is done, the Writer should block for up to 70 seconds, which I specify with Event.wait(askrate).
This should give the executable enough time to compute the result and submit the output. When the computation is finished, I call Event.set() to release the Writer thread so that it can fetch the next data to forward to the executable, and so on.
The problem is that the Writer thread keeps reading data while the Reader thread is still waiting for the result coming through the serial interface.
Does anyone have an idea why this blocking mechanism is not working properly between these two threads?
import socket
from threading import Thread, Event

askrate = 70
s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
s.connect("/tmp/demo_socket")
class Reader(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.daemon = True
    def run(self):
        while True:
            nonce = s.recv(4)
            if len(nonce) == 4:
                submitter = Submitter(writer.block, nonce)
                #submit result and release thread lock in Writer class
                golden.set()
class Writer(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.daemon = True
    def run(self):
        while True:
            work = bc.getwork()
            self.block = work['data']
            self.midstate = work['midstate']
            payload = self.midstate.decode('hex') + self.block.decode('hex')
            s.send(payload)
            result = golden.wait(askrate)
            if result:
                golden.clear()
golden = Event()
reader = Reader()
writer = Writer()
reader.start()
writer.start()
I'm pretty sure that this is not how you are supposed to use AF_UNIX sockets. You are supposed to open the pseudo-file twice (from the same or different processes); then writes to one side appear as reads on the other side, and vice versa. In your code, you open the pseudo-file only once. Any write is probably blocking, waiting for another process to open the pseudo-file a second time.
In your case, you should use socket.socketpair(), which returns you two sockets at once, playing the role of the two ends. Use one end in each thread.
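As a minimal sketch of the socketpair() suggestion (Python 3; the names echo_worker, round_trip, main_end and worker_end are illustrative and not from the question), each thread holds one end of the pair, and bytes written to one end are read from the other:

```python
import socket
import threading


def echo_worker(conn):
    """Read small requests from one end of the pair and send them back reversed."""
    while True:
        # recv(4) may return fewer than 4 bytes in general; fine for this tiny demo.
        data = conn.recv(4)
        if not data:              # the peer closed its end
            break
        conn.sendall(data[::-1])
    conn.close()


def round_trip(payload):
    # socketpair() returns two sockets that are already connected to each other.
    main_end, worker_end = socket.socketpair()
    t = threading.Thread(target=echo_worker, args=(worker_end,))
    t.start()
    main_end.sendall(payload)
    reply = main_end.recv(4)
    main_end.close()              # makes recv() in the worker return b""
    t.join()
    return reply
```

In the question's setup, the Reader would keep one end and the Writer the other, instead of both sharing the single socket s.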
I have written a program that I am using to benchmark a mongodb database performing under multithreaded bulk write conditions.
The problem is that the program hangs and does not finish executing.
I am fairly sure the cause is that I write 530838 records to the database using 10 threads that each bulk write 50 records at a time. That leaves a remainder of 38 records. Because the run method always fetches 50 records from the queue, the process hangs once 530800 records have been written: the final 38 records are never written, because the following code never finishes executing:
for object in range(50):
    objects.append(self.queue.get())
I would like the program to write 50 records at a time until fewer than 50 remain at which point it should write the remaining records in the queue and then exit the thread when no records remain in the queue.
Thanks in advance :)
import threading
import Queue
import json
from pymongo import MongoClient, InsertOne
import datetime
#Set the number of threads
n_thread = 10
#Create the queue
queue = Queue.Queue()
#Connect to the database
client = MongoClient("mongodb://mydatabase.com")
db = client.threads
class ThreadClass(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        #Assign thread working with queue
        self.queue = queue
    def run(self):
        while True:
            objects = []
            #Get next 50 objects from queue
            for object in range(50):
                objects.append(self.queue.get())
            #Insert the queued objects into the database
            db.threads.insert_many(objects)
            #signals to queue job is done
            self.queue.task_done()
#Create number of processes
threads = []
for i in range(n_thread):
    t = ThreadClass(queue)
    t.setDaemon(True)
    #Start thread
    t.start()
#Start timer
starttime = datetime.datetime.now()
#Read json object by object
content = json.load(open("data.txt","r"))
for jsonobj in content:
    #Put object into queue
    queue.put(jsonobj)
#wait on the queue until everything has been processed
queue.join()
for t in threads:
    t.join()
#Print the total execution time
endtime = datetime.datetime.now()
duration = endtime-starttime
print(divmod(duration.days * 86400 + duration.seconds, 60))
From the docs on Queue.get you can see that the default settings are block=True and timeout=None, which results in blocked waiting on an empty queue to have a next item that can be taken.
You could use get_nowait or get(False) to ensure you're not blocking. If you want the blocking to be conditional on whether the queue has 50 items, whether it is empty, or other conditions, you can use Queue.empty and Queue.qsize, but note that they do not provide race-condition-proof guarantees of non-blocking behavior... they would merely be heuristics for whether to use block=False with get.
Something like this:
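def run(self):
    while True:
        objects = []
        #Get next 50 objects from queue; block only if a full batch
        #appears to be available (qsize() is a heuristic, not a guarantee)
        block = self.queue.qsize() >= 50
        for i in range(50):
            try:
                item = self.queue.get(block=block)
            except Queue.Empty:
                break
            objects.append(item)
        if objects:
            #Insert the queued objects into the database
            #(insert_many rejects an empty list)
            db.threads.insert_many(objects)
            #signal one finished task per item taken from the queue
            for _ in objects:
                self.queue.task_done()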
Another approach would be to set timeout and use a try ... except block to catch any Empty exceptions that are raised. This has the advantage that you can decide how long to wait, rather than heuristically guessing when to immediately return, but they are similar.
Also note that I changed your loop variable from object to i ... you should most likely avoid having your loop variable shadow the built-in object class.
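The timeout approach mentioned above could look roughly like this. It is a sketch in Python 3 (where the module is named queue); take_batch, drain and handle_batch are hypothetical names, and handle_batch stands in for the db.threads.insert_many call. A worker stops once the queue has stayed empty for the whole timeout:

```python
import queue  # named Queue in Python 2


def take_batch(q, batch_size=50, timeout=0.5):
    """Collect up to batch_size items; stop early if the queue stays empty for `timeout` seconds."""
    batch = []
    for _ in range(batch_size):
        try:
            batch.append(q.get(timeout=timeout))
        except queue.Empty:
            break
    return batch


def drain(q, handle_batch, batch_size=50, timeout=0.5):
    """Process batches until an empty batch signals that nothing is left."""
    while True:
        batch = take_batch(q, batch_size, timeout)
        if not batch:
            return
        handle_batch(batch)
        for _ in batch:
            q.task_done()   # one task_done() per successful get()
```

With 138 queued items and a batch size of 50, handle_batch would be called with batches of 50, 50 and 38 items. The timeout has to be longer than any pause the producer may take, otherwise a worker can exit too early.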
Let's say I have a thread playing a sound every 10 seconds.
I also have a settings file containing a certain number of thread settings (id, sound to play). My main script unpickles every thread setting, then starts as many threads, each with its own settings.
My question is: how do I kill one of those threads, and only one? Not the first or the last; I'd like to be able to choose which one. Is it even possible?
I thought about giving each thread a reference, but since I take them from the settings file, I don't know how many threads I'll have to start.
Sound-playing thread:
class MyThread(Thread):
    def __init__(self, sound):
        Thread.__init__(self)
        self.sound = sound
        self.stopped = False
    def run(self):
        while not self.stopped:
            #Totally made up function, as an example
            playsound(self.sound)
            time.sleep(10)
    def stop(self):
        self.stopped = True
Main script:
threads = []
with open('settings', 'rb') as f:
    while True:
        try:
            threads.append(pickle.load(f))
            continue
        except EOFError:
            break
for i in threads:
    #Model of threads: [id, sound], [id, sound], [id, sound]...
    MyThread(i[1]).start()
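One possible way to stop a specific thread, sketched under the assumption that every setting is an [id, sound] pair as described above: keep each started thread in a dict keyed by its id, so the number of threads does not need to be known in advance. SoundThread, start_all and stop_one are hypothetical names, and the play_count counter is only a stand-in for the made-up playsound() call.

```python
import threading


class SoundThread(threading.Thread):
    def __init__(self, sound, interval=10):
        threading.Thread.__init__(self)
        self.daemon = True
        self.sound = sound
        self.interval = interval
        self.stop_requested = threading.Event()
        self.play_count = 0

    def run(self):
        while not self.stop_requested.is_set():
            self.play_count += 1        # stand-in for playsound(self.sound)
            # wait() returns early as soon as stop() is called,
            # so the thread does not sleep out the full interval.
            self.stop_requested.wait(self.interval)

    def stop(self):
        self.stop_requested.set()


def start_all(settings):
    """Start one thread per [id, sound] pair and return a dict mapping id to thread."""
    registry = {}
    for thread_id, sound in settings:
        t = SoundThread(sound)
        t.start()
        registry[thread_id] = t
    return registry


def stop_one(registry, thread_id):
    """Stop only the thread registered under thread_id."""
    t = registry.pop(thread_id)
    t.stop()
    t.join()
```

For example, registry = start_all(threads) followed by stop_one(registry, 2) would stop only the thread whose id is 2 and leave the others running.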
(New to Python and OO - I apologize in advance if I'm being stupid here)
I'm trying to define a Python 3 class such that when an instance is created two subprocesses are also created. These subprocesses do some work in the background (sending and listening for UDP packets). The subprocesses also need to communicate with each other and with the instance (updating instance attributes based on what is received from UDP, among other things).
I am creating my subprocesses with os.fork because I don't understand how to use the subprocess module to send multiple file descriptors to child processes - maybe this is part of my problem.
The problem I am running into is how to kill the child processes when the instance is destroyed. My understanding is I shouldn't use destructors in Python because stuff should get cleaned up and garbage collected automatically by Python. In any case, the following code leaves the children running after it exits.
What is the right approach here?
import os
from time import sleep
class A:
    def __init__(self):
        sfp, pts = os.pipe() # senderFromParent, parentToSender
        pfs, stp = os.pipe() # parentFromSender, senderToParent
        pfl, ltp = os.pipe() # parentFromListener, listenerToParent
        sfl, lts = os.pipe() # senderFromListener, listenerToSender
        pid = os.fork()
        if pid:
            # parent
            os.close(sfp)
            os.close(stp)
            os.close(lts)
            os.close(ltp)
            os.close(sfl)
            self.pts = os.fdopen(pts, 'w') # allow creator of A inst to
            self.pfs = os.fdopen(pfs, 'r') # send and receive messages
            self.pfl = os.fdopen(pfl, 'r') # to/from sender and
        else:                              # listener processes
            # sender or listener
            os.close(pts)
            os.close(pfs)
            os.close(pfl)
            pid = os.fork()
            if pid:
                # sender
                os.close(ltp)
                os.close(lts)
                sender(self, sfp, stp, sfl)
            else:
                # listener
                os.close(stp)
                os.close(sfp)
                os.close(sfl)
                listener(self, ltp, lts)
def sender(a, sfp, stp, sfl):
    sfp = os.fdopen(sfp, 'r') # receive messages from parent
    stp = os.fdopen(stp, 'w') # send messages to parent
    sfl = os.fdopen(sfl, 'r') # received messages from listener
    while True:
        # send UDP packets based on messages from parent and process
        # responses from listener (some responses passed back to parent)
        print("Sender alive")
        sleep(1)
def listener(a, ltp, lts):
    ltp = os.fdopen(ltp, 'w') # send messages to parent
    lts = os.fdopen(lts, 'w') # send messages to sender
    while True:
        # listen for and process incoming UDP packets, sending some
        # to sender and some to parent
        print("Listener alive")
        sleep(1)
a = A()
Running the above produces:
Sender alive
Listener alive
Sender alive
Listener alive
...
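Actually, you should use destructors. A Python class can define a __del__ method, which is called when the object is about to be destroyed. Note that __del__ is not guaranteed to run for objects that still exist when the interpreter exits.
In your case, you should define
def __del__(self):
    ...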
within your class A that sends the appropriate kill signals to your child processes. Don't forget to store the child PIDs in your parent process, of course.
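A minimal sketch of that idea, assuming a Unix system with os.fork: the parent records each child PID and sends SIGTERM from an explicit close() method, with __del__ only as a fallback. The class name Parent and the attribute child_pids are hypothetical, and the child here just sleeps in place of the real sender/listener work.

```python
import os
import signal
import time


class Parent:
    """Hypothetical stand-in for class A: forks one worker and remembers its PID."""

    def __init__(self):
        self.child_pids = []
        pid = os.fork()
        if pid == 0:
            # Child process: pretend to do background work until it is killed.
            try:
                while True:
                    time.sleep(1)
            finally:
                os._exit(0)     # never fall back into the parent's code path
        self.child_pids.append(pid)

    def close(self):
        """Terminate and reap every child that is still recorded."""
        for pid in self.child_pids:
            try:
                os.kill(pid, signal.SIGTERM)
                os.waitpid(pid, 0)      # reap the child so no zombie is left
            except (ProcessLookupError, ChildProcessError):
                pass                    # the child is already gone
        self.child_pids = []

    def __del__(self):
        # Fallback only; calling close() explicitly is more reliable.
        self.close()
```

Calling close() explicitly (or from a try/finally block) avoids relying on when, or whether, __del__ runs.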
As suggested here, you can create a child process using multiprocessing module with flag daemon=True.
Example:
from multiprocessing import Process
p = Process(target=f, args=('bob',))
p.daemon = True
p.start()
There's no point trying to reinvent the wheel. subprocess does all you want and more, though multiprocessing will simplify the process, so we'll use that.
You can use multiprocessing.Pipe to create connections and send messages back and forth between a pair of processes. You can make a pipe "duplex", so both ends can send and receive if that's what you need. You can use multiprocessing.Manager to create a shared Namespace between processes (sharing state between listener, sender and parent). There is a caveat when using a managed list, dict or Namespace (created with manager.list(), manager.dict() or manager.Namespace()): in-place changes to a mutable object stored in one of them are not seen by other processes until the object is reassigned to the managed object.
eg.
namespace.attr = {}
# change below not cascaded to other processes
namespace.attr["key"] = "value"
# force change to other processes
namespace.attr = namespace.attr
If you need to have more than one process write to the same attribute, then you will need to use synchronisation to prevent concurrent modification by one process wiping out changes made by another process.
Example code:
from multiprocessing import Process, Pipe, Manager
class Reader:
    def __init__(self, writer_conn, namespace):
        self.writer_conn = writer_conn
        self.namespace = namespace
    def read(self):
        self.namespace.msgs_recv = 0
        with self.writer_conn:
            try:
                while True:
                    obj = self.writer_conn.recv()
                    self.namespace.msgs_recv += 1
                    print("Reader got:", repr(obj))
            except EOFError:
                print("Reader has no more data to receive")
class Writer:
    def __init__(self, reader_conn, namespace):
        self.reader_conn = reader_conn
        self.namespace = namespace
    def write(self, msgs):
        self.namespace.msgs_sent = 0
        with self.reader_conn:
            for msg in msgs:
                self.reader_conn.send(msg)
                self.namespace.msgs_sent += 1
def create_child_processes(reader, writer, msgs):
    p_write = Process(target=Writer.write, args=(writer, msgs))
    p_write.start()
    # This is very important, otherwise reader will hang after writer has finished.
    # The order of this statement coming after p_write.start(), but before
    # p_read.start(), is also important. Look up file descriptors and how they
    # are inherited by child processes on Unix, and how any valid fd to the
    # write side of a pipe will keep all read ends open
    writer.reader_conn.close()
    p_read = Process(target=Reader.read, args=(reader,))
    p_read.start()
    return p_read, p_write
def run_mp_pipe():
    manager = Manager()
    namespace = manager.Namespace()
    read_conn, write_conn = Pipe()
    reader = Reader(read_conn, namespace)
    writer = Writer(write_conn, namespace)
    p_read, p_write = create_child_processes(reader, writer,
        msgs=["hello", "world", {"key", "value"}])
    print("starting")
    p_write.join()
    p_read.join()
    print("done")
    print(namespace)
    assert namespace.msgs_sent == namespace.msgs_recv
if __name__ == "__main__":
    run_mp_pipe()
Output:
starting
Reader got: 'hello'
Reader got: 'world'
Reader got: {'key', 'value'}
Reader has no more data to receive
done
Namespace(msgs_recv=3, msgs_sent=3)
I have a Python TCP client that needs to send a media (.mpg) file in a loop to a C TCP server.
In the code below, a separate thread reads the file in 10K blocks, sends them, and starts over when it reaches the end of the file. I use a Queue to print logs in my GUI (Tkinter), but after some time the program runs out of memory. I think the cause is my use of the threading module or the TCP send.
UPDATE 1 - Added more code as requested
Thread class "Sendmpgthread" used to create thread to send data
.
.
    def __init__ ( self, otherparams,MainGUI):
        .
        .
        self.MainGUI = MainGUI
        self.lock = threading.Lock()
        Thread.__init__(self)
    #This is the one causing leak, this is called inside loop
    def pushlog(self,msg):
        self.MainGUI.queuelog.put(msg)
    def send(self, mysocket, block):
        size = len(block)
        pos = 0;
        while size > 0:
            try:
                curpos = mysocket.send(block[pos:])
            except socket.timeout, msg:
                if self.over:
                    self.pushlog('Exit Send')
                    return False
                continue  # nothing was sent on timeout, so retry
            except socket.error, msg:
                print 'Exception'
                return False
            pos = pos + curpos
            size = size - curpos
        return True
    def run(self):
        media_file = None
        mysocket = None
        try:
            mysocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            mysocket.connect((self.ip, string.atoi(self.port)))
            media_file = open(self.file, 'rb')
            while not self.over:
                chunk = media_file.read(10000)
                if not chunk: # EOF Reset it
                    print 'resetting stream'
                    media_file.seek(0, 0)
                    continue
                if not self.send(mysocket, chunk): # If some error or thread is killed
                    break;
                #disabling this solves the issue
                self.pushlog('print how much data sent')
        except socket.error, msg:
            print 'print exception'
        except Exception, msg:
            print 'print exception'
        try:
            if media_file is not None:
                media_file.close()
                media_file = None
            if mysocket is not None:
                mysocket.close()
                mysocket = None
        finally:
            print 'some cleaning'
    def kill(self):
        self.over = True
I figured out that the cause is my incorrect use of Queue, as commenting out that piece resolves the issue.
UPDATE 2 - MainGUI class which is called from above Thread class
class MainGUI(Frame):
    def __init__(self, other args):
        #some code
        .
        .
        #from the above thread class used to send data
        self.send_mpg_status = Sendmpgthread(params)
        self.send_mpg_status.start()
        self.after(100, self.updatelog)
        self.queuelog = Queue.Queue()
    def updatelog(self):
        try:
            msg = self.queuelog.get_nowait()
            while msg is not None:
                self.printlog(msg)
                msg = self.queuelog.get_nowait()
        except Queue.Empty:
            pass
        if self.send_mpg_status: # only continue when sending
            self.after(100, self.updatelog)
    def printlog(self,msg):
        #print in GUI
Since printlog is adding to a tkinter text control, the memory occupied by that control will grow with each message (it has to store all the log messages in order to display them).
Unless storing all the logs is critical, a common solution is to limit the maximum number of log lines displayed.
A naive implementation is to delete extra lines from the beginning once the control exceeds a maximum number of lines. Add a function to get the number of lines in the control and then, in printlog, something similar to:
while getnumlines(self.edit) > self.maxloglines:
    # delete the whole first line, including its newline
    self.edit.delete('1.0', '2.0')
(above code not tested)
update: some general guidelines
Keep in mind that what might look like a memory leak does not always mean that a function is wrong, or that the memory is no longer accessible. Many times there is missing cleanup code for a container that is accumulating elements.
A basic general approach for this kind of problem:
1. form an opinion on what part of the code might be causing the problem
2. check it by commenting that code out (or keep commenting code until you find a candidate)
3. look for containers in the responsible code, add code to print their size
4. decide what elements can be safely removed from that container, and when to do it
5. test the result
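For the "decide what can be removed" step, one option (a sketch, not something the question's code uses) is a container that prunes itself. collections.deque with maxlen drops the oldest entry automatically once it is full; BoundedLog is a hypothetical name.

```python
from collections import deque


class BoundedLog:
    """Keeps only the most recent max_lines messages."""

    def __init__(self, max_lines=1000):
        self.lines = deque(maxlen=max_lines)

    def add(self, msg):
        # When the deque is full, appending discards the oldest entry.
        self.lines.append(msg)

    def size(self):
        return len(self.lines)
```

Printing size() periodically also covers the "print their size" step: the value stops growing once max_lines is reached.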
I can't see anything obviously wrong with your code snippet.
To reduce memory usage a bit under Python 2.7, I'd use buffer(block, pos) instead of block[pos:]. Also I'd use mysocket.sendall(block) instead of your send method.
If the ideas above don't solve your problem, then the bug is most probably elsewhere in your code. Could you please post the shortest possible version of the full Python script which still runs out of memory (http://sscce.org/)? That increases your chance of getting useful help.
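For readers on Python 3, where buffer() no longer exists, memoryview plays a similar role: slicing it does not copy the underlying bytes. The following is a sketch only; send_block and demo are hypothetical names, and demo uses a local socketpair instead of the real TCP server.

```python
import socket


def send_block(sock, block):
    """Send all of block, slicing a memoryview so no bytes are copied on each retry."""
    view = memoryview(block)
    pos = 0
    while pos < len(view):
        pos += sock.send(view[pos:])
    return pos


def demo(size=4096):
    # A connected pair of local sockets stands in for client and server.
    a, b = socket.socketpair()
    data = b"x" * size
    sent = send_block(a, data)
    a.close()                       # lets the receiving loop see end-of-stream
    received = bytearray()
    while True:
        chunk = b.recv(4096)
        if not chunk:
            break
        received.extend(chunk)
    b.close()
    return sent, bytes(received) == data
```

In practice sock.sendall(block) does the same loop internally, which is why the answer recommends it over a hand-written send method.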
Out of memory errors are indicative of data being generated but not consumed or released. Looking through your code I would guess these two areas:
Messages are being pushed onto a Queue.Queue() instance in the pushlog method. Are they being consumed?
The MainGui printlog method may be writing text somewhere. eg. Is it continually writing to some kind of GUI widget without any pruning of messages?
From the code you've posted, here's what I would try:
Put a print statement in updatelog. If this is not being continually called for some reason such as a failed after() call, then the queuelog will continue to grow without bound.
If updatelog is continually being called, then turn your focus to printlog. Comment the contents of this function to see if out of memory errors still occur. If they don't, then something in printlog may be holding on to the logged data, you'll need to dig deeper to find out what.
Apart from this, the code could be cleaned up a bit. self.queuelog is not created until after the thread is started which gives rise to a race condition where the thread may try to write into the queue before it has been created. Creation of queuelog should be moved to somewhere before the thread is started.
updatelog could also be refactored to remove redundancy:
def updatelog(self):
    try:
        while True:
            msg = self.queuelog.get_nowait()
            self.printlog(msg)
    except Queue.Empty:
        pass
I also assume that the kill function is called from the GUI thread. To avoid thread race conditions, self.over should be a thread-safe object such as a threading.Event. The loops would then test self.over.is_set() instead of self.over.
def __init__(...):
    self.over = threading.Event()
def kill(self):
    self.over.set()
There is no data piling up in your TCP sending loop.
The memory error is probably caused by the logging queue. As you have not posted the complete code, try using the following class for logging:
from threading import Thread, Event, Lock
from time import sleep, time as now
class LogRecord(object):
    __slots__ = ["txt", "params"]
    def __init__(self, txt, params):
        self.txt, self.params = txt, params
class AsyncLog(Thread):
    DEBUGGING_EMULATE_SLOW_IO = True
    def __init__(self, queue_max_size=15, queue_min_size=5):
        Thread.__init__(self)
        self.queue_max_size, self.queue_min_size = queue_max_size, queue_min_size
        self._queuelock = Lock()
        self._queue = [] # protected by _queuelock
        self._discarded_count = 0 # protected by _queuelock
        self._pushed_event = Event()
        self.setDaemon(True)
        self.start()
    def log(self, message, **params):
        with self._queuelock:
            self._queue.append(LogRecord(message, params))
            if len(self._queue) > self.queue_max_size:
                # empty the queue:
                self._discarded_count += len(self._queue) - self.queue_min_size
                del self._queue[self.queue_min_size:] # empty the queue instead of creating new list (= [])
            self._pushed_event.set()
    def run(self):
        while 1: # no reason for exit condition here
            logs, discarded_count = None, 0
            with self._queuelock:
                if len(self._queue) > 0:
                    # select buffered messages for printing, releasing lock ASAP
                    logs = self._queue[:]
                    del self._queue[:]
                    self._pushed_event.clear()
                    discarded_count = self._discarded_count
                    self._discarded_count = 0
            if not logs:
                self._pushed_event.wait()
                self._pushed_event.clear()
                continue
            else:
                # print logs
                if discarded_count:
                    print ".. {0} log records missing ..".format(discarded_count)
                for log_record in logs:
                    self.write_line(log_record)
                if self.DEBUGGING_EMULATE_SLOW_IO:
                    sleep(0.5)
    def write_line(self, log_record):
        print log_record.txt, " ".join(["{0}={1}".format(name, value) for name, value in log_record.params.items()])
if __name__ == "__main__":
    class MainGUI:
        def __init__(self):
            self._async_log = AsyncLog()
            self.log = self._async_log.log # stored as bound method
        def do_this_test(self):
            print "I am about to log 100 times per sec, while text output frequency is 2Hz (twice per second)"
            def log_100_records_in_one_second(itteration_index):
                for i in xrange(100):
                    self.log("something happened", timestamp=now(), session=3.1415, itteration=itteration_index)
                    sleep(0.01)
            for iter_index in range(3):
                log_100_records_in_one_second(iter_index)
    test = MainGUI()
    test.do_this_test()
I have noticed that you do not sleep() anywhere in the sending loop, this means data is read as fast as it can and is sent as fast as it can. Note that this is not desirable behavior when playing media files - container time-stamps are there to dictate data-rate.
I'm writing an app that appends lines to the same file from multiple threads.
I have a problem in which some lines are appended without a new line.
Any solution for this?
class PathThread(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
    def printfiles(self, p):
        for path, dirs, files in os.walk(p):
            for f in files:
                print(f, file=output)
    def run(self):
        while True:
            path = self.queue.get()
            self.printfiles(path)
            self.queue.task_done()
pathqueue = Queue.Queue()
paths = getThisFromSomeWhere()
output = codecs.open('file', 'a')
# spawn threads
for i in range(0, 5):
    t = PathThread(pathqueue)
    t.setDaemon(True)
    t.start()
# add paths to queue
for path in paths:
    pathqueue.put(path)
# wait for queue to get empty
pathqueue.join()
The solution is to write to the file in one thread only.
import Queue # or queue in Python 3
import codecs
import os
import threading
class PrintThread(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
    def printfiles(self, p):
        for path, dirs, files in os.walk(p):
            for f in files:
                print(f, file=output)
    def run(self):
        while True:
            result = self.queue.get()
            self.printfiles(result)
            self.queue.task_done()
class ProcessThread(threading.Thread):
    def __init__(self, in_queue, out_queue):
        threading.Thread.__init__(self)
        self.in_queue = in_queue
        self.out_queue = out_queue
    def run(self):
        while True:
            path = self.in_queue.get()
            result = self.process(path)
            self.out_queue.put(result)
            self.in_queue.task_done()
    def process(self, path):
        # Do the processing job here
        return path  # placeholder: pass the path through unchanged
pathqueue = Queue.Queue()
resultqueue = Queue.Queue()
paths = getThisFromSomeWhere()
output = codecs.open('file', 'a')
# spawn threads to process
for i in range(0, 5):
    t = ProcessThread(pathqueue, resultqueue)
    t.setDaemon(True)
    t.start()
# spawn a single thread to print
t = PrintThread(resultqueue)
t.setDaemon(True)
t.start()
# add paths to queue
for path in paths:
    pathqueue.put(path)
# wait for queue to get empty
pathqueue.join()
resultqueue.join()
The fact that you never see jumbled text on the same line, or new lines in the middle of a line, is a clue that you don't actually need to synchronize appending to the file. The problem is that you use print to write to a single file handle. I suspect print is actually doing two operations on the file handle in one call, and those operations are racing between the threads. Basically, print is doing something like:
file_handle.write('whatever_text_you_pass_it')
file_handle.write(os.linesep)
Because different threads are doing this simultaneously on the same file handle, sometimes one thread gets in its first write, then another thread gets in its first write, and then you get two line separators in a row, or really any permutation of these.
The simplest way around this is to stop using print and call write directly. Try something like this:
output.write(f + os.linesep)
This still seems dangerous to me. I'm not sure what guarantees you can expect with all the threads using the same file handle object and contending for its internal buffer. Personally, I'd sidestep the whole issue and have every thread get its own file handle. Also note that this only works if the file is line-buffered, so that each flush to the file ends on an os.linesep. To force line buffering, pass 1 as the third argument of open. You can test it out like this:
#!/usr/bin/env python
import os
import sys
import threading
def hello(file_name, message, count):
    with open(file_name, 'a', 1) as f:
        for i in range(0, count):
            f.write(message + os.linesep)
if __name__ == '__main__':
    #start a file
    with open('some.txt', 'w') as f:
        f.write('this is the beginning' + os.linesep)
    #make 10 threads write a million lines to the same file at the same time
    threads = []
    for i in range(0, 10):
        threads.append(threading.Thread(target=hello, args=('some.txt', 'hey im thread %d' % i, 1000000)))
        threads[-1].start()
    for t in threads:
        t.join()
    #check what the heck the file had
    uniq_lines = set()
    with open('some.txt', 'r') as f:
        for l in f:
            uniq_lines.add(l)
    for u in uniq_lines:
        sys.stdout.write(u)
The output looks like this:
hey im thread 6
hey im thread 7
hey im thread 9
hey im thread 8
hey im thread 3
this is the beginning
hey im thread 5
hey im thread 4
hey im thread 1
hey im thread 0
hey im thread 2
And maybe some more newlines where they shouldn't be?
Keep in mind that a shared resource should not be accessed by more than one thread at a time; otherwise unpredictable things can happen. (Making each access indivisible is what is meant by using 'atomic operations' with threads.)
Take a look at this page for a little intuition: Thread Synchronization Mechanisms in Python
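As one illustration of such a mechanism, here is a sketch that guards the two-step write with a threading.Lock, so the text and its newline can never be separated by another thread. An io.StringIO object stands in for the shared output file, and write_lines_concurrently is a hypothetical name.

```python
import io
import threading


def write_lines_concurrently(n_threads=5, lines_per_thread=200):
    output = io.StringIO()          # stand-in for the shared output file
    lock = threading.Lock()

    def worker(name):
        for i in range(lines_per_thread):
            with lock:              # only one thread may write at a time
                output.write("%s line %d" % (name, i))
                output.write("\n")

    threads = [threading.Thread(target=worker, args=("t%d" % n,))
               for n in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return output.getvalue().splitlines()
```

Every returned line has the form "t3 line 17": lines from different threads may interleave with each other, but no line is ever split or merged with another.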