Multithreaded access to Python dictionary

I am trying to access the same global dictionary from different threads in Python simultaneously. Thread safety at the point of access is not a concern for me, since all accesses are reads and don't modify the dictionary.
I changed my code to do the accesses from multiple threads, but I have noticed no increase in the speed of the execution. After checking around, it seems that the interpreter serializes the accesses, in effect nullifying the change in my code.
Is there an easy way to have a structure like Java's ConcurrentHashMap in Python?
The part of the code in question follows:
class csvThread (threading.Thread):
    def __init__(self, threadID, bizName):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.bizName = bizName
    def run(self):
        thread_function(self.bizName)
def thread_function(biz):
    first = True
    bizTempImgMap = {}
    for imag in bizMap[biz]:
        if not similar(bizTempImgMap, imgMap[imag]):
            bizTempImgMap[imag] = imgMap[imag]
            if first:
                a = imgMap[imag]
                sum = a
            else:
                c = np.column_stack((a, imgMap[imag]))
                sum += imgMap[imag]
                a = c.max(1) #max
            first = False
        else:
            print ("-")
    csvLock.acquire()
    writer.writerow([biz]+a.astype(np.str_).tolist()+(np.true_divide(sum, len(bizTempImgMap.keys()))).tolist())
    csvLock.release()
csvLock = threading.Lock()
...
imgMap = img_vector_load('test_photos.csv')
bizMap = img_busyness_load('csv/test_photo_to_biz_ids.csv')
...
for biz in bizMap.keys():
    if len(threads)<100:
        thread = csvThread(len(threads), biz)
        threads.append(thread)
        thread.start()
    else:
        print("\nWaiting for threads to finish\n")
        for t in threads:
            t.join()
        print("\nThreads Finished\n")
        threads = []

"i have noticed no increase in the speed of the execution"
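Threads in CPython will not speed up CPU-bound code like this, because the global interpreter lock (GIL) allows only one thread to execute Python bytecode at a time, so the threads effectively take turns instead of running in parallel.
Take a look at: GIL
Note that Python threading is best used for concurrent architectures (for example, overlapping I/O waits), not for raw speed.
If you want to keep this implementation and still get parallelism, use multiprocessing.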
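A minimal sketch of the multiprocessing route, under some loud assumptions: IMG_MAP and BIZ_MAP are tiny hypothetical stand-ins for imgMap and bizMap, plain lists replace the NumPy vectors, and summarize / summarize_all are names made up for this illustration. Each worker process only reads the dictionaries and returns its row, so the parent can write the CSV itself and no lock is needed.

```python
import multiprocessing as mp

# Hypothetical read-only lookup tables standing in for imgMap and bizMap.
IMG_MAP = {"img1": [1.0, 4.0], "img2": [3.0, 2.0], "img3": [5.0, 0.0]}
BIZ_MAP = {"bizA": ["img1", "img2"], "bizB": ["img3"]}


def summarize(biz):
    """Return (biz, element-wise max, element-wise mean) for one business."""
    vectors = [IMG_MAP[img] for img in BIZ_MAP[biz]]
    maxima = [max(col) for col in zip(*vectors)]
    means = [sum(col) / len(vectors) for col in zip(*vectors)]
    return biz, maxima, means


def summarize_all(processes=2):
    """Run summarize() for every business in a pool of worker processes."""
    with mp.Pool(processes=processes) as pool:
        return pool.map(summarize, sorted(BIZ_MAP))


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this module.
    for row in summarize_all():
        print(row)
```

With real data the dictionaries have to reach the workers somehow (inherited on fork, or reloaded/pickled on spawn), so whether this is actually faster depends on how large they are and how much work each business needs.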

Related

How can you code a nested concurrency in python?

My code has the following scheme:
class A():
    def evaluate(self):
        b = B()
        for i in range(30):
            b.run()
class B():
    def run(self):
        pass
if __name__ == '__main__':
    a = A()
    for i in range(10):
        a.evaluate()
I want to have two levels of concurrency: the first on the evaluate method and the second on the run method (nested concurrency). The question is how to introduce this concurrency using the Pool class of the multiprocessing module. Should I explicitly pass the number of cores? The solution should not create more processes than multiprocessing.cpu_count().
Note: assume that the number of cores is greater than 10.
Edit:
I have seen a lot of comments saying that Python does not have true concurrency because of the GIL. This is true for Python multi-threading, but for multiprocessing it is not quite correct (look here). I have timed it, as did this article, and the results show that it can go faster than sequential execution.
Your comment touches on a possible solution. In order to have "nested" concurrency you could have 2 separate pools. This would result in a "flat" program structure instead of a nested one. Additionally, it decouples A from B: A now knows nothing about B, it just publishes to a generic queue. The example below uses a single process per stage to illustrate wiring up concurrent workers communicating across an asynchronous queue, but each process could easily be replaced with a pool:
import multiprocessing as mp
class A():
    def __init__(self, in_q, out_q):
        self.in_q = in_q
        self.out_q = out_q
    def evaluate(self):
        """
        Reads from input does work and process output
        """
        while True:
            job = self.in_q.get()
            for i in range(30):
                self.out_q.put(i)
class B():
    def __init__(self, in_q):
        self.in_q = in_q
    def run(self):
        """
        Loop over queue and process items, optionally configure
        with another queue to "sink" the processing pipeline
        """
        while True:
            job = self.in_q.get()
if __name__ == '__main__':
    # create the queues to wire up our concurrent worker pools
    A_q = mp.Queue()
    AB_q = mp.Queue()
    a = A(in_q=A_q, out_q=AB_q)
    b = B(in_q=AB_q)
    p = mp.Process(target=a.evaluate)
    p.start()
    p2 = mp.Process(target=b.run)
    p2.start()
    for i in range(10):
        A_q.put(i)
    p.join()
    p2.join()
This is a common pattern in golang.
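As written, both workers loop forever, so the join() calls never return. One possible way to let the pipeline drain and stop is a sentinel value passed down the queues. The sketch below is only an illustration: STOP, stage_a, stage_b, run_pipeline, fan_out and the extra result queue are all hypothetical names, not part of the answer's code.

```python
import multiprocessing as mp

STOP = None  # sentinel (assumption): tells a worker to shut down


def stage_a(in_q, out_q, fan_out=3):
    """For each incoming job emit fan_out sub-jobs, then forward STOP."""
    while True:
        job = in_q.get()
        if job is STOP:
            out_q.put(STOP)
            return
        for i in range(fan_out):
            out_q.put((job, i))


def stage_b(in_q, result_q):
    """Count sub-jobs until STOP arrives, then report the total."""
    handled = 0
    while True:
        item = in_q.get()
        if item is STOP:
            result_q.put(handled)
            return
        handled += 1


def run_pipeline(n_jobs=4, fan_out=3):
    """Wire one process per stage and return how many sub-jobs B handled."""
    a_q, ab_q, result_q = mp.Queue(), mp.Queue(), mp.Queue()
    workers = [
        mp.Process(target=stage_a, args=(a_q, ab_q, fan_out)),
        mp.Process(target=stage_b, args=(ab_q, result_q)),
    ]
    for w in workers:
        w.start()
    for job in range(n_jobs):
        a_q.put(job)
    a_q.put(STOP)
    total = result_q.get()  # read the result before joining the workers
    for w in workers:
        w.join()
    return total


if __name__ == "__main__":
    print(run_pipeline())
```

With several workers per stage, each worker needs its own STOP, so the producer would have to enqueue one sentinel per consumer. The total number of worker processes across both stages is what should stay at or below multiprocessing.cpu_count().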

Python 3 Limit count of active threads (finished threads do not quit)

I want to limit the number of active threads. What I have seen is that a finished thread stays alive and does not exit by itself, so the number of active threads keeps growing until an error occurs.
The following code starts only 8 threads at a time, but they stay alive even after they have finished, so the number keeps growing:
class ThreadEx(threading.Thread):
    __thread_limiter = None
    __max_threads = 2
    @classmethod
    def max_threads(cls, thread_max):
        ThreadEx.__max_threads = thread_max
        ThreadEx.__thread_limiter = threading.BoundedSemaphore(value=ThreadEx.__max_threads)
    def __init__(self, target=None, args:tuple=()):
        super().__init__(target=target, args=args)
        if not ThreadEx.__thread_limiter:
            ThreadEx.__thread_limiter = threading.BoundedSemaphore(value=ThreadEx.__max_threads)
    def run(self):
        ThreadEx.__thread_limiter.acquire()
        try:
            #success = self._target(*self._args)
            #if success: return True
            super().run()
        except:
            pass
        finally:
            ThreadEx.__thread_limiter.release()
def call_me(test1, test2):
    print(test1 + test2)
    time.sleep(1)
ThreadEx.max_threads(8)
for i in range(0, 99):
    t = ThreadEx(target=call_me, args=("Thread count: ", str(threading.active_count())))
    t.start()
Due to the for loop, the number of threads keeps growing to 99.
I know that a thread has done its work because call_me has been executed and threading.active_count() was printed.
Does somebody know how I can make sure that a finished thread does not stay alive?
This may be a silly answer, but to me it looks like you are trying to reinvent ThreadPool.
from multiprocessing.pool import ThreadPool
from time import sleep
p = ThreadPool(8)
def call_me(test1):
    print(test1)
    sleep(1)
for i in range(0, 99):
    p.apply_async(call_me, args=(i,))
p.close()
p.join()
This will ensure that at most 8 threads are running your function at any point in time. If you want a bit more performance, you can import Pool from multiprocessing and use that instead. The interface is the same, but your pool will now consist of subprocesses instead of threads, which usually gives a performance boost for CPU-bound work because the GIL no longer gets in the way.
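One way to check the "at most 8 at a time" claim is to count how many calls are in flight. The sketch below uses concurrent.futures.ThreadPoolExecutor, which is an alternative to ThreadPool rather than what the answer uses; the counters (_running, peak) and the shortened sleep are assumptions added purely for the measurement.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8
_lock = threading.Lock()
_running = 0   # calls currently executing
peak = 0       # highest value _running ever reached


def call_me(i):
    global _running, peak
    with _lock:
        _running += 1
        peak = max(peak, _running)
    time.sleep(0.01)  # stand-in for real work
    with _lock:
        _running -= 1
    return i


# Leaving the with-block waits for all submitted work to finish.
with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
    results = list(pool.map(call_me, range(99)))

print("peak concurrency:", peak)
```

The pool reuses its worker threads, so threading.active_count() stays bounded instead of growing with the number of submitted jobs.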
I have changed the class according to the help of Hannu.
I post it for reference, maybe it's useful for others that come across this post:
import threading
from multiprocessing.pool import ThreadPool
import time
class MultiThread():
    __thread_pool = None
    @classmethod
    def begin(cls, max_threads):
        MultiThread.__thread_pool = ThreadPool(max_threads)
    @classmethod
    def end(cls):
        MultiThread.__thread_pool.close()
        MultiThread.__thread_pool.join()
    def __init__(self, target=None, args:tuple=()):
        self.__target = target
        self.__args = args
    def run(self):
        try:
            result = MultiThread.__thread_pool.apply_async(self.__target, args=self.__args)
            return result.get()
        except:
            pass
def call_me(test1, test2):
    print(test1 + test2)
    time.sleep(1)
    return 0
MultiThread.begin(8)
for i in range(0, 99):
    t = MultiThread(target=call_me, args=("Thread count: ", str(threading.active_count())))
    t.run()
MultiThread.end()
The maximum number of threads at any given time is 8, as set by the begin method.
The run method also returns the result of the function you passed in, if it returns something.
Hope that helps.
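One thing to watch in the class above: run() calls result.get() right after apply_async, and get() blocks until that one job has finished, so a loop that calls t.run() repeatedly ends up processing the jobs one after another. A possible variation, sketched here with a shortened hypothetical call_me, is to submit everything first and collect the results afterwards:

```python
import time
from multiprocessing.pool import ThreadPool


def call_me(i):
    time.sleep(0.05)  # stand-in for real work
    return i * 2


pool = ThreadPool(8)
start = time.perf_counter()

# Submit everything first; apply_async returns immediately.
pending = [pool.apply_async(call_me, args=(i,)) for i in range(16)]

# Only now block on the results, in submission order.
results = [p.get() for p in pending]

elapsed = time.perf_counter() - start
pool.close()
pool.join()
print(results[:4], f"{elapsed:.2f}s")
```

Here up to 8 jobs overlap, so the total time should be well below the sum of the individual sleeps.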

Is there any replacement for empty while loops?

I'm using empty while loops a lot, for example:
I have a thread running in the background that will change a value called "a" after 5 seconds. However, I'm running a different function at the same time, and I want to let that second function know that the value has changed, so what I have always done is:
import threading, time
class example:
    def __init__(self):
        self.a = 0
    def valchange(self):
        time.sleep(5)
        self.a += 1
        time.sleep(1)
        print("im changing the a value to " + str(self.a))
        print("those print commands needs to run after notifier stopped his while and started printing")
def notifier(exam :example, num :int):
    while(exam.a != num):
        pass
    print("it changed to " + str(num))
exa = example()
i = 1
while(i <= 16):
    temp= threading.Thread(target=notifier, args=(exa, i, ))
    temp.start()
    i += 3
i = 1
while(i <= 16):
    exa.valchange()
    i += 1
It's important to mention that this example cannot use wait and set on an event, because there is no indication of when set needs to be called, how many threads are running in the background, or even which numbers have a thread waiting for them.
You also can't use join, because a change to 'a' is not in itself the signal to print; only the condition is.
Async and select can't help me either, for the same reason.
Is there any way to create something that will stop the program from running until the condition becomes true? You can provide your solution in any programming language you want, but I'm mainly using Python 3.
EDIT: please remember that I need it to work with every condition. My code example is only an example, so if something works there, it won't necessarily work with a different condition.
Thank you very much in advance :)
Idea:
wait(a == 5) // will do nothing until a == 5
You need to use select or epoll system calls if you're waiting for some system operation to finish. In case you're waiting for a certain IO event, then you can use asyncio (provided your Python version > 3.3), otherwise you could consider twisted.
If you're doing some CPU bound operations you need to consider multiple processes or threads, only then you can do any such monitoring effectively. Having a while loop running infinitely without any interruption is a disaster waiting to happen.
If your thread only changes a's value once, at the end of its life, then you can use .join() to wait for the thread to terminate.
import threading
import time
class example:
    def __init__(self):
        self.a = 0
        self.temp = threading.Thread(target=self.valchange)
        self.temp.start()
        self.notifier()
    def valchange(self):
        time.sleep(5)
        self.a = 1
    def notifier(self):
        self.temp.join()
        print("the value of a has changed")
example()
If the thread might change a's value at any point in its lifetime, then you can use one of the threading module's more generalized control flow objects to coordinate execution. For instance, the Event object.
import threading
import time
class example:
    def __init__(self):
        self.a = 0
        self.event = threading.Event()
        temp = threading.Thread(target=self.valchange)
        temp.start()
        self.notifier()
    def valchange(self):
        time.sleep(5)
        self.a = 1
        self.event.set()
    def notifier(self):
        self.event.wait()
        print("the value of a has changed")
example()
One drawback to this Event approach is that the thread target has to explicitly call set() whenever it changes the value of a, which can be irritating if you change a several times in your code. You could automate this away using a property:
import threading
import time
class example(object):
    def __init__(self):
        self._a = 0
        self._a_event = threading.Event()
        temp = threading.Thread(target=self.valchange)
        temp.start()
        self.notifier()
    @property
    def a(self):
        return self._a
    @a.setter
    def a(self, value):
        self._a = value
        self._a_event.set()
    def valchange(self):
        time.sleep(5)
        self.a = 1
    def notifier(self):
        self._a_event.wait()
        print("the value of a has changed")
example()
Now valchange doesn't have to do anything special after setting a's value.
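For the "wait until an arbitrary condition holds" case from the question, the standard library's threading.Condition has a wait_for(predicate, timeout) method that may be a closer fit than an Event. The sketch below combines it with the property idea; the Watched class, wait_until and the sample thresholds are hypothetical names chosen for illustration.

```python
import threading


class Watched:
    """Holds a value and lets threads block until a predicate on it is true."""

    def __init__(self, value=0):
        self._value = value
        self._cond = threading.Condition()

    @property
    def value(self):
        with self._cond:
            return self._value

    @value.setter
    def value(self, new_value):
        with self._cond:
            self._value = new_value
            self._cond.notify_all()  # wake every waiter so it can re-check

    def wait_until(self, predicate, timeout=None):
        """Block until predicate(value) is true; returns False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: predicate(self._value), timeout)


watched = Watched()
seen = []


def notifier(num):
    # Sleeps inside wait_for instead of spinning in an empty while loop.
    if watched.wait_until(lambda v: v >= num, timeout=5):
        seen.append(num)


threads = [threading.Thread(target=notifier, args=(n,)) for n in (1, 4, 7)]
for t in threads:
    t.start()
for _ in range(7):
    watched.value = watched.value + 1
for t in threads:
    t.join()
print(sorted(seen))
```

A caveat with this design: a waiter only sees the value that is current when it wakes up, so a predicate such as v == 4 can be missed if the value moves on quickly. Predicates that stay true once they become true (like v >= num here) avoid that.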
What you are describing is a spin lock, and might be fine, depending on your use case.
The alternative approach is to have the code you are waiting on call you back when it reaches a certain condition. This would require an async framework such as https://docs.python.org/3/library/asyncio-task.html
There are some nice simple examples in those docs so I won't insult your intelligence by pasting them here.

Getting multi-threading to work with a list using python pexpect

I wrote a simple Python pexpect script to ssh into a machine and perform an action. Now I need to perform this action on multiple servers. I am using a list to hit all of the servers concurrently using multi-threading. My issue is that, due to everything being ran concurrently, each thread is running on the same server name. Is there a way to have each thread concurrently handle only one of the listed servers?
#! /usr/bin/python
#Test script
import pexpect
import pxssh
import threading
import datetime
currentdate = datetime.datetime.now()
easterndate = (datetime.datetime.now() + datetime.timedelta(0, 3600))
#list of servers
serverlist = ["025", "089"]
#server number
sn = 0
ssh_new_conn = 'Are you sure you want to continue connecting'
class ThreadClass(threading.Thread):
    def run(self):
        index = 0
        sn = serverlist[index]
        print sn
        username = '[a username]'
        password = '[a password]'
        hostname = '%(sn)s.[the rest of the host url]' % locals()
        command = "/usr/bin/ssh %(username)s@%(hostname)s " % locals()
        index = index + 1
        now = datetime.datetime.now()
        print command
        p = pexpect.spawn(command, timeout=360)
        ***do some other stuff****
for i in range(len(severlist)):
    t = ThreadClass()
    t.start()
[update]
I may just try doing this with a parent thread that calls the child thread and so forth... although it would be nice if multi-threading could work from a list or some sort of work queue.
The problem has nothing to do with "everything being ran concurrently". You're explicitly setting index = 0 at the start of the run function, so of course every thread works on index 0.
If you want each thread to deal with one server, just pass the index to each thread object:
class ThreadClass(threading.Thread):
    def __init__(self, index):
        super(ThreadClass, self).__init__()
        self.index = index
    def run(self):
        sn = serverlist[self.index]
        print sn
        # same code as before, minus the index = index + 1 bit
for i in range(len(severlist)):
    t = ThreadClass(i)
    t.start()
(Of course you'll probably want to use serverlist instead of severlist and fix the other errors that make it impossible for your code to work.)
Or, more simply, pass the sn itself:
class ThreadClass(threading.Thread):
    def __init__(self, sn):
        super(ThreadClass, self).__init__()
        self.sn = sn
    def run(self):
        print self.sn
        # same code as last version, but use self.sn instead of sn
for sn in severlist:
    t = ThreadClass(sn)
    t.start()
Alternatively, if you really want to use a global variable, just make it global, and put a lock around it:
index = 0
index_lock = threading.Lock()
class ThreadClass(threading.Thread):
    def run(self):
        global index, index_lock
        with index_lock:
            sn = serverlist[index]
            index += 1
        print sn
        # same code as first version
However, you might want to consider a much simpler design, with a pool or executor instead of an explicit worker thread and list of things to work on. For example:
import concurrent.futures
def job(sn):
    print sn
    # same code as first version again
with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(job, serverlist)
This will only run, say, 4 or 8 or some other good "magic number" of jobs concurrently. Which is usually what you want. But if you want exactly one thread per server, just pass max_workers=len(serverlist) to the ThreadPoolExecutor constructor.
Besides being a whole lot less code to read, write, get wrong, debug, etc., it also has more functionality—e.g., you can get results and/or exceptions from the servers back to the main thread.
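To illustrate the point about getting results and exceptions back to the main thread, here is a minimal Python 3 sketch. The job body is a placeholder for the real pexpect/ssh work, and the extra "bad" server name, the results/errors dictionaries and the failure rule are all made up for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

serverlist = ["025", "089", "bad"]  # "bad" is a made-up failing entry


def job(sn):
    """Placeholder for the real pexpect/ssh work (hypothetical behaviour)."""
    if not sn.isdigit():
        raise ValueError(f"unexpected server name: {sn!r}")
    return f"{sn}: ok"


results, errors = {}, {}
# One thread per server, as described above.
with ThreadPoolExecutor(max_workers=len(serverlist)) as executor:
    futures = {executor.submit(job, sn): sn for sn in serverlist}
    for future in as_completed(futures):
        sn = futures[future]
        try:
            results[sn] = future.result()  # re-raises the job's exception
        except Exception as exc:
            errors[sn] = exc

print(results)
print(errors)
```

Using submit with as_completed instead of map keeps one failing server from hiding the results of the others.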

Should I protect built-in data structures (list, dict) when using multiple threads?

I think I should use a Lock object to protect a custom class when using multiple threads. However, because Python uses the GIL to ensure that only one thread is running at any given time, does that mean there's no need to use a Lock to protect built-in types like list? For example:
num_list = []
def consumer():
    while True:
        if len(num_list) > 0:
            num = num_list.pop()
            print num
            return
def producer():
    num_list.append(1)
consumer_thread = threading.Thread(target = consumer)
producer_thread = threading.Thread(target = producer)
consumer_thread.start()
producer_thread.start()
The GIL protects the interpreter state, not yours. There are some operations that are effectively atomic: they require a single bytecode and thus effectively do not require locking (see "Is Python variable assignment atomic?" for an answer from a very reputable Python contributor).
There isn't really any good documentation on this, though, so I wouldn't rely on it in general unless you plan on disassembling bytecode to test your assumptions. If you plan on modifying state from multiple contexts (or modifying and accessing complex state), then you should plan on using some sort of locking/synchronization mechanism.
If you're interested in approaching this class of problem from a different angle, you should look into the Queue module (named queue in Python 3). A common pattern in Python code is to use a synchronized queue to communicate among thread contexts rather than working with shared state.
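A minimal Python 3 sketch of that queue-based pattern, reworking the producer/consumer from the question. The DONE sentinel, the consumed list and the item count are assumptions added for the illustration.

```python
import queue
import threading

num_queue = queue.Queue()
consumed = []
DONE = object()  # sentinel (assumption): marks the end of the stream


def producer(n):
    for x in range(n):
        num_queue.put(x)
    num_queue.put(DONE)


def consumer():
    while True:
        item = num_queue.get()  # blocks until an item is available
        if item is DONE:
            return
        consumed.append(item)


threads = [
    threading.Thread(target=consumer),
    threading.Thread(target=producer, args=(1000,)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(consumed))
```

The queue does its own locking, and get() blocks while the queue is empty, so there is neither a busy loop nor a separate length check that could go stale.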
@jeremy-brown explains with words (see below)... but if you want a counterexample:
The GIL isn't protecting your state. The following example doesn't use locks, and as a result, if the xrange value is high enough, it will fail with: IndexError: pop from empty list.
import threading
import time
con1_list =[]
con2_list =[]
stop = 10000
total = 500000
num_list = []
done1 = done2 = None  # placeholders: doneFlag is never used by consumer
def consumer(name, doneFlag):
    while True:
        if len(num_list) > 0:
            if name == 'nix':
                con2_list.append(num_list.pop())
                if len(con2_list) == stop:
                    print 'done b'
                    return
            else:
                con1_list.append(num_list.pop())
                if len(con1_list) == stop:
                    print 'done a'
                    return
def producer():
    for x in xrange(total):
        num_list.append(x)
def test():
    while not (len(con2_list) >=stop and len(con1_list) >=stop):
        time.sleep(1)
    print set(con1_list).intersection( set(con2_list))
consumer_thread = threading.Thread(target = consumer, args=('nick',done1))
consumer_thread2 = threading.Thread(target = consumer, args=('nix',done2))
producer_thread = threading.Thread(target = producer)
watcher = threading.Thread(target = test)
consumer_thread.start();consumer_thread2.start();producer_thread.start();watcher.start()
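The failure comes from the gap between the len(num_list) > 0 check and the pop(): another consumer can empty the list in between. A possible fix, sketched in Python 3 under simplifying assumptions (the list is pre-filled, consumers stop when it is empty, and list_lock and taken are hypothetical names), is to do the check and the pop under one lock:

```python
import threading

num_list = list(range(20000))
list_lock = threading.Lock()
taken = {"a": [], "b": []}  # items taken by each consumer


def consumer(name):
    while True:
        with list_lock:
            # The emptiness check and the pop happen under one lock, so no
            # other consumer can empty the list in between.
            if not num_list:
                return
            item = num_list.pop()
        taken[name].append(item)


threads = [threading.Thread(target=consumer, args=(n,)) for n in taken]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(taken["a"]) + len(taken["b"]))
```

Every item is handed to exactly one consumer, and neither thread can hit the pop on an empty list.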
