''.join() in ThreadPoolExecutor eats memory - python

try code like this:
import gc
import random
from concurrent.futures import ThreadPoolExecutor

zen = "Special cases aren't special enough to break the rules. "

def abc(length: int):
    msg = ''.join(random.sample(zen, length))
    print(msg)
    del msg

if __name__ == '__main__':
    pool = ThreadPoolExecutor(max_workers=8)
    while True:
        for x in range(256):
            pool.submit(abc, random.randint(2, 6))
        print('===================================================')
        gc.collect()
The code takes about 8 MB when run without ThreadPoolExecutor, or about 30 MB when using str() instead of ''.join(). But this code keeps eating RAM without limit. I thought the cause was random.sample or something else, but it turned out that ''.join() inside the ThreadPoolExecutor is what triggers the problem.
It confuses me, since no modules import each other (they only share zen), and neither del nor gc.collect() helps :(
PS: please note that the infinite loop itself is not the problem. When you run something like:
while True:
    print(1234567)
the memory usage stays below a certain level (the code above should take no more than 1 MB?). The code at the top doesn't build up a growing list or dict, and the variable is deleted at the end of the function. So I would expect it to be cleaned up when a thread finishes, which it obviously isn't.
PPS: to put it another way: it looks as if whatever ''.join() produces is never recycled. If we change abc this way:
tmp = random.sample(zen, length)
msg = ''.join(tmp)
print(msg[:8])
del msg, tmp
gc works effectively, and the usage stays around 26 MB.
So is there something I missed when using ''.join(), or does the Python language have a bug there?

When you run the code without threads, each statement executes to completion before the next one starts; in particular, gc.collect() is only called after the inner loop has finished.
But when you execute the code with the executor, new work is submitted before the earlier tasks have finished. pool.submit() never blocks, so the loop queues work items far faster than the 8 workers can drain them, and those pending items accumulate in the executor's internal queue without limit, which is what you see as ever-growing memory usage.
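One way to keep the memory bounded (a minimal sketch of my own, not from the answer above) is to throttle submissions so that only a limited number of work items can be pending at once, for example with a semaphore that is released as each future completes:
import random
from concurrent.futures import ThreadPoolExecutor
from threading import BoundedSemaphore

zen = "Special cases aren't special enough to break the rules. "

def abc(length: int):
    print(''.join(random.sample(zen, length)))

if __name__ == '__main__':
    pool = ThreadPoolExecutor(max_workers=8)
    # allow at most 256 queued or in-flight tasks at any moment
    slots = BoundedSemaphore(256)
    while True:
        slots.acquire()                              # blocks once 256 tasks are pending
        fut = pool.submit(abc, random.randint(2, 6))
        fut.add_done_callback(lambda f: slots.release())
With this throttle the producer loop can never get ahead of the workers by more than 256 items, so the queue (and memory) stays flat.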

Related

Python multithreading producing funky results

I'm fairly new to multithreading in Python and encountered an issue (likely due to concurrency problems). When I run the code below, it produces "normal" 1,2,3,4,5,6,7,8,9 digits for the first 9 numbers. However, when it moves on to the next batch of numbers (the ones that should be printed by each thread after it "sleeps" for 2 seconds) it spits out:
different numbers each time
often very large numbers
sometimes no numbers at all
I'm guessing this is a concurrency issue where, by the time each original thread gets to printing the second number after the "sleep", the i variable has been tampered with by the rest of the code, but can someone please explain step by step what exactly is happening, and why the no-numbers/large-numbers phenomenon occurs?
import threading
import time

def foo(text):
    print(text)
    time.sleep(2)
    print(text)

for i in range(1,10):
    allTreads = []
    current_thread = threading.Thread(target = foo, args= (i,))
    allTreads.append(current_thread)
    current_thread.start()
Well, your problem is called a race condition. Sometimes when the code is executed, one thread prints its number before the implicit '\n' of another thread has been written, and that's why you often see those kinds of behaviours.
Also, what's the purpose of the allTreads list there? It is recreated on every iteration, so it only ever stores the current_thread and is then discarded at the end of that iteration.
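If the list is actually meant to be used, a minimal sketch (my own, with the list renamed all_threads) creates it once outside the loop, so the threads can be joined afterwards:
import threading
import time

def foo(text):
    print(text)
    time.sleep(2)
    print(text)

all_threads = []                       # created once, outside the loop
for i in range(1, 10):
    t = threading.Thread(target=foo, args=(i,))
    all_threads.append(t)
    t.start()

for t in all_threads:                  # wait for every thread to finish
    t.join()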
In order to avoid race conditions, you need some kind of synchronization between threads. Consider threading.Lock(), so that no more than one thread at a time prints the given text:
import threading
import time

lock = threading.Lock()

def foo(text):
    with lock:
        print(text)
    time.sleep(2)
    with lock:
        print(text)

for i in range(1,10):
    allTreads = []
    current_thread = threading.Thread(target = foo, args= (i,))
    allTreads.append(current_thread)
    current_thread.start()
The threading documentation in Python is quite good. I recommend you read these two links:
Python Threading Documentation
Real Python Threading

Python multiple processes consuming/iterating over single generator (divide and conquer)

I have a python generator that returns lots of items, for example:
import itertools

def generate_random_strings():
    chars = "ABCDEFGH"
    for item in itertools.product(chars, repeat=10):
        yield "".join(item)
I then iterate over this and perform various tasks, the issue is that I'm only using one thread/process for this:
my_strings = generate_random_strings()
for string in my_strings:
    # do something with string...
    print(string)
This works great, I'm getting all my strings, but it's slow. I would like to harness the power of Python multiprocessing to "divide and conquer" this for loop. However, of course, I want each string to be processed only once. While I've found much documentation on multiprocessing, I'm trying to find the most simple solution for this with the least amount of code.
I'm assuming each thread should take a big chunk of items every time and process them before coming back and getting another big chunk etc...
Many thanks,
Simplest solution with the least code? The multiprocessing context manager.
I assume that you can put "do something with string" into a function called "do_something"
from multiprocessing import Pool as ProcessPool

number_of_processes = 4
with ProcessPool(number_of_processes) as pool:
    pool.map(do_something, my_strings)
If you want to get the results of "do_something" back again, easy!
with ProcessPool(number_of_processes) as pool:
    results = pool.map(do_something, my_strings)
You'll get them in a list.
multiprocessing.dummy is a wrapper around the threading module that exposes the same API as multiprocessing, so you can use a thread pool with the multiprocessing syntax. If you want threads instead of processes, just do this:
from multiprocessing.dummy import Pool as ThreadPool
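Note that pool.map() will pull the whole generator into a list before splitting it into chunks. If that is a concern, a minimal sketch (my own variation, not part of the answer above; the chunksize value is just an illustrative guess) uses imap_unordered with an explicit chunksize, which hands each worker a batch of strings at a time and yields results as they finish instead of building one big result list:
from multiprocessing import Pool as ProcessPool

if __name__ == '__main__':
    my_strings = generate_random_strings()   # generator from the question
    with ProcessPool(4) as pool:
        # chunksize trades IPC overhead against load balancing
        for result in pool.imap_unordered(do_something, my_strings, chunksize=10000):
            print(result)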
You may use multiprocessing.
import multiprocessing

def string_fun(string):
    # do something with string...
    print(string)

my_strings = generate_random_strings()
num_of_threads = 7
pool = multiprocessing.Pool(num_of_threads)
pool.map(string_fun, my_strings)
Assuming you're using the latest version of Python, you may want to read something about the asyncio module. Multithreading is not easy to implement due to the GIL: "In CPython, the global interpreter lock, or GIL, is a mutex that protects access to Python objects, preventing multiple threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe."
So you can switch to multiprocessing, or, as noted above, take a look at the asyncio module.
asyncio — Asynchronous I/O > https://docs.python.org/3/library/asyncio.html
I'll integrate this answer with some code as soon as possible.
Hope it helps,
Hele
As @Hele mentioned, asyncio is best of all; here is an example.
Code
#!/usr/bin/python3
# -*- coding: utf-8 -*-
# python 3.7.2
from asyncio import ensure_future, gather, run
import random

alphabet = 'ABCDEFGH'
size = 1000

async def generate():
    tasks = list()
    result = None
    for el in range(1, size):
        task = ensure_future(generate_one())
        tasks.append(task)
    result = await gather(*tasks)
    return list(set(result))

async def generate_one():
    return ''.join(random.choice(alphabet) for i in range(8))

if __name__ == '__main__':
    my_strings = run(generate())
    print(my_strings)
Output
['CHABCGDD', 'ACBGAFEB', ...
Of course, you need to improve generate_one; this variant is very slow.
You can see source code here.
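As one possible improvement (my suggestion, not part of the original answer), each string can be built with a single random.choices() call instead of eight separate random.choice() calls:
import random

alphabet = 'ABCDEFGH'

async def generate_one():
    # random.choices draws k samples with replacement in one call
    return ''.join(random.choices(alphabet, k=8))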

Multiprocessing -- Thread Pool Memory Leak?

I am observing memory usage that I cannot explain to myself. Below I provide a stripped down version of my actual code that still exhibits this behavior. The code is intended to accomplish the following:
Read a text file in chunks of 1000 lines. Each line is a sentence. Split these 1000 sentences into 4 generators. Pass these generators to a thread pool and run feature extraction in parallel on 250 sentences.
In my actual code I accumulate features and labels from all sentences of the entire file.
Now here comes the weird thing: Memory gets allocated but not freed again even when not accumulating these values! And it has something to do with the thread pool I think. The amount of memory taken in total is dependent on how many features are extracted for any given word. I simulate this here with range(100). Have a look:
from sys import argv
from itertools import chain, islice
from multiprocessing import Pool
from math import ceil

# dummyfied feature extraction function
# the length of the range determines how much memory is used up in total,
# even though the objects are never stored
def features_from_sentence(sentence):
    return [{'some feature': 'some value'} for i in range(100)], ['some label' for i in range(100)]

# split iterable into generator of generators of length `size`
def chunks(iterable, size=10):
    iterator = iter(iterable)
    for first in iterator:
        yield chain([first], islice(iterator, size - 1))

def features_from_sentence_meta(l):
    return list(map(features_from_sentence, l))

def make_X_and_Y_sets(sentences, i):
    print(f'start: {i}')
    pool = Pool()
    # split sentences into a generator of 4 generators
    sentence_chunks = chunks(sentences, ceil(50000/4))
    # results is a list containing the lists of pairs of X and Y of all chunks
    results = map(lambda x: x[0], pool.map(features_from_sentence_meta, sentence_chunks))
    X, Y = zip(*results)
    print(f'end: {i}')
    return X, Y

# reads file in chunks of `lines_per_chunk` lines
def line_chunks(textfile, lines_per_chunk=1000):
    chunk = []
    i = 0
    with open(textfile, 'r') as textfile:
        for line in textfile:
            if not line.split(): continue
            i += 1
            chunk.append(line.strip())
            if i == lines_per_chunk:
                yield chunk
                i = 0
                chunk = []
        yield chunk

textfile = argv[1]
for i, line_chunk in enumerate(line_chunks(textfile)):
    # stop processing file after 10 chunks to demonstrate
    # that memory stays occupied (check your system monitor)
    if i == 10:
        while True:
            pass
    X_chunk, Y_chunk = make_X_and_Y_sets(line_chunk, i)
The file I am using to debug this has 50000 nonempty lines, which is why I use the hardcoded 50000 in one place. If you want to use the same file, here is a link for your convenience:
https://www.dropbox.com/s/v7nxb7vrrjim349/de_wiki_50000_lines?dl=0
Now when you run this script and open your system monitor you will observe that memory gets used up and keeps growing until the 10th chunk, where I artificially go into an endless loop to demonstrate that the memory stays in use, even though I never store anything.
Can you explain to me why this happens? I seem to be missing something about how multiprocessing pools are supposed to be used.
First, let's clear up some misunderstandings—although, as it turns out, this wasn't actually the right avenue to explore in the first place.
When you allocate memory in Python, of course it has to go get that memory from the OS.
When you release memory, however, it rarely gets returned to the OS, until you finally exit. Instead, it goes into a "free list"—or, actually, multiple levels of free lists for different purposes. This means that the next time you need memory, Python already has it lying around, and can find it immediately, without needing to talk to the OS to allocate more. This usually makes memory-intensive programs much faster.
But this also means that—especially on modern 64-bit operating systems—trying to understand whether you really do have any memory pressure issues by looking at your Activity Monitor/Task Manager/etc. is next to useless.
The tracemalloc module in the standard library provides low-level tools to see what actually is going on with your memory usage. At a higher level, you can use something like memory_profiler, which (if you enable tracemalloc support—this is important) can put that information together with OS-level information from sources like psutil to figure out where things are going.
However, if you aren't seeing any actual problems—your system isn't going into swap hell, you aren't getting any MemoryError exceptions, your performance isn't hitting some weird cliff where it scales linearly up to N and then suddenly goes all to hell at N+1, etc.—you usually don't need to bother with any of this in the first place.
If you do discover a problem, then, fortunately, you're already half-way to solving it. As I mentioned at the top, most memory that you allocated doesn't get returned to the OS until you finally exit. But if all of your memory usage is happening in child processes, and those child processes have no state, you can make them exit and restart whenever you want.
Of course there's a performance cost to doing so—process teardown and startup time, and page maps and caches that have to start over, and asking the OS to allocate the memory again, and so on. And there's also a complexity cost—you can't just run a pool and let it do its thing; you have to get involved in its thing and make it recycle processes for you.
The multiprocessing.Pool class does have some builtin support for this: its maxtasksperchild parameter makes the pool replace each worker process after it has completed that many tasks, so whatever memory the old worker was holding is returned to the OS.
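A minimal sketch (the numbers here are only illustrative):
from multiprocessing import Pool

# replace each worker process after it has run 100 tasks, so whatever
# memory that process was holding goes back to the OS
pool = Pool(processes=4, maxtasksperchild=100)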
You can, of course, build your own Pool. If you want to get fancy, you can look at the source to multiprocessing and do what it does. Or you can build a trivial pool out of a list of Process objects and a pair of Queues. Or you can just directly use Process objects without the abstraction of a pool.
Another reason you can have memory problems is that your individual processes are fine, but you just have too many of them.
And, in fact, that seems to be the case here.
You create a Pool of 4 workers in this function:
def make_X_and_Y_sets(sentences, i):
    print(f'start: {i}')
    pool = Pool()
    # ...
… and you call this function for every chunk:
for i, line_chunk in enumerate(line_chunks(textfile)):
    # ...
    X_chunk, Y_chunk = make_X_and_Y_sets(line_chunk, i)
So, you end up with 4 new processes for every chunk. Even if each one has pretty low memory usage, having hundreds of them at once is going to add up.
Not to mention that you're probably severely hurting your time performance by having hundreds of processes competing over 4 cores, so you waste time in context switching and OS scheduling instead of doing real work.
As you pointed out in a comment, the fix for this is trivial: just make a single global pool instead of a new one for each call.
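For concreteness, a minimal sketch of that fix (the body of the function is elided; it is unchanged apart from no longer creating its own Pool):
from multiprocessing import Pool

pool = Pool()   # one pool for the whole run, reused by every call
                # (on spawn-based platforms this, too, belongs under the
                #  __main__ guard discussed just below)

def make_X_and_Y_sets(sentences, i):
    print(f'start: {i}')
    # ... same as before, just use the module-level `pool` here ...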
Sorry for getting all Columbo here, but… just one more thing… This code runs at the top level of your module:
for i, line_chunk in enumerate(line_chunks(textfile)):
    # ...
    X_chunk, Y_chunk = make_X_and_Y_sets(line_chunk, i)
… and that's the code that tries to spin up the pool and all the child tasks. But each child process in that pool needs to import this module, which means they're all going to end up running the same code, and spinning up another pool and a whole extra set of child tasks.
You're presumably running this on Linux or macOS, where the default start method is fork, which means multiprocessing can avoid this import, so you don't have a problem. But with the other start methods, this code would basically be a forkbomb that eats up all of your system resources. And that includes spawn, which is the default start method on Windows. So, if there's ever any chance anyone might run this code on Windows, you should put all of that top-level code in an if __name__ == '__main__': guard.
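A minimal sketch of that structure, reusing the names from the question (the main() wrapper is my own):
from sys import argv

def main():
    textfile = argv[1]
    for i, line_chunk in enumerate(line_chunks(textfile)):
        X_chunk, Y_chunk = make_X_and_Y_sets(line_chunk, i)

if __name__ == '__main__':
    main()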

Python: multithreading in infinite loop

I have a code which is basically running an infinite loop, and in each iteration of the loop I run some instructions. Some of these instructions have to run in "parallel", which I do by using multiprocessing. Here is an example of my code structure:
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

def buy_fruit(fruit, number):
    print('I bought '+str(number)+' times the following fruit:'+fruit)
    return 'ok'

def func1(parameter1, parameter2):
    myParameters = (parameter1, parameter2)
    pool = ThreadPool(2)
    data = pool.starmap(func2, zip(myParameters))
    return 'ok'

def func2(parameter1):
    print(parameter1)
    return 'ok'

while True:
    myFruits = ('apple','pear','orange')
    myQuantities = (5,10,2)
    pool = ThreadPool(2)
    data = pool.starmap(buy_fruit, zip(myFruits,myQuantities))
    func1('hello', 'hola')
I agree it's a bit messy, because I use multiprocessing both within the main loop and within functions.
So everything works well until the loop has been running for a few minutes, and then I get an error:
"RuntimeError: can't start new thread"
I saw online that this is due to the fact that I have opened too many threads.
What is the simplest way to close all my Threads by the end of each loop iteration, so I can restart "fresh" at the start of the new loop iteration?
Thank you in advance for your time and help!
Best,
Julia
PS: The example code is just an example, my real function opens many threads in each loop and each function takes a few seconds to execute.
You are creating a new ThreadPool object inside the endless loop, which is a likely cause of your problem, because you are not terminating the threads at the end of the loop. Have you tried creating the object outside of the endless loop?
pool = ThreadPool(2)
while True:
    myFruits = ('apple','pear','orange')
    myQuantities = (5,10,2)
    data = pool.starmap(buy_fruit, zip(myFruits,myQuantities))
Alternatively, and to answer your question, if your use case for some reason requires creating a new ThreadPool object in each loop iteration, use a context manager (the with statement) to make sure all threads are closed when the block is left.
while True:
    myFruits = ('apple','pear','orange')
    myQuantities = (5,10,2)
    with ThreadPool(2) as pool:
        data = pool.starmap(buy_fruit, zip(myFruits,myQuantities))
Note, however, the noticeable performance difference this has compared to the code above. Creating and terminating threads is expensive, which is why the first example will run much faster and is probably what you'll want to use.
Regarding your edit involving "nested ThreadPools": I would suggest maintaining one single instance of your ThreadPool and passing references to it into your nested functions as required.
def func1(pool, parameter1, parameter2):
    ...

...

pool = ThreadPool(2)
while True:
    myFruits = ('apple','pear','orange')
    myQuantities = (5,10,2)
    data = pool.starmap(buy_fruit, zip(myFruits,myQuantities))
    func1(pool, 'hello', 'hola')

Dumping a multiprocessing.Queue into a list

I wish to dump a multiprocessing.Queue into a list. For that task I've written the following function:
import Queue

def dump_queue(queue):
    """
    Empties all pending items in a queue and returns them in a list.
    """
    result = []

    # START DEBUG CODE
    initial_size = queue.qsize()
    print("Queue has %s items initially." % initial_size)
    # END DEBUG CODE

    while True:
        try:
            thing = queue.get(block=False)
            result.append(thing)
        except Queue.Empty:
            # START DEBUG CODE
            current_size = queue.qsize()
            total_size = current_size + len(result)
            print("Dumping complete:")
            if current_size == initial_size:
                print("No items were added to the queue.")
            else:
                print("%s items were added to the queue."
                      % (total_size - initial_size))
            print("Extracted %s items from the queue, queue has %s items left"
                  % (len(result), current_size))
            # END DEBUG CODE
            return result
But for some reason it doesn't work.
Observe the following shell session:
>>> import multiprocessing
>>> q = multiprocessing.Queue()
>>> for i in range(100):
... q.put([range(200) for j in range(100)])
...
>>> q.qsize()
100
>>> l=dump_queue(q)
Queue has 100 items initially.
Dumping complete:
0 items were added to the queue.
Extracted 1 items from the queue, queue has 99 items left
>>> l=dump_queue(q)
Queue has 99 items initially.
Dumping complete:
0 items were added to the queue.
Extracted 3 items from the queue, queue has 96 items left
>>> l=dump_queue(q)
Queue has 96 items initially.
Dumping complete:
0 items were added to the queue.
Extracted 1 items from the queue, queue has 95 items left
>>>
What's happening here? Why aren't all the items being dumped?
Try this:
import Queue
import time

def dump_queue(queue):
    """
    Empties all pending items in a queue and returns them in a list.
    """
    result = []

    for i in iter(queue.get, 'STOP'):
        result.append(i)
    time.sleep(.1)
    return result

import multiprocessing
q = multiprocessing.Queue()
for i in range(100):
    q.put([range(200) for j in range(100)])
q.put('STOP')
l = dump_queue(q)
print len(l)
Multiprocessing queues have an internal buffer and a feeder thread that pulls work off that buffer and flushes it to the pipe. If not all of the objects have been flushed, I could see a case where Empty is raised prematurely. Using a sentinel to indicate the end of the queue is safe (and reliable). Also, the iter(get, sentinel) idiom is just better than relying on Empty.
I don't like that it could raise Empty due to flush timing (I added the time.sleep(.1) to allow a context switch to the feeder thread; you may not need it, it works without it, but it's a habit that releases the GIL).
# in theory:
def dump_queue(q):
    q.put(None)
    return list(iter(q.get, None))

# in practice this might be more resilient:
def dump_queue(q):
    q.put(None)
    return list(iter(lambda: q.get(timeout=0.00001), None))

# but neither case handles all the ways things can break
# for that you need 'managers' and 'futures' ... see Commentary
I prefer None for sentinels, but I would tend to agree with jnoller that mp.Queue could use a safe and simple sentinel. His comments on the risk of Empty being raised early are also valid; see below.
Commentary:
This is old and Python has changed, but this does come up as a hit if you're having issues with lists <-> queues in MP Python. So, let's look a little deeper:
First off, this is not a bug, it's a feature: https://bugs.python.org/issue20147. To save you some time from reading that discussion and more details in the documentation, here are some highlights (kind of philosophical but I think it might help some who are starting with MP/MT in Python):
MP Queues are structures capable of being communicated with from different threads, different processes on the same system, and in fact can be different (networked) computers
In general with parallel/distributed systems, strict synchronization is expensive, so every time you use part of the API for any MP/MT datastructures, you need to look at the documentation to see what it promises to do, or not. Hint: if a function doesn't include the word "lock" or "semaphore" or "barrier" etc, then it will be some mixture of "asynchronous" and "best effort" (approximate), or what you might call "flaky."
Specific to this situation: Python is an interpreted language, with a famous single interpreter thread and its famous "Global Interpreter Lock" (GIL). If your entire program is single-process and single-threaded, then everything is hunky dory. If not (and with MP it's egregiously not), you need to give the interpreter some breathing room. time.sleep() is your friend. In this case, timeouts.
In your solution you are only using flaky functions - get() and qsize(). And the code is in fact worse than you might think - dial up the size of the queue and the size of the objects and you're likely to break things.
Now, you can work with flaky routines, but you need to give them room to maneuver. In your example you're just hammering that queue. All you need to do is change the line thing = queue.get(block=False) to instead be thing = queue.get(block=True,timeout=0.00001) and you should be fine.
The time 0.00001 is chosen carefully (10^-5), it's about the smallest that you can safely make it (this is where art meets science).
Some comments on why you need the timeout: this relates to the internals of how MP queues work. When you 'put' something into an MP queue, it's not actually put into the queue, it's queued up to eventually be there. That's why qsize() happens to give you a correct result - that part of the code knows there's a pile of things "in" the queue. You just need to realize that an object "in" the queue is not the same thing as "I can now read it." Think of MP queues as sending a letter with USPS or FedEx - you might have a receipt and a tracking number showing that "it's in the mail," but the recipient can't open it yet. Now, to be even more specific, in your case you get '0' items accessible right away. That's because the single interpreter thread you're running hasn't had any chance to process stuff that's "queued up", so your first loop just queues up a bunch of stuff for the queue, but you're immediately forcing your single thread to try to do a get() before it's even had a chance to line up even a single object for you.
One might argue that it slows code down to have these timeouts. Not really - MP queues are heavy-weight constructs; you should only be using them to pass pretty heavy-weight "things" around, either big chunks of data or at least complex computation. What the act of adding 10^-5 seconds actually does is give the interpreter a chance to do thread scheduling - at which point it will see your backed-up put() operations.
Caveat
The above is not completely correct, and this is (arguably) an issue with the design of the get() function. The semantics of setting timeout to non-zero is that the get() function will not block for longer than that before raising Empty. But the queue might not actually be empty (yet). So if you know your queue has a bunch of stuff to get, then the second solution above works better, or even works with a longer timeout. Personally I think they should have kept the timeout=0 behavior, but added some actual built-in tolerance of 1e-5, because a lot of people get confused about what can happen around gets and puts to MP constructs.
In your example code, you're not actually spinning up parallel processes. If we were to do that, you'd start getting some random results: sometimes only some of the queue objects will be removed, sometimes it will hang, sometimes it will crash, and sometimes more than one of these will happen. In one such run, one process crashed and the other hung.
The underlying problem is that when you insert the sentinel, you need to know that the queue is finished. That should be done as part of the logic around the queue; if, for example, you have a classical master-worker design, then the master would need to push a sentinel (end) when the last task has been added. Otherwise you end up with race conditions.
The "correct" (resilient) approach is to involve managers and futures:
import multiprocessing
import concurrent.futures

def fill_queue(q):
    for i in range(5000):
        q.put([range(200) for j in range(100)])

def dump_queue(q):
    q.put(None)
    return list(iter(q.get, None))

with multiprocessing.Manager() as manager:
    q = manager.Queue()
    with concurrent.futures.ProcessPoolExecutor() as executor:
        executor.submit(fill_queue, q)  # add stuff
        executor.submit(fill_queue, q)  # add more stuff
        executor.submit(fill_queue, q)  # ... and more
    # 'step out' of the executor
    l = dump_queue(q)
# 'step out' of the manager
print(f"Saw {len(l)} items")
Let the manager handle your MP constructs (queues, dictionaries, etc), and within that let the futures handle your processes (and within that, if you want, let another future handle threads). This assures that things are cleaned up as you 'unravel' the work.
