Is it possible to have a progress bar with map_async from multiprocessing?
A toy example:
from multiprocessing import Pool
import tqdm

def f(x):
    print(x)
    return x*x

n_job = 4
with Pool(processes=n_job) as pool:
    results = pool.map_async(f, range(10)).get()
print(results)
Something like this:

data = []
with Pool(processes=10) as pool:
    for d in tqdm.tqdm(
            pool.imap(f, range(10)),
            total=10):
        data.append(d)
There are a couple of ways of achieving what you want that I can think of:
Use apply_async with a callback argument to update the progress bar as each result becomes available.
Use imap and as you iterate the results you can update the progress bar.
There is a slight problem with imap: results are returned in task-submission order, which is of course the order you want. But that order does not necessarily reflect the order in which the submitted tasks complete, so the progress bar may not update as frequently as it otherwise could. Still, I will show that solution first, since it is the simplest and probably adequate:
from multiprocessing import Pool
import tqdm

def f(x):
    import time
    time.sleep(1)  # for demo purposes
    return x*x

# Required by Windows:
if __name__ == '__main__':
    pool_size = 4
    results = []
    with Pool(processes=pool_size) as pool:
        with tqdm.tqdm(total=10) as pbar:
            for result in pool.imap(f, range(10)):
                results.append(result)
                pbar.update()
    print(results)
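If you care more about how promptly the bar advances than about the order in which results stream back, imap_unordered yields each result as soon as its task finishes. A minimal sketch (the input is returned alongside the square so the original order can be restored at the end):

from multiprocessing import Pool
import tqdm
import time

def f(x):
    time.sleep(1)  # for demo purposes
    return x, x * x  # return the input too so order can be restored later

if __name__ == '__main__':
    unordered = []
    with Pool(processes=4) as pool:
        with tqdm.tqdm(total=10) as pbar:
            # Results arrive in completion order, so the bar updates promptly.
            for pair in pool.imap_unordered(f, range(10)):
                unordered.append(pair)
                pbar.update()
    results = [square for _, square in sorted(unordered)]
    print(results)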
The solution that uses apply_async:
from multiprocessing import Pool
import tqdm

def f(x):
    import time
    time.sleep(1)  # for demo purposes
    return x*x

# Required by Windows:
if __name__ == '__main__':
    def my_callback(_):
        # We don't care about the actual result.
        # Just update the progress bar:
        pbar.update()

    pool_size = 4
    with Pool(processes=pool_size) as pool:
        with tqdm.tqdm(total=10) as pbar:
            async_results = [pool.apply_async(f, args=(x,), callback=my_callback) for x in range(10)]
            results = [async_result.get() for async_result in async_results]
    print(results)
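As a side note, recent versions of tqdm bundle the pool and the bar together in tqdm.contrib.concurrent.process_map (it wraps concurrent.futures rather than multiprocessing.Pool); assuming your tqdm ships it, a minimal sketch:

from tqdm.contrib.concurrent import process_map

def f(x):
    import time
    time.sleep(1)  # for demo purposes
    return x * x

if __name__ == '__main__':
    # process_map runs f in worker processes and drives the bar for you.
    results = process_map(f, range(10), max_workers=4)
    print(results)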
I think this is it:
from multiprocessing import Pool
import tqdm

def f(x):
    return x*x

n_job = 4
data = []
with Pool(processes=10) as pool:
    for d in tqdm.tqdm(
            pool.map_async(f, range(10)).get(),
            total=10):
        data.append(d)
print(data)
Related
I am trying to use the cryptofeed module to receive OHLC data from the API, store the data in a global variable by running the cryptofeed stream in a separate multiprocessing process, and then access the global variable from a separate asyncio loop.
I am having trouble storing the global data across processes: the async function close() only ever prints an empty pandas DataFrame. I would like a suggestion on how to approach this problem.
from cryptofeed import FeedHandler
from cryptofeed.backends.aggregate import OHLCV
from cryptofeed.defines import TRADES
from cryptofeed.exchanges import BinanceFutures
import pandas as pd
import multiprocessing
from multiprocessing import Process
from concurrent.futures import ProcessPoolExecutor
import asyncio

data1 = pd.DataFrame()  # Create an empty DataFrame
queue = multiprocessing.Queue()

async def ohlcv(data):
    global data1
    # Convert data to a Pandas DataFrame
    df = pd.DataFrame.from_dict(data, orient='index')
    # Reset the index
    df.reset_index(inplace=True)
    df.index = [pd.Timestamp.now()]
    # Append the rows of df to data1
    data1 = data1.append(df)
    queue.put('nd')

async def close(data):
    while True:
        print(data)
        await asyncio.sleep(15)

def main1():
    f = FeedHandler()
    f.add_feed(BinanceFutures(symbols=['BTC-USDT-PERP'], channels=[TRADES], callbacks={TRADES: OHLCV(ohlcv, window=10)}))
    f.run()

if __name__ == '__main__':
    p = Process(target=main1)
    p.start()
    asyncio.run(close(data1))
It appears that you are trying to combine asyncio with multiprocessing in some fashion. I don't have access to your FeedHandler and BinanceFutures classes, so I will just have main1 call ohlcv directly. Since main1 runs in a separate process from the main process (which is using asyncio), I can't see any reason, in the code you posted, why ohlcv would need to be a coroutine (async function).
asyncio has a provision for running multiprocessing tasks, and that is the way to proceed. So we run close as a coroutine and it runs main1 in a child process (a multiprocessing pool process, actually), returning the result that main1 returned. There is no need for explicit queue operations to return any result:
import asyncio
import pandas as pd
from concurrent.futures import ProcessPoolExecutor

def ohlcv(data):
    # Convert data to a Pandas DataFrame
    df = pd.DataFrame.from_dict(data, orient='index')
    # Reset the index
    df.reset_index(inplace=True)
    df.index = [pd.Timestamp.now()]
    return df

def main1():
    """
    f = FeedHandler()
    f.add_feed(BinanceFutures(symbols=['BTC-USDT-PERP'], channels=[TRADES], callbacks={TRADES: OHLCV(ohlcv, window=10)}))
    f.run()
    """
    return ohlcv({'a': 1})

async def close():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(1) as executor:
        return await loop.run_in_executor(executor, main1)

if __name__ == '__main__':
    df = asyncio.run(close())
    print(df)
Prints:

                           index  0
2023-01-08 15:59:44.939261     a  1
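If the real feed keeps producing frames instead of returning once, one possible variation is to push each frame through a multiprocessing.Queue and await it from asyncio via run_in_executor. A minimal sketch, with producer standing in for the feed process:

import asyncio
import multiprocessing as mp

def producer(q):
    # Stand-in for the feed process: push a few OHLCV-like dicts, then a sentinel.
    for i in range(3):
        q.put({'a': i})
    q.put(None)

async def consume(q):
    loop = asyncio.get_running_loop()
    while True:
        # q.get() blocks, so run it in a worker thread to keep the event loop free.
        item = await loop.run_in_executor(None, q.get)
        if item is None:
            break
        print(item)

if __name__ == '__main__':
    q = mp.Queue()
    p = mp.Process(target=producer, args=(q,))
    p.start()
    asyncio.run(consume(q))
    p.join()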
I can't seem to figure out why my results are not appending while using the multiprocessing package.
I've looked at many similar questions but can't seem to figure out what I'm doing wrong. This is my first attempt at multiprocessing (as you might be able to tell), so I don't quite understand all the jargon in the documentation, which might be part of the problem.
Running this in PyCharm prints an empty list instead of the desired list of row sums.
import numpy as np
from multiprocessing import Pool
import timeit

data = np.random.randint(0, 100, size=(5, 1000))

def add_these(numbers_to_add):
    added = np.sum(numbers_to_add)
    return added

results = []
tic = timeit.default_timer()  # start timer
pool = Pool(3)

if __name__ == '__main__':
    for row in data:
        pool.apply_async(add_these, row, callback=results.append)

toc = timeit.default_timer()  # end timer
print(toc - tic)
print(results)
EDIT: Closing and joining the pool, then printing results within the if __name__ == '__main__' block, results in the following error being raised repeatedly until I manually stop execution:
RuntimeError:
    An attempt has been made to start a new process before the current process has finished its bootstrapping phase. This probably means that you are not using fork to start your child processes and you have forgotten to use the proper idiom in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program is not going to be frozen to produce an executable.
Code to reproduce error:
import numpy as np
from multiprocessing import Pool, freeze_support
import timeit

data = np.random.randint(0, 100, size=(5, 1000))

def add_these(numbers_to_add):
    added = np.sum(numbers_to_add)
    return added

results = []
tic = timeit.default_timer()  # start timer
pool = Pool(3)

if __name__ == '__main__':
    for row in data:
        pool.apply_async(add_these, (row,), callback=results.append)
    pool.close()
    pool.join()
    print(results)

toc = timeit.default_timer()  # end timer
print(toc - tic)
I think this would be a more correct way:
import numpy as np
from multiprocessing import Pool, TimeoutError
import timeit

data = np.random.randint(0, 100, size=(5, 1000))

def add_these(numbers_to_add):
    added = np.sum(numbers_to_add)
    return added

results = []
if __name__ == '__main__':
    with Pool(processes=3) as pool:
        for row in data:
            results = pool.apply_async(add_these, (row,))
            try:
                # get(timeout=...) raises multiprocessing.TimeoutError, which is
                # not the built-in TimeoutError, so import it from multiprocessing.
                print(results.get(timeout=1))
            except TimeoutError:
                print("Multiprocessing Timeout")
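Another sketch, if you want to keep the original callback style: create the pool inside the __main__ guard, submit every row, then close and join the pool before reading results, so every callback has had a chance to run:

import numpy as np
from multiprocessing import Pool
import timeit

data = np.random.randint(0, 100, size=(5, 1000))

def add_these(numbers_to_add):
    return np.sum(numbers_to_add)

if __name__ == '__main__':
    results = []
    tic = timeit.default_timer()
    with Pool(3) as pool:
        for row in data:
            # Note the (row,) tuple: apply_async unpacks args positionally.
            pool.apply_async(add_these, (row,), callback=results.append)
        pool.close()
        pool.join()  # wait until every task has finished and every callback has run
    toc = timeit.default_timer()
    print(toc - tic)
    # Callbacks fire in completion order, so results may not match row order.
    print(results)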
I want to know if there is a way to make multiprocessing work in this code. What should I change, or is there another function in multiprocessing that would let me do this operation?
You can call the locateOnScreen('calc7key.png') function to get the screen coordinates. The return value is a 4-integer tuple: (left, top, width, height).
I got this error:

    checkNumber1 = resourceBlankLightTemp[1]
TypeError: 'Process' object does not support indexing
import pyautogui, time, os, logging, sys, random, copy
import multiprocessing as mp

BLANK_DARK = os.path.join('images', 'blankDark.png')
BLANK_LIGHT = os.path.join('images', 'blankLight.png')

def blankFirstDarkResourcesIconPosition():
    blankDarkIcon = pyautogui.locateOnScreen(BLANK_DARK)
    return blankDarkIcon

def blankFirstLightResourcesIconPosition():
    blankLightIcon = pyautogui.locateOnScreen(BLANK_LIGHT)
    return blankLightIcon

def getRegionOfResourceImage():
    global resourceIconRegion
    resourceBlankLightTemp = mp.Process(target=blankFirstLightResourcesIconPosition)
    resourceBlankDarkTemp = mp.Process(target=blankFirstDarkResourcesIconPosition)
    resourceBlankLightTemp.start()
    resourceBlankDarkTemp.start()

    if(resourceBlankLightTemp == None):
        checkNumber1 = 2000
    else:
        checkNumber1 = resourceBlankLightTemp[1]

    if(resourceBlankDarkTemp == None):
        checkNumber2 = 2000
    else:
        checkNumber2 = resourceBlankDarkTemp[1]
In general, if you just want to use multiprocessing to run existing CPU-intensive functions in parallel, it is easiest to do so through a Pool, as shown in the example at the beginning of the documentation:
# ...

def _call(fn):
    # Module-level helper: Pool has to pickle the callable it sends to the
    # workers, and lambdas cannot be pickled.
    return fn()

def getRegionOfResourceImage():
    global resourceIconRegion
    with mp.Pool(2) as p:
        resourceBlankLightTemp, resourceBlankDarkTemp = p.map(
            _call, [blankFirstLightResourcesIconPosition,
                    blankFirstDarkResourcesIconPosition])

    if resourceBlankLightTemp is None:
        # ...
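Equivalently, a sketch using the same Pool with apply_async, reusing the two locate functions from the question. AsyncResult.get() hands back whatever the function returned (a box tuple or None), which a bare Process object never does, and that is why indexing the Process failed:

import multiprocessing as mp

def getRegionOfResourceImage():
    with mp.Pool(2) as p:
        light_async = p.apply_async(blankFirstLightResourcesIconPosition)
        dark_async = p.apply_async(blankFirstDarkResourcesIconPosition)
        # get() returns the function's return value, computed in a worker process.
        light_box = light_async.get()
        dark_box = dark_async.get()
    checkNumber1 = 2000 if light_box is None else light_box[1]
    checkNumber2 = 2000 if dark_box is None else dark_box[1]
    return checkNumber1, checkNumber2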
I'm writing some practice code so I can implement the ideas once I have a better idea of what I'm doing. The code is designed for multiprocessing: to reduce the runtime it splits a stack of three arrays into several horizontal pieces that are then processed in parallel via map_async. However, the code seems to hang at the first .recv() call on a pipe, even though the Pipe() objects are fully defined, and I'm not sure why. If I define each Pipe() object manually, the code works just fine, but as soon as I create them in a loop the code hangs at sec1 = pipes[0][0].recv(). How can I fix this?
from multiprocessing import Process, Pipe
import multiprocessing as mp
import numpy as np
import math

num_sections = 4

pipes_send = [None] * num_sections
pipes_recv = [None] * num_sections
pipes = list(zip(pipes_recv, pipes_send))
for i in range(num_sections):
    pipes[i] = list(pipes[i])
for i in range(num_sections):
    pipes[i][0], pipes[i][1] = Pipe()

def f(sec_num):
    for plane in range(3):
        hist_sec[sec_num][plane] += rand_sec[sec_num][plane]
    if sec_num == 0:
        pipes[0][1].send(hist_sec[sec_num])
        pipes[0][1].close()
    if sec_num == 1:
        pipes[1][1].send(hist_sec[sec_num])
        pipes[1][1].close()
    if sec_num == 2:
        pipes[2][1].send(hist_sec[sec_num])
        pipes[2][1].close()
    if sec_num == 3:
        pipes[3][1].send(hist_sec[sec_num])
        pipes[3][1].close()

hist = np.zeros((3, 512, 512))
hist_sec = []
randmat = np.random.rand(3, 512, 512)
rand_sec = []
for plane in range(3):
    hist_div = np.array_split(hist[plane], num_sections)
    hist_sec.append(hist_div)
    randmatsplit = np.array_split(randmat[plane], num_sections)
    rand_sec.append(randmatsplit)
hist_sec = np.rollaxis(np.asarray(hist_sec), 1, 0)
rand_sec = np.rollaxis(np.asarray(rand_sec), 1, 0)

if __name__ == '__main__':
    pool = mp.Pool(num_sections)
    args = np.arange(num_sections)
    pool.map_async(f, args, chunksize=1)
    sec1 = pipes[0][0].recv()
    sec2 = pipes[1][0].recv()
    sec3 = pipes[2][0].recv()
    sec4 = pipes[3][0].recv()
    hist_full = []
    for plane in range(3):
        hist_plane = np.concatenate((sec1[plane], sec2[plane], sec3[plane], sec4[plane]), axis=0)
        hist_full.append(hist_plane)
    pool.close()
    pool.join()
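For what it's worth, here is a minimal sketch under the assumption that the only goal is to get each worker's partial array back into the parent: pass each section to the worker as an argument and let map_async's .get() return the results, so no Pipe objects are needed at all:

from multiprocessing import Pool
import numpy as np

num_sections = 4

def f(args):
    # Each worker receives its own (hist_part, rand_part) pair and simply
    # returns the summed section instead of sending it through a Pipe.
    hist_part, rand_part = args
    return hist_part + rand_part

if __name__ == '__main__':
    hist = np.zeros((3, 512, 512))
    randmat = np.random.rand(3, 512, 512)
    hist_parts = np.array_split(hist, num_sections, axis=1)      # 4 pieces of shape (3, 128, 512)
    rand_parts = np.array_split(randmat, num_sections, axis=1)
    with Pool(num_sections) as pool:
        sections = pool.map_async(f, list(zip(hist_parts, rand_parts)), chunksize=1).get()
    hist_full = np.concatenate(sections, axis=1)                 # back to (3, 512, 512)
    print(hist_full.shape)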
I have code like this:

import numpy as np
from multiprocessing import Process, Pool

x = 3
y = 3
z = 10
ar = np.zeros((x, y, z))

para = []
process = []

def local_func(section):
    print("section %s" % str(section))
    ar[2, 2, section] = 255
    print("value set %d" % ar[2, 2, section])

pool = Pool(1)
run_list = range(0, 10)
list_of_results = pool.map(local_func, run_list)

print(ar)
The values in ar were not changed by the multithreading. What might be wrong?
Thanks
You're using multiple processes here, not multiple threads. Because of that, each instance of local_func gets its own separate copy of ar. You can use a custom Manager to create a shared numpy array, which you can pass to each child process and get the results you expect:
import numpy as np
from functools import partial
from multiprocessing import Process, Pool
import multiprocessing.managers

x = 3
y = 3
z = 10

class MyManager(multiprocessing.managers.BaseManager):
    pass

MyManager.register('np_zeros', np.zeros, multiprocessing.managers.ArrayProxy)

para = []
process = []

def local_func(ar, section):
    print("section %s" % str(section))
    ar[2, 2, section] = 255
    print("value set %d" % ar[2, 2, section])

if __name__ == "__main__":
    m = MyManager()
    m.start()
    ar = m.np_zeros((x, y, z))

    pool = Pool(1)
    run_list = range(0, 10)
    func = partial(local_func, ar)
    list_of_results = pool.map(func, run_list)

    print(ar)
Well, multi-threading and multi-processing are different things.
With multi-threading, threads share access to the same array.
With multi-processing, each process has its own copy of the array.
multiprocessing.Pool is a process pool, not a thread pool.
If you want a thread pool, use multiprocessing.pool.ThreadPool:
Replace:

    from multiprocessing import Pool

with:

    from multiprocessing.pool import ThreadPool as Pool
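For illustration, a minimal sketch of the question's loop with only that import swapped; the threads all share the same ar, so the writes are visible afterwards:

import numpy as np
from multiprocessing.pool import ThreadPool as Pool

x, y, z = 3, 3, 10
ar = np.zeros((x, y, z))

def local_func(section):
    # Threads share the process's memory, so this write lands in the one shared array.
    ar[2, 2, section] = 255

if __name__ == "__main__":
    with Pool(1) as pool:
        pool.map(local_func, range(z))
    print(ar)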