I'm trying to create a script with what I thought was a fairly simple Producer/Consumer queue. I'm using this on a system with two A4000 GPUs. Below is the relevant code.
import torch
from torch.multiprocessing import Process, set_start_method, Queue
def main():
    input_data_queue = Queue(25)
    send_data_queue = Queue(5)
    for i in range(torch.cuda.device_count()):
        Process_Data(input_data_queue, send_data_queue, i)
    ....
class Process_Data:
    def __init__(self, in_q, out_q, gpu_id):
        self.in_queue = in_q
        self.out_queue = out_q
        self.gpu_id = gpu_id
        self.model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt').to(torch.device(self.gpu_id))
        self.model.eval()
    ....
if __name__ == "__main__":
    set_start_method('spawn')
    main()
I always get the error:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/usr/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/reductions.py", line 111, in rebuild_cuda_tensor
storage = storage_cls._new_shared_cuda(
File "/usr/local/lib/python3.8/dist-packages/torch/storage.py", line 630, in _new_shared_cuda
return eval(cls.__module__)._UntypedStorage._new_shared_cuda(*args, **kwargs)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
To select the device, I've tried:
First creating a model with torch.device(0), then another class with torch.device(1)
torch.device("cuda:0") then torch.device("cuda:1")
torch.device("cuda") then torch.device("cuda")
torch.device("cuda", 0), then torch.device("cuda",1)
All variations I can find documented give the same error.
How can I get two models, running on two GPUs, sharing a work queue?
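The traceback shows the child process failing while unpickling a CUDA tensor, which suggests the model (built in the parent's __init__) is being pickled to the spawned worker. One way to get two models on two GPUs sharing a queue is to construct each model inside its own worker process, so nothing CUDA-related ever crosses the pickle boundary. A minimal sketch, not the original code: the worker body, the sentinel protocol, and the item format are all assumptions.

import torch
from torch.multiprocessing import Process, Queue, set_start_method

def worker(in_q, out_q, gpu_id):
    # Build the model *inside* the child so its CUDA storages are created
    # on this process's own device; spawn never has to pickle them.
    model = torch.hub.load('ultralytics/yolov5', 'custom', path='best.pt')
    model.to(torch.device(f'cuda:{gpu_id}')).eval()
    while True:
        item = in_q.get()
        if item is None:                      # sentinel: shut down
            break
        with torch.no_grad():
            results = model(item)             # item format (path/array) assumed
        out_q.put(results.pandas().xyxy[0])   # a plain DataFrame pickles cleanly

def main():
    input_data_queue = Queue(25)
    send_data_queue = Queue(5)
    workers = [Process(target=worker, args=(input_data_queue, send_data_queue, i))
               for i in range(torch.cuda.device_count())]
    for w in workers:
        w.start()
    # ... feed input_data_queue, read send_data_queue,
    # then put one None per worker and join ...

if __name__ == "__main__":
    set_start_method('spawn')
    main()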
I'm trying to read text from images with easyocr in Python, and I want to run it in a separate process so it doesn't hold back other parts of the code. But when I call the function inside a multiprocessing loop, I get a NotImplementedError. Here is an example of the code.
import multiprocessing as mp
import easyocr
import cv2
def ocr_test(q, reader):
    while not q.empty():
        q.get()
        img = cv2.imread('unknown.png')
        result = reader.readtext(img)

if __name__ == '__main__':
    q = mp.Queue()
    reader = easyocr.Reader(['en'])
    p = mp.Process(target=ocr_test, args=(q, reader))
    p.start()
    q.put('start')
    p.join()
and this is the error I get.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Program Files\Python310\lib\multiprocessing\spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "C:\Program Files\Python310\lib\multiprocessing\spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
File "C:\Python\venv\lib\site-packages\torch\multiprocessing\reductions.py", line 90, in rebuild_tensor
t = torch._utils._rebuild_tensor(storage, storage_offset, size, stride)
File "C:\Python\venv\lib\site-packages\torch\_utils.py", line 134, in _rebuild_tensor
t = torch.tensor([], dtype=storage.dtype, device=storage._untyped().device)
NotImplementedError: Could not run 'aten::empty.memory_format' with arguments from the 'QuantizedCPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::empty.memory_format' is only available for these backends: [CPU, Meta, MkldnnCPU, SparseCPU, SparseCsrCPU, BackendSelect, Python, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradLazy, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, AutocastCPU, Autocast, Batched, VmapMode, Functionalize].
Is there a way to solve this problem?
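The error comes from pickling the Reader into the spawned child: rebuilding its quantized model tensors there hits the 'QuantizedCPU' backend path that the traceback shows. One workaround is to construct the Reader inside the child and pass only plain data through the queue; a blocking get with a sentinel also avoids the race where the worker sees an empty queue before 'start' arrives. A sketch under those assumptions:

import multiprocessing as mp
import cv2

def ocr_test(q):
    import easyocr
    # Build the Reader in the child so its quantized model tensors are
    # never pickled across the process boundary.
    reader = easyocr.Reader(['en'])
    while True:
        item = q.get()      # blocking get instead of polling q.empty()
        if item is None:    # sentinel: shut down
            break
        img = cv2.imread('unknown.png')
        print(reader.readtext(img))

if __name__ == '__main__':
    q = mp.Queue()
    p = mp.Process(target=ocr_test, args=(q,))
    p.start()
    q.put('start')
    q.put(None)
    p.join()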
I'm trying to build webcam software that takes the webcam input, modifies the image, and outputs it to a virtual webcam. I got it running pretty well, but now I want to add a system tray icon to control it. Since the tray icon is a GUI, I need my camera loop to run on another thread.
So my (cut down) class looks like this:
import pyvirtualcam
import cv2
import effects
import numpy as np
class VirtualCam:
    def __init__(self):
        self.stopped = False
        self.paused = False
        [...]

    def run(self):
        with pyvirtualcam.Camera(width=1280, height=720, fps=self.FPS, fmt=pyvirtualcam.PixelFormat.BGR) as cam:
            while not self.stopped:
                if not self.paused:  # only copy image if NOT paused!
                    success, img = self.cap.read()
                else:
                    img = self.EMPTY_IMAGE
                cam.send(effects.zoom(img, self.zoom))
                cam.sleep_until_next_frame()
So this should be straightforward. Now, in my GUI code, which is based on this code, I added menu entries, where one only starts the camera process (while the others can pause, stop, etc.). Again, this is cut down:
if __name__ == '__main__':
    import itertools
    import glob
    import virtcam
    from multiprocessing import Process

    cam_thread = None
    vcam = virtcam.VirtualCam()

    # controlling vcam
    def pause(sysTrayIcon):
        vcam.pause()

    def cont(sysTrayIcon):
        vcam.cont()

    def start(sysTrayIcon):
        global cam_thread
        # start threading for camera capture here
        cam_thread = Process(target=vcam.run)
        cam_thread.start()

    def stop(sysTrayIcon):
        global cam_thread
        vcam.stop()
        cam_thread.join()

    menu_options = (
        ('Start', next(icons), start),
        [...]
    )
    SysTrayIcon(next(icons), hover_text, menu_options, on_quit=bye, default_menu_index=1)
Okay so this should work, shouldn't it? When I click on "Start" in the tray-menu, I get an error:
Python WNDPROC handler failed
Traceback (most recent call last):
File "I:/Entwicklung/webcamZoomer/tray_gui.py", line 207, in command
self.execute_menu_option(id)
File "I:/Entwicklung/webcamZoomer/tray_gui.py", line 214, in execute_menu_option
menu_action(self)
File "I:/Entwicklung/webcamZoomer/tray_gui.py", line 256, in start
cam_thread.start()
File "H:\Programme\Python3\lib\multiprocessing\process.py", line 112, in start
self._popen = self._Popen(self)
File "H:\Programme\Python3\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "H:\Programme\Python3\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "H:\Programme\Python3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "H:\Programme\Python3\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle cv2.VideoCapture objects
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "H:\Programme\Python3\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "H:\Programme\Python3\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
Of course I used the search function, but I could only find things I don't really understand: it seems like OpenCV already uses multiprocessing, but why does that interfere with my code? Furthermore, I don't do any manual pickling; I only need the webcam input.
So - can someone help me out on this one? Thank you!
edit: By the way, I'm on Windows 10, and this software will only ever need to run on Windows systems.
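The pickling isn't manual: Process(target=vcam.run) has to pickle the bound vcam instance, including its cv2.VideoCapture, to send it to the spawned child, and VideoCapture objects can't be pickled. Since the capture loop is I/O-bound anyway, one option is to run it on a thread instead of a process, so nothing crosses a process boundary. A sketch of the two handlers under that assumption (vcam.stop() is assumed to set self.stopped = True):

import threading

def start(sysTrayIcon):
    global cam_thread
    # A thread shares the process's memory, so the un-picklable
    # cv2.VideoCapture never has to be serialized.
    cam_thread = threading.Thread(target=vcam.run, daemon=True)
    cam_thread.start()

def stop(sysTrayIcon):
    vcam.stop()        # assumed to set self.stopped = True
    cam_thread.join()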
I encountered a weird issue when I was trying to run 2 class methods concurrently in a third method. After eliminating large chunks of code, one at a time, I ended up with the example below.
Notes:
I must have a model as a class attribute, I cannot change that.
I need both tasks to run concurrently and I cannot get these 2 tasks out of the class because they interact with other class members
I get the same error using multiprocessing.Process(), so that's not going to fix the problem.
from concurrent.futures import ProcessPoolExecutor, as_completed
from tensorflow.keras.models import Model

class Example:
    def __init__(self):
        self.model = Model()
        # comment out the line above and uncomment the line below, the error is gone
        # self.model = None

    def task1(self):
        pass

    def task2(self):
        pass

    def process(self):
        with ProcessPoolExecutor(2) as executor:
            future_items = [
                executor.submit(self.task1),
                executor.submit(self.task2),
            ]
            results = [
                future_item.result() for future_item in as_completed(future_items)
            ]
            print(results)

if __name__ == '__main__':
    ex = Example()
    ex.process()
Result:
2021-01-10 08:10:04.315386: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-01-10 08:10:04.315897: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/Cellar/python#3.8/3.8.7/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/queues.py", line 239, in _feed
obj = _ForkingPickler.dumps(obj)
File "/usr/local/Cellar/python#3.8/3.8.7/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
TypeError: cannot pickle 'weakref' object
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/emadboctor/Desktop/code/drl-algos/scratch.py", line 34, in <module>
ex.process()
File "/Users/emadboctor/Desktop/code/drl-algos/scratch.py", line 26, in process
results = [
File "/Users/emadboctor/Desktop/code/drl-algos/scratch.py", line 27, in <listcomp>
future_item.result() for future_item in as_completed(future_items)
File "/usr/local/Cellar/python#3.8/3.8.7/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/_base.py", line 432, in result
return self.__get_result()
File "/usr/local/Cellar/python#3.8/3.8.7/Frameworks/Python.framework/Versions/3.8/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
raise self._exception
File "/usr/local/Cellar/python#3.8/3.8.7/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/queues.py", line 239, in _feed
obj = _ForkingPickler.dumps(obj)
File "/usr/local/Cellar/python#3.8/3.8.7/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
TypeError: cannot pickle 'weakref' object
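Keras models hold weakrefs, which the pickler rejects when ProcessPoolExecutor ships the bound methods (and therefore the instance) to its workers. One common workaround is to exclude the model from the pickled state and rebuild it on arrival; the __getstate__/__setstate__ pair below is an addition for illustration, not part of the original code.

from concurrent.futures import ProcessPoolExecutor, as_completed
from tensorflow.keras.models import Model

class Example:
    def __init__(self):
        self.model = Model()

    def __getstate__(self):
        # Drop the un-picklable model when the instance is shipped
        # to a worker process...
        state = self.__dict__.copy()
        state['model'] = None
        return state

    def __setstate__(self, state):
        # ...and rebuild it on arrival (load weights here if the
        # tasks actually need the trained model).
        self.__dict__.update(state)
        self.model = Model()

    def task1(self):
        return 'task1'

    def task2(self):
        return 'task2'

    def process(self):
        with ProcessPoolExecutor(2) as executor:
            future_items = [executor.submit(self.task1),
                            executor.submit(self.task2)]
            print([f.result() for f in as_completed(future_items)])

if __name__ == '__main__':
    Example().process()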
import asyncio
import torch
import os
import pandas as pd
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings, DocumentPoolEmbeddings, WordEmbeddings

device = torch.device("cpu")
print(device)

# first, declare how you want to embed
embeddings = DocumentPoolEmbeddings(
    [WordEmbeddings('glove'), FlairEmbeddings('news-forward'), FlairEmbeddings('news-backward')])

path = os.getcwd()
df = pd.read_pickle(path + '/embedding_all_courses_2.pkl')

query_emd = []
cos = torch.nn.CosineSimilarity(dim=0, eps=1e-6)
query = Sentence("some text")
embeddings.embed([query])
query_emd.append(query.embedding)

async def count(index, row):
    for i in query_emd:
        print(words, row['course_name'], cos(i, row['embedding']))
    print(index)

async def main():
    await asyncio.gather(*(count(index, row) for index, row in df.iterrows()))

if __name__ == "__main__":
    import time
    s = time.perf_counter()
    asyncio.run(main())
    elapsed = time.perf_counter() - s
    print(f"{__file__} executed in {elapsed:0.2f} seconds.")
I'm trying to run PyTorch cosine similarity inside asyncio to get results in parallel, using a flair model to embed the text. I need to compare a text against a huge dataframe and get the most similar text back, and the response should be fast. Can you suggest an alternative approach as well? I also need this code to run in CPU memory only, not on the GPU/CUDA device.
Error:
Traceback (most recent call last):
File "asyn_emd.py", line 74, in <module>
asyncio.run(main())
File "/home/linuxbrew/.linuxbrew/Cellar/python/3.7.5/lib/python3.7/asyncio/runners.py", line 43, in run
return loop.run_until_complete(main)
File "/home/linuxbrew/.linuxbrew/Cellar/python/3.7.5/lib/python3.7/asyncio/base_events.py", line 579, in run_until_complete
return future.result()
File "/asyn_emd.py", line 68, in main
await asyncio.gather(*(count(index,row) for index,row in df.iterrows()))
File "/asyn_emd.py", line 61, in count
print(words,row['course_name'],cos(i, row['embedding']))
File "/home/karthickaravindan/.virtualenvs/test/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/karthickaravindan/.virtualenvs/test/lib/python3.7/site-packages/torch/nn/modules/distance.py", line 75, in forward
return F.cosine_similarity(x1, x2, self.dim, self.eps)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
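Two things stand out here. First, asyncio gives no parallelism for CPU-bound work like cosine similarity; the coroutines still run one at a time on a single thread. Second, the local torch.device("cpu") variable does nothing: flair picks its device from the module-level flair.device, and the embeddings stored in the pickle were created on CUDA, hence the mixed-device error. A sketch that forces CPU and replaces the per-row loop with one batched similarity; the column names are taken from the question and most_similar is a hypothetical helper:

import torch
import flair

# Must be set *before* the embeddings are constructed; a local
# torch.device("cpu") variable has no effect on flair.
flair.device = torch.device('cpu')

# ... build `embeddings`, `query_emd`, and load `df` as in the question ...

def most_similar(query_vec, df):
    # Stack all stored embeddings into one (N, D) matrix on the CPU,
    # then score every row in a single vectorized call.
    matrix = torch.stack([e.to('cpu') for e in df['embedding']])
    q = query_vec.to('cpu').unsqueeze(0).expand_as(matrix)
    scores = torch.nn.functional.cosine_similarity(matrix, q, dim=1)
    best = int(torch.argmax(scores))
    return df.iloc[best]['course_name'], float(scores[best])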
Below is my Python script.
import multiprocessing
# We must import this explicitly, it is not imported by the top-level
# multiprocessing module.
import multiprocessing.pool
import time
from random import randint

class NoDaemonProcess(multiprocessing.Process):
    # make 'daemon' attribute always return False
    def _get_daemon(self):
        return False

    def _set_daemon(self, value):
        pass

    daemon = property(_get_daemon, _set_daemon)

# We sub-class multiprocessing.pool.Pool instead of multiprocessing.Pool
# because the latter is only a wrapper function, not a proper class.
class MyPool(multiprocessing.pool.Pool):
    Process = NoDaemonProcess

def sleepawhile(t):
    print("Sleeping %i seconds..." % t)
    time.sleep(t)
    return t

def work(num_procs):
    print("Creating %i (daemon) workers and jobs in child." % num_procs)
    pool = multiprocessing.Pool(num_procs)
    result = pool.map(sleepawhile,
                      [randint(1, 5) for x in range(num_procs)])
    # The following is not really needed, since the (daemon) workers of the
    # child's pool are killed when the child is terminated, but it's good
    # practice to cleanup after ourselves anyway.
    pool.close()
    pool.join()
    return result

def test():
    print("Creating 5 (non-daemon) workers and jobs in main process.")
    pool = MyPool(20)
    result = pool.map(work, [randint(1, 5) for x in range(5)])
    pool.close()
    pool.join()
    print(result)

if __name__ == '__main__':
    test()
This is running on an Ubuntu server and I'm using Python 3.6.7. I had this working properly; after an apt-get upgrade I'm getting the error:
group argument must be None for now
What might be causing the error I'm facing? Should I change the Python version, or should I roll back the changes from upgrading?
EDIT 1
Stacktrace of the exception:
Traceback (most recent call last):
File "/src/mainapp.py", line 104, in bulkfun
p = MyPool(20)
File "/usr/lib/python3.6/multiprocessing/pool.py", line 175, in __init__
self._repopulate_pool()
File "/usr/lib/python3.6/multiprocessing/pool.py", line 236, in _repopulate_pool
self._wrap_exception)
File "/usr/lib/python3.6/multiprocessing/pool.py", line 250, in _repopulate_pool_static
wrap_exception)
File "/usr/lib/python3.6/multiprocessing/process.py", line 73, in __init__
assert group is None, 'group argument must be None for now'
AssertionError: group argument must be None for now
EDIT 2
The code works with Python 2.7 and Python 3.5, but if I run it with Python 3.6.7 I get the error below.
Creating 5 (non-daemon) workers and jobs in main process.
Traceback (most recent call last):
File "multi.py", line 52, in <module>
test()
File "multi.py", line 43, in test
pool = MyPool(5)
File "/usr/lib/python3.6/multiprocessing/pool.py", line 175, in __init__
self._repopulate_pool()
File "/usr/lib/python3.6/multiprocessing/pool.py", line 236, in _repopulate_pool
self._wrap_exception)
File "/usr/lib/python3.6/multiprocessing/pool.py", line 250, in _repopulate_pool_static
wrap_exception)
File "/usr/lib/python3.6/multiprocessing/process.py", line 73, in __init__
assert group is None, 'group argument must be None for now'
AssertionError: group argument must be None for now
I came across this issue while upgrading the Travis distribution from 14.04 to 16.04, when Python 3.6 started to fail. I found a solution to this problem in a fix to another package - FIX: Python 2.7-3.7.1 compatible NonDaemonPool:
class NonDaemonPool(multiprocessing.pool.Pool):
    def Process(self, *args, **kwds):
        proc = super(NonDaemonPool, self).Process(*args, **kwds)

        class NonDaemonProcess(proc.__class__):
            """Monkey-patch process to ensure it is never daemonized"""
            @property
            def daemon(self):
                return False

            @daemon.setter
            def daemon(self, val):
                pass

        proc.__class__ = NonDaemonProcess
        return proc
Same here. This code worked in my case (Python 3.6.7) (https://stackoverflow.com/a/53180921/10742388):
class NoDaemonProcess(multiprocessing.Process):
    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass

class NoDaemonContext(type(multiprocessing.get_context())):
    Process = NoDaemonProcess

# We sub-class multiprocessing.pool.Pool instead of multiprocessing.Pool
# because the latter is only a wrapper function, not a proper class.
class MyPool(multiprocessing.pool.Pool):
    def __init__(self, *args, **kwargs):
        kwargs['context'] = NoDaemonContext()
        super(MyPool, self).__init__(*args, **kwargs)
I think this problem comes from a change to process.py (https://github.com/python/cpython/blob/8ca0fa9d2f4de6e69f0902790432e0ab2f37ba68/Lib/multiprocessing/process.py#L189).
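With the context-based pool above, the original script should run unchanged on 3.6.7; only the pool class differs. A usage sketch, assuming the work and randint definitions from the question:

if __name__ == '__main__':
    pool = MyPool(5)
    result = pool.map(work, [randint(1, 5) for _ in range(5)])
    pool.close()
    pool.join()
    print(result)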