Anaconda Accelerate / NumbaPro CUDA Linking Error OSX - python

Overall goal is to use NumbaPro to run some functions on the GPU (on OSX 10.8.3).
Before starting, I just wanted to get everything set up. According to this page I installed CUDA, registered as a CUDA developer, downloaded the Compiler SDK and set up the NUMBAPRO_NVVM=/path/to/libnvvm.dylib environment variable.
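For reference, a minimal sketch of setting that variable from inside Python before NumbaPro is imported (the path shown is hypothetical; point it at wherever the Compiler SDK put libnvvm.dylib, and this assumes NumbaPro reads the variable lazily at import/compile time):

import os

# Hypothetical SDK location; replace with your actual libnvvm.dylib path.
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib/libnvvm.dylib'

import numbapro  # import only after the variable is set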
However, running this basic test function:
from numbapro import autojit

@autojit(target='gpu')
def my_function(x):
    if x == 0.0:
        return 1.0
    else:
        return x*x*x

print my_function(4.4)
exit()
Brings up this error:
File ".../anaconda/lib/python2.7/site-packages/numba/decorators.py", line 207, in compile_function
compiled_function = dec(f)
File "...lib/python2.7/site-packages/numbapro/cudapipeline/decorators.py", line 35, in _jit_decorator
File "...lib/python2.7/site-packages/numbapro/cudapipeline/decorators.py", line 128, in __init__
File "...lib/python2.7/site-packages/numbapro/cudapipeline/environment.py", line 31, in generate_ptx
File "...lib/python2.7/site-packages/numbapro/cudapipeline/environment.py", line 186, in _link_llvm_math_intrinsics
KeyError: 1
I've tried @vectorize'ing instead of autojit; same error.
@autojit by itself with no target works fine.
Any ideas?

For posterity's sake, I asked Continuum Support. They responded:
It seems that you are running a CUDA GPU with compute capability 1.x. NVVM only supports CC 2.0 and above. We definitely should have better error reporting and should make the supported compute capabilities clear in the NumbaPro documentation.
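For anyone hitting the same KeyError, a quick way to confirm the card's compute capability from Python (a sketch using the modern numba API; NumbaPro-era module paths differed):

from numba import cuda

dev = cuda.get_current_device()
# NVVM requires compute capability (2, 0) or higher
print(dev.name, dev.compute_capability)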

Related

Python program doesn't work properly from LabVIEW, but works by itself

I have a Python program which performs an optimization via scipy.optimize.differential_evolution(...). It works properly if I launch it via double-click or from PyCharm (using the system interpreter, Python 3.6.8). But if I try to launch it from LabVIEW 2019 (32-bit) through a Python Node, I get an error from the differential_evolution method. The problem is in the differential_evolution method, because the error vanishes when I turn this method off. The problem can't be in the input arguments (from LabVIEW), because I don't use those LabVIEW arguments in my function at the moment.
The function I call is shown in the snippet below:
def make_in_labview():
    # initialization of constants like bounds for optimization, etc.
    ...
    # Main problem (sp_opt is scipy.optimize):
    result = sp_opt.differential_evolution(func=myClass.deviation,
                                           bounds=optimization_bounds,
                                           args=[[empiric_set, funcs, False]],
                                           strategy='best1bin', maxiter=10,
                                           tol=0.0001, popsize=30, mutation=0.35,
                                           recombination=0.7, workers=2)
    return 0
Here is the main error that LabVIEW outputs:
Function Name: make_tomography
Python returned the following error: <class 'AttributeError'>
module 'sys' has no attribute 'argv'
And here is the seemingly unhelpful stack information about the problem's source:
Call Stack information:
File "C:\Users...\main.py", line 3171, in make_tomography
result = chip.optimization_diff_evolution(optimization_bounds, [empiric_set, funcs, False])
File "C:\Users...\main.py", line 564, in optimization_diff_evolution
mutation=0.35, recombination=0.7, workers=2)
File "C:\Users\QPrac\AppData\Local\Programs\Python\Python36-32\lib\site-packages\scipy\optimize\_differentialevolution.py", line 307, in differential_evolution
constraints=constraints) as solver:
File "C:\Users\QPrac\AppData\Local\Programs\Python\Python36-32\lib\site-packages\scipy\optimize\_differentialevolution.py", line 501, in __init__
self._mapwrapper = MapWrapper(workers)
...
etc., with similar lines, or ones about multiprocessing, like:
File "C:\Users\QPrac\AppData\Local\Programs\Python\Python36-32\Lib\multiprocessing\context.py", line 119, in Pool
context=self.get_context())
Can you help me understand the problem?
P.S. I have only Python 3.6.8 (32-bit) installed on my Windows, as it needs to work properly with 32-bit LabVIEW. My Windows is 64-bit, but I don't have the option of installing 64-bit LabVIEW.
The problem has been solved: if you use LabVIEW's Python Node, you should carefully check how you pass additional arguments (i.e. 'args', per the SciPy manual) to SciPy's differential evolution method.
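For illustration, here is a hedged sketch combining the two usual fixes: defining sys.argv for the embedded interpreter (multiprocessing reads it when spawning workers on Windows, and LabVIEW's Python Node may not set it), and passing args as a plain tuple as the SciPy manual describes. Both points are assumptions to verify against your setup:

import sys

import scipy.optimize as sp_opt

# Embedded interpreters may not define sys.argv; multiprocessing needs it
# to spawn worker processes on Windows (workers=2 uses a process pool).
if not hasattr(sys, 'argv'):
    sys.argv = ['']

# Per the SciPy manual, 'args' is a tuple of extra positional arguments
# forwarded to the objective function.
result = sp_opt.differential_evolution(func=myClass.deviation,
                                       bounds=optimization_bounds,
                                       args=(empiric_set, funcs, False),
                                       strategy='best1bin', maxiter=10,
                                       tol=0.0001, popsize=30, mutation=0.35,
                                       recombination=0.7, workers=2)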

Python module CuPy gives an error when using cupy.einsum()

I am dealing with a problem with CuPy. It generally works great, at a very satisfactory speed, but I have a problem when I use the cupy.einsum() method.
I use the same syntax as with NumPy without any error, but with CuPy it raises one. Here is the code section:
import numpy as np

A = np.random.randn(2, 3, 10)
B = np.random.randn(3, 4)
# C[i, j, l] = sum over k of A[i, j, k] * B[j, l]; result shape (2, 3, 4)
C = np.einsum('ijk,jl->ijl', A, B)
This works quite well, and I consistently get the result I want. However, when I write the same code with CuPy:
import cupy as cp

A = cp.random.randn(2, 3, 10)
B = cp.random.randn(3, 4)
C = cp.einsum('ijk,jl->ijl', A, B)
When I run this, A and B are computed, but it gives me an error when calculating C. This is the error:
Traceback (most recent call last):
File "", line 4, in
C = cp.einsum('ijk,jl->ijl',A,B)
File "C:\Users\Okan\anaconda3\lib\site-packages\cupy\linalg\einsum.py", line 389, in einsum
result_dtype = cupy.result_type(*operands) if dtype is None else dtype
File "<__array_function__ internals>", line 6, in result_type
TypeError: no implementation found for 'numpy.result_type' on types that implement __array_function__: [<class 'cupy.core.core.ndarray'>]
I would be so glad if you have an idea or solution about this issue.
Thank you.
For those who are experiencing the same problem: open a new environment in Conda and install a Python version above 3.9. After that, when you install CuPy via
conda install cupy
it will directly install the latest version (v7.8 or higher). The problem was caused by the version of CuPy; after upgrading, the problem was fixed.
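As a quick sanity check after upgrading (a sketch; the minimum version cited is the answer's claim, not verified here):

import cupy as cp

print(cp.__version__)  # per the answer above, should be 7.8 or higher

A = cp.random.randn(2, 3, 10)
B = cp.random.randn(3, 4)
C = cp.einsum('ijk,jl->ijl', A, B)
print(C.shape)  # (2, 3, 4)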

Why does the Dequantize node fail to prepare?

Background
I'm playing around with MediaPipe for hand tracking and found this useful wrapper for loading MediaPipe's hand_landmark.tflite model. It works without any problems for me on Ubuntu 18.04 with Tensorflow 1.14.0.
However, when I try to use a newer, recently released model, I run into the following error:
INFO: Initialized TensorFlow Lite runtime.
Traceback (most recent call last):
File "/home/user/code/.../repo/models/test_model.py", line 12, in <module>
use_mediapipe_model()
File "/home/user/code/.../repo/models/test_model.py", line 8, in use_mediapipe_model
interp_joint.allocate_tensors()
File "/home/user/code/env/lib/python3.6/site-packages/tensorflow/lite/python/interpreter.py", line 95, in allocate_tensors
return self._interpreter.AllocateTensors()
File "/home/user/code/env/lib/python3.6/site-packages/tensorflow/lite/python/interpreter_wrapper/tensorflow_wrap_interpreter_wrapper.py", line 106, in AllocateTensors
return _tensorflow_wrap_interpreter_wrapper.InterpreterWrapper_AllocateTensors(self)
RuntimeError: tensorflow/lite/kernels/dequantize.cc:62 op_context.input->type == kTfLiteUInt8 || op_context.input->type == kTfLiteInt8 was not true.Node number 0 (DEQUANTIZE) failed to prepare.
When looking at the two models in Netron, I can see that the newer model uses nodes of type Dequantize, which seem to cause the problem. As I'm a beginner when it comes to Tensorflow, I don't really know where to go from here.
Code to reproduce the error
from pathlib import Path

import tensorflow as tf

def use_mediapipe_model():
    interp_joint = tf.lite.Interpreter(
        f"{Path(__file__).parent}/hand_landmark.tflite")  # path to model
    interp_joint.allocate_tensors()

if __name__ == "__main__":
    use_mediapipe_model()
Question
Is the problem related to the version of Tensorflow that I'm using or am I doing something wrong when it comes to loading the .tflite models?
This doesn't work in TF 1.14.0; you need at least TF 1.15.2.
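A quick version check before loading the model (a sketch, assuming the answer above is correct that the Dequantize op needs TF 1.15.2 or newer):

import tensorflow as tf

# If this prints something older than 1.15.2, upgrade first, e.g.:
#   pip install --upgrade "tensorflow>=1.15.2"
print(tf.__version__)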

h5py.File function throws NameError for mpi4py

I am using h5py with mpi4py. I am opening an h5 file with h5py.File(fname, 'w', driver='mpio', comm=MPI.COMM_WORLD), but I get a NameError.
I checked the source code the error comes from: it needs h5py.h5.get_config().mpi to be True in order to import mpi4py, but it's set to False.
I have mpi4py installed and it works well.
The problems began when I updated numpy; I tried to go back to the previous version, but that didn't solve the problem. Before this update I had no problem with h5py.
The full error message is:
File "main.py", line 87, in <module>
memory = H5_memory(MEM_SIZE, STATE_SHAPE , fname)
File "/My/work/dir/memory.py", line 185, in __init__
self.f = h5py.File(fname, 'w', driver='mpio', comm=MPI.COMM_WORLD)
File "/home/miniconda/envs/lib/python3.5/site-packages/h5py/_hl/files.py", line 270, in __init__
fapl = make_fapl(driver, libver, **kwds)
File "/hom/miniconda/envs/lib/python3.5/site-packages/h5py/_hl/files.py", line 73, in make_fapl
kwds.setdefault('info', mpi4py.MPI.Info())
NameError: name 'mpi4py' is not defined
Do you have any idea on how to solve this problem? I didn't find any answer that could help me online.
Thank you
Looking at the installation documentation for h5py, MPI support requires h5py to be built against a parallel version of the HDF5 library, so you may have installed h5py without that option or with a misconfigured environment variable such as HDF5_MPI=ON.
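A quick diagnostic sketch; the pip invocation in the comment follows the h5py installation docs, but treat it as an assumption to verify against the docs for your h5py version:

import h5py

# This flag must be True for driver='mpio' to work; False means the
# installed h5py was built without MPI support.
print(h5py.get_config().mpi)

# If it prints False, reinstall h5py from source against a parallel HDF5:
#   HDF5_MPI="ON" CC=mpicc pip install --no-binary=h5py h5py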

RuntimeError: Attempting to deserialize object on a CUDA device

I encounter a RuntimeError while trying to run the code on my machine's CPU instead of the GPU. The code is originally from this GitHub project: IBD: Interpretable Basis Decomposition for Visual Explanation. This is for a research project. I tried setting the GPU flag to False and looked at other solutions on this website.
GPU = False  # running on GPU is highly suggested
CLEAN = False  # set to "True" if you want to clean the temporary large files after generating the result
APP = "classification"  # Do not change! Mode choice: "classification", "imagecap", "vqa". Currently "imagecap" and "vqa" are not supported.
CATAGORIES = ["object", "part"]  # Do not change! Concept categories chosen to detect: "object", "part", "scene", "material", "texture", "color"
CAM_THRESHOLD = 0.5  # the threshold used for CAM visualization
FONT_PATH = "components/font.ttc"  # font file path
FONT_SIZE = 26  # font size
SEG_RESOLUTION = 7  # the resolution of the CAM map
BASIS_NUM = 7  # in decomposition, this decides how many concepts are used to interpret the weight vector of a class
Here is the error:
Traceback (most recent call last):
File "test.py", line 22, in <module>
model = loadmodel()
File "/home/joshuayun/Desktop/IBD/loader/model_loader.py", line 48, in loadmodel
checkpoint = torch.load(settings.MODEL_FILE)
File "/home/joshuayun/.local/lib/python3.6/site-packages/torch/serialization.py", line 387, in load
return _load(f, map_location, pickle_module, **pickle_load_args)
File "/home/joshuayun/.local/lib/python3.6/site-packages/torch/serialization.py", line 574, in _load
result = unpickler.load()
File "/home/joshuayun/.local/lib/python3.6/site-packages/torch/serialization.py", line 537, in persistent_load
deserialized_objects[root_key] = restore_location(obj, location)
File "/home/joshuayun/.local/lib/python3.6/site-packages/torch/serialization.py", line 119, in default_restore_location
result = fn(storage, location)
File "/home/joshuayun/.local/lib/python3.6/site-packages/torch/serialization.py", line 95, in _cuda_deserialize
device = validate_cuda_device(location)
File "/home/joshuayun/.local/lib/python3.6/site-packages/torch/serialization.py", line 79, in validate_cuda_device
raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but
torch.cuda.is_available() is False. If you are running on a CPU-only machine,
please use torch.load with map_location='cpu' to map your storages to the CPU.
If you don't have a GPU, pass map_location=torch.device('cpu') to torch.load():
my_model = net.load_state_dict(torch.load('classifier.pt', map_location=torch.device('cpu')))
Just giving a smaller answer. To solve this, you could change the default parameters of the load() function in the serialization.py file, stored at ./site-packages/torch/serialization.py.
Write:
def load(f, map_location='cpu', pickle_module=pickle, **pickle_load_args):
instead of:
def load(f, map_location=None, pickle_module=pickle, **pickle_load_args):
Hope it helps.
"If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU."
model = torch.load('model/pytorch_resnet50.pth',map_location ='cpu')
I have tried adding "map_location='cpu'" in the load function, but it doesn't work for me.
If you use a model trained on a GPU on a CPU-only computer, you may meet this bug. You can try this solution:
import io
import pickle
import torch

class CPU_Unpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == 'torch.storage' and name == '_load_from_bytes':
            return lambda b: torch.load(io.BytesIO(b), map_location='cpu')
        else:
            return super().find_class(module, name)

contents = CPU_Unpickler(f).load()  # f: checkpoint file opened in binary mode
You can remap the Tensor location at load time using the map_location argument to torch.load.
In the repository above, in the file "test.py", model = loadmodel() calls the model_loader.py file to load the model with torch.load().
Note that this will only map storages from GPU 0. Add the map_location argument:
torch.load(settings.MODEL_FILE, map_location={'cuda:0': 'cpu'})
In the model_loader.py file, add map_location={'cuda:0': 'cpu'} wherever the torch.load() function is called.
As you state, the error hints that you are trying to use a CUDA model on a non-CUDA machine. Pay attention to the details of the error message: "please use torch.load with map_location='cpu' to map your storages to the CPU." I had a similar problem when I tried to load a pre-trained model from a checkpoint on my CPU-only machine. The model was trained on a CUDA machine, so it couldn't be loaded properly. Once I added the map_location='cpu' argument to the load method, everything worked.
I faced the same problem. Instead of modifying the existing code, which had been running fine the day before, I first checked whether my GPU was free by running
nvidia-smi
I could see that it was underutilized, so as a traditional solution I shut down the laptop and restarted it, and it started working.
(One thing I kept in mind: it had worked earlier and I hadn't changed anything in the code, so it should work again after a restart, and it did; I was able to use the GPU.)
For some reason this also happens with Portainer, even though your machines have GPUs. A crude solution is to just restart it. It usually happens if you fiddle with the state of the container after it has been deployed (e.g. you change the restart policies while the container is running), which makes me think it's some Portainer issue.
Nothing worked for me.
My pickle was a custom object, defined in a script file containing the line
device = torch.device("cuda")
Finally, I managed to take Spikes' solution above and adapt it to my needs with a simple open(path, "rb"), so for any other unfortunate developers:
import io
import pickle
import torch

class CPU_Unpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == 'torch.storage' and name == '_load_from_bytes':
            return lambda b: torch.load(io.BytesIO(b), map_location='cpu')
        else:
            return super().find_class(module, name)

contents = CPU_Unpickler(open(path, "rb")).load()
There is a much easier way. Just add map_location='cpu' to the call, as in torch.load(path, map_location='cpu'):
def load_checkpoint(path) -> 'LanguageModel':
    checkpoint = torch.load(path, map_location='cpu')
    model = LanguageModel(
        number_of_tokens=checkpoint['number_of_tokens'],
        max_sequence_length=checkpoint['max_sequence_length'],
        embedding_dimension=checkpoint['embedding_dimension'],
        number_of_layers=checkpoint['number_of_layers'],
        number_of_heads=checkpoint['number_of_heads'],
        feed_forward_dimension=checkpoint['feed_forward_dimension'],
        dropout_rate=checkpoint['dropout_rate']
    ).to(get_device())
    model.load_state_dict(checkpoint['model_state_dict'])
    return model.to(get_device())
