I am a beginner learning how to exploit GPU for parallel computation using python and cupy. I would like to implement my code to simulate some problems in physics and require to use complex number, but don't know how to manage it. Although there are examples in Cupy's official document, it only mentions about include complex.cuh library and how to declare a complex variable. I can't find any example about how to assign a complex number correctly, as well ass how to call the function in the complex.cuh library to do calculation.
I am stuck in line 11 of this code. I want to make a complex number value equal x[tIdx]+j*y[t_Idx], j is the imaginary number. I tried several ways and no one works, so I left this one here.
import cupy as cp
import time
add_kernel = cp.RawKernel(r'''
#include <cupy/complex.cuh>
extern "C" __global__
void test(double* x, double* y, complex<float>* z){
int tId_x = blockDim.x*blockIdx.x + threadIdx.x;
int tId_y = blockDim.y*blockIdx.y + threadIdx.y;
complex<float>* value = complex(x[tId_x],y[tId_y]);
z[tId_x*blockDim.y*gridDim.y+tId_y] = value;
}''',"test")
x = cp.random.rand(1,8,4096,dtype = cp.float32)
y = cp.random.rand(1,8,4096,dtype = cp.float32)
z = cp.zeros((4096,4096), dtype = cp.complex64)
t1 = time.time()
add_kernel((128,128),(32,32),(x,y,z))
print(time.time()-t1)
What is the proper way to assign a complex number in the RawKernel?
Thank you for answering this question!
#plaeonix, thank you very much for your hint. I find out the answer.
This line:
complex<float>* value = complex(x[tId_x],y[tId_y])
should be replaced to:
complex<float> value = complex<float>(x[tId_x],y[tId_y])
Then the assignment of a complex number works.
Related
I'm creating a PyTorch C++ extension and after much research I can't figure out how to index a tensor and update its values. I found out how to iterate over a tensor's entries using the data_ptr() method, but that's not applicable to my use case.
Given is a matrix M, a list of lists (blocks) of index pairs P and a function f: dtype(M)^2 -> dtype(M)^2 that takes two values and spits out two new values.
I'm trying to implement the following pseudo code:
for each block B in P:
for each row R in M:
for each index-pair (i,j) in B:
M[R,i], M[R,j] = f(M[R,i], M[R,j])
After all, this code is going to run on the GPU using CUDA, but since I don't have any experience with that, I wanted to first write a pure C++ program and then convert it.
Can anyone suggest how to do this or how to convert the algorithm to do something equivalent?
What I wanted to do can be done using the
tensor.accessor<scalar_dtype, num_dimensions>()
method. If executing on the GPU instead use scalars.packed_accessor64<scalar_dtype, num_dimensions, torch::RestrictPtrTraits>()
or
scalars.packed_accessor32<scalar_dtype, num_dimensions, torch::RestrictPtrTraits>() (depending on the size of your tensor).
auto num_rows = scalars.size(0);
matrix = torch::rand({10, 8});
auto a = matrix.accessor<float, 2>();
for (auto i = 0; i < num_rows; ++i) {
auto x = a[i][some_index];
auto new_x = some_function(x);
a[i][some_index] = new_x;
}
I was trying to solve certain set of constraints using z3 in python. My code:
import math
from z3 import *
### declaration
n_co2 = []
c_co2 = []
alpha = []
beta = []
m_dot_air = []
n_pir = []
pir_sensor = []
for i in range(2):
c_co2.append(Real('c_co2_'+str(i)))
n_pir.append(Real('n_pir_'+str(i)))
n_co2.append(Real('n_co2_'+str(0)))
alpha.append(Real('alpha_'+str(0)))
beta.append(Real('beta_'+str(0)))
m_dot_air.append(Real('m_dot_air_'+str(0)))
pir_sensor.append(Real('pir_sensor_'+str(0)))
s = Solver()
s.add(n_co2[0]>0)
s.add(c_co2[0]>0)
s.add(c_co2[1]>=0.95*c_co2[0])
s.add(c_co2[1]<=1.05*c_co2[0])
s.add(n_co2[0]>=0.95*n_pir[1])
s.add(n_co2[0]<=1.05*n_pir[1])
s.add(c_co2[1]>0)
s.add(alpha[0]<=-1)
s.add(beta[0]>0)
s.add(m_dot_air[0]>0)
s.add(alpha[0]==-1*(1+ m_dot_air[0] + (m_dot_air[0]**2)/2.0 + (m_dot_air[0]**3)/6.0 ))
s.add(beta[0]== (1-alpha[0])/m_dot_air[0])
s.add(n_co2[0]== (c_co2[1]-alpha[0]*c_co2[0])/(beta[0]*19.6)-(m_dot_air[0]*339)/19.6)
s.add(n_pir[1]>=0)
s.add(pir_sensor[0]>=-1)
s.add(pir_sensor[0]<=1)
s.add(Not(pir_sensor[0]==0))
s.add(n_pir[1]==(n_pir[0]+pir_sensor[0]))
#### testing
s.add(pir_sensor[0]==1)
s.add(n_pir[1]==1)
s.add(n_co2[0]==1)
print(s.check())
print(s.reason_unknown())
print(s.model())
The output of the code:
sat
[c_co2_0 = 355,
c_co2_1 = 1841/5,
m_dot_air_0 = 1,
n_co2_0 = 1,
n_pir_1 = 1,
pir_sensor_0 = 1,
n_pir_0 = 0,
beta_0 = 11/3,
alpha_0 = -8/3,
/0 = [(19723/15, 1078/15) -> 1793/98,
(11/3, 1) -> 11/3,
else -> 0]]
What is the significance "/0 = ..." part of the output model.
But when I change the type of n_pir from Real to Int, z3 cannot solve it. Although we saw that we have an Int solution for n_pir. Reason of unknown:
smt tactic failed to show goal to be sat/unsat (incomplete (theory arithmetic))
How this problem can be solved? Could anyone please provide reasoning about this problem?
For the "/0" part: It's an internally generated constraint from converting real to int solutions. You can totally ignore that. In fact, you shouldn't really even look at the value of that, it's an artifact of the z3py bindings and should probably be hidden from the user.
For your question regarding why you cannot make 'Real' to 'Int'. That's because you have a non-linear set of equations (where you multiply or divide two variables), and non-linear integer arithmetic is undecidable in general. (Whereas non-linear real arithmetic is decidable.) So, when you use 'Int', solver simply uses some heuristics, and in this case fails and says unknown. This is totally expected. Read this answer for more details: How does Z3 handle non-linear integer arithmetic?
Z3 does come with an NRA solver, you can give that a try. Declare your solver as:
s = SolverFor("NRA")
But again you're at the mercy of the heuristics and you may or may not get a solution. Also, watch out for z3py bindings coercing constants to when you mix and match arithmetic like that. A good way is to write:
print s.sexpr()
before you call s.check() and take a look at the output and convince yourself that the translation has been done correctly. For details on that, see this question: Python and Z3: integers and floating, how to manage them in the correct way?
I am little confused by R syntax formula
I created the following python function with Rpy2:
objects.r('''
project_var <- function(grid,points) {
coordinates(points) = ~X + Y
gridded(grid) = ~X+Y
grid = idw(Z~1, points,grid)
grid <- as.data.frame(grid)
return(grid)
}
''')
Then I import it
project_var = robjects.globalenv['project_var']
Then I call it:
test = project_var(model,points_top)
And it works as expected!
I would like to'Z' to be set by an argument of my function, something like this:
project_var <- function(grid,points,feature_name) {
...
grid = idw(feature_name~1, points,grid)
My Problem :
idw(feature_name~1, points,grid)
I do not really understand this line and what is really feature name (because it is not a string nor known variable at this point, but the name of a column as a formula).
for info idw comes from gstat library... and I do not know R...
here is the doc:
idw.locations(formula, locations, data, newdata, nmax = Inf, nmin = 0,
omax = 0, maxdist = Inf, block, na.action = na.pass, idp = 2.0,
debug.level = 1)
https://cran.r-project.org/web/packages/gstat/gstat.pdf
So what should I put for feature_name in the python side ? or how to build it in R so it would transform the string feature_name into something that would work ?
Any help would be appreciate.
Thank you for reading so far.
I do not really understand this line and what is really feature name (because it is not a string nor known variable at this point, but the name of a column).
R differs from Python as expressions in a function call (here idw(Z~1, points,grid)) will only be evaluated within the function, and the unevaluated expression itself is available to the code in the body of the function.
In addition to that, Z~1 is itself a special thing: it is an R formula. You could write fml <- Z ~ 1 in R and the object fml will be a "formula". The constructor for the formula is somewhat hidden as <something> ~ <something> is considered a language construct in R, but in fact you have something like build_formula(<left_side_expression>, <right_side_expression>). You can try in R fml <- get("~")(Z, 1) and see that this is exactly that happening.
okay, just need to use as.formula to convert a string to a formula :-)
idw(as.formula(feature_name), points,grid)
I'm starting with numba and my first goal is to try and accelerate a not so complicated function with a nested loop.
Given the following class:
class TestA:
def __init__(self, a, b):
self.a = a
self.b = b
def get_mult(self):
return self.a * self.b
and a numpy ndarray that contains class TestA objects. Dimension (N,) where N is usually ~3 million in length.
Now given the following function:
def test_no_jit(custom_class_obj_container):
container_length = len(custom_class_obj_container)
sum = 0
for i in range(container_length):
for j in range(i + 1, container_length):
obj_i = custom_class_obj_container[i]
obj_j = custom_class_obj_container[j]
sum += (obj_i.get_mult() + obj_j.get_mult())
return sum
I've tried to play around numba to get it to work with the function above however I cannot seem to get it to work with nopython=True flag, and if it's set to false, then the runtime is higher than the no-jit function.
Here is my latest try in trying to jit the function (also using nb.prange):
#nb.jit(nopython=False, parallel=True)
def test_jit(custom_class_obj_container):
container_length = len(custom_class_obj_container)
sum = 0
for i in nb.prange(container_length):
for j in nb.prange(i + 1, container_length):
obj_i = custom_class_obj_container[i]
obj_j = custom_class_obj_container[j]
sum += (obj_i.get_mult() + obj_j.get_mult())
return sum
I've tried to search around but I cannot seem to find a tutorial of how to define a custom class in the signature, and how would I go in order to accelerate a function of that sort and get it to run on GPU and possibly (any info regarding that matter would be highly appreciated) to get it to run with cuda libraries - which are installed and ready to use (previously used with tensorflow)
The numba docs give an example of creating a custom type, even for nopython mode: https://numba.pydata.org/numba-doc/latest/extending/interval-example.html
In your case though, unless this is a really slimmed down version of what you actually want to do, it seems like the easiest approach would be to re-use existing types. Additionally, the construction of a 3M length object array is going to be slow, and produce fragmented memory (as the objects are not being stored in contiguous blocks).
An example of how using record arrays might be used to solve the problem:
x_dt = np.dtype([('a', np.float64),
('b', np.float64)])
n = 30000
buf = np.arange(n*2).reshape((n, 2)).astype(np.float64)
vec3 = np.recarray(n, dtype=x_dt, buf=buf)
#numba.njit
def mult(a):
return a.a * a.b
#numba.jit(nopython=True, parallel=True)
def sum_of_prod(vector):
sum = 0
vector_len = len(vector)
for i in numba.prange(vector_len):
for j in numba.prange(i + 1, vector_len):
sum += mult(vector[i]) + mult(vector[j])
return sum
sum_of_prod(vec3)
FWIW, I'm no numba expert. I found this question when searching for how to implement a custom type in numba for non-numerical stuff. In your case, because this is highly numerical, I think a custom type is probably overkill.
I am rather new to python programming so please be a big simple with your answer.
I have a .raw file which is 2b/2b complex short int format. Its actually a 2-D raster file. I want to read and seperate both real and complex parts. Lets say the raster is [MxN] size.
Please let me know if question is not clear.
Cheers
N
You could do it with the struct module. Here's a simple example based on the file formatting information you mentioned in a comment:
import struct
def read_complex_array(filename, M, N):
row_fmt = '={}h'.format(N) # "=" prefix means integers in native byte-order
row_len = struct.calcsize(row_fmt)
result = []
with open(filename, "rb" ) as input:
for col in xrange(M):
reals = struct.unpack(row_fmt, input.read(row_len))
imags = struct.unpack(row_fmt, input.read(row_len))
cmplx = [complex(r,i) for r,i in zip(reals, imags)]
result.append(cmplx)
return result
This will return a list of complex-number lists, as can be seen in this output from a trivial test I ran:
[
[ 0.0+ 1.0j 1.0+ 2.0j 2.0+ 3.0j 3.0+ 4.0j],
[256.0+257.0j 257.0+258.0j 258.0+259.0j 259.0+260.0j],
[512.0+513.0j 513.0+514.0j 514.0+515.0j 515.0+516.0j]
]
Both the real and imaginary parts of complex numbers in Python are usually represented as a pair of machine-level double precision floating point numbers.
You could also use the array module. Here's the same thing using it:
import array
def read_complex_array2(filename, M, N):
result = []
with open(filename, "rb" ) as input:
for col in xrange(M):
reals = array.array('h')
reals.fromfile(input, N)
# reals.byteswap() # if necessary
imags = array.array('h')
imags.fromfile(input, N)
# imags.byteswap() # if necessary
cmplx = [complex(r,i) for r,i in zip(reals, imags)]
result.append(cmplx)
return result
As you can see, they're very similar, so it's not clear there's a big advantage to using one over the other. I suspect the array based version might be faster, but that would have to be determined by actually timing it with some real data to be able to say with any certainty.
Take a look at Hachoir library. It's designed for this purposes, and does it's work really good.