I am having trouble converting some MATLAB code into Python. I am trying to build a signal by adding shifted copies of a base signal into a much longer one. The code that works in MATLAB is:
function [time, signal] = generateRandomSignal(pulse,data,samples,Tb)
N = length(data);
time = linspace(0,N*Tb,samples*N);
signal = zeros(1,length(time));
k = 1;
for n = 1:N
    window = k:k+samples-1;
    signal(window) = signal(window) + data(n)*pulse;
    k = k + samples;
end
In Python, slicing the larger array with the loop variable wasn't working, so I changed that. I now have what I think should work, but I keep getting errors about inconsistent array sizes, even though the sizes look right when I inspect them in a debugger.
from numpy import *

def generateRandomSignal(pulse, data, samples, Tb):
    N = data.size
    time = linspace(0, N*Tb, samples*N)
    signal = zeros((1, time.size))
    k = 0
    for n in range(0, N):
        signal[k:k+samples] = signal[k:k+samples].copy() + data[n]*pulse[:].copy()
        k = k + samples
    return time, signal
What is the correct way to do this in Python?
EDIT: Minimal expected input and output
Input
data = [1, -1, 0, 1, 1]
pulse = [1, 1, 1]
samples = 3  # length of pulse
Tb = 0.1
Output
signal = [1, 1, 1, -1, -1, -1, 0, 0, 0, 1, 1, 1, 1, 1, 1]
time = vector of 15 points evenly spaced from 0 to 0.3. (Not the problem)
EDIT 2: Error
ValueError: operands could not be broadcast together with shapes (1920,) (1,4410)
That is the actual error produced. (1,4410) is the correct shape for the pulse array, but I have no idea where the 1920 is coming from, or what the trailing comma in (1920,) means.
Change your definition of signal to signal = zeros(time.size). Unlike Matlab, NumPy's 1D arrays have shape (N,), not (N,1).
Alternatively, keep signal two-dimensional and index its first axis explicitly — zeros((1, time.size)) creates an array with a single row:
signal[0, k:k+samples] = signal[0, k:k+samples].copy() + data[n]*pulse[:].copy()
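Putting both fixes together, here is a minimal sketch (using the example input from the edit above) that produces the expected output; keeping signal one-dimensional is the simplest option:

import numpy as np

def generateRandomSignal(pulse, data, samples, Tb):
    N = data.size
    time = np.linspace(0, N * Tb, samples * N)
    signal = np.zeros(time.size)  # 1-D, shape (samples*N,)
    for n in range(N):
        k = n * samples
        signal[k:k + samples] += data[n] * pulse
    return time, signal

time, signal = generateRandomSignal(np.array([1, 1, 1]),         # pulse
                                    np.array([1, -1, 0, 1, 1]),  # data
                                    3, 0.1)
# signal -> [ 1.  1.  1. -1. -1. -1.  0.  0.  0.  1.  1.  1.  1.  1.  1.]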
I want to generate a sparse numpy ndarray from a vector of row indices, a vector of column indices, and a vector of values, one entry per nonzero element.
For example, if I have
row_index=np.array([0,1,2])
column_index=np.array([2,1,0])
value=np.array([4,5,6])
Then I want a matrix
[0,0,4
0,5,0
6,0,0]
Is there a function in numpy that can do something similar to what scipy.sparse does with scipy.sparse.csc_matrix((data, (row_ind, col_ind)), [shape=(M, N)])? If not, is there a way to generate the matrix without for loops? I want to speed up the code, but scipy.sparse is quite slow during the calculation, and the matrix I want is not so large.
If the matrix you want is not very large, it might be faster to just create a regular (non-sparse) ndarray. For example, you can use the following code to generate a dense matrix using only numpy:
row_index = np.array([0, 1, 2])
column_index = np.array([2, 1, 0])
values = np.array([4, 5, 6])
# numpy dense
M = np.zeros((np.max(row_index) + 1, np.max(column_index) + 1))
M[row_index, column_index] = values
On my machine, creating the matrix (the last two lines) takes approximately 6.3 μs to run. I compared it to the following code, which uses scipy.sparse:
# scipy sparse
M = scipy.sparse.csc_matrix((values, (row_index, column_index)),
shape=(np.max(row_index) + 1, np.max(column_index) + 1))
This takes approximately 80 μs to run. Because you asked for a method to create a sparse array, I changed the first implementation to the following code, so that the created ndarray is converted into a sparse array:
# numpy sparse
M = np.zeros((np.max(row_index) + 1, np.max(column_index) + 1))
M[row_index, column_index] = values
M = scipy.sparse.csc_matrix(M)
This takes approximately 82 μs to run. The bottleneck in this code is clearly the operation of creating a sparse matrix.
Note that the scipy.sparse method scales very well as a function of matrix size, and eventually becomes the fastest for larger matrices (on my machine, starting from approximately 360×360). See the figure below for an indication of the speed of each method as a function of matrix size, from a 10×10 matrix up to a 1000×1000 matrix. Some outliers in the figure are most likely due to other programs on my machine interfering. Furthermore, I am not sure of the technical details behind the 'jumps' in the numpy dense method at ~360×360 and ~510×510. I have also added the code I used to run this comparison so that you can run it on your own machine.
import timeit
import matplotlib.pyplot as plt
import numpy as np
import scipy.sparse

def generate_indices(num_values):
    row_index = np.arange(num_values)
    column_index = np.arange(num_values)[::-1]
    values = np.arange(num_values)
    return row_index, column_index, values

def numpy_dense(N, row_index, column_index, values):
    start = timeit.default_timer()
    for _ in range(N):
        M = np.zeros((np.max(row_index) + 1, np.max(column_index) + 1))
        M[row_index, column_index] = values
    end = timeit.default_timer()
    return (end - start) / N

def numpy_sparse(N, row_index, column_index, values):
    start = timeit.default_timer()
    for _ in range(N):
        M = np.zeros((np.max(row_index) + 1, np.max(column_index) + 1))
        M[row_index, column_index] = values
        M = scipy.sparse.csc_matrix(M)
    end = timeit.default_timer()
    return (end - start) / N

def scipy_sparse(N, row_index, column_index, values):
    start = timeit.default_timer()
    for _ in range(N):
        M = scipy.sparse.csc_matrix((values, (row_index, column_index)),
                                    shape=(np.max(row_index) + 1, np.max(column_index) + 1))
    end = timeit.default_timer()
    return (end - start) / N

ns = np.arange(10, 1001, 10)  # matrix sizes to try
runtimes_numpy_dense, runtimes_numpy_sparse, runtimes_scipy_sparse = [], [], []
for n in ns:
    print(n)
    indices = generate_indices(n)
    # number of iterations for timing
    # ideally, you want this to be as high as possible,
    # but I didn't want to wait very long for this plot
    N = 1000 if n < 500 else 100
    runtimes_numpy_dense.append(numpy_dense(N, *indices))
    runtimes_numpy_sparse.append(numpy_sparse(N, *indices))
    runtimes_scipy_sparse.append(scipy_sparse(N, *indices))

fig, ax = plt.subplots()
ax.plot(ns, runtimes_numpy_dense, 'x-', markersize=4, label='numpy dense')
ax.plot(ns, runtimes_numpy_sparse, 'x-', markersize=4, label='numpy sparse')
ax.plot(ns, runtimes_scipy_sparse, 'x-', markersize=4, label='scipy sparse')
ax.set_yscale('log')
ax.set_xlabel('Matrix size')
ax.set_ylabel('Runtime (s)')
ax.legend()
plt.show()
You can create your (sparse) array in coordinate format, where you pass:
values to be put at specified coordinates,
row coordinates,
column coordinates.
The code to do it can be:
import scipy.sparse as ss
arr = ss.coo_matrix((value, (row_index, column_index)))
Note that row_index and column_index in your example are already 0-based, so they can be passed to coo_matrix as they are; the (3, 3) shape is inferred from the maximum indices.
When you print arr.toarray(), you will get just what you wanted:
array([[0, 0, 4],
[0, 5, 0],
[6, 0, 0]])
I have written the following code to convert a matrix into a stochastic and irreducible matrix. I followed a paper (Deeper Inside PageRank) to write this code. The code works well for square matrices but gives an error for rectangular matrices. How can I modify it to convert rectangular matrices into stochastic and irreducible matrices?
My Code:
import numpy as np
P = np.array([[0, 1/2, 1/2, 0, 0, 0], [0, 0, 0, 0, 0, 0], [1/3, 1/3, 0, 0, 1/3, 0], [0, 0, 0, 0, 1/2, 1/2], [0, 0, 0, 1/2, 0, 1/2]])
#P is the original matrix containing 0 rows
col_len = len(P[0])
row_len = len(P)
eT = np.ones(shape=(1, col_len)) # Row vector of ones to replace row of zeros
e = eT.transpose() # it is a column vector e
eT_n = np.array(eT / col_len) # obtained by dividing row vector of ones by order of matrix
Rsum = 0
for i in range(row_len):
    for j in range(col_len):
        Rsum = Rsum + P[i][j]
    if Rsum == 0:
        P[i] = eT_n
    Rsum = 0
P_bar = P.astype(float)  # P_bar is the stochastic matrix obtained by replacing each zero row of P with eT_n
alpha = 0.85
P_dbar = alpha * P_bar + (1 - alpha) * e * (eT_n) #P_dbar is the irreducible matrix
print("The stocastic and irreducible matrix P_dbar is:\n", P_dbar)
Expected output:
A rectangular stochastic and irreducible matrix.
Actual output:
Traceback (most recent call last):
File "C:/Users/admin/PycharmProjects/Recommender/StochasticMatrix_11Aug19_BSK_v3.py", line 13, in <module>
P_dbar = alpha * P_bar + (1 - alpha) * e * (eT_n) #P_dbar is the irreducible matrix
ValueError: operands could not be broadcast together with shapes (5,6) (6,6)
You are trying to multiply two arrays of different shapes. That will not work, since one array has 30 elements, and the other has 36 elements.
You have to make sure the array e * eT_n has the same shape as your input array P.
You are not using the row_len value. But if e has the correct number of rows, your code will run.
# e = eT.transpose() # this will only work when the input array is square
e = np.ones(shape=(row_len, 1)) # this also works with a rectangular P
You can check that the shape is correct:
(e * eT_n).shape == P.shape
You should study the numpy documentation and tutorials to learn how to use the ndarray data structure. It's very powerful, but also quite different from the native python data types.
For example, you can replace this verbose and very slow nested Python loop with a vectorized array operation.
Original code (with fixed indentation):
for i in range(row_len):
    Rsum = 0
    for j in range(col_len):
        Rsum = Rsum + P[i][j]
    if Rsum == 0:
        P[i] = eT_n
Idiomatic numpy code:
P[P.sum(axis=1) == 0] = eT_n
Furthermore, you don't need to create the array eT_n. Since it's just a single value repeated, you can assign the scalar 1/6 directly instead.
# eT = np.ones(shape=(1, col_len))
# eT_n = np.array(eT / col_len)
P[P.sum(axis=1) == 0] = 1 / P.shape[1]
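Putting the pieces together, here is a sketch of the complete script that also works for a rectangular P (based on the fixes above, not taken verbatim from the paper):

import numpy as np

P = np.array([[0, 1/2, 1/2, 0, 0, 0],
              [0, 0, 0, 0, 0, 0],
              [1/3, 1/3, 0, 0, 1/3, 0],
              [0, 0, 0, 0, 1/2, 1/2],
              [0, 0, 0, 1/2, 0, 1/2]], dtype=float)

row_len, col_len = P.shape

# stochastic: replace every all-zero row with a uniform distribution
P_bar = P.copy()
P_bar[P_bar.sum(axis=1) == 0] = 1 / col_len

# irreducible: rank-one perturbation, with e sized by the number of rows
alpha = 0.85
e = np.ones((row_len, 1))
eT_n = np.ones((1, col_len)) / col_len
P_dbar = alpha * P_bar + (1 - alpha) * e * eT_n

print("The stochastic and irreducible matrix P_dbar is:\n", P_dbar)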
For the code below, I am wondering how to make a circular kernel instead of a rectangular one. I am currently looking at something circular, and I want to find the average BGR values inside it. By adjusting my kernel, my data will be more accurate.
for center in c_1:
    b = img2[center[0]-4: center[0]+5, center[1]-4: center[1]+5, 0]
    g = img2[center[0]-4: center[0]+5, center[1]-4: center[1]+5, 1]
    r = img2[center[0]-4: center[0]+5, center[1]-4: center[1]+5, 2]
From: https://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_morphological_ops/py_morphological_ops.html
We manually created a structuring elements in the previous examples with help of Numpy. It is rectangular shape. But in some cases, you may need elliptical/circular shaped kernels. So for this purpose, OpenCV has a function, cv2.getStructuringElement(). You just pass the shape and size of the kernel, you get the desired kernel.
# Elliptical Kernel
>>> cv2.getStructuringElement(cv2.MORPH_ELLIPSE,(5,5))
array([[0, 0, 1, 0, 0],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[1, 1, 1, 1, 1],
[0, 0, 1, 0, 0]], dtype=uint8)
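One way to use this for the averaging problem in the question (a sketch, assuming img2 and c_1 as defined there) is to treat the elliptical kernel as a boolean mask over each 9×9 patch:

import cv2
import numpy as np

# boolean circular mask matching the 9x9 window used in the question
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9)).astype(bool)

for center in c_1:
    patch = img2[center[0]-4: center[0]+5, center[1]-4: center[1]+5]
    b_avg, g_avg, r_avg = patch[kernel].mean(axis=0)  # average B, G, R inside the circle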
To get the BGR average over the circular region around a given center, you could try a function along these lines (img is assumed to be the BGR image, e.g. img2 from the question):

def circleAverage(img, center, r=4):
    """Average the B, G, R values of all pixels within radius r of center."""
    b_sum = g_sum = r_sum = 0.0
    count = 0
    for i in range(center[0] - r, center[0] + r + 1):
        for j in range(center[1] - r, center[1] + r + 1):
            if (center[0] - i) ** 2 + (center[1] - j) ** 2 <= r ** 2:
                b_sum += img[i, j, 0]
                g_sum += img[i, j, 1]
                r_sum += img[i, j, 2]
                count += 1
    return b_sum / count, g_sum / count, r_sum / count
Hope this helps you.
Came here to find how to make a circular (symmetric) kernel. Ended up with my own implementation.
import numpy as np

def get_circular_kernel(diameter):
    mid = (diameter - 1) / 2
    distances = np.indices((diameter, diameter)) - np.array([mid, mid])[:, None, None]
    kernel = ((np.linalg.norm(distances, axis=0) - mid) <= 0).astype(int)
    return kernel
Note that for small diameters the behaviour may be unexpected; for example, the second use of mid (in the comparison) can be replaced by diameter / 2.
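As a quick sanity check, get_circular_kernel(5) should produce a 5×5 disk:

>>> get_circular_kernel(5)
array([[0, 0, 1, 0, 0],
       [0, 1, 1, 1, 0],
       [1, 1, 1, 1, 1],
       [0, 1, 1, 1, 0],
       [0, 0, 1, 0, 0]])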
I've implemented it in the following way:
r = 16
kernel = np.fromfunction(lambda x, y: ((x-r)**2 + (y-r)**2 <= r**2)*1, (2*r+1, 2*r+1), dtype=int).astype(np.uint8)
The extra type conversion is needed to avoid overflow.
Let's say I have a 2D array of (N, N) shape:
import numpy as np
my_array = np.random.random((N, N))
Now I want to do some computations only on some "cells" of this array, for instance the ones inside the central part of the array. To avoid doing computations on cells I'm not interested in, what I usually do here is create a Boolean mask, in this spirit:
my_mask = np.zeros_like(my_array, bool)
my_mask[40:61,40:61] = True
my_array[my_mask] = some_twisted_computations(my_array[my_mask])
But what if some_twisted_computations() involves values of the neighboring cells if they are inside the mask? Performance-wise, would it be a good idea to create an "adjacency array" with a (len(my_mask), 4) shape, storing the index of 4-connected neighbor cells in the flat my_array[mask] array that I will use in some_twisted_computations()? If yes, what are the efficient options for computing such adjacency array? Should I switch to lower-level langage/other data structures?
My real-world arrays' shapes are around (1000, 1000, 1000), the mask concerns only a small subset (~100000) of these values and has a rather complex geometry. I hope my questions make sense...
EDIT: the very dirty and slow solution I've worked out:
wall = mask
i = 0
top_neighbors = []
down_neighbors = []
left_neighbors = []
right_neighbors = []
indices = []
for index, val in np.ndenumerate(wall):
    if not val:
        continue
    indices += [index]
    if wall[index[0] + 1, index[1]]:
        down_neighbors += [(index[0] + 1, index[1])]
    else:
        down_neighbors += [i]
    if wall[index[0] - 1, index[1]]:
        top_neighbors += [(index[0] - 1, index[1])]
    else:
        top_neighbors += [i]
    if wall[index[0], index[1] - 1]:
        left_neighbors += [(index[0], index[1] - 1)]
    else:
        left_neighbors += [i]
    if wall[index[0], index[1] + 1]:
        right_neighbors += [(index[0], index[1] + 1)]
    else:
        right_neighbors += [i]
    i += 1

top_neighbors = [i if type(i) is int else indices.index(i) for i in top_neighbors]
down_neighbors = [i if type(i) is int else indices.index(i) for i in down_neighbors]
left_neighbors = [i if type(i) is int else indices.index(i) for i in left_neighbors]
right_neighbors = [i if type(i) is int else indices.index(i) for i in right_neighbors]
The best answer will probably depend on the nature of the computations you want to do. For example, if they can be expressed as summations over neighboring pixels, then something like np.convolve or scipy.signal.fftconvolve can be a really nice solution.
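As an illustration of that first idea (a sketch, not tied to any particular computation), summing the four direct neighbours of every cell is just a convolution with a cross-shaped kernel:

import numpy as np
from scipy.signal import fftconvolve

x = np.random.rand(100, 100)
# cross-shaped kernel: for each cell, sum its 4-connected neighbours
kernel = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])
neighbor_sums = fftconvolve(x, kernel, mode='same')  # same shape as x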
For your specific question of efficiently generating arrays of neighbor indices, you might try something like this:
x = np.random.rand(100, 100)
mask = x > 0.9
i, j = np.where(mask)
i_neighbors = i[:, np.newaxis] + [0, 0, -1, 1]
j_neighbors = j[:, np.newaxis] + [-1, 1, 0, 0]
# need to do something with the edge cases
# the best choice will depend on your application
# here we'll change out-of-bounds neighbors to the
# central point itself.
i_neighbors = np.clip(i_neighbors, 0, 99)
j_neighbors = np.clip(j_neighbors, 0, 99)
# compute some vectorized result over the neighbors
# as a concrete example, here we'll do a standard deviation
result = x[i_neighbors, j_neighbors].std(axis=1)
The result is an array of values corresponding to the masked region, containing the standard deviation of neighboring values.
Hopefully that approach will work for whatever specific problem you have in mind!
Edit: given the edited question above, here's how my response can be adapted to generate arrays of indices in a vectorized manner:
x = np.random.rand(100, 100)
mask = x > -0.9
i, j = np.where(mask)
i_neighbors = i[:, np.newaxis] + [0, 0, -1, 1]
j_neighbors = j[:, np.newaxis] + [-1, 1, 0, 0]
i_neighbors = np.clip(i_neighbors, 0, 99)
j_neighbors = np.clip(j_neighbors, 0, 99)
indices = np.zeros(x.shape, dtype=int)
indices[mask] = np.arange(len(i))
neighbor_in_mask = mask[i_neighbors, j_neighbors]
neighbors = np.where(neighbor_in_mask,
indices[i_neighbors, j_neighbors],
np.arange(len(i))[:, None])
left_indices, right_indices, top_indices, bottom_indices = neighbors.T
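With these index arrays, the neighbour values of every masked cell can then be looked up directly in the flat masked array; for instance (a usage sketch):

masked_values = x[mask]                    # flat array of the masked cells
left_values = masked_values[left_indices]  # left neighbour of each masked cell,
                                           # or the cell itself if that neighbour
                                           # is outside the mask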
I'm trying to get this code to run as fast as possible and at the moment is very inefficient.
I have a 4D matrix of scalar data. The 4 dimensions correspond to latitude, longitude, altitude and time. The data is stored in a numpy array and its shape is (5,5,30,2).
In 4 different lists I am keeping the "map" for each axis, storing what value corresponds to each index. For example, the map arrays could look like:
mapLatitude = [45.,45.2,45.4,45.6,45.8]
mapLongitude = [-10.8,-10.6,-10.4,-10.2,-10.]
mapAltitude = [0,50,100,150,...,1450]
mapTime = [1345673,1345674]
This means that in the data matrix, the data point at location 0,1,3,0 corresponds to
Lat = 45, Lon = -10.6, Alt = 150, Time = 1345673.
Now, I need to generate a new array containing the coordinates of each point in my data matrix.
So far, this is what I've written:
import numpy as np
# data = np.array([<all data>])
coordinateMatrix = [(mapLatitude[index[0]],
                     mapLongitude[index[1]],
                     mapAltitude[index[2]],
                     mapTime[index[3]]) for index in np.ndindex(data.shape)]
This works, but takes quite a long time, especially when the data matrix increases in size (I need to use this with matrices with a shape like (100,100,150,30)).
If it helps, I need to generate this coordinateMatrix to feed it to scipy.interpolate.NearestNDInterpolator.
Any suggestions on how to speed this up?
Thank you very much!
If you turn your lists into ndarrays, you can use broadcasting as follows:
coords = np.zeros((5, 5, 30, 2, 4))
coords[..., 0] = np.array(mapLatitude).reshape(5, 1, 1, 1)
coords[..., 1] = np.array(mapLongitude).reshape(1, 5, 1, 1)
coords[..., 2] = np.array(mapAltitude).reshape(1, 1, 30, 1)
coords[..., 3] = np.array(mapTime).reshape(1, 1, 1, 2)
For more general inputs something like this should work:
def makeCoordinateMatrix(*coords):
    dims = len(coords)
    coords = [np.array(a) for a in coords]
    shapes = tuple([len(a) for a in coords])
    ret = np.zeros(shapes + (dims,))
    for j, a in enumerate(coords):
        ret[..., j] = a.reshape((len(a),) + (1,) * (dims - j - 1))
    return ret
coordinateMatrix = makeCoordinateMatrix(mapLatitude, mapLongitude,
mapAltitude, mapTime)
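Since the goal is to feed this to scipy.interpolate.NearestNDInterpolator, a possible follow-up (a sketch, assuming data holds the (5, 5, 30, 2) values) is to flatten the coordinate grid into one row of (lat, lon, alt, time) per data point:

from scipy.interpolate import NearestNDInterpolator

points = coordinateMatrix.reshape(-1, 4)   # same C order as data.ravel()
interpolator = NearestNDInterpolator(points, data.ravel())
value = interpolator([45.1, -10.5, 75, 1345673])  # nearest-neighbour lookup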