I am new to Python, so forgive me ahead of time if this is an elementary question, but I have searched around and have not found a satisfying answer.
I am trying to do the following using NumPy and SciPy:
I,J = x[:,0], x[:,1] # x is a two column array of (r,c) pairs
V = ones(len(I))
G = sparse.coo_matrix((V,(I,J))) # G's dimensions are 1032570x1032570
G = G + transpose(G)
r,c = G.nonzero()
G[r,c] = 1
...
NotImplementedError: Fancy indexing in assignment not supported for csr matrices
Pretty much, I want all the nonzero values to equal 1 after adding the transpose, but I get the fancy indexing error messages.
Alternatively, if I could show that the matrix G is symmetric, adding the transpose would not be necessary.
Any insight into either approach would be very much appreciated.
In addition to doing something like G = G / G, you can operate on G.data.
So, in your case, doing either:
G.data = np.ones(G.nnz)
or
G.data[G.data != 0] = 1
will do what you want. This is more flexible, as it allows you to perform other types of filtering (e.g. G.data[G.data > 0.9] = 1 or G.data = np.random.random(G.nnz)).
The second option will only set the values to one if they have a nonzero value. During some calculations, you'll wind up with zero values that are "dense" (i.e. they're actually stored as a value in the sparse array). (You can remove these in-place with G.eliminate_zeros())
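For illustration, here is a minimal, self-contained sketch of the whole workflow on a tiny, made-up array of (r, c) pairs (the small x below is not the real data):

import numpy as np
from scipy import sparse

# Small made-up set of (r, c) pairs standing in for the real x
x = np.array([[0, 1], [1, 2], [2, 0], [0, 1]])

I, J = x[:, 0], x[:, 1]
V = np.ones(len(I))

G = sparse.coo_matrix((V, (I, J)))
G = G + G.T            # symmetrize; duplicate entries get summed along the way
G = G.tocsr()

G.data[:] = 1          # set every stored entry to 1, no fancy indexing needed
G.eliminate_zeros()    # drop any explicitly stored zeros

print(G.toarray())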
I have an image stored as 3 Numpy arrays:
# Int arrays of coordinates
# Not continuous, some points are omitted
X_image = np.array([1,2,3,4,5,6,7,9])
Y_image = np.array([9,8,7,6,5,4,3,1])
# Float array of RGB values.
# Same index
rgb = np.array([
[0.5543,0.2665,0.5589],
[0.5544,0.1665,0.5589],
[0.2241,0.6645,0.5249],
[0.2242,0.6445,0.2239],
[0.2877,0.6425,0.5829],
[0.5543,0.3165,0.2839],
[0.3224,0.4635,0.5879],
[0.5534,0.6693,0.5889],
])
The RGB information is not convertible to int, so it has to stay as floats.
I have another array that defines the position of an area of some pixels in the image:
X_area = np.array([3,4,6])
Y_area = np.array([7,6,4])
I need to find the RGB information for these pixels, using the image arrays above as a reference.
My idea was to search for the index of these area points in the full image and then use this index to find back the RGB information.
index = search_for_index_of_array_1_in_array_2((X_area,Y_area),(X_image,Y_image))
# index should be [2, 3, 5]
rgb_area = rgb[index]
The search_for_index_of_array_1_in_array_2 function can be implemented with a for loop. I tried it, but it is too slow; I actually have millions of points.
I know that it is probably more of a use case for Julia than Python, as we deal with low-level data manipulation with a performance need, but I'm obliged to use Python. So, the only performance trick I see is to use a vectorized solution with NumPy.
I'm not used to manipulating NumPy. I tried numpy.where.
index = np.where(X_area in X_image and Y_area in Y_image )
index
Gives :
<ipython-input-18-0e434ab7a291>:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
index = np.where(X_area in X_image and Y_area in Y_image )
(array([], dtype=int64),)
It should not be empty, since there are 3 matching points.
I also tested, with the same result:
XY_image = np.vstack((X_image,Y_image))
XY_area = np.vstack((X_area,Y_area))
index = np.where(XY_area == XY_image)
and even:
np.extract(XY_image == XY_area, XY_image)
If I understand correctly, the issue is that the arrays do not have the same length. But that is the data I have.
Do you have an idea of how to proceed?
Thanks
Edit: here is a loop that works but... is not fast:
indexes = []
for i in range(XY_area.shape[1]):
    # Compare the i-th area point against every image point
    XY_area_b = np.broadcast_to(XY_area[:, i], (XY_image.shape[1], 2)).transpose()
    where_in_image = np.where(XY_area_b == XY_image)
    # Column index of the match in the image arrays
    index_in_image = where_in_image[1][1]
    indexes.append(index_in_image)
indexes
The classical method to solve this problem is generally to use a hashmap. However, NumPy does not provide such a data structure. That being said, an alternative (generally slower) solution is to sort the values and then perform a binary search. Fortunately, NumPy provides useful functions to do that. This solution runs in O(n log m) time (with n the number of values to search for and m the number of values searched) and should be much faster than a linear search running in O(n m) time. Here is an example:
# Format the inputs
valType = X_image.dtype
assert Y_image.dtype == valType and X_area.dtype == valType and Y_area.dtype == valType
pointType = [('x', valType), ('y', valType)]
XY_image = np.ravel(np.column_stack((X_image, Y_image))).view(pointType)
XY_area = np.ravel(np.column_stack((X_area, Y_area))).view(pointType)
# Build an index to sort XY_image and then generate the sorted points
sortingIndex = np.argsort(XY_image)
sorted_XY_image = XY_image[sortingIndex]
# Search each value of XY_area in the sorted points, then map back to the unsorted positions
tmp = np.searchsorted(sorted_XY_image, XY_area)
index = sortingIndex[tmp]
rgb_area = rgb[index]
Thanks to Jérôme's answer, I understand better the value of using a hashmap:
def hashmap(X, Y):
    return X + 10000 * Y
h_area = hashmap(X_area,Y_area)
h_image = hashmap(X_image,Y_image)
np.where(np.isin(h_image,h_area))
This hash function is a bit crude, but it actually returns the indexes:
(array([2, 3, 5], dtype=int64),)
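To complete the lookup with this approach, the returned indexes can be used directly on rgb. A rough sketch, assuming all coordinates stay below 10000 so the hash cannot collide (note the indexes come back in image order, not in the order of the area points):

import numpy as np

# X_image, Y_image, X_area, Y_area and rgb as defined in the question
def hashmap(X, Y):
    return X + 10000 * Y

h_area = hashmap(X_area, Y_area)
h_image = hashmap(X_image, Y_image)

index = np.where(np.isin(h_image, h_area))[0]   # [2 3 5] for the sample data
rgb_area = rgb[index]                           # RGB rows of the matching pixels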
I just want to know if there is any Octave/MATLAB equivalent syntax for this particular for loop in Python:
for (i,j) in [(1,2),(2,3),(3,4),(4,5),(5,6),(6,7)]:
    a[i,j] = 1
I need this to simplify my image processing assignments, so that I can construct an image matrix without having to enter each pixel value individually. If there are any other ways of implementing the above functionality in Octave/MATLAB, please let me know.
Thanks.
In Octave, and I guess also in MATLAB, you can do:
for ij = [{1;2} {2;3} {3;4} {4;5} {5;6} {6;7}]
    a(ij{:}) = 1;
end
But in general, in MATLAB and Python it is better to avoid loops. There are much more efficient indexing methods in both Python and MATLAB.
If you want to set a series of pixels in a, given by coordinates, to the same value, you can do as follows:
coord = [1,2; 2,3; 3,4; 4,5; 5,6; 6,7];
ind = sub2ind(size(a), coord(:,1), coord(:,2));
a(ind) = 1;
You can replace that last 1 with a vector with as many elements as coordinates in coord to assign a different value to each pixel.
Note that MATLAB indexes rows with the first index, so the first column of coord corresponds to the y coordinate.
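For comparison, the analogous vectorized assignment in NumPy (the asker's starting point) can be done with integer-array indexing; a rough sketch, where the array size and the 0-based coordinates are just assumptions for the example:

import numpy as np

a = np.zeros((7, 8))                # assumed image size for the example
coord = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6], [6, 7]])

# Integer-array (fancy) indexing sets all the listed pixels at once, no loop needed
a[coord[:, 0], coord[:, 1]] = 1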
The simplest here would be:
for i = 1 : 6
    a(i, i+1) = 1; % Alternatively: j=i+1; a(i,j)=1;
end
The more flexible alternative is to construct the pairs:
vals = [1,2; … ; 6,7]; % Your i,j pairs. Possibly even put 3 numbers there, i,j,value.
for i = 1 : size(vals, 1)
    a(vals(i,1), vals(i,2)) = 1;
end
How do you convert [1, 2, 3] to [[1],[2],[3]] in python?
Also, say I have a vector y of length m with values ranging from 1 to 10. I want to create a matrix of size m x 10 such that, for example, if y = 1 then the corresponding row of the matrix should be [0,1,0,0,0,0,0,0,0,0]. In Octave this was possible with:
y_train = zeros(m,output_layer_size);
for i=1:output_layer_size
    y_train(find(y==i),i)=1;
end
But similar code gives a VisibleDeprecationWarning in Python and does not give the desired output:
y_train = np.zeros((y.shape[0],10))
for i in range(10):
    y_train[y==i][i]=1
Adding a dimension to a vector in NumPy is easy. You have a number of options available, depending on what you want to do:
Use np.newaxis, which is often aliased by None, in your index:
v = v[:, None]
OR
v = v[None, :]
Using newaxis allows you to control precisely whether the vector becomes a column or a row.
Reshape the vector:
v = v.reshape((1, -1))
OR
v = np.reshape(v, (-1, 1))
I have really shown four options here (np.reshape vs np.ndarray.reshape and row vs column). Using -1 in the new vector's dimensions means "whatever size is necessary to make it the same number of elements as the original". It is much easier than explicitly using the shape.
Use np.expand_dims, which is almost exactly equivalent to np.newaxis, but in functional form.
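For completeness, a small sketch of the functional form (the array v here is just an example):

import numpy as np

v = np.array([1, 2, 3])
col = np.expand_dims(v, axis=1)   # shape (3, 1) -> [[1], [2], [3]]
row = np.expand_dims(v, axis=0)   # shape (1, 3) -> [[1, 2, 3]]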
Construct a new array with ndmin=2:
v = np.array(v, copy=False, ndmin=2)
This method is the least flexible because it does not let you control the position of the new axis. It is usually used when the only thing that matters is the dimensionality and broadcasting takes care of the rest.
The second part of the question appears to be a simple use case for fancy indexing in Python. You can rephrase your Octave loop as:
y_train = np.zeros((y.size, 10))
y_train[np.arange(y.size), y] = 1
Transposing a 1D array directly will not work; it will just return the original array. Try this instead:
np.atleast_2d(x).T
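A quick demonstration of that one-liner on the list from the question:

import numpy as np

x = np.array([1, 2, 3])
print(np.atleast_2d(x).T)
# [[1]
#  [2]
#  [3]]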
The suggestions from the comments did not work for me, but numpy.where() did!
b = np.array([[0],[0],[2],[2],[4],[1],[6],[7],[5],[9]])
a = np.random.randint(10, size=(10,10))
for i in range(10):
    c = np.zeros((1,10))
    c[0][i] = 1
    a[np.where(b==i)[0]] = c
print(a)
I'm having trouble understanding what B = A(~any(A < threshold, 2), :); (in MATLAB) does, given an array A with dimensions N x 3.
Ultimately, I am trying to implement a function to perform the same operation in Python (so far, I have something like B = A[not any(A[:,1] < threshold), :], which I know to be incorrect), and I was wondering what the NumPy equivalent of such an operation would be.
Thank you!
Not much of a difference, really. In MATLAB, you are performing any along the rows with any(...,2). In NumPy, you have axis to denote those dimensions, and for a 2D array it would be np.any(..., axis=1).
Thus, the NumPy equivalent implementation would be -
import numpy as np
B = A[~np.any(A < threshold,axis=1),:]
This indexing is also termed slicing in NumPy terminology. Since we are slicing along the first axis, we can drop the all-elements selection along the rest of the axes. So, it would simplify to -
B = A[~np.any(A < threshold,axis=1)]
Finally, we can use the method ndarray.any and skip spelling out the axis parameter name to shorten the code further, like so -
B = A[~(A < threshold).any(1)]
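A small worked example with a made-up A and threshold, to show which rows get dropped:

import numpy as np

A = np.array([[1.0, 5.0, 7.0],
              [0.2, 6.0, 9.0],
              [4.0, 8.0, 3.0]])
threshold = 1.0

# Keep only the rows where no element is below the threshold
B = A[~(A < threshold).any(1)]
print(B)
# [[1. 5. 7.]
#  [4. 8. 3.]]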
Is there an easier way to get the sum of all values (assuming they are all numbers) in an ndarray:
import numpy as np
m = np.array([[1,2],[3,4]])
result = 0
(dim0,dim1) = m.shape
for i in range(dim0):
    for j in range(dim1):
        result += m[i,j]
print(result)
The above code seems somewhat verbose for a straightforward mathematical operation.
Thanks!
Just use numpy.sum():
result = np.sum(matrix)
or equivalently, the .sum() method of the array:
result = matrix.sum()
By default this sums over all elements in the array - if you want to sum over a particular axis, you should pass the axis argument as well, e.g. matrix.sum(0) to sum over the first axis.
As a side note your "matrix" is actually a numpy.ndarray, not a numpy.matrix - they are different classes that behave slightly differently, so it's best to avoid confusing the two.
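A short demonstration of the axis argument on the 2x2 array from the question (renamed matrix here to match the text above):

import numpy as np

matrix = np.array([[1, 2], [3, 4]])

print(np.sum(matrix))       # 10     -- sum of all elements
print(matrix.sum(0))        # [4 6]  -- sum over the first axis (down the columns)
print(matrix.sum(axis=1))   # [3 7]  -- sum over the second axis (across the rows)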
Yes, just use the sum method:
result = m.sum()
For example,
In [17]: m = np.array([[1,2],[3,4]])
In [18]: m.sum()
Out[18]: 10
By the way, NumPy has a matrix class which is different from "regular" NumPy arrays, so calling a regular ndarray matrix causes some cognitive dissonance. To help others understand your code, you may want to change the name matrix to something else.