How do you convert [1, 2, 3] to [[1],[2],[3]] in python?
Also, say I have a vector of length m with values ranging from 1 to 10. I want to create a matrix of size m x 10 such that, if an element of y is 1, the corresponding row of the matrix is [0,1,0,0,0,0,0,0,0,0]. In Octave this was possible with:
y_train = zeros(m, output_layer_size);
for i = 1:output_layer_size
    y_train(find(y == i), i) = 1;
end
But similar code in Python raises a VisibleDeprecationWarning and does not give the desired output:
y_train = np.zeros((y.shape[0], 10))
for i in range(10):
    y_train[y==i][i] = 1
Adding a dimension to a vector in NumPy is easy. You have a number of options available, depending on what you want to do:
Use np.newaxis, which is often aliased by None, in your index:
v = v[:, None]
OR
v = v[None, :]
Using newaxis allows you to control precisely whether the vector becomes a column or a row.
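For example, a quick check of the shapes (assuming v = np.array([1, 2, 3])):
v[:, None].shape   # (3, 1), a column
v[None, :].shape   # (1, 3), a row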
Reshape the vector:
v = v.reshape((1, -1))
OR
v = np.reshape(v, (-1, 1))
I have really shown four options here (np.reshape vs np.ndarray.reshape, and row vs column). Using -1 in the new shape means "whatever size is needed to keep the total number of elements the same as in the original". It is much easier than spelling out the shape explicitly.
Use np.expand_dims, which is almost exactly equivalent to np.newaxis, but in functional form.
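For example (a minimal sketch; the axis argument picks where the new dimension goes):
v = np.expand_dims(v, axis=0)   # row, shape (1, n)
OR
v = np.expand_dims(v, axis=1)   # column, shape (n, 1)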
Construct a new array with ndmin=2:
v = np.array(v, copy=False, ndmin=2)
This method is the least flexible because it does not let you control the position of the new axis. It is usually used when the only thing that matters is the dimensionality and broadcasting takes care of the rest.
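For instance, a quick sketch of how broadcasting then takes over (assuming v started as a flat length-3 vector):
v = np.array([1, 2, 3], ndmin=2)   # shape (1, 3)
v + v.T                            # broadcasts to shape (3, 3)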
The second part of the question is a simple use-case for fancy indexing in NumPy. You can rephrase your Octave loop in Python as:
y_train = np.zeros((y.size, 10))
y_train[np.arange(y.size), y] = 1
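As a quick sanity check (a sketch assuming the labels in y run from 0 to 9; if they are 1-based as in the Octave code, index with y - 1 instead):
y = np.array([1, 0, 3])
y_train = np.zeros((y.size, 10))
y_train[np.arange(y.size), y] = 1
print(y_train[0])   # [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]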
Transposing a 1D array directly will not work; it just returns the original array. Try this instead:
np.atleast_2d(x).T
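A quick illustration (assuming x = np.array([1, 2, 3])):
x.T.shape                  # (3,), unchanged
np.atleast_2d(x).T.shape   # (3, 1), a proper column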
The suggestions from the comments did not work for me, but numpy.where() did:
b = np.array([[0],[0],[2],[2],[4],[1],[6],[7],[5],[9]])
a = np.random.randint(10, size=(10, 10))
for i in range(10):
    c = np.zeros((1, 10))
    c[0][i] = 1
    a[np.where(b == i)[0]] = c
print(a)
I have two arrays, I and X. I want to perform an operation that takes the indices from I and uses the corresponding values from X. For example, for I[0] = [0,1] I want to compute X[0] - X[1] and append the result to a new array T. Similarly, for I[1] = [1,2] I want to compute X[1] - X[2] and append it to T.
import numpy as np
I=np.array([[0,1],[1,2]])
X=np.array([10,5,3])
The expected output is
T = array([[X[0]-X[1]],[X[1]-X[2]]])   # i.e. array([[5], [2]])
The most basic approach is using nested indices together with the np.append() function.
It works like below:
T = np.append(X[I[0][0]] - X[I[0][1]], X[I[1][0]] - X[I[1][1]])
Here, X[I[0][0]] means: take the value of I[0][0] and use it as an index into the array X.
You can also implement a loop to do that:
T = np.array([], dtype="int64")
for i in range(I.shape[0]):
    for j in range(I.shape[1] - 1):
        T = np.append(T, X[I[i][j]] - X[I[i][j+1]])
You can do this using integer array indexing. For large arrays, using for loops like in the currently accepted answer is going to be much slower than using vectorized operations.
import numpy as np
I = np.array([[0, 1], [1, 2]])
X = np.array([10, 5, 3])
T = X[I[:, 0:1]] - X[I[:, 1:2]]
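With the sample arrays this yields a column vector, because the 0:1 and 1:2 slices keep the second dimension; X[I[:, 0]] - X[I[:, 1]] would give the flat version:
print(T)   # [[5]
           #  [2]]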
I have an image stored as 3 Numpy arrays:
# Int arrays of coordinates
# Not continuous, some points are omitted
X_image = np.array([1,2,3,4,5,6,7,9])
Y_image = np.array([9,8,7,6,5,4,3,1])
# Float array of RGB values.
# Same index
rgb = np.array([
[0.5543,0.2665,0.5589],
[0.5544,0.1665,0.5589],
[0.2241,0.6645,0.5249],
[0.2242,0.6445,0.2239],
[0.2877,0.6425,0.5829],
[0.5543,0.3165,0.2839],
[0.3224,0.4635,0.5879],
[0.5534,0.6693,0.5889],
])
The RGB information is not convertible to int, so it has to stay as floats.
I have another array that defines the position of an area of some pixels in the image:
X_area = np.array([3,4,6])
Y_area = np.array([7,6,4])
I need to find the RGB information for these pixels, using the first 4 arrays as a reference.
My idea was to search for the index of these area points in the full image and then use this index to find back the RGB information.
index = search_for_index_of_array_1_in_array_2((X_area,Y_area),(X_image,Y_image))
# index shall be [2, 3, 5] (0-based)
rgb_area = rgb[index]
The search_for_index_of_array_1_in_array_2 function can be implemented with a for loop, but I tried that and it is too slow: I actually have millions of points.
I know this is probably more of a use case for Julia than Python, since it is low-level data manipulation with a performance requirement, but I am obliged to use Python. So the only performance trick I see is a vectorized solution with NumPy.
I am not used to manipulating NumPy. I tried numpy.where:
index = np.where(X_area in X_image and Y_area in Y_image )
index
which gives:
<ipython-input-18-0e434ab7a291>:1: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
index = np.where(X_area in X_image and Y_area in Y_image )
(array([], dtype=int64),)
It should not be empty, since there are 3 matching points.
I also tested, with the same result:
XY_image = np.vstack((X_image,Y_image))
XY_area = np.vstack((X_area,Y_area))
index = np.where(XY_area == XY_image)
and even:
np.extract(XY_image == XY_area, XY_image)
If I understand correctly, the issue is that the arrays do not have the same length. But that is the data I have.
Do you have an idea of how to proceed?
Thanks
Edit: here is a loop that works but... is not fast:
indexes = []
for i in range(XY_area.shape[1]):
    # 8 = number of points in the image arrays
    XY_area_b = np.broadcast_to(XY_area[:, i], (8, 2)).transpose()
    where_in_image = np.where(XY_area_b == XY_image)
    index_in_image = where_in_image[1][1]
    indexes.append(index_in_image)
indexes
The classical method to solve this problem is to use a hashmap. However, NumPy does not provide such a data structure. That being said, an alternative (generally slower) solution is to sort the values and then perform a binary search. Fortunately, NumPy provides useful functions to do that. This solution, running in O(n log m) time (with n the number of values to search for and m the number of values searched), should be much faster than a linear search running in O(n m) time. Here is an example:
# Format the inputs
valType = X_image.dtype
assert Y_image.dtype == valType and X_area.dtype == valType and Y_area.dtype == valType
pointType = [('x', valType),('y', valType)]
XY_image = np.ravel(np.column_stack((X_image, Y_image))).view(pointType)
XY_area = np.ravel(np.column_stack((X_area, Y_area))).view(pointType)
# Build an index to sort XY_image and then generate the sorted points
sortingIndex = np.argsort(XY_image)
sorted_XY_image = XY_image[sortingIndex]
# Search each value of XY_area in the sorted copy, then map back to the unsorted array
tmp = np.searchsorted(sorted_XY_image, XY_area)
index = sortingIndex[tmp]
rgb_area = rgb[index]
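As a quick check with the sample data from the question, index should come out as the 0-based positions of the three area points:
print(index)      # [2 3 5]
print(rgb_area)   # the three matching RGB triples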
Thanks to Jérôme's answer, I understand better the value of using a hashmap:
def hashmap(X, Y):
    return X + 10000 * Y
h_area = hashmap(X_area,Y_area)
h_image = hashmap(X_image,Y_image)
np.where(np.isin(h_image,h_area))
This hashmap is a bit brutal, but it actually returns the indexes:
(array([2, 3, 5], dtype=int64),)
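One caveat worth making explicit (my assumption, not stated above): the multiplier must be larger than the largest possible coordinate, otherwise two distinct points could collide. The RGB values can then be pulled out with:
index = np.where(np.isin(h_image, h_area))[0]
rgb_area = rgb[index]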
I have an array:
X, an ndarray of shape (180, 360)
The following does not work
X = numpy.append(X, X[:,0], 1)
because X[:,0] has the wrong dimensions.
Isn't that weird?
This way around the problem seems a bit dirty:
X = numpy.append(X, numpy.array(X[:,0],ndmin=2).T, axis=1)
In MATLAB one could just write: X(:,361) = X(:,1) !!!
I came to realize that this works, too:
X = numpy.insert(X, 360, X[:,0], axis=1)
but why does append not work similarly?
Thank you, serpents.
The reason is that indexing with one integer removes that axis:
>>> X[:, 0].shape
(180,)
That's a one dimensional array, but if you index by giving a start and stop you keep the axis:
>>> X[:, 0:1].shape
(180, 1)
which can be correctly appended to your array:
>>> np.append(X, X[:, 0:1], 1)
array([....])
But all this aside: if you find yourself appending and concatenating lots of arrays, be warned that these operations are extremely inefficient. Most of the time it is better to find another way, for example creating a bigger array at the start and then just setting the rows/columns by slicing:
X = np.zeros((180, 361))
X[:, 360] = X[:, 0] # much more efficient than appending or inserting
You can create a new axis on X[:,0]:
np.append(X, X[:,0,None], axis=1)
I think the reason why you have to match array shapes is that numpy.append is implemented using concatenate.
A key difference is that in MATLAB everything has at least 2 dimensions.
>> size(x(:,1))
ans =
2 1
and as you note, it allows indexing 'beyond-the-end' - way beyond
>> x(:,10)=x(:,1)
x =
1 2 3 1 0 0 0 0 0 1
4 5 6 4 0 0 0 0 0 4
But in numpy indexing reduces the dimensions, without the 2d floor:
In [1675]: x = np.ones((3,4),int)
In [1676]: x.shape
Out[1676]: (3, 4)
In [1677]: x[:,0].shape
Out[1677]: (3,)
That means that if I want to replicate a column I need to make sure it is still a column in the concatenate. There are numerous ways of doing that.
x[:,0][:,None] - use of np.newaxis (alias None) is a nice general purpose method. x[:,[0]], x[:,0:1], x[:,0].reshape(-1,1) also have their place.
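A quick comparison of those variants (a sketch, assuming x is the (3, 4) array above):
x[:, 0][:, None].shape        # (3, 1)
x[:, [0]].shape               # (3, 1)
x[:, 0:1].shape               # (3, 1)
x[:, 0].reshape(-1, 1).shape  # (3, 1)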
append is just concatenate restricted to two arguments. It is a confusing imitation of the list append. It is written in Python, so you can read it (as experienced MATLAB coders read MATLAB source).
insert is a more complicated function (also written in Python). Adding at the end, it does something like:
In [1687]: x.shape
Out[1687]: (3, 4)
In [1688]: res=np.empty((3,5),int)
In [1689]: res[:,:4] = x
In [1690]: res[:,-1] = x[:,0]
That last assignment works because both sides have the same shape (technically they just have to be broadcastable shapes). So insert doesn't tell us anything about what should or should not work in more basic operations like concatenate.
I would like to use np.ravel to create a similar return structure as seen in the MATLAB code below:
[xi yi imv1] = find(squeeze(imagee(:,:,1))+0.1);
imv1 = imv1 - 0.1;
[xi yi imv2] = find(squeeze(imagee(:,:,2))+0.1);
imv2 = imv2 - 0.1;
where imagee is a matrix corresponding to values of a picture obtained from imread().
So, the (almost) corresponding Python translation is:
imv1 = np.ravel(imagee[:,:,0], order='F')
where the index slicing [:,:,0] is clearly not the same as in MATLAB. How do I specify the index values in Python so that my return values match those from MATLAB? I believe the MATLAB code reads as "access all rows and columns of the specified slice along the third dimension". So how do I specify this third index in Python?
To retrieve indexes, I usually use np.where. Here's an example: You have a 2 dimensional array
a = np.asarray([[0,1,2],[3,4,5]])
and want to get the indexes where the values are above a threshold, say 2. You can use np.where with the condition a>2
idxX, idxY = np.where(a>2)
which in turn you can use to index a:
print(a[idxX, idxY])
>>> [3 4 5]
However, the same effect can be achieved by indexing:
print(a[a>2])
>>> [3 4 5]
This works on ravel'ed arrays as well as on three-dimensional ones. With 3D arrays, the first method simply requires you to handle one more index array, as sketched below.
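A minimal 3D sketch (my own example, not from the original answer):
a3 = np.arange(8).reshape(2, 2, 2)
idxX, idxY, idxZ = np.where(a3 > 5)
print(a3[idxX, idxY, idxZ])   # [6 7]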
I am new to Python, so forgive me ahead of time if this is an elementary question, but I have searched around and have not found a satisfying answer.
I am trying to do the following using NumPy and SciPy:
I, J = x[:,0], x[:,1]  # x is a two-column array of (r, c) pairs
V = ones(len(I))
G = sparse.coo_matrix((V,(I,J))) # G's dimensions are 1032570x1032570
G = G + transpose(G)
r,c = G.nonzero()
G[r,c] = 1
...
NotImplementedError: Fancy indexing in assignment not supported for csr matrices
Pretty much, I want all the nonzero values to equal 1 after adding the transpose, but I get the fancy indexing error messages.
Alternatively, if I could show that the matrix G is symmetric, adding the transpose would not be necessary.
Any insight into either approach would be very much appreciated.
In addition to doing something like G = G / G, you can operate on G.data.
So, in your case, doing either:
G.data = np.ones(G.nnz)
or
G.data[G.data != 0] = 1
Will do what you want. This is more flexible, as it allows you to perform other types of filters (e.g. G.data[G.data > 0.9] = 1 or G.data = np.random.random(G.nnz)).
The second option will only set the values to one if they have a nonzero value. During some calculations, you'll wind up with zero values that are "dense" (i.e. they're actually stored as a value in the sparse array). (You can remove these in-place with G.eliminate_zeros())
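A minimal sketch tying this together (my own example; the values and shapes are made up):
from scipy import sparse
import numpy as np

row = np.array([0, 1])
col = np.array([1, 0])
G = sparse.coo_matrix((np.array([2.0, 0.5]), (row, col)))
G = (G + G.transpose()).tocsr()
G.data[G.data != 0] = 1   # set every stored nonzero to 1
G.eliminate_zeros()       # drop any explicitly stored zeros in place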