I have a 2D numpy array:
arr = np.array(([[6,1,2],
[3,4,5],
[0,7,8]]))
I also have another 1D NumPy array:
value = np.asarray([9,8,7,6,5,4,3,2,1])
I would like to replace each value in my 2D array with the element of my 1D array found at that index.
For example:
In my 2D array at position (0,0), I have the value 6, so the value at (0,0) must be replaced by the value at index 6 of my 1D array, which is 3.
So far I have this code:
value = np.asarray([9,8,7,6,5,4,3,2,1])
arr = np.array(([[6,1,2],[3,4,5],[0,7,8]]))
print(arr)
#[[6 1 2]
#[3 4 5]
#[0 7 8]]
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        arr[i,j] = value[arr[i,j]]
print(arr)
#[[3 8 7]
#[6 5 4]
#[9 2 1]]
The problem is that this code is slow on large arrays (10 seconds for an array of size 4096²).
Is there an efficient way to solve this problem?
This is very simple: you just need a single command. Indexing one NumPy array with another integer array performs the lookup for every element at once, so no explicit loop is needed.
arr = value[arr]
Here is an example with the data you provided:
>>> value[arr]
array([[3, 8, 7],
[6, 5, 4],
[9, 2, 1]])
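As a quick check, here is a minimal sketch that compares the one-line lookup with the original double loop on the data from the question. The names expected, result and result_take are only illustrative; np.take is shown as an alternative spelling of the same lookup.

```python
import numpy as np

value = np.asarray([9, 8, 7, 6, 5, 4, 3, 2, 1])
arr = np.array([[6, 1, 2], [3, 4, 5], [0, 7, 8]])

# Reference result computed with the original double loop.
expected = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        expected[i, j] = value[arr[i, j]]

# Vectorized lookup: every element of arr is used as an index into value.
result = value[arr]

# np.take performs the same lookup for a 1D source array.
result_take = np.take(value, arr)

print(result)
```

The result has the shape of arr, and every entry of arr must be a valid index into value, otherwise NumPy raises an IndexError.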
What is the explanation of the following behavior:
import numpy as np
arr = np.zeros((3, 3))
li = [1,2]
print('output1:', arr[:, li].shape)
print('output2:', arr[:][li].shape)
>>output1: (3, 2)
>>output2: (2, 3)
I would expect output2 to be equal to output1.
Let's use a different array where it's easier to see the difference:
>>> arr = np.arange(9).reshape(3, 3)
>>> arr
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
In the first case, arr[:, li] selects everything along the first dimension (in this case all the rows), then indexes the second dimension with [1, 2], which keeps columns 1 and 2 and leaves out the first column:
array([[1, 2],
[4, 5],
[7, 8]])
Hence, the shape of this is (3, 2).
In the other case, arr[:] returns a view of the whole original array (not a copy), so it doesn't change the shape. arr[:][li] is therefore equivalent to arr[li], which selects rows 1 and 2, hence the output shape is (2, 3). In general you should avoid chained indexing like this, because each step creates an intermediate array (a view or a copy), which is inefficient.
You are getting the correct output.
In the first line
print('output1:', arr[:, li].shape)
You are printing 2nd and 3rd element of each subarray within arr, thus getting 3 elements each containing 2 values.
In the second line
print('output2:', arr[:][li].shape)
You are selecting first the whole array, then from the whole array you select 2nd and 3rd element (each containing 3 elements themselves), thus getting 2 elements each containing 3 values.
The difference can be seen if you examine this code -
import numpy as np
arr = np.arange(9).reshape(3, 3)
li = [1,2]
print('output1:', arr[:, li])
print('output2:', arr[:][li])
This gives -
[[1 2]
[4 5]
[7 8]]
and
[[3 4 5]
[6 7 8]]
When you do arr[:, [1, 2]], you are saying that you want to take all the rows of the array (: specifies this) and, from those, take columns 1 and 2.
On the other hand, when you do arr[:], you are referring to the full array first, from which you then take rows 1 and 2 (the last two rows).
Essentially, in the second case, [1 2] is referring to the row axis of the original array while in the first case, it's referring to the column.
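The two indexing forms from this thread can be compared side by side. The sketch below is only illustrative (the names cols, rows and view are not from the original posts); it also uses np.shares_memory to show that arr[:] shares its data with arr.

```python
import numpy as np

arr = np.arange(9).reshape(3, 3)
li = [1, 2]

# One indexing operation: all rows, columns 1 and 2 -> shape (3, 2).
cols = arr[:, li]

# Two chained operations: arr[:] is the whole array, then rows 1 and 2 -> shape (2, 3).
rows = arr[:][li]

# arr[:] shares memory with arr, so it is a view rather than an independent copy.
view = arr[:]
shares = np.shares_memory(arr, view)

print(cols.shape, rows.shape, shares)
```

Because rows is identical to arr[li], the leading [:] in the chained form has no effect on the result.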
I have a Pandas Series containing 1D arrays/lists. I want to extract it to a 2D NumPy array.
s=pd.Series([[1,2,3,4],[5,6,7,8]])
With to_numpy() I get a 1D array looking like this
array([list([1, 2, 3, 4]), list([5, 6, 7, 8])], dtype=object)
However, I want something like array([[1,2,3,4],[5,6,7,8]]).
Convert first to lists and then to array:
arr = np.array(s.tolist())
print (arr)
[[1 2 3 4]
[5 6 7 8]]
I understand how
x = np.array([[1, 2], [3, 4], [5, 6]])
y = x[[0,1,2], [0,1,0]]
The output is y = [1 4 5]. This just takes the first list as rows and the second list as columns.
But how does the code below work?
x = np.array([[ 0, 1, 2],[ 3, 4, 5],[ 6, 7, 8],[ 9, 10, 11]])
rows = np.array([[0,0],[3,3]])
cols = np.array([[0,2],[0,2]])
y = x[rows,cols]
This gives the output of :
[[ 0 2]
[ 9 11]]
Can you please explain the logic when using ndarrays as the indexing object? Why are there 2D arrays for both rows and columns? How are the rules different when the indexing object is an ndarray as opposed to a Python list?
We've the following array x
x = np.array([[1, 2], [3, 4], [5, 6]])
And the indices [0, 1, 2] and [0, 1, 0] which when indexed into x like
x[[0,1,2], [0,1,0]]
gives
[1, 4, 5]
The indices that we used basically translate to:
[0, 1, 2] & [0, 1, 0] --> [0,0], [1,1], [2,0]
Since we used 1D lists as indices, we get a 1D array as the result.
With that knowledge, let's see the next case. Now, we've the array x as:
x = np.array([[ 0, 1, 2],[ 3, 4, 5],[ 6, 7, 8],[ 9, 10, 11]])
Now the indices are 2D arrays.
rows = np.array([[0,0],[3,3]])
cols = np.array([[0,2],[0,2]])
This when indexed into the array x like:
x[rows,cols]
simply translates to:
[[0,0],[3,3]]
| | | | ====> [[0,0]], [[0,2]], [[3,0]], [[3,2]]
[[0,2],[0,2]]
Now it's easy to see how these four (row, column) pairs, when indexed into the array x, give the following result (i.e. here it simply returns the corner elements of our array x):
[[ 0, 2]
[ 9, 11]]
Note that in this case we get the result as a 2D array (as opposed to a 1D array in the first case) since our indices rows and cols were themselves 2D arrays (i.e. equivalently lists of lists), whereas in the first case our indices were 1D arrays (or equivalently simple lists without any nesting).
So, if you need a 2D array as the result, you need to give 2D arrays as indices.
The easiest way to wrap one's head around this is the following observation: The shape of the output is determined by the shape of the index array, or more precisely the shape resulting from broadcasting all the index arrays together.
Look at it like that: you have an array A of a given shape and another array V of some other shape and you want to fill A with values from V. What do you need to specify? Well, for each position in A you need to specify coordinates of some element in V. Therefore if V is ND you need N index arrays of the same shape as A or at least broadcastable to that. Then you index V by putting these index arrays at their coordinate positions in the [] expression.
To keep things simple, we'll stay 2D and assume rows.shape == cols.shape. (You can break this rule with broadcasting, but for now we won't.) We'll call this shape (I, J);
then y = x[rows, cols] is the same as:
y = np.empty((I, J))
for i in range(I):
    for j in range(J):
        y[i, j] = x[rows[i, j], cols[i, j]]
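The loop above can be checked against the indexing expression, and the broadcasting case mentioned earlier can be tried as well. This is a minimal sketch on the x, rows and cols from the question; the names y_loop, y_fancy, y_broadcast and y_ix are illustrative only.

```python
import numpy as np

x = np.arange(12).reshape(4, 3)
rows = np.array([[0, 0], [3, 3]])
cols = np.array([[0, 2], [0, 2]])

# Explicit loop: the output takes the shape of the index arrays.
I, J = rows.shape
y_loop = np.empty((I, J), dtype=x.dtype)
for i in range(I):
    for j in range(J):
        y_loop[i, j] = x[rows[i, j], cols[i, j]]

# The same selection as a single indexing expression.
y_fancy = x[rows, cols]

# Broadcasting: a (2, 1) row index and a (2,) column index broadcast to (2, 2).
y_broadcast = x[np.array([[0], [3]]), np.array([0, 2])]

# np.ix_ builds such broadcastable index arrays from two 1D sequences.
y_ix = x[np.ix_([0, 3], [0, 2])]

print(y_fancy)
```

All four results are the same 2x2 array of corner elements; the broadcast forms just avoid writing out the repeated indices.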
I want to split a 2D array this way:
Example.
From this 4x4 2D array:
np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
Create these four 2x2 2D arrays:
np.array([[1,2],[3,4]])
np.array([[5,6],[7,8]])
np.array([[9,10],[11,12]])
np.array([[13,14],[15,16]])
In the general case, from an NxN 2D array (square arrays), create as many 2D arrays of KxK shape as possible.
Just to be more precise: an output array is not necessarily made of all the values from a row.
Example:
From a 2D 8x8 array with values from 1 to 64, if I want to split this array into 2D 2x2 arrays, the first row of the 8x8 array holds the values 1 to 8, so the first output 2x2 array will be np.array([[1,2],[3,4]]), and the second output 2x2 array will be np.array([[5,6],[7,8]])... This continues until the last output 2D array, which will be np.array([[61,62],[63,64]]). Note that each 2x2 array is not filled with all the values from a row; this is the intended behavior.
Is there a NumPy method that does this?
You're probably looking for something like numpy.reshape.
In your example:
numpy.array([[1,2,3,4], [5,6,7,8]]).reshape(4,2)
>>>array([[1,2], [3,4], [5,6], [7,8]])
Or, as suggested by @MSeifert, using -1 as the final dimension will let numpy work out that dimension by itself:
numpy.array([[1,2,3,4], [5,6,7,8]]).reshape(4,-1)
>>>array([[1,2], [3,4], [5,6], [7,8]])
To get your desired output, you need to reshape to a 3D array and then unpack the first dimension:
>>> inp = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
>>> list(inp.reshape(-1, 2, 2))
[array([[1, 2],
[3, 4]]),
array([[5, 6],
[7, 8]]),
array([[ 9, 10],
[11, 12]]),
array([[13, 14],
[15, 16]])]
You can also unpack using = if you want to store the arrays in different variables instead of in one list of arrays:
>>> out1, out2, out3, out4 = inp.reshape(-1, 2, 2)
>>> out1
array([[1, 2],
[3, 4]])
If you're okay with a 3D array containing your 2D 2x2 arrays you don't need unpacking or the list() call:
>>> inp.reshape(-1, 2, 2)
array([[[ 1, 2],
[ 3, 4]],
[[ 5, 6],
[ 7, 8]],
[[ 9, 10],
[11, 12]],
[[13, 14],
[15, 16]]])
The -1 is a special value for reshape. As the documentation states:
One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions.
If you want it more general, use the side length of the desired square blocks as the argument for reshape (for the 4x4 example above, that is the square root of the row length):
>>> inp = np.ones((8, 8)) # 8x8 array
>>> square_shape = 2
>>> inp.reshape(-1, square_shape, square_shape) # 16 2x2 arrays
>>> square_shape = 4
>>> inp.reshape(-1, square_shape, square_shape) # 4 4x4 arrays
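To make the 8x8 case from the question concrete, here is a small sketch with the values 1 to 64. The names n, k, flat_blocks and tiles are illustrative. The second expression is not what the question asks for; it is shown only for contrast, as one common way to get spatially contiguous k x k tiles instead.

```python
import numpy as np

n, k = 8, 2  # assumption: n is divisible by k
inp = np.arange(1, n * n + 1).reshape(n, n)

# Row-major split, as requested in the question:
# consecutive values fill each k x k block.
flat_blocks = inp.reshape(-1, k, k)

# For contrast: spatial tiles, where each block is a k x k window of inp.
tiles = inp.reshape(n // k, k, n // k, k).swapaxes(1, 2).reshape(-1, k, k)

print(flat_blocks[0])   # values 1..4
print(flat_blocks[-1])  # values 61..64
print(tiles[0])         # top-left 2x2 window of inp
```

With the row-major split, the first block is [[1, 2], [3, 4]], whereas the first spatial tile is [[1, 2], [9, 10]], so it is worth deciding which of the two layouts is actually wanted.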
If you want to split it row wise, you may do np.reshape(arr,(2,2), order='C')
If you want to split it column wise, you may do np.reshape(arr,(2,2), order='F')
I have a 3D NumPy array (n_samples x num_components x 2); in the example below n_samples = 5 and num_components = 7.
I have another array (indices) that holds the selected component for each sample and has shape (n_samples,).
I want to select from the data array given the indices so that the resulting array is n_samples x 2.
The code is below:
import numpy as np
np.random.seed(77)
data=np.random.randint(low=0, high=10, size=(5, 7, 2))
indices = np.array([0, 1, 6, 4, 5])
#how can I select indices from the data array?
For example for data 0, the selected component should be the 0th and for data 1 the selected component should be 1.
Note that I can't use any for loops because I'm using it in Theano and the solution should be solely based on numpy.
Is this what you are looking for?
In [36]: data[np.arange(data.shape[0]),indices,:]
Out[36]:
array([[7, 4],
[7, 3],
[4, 5],
[8, 2],
[5, 8]])
To get component #0, use
data[:, 0]
i.e. we get every entry on axis 0 (samples), and only entry #0 on axis 1 (components), and implicitly everything on the remaining axes.
This can be easily generalized to
data[:, indices]
to select all relevant components.
But what OP really wants is just the diagonal of this array, i.e. (data[0, indices[0]], data[1, indices[1]], ...). The diagonal of a high-dimensional array can be extracted using the diagonal function:
>>> np.diagonal(data[:, indices])
array([[7, 7, 4, 8, 5],
[4, 3, 5, 2, 8]])
(You may need to transpose the result.)
You have a variety of ways to do so, but this is my loop recommendation:
selection = np.array([ datum[indices[k]] for k,datum in enumerate(data)])
The resulting array, selection, has the desired shape.
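The approaches in this thread can be compared on the data from the question. The sketch below is illustrative only (the by_* names are not from the original posts); it also includes np.take_along_axis as a further option, assuming a NumPy version that provides it (1.15 or later).

```python
import numpy as np

np.random.seed(77)
data = np.random.randint(low=0, high=10, size=(5, 7, 2))
indices = np.array([0, 1, 6, 4, 5])

# Integer array indexing: pair sample k with component indices[k].
by_fancy = data[np.arange(data.shape[0]), indices]

# Diagonal of the (5, 5, 2) intermediate; the diagonal axis comes last,
# so the result is transposed back to (n_samples, 2).
by_diagonal = np.diagonal(data[:, indices]).T

# List comprehension over the samples (uses a Python-level loop).
by_loop = np.array([datum[indices[k]] for k, datum in enumerate(data)])

# take_along_axis: the index array needs the same number of dimensions as data.
by_take = np.take_along_axis(data, indices[:, None, None], axis=1)[:, 0, :]

print(by_fancy.shape)
```

All four give the same (n_samples, 2) array. The diagonal version first builds an n_samples x n_samples x 2 intermediate, so the direct indexing form is likely the cheaper choice for large inputs.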