import numpy as np
matrix1 = np.array([[1,2,3],[4,5,6]])
vector1 = matrix1[:,0] # This should have shape (2,1) but actually has (2,)
matrix2 = np.array([[2,3],[5,6]])
np.hstack((vector1, matrix2))
ValueError: all the input arrays must have same number of dimensions
The problem is that when I select the first column of matrix1 and put it in vector1, it gets converted to a row vector, so when I try to concatenate with matrix2, I get a dimension error. I could do this.
np.hstack((vector1.reshape(matrix2.shape[0],1), matrix2))
But this looks too ugly for me to do every time I have to concatenate a matrix and a vector. Is there a simpler way to do this?
The easier way is
vector1 = matrix1[:,0:1]
For the reason, let me refer you to another answer of mine:
When you write something like a[4], that's accessing the fifth element of the array, not giving you a view of some section of the original array. So for instance, if a is an array of numbers, then a[4] will be just a number. If a is a two-dimensional array, i.e. effectively an array of arrays, then a[4] would be a one-dimensional array. Basically, the operation of accessing an array element returns something with a dimensionality of one less than the original array.
Here are three other options:
You can tidy up your solution a bit by allowing the row dimension of the vector to be set implicitly:
np.hstack((vector1.reshape(-1, 1), matrix2))
You can index with np.newaxis (or equivalently, None) to insert a new axis of size 1:
np.hstack((vector1[:, np.newaxis], matrix2))
np.hstack((vector1[:, None], matrix2))
You can use np.matrix, for which indexing a column with an integer always returns a column vector:
matrix1 = np.matrix([[1, 2, 3],[4, 5, 6]])
vector1 = matrix1[:, 0]
matrix2 = np.matrix([[2, 3], [5, 6]])
np.hstack((vector1, matrix2))
Subsetting
The even simpler way is to subset the matrix.
>>> matrix1
[[1 2 3]
[4 5 6]]
>>> matrix1[:, [0]] # Subsetting
[[1]
[4]]
>>> matrix1[:, 0] # Indexing
[1 4]
>>> matrix1[:, 0:1] # Slicing
[[1]
[4]]
I also mentioned this in a similar question.
It works somewhat similarly to a Pandas dataframe. If you index the dataframe, it gives you a Series. If you subset or slice the dataframe, it gives you a dataframe.
Your approach uses indexing, David Z's approach uses slicing, and my approach uses subsetting.
Related
I have a matrix m = [[1,2,3],[4,5,6],[7,8,9]] and a vector v=[1,2,0] that contains the indices of the rows I want to return for each column of my matrix.
the results I expect should be r=[4,8,3], but I can not find out how to get this result using numpy.
By applying the vector to the index, for each columns I get this : m[v,[0,1,2]] = [4, 8, 3], which is roughly my quest.
To prevent hardcoding the columns, I'm using np.arange(m.shape[1]) and the my final formula looks like r=m[v,np.arange(m.shape[1])]
This sounds weird to me and a little complicated for something that should be quite common.
Is there a clean way to get such result ?
In [157]: m = np.array([[1,2,3],[4,5,6],[7,8,9]]);v=np.array([1,2,0])
In [158]: m
Out[158]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [159]: v
Out[159]: array([1, 2, 0])
In [160]: m[v,np.arange(3)]
Out[160]: array([4, 8, 3])
We are choosing 3 elements, with indices (1,0),(2,1),(0,2).
Closer to the MATLAB approach:
In [162]: np.ravel_multi_index((v,np.arange(3)),(3,3))
Out[162]: array([3, 7, 2])
In [163]: m.flat[_]
Out[163]: array([4, 8, 3])
Octave/MATLAB equivalent
>> m = [1 2 3;4 5 6;7 8 9];
>> v = [2 3 1]
v =
2 3 1
>> m = [1 2 3;4 5 6;7 8 9];
>> v = [2 3 1];
>> sub2ind([3,3],v,[1 2 3])
ans =
2 6 7
>> m(sub2ind([3,3],v,[1 2 3]))
ans =
4 8 3
The same broadcasting is used to access a block, as illustrated in this recent question:
Is there a way in Python to get a sub matrix as in Matlab?
Well, this 'weird/complicated' thing is actually mentioned as a "straight forward" scenario, in the documentation of Integer array andexing, which is a sub-topic under the broader topic of "Advanced Indexing".
To quote some extract:
When the index consists of as many integer arrays as the array being
indexed has dimensions, the indexing is straight forward, but
different from slicing. Advanced indexes always are broadcast and iterated as one. Note that the result shape is identical to the (broadcast) indexing array shapes
Blockquote
If it makes it seem any less complicated/weird, you could use range(m.shape[1]) instead of np.arange(m.shape[1]). It just needs to be any array or array-like structure.
Visualization / Intuition:
When I was learning this (integer array indexing), it helped me to visualize things in the following way:
I visualized the indexing arrays standing side-by-side, all having exactly the same shape (perhaps as a consequence of getting broadcasted together). I also visualized the result array, which also has the same shape as the indexing arrays. In each of these indexing arrays and the result array, I visualized a monkey, capable of doing a walk-through of its own array, hopping to successive elements of its own array. Note that, in general, this identical shape of the indexing arrays and the result array, can be n-dimensional, and this identical shape can be very different from the shape of the source array whose values are actually being indexed.
In your own example, the source array m has shape (3,3), and the indexing arrays and the result array each have a shape of (3,).
Inn your example, there is a monkey in each of those three arrays (the two indexing arrays and the result array). We then visualize the monkeys doing a walk-through of their respective array elements in tandem. Here, "in tandem" means all the three monkeys start at the first element of their respective arrays, and whenever a monkey hops to the next element of its own array, the other monkeys in the other arrays also hop to the next element in their respective arrays. As it hops to each successive element, the monkey in each indexing array calls out the value of the element it has just visited. So the two monkeys in the two indexing arrays read out the values they've just visited, in their respective indexing arrays. The monkey in the result array also hops in tandem with the monkeys in the indexing arrays. It hears the values being called out by the monkeys in the indexing arrays, uses those values as indices into the source array m, and thus determines the value to be picked from source array m. The monkey in the result array picks up this value from the source array m, and stores it the value in the result array, at the location it has just hopped to. Thus, for example, when all the three monkeys are in the second element of their respective arrays, the second position of the result array would get its value determined.
As stated by the numpy documentation, I think the way you mentioned is the standard way to do this task:
Example
From each row, a specific element should be selected. The row index is just [0, 1, 2] and the column index specifies the element to choose for the corresponding row, here [0, 1, 0]. Using both together the task can be solved using advanced indexing:
x = np.array([[1, 2], [3, 4], [5, 6]])
x[[0, 1, 2], [0, 1, 0]]
I have a 2D numpy array and need to update a selection of elements via multiple layers of indexing. The obvious way to do this for me does not work since it seems numpy is only updating a copy of the array and not the array itself:
import numpy as np
# Create an array and indices that should be updated
arr = np.arange(9).reshape(3,3)
idx = np.array([[0,2], [1,1],[2,0]])
bool_idx = np.array([True, True, False])
# This line does not work as intended since the original array stays unchanged
arr[idx[:,0],idx[:,1]][bool_idx] = -1 * arr[idx[:,0],idx[:,1]][bool_idx]
This is the resulting output:
>>> arr
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
However, I expected this output:
>>> arr
array([[0, 1, -2],
[3, -4, 5],
[6, 7, 8]])
We need to mask the indices with the given mask and then index into arr and assign new values. For indexing, we can use tuple(masked_indices) to index or use the two columns of the index-array for integer-indexing, thus giving us two methods.
Method #1 :
arr[tuple(idx[bool_idx].T)] *= -1
Method #2 :
idx_masked = idx[bool_idx]
arr[idx_masked[:,0],idx_masked[:,1]] *= -1
Why didn't the original method work?
On LHS you were doing arr[idx[:,0],idx[:,1]][bool_idx], which is esssentially two steps : arr[idx[:,0],idx[:,1]], which under the hoods calls arr.__getitem__(indexer)*. When indexer is a slice, the regularity of the elements allows NumPy to return a view (by modifying the strides and offset). When indexer is an arbitrary boolean mask or arbitrary array of integers, there is in general no regularity to the elements selected, so there is no way to return a view. Let's call arr[idx[:,0],idx[:,1]] as arr2.
In the next step, with the combined arr[idx[:,0],idx[:,1]][bool_idx], i.e. arr2[bool_idx], under the hoods it calls arr2.__setitem__(mask), which is implemented to modify arr2 and as such doesn't propagate back to arr.
*Inspiration from - https://stackoverflow.com/a/38768993/.
More info on __getitem__,__setitem__.
Why did the methods posted in this post work?
Because both directly used the indexer on arr with arr.__setitem__(indexer) that modifies arr.
You just need to make a small change to your own attempt -- you need to apply the boolean index array on each of your integer index expressions. In other words, this should work:
arr[idx[:,0][bool_idx],idx[:,1][bool_idx]] *= -1
(I've just moved the [bool_idx] inside the square brackets, to apply it on the both of the integer index expressions -- idx[:,0] and idx[:,1])
I would like to be able to convert arrays, such as
a = np.array([[1,2], [3,4]])
into the same array BUT each element as a 1-element array instead of a number.
The desired output would be:
np.array([[np.array([1]), np.array([2])], [np.array([3]), np.array([4])]])
The operation you describe is very rarely useful. More likely, it would be a better idea to add an extra dimension of length 1 to the end of your array:
a = a[..., np.newaxis]
# or
a = a.reshape(a.shape + (1,))
Then a[0, 1] will be a 1D array, but all the nice NumPy features like broadcasting and ufuncs will work right. Note that this creates a view of the original array; if you need an independent copy, you can call the copy() method.
If you actually want a 2D array whose elements are 1D arrays, NumPy doesn't make that easy for you. (It's almost never a good way to organize your data, so there isn't much reason for the NumPy developers to provide an easy way to do it.) Most of the things you might expect to create such an array will instead create a 3D array. The most straightforward way to do it I know of is to create an empty array of object dtype and fill the cells one by one, using ordinary Python loops:
b = numpy.empty(a.shape, dtype=object)
for i in range(a.shape[0]):
for j in range(a.shape[1]):
b[i, j] = numpy.array([a[i, j]])
np.array([ [ [x] for x in row ] for row in [[1,2], [3,4]] ])
You are fighting np.arrays attempt to create a high dimensional array.
For example, your desired expression, when entered, produces a (2,2,1) array:
a2 = np.array([[np.array([1]), np.array([2])], [np.array([3]), np.array([4])]])
In [172]: a2
Out[172]:
array([[[1],
[2]],
[[3],
[4]]])
Which might actually be what you want (despite print appearances), since
In [180]: a2[0,0]
Out[180]: array([1])
Often the best way to fight the tendency of numpy to make a higher nd array, is to initialize it as empty with the right shape and dtype, and then fill it:
In [183]: a1=np.empty(a.shape,dtype=object)
In [184]: a1.flat=[np.array(x) for x in a.flatten()]
In [185]: a1
Out[185]:
array([[array(1), array(2)],
[array(3), array(4)]], dtype=object)
It has the right shape (2,2), and dtype. The elements are arrays - but their dimensions are 'wrong', () instead of (1,). flat and flatten are handy tools for iterating over nd arrays as though they were 1d.
a1.flat=[np.array([x]) for x in a.flatten()] does not work - it's an array of ints like a, but with object dtype.
A more explicit iteration does work:
In [214]: a1=np.empty(a.shape,dtype=object)
In [215]: for i in np.ndindex(2,2):
a1[i]=np.array([a[i]])
In [216]: a1
Out[216]:
array([[array([1]), array([2])],
[array([3]), array([4])]], dtype=object)
I have a numpy array which looks like:
myArray = np.array([[1,2],[3]])
But I can not flatten it,
In: myArray.flatten()
Out: array([[1, 2], [3]], dtype=object)
If I change the array to the same length in the second axis, then I can flatten it.
In: myArray2 = np.array([[1,2],[3,4]])
In: myArray2.flatten()
Out: array([1, 2, 3, 4])
My Question is:
Can I use some thing like myArray.flatten() regardless the dimension of the array and the length of its elements, and get the output: array([1,2,3])?
myArray is a 1-dimensional array of objects. Your list objects will simply remain in the same order with flatten() or ravel(). You can use hstack to stack the arrays in sequence horizontally:
>>> np.hstack(myArray)
array([1, 2, 3])
Note that this is basically equivalent to using concatenate with an axis of 1 (this should make sense intuitively):
>>> np.concatenate(myArray, axis=1)
array([1, 2, 3])
If you don't have this issue however and can merge the items, it is always preferable to use flatten() or ravel() for performance:
In [1]: u = timeit.Timer('np.hstack(np.array([[1,2],[3,4]]))'\
....: , setup = 'import numpy as np')
In [2]: print u.timeit()
11.0124390125
In [3]: u = timeit.Timer('np.array([[1,2],[3,4]]).flatten()'\
....: , setup = 'import numpy as np')
In [4]: print u.timeit()
3.05757689476
Iluengo's answer also has you covered for further information as to why you cannot use flatten() or ravel() given your array type.
Well, I agree with the other answers when they say that hstack or concatenate do the job in this case. However, I would like to point that even if it 'fixes' the problem, the problem is not addressed properly.
The problem is that even if it looks like the second axis has different length, this is not true in practice. If you try:
>>> myArray.shape
(2,)
>>> myArray.dtype
dtype('O') # stands for Object
>>> myArray[0]
[1, 2]
It shows you that your array is not a 2D array with variable size (as you might think), it is just a 1D array of objects. In your case, the elements are list, being the first element of your array a 2-element list and the second element of the array is a 1-element list.
So, flatten and ravel won't work because transforming 1D array to a 1D array results in exactly the same 1D array. If you have a object numpy array, it won't care about what you put inside, it will treat individual items as unkown items and can't decide how to merge them.
What you should have in consideration, is if this is the behaviour you want for your application. Numpy arrays are specially efficient with fixed-size numeric matrices. If you are playing with arrays of objects, I don't see why would you like to use Numpy instead of regular python lists.
np.hstack works in this case
In [69]: np.hstack(myArray)
Out[69]: array([1, 2, 3])
I have defined 2 numpy array 2,3 and horizontally concatenate them
a=numpy.array([[1,2,3],[4,5,6]])
b=numpy.array([[7,8,9],[10,11,12]])
C=numpy.concatenate((a,b),axis=0)
c becomes 4,3 matrix
Now I tried same thing with 1,3 list as
a=numpy.array([1,2,3])
b=numpy.array([4,5,6])
c=numpy.concatenate((a,b),axis=0)
Now I was expecting 2,3 matrix but instead I have 1,6. I understand that vstack etc will work but I am curious as to why this is happening? And what I am doing wrong with numpy.concatenate?
Thanks for the reply. I can get the result as suggested by having 1,3 array and then concatenation. But logic is I have to add rows to an empty matrix at each iteration. I tried append as Suggested:
testing=[]
for i in range(3):
testing=testing.append([1,2,3])
It gave error testing doesnot have attribute append as its of None Type. Further If I use logic of 1,3 array using np.array([[1,2,3]]) how can i do this inside for loop?
You didn't do anything wrong. numpy.concatenate join a sequence of arrays together.which means it create an integrated array from the current array's element which in a 2D array the elements are nested lists and in a 1D array the elements are variables.
So this is not the concatenate's job, as you said you can use np.vstack :
>>> c=numpy.vstack((a,b))
>>> c
array([[1, 2, 3],
[4, 5, 6]])
Also in your code list.append appends and element in-place to a list you can not assign it to a variable.instead you can just append to testing in each iteration.
testing=[]
for i in range(3):
testing.append([1,2,3])
also as a more efficient way you can create that list using a list comprehension list following :
testing=[[1,2,3] for _ in xrange(3)]
This is happening because you are concatenating along axis=0.
In your first example:
a=numpy.array([[1,2,3],[4,5,6]]) # 2 elements in 0th dimension
b=numpy.array([[7,8,9],[10,11,12]]) # 2 elements in 0th dimension
C=numpy.concatenate((a,b),axis=0) # 4 elements in 0th dimension
In your second example:
a=numpy.array([1,2,3]) # 3 elements in 0th dimension
b=numpy.array([4,5,6]) # 3 elements in 0th dimension
c=numpy.concatenate((a,b),axis=0) # 6 elements in 0th dimension
Edit:
Note that in your second example, you only have one dimensional array.
In [35]: a=numpy.array([1,2,3])
In [36]: a.shape
Out[36]: (3,)
If the shape of the arrays was (1,3) you would get your expected result:
In [43]: a2=numpy.array([[1,2,3]])
In [44]: b2=numpy.array([[4,5,6]])
In [45]: numpy.concatenate((a2,b2), axis=0)
Out[45]:
array([[1, 2, 3],
[4, 5, 6]])