Issue with numpy.concatenate - python

I have defined two NumPy arrays of shape (2, 3) and concatenated them along axis 0:
a=numpy.array([[1,2,3],[4,5,6]])
b=numpy.array([[7,8,9],[10,11,12]])
C=numpy.concatenate((a,b),axis=0)
C becomes a (4, 3) matrix.
Now I tried the same thing with what I thought of as (1, 3) arrays:
a=numpy.array([1,2,3])
b=numpy.array([4,5,6])
c=numpy.concatenate((a,b),axis=0)
Now I was expecting a (2, 3) matrix, but instead I get a length-6 array. I understand that vstack etc. will work, but I am curious why this happens, and what am I doing wrong with numpy.concatenate?
Thanks for the reply. I can get the result as suggested by using a (1, 3) array and then concatenating. But my actual logic requires adding rows to an empty matrix at each iteration. I tried append as suggested:
testing=[]
for i in range(3):
    testing=testing.append([1,2,3])
It gave an error that testing does not have attribute append, since it is of NoneType. Further, if I use the (1, 3)-array approach with np.array([[1,2,3]]), how can I do this inside a for loop?

You didn't do anything wrong. numpy.concatenate joins a sequence of arrays along an existing axis, which means it builds one array out of the input arrays' elements; in a 2D array those elements are rows (nested lists), while in a 1D array they are individual scalars.
So this is not concatenate's job; as you said, you can use np.vstack:
>>> c=numpy.vstack((a,b))
>>> c
array([[1, 2, 3],
       [4, 5, 6]])
Also, in your code, list.append appends an element to the list in-place and returns None, so you should not assign its result back to a variable; instead, just call append on testing in each iteration.
testing=[]
for i in range(3):
    testing.append([1,2,3])
As a more concise alternative, you can create that list using a list comprehension:
testing=[[1,2,3] for _ in xrange(3)]
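If the goal (as in the follow-up) is to grow a matrix row by row inside a loop, one common pattern is to collect the rows in a plain Python list and convert to an array once at the end; the following is a minimal sketch of that idea, plus the np.vstack alternative:
import numpy as np

rows = []                         # plain Python list of rows
for i in range(3):
    rows.append([1, 2, 3])        # append each new row
result = np.array(rows)           # convert once, after the loop; shape (3, 3)

# alternatively, stack (1, 3) arrays as you go (copies the data on every iteration)
m = np.empty((0, 3))
for i in range(3):
    m = np.vstack((m, np.array([[1, 2, 3]])))   # m.shape ends up as (3, 3)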

This is happening because you are concatenating along axis=0.
In your first example:
a=numpy.array([[1,2,3],[4,5,6]]) # 2 elements in 0th dimension
b=numpy.array([[7,8,9],[10,11,12]]) # 2 elements in 0th dimension
C=numpy.concatenate((a,b),axis=0) # 4 elements in 0th dimension
In your second example:
a=numpy.array([1,2,3]) # 3 elements in 0th dimension
b=numpy.array([4,5,6]) # 3 elements in 0th dimension
c=numpy.concatenate((a,b),axis=0) # 6 elements in 0th dimension
Edit:
Note that in your second example, you only have one-dimensional arrays.
In [35]: a=numpy.array([1,2,3])
In [36]: a.shape
Out[36]: (3,)
If the shape of the arrays was (1,3) you would get your expected result:
In [43]: a2=numpy.array([[1,2,3]])
In [44]: b2=numpy.array([[4,5,6]])
In [45]: numpy.concatenate((a2,b2), axis=0)
Out[45]:
array([[1, 2, 3],
       [4, 5, 6]])
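As a further note (my own addition, not part of the original answer), if you are starting from 1-D arrays you can promote them to shape (1, 3) before concatenating, for example with np.atleast_2d or by adding a new leading axis:
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# promote each 1-D array to shape (1, 3), then concatenate along axis 0
c = np.concatenate((np.atleast_2d(a), np.atleast_2d(b)), axis=0)   # shape (2, 3)

# equivalent: insert a new leading axis with None / np.newaxis
c2 = np.concatenate((a[None, :], b[None, :]), axis=0)              # shape (2, 3)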


Get a vector from a matrix and a vector of indices in numpy

I have a matrix m = [[1,2,3],[4,5,6],[7,8,9]] and a vector v=[1,2,0] that contains the indices of the rows I want to return for each column of my matrix.
The result I expect is r=[4,8,3], but I cannot figure out how to get it using numpy.
By using the vector as a row index for each column, I get m[v,[0,1,2]] = [4, 8, 3], which is roughly what I am after.
To avoid hardcoding the columns, I'm using np.arange(m.shape[1]), and my final expression looks like r=m[v,np.arange(m.shape[1])].
This seems weird to me and a little complicated for something that should be quite common.
Is there a cleaner way to get such a result?
In [157]: m = np.array([[1,2,3],[4,5,6],[7,8,9]]);v=np.array([1,2,0])
In [158]: m
Out[158]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
In [159]: v
Out[159]: array([1, 2, 0])
In [160]: m[v,np.arange(3)]
Out[160]: array([4, 8, 3])
We are choosing 3 elements, with indices (1,0),(2,1),(0,2).
Closer to the MATLAB approach:
In [162]: np.ravel_multi_index((v,np.arange(3)),(3,3))
Out[162]: array([3, 7, 2])
In [163]: m.flat[_]
Out[163]: array([4, 8, 3])
Octave/MATLAB equivalent
>> m = [1 2 3;4 5 6;7 8 9];
>> v = [2 3 1]
v =
2 3 1
>> sub2ind([3,3],v,[1 2 3])
ans =
2 6 7
>> m(sub2ind([3,3],v,[1 2 3]))
ans =
4 8 3
The same broadcasting is used to access a block, as illustrated in this recent question:
Is there a way in Python to get a sub matrix as in Matlab?
Well, this 'weird/complicated' thing is actually mentioned as a "straight forward" scenario in the documentation of integer array indexing, which is a sub-topic under the broader topic of "Advanced Indexing".
To quote some extract:
When the index consists of as many integer arrays as the array being indexed has dimensions, the indexing is straight forward, but different from slicing. Advanced indexes always are broadcast and iterated as one. Note that the result shape is identical to the (broadcast) indexing array shapes.
If it makes it seem any less complicated/weird, you could use range(m.shape[1]) instead of np.arange(m.shape[1]). It just needs to be any array or array-like structure.
Visualization / Intuition:
When I was learning this (integer array indexing), it helped me to visualize things in the following way:
I visualized the indexing arrays standing side-by-side, all having exactly the same shape (perhaps as a consequence of getting broadcasted together). I also visualized the result array, which also has the same shape as the indexing arrays. In each of these indexing arrays and the result array, I visualized a monkey, capable of doing a walk-through of its own array, hopping to successive elements of its own array. Note that, in general, this identical shape of the indexing arrays and the result array, can be n-dimensional, and this identical shape can be very different from the shape of the source array whose values are actually being indexed.
In your own example, the source array m has shape (3,3), and the indexing arrays and the result array each have a shape of (3,).
In your example, there is a monkey in each of those three arrays (the two indexing arrays and the result array). We then visualize the monkeys doing a walk-through of their respective array elements in tandem. Here, "in tandem" means all three monkeys start at the first element of their respective arrays, and whenever a monkey hops to the next element of its own array, the other monkeys also hop to the next element in theirs. As it hops to each successive element, the monkey in each indexing array calls out the value of the element it has just visited. So the two monkeys in the two indexing arrays read out the values they've just visited in their respective indexing arrays. The monkey in the result array also hops in tandem with the monkeys in the indexing arrays. It hears the values being called out by the monkeys in the indexing arrays, uses those values as indices into the source array m, and thus determines the value to be picked from m. The monkey in the result array picks up this value from the source array m and stores it in the result array, at the location it has just hopped to. Thus, for example, when all three monkeys are at the second element of their respective arrays, the second position of the result array gets its value determined.
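To make this picture concrete (a small illustrative sketch of my own, not from the original answer), the indexing arrays are broadcast to a common shape, and the result has that same shape:
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

rows = np.array([1, 2, 0])      # shape (3,)
cols = np.arange(3)             # shape (3,)

# the indexing arrays are walked "in tandem": (1,0), (2,1), (0,2)
print(m[rows, cols])            # [4 8 3], shape (3,)

# with a (2, 3) row-index array, cols is broadcast to (2, 3) and so is the result
rows2 = np.array([[0, 1, 2],
                  [2, 1, 0]])
print(m[rows2, cols].shape)     # (2, 3)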
As stated by the numpy documentation, I think the way you mentioned is the standard way to do this task:
Example
From each row, a specific element should be selected. The row index is just [0, 1, 2] and the column index specifies the element to choose for the corresponding row, here [0, 1, 0]. Using both together the task can be solved using advanced indexing:
x = np.array([[1, 2], [3, 4], [5, 6]])
x[[0, 1, 2], [0, 1, 0]]
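For completeness (the output is not shown above), this selects the elements at positions (0,0), (1,1) and (2,0):
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])
print(x[[0, 1, 2], [0, 1, 0]])    # [1 4 5]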

What purpose does [0] serve in numpy.where(y_euler<0.0)[0]

Also what is the difference between this:
idx_negative_euler = numpy.where(y_euler<0.0)[0]
and this:
idx_negative_euler = numpy.where(y_euler<0.0)[0][0]
I realize that this returns an array of indices where the array y_euler is negative; however, I simply can't figure out what the [0] or the [0][0] at the end of the line is supposed to do.
I couldn't find any documentation regarding this (I'm not even sure what to search for). I've already looked into the numpy.where documentation but that didn't help.
[0] means "get the first item of the sequence." For example if you had this list:
x = [5, 7, 9]
Then x[0] would be the first item of that sequence: 5.
numpy.where() returns a sequence. Putting [0] on the end of that expression gets the first item in that sequence.
[0][0] means "get the first item in the sequence (which is itself also a sequence), and then get the first item in that sequence". So if numpy.where() returned a list of lists, [0][0] would get the first item in the first list.
Make a simple 1d array:
In [60]: x=np.array([0,1,-1,2,-1,0])
np.where returns a tuple of arrays, one for each dimension:
In [61]: np.where(x<0)
Out[61]: (array([2, 4], dtype=int32),)
Pull the first (and here the only) element out of the tuple:
In [62]: np.where(x<0)[0]
Out[62]: array([2, 4], dtype=int32)
Get the first element of that index array:
In [63]: np.where(x<0)[0][0]
Out[63]: 2
The whole tuple returned by where can be used to index the array:
In [64]: x[np.where(x<0)]
Out[64]: array([-1, -1])
x[[2,4]] and x[([2,4],)] do the same indexing.
The usefulness of the tuple value becomes more obvious when working on a 2d or higher-dimensional array. In that case np.where(...)[0] would give the 'rows' index array. But where(...)[0] is most common in the 1d case, where the extra tuple layer usually isn't needed.
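To make the 2d case concrete, here is a small added sketch showing that where returns one index array per axis, and that the whole tuple can be used to index the original array:
import numpy as np

y = np.array([[1, -2, 3],
              [-4, 5, -6]])

idx = np.where(y < 0)
print(idx)        # (array([0, 1, 1]), array([1, 0, 2]))  -- row indices, then column indices
print(idx[0])     # just the row indices: [0 1 1]
print(y[idx])     # the negative values themselves: [-2 -4 -6]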

How can I flatten a 2d numpy array which has different lengths along the second axis?

I have a numpy array which looks like:
myArray = np.array([[1,2],[3]])
But I cannot flatten it:
In: myArray.flatten()
Out: array([[1, 2], [3]], dtype=object)
If I change the array to the same length in the second axis, then I can flatten it.
In: myArray2 = np.array([[1,2],[3,4]])
In: myArray2.flatten()
Out: array([1, 2, 3, 4])
My question is: can I use something like myArray.flatten(), regardless of the dimensions of the array and the lengths of its elements, and get the output array([1,2,3])?
myArray is a 1-dimensional array of objects. Your list objects will simply remain in the same order with flatten() or ravel(). You can use hstack to stack the arrays in sequence horizontally:
>>> np.hstack(myArray)
array([1, 2, 3])
Note that for 1-D object arrays like this, hstack is basically equivalent to using concatenate along the first axis (this should make sense intuitively):
>>> np.concatenate(myArray, axis=0)
array([1, 2, 3])
If you don't have this issue, however, and the rows can be merged into a regular array, it is preferable to use flatten() or ravel() for performance:
In [1]: u = timeit.Timer('np.hstack(np.array([[1,2],[3,4]]))'\
....: , setup = 'import numpy as np')
In [2]: print u.timeit()
11.0124390125
In [3]: u = timeit.Timer('np.array([[1,2],[3,4]]).flatten()'\
....: , setup = 'import numpy as np')
In [4]: print u.timeit()
3.05757689476
Iluengo's answer also has you covered for further information as to why you cannot use flatten() or ravel() given your array type.
Well, I agree with the other answers when they say that hstack or concatenate do the job in this case. However, I would like to point out that even if this 'fixes' the problem, the underlying issue is not being addressed.
The problem is that even though it looks like a 2D array whose rows have different lengths, that is not what you actually have. If you try:
>>> myArray.shape
(2,)
>>> myArray.dtype
dtype('O') # stands for Object
>>> myArray[0]
[1, 2]
It shows you that your array is not a 2D array with variable row size (as you might think); it is just a 1D array of objects. In your case, the elements are lists: the first element of your array is a 2-element list and the second is a 1-element list.
So flatten and ravel won't work, because flattening a 1D array just gives back the same 1D array. With an object array, numpy does not look at what you put inside; it treats the individual items as opaque objects and cannot decide how to merge them.
What you should consider is whether this is the behaviour you want for your application. NumPy arrays are especially efficient with fixed-size numeric matrices. If you are working with arrays of objects, I don't see why you would want to use NumPy instead of regular Python lists.
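If you do decide to keep the data as plain Python lists, flattening is easy without NumPy at all; a small sketch:
from itertools import chain

nested = [[1, 2], [3]]
flat = list(chain.from_iterable(nested))      # [1, 2, 3]

# or with a list comprehension
flat2 = [x for sub in nested for x in sub]    # [1, 2, 3]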
np.hstack works in this case
In [69]: np.hstack(myArray)
Out[69]: array([1, 2, 3])

Slicing n-dimensional numpy array using list of indices

Say I have a 3 dimensional numpy array:
np.random.seed(1145)
A = np.random.random((5,5,5))
and I have two lists of indices corresponding to the 2nd and 3rd dimensions:
second = [1,2]
third = [3,4]
and I want to select the elements in the numpy array corresponding to
A[:][second][third]
so the shape of the sliced array would be (5,2,2) and
A[:][second][third].flatten()
would be equivalent to:
In [226]:
for i in range(5):
    for j in second:
        for k in third:
            print A[i][j][k]
0.556091074129
0.622016249651
0.622530505868
0.914954716368
0.729005532319
0.253214472335
0.892869371179
0.98279375528
0.814240066639
0.986060321906
0.829987410941
0.776715489939
0.404772469431
0.204696635072
0.190891168574
0.869554447412
0.364076117846
0.04760811817
0.440210532601
0.981601369658
Is there a way to slice a numpy array in this way? So far when I try A[:][second][third] I get IndexError: index 3 is out of bounds for axis 0 with size 2 because the [:] for the first dimension seems to be ignored.
Numpy uses multiple indexing, so instead of A[1][2][3], you can--and should--use A[1,2,3].
You might then think you could do A[:, second, third], but the numpy indices are broadcast, and broadcasting second and third (two one-dimensional sequences) ends up being the numpy equivalent of zip, so the result has shape (5, 2).
What you really want is to index with, in effect, the outer product of second and third. You can do this with broadcasting by making one of them, say second, into a two-dimensional array with shape (2,1). Then the shape that results from broadcasting second and third together is (2,2).
For example:
In [8]: import numpy as np
In [9]: a = np.arange(125).reshape(5,5,5)
In [10]: second = [1,2]
In [11]: third = [3,4]
In [12]: s = a[:, np.array(second).reshape(-1,1), third]
In [13]: s.shape
Out[13]: (5, 2, 2)
Note that, in this specific example, the values in second and third are sequential. If that is typical, you can simply use slices:
In [14]: s2 = a[:, 1:3, 3:5]
In [15]: s2.shape
Out[15]: (5, 2, 2)
In [16]: np.all(s == s2)
Out[16]: True
There are a couple of very important differences between those two methods.
The first method would also work with indices that are not equivalent to slices. For example, it would work if second = [0, 2, 3]. (Sometimes you'll see this style of indexing referred to as "fancy indexing".)
In the first method (using broadcasting and "fancy indexing"), the data is a copy of the original array. In the second method (using only slices), the array s2 is a view into the same block of memory used by a. An in-place change in one will change them both.
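A quick way to see the copy-versus-view difference in practice (an illustrative sketch added here, not part of the original answer):
import numpy as np

a = np.arange(125).reshape(5, 5, 5)
s = a[:, np.array([1, 2]).reshape(-1, 1), [3, 4]]    # fancy indexing -> copy
s2 = a[:, 1:3, 3:5]                                  # slicing -> view

a[0, 1, 3] = -999
print(s[0, 0, 0])    # 8: the copy still holds the original value
print(s2[0, 0, 0])   # -999: the view reflects the change to a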
One way would be to use np.ix_:
>>> out = A[np.ix_(range(A.shape[0]),second, third)]
>>> out.shape
(5, 2, 2)
>>> manual = [A[i,j,k] for i in range(5) for j in second for k in third]
>>> (out.ravel() == manual).all()
True
The downside is that you have to specify the missing coordinate ranges explicitly, but you could wrap that into a function, as sketched below.
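For example, a hypothetical helper along these lines (the name and signature are my own, not from the original answer) could fill in the full range for any axis you leave out:
import numpy as np

def ix_select(arr, **axis_indices):
    # Select along named axes (axis0=..., axis1=..., ...) using np.ix_,
    # defaulting to the full range for axes that are not specified.
    indices = [axis_indices.get('axis%d' % d, range(arr.shape[d]))
               for d in range(arr.ndim)]
    return arr[np.ix_(*indices)]

A = np.arange(125).reshape(5, 5, 5)
out = ix_select(A, axis1=[1, 2], axis2=[3, 4])
print(out.shape)    # (5, 2, 2)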
I think there are three problems with your approach:
Both second and third should be slices
Since the 'to' index is exclusive, they should go from 1 to 3 and from 3 to 5
Instead of A[:][second][third], you should use A[:,second,third]
Try this:
>>> np.random.seed(1145)
>>> A = np.random.random((5,5,5))
>>> second = slice(1,3)
>>> third = slice(3,5)
>>> A[:,second,third].shape
(5, 2, 2)
>>> A[:,second,third].flatten()
array([ 0.43285482, 0.80820122, 0.64878266, 0.62689481, 0.01298507,
       0.42112921, 0.23104051, 0.34601169, 0.24838564, 0.66162209,
       0.96115751, 0.07338851, 0.33109539, 0.55168356, 0.33925748,
       0.2353348 , 0.91254398, 0.44692211, 0.60975602, 0.64610556])

Convert row vector to column vector in NumPy

import numpy as np
matrix1 = np.array([[1,2,3],[4,5,6]])
vector1 = matrix1[:,0] # This should have shape (2,1) but actually has (2,)
matrix2 = np.array([[2,3],[5,6]])
np.hstack((vector1, matrix2))
ValueError: all the input arrays must have same number of dimensions
The problem is that when I select the first column of matrix1 and put it in vector1, it gets converted to a row vector, so when I try to concatenate with matrix2, I get a dimension error. I could do this:
np.hstack((vector1.reshape(matrix2.shape[0],1), matrix2))
But this looks too ugly for me to do every time I have to concatenate a matrix and a vector. Is there a simpler way to do this?
The easier way is
vector1 = matrix1[:,0:1]
For the reason, let me refer you to another answer of mine:
When you write something like a[4], that's accessing the fifth element of the array, not giving you a view of some section of the original array. So for instance, if a is an array of numbers, then a[4] will be just a number. If a is a two-dimensional array, i.e. effectively an array of arrays, then a[4] would be a one-dimensional array. Basically, the operation of accessing an array element returns something with a dimensionality of one less than the original array.
Here are three other options:
You can tidy up your solution a bit by allowing the row dimension of the vector to be set implicitly:
np.hstack((vector1.reshape(-1, 1), matrix2))
You can index with np.newaxis (or equivalently, None) to insert a new axis of size 1:
np.hstack((vector1[:, np.newaxis], matrix2))
np.hstack((vector1[:, None], matrix2))
You can use np.matrix, for which indexing a column with an integer always returns a column vector (note, though, that np.matrix is no longer recommended in current NumPy):
matrix1 = np.matrix([[1, 2, 3],[4, 5, 6]])
vector1 = matrix1[:, 0]
matrix2 = np.matrix([[2, 3], [5, 6]])
np.hstack((vector1, matrix2))
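For reference (a check added here, not part of the original answers), the reshape, np.newaxis and slicing approaches all give the same stacked result:
import numpy as np

matrix1 = np.array([[1, 2, 3], [4, 5, 6]])
matrix2 = np.array([[2, 3], [5, 6]])
vector1 = matrix1[:, 0]                       # shape (2,)

a = np.hstack((vector1.reshape(-1, 1), matrix2))
b = np.hstack((vector1[:, None], matrix2))
c = np.hstack((matrix1[:, 0:1], matrix2))     # slicing keeps the column 2-D

print(a.shape)                                          # (2, 3)
print(np.array_equal(a, b) and np.array_equal(b, c))    # True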
Subsetting
The even simpler way is to subset the matrix.
>>> matrix1
[[1 2 3]
 [4 5 6]]
>>> matrix1[:, [0]] # Subsetting
[[1]
 [4]]
>>> matrix1[:, 0] # Indexing
[1 4]
>>> matrix1[:, 0:1] # Slicing
[[1]
 [4]]
I also mentioned this in a similar question.
It works somewhat similarly to a Pandas dataframe. If you index the dataframe, it gives you a Series. If you subset or slice the dataframe, it gives you a dataframe.
Your approach uses indexing, David Z's approach uses slicing, and my approach uses subsetting.
