numpy array that is (n,1) and (n,) - python

What is the difference between a numpy array (lets say X) that has a shape of (N,1) and (N,). Aren't both of them Nx1 matrices ? The reason I ask is because sometimes computations return either one or the other.

This is a 1D array:
>>> np.array([1, 2, 3]).shape
(3,)
This array is a 2D but there is only one element in the first dimension:
>>> np.array([[1, 2, 3]]).shape
(1, 3)
Transposing gives the shape you are asking for:
>>> np.array([[1, 2, 3]]).T.shape
(3, 1)
Now, look at the array. Only the first column of this 2D array is filled.
>>> np.array([[1, 2, 3]]).T
array([[1],
[2],
[3]])
Given these two arrays:
>>> a = np.array([[1, 2, 3]])
>>> b = np.array([[1, 2, 3]]).T
>>> a
array([[1, 2, 3]])
>>> b
array([[1],
[2],
[3]])
You can take advantage of broadcasting:
>>> a * b
array([[1, 2, 3],
[2, 4, 6],
[3, 6, 9]])
The missing numbers are filled in. Think for rows and columns in table or spreadsheet.
>>> a + b
array([[2, 3, 4],
[3, 4, 5],
[4, 5, 6]])
Doing this with higher dimensions gets tougher on your imagination.

Related

Repeat specific row or column of Python numpy 2D array [duplicate]

I'd like to copy a numpy 2D array into a third dimension. For example, given the 2D numpy array:
import numpy as np
arr = np.array([[1, 2], [1, 2]])
# arr.shape = (2, 2)
convert it into a 3D matrix with N such copies in a new dimension. Acting on arr with N=3, the output should be:
new_arr = np.array([[[1, 2], [1,2]],
[[1, 2], [1, 2]],
[[1, 2], [1, 2]]])
# new_arr.shape = (3, 2, 2)
Probably the cleanest way is to use np.repeat:
a = np.array([[1, 2], [1, 2]])
print(a.shape)
# (2, 2)
# indexing with np.newaxis inserts a new 3rd dimension, which we then repeat the
# array along, (you can achieve the same effect by indexing with None, see below)
b = np.repeat(a[:, :, np.newaxis], 3, axis=2)
print(b.shape)
# (2, 2, 3)
print(b[:, :, 0])
# [[1 2]
# [1 2]]
print(b[:, :, 1])
# [[1 2]
# [1 2]]
print(b[:, :, 2])
# [[1 2]
# [1 2]]
Having said that, you can often avoid repeating your arrays altogether by using broadcasting. For example, let's say I wanted to add a (3,) vector:
c = np.array([1, 2, 3])
to a. I could copy the contents of a 3 times in the third dimension, then copy the contents of c twice in both the first and second dimensions, so that both of my arrays were (2, 2, 3), then compute their sum. However, it's much simpler and quicker to do this:
d = a[..., None] + c[None, None, :]
Here, a[..., None] has shape (2, 2, 1) and c[None, None, :] has shape (1, 1, 3)*. When I compute the sum, the result gets 'broadcast' out along the dimensions of size 1, giving me a result of shape (2, 2, 3):
print(d.shape)
# (2, 2, 3)
print(d[..., 0]) # a + c[0]
# [[2 3]
# [2 3]]
print(d[..., 1]) # a + c[1]
# [[3 4]
# [3 4]]
print(d[..., 2]) # a + c[2]
# [[4 5]
# [4 5]]
Broadcasting is a very powerful technique because it avoids the additional overhead involved in creating repeated copies of your input arrays in memory.
* Although I included them for clarity, the None indices into c aren't actually necessary - you could also do a[..., None] + c, i.e. broadcast a (2, 2, 1) array against a (3,) array. This is because if one of the arrays has fewer dimensions than the other then only the trailing dimensions of the two arrays need to be compatible. To give a more complicated example:
a = np.ones((6, 1, 4, 3, 1)) # 6 x 1 x 4 x 3 x 1
b = np.ones((5, 1, 3, 2)) # 5 x 1 x 3 x 2
result = a + b # 6 x 5 x 4 x 3 x 2
Another way is to use numpy.dstack. Supposing that you want to repeat the matrix a num_repeats times:
import numpy as np
b = np.dstack([a]*num_repeats)
The trick is to wrap the matrix a into a list of a single element, then using the * operator to duplicate the elements in this list num_repeats times.
For example, if:
a = np.array([[1, 2], [1, 2]])
num_repeats = 5
This repeats the array of [1 2; 1 2] 5 times in the third dimension. To verify (in IPython):
In [110]: import numpy as np
In [111]: num_repeats = 5
In [112]: a = np.array([[1, 2], [1, 2]])
In [113]: b = np.dstack([a]*num_repeats)
In [114]: b[:,:,0]
Out[114]:
array([[1, 2],
[1, 2]])
In [115]: b[:,:,1]
Out[115]:
array([[1, 2],
[1, 2]])
In [116]: b[:,:,2]
Out[116]:
array([[1, 2],
[1, 2]])
In [117]: b[:,:,3]
Out[117]:
array([[1, 2],
[1, 2]])
In [118]: b[:,:,4]
Out[118]:
array([[1, 2],
[1, 2]])
In [119]: b.shape
Out[119]: (2, 2, 5)
At the end we can see that the shape of the matrix is 2 x 2, with 5 slices in the third dimension.
Use a view and get free runtime! Extend generic n-dim arrays to n+1-dim
Introduced in NumPy 1.10.0, we can leverage numpy.broadcast_to to simply generate a 3D view into the 2D input array. The benefit would be no extra memory overhead and virtually free runtime. This would be essential in cases where the arrays are big and we are okay to work with views. Also, this would work with generic n-dim cases.
I would use the word stack in place of copy, as readers might confuse it with the copying of arrays that creates memory copies.
Stack along first axis
If we want to stack input arr along the first axis, the solution with np.broadcast_to to create 3D view would be -
np.broadcast_to(arr,(3,)+arr.shape) # N = 3 here
Stack along third/last axis
To stack input arr along the third axis, the solution to create 3D view would be -
np.broadcast_to(arr[...,None],arr.shape+(3,))
If we actually need a memory copy, we can always append .copy() there. Hence, the solutions would be -
np.broadcast_to(arr,(3,)+arr.shape).copy()
np.broadcast_to(arr[...,None],arr.shape+(3,)).copy()
Here's how the stacking works for the two cases, shown with their shape information for a sample case -
# Create a sample input array of shape (4,5)
In [55]: arr = np.random.rand(4,5)
# Stack along first axis
In [56]: np.broadcast_to(arr,(3,)+arr.shape).shape
Out[56]: (3, 4, 5)
# Stack along third axis
In [57]: np.broadcast_to(arr[...,None],arr.shape+(3,)).shape
Out[57]: (4, 5, 3)
Same solution(s) would work to extend a n-dim input to n+1-dim view output along the first and last axes. Let's explore some higher dim cases -
3D input case :
In [58]: arr = np.random.rand(4,5,6)
# Stack along first axis
In [59]: np.broadcast_to(arr,(3,)+arr.shape).shape
Out[59]: (3, 4, 5, 6)
# Stack along last axis
In [60]: np.broadcast_to(arr[...,None],arr.shape+(3,)).shape
Out[60]: (4, 5, 6, 3)
4D input case :
In [61]: arr = np.random.rand(4,5,6,7)
# Stack along first axis
In [62]: np.broadcast_to(arr,(3,)+arr.shape).shape
Out[62]: (3, 4, 5, 6, 7)
# Stack along last axis
In [63]: np.broadcast_to(arr[...,None],arr.shape+(3,)).shape
Out[63]: (4, 5, 6, 7, 3)
and so on.
Timings
Let's use a large sample 2D case and get the timings and verify output being a view.
# Sample input array
In [19]: arr = np.random.rand(1000,1000)
Let's prove that the proposed solution is a view indeed. We will use stacking along first axis (results would be very similar for stacking along the third axis) -
In [22]: np.shares_memory(arr, np.broadcast_to(arr,(3,)+arr.shape))
Out[22]: True
Let's get the timings to show that it's virtually free -
In [20]: %timeit np.broadcast_to(arr,(3,)+arr.shape)
100000 loops, best of 3: 3.56 µs per loop
In [21]: %timeit np.broadcast_to(arr,(3000,)+arr.shape)
100000 loops, best of 3: 3.51 µs per loop
Being a view, increasing N from 3 to 3000 changed nothing on timings and both are negligible on timing units. Hence, efficient both on memory and performance!
This can now also be achived using np.tile as follows:
import numpy as np
a = np.array([[1,2],[1,2]])
b = np.tile(a,(3, 1,1))
b.shape
(3,2,2)
b
array([[[1, 2],
[1, 2]],
[[1, 2],
[1, 2]],
[[1, 2],
[1, 2]]])
A=np.array([[1,2],[3,4]])
B=np.asarray([A]*N)
Edit #Mr.F, to preserve dimension order:
B=B.T
Here's a broadcasting example that does exactly what was requested.
a = np.array([[1, 2], [1, 2]])
a=a[:,:,None]
b=np.array([1]*5)[None,None,:]
Then b*a is the desired result and (b*a)[:,:,0] produces array([[1, 2],[1, 2]]), which is the original a, as does (b*a)[:,:,1], etc.
Summarizing the solutions above:
a = np.arange(9).reshape(3,-1)
b = np.repeat(a[:, :, np.newaxis], 5, axis=2)
c = np.dstack([a]*5)
d = np.tile(a, [5,1,1])
e = np.array([a]*5)
f = np.repeat(a[np.newaxis, :, :], 5, axis=0) # np.repeat again
print('b='+ str(b.shape), b[:,:,-1].tolist())
print('c='+ str(c.shape),c[:,:,-1].tolist())
print('d='+ str(d.shape),d[-1,:,:].tolist())
print('e='+ str(e.shape),e[-1,:,:].tolist())
print('f='+ str(f.shape),f[-1,:,:].tolist())
b=(3, 3, 5) [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
c=(3, 3, 5) [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
d=(5, 3, 3) [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
e=(5, 3, 3) [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
f=(5, 3, 3) [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
Good luck

Create numpy array within numpy array

I want to create a numpy array within a numpy array. If i do it with normal python its something like
a = [[1,2], [3,4]]
a[0][1] = [1,1,1]
print a
The result is [[1, [1, 1, 1]], [3, 4]]
How can I achieve the same using numpy arrays? The code I have is:
a = np.array([(1, 2, 3),(4, 5, 6)])
b = np.array([1,1,1])
a[0][1] = b
a as created is dtype int. Each element can only be another integer:
In [758]: a = np.array([(1, 2, 3),(4, 5, 6)])
...: b = np.array([1,1,1])
...:
In [759]: a
Out[759]:
array([[1, 2, 3],
[4, 5, 6]])
In [760]: b
Out[760]: array([1, 1, 1])
In [761]: a[0,1]=b
...
ValueError: setting an array element with a sequence.
You can make another dtype of array, one that holds pointers to objects, much as list does:
In [762]: aO = a.astype(object)
In [763]: aO
Out[763]:
array([[1, 2, 3],
[4, 5, 6]], dtype=object)
Now it is possible to replace one of those element pointers with a pointer to b array:
In [765]: aO[0,1]=b
In [766]: aO
Out[766]:
array([[1, array([1, 1, 1]), 3],
[4, 5, 6]], dtype=object)
But as asked in the comments - why do you want/need to do this? What are you going to do with such an array? It is possible to do some numpy math on such an array, but as shown in some recent SO questions, it is hit-or-miss. It is also slower.
As far as I know, you cannot do this. Numpy arrays cannot have entries of varying shape. Your request to make an array like [[1, [1, 1, 1]], [3, 4]] is impossible. However, you could make a numpy matrix of dimensions (3x2x3) to get
[
[
[1,0,0],
[1,1,1],
[0,0,0],
]
[
[3,0,0],
[4,0,0],
[0,0,0]
]
]
Your only option is to pad empty elements with some number (I used 0s above) or use another data structure.

numpy array 1.9.2 getting ValueError: could not broadcast input array from shape (4,2) into shape (4)

Following piece of code was working in numpy 1.7.1 but it is giving value error in the current version. I want to know the root cause of it.
import numpy as np
x = [1,2,3,4]
y = [[1, 2],[2, 3], [1, 2],[2, 3]]
a = np.array([x, np.array(y)])
Following is the output I get in numpy 1.7.1
>>>a
array([[1, 2, 3, 4],
[array([1, 2]), array([2, 3]), array([1, 2]), array([2, 3])]], dtype=object)
But the same code produces error in version 1.9.2.
----> 5 a = np.array([x, np.array(y)])
ValueError: could not broadcast input array from shape (4,2) into shape (4)
I have found one possible solution the this. But I don't know whether this is the best thing to do.
b= np.empty(2, dtype=object)
b[:] = [x, np.array(y)]
>>> b
array([[1, 2, 3, 4],
array([[1, 2],
[2, 3],
[1, 2],
[2, 3]])], dtype=object)
Please suggest a solution to achieve the desired output. Thanks
What exactly are you trying to produce? I don't have a 1.7 version to test your example.
np.array(x) produces a (4,) array. np.array(y) a (4,2).
As noted in a comment, in 1.8.1 np.array([x, np.array(y)]) produces
ValueError: setting an array element with a sequence.
I can make a object dtype array, consisting of the list and the array
In [90]: np.array([x, np.array(y)],dtype=object)
Out[90]:
array([[1, 2, 3, 4],
[array([1, 2]), array([2, 3]), array([1, 2]), array([2, 3])]], dtype=object)
I can also concatenate 2 arrays to make a (4,3) array (x is the first column)
In [92]: np.concatenate([np.array(x)[:,None],np.array(y)],axis=1)
Out[92]:
array([[1, 1, 2],
[2, 2, 3],
[3, 1, 2],
[4, 2, 3]])
np.column_stack([x,y]) does the same thing.
Curiously in a dev 1.9 (I don't have production 1.9.2 installed) it works (sort of)
In [9]: np.__version__
Out[9]: '1.9.0.dev-Unknown'
In [10]: np.array([x,np.array(y)])
Out[10]:
array([[ 1, 2, 3, 4],
[174420780, 175084380, 16777603, 0]])
In [11]: np.array([x,np.array(y)],dtype=object)
Out[11]:
array([[1, 2, 3, 4],
[None, None, None, None]], dtype=object)
In [16]: np.array([x,y],dtype=object)
Out[16]:
array([[1, 2, 3, 4],
[[1, 2], [2, 3], [1, 2], [2, 3]]], dtype=object)
So it looks like there is some sort of development going on.
In any case making a new array from this list and a 2d array is ambiguous. Use column_stack (assuming you want a 2d int array).
numpy 1.9.0 release notes:
The performance of converting lists containing arrays to arrays using np.array has been improved. It is now equivalent in speed to np.vstack(list).
With transposed y vstack works:
In [125]: np.vstack([[1,2,3,4],np.array([[1,2],[2,3],[1,2],[2,3]]).T])
Out[125]:
array([[1, 2, 3, 4],
[1, 2, 1, 2],
[2, 3, 2, 3]])
If 1.7.1 worked, and x was string names, not just ints as in your example, then it probably was producing a object array.

Reduce Dimensons when converting list to array

I want to reduce the dimensions of an array after converting it to a list
a = np.array([[1,2],[3,4]])
print a.shape
b = np.array([[1],[3,4]])
print b.shape
Output:
(2, 2)
(2,)
I want a to have the same shape as b i.e. (2,)
>>> a = np.array([[1,2],[3,4], None])[:2]
>>> a
array([[1, 2], [3, 4]], dtype=object)
>>> a.shape
(2,)
Works, though is probably the wrong way to do it (I'm a numpy newb).
Do you understand what b is?
b = np.array([[1],[3,4]])
print(repr(b))
array([[1], [3, 4]], dtype=object)
b is a 1d array with 2 elements, each a list. np.array does this way because the 2 sublists have different length, so it can't create a 2d array.
a = np.array([[1,2],[3,4]])
print(repr(a))
array([[1, 2],
[3, 4]])
Here the 2 sublists have the same length, so it can create a 2d array. Each element is an integer. np.array tries to create the highest dimensional array that the input allows.
Probably the best way to create another array like b is to make a copy, and insert the desired lists.
a1 = b.copy()
a1[0] = [1,2]
# a1[1] = [3,4]
print(repr(a1))
array([[1, 2], [3, 4]], dtype=object)
You have to use this convoluted method because you trying to do something 'unnatural'.
You comment about using vstack. Both work:
In [570]: np.vstack((a,b)) # (3,2) array
Out[570]:
array([[1, 2],
[3, 4],
[[1], [3, 4]]], dtype=object)
In [571]: np.vstack((a1,b)) # (2,2) array
Out[571]:
array([[[1, 2], [3, 4]],
[[1], [3, 4]]], dtype=object)
Your array b is little more than the original list in an array wrapper. Is that really what you need? The 2d a is a normal numpy array. b is an oddball construction.

How do I get a row of a 2d numpy array as 2d array

How do I select a row from a NxM numpy array as an array of size 1xM:
> import numpy
> a = numpy.array([[1,2], [3,4], [5,6]])
> a
array([[1, 2],
[3, 4],
[5, 6]])
> a.shape
(e, 2)
> a[0]
array([1, 2])
> a[0].shape
(2,)
I'd like
a[0].shape == (1,2)
I'm doing this because a library I want to use seems to require this.
If you have something of shape (2,) and you want to add a new axis so that the shape is (1,2), the easiest way is to use np.newaxis:
a = np.array([1,2])
a.shape
#(2,)
b = a[np.newaxis, :]
print b
#array([[1,2]])
b.shape
#(1,2)
If you have something of shape (N,2) and want to slice it with the same dimensionality to get a slice with shape (1,2), then you can use a range of length 1 as your slice instead of one index:
a = np.array([[1,2], [3,4], [5,6]])
a[0:1]
#array([[1, 2]])
a[0:1].shape
#(1,2)
Another trick is that some functions have a keepdims option, for example:
a
#array([[1, 2],
# [3, 4],
# [5, 6]])
a.sum(1)
#array([ 3, 7, 11])
a.sum(1, keepdims=True)
#array([[ 3],
# [ 7],
# [11]])
If you already have it, call .reshape():
>>> a = numpy.array([[1, 2], [3, 4]])
>>> b = a[0]
>>> c = b.reshape((1, -1))
>>> c
array([[1, 2]])
>>> c.shape
(1, 2)
You can also use a range to keep the array two-dimensional in the first place:
>>> b = a[0:1]
>>> b
array([[1, 2]])
>>> b.shape
(1, 2)
Note that all of these will have the same backing store.

Categories

Resources