I have a NumPy matrix and I am looping through every row with a for loop, and I would like to find the first non-zero value in each row.
I already found a way on here to find the first non-zero value, but it requires a list as its argument:
for row in matrix:
val = next((i for i, x in enumerate(row) if x), None)
This always returned 0 for val.
I've also tried converting the row to a list before calculating 'val':
rowList = row.tolist()
But this also returned the same value
When I print either value, the output contains two brackets around the list; maybe this has an effect?
i.e.
[[0, 0, 1, 2, 3]]
This occurs even after I've converted the row to a list
Is there any way I can convert each row to a list so I can then find the index of the first non-zero value, or is there a simpler way to do this?
Your next expression works:
In [793]: [next((i for i,x in enumerate(row) if x),None) for row in np.eye(10)]
Out[793]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
OK, that gives the index of the first nonzero, which in my sample case is more interesting than the value 1 itself.
In [801]: [row.nonzero()[0][0] for row in np.eye(10)]
Out[801]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
But if the array has a row of all 0s, as in
arr = np.diag(np.arange(0, 20, 2))
the nonzero version raises an IndexError. It needs to handle the case where nonzero returns an empty array.
To get values from the idx list use
arr[np.arange(len(idx)), idx]
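A minimal sketch putting the pieces together, guarding the all-zeros case (the None sentinel and the variable names are my choices):
import numpy as np

arr = np.diag(np.arange(0, 20, 2))  # row 0 is all zeros
idx = []
for row in arr:
    nz = row.nonzero()[0]
    idx.append(nz[0] if nz.size else None)
# gather values only for the rows that have a nonzero entry
rows = [i for i, j in enumerate(idx) if j is not None]
vals = arr[rows, [idx[i] for i in rows]]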
timings
For a large diagonal array, the nonzero approach is substantially faster:
In [822]: arr =np.diag(np.arange(1,2000,2))
In [823]: timeit idx = [next((i for i,x in enumerate(row) if x),None) for row in arr]
10 loops, best of 3: 87.6 ms per loop
In [824]: timeit [row.nonzero()[0][0] for row in arr]
100 loops, best of 3: 6.44 ms per loop
For a same-size array with all the 1s early in each row, the next approach is somewhat faster.
In [825]: arr = np.zeros_like(arr,int)
In [826]: arr[:,10]=1
In [827]: timeit idx = [next((i for i,x in enumerate(row) if x),None) for row in arr]
100 loops, best of 3: 3.61 ms per loop
In [828]: timeit [row.nonzero()[0][0] for row in arr]
100 loops, best of 3: 6.41 ms per loop
There's a trade-off between short-circuiting the loop in Python versus full looping in compiled C code.
argmax is another way of finding the first nonzero index in each row:
idx = np.argmax(arr>0, axis=1)
With an axis parameter, argmax has to iterate by row and then within the row, but it does so in compiled code. With a boolean argument like this, argmax does short-circuit. (Note that argmax returns 0 for an all-False row, so an all-zero row looks the same as a row whose first nonzero is at index 0.) I've explored this in another question about argmax (or argmin) and nan values, which also short-circuit.
https://stackoverflow.com/a/41324751/901925
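Since argmax returns 0 for an all-False row, a guarded sketch (the -1 sentinel is my choice, not part of the original answer):
mask = (arr > 0).any(axis=1)       # rows that contain at least one nonzero
idx = np.argmax(arr > 0, axis=1)   # gives 0 for all-zero rows too
idx = np.where(mask, idx, -1)      # mark all-zero rows with -1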
Another possibility (channeling @Divakar?):
def foo(arr):
    # row (I) and column (J) indices of every positive entry, in row-major order
    I, J = np.where(arr > 0)
    # the first occurrence of each row index marks that row's first positive column
    u, i = np.unique(I, return_index=True)
    return J[i]
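A quick usage check (my addition); note that np.where skips all-zero rows, so the result can have fewer entries than the array has rows:
foo(np.eye(4))
# array([0, 1, 2, 3])
foo(np.diag(np.arange(0, 20, 2)))
# array([1, 2, 3, 4, 5, 6, 7, 8, 9])  -- row 0 (all zeros) is absent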
You don't need to "convert a numpy array to a list"; you need a better way of finding the non-zero elements. For that you should use nonzero:
Return the indices of the elements that are non-zero.
For example:
import numpy as np
arr = np.array([0, 0, 9, 2])
print(arr[arr.nonzero()][0])
# 9
Or:
import numpy as np
matrix = np.array([[0, 0, 9, 2], [0, 3, 0, 1]])
for row in matrix:
print(row[row.nonzero()][0])
# 9
# 3
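Note that row[row.nonzero()][0] raises an IndexError on an all-zero row; a guarded variant (a sketch, with None as my choice of sentinel):
for row in matrix:
    nz = row[row.nonzero()]
    print(nz[0] if nz.size else None)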
My guess is that, like many others before you (including myself), you have been tripped up by the np.matrix class.
Slicing instances of this class gives unexpected results:
>>> id = np.identity(4)
>>> type(id)
<class 'numpy.ndarray'>
>>> id[2]
array([ 0., 0., 1., 0.]) # shape == (4,)
>>> id_m = np.matrix(id)
>>> type(id_m)
<class 'numpy.matrixlib.defmatrix.matrix'>
>>> id_m[2]
matrix([[ 0., 0., 1., 0.]]) # shape == (1, 4)
As you suspected this is probably also the reason why your generator trick doesn't work.
Because a row of an np.matrix is itself a nested (1, n) matrix, iterating over it returns the entire row in one go and then stops, which is why the enumerate index is always 0.
If for some reason you are handling a matrix but would prefer it to behave like an array you can use the .A attribute.
>>> id_m.A
array([[ 1., 0., 0., 0.],
[ 0., 1., 0., 0.],
[ 0., 0., 1., 0.],
[ 0., 0., 0., 1.]])
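So, assuming your loop variable really is an np.matrix row, a minimal fix (my sketch) is to flatten it first; the .A1 attribute returns the row as a 1-D ndarray:
for row in matrix:
    val = next((i for i, x in enumerate(row.A1) if x), None)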
One last remark:
Do not convert your rows to lists here! The point of the generator trick you are using is to stop searching as soon as possible. Imagine your rows have 100,000 elements each and every other element is nonzero. The generator will look at the first few and, as soon as it has found the first nonzero (almost certainly within the first 50, say), it will skip the rest of the row (> 99,950 elements). If you convert to a list, you throw this saving away, because producing the equivalent list requires reading every single element. That is also the reason why, in this case, a generator can compete with vectorised numpy functions.
Related
I am trying to find both the location and the value of the minimum element of a sparse matrix for each row. A toy example for the question is given below:
Here, we have a 3x7 sparse matrix "M".
H = np.array([[1, 2, 3, 0, 4, 0, 0],
              [0, 5, 0, 6, 0, 0, 0],
              [0, 0, 0, 7, 0, 0, 8]], dtype=np.float32)
M = scipy.sparse.csr_matrix(H)
Then, what I would like to obtain is the nonzero minimum elements of each row.
For the example above:
min_elements = some_function(M, axis=1)
and receiving the return as min_elements = [1, 5, 7]. The method M.min(axis=1) does not work for my case since the minimum element of each row is zero, and it therefore returns an all-zeros array.
Thus, is there any efficient way of implementing such a function using sparse matrices? In my general case the sparse matrices will be quite large and require lots of additional computation, so performance/speed is the main benchmark for me.
Thank you!
In [333]: from scipy import sparse
In [334]: M = sparse.csr_matrix(H)
In [335]: M
Out[335]:
<3x7 sparse matrix of type '<class 'numpy.float32'>'
with 8 stored elements in Compressed Sparse Row format>
M is stored as:
In [336]: M.indptr
Out[336]: array([0, 4, 6, 8], dtype=int32)
In [337]: M.data
Out[337]: array([1., 2., 3., 4., 5., 6., 7., 8.], dtype=float32)
In [338]: M.indices
Out[338]: array([0, 1, 2, 4, 1, 3, 3, 6], dtype=int32)
We can iterate on the slices defined by indptr, and take the min:
In [340]: for i in range(M.shape[0]):
...: sl = slice(M.indptr[i],M.indptr[i+1])
...: x, y = M.data[sl], M.indices[sl]
...: m = np.argmin(x)
...: print(y[m], x[m])
...:
0 1.0
1 5.0
3 7.0
This can be streamlined a bit, but it gives the basic idea.
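For instance, one streamlined version as a function (a sketch under my own names; rows with no stored entries get -1 and nan as sentinels):
import numpy as np

def row_nonzero_min(M):
    # M: a scipy.sparse CSR matrix
    cols, vals = [], []
    for i in range(M.shape[0]):
        sl = slice(M.indptr[i], M.indptr[i + 1])
        data, inds = M.data[sl], M.indices[sl]
        if data.size:
            m = np.argmin(data)
            cols.append(inds[m])
            vals.append(data[m])
        else:  # row with no stored entries
            cols.append(-1)
            vals.append(np.nan)
    return np.array(cols), np.array(vals)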
It may be easier to picture what's going on in the lil format:
In [341]: Ml = M.tolil()
In [342]: Ml.data
Out[342]:
array([list([1.0, 2.0, 3.0, 4.0]), list([5.0, 6.0]), list([7.0, 8.0])],
dtype=object)
In [343]: Ml.rows
Out[343]: array([list([0, 1, 2, 4]), list([1, 3]), list([3, 6])], dtype=object)
In [344]: for d,r in zip(Ml.data, Ml.rows):
...: m = np.argmin(d)
...: print(r[m], d[m])
...:
0 1.0
1 5.0
3 7.0
Previous SO questions have asked for things like the smallest (or largest) N values by row.
Sparse is best for things that can be expressed as some sort of matrix multiplication. That includes row (or column) sums. Even csr indexing is done with matrix multiplication. Other row-by-row operations aren't as easy.
You could take the reciprocal of all your data and find the maximum. This assumes all your data is positive, as in the example.
M_inv = M.copy()
M_inv.data = 1/M.data
one_over_min_M = M_inv.max(axis=1)
min_M = 1/one_over_min_M.toarray()
On your example I get the output
[[1. ]
[5. ]
[6.9999995]]
There is some horrible numerical error there, but if you're happy to round your answer...
Edit: This approach might be redeemed if you're after the indices and want to do M_inv.argmax(axis=1); otherwise it's probably not the best.
I have the following question. Is there some kind of method in numpy or scipy that I can use to take a given unsorted array like this
a = np.array([0,0,1,1,4,4,4,4,5,1891,7]) #could be any number here
and map the numbers so that there are no gaps between the values and they keep the same order as before?:
[0,0,1,1,2,2,2,2,3,5,4]
EDIT
Is it furthermore possible to swap/shuffle the numbers after the mapping, so that
[0,0,1,1,2,2,2,2,3,5,4]
become something like:
[0,0,3,3,5,5,5,5,4,1,2]
Edit: I'm not sure what the etiquette is here (should this be a separate answer?), but this is actually directly obtainable from np.unique.
>>> u, indices = np.unique(a, return_inverse=True)
>>> indices
array([0, 0, 1, 1, 2, 2, 2, 2, 3, 5, 4])
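For the shuffle asked about in the EDIT, one sketch (my approach; the permutation is random, so output varies per run) is to remap the dense codes through a random permutation:
rng = np.random.default_rng()
perm = rng.permutation(len(u))  # suppose this gives array([0, 3, 5, 4, 2, 1])
shuffled = perm[indices]        # then array([0, 0, 3, 3, 5, 5, 5, 5, 4, 1, 2])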
Original answer: This isn't too hard to do in plain Python by building a dictionary of the index each value of the array would map to:
x = np.unique(a)  # np.unique already returns sorted values, so no extra sort is needed
index_dict = {j: i for i, j in enumerate(x)}
[index_dict[i] for i in a]
Seems you need to rank (dense) your array, in which case use scipy.stats.rankdata:
from scipy.stats import rankdata
rankdata(a, 'dense')-1
# array([ 0., 0., 1., 1., 2., 2., 2., 2., 3., 5., 4.])
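rankdata returns floats; if you want integer codes like the other answers (my note), cast the result:
(rankdata(a, 'dense') - 1).astype(int)
# array([0, 0, 1, 1, 2, 2, 2, 2, 3, 5, 4])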
I have a large 2d numpy array and two 1d arrays that represent x/y indexes within the 2d array. I want to use these 1d arrays to perform an operation on the 2d array.
I can do this with a for loop, but it's very slow when working on a large array. Is there a faster way? I tried using the 1d arrays simply as indexes but that didn't work. See this example:
import numpy as np
# Two example 2d arrays
cnt_a = np.zeros((4,4))
cnt_b = np.zeros((4,4))
# 1d arrays holding x and y indices
xpos = [0,0,1,2,1,2,1,0,0,0,0,1,1,1,2,2,3]
ypos = [3,2,1,1,3,0,1,0,0,1,2,1,2,3,3,2,0]
# This method works, but is very slow for a large array
for i in range(0,len(xpos)):
cnt_a[xpos[i],ypos[i]] = cnt_a[xpos[i],ypos[i]] + 1
# This method is fast, but gives incorrect answer
cnt_b[xpos,ypos] = cnt_b[xpos,ypos]+1
# Print the results
print('Good:')
print(cnt_a)
print('')
print('Bad:')
print(cnt_b)
The output from this is:
Good:
[[ 2. 1. 2. 1.]
[ 0. 3. 1. 2.]
[ 1. 1. 1. 1.]
[ 1. 0. 0. 0.]]
Bad:
[[ 1. 1. 1. 1.]
[ 0. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 0. 0. 0.]]
For the cnt_b array numpy is obviously not summing correctly, but I'm unsure how to fix this without resorting to the (very inefficient) for loop used to calculate cnt_a.
Another approach, using 1D indexing (suggested by @Shai), extended to answer the actual question:
>>> out = np.zeros((4, 4))
>>> idx = np.ravel_multi_index((xpos, ypos), out.shape) # extract 1D indexes
>>> x = np.bincount(idx, minlength=out.size)
>>> out.flat += x
np.bincount counts how many times each index appears in idx and stores the counts in x.
Or, as suggested by @Divakar:
>>> out.flat += np.bincount(idx, minlength=out.size)
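Equivalently (my variation), the counts can be reshaped to the output shape instead of using .flat:
>>> out += np.bincount(idx, minlength=out.size).reshape(out.shape)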
We could compute the linear indices, then accumulate into a zeros-initialized output array with np.add.at. Thus, with xpos and ypos as arrays, here's one implementation -
m,n = xpos.max()+1, ypos.max()+1
out = np.zeros((m,n),dtype=int)
np.add.at(out.ravel(), xpos*n+ypos, 1)
Sample run -
In [95]: # 1d arrays holding x and y indices
...: xpos = np.array([0,0,1,2,1,2,1,0,0,0,0,1,1,1,2,2,3])
...: ypos = np.array([3,2,1,1,3,0,1,0,0,1,2,1,2,3,3,2,0])
...:
In [96]: cnt_a = np.zeros((4,4))
In [97]: # This method works, but is very slow for a large array
...: for i in range(0,len(xpos)):
...: cnt_a[xpos[i],ypos[i]] = cnt_a[xpos[i],ypos[i]] + 1
...:
In [98]: m,n = xpos.max()+1, ypos.max()+1
...: out = np.zeros((m,n),dtype=int)
...: np.add.at(out.ravel(), xpos*n+ypos, 1)
...:
In [99]: cnt_a
Out[99]:
array([[ 2., 1., 2., 1.],
[ 0., 3., 1., 2.],
[ 1., 1., 1., 1.],
[ 1., 0., 0., 0.]])
In [100]: out
Out[100]:
array([[2, 1, 2, 1],
[0, 3, 1, 2],
[1, 1, 1, 1],
[1, 0, 0, 0]])
You can iterate over both lists at once and increment for each pair (if you are not used to it, zip combines lists):
for x, y in zip(xpos, ypos):
cnt_b[x][y] += 1
But this will be about the same speed as your cnt_a loop.
If your lists xpos/ypos are of length n, I don't see how you can update your matrix in less than O(n), since you'll have to check each pair one way or another.
Other solution: you could count the repeated index pairs (e.g. (0, 3), etc.) with collections.Counter and update the matrix with the count values, as sketched below. But I doubt it would be much faster, since the time gained updating the matrix would be lost counting the multiple occurrences.
Maybe I am totally wrong though, in which case I'd be curious too to see a sub-O(n) answer.
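A sketch of that Counter idea (my code, not benchmarked):
from collections import Counter

counts = Counter(zip(xpos, ypos))  # how many times each (x, y) pair occurs
for (x, y), c in counts.items():
    cnt_b[x, y] += c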
I think you are looking for the ravel_multi_index function:
lidx = np.ravel_multi_index((xpos, ypos), cnt_a.shape)
This converts the (x, y) pairs to "flattened" 1D indices into cnt_a and cnt_b:
np.add.at(cnt_b.ravel(), lidx, 1)  # .ravel() returns a view here, so cnt_b is updated in place
I am trying to fill an empty (not np.empty!) array with values using append, but I am getting an error:
My code is as follows:
import numpy as np
result=np.asarray([np.asarray([]),np.asarray([])])
result[0]=np.append([result[0]],[1,2])
And I am getting:
ValueError: could not broadcast input array from shape (2) into shape (0)
I might understand the question incorrectly, but if you want to declare an array of a certain shape but with nothing inside, the following might be helpful:
Initialise empty array:
>>> a = np.zeros((0,3)) #or np.empty((0,3)) or np.array([]).reshape(0,3)
>>> a
array([], shape=(0, 3), dtype=float64)
Now you can use this array to append rows of similar shape to it. Remember that a numpy array has a fixed size, so a new array is created on each iteration:
>>> for i in range(3):
... a = np.vstack([a, [i,i,i]])
...
>>> a
array([[ 0., 0., 0.],
[ 1., 1., 1.],
[ 2., 2., 2.]])
np.vstack and np.hstack are the most common methods for combining numpy arrays, but coming from Matlab I prefer np.r_ and np.c_:
Concatenate 1d:
>>> a = np.zeros(0)
>>> for i in range(3):
... a = np.r_[a, [i, i, i]]
...
>>> a
array([ 0., 0., 0., 1., 1., 1., 2., 2., 2.])
Concatenate rows:
>>> a = np.zeros((0,3))
>>> for i in range(3):
... a = np.r_[a, [[i,i,i]]]
...
>>> a
array([[ 0., 0., 0.],
[ 1., 1., 1.],
[ 2., 2., 2.]])
Concatenate columns:
>>> a = np.zeros((3,0))
>>> for i in range(3):
... a = np.c_[a, [[i],[i],[i]]]
...
>>> a
array([[ 0., 1., 2.],
[ 0., 1., 2.],
[ 0., 1., 2.]])
numpy.append is pretty different from list.append in Python. I know that's thrown off a few programmers new to numpy. numpy.append is more like concatenate: it makes a new array and fills it with the values from the old array plus the new value(s) to be appended. For example:
import numpy
old = numpy.array([1, 2, 3, 4])
new = numpy.append(old, 5)
print(old)
# [1 2 3 4]
print(new)
# [1 2 3 4 5]
new = numpy.append(new, [6, 7])
print(new)
# [1 2 3 4 5 6 7]
I think you might be able to achieve your goal by doing something like:
result = numpy.zeros((10,))
result[0:2] = [1, 2]
# Or
result = numpy.zeros((10, 2))
result[0, :] = [1, 2]
Update:
If you need to create a numpy array in a loop, and you don't know ahead of time what the final size of the array will be, you can do something like:
import random
import numpy as np
a = np.array([0., 1.])
b = np.array([2., 3.])
temp = []
while True:
rnd = random.randint(0, 100)
if rnd > 50:
temp.append(a)
else:
temp.append(b)
if rnd == 0:
break
result = np.array(temp)
In my example result will be an (N, 2) array, where N is the number of times the loop ran, but obviously you can adjust it to your needs.
new update
The error you're seeing has nothing to do with types; it has to do with the shapes of the numpy arrays you're trying to combine. Without an axis argument, np.append flattens both arrays and concatenates them; with an axis the shapes must match, e.g. appending a (1, n) array to a (2, n) array along axis 0 gives a (3, n) array. Your code produces a (2,) result from the append and then tries to assign it into result[0], which has shape (0,). Those shapes don't match, so you get an error.
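A quick illustration of those shape rules (a minimal sketch with my names):
import numpy as np

a = np.zeros((2, 3))
b = np.ones(3)
np.append(a, b)            # no axis: flattens both, result has shape (9,)
np.append(a, [b], axis=0)  # stacks [b] as a row, result has shape (3, 3)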
This error arises from the fact that you are trying to assign an object of shape (2,) into a slot of shape (0,). If you append what you want without forcing it to be equal to result[0], there is no issue:
b = np.append([result[0]], [1,2])
But when you write result[0] = b you are assigning between objects of different shapes, and you cannot do that. What are you trying to do?
Here's the result of running your code in IPython. Note that result is a (2, 0) array: 2 rows, 0 columns, 0 elements. The append produces a (2,) array, while result[0] is a (0,) array. Your error message comes from trying to assign that 2-item array into a size-0 slot. Since result has dtype=float64, only scalars can be assigned to its elements.
In [65]: result=np.asarray([np.asarray([]),np.asarray([])])
In [66]: result
Out[66]: array([], shape=(2, 0), dtype=float64)
In [67]: result[0]
Out[67]: array([], dtype=float64)
In [68]: np.append(result[0],[1,2])
Out[68]: array([ 1., 2.])
np.array is not a Python list. All elements of an array are the same type (as specified by the dtype). Notice also that result is not an array of arrays.
Result could also have been built as
ll = [[],[]]
result = np.array(ll)
while
ll[0] = [1,2]
# ll = [[1,2],[]]
the same is not true for result.
np.zeros((2,0)) also produces your result.
Actually there's another quirk to result.
result[0] = 1
does not change the values of result. It accepts the assignment, but since result has 0 columns, there is no place to put the 1. This assignment would work if result were created as np.zeros((2,1)). But that still can't accept a list.
But if result has 2 columns, then you can assign a 2 element list to one of its rows.
result = np.zeros((2,2))
result[0] # == [0,0]
result[0] = [1,2]
What exactly do you want result to look like after the append operation?
numpy.append always copies the array before appending the new values. Your code is equivalent to the following:
import numpy as np
result = np.zeros((2,0))
new_result = np.append([result[0]],[1,2])
result[0] = new_result # ERROR: result[0] has shape (0,), new_result has shape (2,)
Perhaps you mean to do this?
import numpy as np
result = np.zeros((2,0))
result = np.append([result[0]],[1,2])
SO thread 'Multiply two arrays element wise, where one of the arrays has arrays as elements' has an example of constructing an array from arrays. If the subarrays are the same size, numpy makes a 2d array. But if they differ in length, it makes an array with dtype=object, and the subarrays retain their identity.
Following that, you could do something like this:
In [5]: result=np.array([np.zeros((1)),np.zeros((2))])
In [6]: result
Out[6]: array([array([ 0.]), array([ 0., 0.])], dtype=object)
In [7]: np.append([result[0]],[1,2])
Out[7]: array([ 0., 1., 2.])
In [8]: result[0]
Out[8]: array([ 0.])
In [9]: result[0]=np.append([result[0]],[1,2])
In [10]: result
Out[10]: array([array([ 0., 1., 2.]), array([ 0., 0.])], dtype=object)
However, I don't offhand see what advantages this has over a pure Python list of lists. It does not work like a 2d array. For example, I have to use result[0][1], not result[0,1]. If the subarrays are all the same length, I have to use np.array(result.tolist()) to produce a 2d array.
Are there good ways to "expand" a numpy ndarray? Say I have an ndarray like this:
[[1 2]
[3 4]]
And I want each row to contain more elements, filled with zeros:
[[1 2 0 0 0]
[3 4 0 0 0]]
I know there must be some brute-force ways to do so (say, construct a bigger array of zeros, then copy the elements from the old smaller array); I'm just wondering whether there are more pythonic ways to do so. I tried numpy.reshape but it didn't work:
import numpy as np
a = np.array([[1, 2], [3, 4]])
np.reshape(a, (2, 5))
Numpy complains that: ValueError: total size of new array must be unchanged
You can use numpy.pad, as follows:
>>> import numpy as np
>>> a=[[1,2],[3,4]]
>>> np.pad(a, ((0,0),(0,3)), mode='constant', constant_values=0)
array([[1, 2, 0, 0, 0],
[3, 4, 0, 0, 0]])
Here np.pad says, "Take the array a and add 0 rows above it, 0 rows below it, 0 columns to the left of it, and 3 columns to the right of it. Fill these columns with a constant specified by constant_values".
There are the index tricks r_ and c_.
>>> import numpy as np
>>> a = np.array([[1, 2], [3, 4]])
>>> z = np.zeros((2, 3), dtype=a.dtype)
>>> np.c_[a, z]
array([[1, 2, 0, 0, 0],
[3, 4, 0, 0, 0]])
If this is performance critical code, you might prefer to use the equivalent np.concatenate rather than the index tricks.
>>> np.concatenate((a,z), axis=1)
array([[1, 2, 0, 0, 0],
[3, 4, 0, 0, 0]])
There are also np.resize and np.ndarray.resize, but they have some limitations (due to the way numpy lays out data in memory) so read the docstring on those ones. You will probably find that simply concatenating is better.
By the way, when I've needed to do this I usually just do it the basic way you've already mentioned (create an array of zeros and assign the smaller array inside it), I don't see anything wrong with that!
Just to be clear: there's no "good" way to extend a NumPy array, as NumPy arrays are not expandable. Once the array is defined, the space it occupies in memory, a combination of the number of its elements and the size of each element, is fixed and cannot be changed. The only thing you can do is to create a new array and replace some of its elements by the elements of the original array.
A lot of functions are available for convenience (the np.concatenate function and its np.*stack shortcuts, np.column_stack, the index routines np.r_ and np.c_, ...), but they are just that: convenience functions. Some of them are optimized at the C level (np.concatenate and others, I think); some are not.
Note that there's nothing wrong at all with your initial suggestion of creating a large array 'by hand' (possibly filled with zeros) and filling it yourself with your initial array. It might be more readable than more complicated solutions.
A simple way:
# what you want to expand
x = np.ones((3, 3))
# expand to what shape
target = np.zeros((6, 6))
# do expand
target[:x.shape[0], :x.shape[1]] = x
# print target
array([[ 1., 1., 1., 0., 0., 0.],
[ 1., 1., 1., 0., 0., 0.],
[ 1., 1., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.]])
Functional way:
Borrowed from https://stackoverflow.com/a/35751427/1637673, with a little modification:
import numpy as np

def pad(array, reference_shape, offsets=None):
    """
    array: array to be padded
    reference_shape: tuple giving the size of the array to create
    offsets: list of offsets (number of elements must equal the dimension of the array);
    raises a ValueError if the offsets are too big for the reference_shape
    """
    if offsets is None:
        offsets = np.zeros(array.ndim, dtype=np.int32)
    # Create an array of zeros with the reference shape
    result = np.zeros(reference_shape, dtype=np.float32)
    # Build a tuple of slices from offset to offset + shape in each dimension
    # (modern numpy requires a tuple, not a list, for multidimensional indexing)
    insert_here = tuple(slice(offsets[dim], offsets[dim] + array.shape[dim])
                        for dim in range(array.ndim))
    # Insert the array into the result at the specified offsets
    result[insert_here] = array
    return result
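For example (assuming the pad function above):
x = np.ones((2, 2))
pad(x, (4, 4), offsets=[1, 1])
# array([[0., 0., 0., 0.],
#        [0., 1., 1., 0.],
#        [0., 1., 1., 0.],
#        [0., 0., 0., 0.]], dtype=float32)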
You should use np.column_stack or append
import numpy as np
p = np.array([ [1,2] , [3,4] ])
p = np.column_stack([p, [0, 0], [0, 0]])
p
Out[277]:
array([[1, 2, 0, 0],
[3, 4, 0, 0]])
Append seems to be faster though:
timeit np.column_stack( [ p , [ 0 , 0 ],[0,0] ] )
10000 loops, best of 3: 61.8 us per loop
timeit np.append(p, [[0,0],[0,0]],1)
10000 loops, best of 3: 48 us per loop
And a comparison with np.c_ and np.hstack [append still seems to be the fastest]:
In [295]: z = np.zeros((2, 2), dtype=p.dtype)
In [296]: timeit np.c_[p, z]
10000 loops, best of 3: 47.2 us per loop
In [297]: timeit np.append(p, z,1)
100000 loops, best of 3: 13.1 us per loop
In [305]: timeit np.hstack((p,z))
10000 loops, best of 3: 20.8 us per loop
and np.concatenate [that is even a bit faster than append]:
In [307]: timeit np.concatenate((p, z), axis=1)
100000 loops, best of 3: 11.6 us per loop
There are also similar methods like np.vstack, np.hstack, np.dstack. I like these over np.concatenate as they make it clear which dimension is being "expanded".
temp = np.array([[1, 2], [3, 4]])
np.hstack((temp, np.zeros((2, 3))))
It's easy to remember because numpy's first axis is vertical, so vstack expands the first axis, and the 2nd axis is horizontal, so hstack.