Visualising a numpy array/matrix - python

I have a function that returns a list. I think I can use np.append to add this list as a new row in an array; my intention is as follows:
alist = [4, 5, 6]
b = np.array([1, 2, 3])
b = np.append(b, alist)
Intended output:
1 2 3
4 5 6
This isn't the code I use (there's a lot of messing around in between). But the output I get is this:
2016-06-01 PRINT [ 99.86 99.928 99.9 99.875 99.8 89.7933
97.60018333 98.903 99.928 0.2801201 98.95 98.93
98.87 98.94 99.05 89.097 97.6712 98.87
99.59 0.23538903 99.711 99.732 99.725 99.724
99.769 89.777 98.12053333 99.68 99.88
0.30333219 99.805 99.79 99.743 99.71 99.69
89.7728 98.06653333 99.617 99.82 0.28981292
99.882 99.879 99.865 99.84 99.9 89.9206
98.29823333 99.82 100.08 0.31420778]
Is this a 10 column by 5 row array/matrix or is this a 50 column/row array? I feel like I'm missing something here - or is it just that the output doesn't really show the shape of the array?

True list append:
In [701]: alist = [4,5,6]
In [702]: b=[1,2,3]
In [703]: b.append(alist)
In [704]: b
Out[704]: [1, 2, 3, [4, 5, 6]]
bad array operation:
In [705]: anArray=np.array([4,5,6])
In [706]: b=np.array([1,2,3])
In [707]: b=np.append(b,anArray)
In [708]: b
Out[708]: array([1, 2, 3, 4, 5, 6])
In [709]: b.shape
Out[709]: (6,)
Here I just concatenated anArray onto b, making a longer array.
I've said this before: np.append is not a good function. It looks too much like the list append, and people end up misusing it. Either they miss the fact that it returns a new array rather than modifying one in place, or they call it repeatedly in a loop.
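A minimal sketch of the two pitfalls just described, using small made-up arrays. With no axis argument np.append flattens its inputs, and in every case it returns a new array and leaves the original untouched:

```python
import numpy as np

b = np.array([1, 2, 3])
row = np.array([4, 5, 6])

# No axis given: both inputs are flattened and joined end to end.
out = np.append(b, row)
print(out.shape)   # (6,)
print(b.shape)     # (3,)  b itself is unchanged

# To add a row, both inputs must already be 2-D and axis must be given.
out2 = np.append(b.reshape(1, 3), row.reshape(1, 3), axis=0)
print(out2.shape)  # (2, 3)
```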
Here's the preferred way of collecting lists or arrays and joining them into one
In [710]: alist = []
In [711]: b=np.array([1,2,3]) # could be b=[1,2,3]
In [712]: alist.append(b)
In [713]: b=np.array([4,5,6]) # b=[4,5,6]
In [714]: alist.append(b)
In [715]: alist
Out[715]: [array([1, 2, 3]), array([4, 5, 6])]
In [716]: np.array(alist)
Out[716]:
array([[1, 2, 3],
[4, 5, 6]])
In [717]: _.shape
Out[717]: (2, 3)
The result is a 2d array. List append is much faster than array append (which is really an array concatenate that builds a new array each time). Build the list and then make the array.
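The same pattern inside a loop might look like the sketch below. The function compute_row is a hypothetical stand-in for whatever function returns the list in the real code:

```python
import numpy as np

def compute_row(i):
    # Hypothetical stand-in for the function that returns a list.
    return [i, i + 1, i + 2]

rows = []
for i in range(4):
    rows.append(compute_row(i))   # cheap list append, no array copying

result = np.array(rows)           # one conversion at the end
print(result.shape)               # (4, 3)
```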
The most common way of defining a 2d array is with a list of lists:
In [718]: np.array([[1,2,3],[4,5,6]])
Out[718]:
array([[1, 2, 3],
[4, 5, 6]])
np.concatenate is another option for joining arrays and lists. It gives more control over how they are joined, but you have to pay attention to the dimensions of the inputs (you should pay attention to those anyway).
There are several 'stack' functions which streamline the dimension handling a bit, stack, hstack, vstack and yes, append. It's worth looking at their code.
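Paying attention to dimensions mostly comes down to checking ndim and shape. The printed output in the question has a single pair of brackets, which is how a 1d array prints, and it holds 50 values. The sketch below uses np.arange(50.0) as a stand-in for that data (an assumption, the real values are not reproduced) and shows how to make a 5 by 10 layout explicit:

```python
import numpy as np

# Stand-in for the 50 printed values in the question.
flat = np.arange(50.0)

print(flat.ndim)    # 1
print(flat.shape)   # (50,)

# If the values are logically 5 rows of 10, reshape makes that explicit.
table = flat.reshape(5, 10)
print(table.shape)  # (5, 10)
print(table[:2])    # a 2-D array prints with nested brackets
```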

You should use hstack or vstack:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.vstack((a,b))
gives
array([[1, 2, 3],
[4, 5, 6]])
or
np.hstack((a,b))
gives
array([1, 2, 3, 4, 5, 6])

Related

Create numpy array within numpy array

I want to create a numpy array within a numpy array. If I do it with normal Python, it's something like
a = [[1,2], [3,4]]
a[0][1] = [1,1,1]
print(a)
The result is [[1, [1, 1, 1]], [3, 4]]
How can I achieve the same using numpy arrays? The code I have is:
a = np.array([(1, 2, 3),(4, 5, 6)])
b = np.array([1,1,1])
a[0][1] = b
a as created is dtype int. Each element can only be another integer:
In [758]: a = np.array([(1, 2, 3),(4, 5, 6)])
...: b = np.array([1,1,1])
...:
In [759]: a
Out[759]:
array([[1, 2, 3],
[4, 5, 6]])
In [760]: b
Out[760]: array([1, 1, 1])
In [761]: a[0,1]=b
...
ValueError: setting an array element with a sequence.
You can make another dtype of array, one that holds pointers to objects, much as list does:
In [762]: aO = a.astype(object)
In [763]: aO
Out[763]:
array([[1, 2, 3],
[4, 5, 6]], dtype=object)
Now it is possible to replace one of those element pointers with a pointer to b array:
In [765]: aO[0,1]=b
In [766]: aO
Out[766]:
array([[1, array([1, 1, 1]), 3],
[4, 5, 6]], dtype=object)
But as asked in the comments - why do you want/need to do this? What are you going to do with such an array? It is possible to do some numpy math on such an array, but as shown in some recent SO questions, it is hit-or-miss. It is also slower.
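If an object array really is what you want, one way to build it directly (rather than converting an integer array) is to allocate it with dtype=object and fill the slots. This is only a sketch; the values are taken from the question's list example:

```python
import numpy as np

a = np.empty((2, 2), dtype=object)   # every slot holds a Python object
a[0, 0] = 1
a[0, 1] = np.array([1, 1, 1])        # the whole array is stored in one slot
a[1, 0] = 3
a[1, 1] = 4

print(a.shape)        # (2, 2)
print(a[0, 1].shape)  # (3,)
```

The outer shape stays (2, 2); numpy does not look inside the stored objects.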
As far as I know, you cannot do this. Numpy arrays cannot have entries of varying shape. Your request to make an array like [[1, [1, 1, 1]], [3, 4]] is impossible. However, you could make a numpy array of dimensions (2x3x3) to get
[
[
[1,0,0],
[1,1,1],
[0,0,0]
],
[
[3,0,0],
[4,0,0],
[0,0,0]
]
]
Your only option is to pad empty elements with some number (I used 0s above) or use another data structure.
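A minimal sketch of the padding idea, assuming the ragged data arrives as a list of lists (the rows list here is made up for illustration) and that 0 is an acceptable fill value:

```python
import numpy as np

rows = [[1], [1, 1, 1], [3], [4]]     # hypothetical ragged input
width = max(len(r) for r in rows)

padded = np.zeros((len(rows), width), dtype=int)
for i, r in enumerate(rows):
    padded[i, :len(r)] = r            # copy each row, leave the rest as 0

print(padded)
```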

Selecting specific groups of rows from numpy array [duplicate]

Given:
test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[i] gives the ith row (e.g. [1, 2]). How do I access the ith column? (e.g. [1, 3, 5]). Also, would this be an expensive operation?
To access column 0:
>>> test[:, 0]
array([1, 3, 5])
To access row 0:
>>> test[0, :]
array([1, 2])
This is covered in Section 1.4 (Indexing) of the NumPy reference. This is quick, at least in my experience. It's certainly much quicker than accessing each element in a loop.
>>> test[:,0]
array([1, 3, 5])
This command gives you a 1-D array of shape (3,). If you just want to loop over it, that's fine, but if you want to hstack it with some other array of dimension 3xN, you will get
ValueError: all the input arrays must have same number of dimensions
while
>>> test[:,[0]]
array([[1],
[3],
[5]])
gives you a column vector of shape (3, 1), so that you can do a concatenate or hstack operation.
e.g.
>>> np.hstack((test, test[:,[0]]))
array([[1, 2, 1],
[3, 4, 3],
[5, 6, 5]])
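The difference between these forms is easiest to see from their shapes. The sketch below also includes the slice form test[:, 0:1], which is not mentioned above but gives the same (3, 1) shape as a view rather than a copy:

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

col_1d = test[:, 0]       # integer index drops the axis
col_2d = test[:, [0]]     # list index keeps the axis, makes a copy
col_view = test[:, 0:1]   # slice keeps the axis, returns a view

print(col_1d.shape)       # (3,)
print(col_2d.shape)       # (3, 1)
print(col_view.shape)     # (3, 1)
print(np.shares_memory(test, col_2d))    # False
print(np.shares_memory(test, col_view))  # True
```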
And if you want to access more than one column at a time you could do:
>>> test = np.arange(9).reshape((3,3))
>>> test
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> test[:,[0,2]]
array([[0, 2],
[3, 5],
[6, 8]])
You could also transpose and return a row:
In [4]: test.T[0]
Out[4]: array([1, 3, 5])
Although the question has been answered, let me mention some nuances.
Let's say you are interested in the column at index 1 of the array
arr = numpy.array([[1, 2],
[3, 4],
[5, 6]])
As you already know from other answers, to get it in the form of a "row vector" (an array of shape (3,)), you use slicing:
arr_col1_view = arr[:, 1] # creates a view of the column at index 1 of arr
arr_col1_copy = arr[:, 1].copy() # creates a copy of the column at index 1 of arr
To check if an array is a view or a copy of another array you can do the following:
arr_col1_view.base is arr # True
arr_col1_copy.base is arr # False
see ndarray.base.
Besides the obvious difference between the two (modifying arr_col1_view will affect the arr), the number of byte-steps for traversing each of them is different:
arr_col1_view.strides[0] # 8 bytes (for a 4-byte integer dtype such as int32)
arr_col1_copy.strides[0] # 4 bytes
see strides and this answer.
Why is this important? Imagine that you have a very big array A instead of the arr:
A = np.random.randint(2, size=(10000, 10000), dtype='int32')
A_col1_view = A[:, 1]
A_col1_copy = A[:, 1].copy()
and you want to compute the sum of all the elements of that column, i.e. A_col1_view.sum() or A_col1_copy.sum(). Using the copied version is much faster:
%timeit A_col1_view.sum() # ~248 µs
%timeit A_col1_copy.sum() # ~12.8 µs
This is due to the different strides mentioned before:
A_col1_view.strides[0] # 40000 bytes
A_col1_copy.strides[0] # 4 bytes
Although it might seem that using column copies is better, it is not always true for the reason that making a copy takes time too and uses more memory (in this case it took me approx. 200 µs to create the A_col1_copy). However if we needed the copy in the first place, or we need to do many different operations on a specific column of the array and we are ok with sacrificing memory for speed, then making a copy is the way to go.
In the case we are interested in working mostly with columns, it could be a good idea to create our array in column-major ('F') order instead of the row-major ('C') order (which is the default), and then do the slicing as before to get a column without copying it:
A = np.asfortranarray(A) # or np.array(A, order='F')
A_col1_view = A[:, 1]
A_col1_view.strides[0] # 4 bytes
%timeit A_col1_view.sum() # ~12.6 µs vs ~248 µs
Now, performing the sum operation (or any other) on a column-view is as fast as performing it on a column copy.
Finally let me note that transposing an array and using row-slicing is the same as using the column-slicing on the original array, because transposing is done by just swapping the shape and the strides of the original array.
A[:, 1].strides[0] # 40000 bytes
A.T[1, :].strides[0] # 40000 bytes
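The stride swap can be checked on a small array. This sketch uses a made-up 4 by 6 int32 array so the numbers are easy to follow; the exact values depend on shape and item size:

```python
import numpy as np

A = np.zeros((4, 6), dtype=np.int32)   # 4-byte items, row-major

print(A.strides)                  # (24, 4): one row step is 6 items * 4 bytes
print(A.T.strides)                # (4, 24): same buffer, strides swapped
print(np.shares_memory(A, A.T))   # True, transposing copies nothing

F = np.asfortranarray(A)          # column-major copy of the data
print(F.strides)                  # (4, 16)
print(F[:, 1].strides)            # (4,): a column is now contiguous
```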
To get several independent columns, just use:
>>> test[:,[0,2]]
and you will get columns 0 and 2.
>>> test
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
>>> ncol = test.shape[1]
>>> ncol
5L
Then you can select the 2nd - 4th column this way:
>>> test[0:, 1:(ncol - 1)]
array([[1, 2, 3],
[6, 7, 8]])
This is just a 2-dimensional array, so you can slice out the columns you want:
test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[:, a:b] # you can provide index in place of a and b

Maintaining shape of output as of input after Boolean indexing in python

I would like help with the following problem, please.
Suppose X = [1 3 0 8
1 4 6 0
2 0 7 8 ]
mask = (X != 0)
mask = [ T T F T
T T T F
T F T T]
X1 = X[(mask,np.newaxis)]
Its output X1 is of shape (9,1).
But I want X1 to be of shape (3,3), i.e. keeping the same layout as X apart from the masked entries.
X1 = [1 3 8
1 4 6
2 7 8 ]
Can someone help me, please? Thank you.
Every row of X will contain a zero and I don't want to use reshape(). Here is the working code:
X= np.array([[1,3,0,8],[1,4,6,0],[2,0,7,8]])
mask = (X!=0)
X1=X[(mask,np.newaxis)]
The output X1 is of shape (9,1). Is there any way to make X1 of shape (3,3) as mentioned?
I think you might want to start on something easier in Python, since your question doesn't even contain correct syntax. I'm hoping this was just a pseudocode attempt. However, here's some code to do the mask you desire.
import numpy as np
X = np.array([1, 3, 0, 8, 1, 4, 6, 0, 2, 0, 7, 8])
indices_we_want = np.where(X > 0) # Results in a tuple of arrays holding the indices of X we want to keep
result = np.take(X, indices_we_want) # Filter by these indices
result = result.reshape(3, 3) # Reshape to desired result
print(result)
This code could be condensed considerably, but I wanted to show each step as you have in your question for clarity.
As pointed out in the comments section, the reshape typically isn't a good idea unless you somehow know after filtering out 0s that you'll be left with 9 elements. In the case you described, we certainly know this, but for a given array, not so much.
In [173]: x=[[1,3,0,8],[1,4,6,0],[2,0,7,8]]
In [174]: xa=np.array(x)
solution with reshape:
In [175]: xa[xa!=0].reshape(3,3)
Out[175]:
array([[1, 3, 8],
[1, 4, 6],
[2, 7, 8]])
a solution without reshape:
In [176]: np.array([i[i!=0] for i in xa])
Out[176]:
array([[1, 3, 8],
[1, 4, 6],
[2, 7, 8]])
Obviously both depend on there being only one deletion per row.
You aren't deleting a common column; nothing in your code tells the underlying numpy that the result will be reshapeable. So boolean indexing operates on the flattened array.
In [177]: xa[xa!=0]
Out[177]: array([1, 3, 8, 1, 4, 6, 2, 7, 8])
In [178]: xa.flat[xa.flat!=0]
Out[178]: array([1, 3, 8, 1, 4, 6, 2, 7, 8])
I could throw in an extra 0, and this indexing would still work the same; but the efforts to reshape it to 3x3 will fail.
Keep in mind that the underlying data buffer is flat, 1d, and that it only displays as 2d because of the shape and striding attributes. Selecting elements (or skipping some) will produce a copy, and a 1d copy is just as easy to make as a 2d one, if not faster. reshape doesn't change the data buffer, just the shape attribute.
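Since the reshape only works when every row keeps the same number of elements, it can help to check that first. The sketch below is one way to do it, not the only one; falling back to a list of 1d arrays for the ragged case is an assumption about what the caller wants:

```python
import numpy as np

X = np.array([[1, 3, 0, 8],
              [1, 4, 6, 0],
              [2, 0, 7, 8]])
mask = X != 0
kept_per_row = mask.sum(axis=1)     # how many entries survive in each row

if (kept_per_row == kept_per_row[0]).all():
    # Same count in every row: the flat result reshapes cleanly.
    X1 = X[mask].reshape(X.shape[0], -1)
else:
    # Ragged result: keep one 1-D array per row instead.
    X1 = [row[m] for row, m in zip(X, mask)]

print(X1)
```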

Vectorize np.arange or equivalent

I have a long 1D array. I'd like to create an array that is the result of np.arange() applied to each value in the array plus some constant. E.g if the constant = 3 and my array looks like
[1,2,3,4,5]
I'd like to get
[[1,2,3]
[2,3,4]
[3,4,5]
[4,5,6]
[5,6,7]]
np.arange() only accepts scalars as arguments. I played around with np.vectorize() a bit to no success. Clearly I could do this with a loop, or with lists and then convert to an array, but I was wondering if there's a good numpy-only solution.
You could use addition and broadcasting:
>>> x = np.array([1,2,3,4,5])
>>> constant = 3
>>> x[:,None] + np.arange(constant)
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7]])
This could also be written as np.add.outer(x, np.arange(constant)).
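To see why the broadcast works, it helps to look at the shapes involved. This sketch uses non-consecutive values (made up for illustration) to show that each row is built from its own starting value:

```python
import numpy as np

x = np.array([10, 20, 30])
offsets = np.arange(3)        # shape (3,)

col = x[:, None]              # shape (3, 1)
result = col + offsets        # (3, 1) + (3,) broadcasts to (3, 3)

print(col.shape)              # (3, 1)
print(result)
print(np.array_equal(result, np.add.outer(x, offsets)))  # True
```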

Concatenate two NumPy arrays vertically

I tried the following:
>>> a = np.array([1,2,3])
>>> b = np.array([4,5,6])
>>> np.concatenate((a,b), axis=0)
array([1, 2, 3, 4, 5, 6])
>>> np.concatenate((a,b), axis=1)
array([1, 2, 3, 4, 5, 6])
However, I'd expect at least that one result looks like this
array([[1, 2, 3],
[4, 5, 6]])
Why is it not concatenated vertically?
Because both a and b have only one axis, as their shape is (3,), and the axis parameter specifically refers to the axis of the elements to concatenate.
This example should clarify what concatenate is doing with axis. Take two arrays with two axes, with shape (2,3):
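A quick check of the point about axes. As far as I know, recent NumPy versions no longer return a result for axis=1 on 1d inputs and raise an error instead (AxisError, a ValueError subclass), so the sketch wraps that call in a try block to run either way:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print(a.ndim, a.shape)        # 1 (3,)  only axis 0 exists

joined = np.concatenate((a, b), axis=0)
print(joined.shape)           # (6,)

try:
    np.concatenate((a, b), axis=1)
except ValueError as exc:     # raised by recent NumPy versions
    print(type(exc).__name__)
```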
a = np.array([[1,5,9], [2,6,10]])
b = np.array([[3,7,11], [4,8,12]])
Concatenating along the 1st axis (rows of the 1st, then rows of the 2nd):
np.concatenate((a,b), axis=0)
array([[ 1, 5, 9],
[ 2, 6, 10],
[ 3, 7, 11],
[ 4, 8, 12]])
Concatenating along the 2nd axis (columns of the 1st, then columns of the 2nd):
np.concatenate((a, b), axis=1)
array([[ 1, 5, 9, 3, 7, 11],
[ 2, 6, 10, 4, 8, 12]])
To obtain the output you presented, you can use vstack:
a = np.array([1,2,3])
b = np.array([4,5,6])
np.vstack((a, b))
array([[1, 2, 3],
[4, 5, 6]])
You can still do it with concatenate, but you need to reshape them first:
np.concatenate((a.reshape(1,3), b.reshape(1,3)))
array([[1, 2, 3],
[4, 5, 6]])
Finally, as proposed in the comments, one way to reshape them is to use newaxis:
np.concatenate((a[np.newaxis,:], b[np.newaxis,:]))
If the actual problem at hand is to concatenate two 1-D arrays vertically, and we are not fixated on using concatenate to perform this operation, I would suggest the use of np.column_stack:
In []: a = np.array([1,2,3])
In []: b = np.array([4,5,6])
In []: np.column_stack((a, b))
array([[1, 4],
[2, 5],
[3, 6]])
A little-known feature of numpy is r_. This is a simple way to build up arrays quickly:
import numpy as np
a = np.array([1,2,3])
b = np.array([4,5,6])
c = np.r_[a[None,:],b[None,:]]
print(c)
#[[1 2 3]
# [4 5 6]]
The purpose of a[None,:] is to add an axis to array a.
a = np.array([1,2,3])
b = np.array([4,5,6])
np.array((a,b))
works just as well as
np.array([[1,2,3], [4,5,6]])
Regardless of whether it is a list of lists or a list of 1d arrays, np.array tries to create a 2d array.
But it's also a good idea to understand how np.concatenate and its family of stack functions work. In this context concatenate needs a list of 2d arrays (or anything that np.array will turn into a 2d array) as inputs.
np.vstack first loops through the inputs making sure they are at least 2d, then does concatenate. Functionally it's the same as expanding the dimensions of the arrays yourself.
np.stack is a newer function that joins the arrays along a new dimension. By default it behaves just like np.array.
Look at the code for these functions. If written in Python you can learn quite a bit. For vstack:
return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
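The vstack line quoted above can be reproduced by hand with the public np.atleast_2d, and np.stack can be compared alongside it. A small sketch using the a and b from the question:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# What vstack does, spelled out: promote to 2-D, then join on axis 0.
manual = np.concatenate([np.atleast_2d(m) for m in (a, b)], axis=0)
print(np.array_equal(manual, np.vstack((a, b))))   # True

# np.stack inserts a new axis; its position decides the layout.
print(np.stack((a, b)).shape)           # (2, 3)
print(np.stack((a, b), axis=1).shape)   # (3, 2)
```

With axis=1 the np.stack result matches what np.column_stack gives for these inputs.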
Suppose you have 3 NumPy arrays (A, B, C). You can concatenate these arrays vertically like this:
import numpy as np
# A, B and C must be 2-D with the same number of columns
stacked = np.concatenate((A, B, C), axis=0)
stacked.shape
