Shortest Syntax To Use numpy 1d-array As sklearn X - python

I often have two numpy 1d arrays, x and y, and would like to perform some quick sklearn fitting + prediction using them.
import numpy as np
from sklearn import linear_model
# This is an example for the 1d aspect - it's obtained from something else.
x = np.array([1, 3, 2, ...])
y = np.array([12, 32, 4, ...])
Now I'd like to do something like
linear_model.LinearRegression().fit(x, y)...
The problem is that it expects an X which is a 2d column array. For this reason, I usually feed it
x.reshape((len(x), 1))
which I find cumbersome and hard to read.
Is there some shorter way to transform a 1d array to a 2d column array (or, alternatively, get sklearn to accept 1d arrays)?

You can index the array with None (an alias for np.newaxis) to add a new axis:
x[:, None]
This:
>>> x = np.arange(5)
>>> x[:, None]
array([[0],
[1],
[2],
[3],
[4]])
Is equivalent to:
>>> x.reshape(len(x), 1)
array([[0],
[1],
[2],
[3],
[4]])
If you find it more readable, you can use a transposed matrix (although NumPy's documentation no longer recommends np.matrix for new code):
np.matrix(x).T
If you want an array:
np.matrix(x).T.A
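As a minimal sketch of the whole round trip, the column form can be passed straight to fit and predict. The sample data below is made up for illustration, since the question elides the real values:

```python
import numpy as np
from sklearn import linear_model

# Made-up 1-D sample data lying exactly on y = 2x + 1 (an assumption for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# x[:, None] and x.reshape(-1, 1) both produce a (4, 1) column array.
X = x[:, None]
assert X.shape == x.reshape(-1, 1).shape == (4, 1)

model = linear_model.LinearRegression().fit(X, y)

# New inputs need the same 2-D column layout.
x_new = np.array([4.0, 5.0])
pred = model.predict(x_new[:, None])
print(pred)
```

Only X needs the extra axis here; y can stay 1-D.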

Related

Python Numpy Transpose Matrix [duplicate]

I use Python and NumPy and have some problems with "transpose":
import numpy as np
a = np.array([5,4])
print(a)
print(a.T)
Invoking a.T is not transposing the array. If a is for example [[],[]] then it transposes correctly, but I need the transpose of [...,...,...].
It's working exactly as it's supposed to. The transpose of a 1D array is still a 1D array! (If you're used to matlab, it fundamentally doesn't have a concept of a 1D array. Matlab's "1D" arrays are 2D.)
If you want to turn your 1D vector into a 2D array and then transpose it, just slice it with np.newaxis (or None, they're the same, newaxis is just more readable).
import numpy as np
a = np.array([5,4])[np.newaxis]
print(a)
print(a.T)
Generally speaking though, you don't ever need to worry about this. Adding the extra dimension is usually not what you want, if you're just doing it out of habit. Numpy will automatically broadcast a 1D array when doing various calculations. There's usually no need to distinguish between a row vector and a column vector (neither of which are vectors. They're both 2D!) when you just want a vector.
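To illustrate the broadcasting point, here is a small sketch (the matrix m is made up for the example) in which the 1D array is used as-is, and an explicit new axis is added only where the orientation actually matters:

```python
import numpy as np

a = np.array([5, 4])          # shape (2,)
m = np.array([[1, 2],
              [3, 4]])        # shape (2, 2)

# A 1-D array works on either side of a matrix product without reshaping.
left = a @ m                  # a acts like a row; result has shape (2,)
right = m @ a                 # a acts like a column; result has shape (2,)

# Broadcasting stretches the 1-D array across each row of m.
scaled = m * a                # shape (2, 2)

# An explicit new axis is only needed when the orientation matters,
# for example to build an outer product.
outer = a[:, np.newaxis] * a  # shape (2, 2)

print(left, right)
print(scaled)
print(outer)
```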
Use two bracket pairs instead of one. This creates a 2D array, which can be transposed, unlike the 1D array you create if you use one bracket pair.
import numpy as np
a = np.array([[5, 4]])
a.T
More thorough example:
>>> a = [3,6,9]
>>> b = np.array(a)
>>> b.T
array([3, 6, 9]) #Here it didn't transpose because 'a' is 1 dimensional
>>> b = np.array([a])
>>> b.T
array([[3], #Here it did transpose because a is 2 dimensional
[6],
[9]])
Use numpy's shape attribute to see what is going on here:
>>> b = np.array([10,20,30])
>>> b.shape
(3,)
>>> b = np.array([[10,20,30]])
>>> b.shape
(1, 3)
For 1D arrays:
a = np.array([1, 2, 3, 4])
a = a.reshape((-1, 1)) # <--- THIS IS IT
print(a)
[[1]
 [2]
 [3]
 [4]]
Once you understand that -1 here means "as many rows as needed", I find this to be the most readable way of "transposing" an array. If your array is of higher dimensionality simply use a.T.
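A short sketch of how the -1 placeholder is resolved, using a made-up six-element array; NumPy infers the missing length from the total size, and only one dimension may be -1:

```python
import numpy as np

a = np.arange(6)             # shape (6,)

col = a.reshape(-1, 1)       # 6 rows inferred, 1 column
row = a.reshape(1, -1)       # 1 row, 6 columns inferred
pairs = a.reshape(-1, 2)     # 3 rows inferred, 2 columns

print(col.shape, row.shape, pairs.shape)
```

If the total size does not divide evenly (for instance a.reshape(-1, 4) here), NumPy raises a ValueError rather than guessing.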
You can convert an existing vector into a matrix by wrapping it in an extra set of square brackets...
from numpy import *
v=array([5,4]) ## create a numpy vector
array([v]).T ## transpose a vector into a matrix
numpy also has a matrix class (see array vs. matrix)...
matrix(v).T ## transpose a vector into a matrix
numpy 1D array --> column/row matrix:
>>> a=np.array([1,2,4])
>>> a[:, None] # col
array([[1],
[2],
[4]])
>>> a[None, :] # row, or faster `a[None]`
array([[1, 2, 4]])
And as @joe-kington said, you can replace None with np.newaxis for readability.
To 'transpose' a 1d array to a 2d column, you can use numpy.vstack:
>>> numpy.vstack(numpy.array([1,2,3]))
array([[1],
[2],
[3]])
It also works for vanilla lists:
>>> numpy.vstack([1,2,3])
array([[1],
[2],
[3]])
Instead, use arr[:, None] to create a column vector.
Transposing only has a visible effect on arrays with at least two dimensions. You can use numpy.matrix to create a 2D array. This is three years late, but I am just adding to the possible set of solutions:
import numpy as np
m = np.matrix([2, 3])
m.T
Basically what the transpose function does is to swap the shape and strides of the array:
>>> a = np.ones((1,2,3))
>>> a.shape
(1, 2, 3)
>>> a.T.shape
(3, 2, 1)
>>> a.strides
(48, 24, 8)
>>> a.T.strides
(8, 24, 48)
In the case of a 1D numpy array (rank-1 array), the shape and strides are 1-element tuples and cannot be swapped, so the transpose of such a 1D array returns it unchanged. Instead, you can transpose a "row-vector" (numpy array of shape (1, n)) into a "column-vector" (numpy array of shape (n, 1)). To achieve this, you first have to convert your 1D numpy array into a row-vector and then swap the shape and strides (transpose it). Below is a function that does it:
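import numpy as np
from numpy.lib.stride_tricks import as_strided

def transpose(a):
    # Promote 1D input to a (1, n) row so there are two axes to swap.
    a = np.atleast_2d(a)
    # Reverse shape and strides; this reinterprets the same memory without copying.
    return as_strided(a, shape=a.shape[::-1], strides=a.strides[::-1])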
Example:
>>> a = np.arange(3)
>>> a
array([0, 1, 2])
>>> transpose(a)
array([[0],
[1],
[2]])
>>> a = np.arange(1, 7).reshape(2,3)
>>> a
array([[1, 2, 3],
[4, 5, 6]])
>>> transpose(a)
array([[1, 4],
[2, 5],
[3, 6]])
Of course you don't have to do it this way since you have a 1D array and you can directly reshape it into (n, 1) array by a.reshape((-1, 1)) or a[:, None]. I just wanted to demonstrate how transposing an array works.
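As a quick check of the shape-and-strides description, the following sketch (with a made-up 2x3 array) confirms that the built-in a.T reverses both tuples and shares memory with the original instead of copying it:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
t = a.T

# Shape and strides are reversed; no data is copied.
assert t.shape == a.shape[::-1]
assert t.strides == a.strides[::-1]
assert np.shares_memory(a, t)

# Because t is a view, writing through it changes a as well.
t[0, 1] = 99
print(a[1, 0])
```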
Another solution.... :-)
import numpy as np
a = [1,2,4]
[1, 2, 4]
b = np.array([a]).T
array([[1],
[2],
[4]])
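NumPy also has a function named column_stack. Passing the 1D array directly treats each element as its own column and yields a single row, as below; np.column_stack([a]) yields the (2, 1) column.
>>> a = np.array([5, 4])
>>> np.column_stack(a)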
array([[5, 4]])
I am just consolidating the posts above; I hope it saves others some time.
The array below has shape (2,), so it is a 1-D array:
b_new = np.array([2j, 3j])
There are two ways to transpose a 1-D array:
Slice it with np.newaxis or None:
print(b_new[np.newaxis].T.shape)
print(b_new[None].T.shape)
The same result, written without the T operation:
print(b_new[:, np.newaxis].shape)
print(b_new[:, None].shape)
Wrap it in [ ] or use np.matrix, both of which add a new dimension:
print(np.array([b_new]).T.shape)
print(np.matrix(b_new).T.shape)
There is a method not described in the answers but described in the documentation for the numpy.ndarray.transpose method:
For a 1-D array this has no effect, as a transposed vector is simply the same vector. To convert a 1-D array into a 2D column vector, an additional dimension must be added. np.atleast_2d(a).T achieves this, as does a[:, np.newaxis].
One can do:
import numpy as np
a = np.array([5,4])
print(a)
print(np.atleast_2d(a).T)
Which (imo) is nicer than using newaxis.
As some of the comments above mentioned, the transpose of a 1D array is still a 1D array, so one way to transpose a 1D array is to first convert it to a 2D array, like so:
np.transpose(a.reshape(len(a), 1))
To transpose a 1-D array (flat array) as you have in your example, you can use the np.expand_dims() function:
>>> np.expand_dims(np.array([5, 4]), axis=1)
array([[5],
[4]])
np.expand_dims() will add a dimension to the chosen axis. In this case, we use axis=1, which adds a column dimension, effectively transposing your original flat array.
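For comparison, a brief sketch of both axis choices, showing that each matches the corresponding None-indexing form:

```python
import numpy as np

a = np.array([5, 4])                 # shape (2,)

as_col = np.expand_dims(a, axis=1)   # shape (2, 1), same as a[:, None]
as_row = np.expand_dims(a, axis=0)   # shape (1, 2), same as a[None, :]

print(as_col.shape, as_row.shape)
```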

Axis interpretation in 3D Numpy arrays

I am using the following example to understand the working of axis in 3D arrays in Numpy.
a = np.array([[9],[9],[8]])
b = np.array([[1],[4],[6]])
>>> np.stack([a, b], axis=0)
array([[[9],
[9],
[8]],
[[1],
[4],
[6]]])
>>> np.stack([a, b], axis=1)
array([[[9],
[1]],
[[9],
[4]],
[[8],
[6]]])
>>> np.stack([a, b], axis=2)
array([[[9, 1]],
[[9, 4]],
[[8, 6]]])
I am able to understand how axis=0 and axis=1 work. Can anyone explain how axis=2 works with pictorial representation as it is done for 2D arrays?
For reference
print(np.stack([a,b],axis=0).shape) # (2, 3, 1)
print(np.stack([a,b],axis=1).shape) # (3, 2, 1)
print(np.stack([a,b],axis=2).shape) # (3, 1, 2)
You can think of a 3D numpy array as a data cube.
Suppose we have an np.array A with (z, y, x) = np.shape(A). The z dimension corresponds to axis 0, y to axis 1, and x to axis 2.
The array A is simply z 2D arrays of shape (y, x) stacked together.
This explains why A[0,:,:] is a 2D array.
With np.stack, axis=2 means the new axis is inserted in the last (x) position, so the inputs sit side by side along that axis: result[:,:,0] is a and result[:,:,1] is b.
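One way to check the role of the axis argument, sketched with the question's own a and b: whichever position the new axis is inserted at, indexing 0 and 1 along that axis gives back the two inputs.

```python
import numpy as np

a = np.array([[9], [9], [8]])   # shape (3, 1)
b = np.array([[1], [4], [6]])   # shape (3, 1)

for axis in (0, 1, 2):
    s = np.stack([a, b], axis=axis)
    # Taking index 0 and 1 along the new axis recovers a and b.
    first = np.take(s, 0, axis=axis)
    second = np.take(s, 1, axis=axis)
    assert np.array_equal(first, a) and np.array_equal(second, b)
    print(axis, s.shape)

s2 = np.stack([a, b], axis=2)
print(s2[:, :, 0].ravel(), s2[:, :, 1].ravel())
```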

Add lists of numpy arrays element-wise

I've been working on an algorithm for backpropagation in neural networks. My program calculates the partial derivative of each weight with respect to the loss function, and stores it in an array. The weights at each layer are stored in a single 2d numpy array, and so the partial derivatives are stored as an array of numpy arrays, where each numpy array has a different size depending on the number of neurons in each layer.
When I want to average the array of partial derivatives after a number of training data has been used, I want to add each array together and divide by the number of arrays. Currently, I just iterate through each array and add each element together, but is there a quicker way? I could use ndarray with dtype=object but apparently, this has been deprecated.
For example, if I have the arrays:
arr1 = [ndarray([[1,1],[1,1],[1,1]]), ndarray([[2,2],[2,2]])]
arr2 = [ndarray([[3,3],[3,3],[3,3]]), ndarray([[4,4],[4,4]])]
How can I add these together to get the array:
arr3 = [ndarray([[4,4],[4,4],[4,4]]), ndarray([[6,6],[6,6]])]
You don't need to add the numbers element by element yourself; make use of numpy's vectorized operations by using numpy.add.
Here's some code to do just that:
import numpy as np
# Keep each layer as its own array; the layers have different shapes.
arr1 = [np.array([[1,1],[1,1],[1,1]]), np.array([[2,2],[2,2]])]
arr2 = [np.array([[3,3],[3,3],[3,3]]), np.array([[4,4],[6,6]])]
ans = []
for first, second in zip(arr1, arr2):
    ans.append(np.add(first, second))
Outputs:
>>> [array([[4, 4], [4, 4], [4, 4]]), array([[6, 6], [8, 8]])]
P.S
Could use a one-liner list-comprehension as well
ans = [np.add(first, second) for first, second in zip(arr1, arr2)]
You can use zip/map/sum:
import numpy as np
arr1 = [np.array([[1,1],[1,1],[1,1]]), np.array([[2,2],[2,2]])]
arr2 = [np.array([[3,3],[3,3],[3,3]]), np.array([[4,4],[4,4]])]
arr3 = list(map(sum, zip(arr1, arr2)))
output:
>>> arr3
[array([[4, 4],
[4, 4],
[4, 4]]),
array([[6, 6],
[6, 6]])]
In NumPy, you can add two arrays element-wise with the + operator.
N.B.: because the inner arrays have different shapes, recent NumPy versions require dtype=object to build the outer array. Alternatively, pad the inner arrays with 0 to a common shape.
arr1 = np.array([np.array([[1,1],[1,1],[1,1]]), np.array([[2,2],[2,2]])], dtype=object)
arr2 = np.array([np.array([[3,3],[3,3],[3,3]]), np.array([[4,4],[4,4]])], dtype=object)
arr3 = arr2 + arr1
You can use a list comprehension:
[x + y for x, y in zip(arr1, arr2)]
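The same zip idea extends to the averaging step described in the question. The sketch below assumes a hypothetical list sample_grads holding one list of per-layer arrays for each training sample; the names and values are made up for illustration:

```python
import numpy as np

# Hypothetical gradients from three training samples; each sample holds one
# array per layer, and the layers have different shapes.
sample_grads = [
    [np.full((3, 2), 1.0), np.full((2, 2), 2.0)],
    [np.full((3, 2), 3.0), np.full((2, 2), 4.0)],
    [np.full((3, 2), 5.0), np.full((2, 2), 6.0)],
]

# zip(*sample_grads) groups the arrays layer by layer; within a layer all
# arrays share a shape, so they can be averaged in one vectorized call.
mean_grads = [np.mean(layer, axis=0) for layer in zip(*sample_grads)]

for g in mean_grads:
    print(g.shape)
```

The Python-level loop runs once per layer rather than once per element, so the bulk of the arithmetic stays inside NumPy.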

what's the difference between np.array[:,0] and np.array[:,[0]]?

I have a numpy array cols2:
print(type(cols2))
print(cols2.shape)
<class 'numpy.ndarray'>
(97, 2)
I was trying to get the first column of this 2D numpy array using the first snippet below, but I got a 1D vector instead of the single column of data I wanted. The second snippet seems to give me the desired result, but I am confused about what adding brackets around the zero actually does.
print(type(cols2[:,0]))
print(cols2[:,0].shape)
<class 'numpy.ndarray'>
(97,)
print(type(cols2[:,[0]]))
print(cols2[:,[0]].shape)
<class 'numpy.ndarray'>
(97, 1)
cols2[:, 0] specifies that you want to slice out a 1D vector of length 97 from a 2D array. cols2[:, [0]] specifies that you want to slice out a 2D sub-array of shape (97, 1) from the 2D array. The square brackets [] make all the difference here.
v = np.arange(6).reshape(-1, 2)
v[:, 0]
array([0, 2, 4])
v[:, [0]]
array([[0],
[2],
[4]])
The fundamental difference is the extra dimension in the latter command (as you've noted). This is intended behaviour, as implemented in numpy.ndarray.__get/setitem__ and codified in the NumPy documentation.
You can also specify cols2[:,0:1] to the same effect - a column sub-slice.
v[:, 0:1]
array([[0],
[2],
[4]])
For more information, look at the notes on Advanced Indexing in the NumPy docs.
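Beyond the shape, the three forms also differ in whether they share memory with the original. A small sketch using the v array from above:

```python
import numpy as np

v = np.arange(6).reshape(-1, 2)

by_int = v[:, 0]       # basic indexing: shape (3,)
by_slice = v[:, 0:1]   # basic indexing: shape (3, 1)
by_list = v[:, [0]]    # advanced indexing: shape (3, 1)

print(by_int.shape, by_slice.shape, by_list.shape)

# Basic indexing returns views of v; advanced indexing returns a copy.
print(np.shares_memory(v, by_int))
print(np.shares_memory(v, by_slice))
print(np.shares_memory(v, by_list))
```

This matters if you plan to modify the extracted column in place: changes to by_slice show up in v, while changes to by_list do not.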
The extra square brackets around 0 in cols2[:, [0]] add an extra dimension.
This becomes clearer when you print the results of your code:
A = np.array([[1, 2],
[3, 4],
[5, 6]])
A.shape # (3, 2)
A[:, 0].shape # (3,)
A[:, 0] # array([1, 3, 5])
A[:, [0]]
# array([[1],
# [3],
# [5]])
An n-D numpy array can only use n integers to represent its shape. Therefore, a 1D array is represented by only a single integer. There is no concept of "rows" or "columns" of a 1D array.
You should resist the urge to think of numpy arrays as having rows and columns, but instead consider them as having dimensions and shape. This is a fundamental difference between numpy.array and numpy.matrix. In almost all cases, numpy.array is sufficient.

How to generate 2d numpy array?

I'm trying to generate a 2d numpy array with the help of generators:
x = [[f(a) for a in g(b)] for b in c]
And if I try to do something like this:
x = np.array([np.array([f(a) for a in g(b)]) for b in c])
As expected, I get an np.array of np.arrays. That is not what I want; I want a proper 2D ndarray, so that I can, for example, select a column like this:
y = x[:, 1]
So, I'm curious whether there is a way to generate it in such a way.
Of course, it is possible to create an ndarray of the required size and fill it with the required values, but I want a way to do this in one line of code.
This works:
a = [[1, 2, 3], [4, 5, 6]]
nd_a = np.array(a)
So this should work too:
nd_a = np.array([[x for x in y] for y in a])
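Applied to the question's own expression, a sketch might look like the following. The definitions of f, g and c are stand-ins, since the question does not say what they are; the key assumption is that g yields the same number of items for every b, so all rows have equal length:

```python
import numpy as np

# f, g and c are hypothetical stand-ins for the question's unspecified
# functions and data.
def f(a):
    return a * a

def g(b):
    return range(b, b + 3)   # always three items, so every row has equal length

c = [0, 10, 20]

# A nested list comprehension passed once to np.array gives a 2-D ndarray.
x = np.array([[f(a) for a in g(b)] for b in c])
y = x[:, 1]                  # column selection works on the 2-D result

print(x.shape)
print(y)
```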
To create a new array, it seems numpy.zeros is the way to go
import numpy as np
a = np.zeros(shape=(x, y))
You can also set a datatype to allocate it sensibly
>>> np.zeros(shape=(5,2), dtype=np.uint8)
array([[0, 0],
[0, 0],
[0, 0],
[0, 0],
[0, 0]], dtype=uint8)
>>> np.zeros(shape=(5,2), dtype="datetime64[ns]")
array([['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:00.000000000'],
['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:00.000000000'],
['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:00.000000000'],
['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:00.000000000'],
['1970-01-01T00:00:00.000000000', '1970-01-01T00:00:00.000000000']],
dtype='datetime64[ns]')
See also
How do I create an empty array/matrix in NumPy?
np.full(size, 0) vs. np.zeros(size) vs. np.empty()
It's very simple; do it like this:
import numpy as np
arr=np.arange(50)
arr_2d=arr.reshape(10,5) #Reshapes 1d array in to 2d, containing 10 rows and 5 columns.
print(arr_2d)
