for f in train.columns:
    missings = train[train[f] == -1][f].count()
What does train[][] mean? How can this be a two-dimensional array when f refers to a column?
In vanilla Python this is certainly odd, poorly written code, but it can be valid in a very limited number of situations. Below are a couple of examples in which it works. I am sure there are more, but either way it is not easy to understand, and I do not recommend using it in your own code.
Note: the list.count() method requires exactly one argument.
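Example 1 (train is a list)
f = 4
train = [[1, 2, 3, 4, [0, 0, 1, 0]], [1, 2, 3, 4, [1, 0, 1, 1]], 0, 1, -1]
# train[4] is -1, so train[f] == -1 is True, and train[True] is train[1]
missings = train[train[f] == -1][f].count(1)
print(missings) # output = 3
Example 2 (train is a dict)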
f = 4
train = {True: [1, 2, 3, 4, [0, 0, 0, 1]], False: [1, 2, 3, 4, [1, 1, 1, 0]], 4: 1}
missing = train[train[f] == -1][f].count(1)
print(missing) # output = 3
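The train.columns attribute in the question suggests that train is actually a pandas DataFrame rather than a plain list or dict (an assumption, since the question does not say so). Under that assumption, train[f] == -1 is a boolean mask with one entry per row, train[mask] keeps only the rows where the mask is True, and the trailing [f] selects column f from those rows. A minimal sketch with made-up data, where -1 is assumed to mark a missing value:

```python
import pandas as pd

# Hypothetical data for illustration; -1 is assumed to mean "missing".
train = pd.DataFrame({"a": [1, -1, 3, -1], "b": [-1, 5, 6, 7]})

counts = {}
for f in train.columns:
    mask = train[f] == -1      # boolean Series, one True/False per row
    rows = train[mask]         # DataFrame restricted to rows where mask is True
    # Series.count() takes no argument and counts the non-null values
    counts[f] = int(rows[f].count())

print(counts)
```

So there is no two-dimensional indexing here: the first pair of brackets filters rows, and the second pair picks a column.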
It looks like you are already getting values from a 2D array, i.e. train[train[f] == -1][f].
You can create a 2D array like this:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
or
arr = [[12, 13, 5, 4], [14, 8,11], [12, 10, 12, 6], [15,17,9,0]]
I have a 2D array that I would like to repeat 4 times along a new leading axis, giving a 3D array.
Achieved via a mixture of NumPy and Python methods:
>>> z = np.arange(9).reshape(3,3)
>>> z
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> z2 = []
>>> for i in range(4):
...     z2.append(z)
...
>>> z2
[array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]), array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]), array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]), array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])]
>>> z2 = np.array(z2)
>>> z2
array([[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]])
Achieved via Pure NumPy:
>>> z2 = np.repeat(z[np.newaxis,...], 4, axis=0)
>>> z2
array([[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]],
[[0, 1, 2],
[3, 4, 5],
[6, 7, 8]]])
Are the elements created by numpy.repeat() views of the original array, or independent copies?
If the latter, is there an equivalent NumPy function that creates views of the original array in the same way that numpy.repeat() creates copies?
I think such an ability would help reduce the memory used by z2 when z is large and there are many repeats of z.
A follow-up on @FrankYellin's answer:
>>> z = np.arange(9).reshape(3,3)
>>> z
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> z2 = np.repeat(z[np.newaxis,...], 1_000_000_000, axis=0)
>>> z2.nbytes
72000000000
>>> y2 = np.broadcast_to(z, (1_000_000_000, 3, 3))
>>> y2.nbytes
72000000000
The nbytes reported for np.broadcast_to() is the same as for np.repeat(). This is surprising given that the former returns a read-only view of the original z array with the given shape. That said, I did notice that np.broadcast_to() created the y2 array instantaneously, while creating z2 via np.repeat() took about 40 seconds. Hence, np.broadcast_to() was significantly faster.
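The matching nbytes values are less surprising once you recall that nbytes is simply size * itemsize, computed from the shape, so it says nothing about how much memory a view really occupies. A small sketch (using 4 repeats instead of a billion, purely for illustration) that inspects the strides and memory sharing instead:

```python
import numpy as np

z = np.arange(9).reshape(3, 3)
y2 = np.broadcast_to(z, (4, 3, 3))

# nbytes is derived from shape and item size, not from the real buffer
print(y2.nbytes == y2.size * y2.itemsize)

# A stride of 0 on the first axis means every "repeat" reads the same memory
print(y2.strides[0])

# The broadcast result shares z's buffer and is read-only
print(np.shares_memory(y2, z))
print(y2.flags.writeable)
```

The zero stride on the leading axis is what makes the broadcast instantaneous: no data is copied, regardless of how many repeats are requested.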
If you want a writable version, it is doable, but it's really ugly.
If you want a read-only version, np.broadcast_to(z, (4, 3, 3)) should be all you need.
Now the ugly writable version. Be careful. You can corrupt memory if you mess the arguments up.
>>> z.shape
(3, 3)
>>> z.strides
(24, 8)
from numpy.lib.stride_tricks import as_strided
z2 = as_strided(z, shape=(4, 3, 3), strides=(0, 24, 8))
and you end up with:
>>> z2[1, 1, 1]
4
>>> z2[1, 1, 1] = 100
>>> z2[2, 1, 1]
100
>>>
You are using strides to say that you want a second array overlaid on top of the first one. You set its new shape, and you prefix 0 to the previous strides, indicating that the index along the first dimension has no effect on which data you read.
Make sure you understand strides before using this.
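The strides (24, 8) above hold only for a C-contiguous 3x3 array of 8-byte integers. A slightly safer sketch of the same idea derives the shape and strides from z itself, so nothing is hard-coded (the repeat count of 4 is the one from the question):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

z = np.arange(9).reshape(3, 3)

# Prefix a new axis of length 4 with stride 0; reuse z's own shape and strides
z2 = as_strided(z, shape=(4,) + z.shape, strides=(0,) + z.strides)

# Writing through any "copy" changes the single underlying buffer
z2[1, 1, 1] = 100
print(z[1, 1])        # the original array sees the write
print(z2[:, 1, 1])    # and so does every repeat
```

Because all four slices alias the same memory, in-place operations on z2 can give unexpected results, which is why the read-only np.broadcast_to() is usually the better default.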
numpy.repeat creates a new array, not a view (you can check this by looking at the __array_interface__ field). In fact, it is not possible to create a view of the original array in the general case, since NumPy views do not support such a pattern. A view is basically just an object containing a pointer to a raw memory buffer, some strides, a shape, and a type. While it is possible to repeat one item N times with a 0 stride, it is not possible to repeat 2 items N times without adding a new dimension to the output array. Thus, there is no way to build a function like numpy.repeat that returns a view with the same output shape when repeating items of the last axis. If adding a new dimension is OK, then you can build an array with a new dimension and a stride set to 0. Repeating the last dimension is possible though. The answer of @FrankYellin gives a good example. Note that reshaping or raveling the resulting array forces a copy. Supporting such advanced views would make the NumPy code more complex and/or less efficient for a feature that is only rarely used.
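Two of the claims above are easy to check with np.shares_memory, as an alternative to reading __array_interface__ by hand. This is only a small sketch using the 3x3 array from the question:

```python
import numpy as np

z = np.arange(9).reshape(3, 3)

# np.repeat allocates fresh memory: the result does not overlap z
repeated = np.repeat(z[np.newaxis, ...], 4, axis=0)
print(np.shares_memory(repeated, z))

# The zero-stride view does overlap z ...
view = np.broadcast_to(z, (4, 3, 3))
print(np.shares_memory(view, z))

# ... but flattening it cannot be expressed with strides, so NumPy copies
flat = view.reshape(-1)
print(np.shares_memory(flat, z))
print(flat.shape)
```

In other words, the memory saving lasts only as long as the array keeps its extra zero-stride axis.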
I am learning more about NumPy and need help creating a NumPy array from multiple lists. Say I have 3 lists:
a = [1, 1, 1]
b = [2, 2, 2]
c = [3, 3, 3]
How can I create a new numpy array with each list as a column? Meaning that the new array would be [[1, 2, 3], [1, 2, 3], [1, 2, 3]]. I know how to do this by looping through the lists but I am not sure if there is an easier way to accomplish this. The numpy concatenate function seems to be close but I couldn't figure out how to get it to do what I'm after.
Thanks
Try with np.column_stack:
d = np.column_stack([a, b, c])
No need to use numpy. Python zip does a nice job:
In [606]: a = [1, 1, 1]
...: b = [2, 2, 2]
...: c = [3, 3, 3]
In [607]: abc = list(zip(a,b,c))
In [608]: abc
Out[608]: [(1, 2, 3), (1, 2, 3), (1, 2, 3)]
But if your heart is set on using numpy, a good way is to make a 2d array, and transpose it:
In [609]: np.array((a,b,c))
Out[609]:
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
In [610]: np.array((a,b,c)).T
Out[610]:
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
Others show how to do this with stack and column_stack, but underlying these is a concatenate. One way or another, they turn the lists into 2d arrays that can be joined on axis=1, e.g.
In [616]: np.concatenate([np.array(x)[:,None] for x in [a,b,c]], axis=1)
Out[616]:
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
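For comparison, here is a short sketch that runs the approaches from this thread side by side and confirms they produce the same array (np.stack with axis=1 is included as one more variant):

```python
import numpy as np

a = [1, 1, 1]
b = [2, 2, 2]
c = [3, 3, 3]

via_column_stack = np.column_stack([a, b, c])
via_transpose = np.array((a, b, c)).T
via_stack = np.stack([a, b, c], axis=1)
via_zip = np.array(list(zip(a, b, c)))
via_concatenate = np.concatenate([np.array(x)[:, None] for x in [a, b, c]], axis=1)

print(via_column_stack)
```

Which one to use is mostly a matter of taste; column_stack states the intent most directly.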
Say I have 10 4*4 numpy arrays:
[[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4]]
[[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4],
[5, 5, 5, 5]]
etc...
What I want to do is calculate a least squares linear regression for each entry in the matrix.
So I want to take m0[0][0], m1[0][0], m2[0][0], etc... and calculate the linear regression. Then do the same for the [0][1] values.
Is there any way of doing this without having to first extract all [0][0] values into a new array and calling numpy.linalg.lstsq? Can I somehow pass my 10*4*4 array to numpy.linalg.lstsq so that it will calculate multiple regressions?
Give this a shot... I'm sure there is a way to make this more efficient though.
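import numpy as np

def my_lin_reg(arr0):
    n = arr0.shape[0]                    # number of matrices (points per regression)
    s = arr0.shape[1] * arr0.shape[2]    # number of entries per matrix
    # One row per matrix entry, one column per matrix.
    # Note: swapaxes(0, 2) orders the rows column by column: [0][0], [1][0], ...
    arr1 = arr0.swapaxes(0, 2).reshape(s, n)
    # Design matrix with columns [x, 1], where x = 0, 1, ..., n - 1
    x = np.vstack([np.arange(n), np.ones(n)]).T
    mc = []
    for sub_arr in arr1:
        # lstsq returns several values; [0] is the solution (slope, intercept)
        mc.append(np.linalg.lstsq(x, sub_arr, rcond=None)[0])
    return np.array(mc)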
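As for making it more efficient: np.linalg.lstsq accepts a 2D right-hand side and solves one regression per column, so the Python loop can be dropped. The sketch below assumes, like the function above, that the x values are simply 0, 1, ..., n - 1; the name lin_reg_all and the sample stack are made up for illustration.

```python
import numpy as np

def lin_reg_all(arr0):
    """Fit y = slope * x + intercept for every matrix entry at once."""
    n = arr0.shape[0]
    # Design matrix with columns [x, 1]; x = 0..n-1 is an assumption
    x = np.column_stack([np.arange(n), np.ones(n)])
    # Each column of y holds the n values of one matrix entry
    y = arr0.reshape(n, -1)
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)   # coef has shape (2, rows*cols)
    slope = coef[0].reshape(arr0.shape[1:])
    intercept = coef[1].reshape(arr0.shape[1:])
    return slope, intercept

# Hypothetical input: 10 matrices, each one larger by 1 than the previous
base = np.array([[1, 1, 1, 1], [2, 2, 2, 2], [3, 3, 3, 3], [4, 4, 4, 4]], dtype=float)
stack = np.array([base + i for i in range(10)])

slope, intercept = lin_reg_all(stack)
print(slope.shape, intercept.shape)
```

Here slope[i, j] and intercept[i, j] line up with entry [i][j] of the input matrices, so no reordering is needed afterwards.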
This may be a silly question, but I've just started using numpy and I have to figure out how to perform some simple operations.
Suppose that I have the 2x3 array
array([[1, 3, 5],
[2, 4, 6]])
And that I want to perform some operation on the first column, for example subtract 1 from all of its elements to get
array([[0, 3, 5],
[1, 4, 6]])
How can I perform such an operation?
arr
# array([[1, 3, 5],
# [2, 4, 6]])
arr[:,0] = arr[:,0] - 1 # choose the first column here, subtract one and
# assign it back to the same column
arr
# array([[0, 3, 5],
# [1, 4, 6]])
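The same update can also be written with an in-place operator, which avoids spelling out the column twice. A minimal self-contained sketch using the array from the question:

```python
import numpy as np

arr = np.array([[1, 3, 5],
                [2, 4, 6]])

# arr[:, 0] selects every row of column 0; -= modifies it in place
arr[:, 0] -= 1

print(arr)
```

Any other column works the same way by changing the index, e.g. arr[:, 2] for the last column.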