stacking dataframes in numpy arrays in a loop - python

I have a dataframe like this,
pd.DataFrame({'a': [1,22,34],
'b': [3,49,65]})
and I want to add 1 to every element of this dataframe and store each result as a layer of a 3-D numpy array, as in the following figure. I want to do this in a for loop because my real calculation is more involved than just adding one. Any suggestions for a minimal implementation of this?

Another possible solution:
np.array([df.apply(lambda x: x+y) for y in np.arange(2)])
Output:
array([[[ 1, 3],
[22, 49],
[34, 65]],
[[ 2, 4],
[23, 50],
[35, 66]]])

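import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1,22,34],'b': [3,49,65]})
array_2d = df.values
# copy the 2-D array into 2 layers along a new first axis
array_3d = np.repeat(array_2d[np.newaxis, :, :], 2, axis=0)
# loop: add the layer number to each layer
for i in range(2):
    array_3d[i] = array_3d[i] + i
array_3d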
###
[[[ 1 3]
[22 49]
[34 65]]
[[ 2 4]
[23 50]
[35 66]]]
Here's @Michael Szczesny's way (broadcasting).
You only have to choose how many layers you want.
For example, with 3 layers:
df.values + np.arange(3)[:,None,None]
###
array([[[ 1, 3],
[22, 49],
[34, 65]],
[[ 2, 4],
[23, 50],
[35, 66]],
[[ 3, 5],
[24, 51],
[36, 67]]])
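Since the question asks for a for loop because the real calculation is more than adding a constant, here is a minimal sketch of that pattern: compute one 2-D layer per iteration, collect the layers in a list, and stack them once at the end. The function `transform` is a hypothetical stand-in for the real calculation, not something from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 22, 34], 'b': [3, 49, 65]})

def transform(frame, step):
    # Hypothetical placeholder for the real per-iteration calculation.
    return frame + step

# Build one 2-D layer per iteration, then stack along a new first axis.
layers = []
for step in range(2):
    layers.append(transform(df, step).to_numpy())

stacked = np.stack(layers, axis=0)
print(stacked.shape)  # (2, 3, 2)
```

Collecting into a list and calling np.stack once avoids having to know the number of layers up front or to preallocate the 3-D array.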

Related

Indexing ndarray by ndarray

I have a first ndarray, foo, in which I want to select several elements.
foo = array([[0, 10, 30], [20, 40, 60], [30, 50, 70]])
To be precise, I have another ndarray, bar, in which I store the row I want from each column of my first ndarray.
bar = array([[1, 2, 0], [0, 0, 1]])
What I want as the result is:
array([[20, 50, 30], [0, 10, 60]])
Is there a vectorized way to do it?
When I try foo[bar], it increases the size of the array.
That is not what I'm looking for.
In [17]: foo[bar, np.arange(3)]
Out[17]:
array([[20, 50, 30],
[ 0, 10, 60]])
The 1-dimensional array np.arange(3) is broadcast to the same shape as bar,
so the pair of index arrays is equivalent to
In [35]: X, Y = np.broadcast_arrays(bar, np.arange(3)); Y
Out[35]:
array([[0, 1, 2],
[0, 1, 2]])
X is the same as bar since broadcasting does not change the shape of bar.
Then NumPy integer array indexing rules say that the (i,j) element of foo[X, Y] equals
foo[X, Y][i, j] = foo[X[i,j], Y[i,j]]
So for example,
foo[bar, np.arange(3)][0, 1] = foo[ bar[0,1], Y[0,1] ]
= foo[2, 1]
= 50
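As a sanity check on the rule above, the following sketch compares the broadcast integer indexing with an explicit double loop and with np.take_along_axis, which expresses the same "pick a row per column" selection. The variable names by_indexing, by_take and by_loop are only illustrative.

```python
import numpy as np

foo = np.array([[0, 10, 30], [20, 40, 60], [30, 50, 70]])
bar = np.array([[1, 2, 0], [0, 0, 1]])

# Broadcast integer indexing: row index from bar, column index from arange.
by_indexing = foo[bar, np.arange(foo.shape[1])]

# Same selection via take_along_axis: result[i, j] = foo[bar[i, j], j].
by_take = np.take_along_axis(foo, bar, axis=0)

# Explicit loop that spells out the indexing rule element by element.
by_loop = np.empty_like(bar)
for i in range(bar.shape[0]):
    for j in range(bar.shape[1]):
        by_loop[i, j] = foo[bar[i, j], j]

print(np.array_equal(by_indexing, by_loop))  # True
print(np.array_equal(by_take, by_loop))      # True
```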
You also need to specify the column that goes with each row index.
Try this:
import numpy as np
foo = np.array([[0, 10, 30], [20, 40, 60], [30, 50, 70]])
bar = np.array([[1, 2, 0], [0, 0, 1]])
foo[bar, np.arange(foo.shape[1])]  # one column index per column of foo
Output:
array([[20, 50, 30],
[ 0, 10, 60]])

How to randomly shift rows of a numpy array

I am looking for a more pythonic way of randomly shifting rows of a numpy array. The idea is that I have an array of data, and I want to left-shift each row of the array by a random amount. My solution works, but I feel it is a bit un-pythonic:
def shift_rows(data, max_shift):
    """Left-shifts each row in `data` by a random amount up to `max_shift`."""
    return np.array([np.roll(row, -np.random.randint(0, max_shift)) for row in data])
And to test:
data = np.array([np.arange(0, 5) for _ in range(10)]) # toy data to illustrate
shifted = shift_rows(data, max_shift=5)
shifted
# array([[1, 2, 3, 4, 0],
#        [1, 2, 3, 4, 0],
#        [0, 1, 2, 3, 4],
#        ...
#        [4, 0, 1, 2, 3]])
This is really more of a thought experiment. Can anybody come up with a more efficient or more pythonic way of doing this? I suppose list comprehensions are pythonic, but if I need to do this over a huge array is this efficient?
Edit: I marked the excellent reply by Divakar as the answer, but I would still love to hear it if anybody has any other ideas.
Generate all the column indices for all rows in one go and then simply use integer-indexing for a vectorized solution, like so -
# Store shape of input array
m,n = data.shape
# Get random column start indices for each row in one go
col_start = np.random.randint(0, max_shift, data.shape[0])
# Get the rolled indices for every row again in a vectorized manner.
# We are extending col_start to 2D and then adding a range array to get
# all column indices for every row by leveraging NumPy's broadcasting.
# Because of the additions, we might go off-limits. So, to simulate the
# rolled over version, mod it.
idx = np.mod(col_start[:,None] + np.arange(n), n)
# Finally, use integer indexing to get the values out of the data array
shifted_out = data[np.arange(m)[:,None], idx]
Step-by-step run -
1] Inputs :
In [548]: data
Out[548]:
array([[44, 23, 38, 32, 30],
[69, 15, 32, 41, 63],
[69, 41, 75, 50, 87],
[23, 28, 38, 79, 91]])
In [549]: max_shift = 5
2] Proposed solution :
2A] Get column starts :
In [550]: m,n = data.shape
In [551]: col_start = np.random.randint(0, max_shift, data.shape[0])
In [552]: col_start
Out[552]: array([1, 2, 3, 3])
2B] Get all indices :
In [553]: idx = np.mod(col_start[:,None] + np.arange(n), n)
In [554]: col_start[:,None]
Out[554]:
array([[1],
[2],
[3],
[3]])
In [555]: col_start[:,None] + np.arange(n)
Out[555]:
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[3, 4, 5, 6, 7]])
In [556]: np.mod(col_start[:,None] + np.arange(n), n)
Out[556]:
array([[1, 2, 3, 4, 0],
[2, 3, 4, 0, 1],
[3, 4, 0, 1, 2],
[3, 4, 0, 1, 2]])
2C] Finally index into data :
In [557]: data[np.arange(m)[:,None], idx]
Out[557]:
array([[23, 38, 32, 30, 44],
[32, 41, 63, 69, 15],
[50, 87, 69, 41, 75],
[79, 91, 23, 28, 38]])
Verification -
1] Original approach :
In [536]: data = np.random.randint(11,99,(4,5))
...: max_shift = 5
...: col_start = -np.random.randint(0, max_shift, data.shape[0])
...: for i,row in enumerate(data):
...:     print(np.array([np.roll(row, col_start[i])]))
...:
[[83 93 17 53 61]]
[[55 88 84 94 89]]
[[59 63 29 72 85]]
[[57 95 13 21 14]]
2] Proposed approach re-using col_start, so that we could do a value verification :
In [537]: m,n = data.shape
In [538]: idx = np.mod(-col_start[:,None] + np.arange(n), n)
In [539]: data[np.arange(m)[:,None], idx]
Out[539]:
array([[83, 93, 17, 53, 61],
[55, 88, 84, 94, 89],
[59, 63, 29, 72, 85],
[57, 95, 13, 21, 14]])
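The verification above can also be packaged as a small self-contained script. This is only a sketch: the helper name shift_rows_vectorized and the choice to pass the shifts in explicitly (so both approaches use the same random draws) are assumptions made for the comparison, not part of the original answer.

```python
import numpy as np

def shift_rows_vectorized(data, shifts):
    """Left-shift row i of `data` by shifts[i] positions (hypothetical helper)."""
    m, n = data.shape
    # Column indices for every row, wrapped around with a modulo.
    idx = (np.asarray(shifts)[:, None] + np.arange(n)) % n
    return data[np.arange(m)[:, None], idx]

rng = np.random.default_rng(0)
data = rng.integers(11, 99, size=(4, 5))
shifts = rng.integers(0, 5, size=data.shape[0])

vectorized = shift_rows_vectorized(data, shifts)
# Reference result: roll each row separately, as in the question.
looped = np.array([np.roll(row, -s) for row, s in zip(data, shifts)])

print(np.array_equal(vectorized, looped))  # True
```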

numpy get values in array of arrays of arrays for array of indices

I have a np array of arrays of arrays:
arr1 = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2 = np.array([[10,20,30],[40,50,60],[70,80,90]])
arr3 = np.array([[15,25,35],[45,55,65],[75,85,95]])
list_arr = np.array([arr1,arr2,arr3])
and indices array:
indices_array = np.array([1,0,2])
I want to get the array at index 1 for the first (array of arrays), the array at
index 0 for the second (array of arrays) and the array at index 2 for the third (array of arrays)
expected output:
#[[ 4 5 6]
#[10 20 30]
#[75 85 95]]
I am looking for a numpy way to do it. As I have large arrays, I prefer not to use comprehension lists.
Basically, you are selecting the second axis elements with indices_array corresponding to each position along the first axis for all the elements along the third axis. As such, you can do -
list_arr[np.arange(list_arr.shape[0]),indices_array,:]
Sample run -
In [16]: list_arr
Out[16]:
array([[[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9]],
[[10, 20, 30],
[40, 50, 60],
[70, 80, 90]],
[[15, 25, 35],
[45, 55, 65],
[75, 85, 95]]])
In [17]: indices_array
Out[17]: array([1, 0, 2])
In [18]: list_arr[np.arange(list_arr.shape[0]),indices_array,:]
Out[18]:
array([[ 4, 5, 6],
[10, 20, 30],
[75, 85, 95]])
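To make the pairing explicit, this short sketch checks that the indexing expression above picks list_arr[i, indices_array[i], :] for each i by comparing it with a plain Python loop. The names by_indexing and by_loop are illustrative only.

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2 = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])
arr3 = np.array([[15, 25, 35], [45, 55, 65], [75, 85, 95]])
list_arr = np.array([arr1, arr2, arr3])
indices_array = np.array([1, 0, 2])

# Pair position i along the first axis with indices_array[i] along the second.
by_indexing = list_arr[np.arange(list_arr.shape[0]), indices_array, :]

# Loop version, for comparison only.
by_loop = np.array([list_arr[i, k] for i, k in enumerate(indices_array)])

print(np.array_equal(by_indexing, by_loop))  # True
```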
Just access the arrays by pairing each position with its desired index (0-1, 1-0, 2-2), as follows:
desired_array = np.array([list_arr[x][y] for x,y in enumerate(indices_array)])

Can I produce the result of np.outer using np.dot?

I am trying to improve my understanding of numpy functions. I understand the behaviour of numpy.dot. I'd like to understand the behaviour of numpy.outer in terms of numpy.dot.
Based on this Wikipedia article https://en.wikipedia.org/wiki/Outer_product I'd expect for array_equal to return True in the following code. However it does not.
X = np.matrix([
[1,5],
[5,9],
[4,1]
])
r1 = np.outer(X,X)
r2 = np.dot(X, X.T)
np.array_equal(r1, r2)
How can I assign r2 so that np.array_equal returns True? Also, why does numpy's implementation of np.outer not match the definition of outer multiplication on Wikipedia?
Using numpy 1.9.2
In [303]: X=np.array([[1,5],[5,9],[4,1]])
In [304]: X
Out[304]:
array([[1, 5],
[5, 9],
[4, 1]])
In [305]: np.inner(X,X)
Out[305]:
array([[ 26, 50, 9],
[ 50, 106, 29],
[ 9, 29, 17]])
In [306]: np.dot(X,X.T)
Out[306]:
array([[ 26, 50, 9],
[ 50, 106, 29],
[ 9, 29, 17]])
The Wiki outer link mostly talks about vectors, 1d arrays. Your X is 2d.
In [310]: x=np.arange(3)
In [311]: np.outer(x,x)
Out[311]:
array([[0, 0, 0],
[0, 1, 2],
[0, 2, 4]])
In [312]: np.inner(x,x)
Out[312]: 5
In [313]: np.dot(x,x) # same as inner
Out[313]: 5
In [314]: x[:,None]*x[None,:] # same as outer
Out[314]:
array([[0, 0, 0],
[0, 1, 2],
[0, 2, 4]])
Notice that the Wiki outer does not involve summation. Inner does; in this example, 5 is the sum of the 3 diagonal values of the outer product.
dot also involves summation: all the products are formed, followed by summation along a specific axis.
Some of the wiki outer equations use explicit indices. The einsum function can implement these calculations.
In [325]: np.einsum('ij,kj->ik',X,X)
Out[325]:
array([[ 26, 50, 9],
[ 50, 106, 29],
[ 9, 29, 17]])
In [326]: np.einsum('ij,jk->ik',X,X.T)
Out[326]:
array([[ 26, 50, 9],
[ 50, 106, 29],
[ 9, 29, 17]])
In [327]: np.einsum('i,j->ij',x,x)
Out[327]:
array([[0, 0, 0],
[0, 1, 2],
[0, 2, 4]])
In [328]: np.einsum('i,i->',x,x)
Out[328]: 5
As mentioned in the comment, np.outer uses ravel, e.g.
return a.ravel()[:, newaxis]*b.ravel()[newaxis,:]
This is the same broadcasted multiplication that I demonstrated earlier for x.
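Given that np.outer ravels its inputs, one way to get np.array_equal to return True for the 2-D X in the question is to flatten X first and take the dot product of a column vector with a row vector. This is a sketch using a plain ndarray rather than np.matrix; the name flat is only illustrative.

```python
import numpy as np

X = np.array([[1, 5], [5, 9], [4, 1]])

r1 = np.outer(X, X)  # X is flattened to 6 elements, so r1 is 6x6

# Reproduce that with dot: (6, 1) column times (1, 6) row.
flat = X.ravel()
r2 = np.dot(flat[:, None], flat[None, :])

print(r1.shape)                # (6, 6)
print(np.array_equal(r1, r2))  # True
```

Because the inner dimension of this dot product has length 1, the summation that dot performs has only one term per element, which is why it matches the summation-free outer product.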
numpy.outer is meant for 1-d vectors; it flattens higher-dimensional inputs rather than treating them as matrices. But for the case of 1-d vectors, there is a relation.
If
import numpy as np
A = np.array([1.0,2.0,3.0])
then this
np.matrix(A).T.dot(np.matrix(A))
should be the same as this
np.outer(A,A)
Another (clunky) version similar to a[:,None] * a[None,:]
a.reshape(a.size, 1) * a.reshape(1, a.size)

splitting an array into two smaller arrays in python

I have an array of size 80x40 and want to send each row into one of two smaller arrays based on the value in a specific column (10). I have code similar to the below, but it ends up flattening the array. I don't know the Y dimensions of the output arrays (Array2, Array3) in advance. I guess I could count all the values above and below 50 to get the Y dimensions of the outputs, make 2 output arrays of np.zeros(Array.shape[0],Yvalues), and append row by row to those, but I'm still not sure how that would work.
Array.shape=(80,40)
Array2=[]
Array3=[]
for x in range(0,Array.shape[0]):
    if Array[x,10]<50:
        Array2.append(Array[x,:])
    else:
        Array3.append(Array[x,:])
As a smaller example:
a = np.array([[1, 10], [1, 20], [2, 30], [2, 40], [1, 50], [3, 60], [1, 70]])
a2 = a[a[:, 0] < 1.5]
a3 = a[a[:, 0] >= 1.5]
a2 is now:
array([[ 1, 10],
[ 1, 20],
[ 1, 50],
[ 1, 70]])
and a3 is now:
array([[ 2, 30],
[ 2, 40],
[ 3, 60]])
So in your case, use:
a2 = a[a[:, 10] < 50]
a3 = a[a[:, 10] >= 50]
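Applied to an array of the size described in the question, the same boolean-mask idea might look like the sketch below. The random data is made up purely for illustration; the names Array, Array2 and Array3 follow the question.

```python
import numpy as np

# Made-up 80x40 data, only to illustrate the shapes involved.
rng = np.random.default_rng(0)
Array = rng.integers(0, 100, size=(80, 40))

# One boolean per row: is the value in column 10 below 50?
mask = Array[:, 10] < 50

Array2 = Array[mask]   # rows where column 10 is < 50
Array3 = Array[~mask]  # all remaining rows

# Both results stay 2-D with 40 columns; no counting or preallocation needed.
print(Array2.shape[1], Array3.shape[1])  # 40 40
```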
