I am looking for a more pythonic way of randomly shifting rows of a numpy array. The idea is that I have an array of data, and I want to left-shift each row of the array by a random amount. My solution works, but I feel it is a bit un-pythonic:
def shift_rows(data, max_shift):
    """Left-shifts each row in `data` by a random amount up to `max_shift`."""
    return np.array([np.roll(row, -np.random.randint(0, max_shift)) for row in data])
And to test:
data = np.array([np.arange(0, 5) for _ in range(10)]) # toy data to illustrate
shifted = shift_rows(data, max_shift=5)
shifted
# array([[1, 2, 3, 4, 0],
#        [1, 2, 3, 4, 0],
#        [0, 1, 2, 3, 4],
#        ...
#        [4, 0, 1, 2, 3]])
This is really more of a thought experiment. Can anybody come up with a more efficient or more pythonic way of doing this? I suppose list comprehensions are pythonic, but if I need to do this over a huge array, is this efficient?
Edit: I marked the excellent reply by Divakar as the answer, but I would still love to hear it if anybody has any other ideas.
Generate all the column indices for all rows in one go and then simply use integer-indexing for a vectorized solution, like so -
# Store shape of input array
m,n = data.shape
# Get random column start indices for each row in one go
col_start = np.random.randint(0, max_shift, data.shape[0])
# Get the rolled indices for every row again in a vectorized manner.
# We are extending col_start to 2D and then adding a range array to get
# all column indices for every row by leveraging NumPy's broadcasting.
# Because of the additions, we might go off-limits. So, to simulate the
# rolled over version, mod it.
idx = np.mod(col_start[:,None] + np.arange(n), n)
# Finally, with integer indexing, get the values off the data array
shifted_out = data[np.arange(m)[:,None], idx]
Step-by-step run -
1] Inputs :
In [548]: data
Out[548]:
array([[44, 23, 38, 32, 30],
[69, 15, 32, 41, 63],
[69, 41, 75, 50, 87],
[23, 28, 38, 79, 91]])
In [549]: max_shift = 5
2] Proposed solution :
2A] Get column starts :
In [550]: m,n = data.shape
In [551]: col_start = np.random.randint(0, max_shift, data.shape[0])
In [552]: col_start
Out[552]: array([1, 2, 3, 3])
2B] Get all indices :
In [553]: idx = np.mod(col_start[:,None] + np.arange(n), n)
In [554]: col_start[:,None]
Out[554]:
array([[1],
[2],
[3],
[3]])
In [555]: col_start[:,None] + np.arange(n)
Out[555]:
array([[1, 2, 3, 4, 5],
[2, 3, 4, 5, 6],
[3, 4, 5, 6, 7],
[3, 4, 5, 6, 7]])
In [556]: np.mod(col_start[:,None] + np.arange(n), n)
Out[556]:
array([[1, 2, 3, 4, 0],
[2, 3, 4, 0, 1],
[3, 4, 0, 1, 2],
[3, 4, 0, 1, 2]])
2C] Finally index into data :
In [557]: data[np.arange(m)[:,None], idx]
Out[557]:
array([[23, 38, 32, 30, 44],
[32, 41, 63, 69, 15],
[50, 87, 69, 41, 75],
[79, 91, 23, 28, 38]])
Verification -
1] Original approach :
In [536]: data = np.random.randint(11,99,(4,5))
     ...: max_shift = 5
     ...: col_start = -np.random.randint(0, max_shift, data.shape[0])
     ...: for i,row in enumerate(data):
     ...:     print(np.array([np.roll(row, col_start[i])]))
     ...:
[[83 93 17 53 61]]
[[55 88 84 94 89]]
[[59 63 29 72 85]]
[[57 95 13 21 14]]
2] Proposed approach re-using col_start, so that we could do a value verification :
In [537]: m,n = data.shape
In [538]: idx = np.mod(-col_start[:,None] + np.arange(n), n)
In [539]: data[np.arange(m)[:,None], idx]
Out[539]:
array([[83, 93, 17, 53, 61],
[55, 88, 84, 94, 89],
[59, 63, 29, 72, 85],
[57, 95, 13, 21, 14]])
I have a dataframe like this,
pd.DataFrame({'a': [1,22,34],
'b': [3,49,65]})
and I want to add 1 to all the values of this dataframe and store the results in the third dimension of a numpy array. I want to do this in a for loop because my calculation is more than just adding one in reality. Any suggestions for a minimal implementation of this?
Another possible solution:
np.array([df.apply(lambda x: x+y) for y in np.arange(2)])
Output:
array([[[ 1,  3],
        [22, 49],
        [34, 65]],

       [[ 2,  4],
        [23, 50],
        [35, 66]]])
df = pd.DataFrame({'a': [1,22,34], 'b': [3,49,65]})
array_2d = df.values
array_3d = np.repeat(array_2d[np.newaxis, :, :], 2, axis=0)
# loop
for i in range(2):
    array_3d[i] = array_3d[i] + i
array_3d
###
[[[ 1  3]
  [22 49]
  [34 65]]

 [[ 2  4]
  [23 50]
  [35 66]]]
Here's @Michael Szczesny's way (broadcasting). You only have to choose how many layers you want; for example, 3 layers:
df.values + np.arange(3)[:,None,None]
###
array([[[ 1,  3],
        [22, 49],
        [34, 65]],

       [[ 2,  4],
        [23, 50],
        [35, 66]],

       [[ 3,  5],
        [24, 51],
        [36, 67]]])
Here is the code:
import numpy as np
array = np.array([15, 55, 9, 99, 8, 21, 2, 90, 88])
this will output array([15, 55, 9, 99, 8, 21, 2, 90, 88]).
How can I find the first minimum number, and then the second minimum number, without sorting?
I expect the output to be:
first min = 9
second min = 8
You can find the absolute minimum like this:
In [35]: import numpy as np
In [36]: arr = np.array([15, 55, 9, 99, 8, 21, 2, 90, 88])
In [37]: first = np.min(arr)
In [38]: second = np.min(arr[arr != first])
In [39]: first
Out[39]: 2
In [40]: second
Out[40]: 8
To obtain the indices of the local minima, you could use scipy.signal.argrelmin:
In [52]: from scipy.signal import argrelmin
In [53]: idx = argrelmin(arr)[0]
In [54]: idx
Out[54]: array([2, 4, 6], dtype=int64)
In [55]: arr[idx]
Out[55]: array([9, 8, 2])
You could offset the list and zip them:
l0 = [15, 55, 9, 99, 8, 21, 2, 90, 88]
l1 = l0[1:]
l2 = [-1] + l0
[x for x,y,z in zip(l0,l1,l2) if (x < y) & (x < z)]
# Out[32]: [9, 8, 2]
or in one line:
l = [15, 55, 9, 99, 8, 21, 2, 90, 88]
[x for x,y,z in zip(l,l[1:],[-1]+l) if (x < y) & (x < z)]
# Out[32]: [9, 8, 2]
I am trying to use the scatter_nd function in TensorFlow to reorder elements within rows of a Matrix. For example, suppose I have the code:
indices = tf.constant([[1],[0]])
updates = tf.constant([ [5, 6, 7, 8],
[1, 2, 3, 4] ])
shape = tf.constant([2, 4])
scatter1 = tf.scatter_nd(indices, updates, shape)
# scatter1 evaluates to:
# [[1, 2, 3, 4],
#  [5, 6, 7, 8]]
This reorders the rows of the updates matrix.
Instead of only being able to reorder the rows, I'd like to reorder the individual elements within each row as well. If I just have a vector (Tensor of rank 1), then this example works:
indices = tf.constant([[1],[0],[2],[3]])
updates = tf.constant([5, 6, 7, 8])
shape = tf.constant([4])
scatter2 = tf.scatter_nd(indices, updates, shape)
# scatter2 evaluates to: [6, 5, 7, 8]
What I really care about is to be able to swap elements within each row in scatter1, as I had done in scatter2, but do it for each row of scatter1. I've tried various combinations of indices but keep getting errors that the sizes are inconsistent thrown by the scatter_nd function.
The following swaps the first two elements of each row using scatter_nd:
indices = tf.constant([[[0, 1], [0, 0], [0, 2], [0, 3]],
[[1, 1], [1, 0], [1, 2], [1, 3]]])
updates = tf.constant([ [5, 6, 7, 8],
[1, 2, 3, 4] ])
shape = tf.constant([2, 4])
scatter1 = tf.scatter_nd(indices, updates, shape)
with tf.Session() as sess:
    print(sess.run(scatter1))
Giving an output of:
[[6 5 7 8]
[2 1 3 4]]
The position of each coordinate within indices determines which value is taken from updates, and the coordinate itself determines where that value will be placed in scatter1.
This answer is a few months late but hopefully still helpful.
Suppose you want to swap elements in the second dimension, either keeping the first-dimension order or not.
import tensorflow as tf
sess = tf.InteractiveSession()
def prepare_fd(fd_indices, sd_dims):
    fd_indices = tf.expand_dims(fd_indices, 1)
    fd_indices = tf.tile(fd_indices, [1, sd_dims])
    return fd_indices
# define the updates
updates = tf.constant([[11, 12, 13, 14],
[21, 22, 23, 24],
[31, 32, 33, 34]])
sd_dims = tf.shape(updates)[1]
sd_indices = tf.constant([[1, 0, 2, 3], [0, 2, 1, 3], [0, 1, 3, 2]])
fd_indices_range = tf.range(0, limit=tf.shape(updates)[0])
fd_indices_custom = tf.constant([2, 0, 1])
# define the indices
indices1 = tf.stack((prepare_fd(fd_indices_range, sd_dims), sd_indices), axis=2)
indices2 = tf.stack((prepare_fd(fd_indices_custom, sd_dims), sd_indices), axis=2)
# define the shape
shape = tf.shape(updates)
scatter1 = tf.scatter_nd(indices1, updates, shape)
scatter2 = tf.scatter_nd(indices2, updates, shape)
print(scatter1.eval())
# array([[12, 11, 13, 14],
# [21, 23, 22, 24],
# [31, 32, 34, 33]], dtype=int32)
print(scatter2.eval())
# array([[21, 23, 22, 24],
# [31, 32, 34, 33],
# [12, 11, 13, 14]], dtype=int32)
Hopefully this example helps.
I frequently want to pixel bin/pixel bucket a numpy array, meaning, replace groups of N consecutive pixels with a single pixel which is the sum of the N replaced pixels. For example, start with the values:
x = np.array([1, 3, 7, 3, 2, 9])
with a bucket size of 2, this transforms into:
bucket(x, bucket_size=2)
= [1+3, 7+3, 2+9]
= [4, 10, 11]
As far as I know, there's no numpy function that specifically does this (please correct me if I'm wrong!), so I frequently roll my own. For 1d numpy arrays, this isn't bad:
import numpy as np
def bucket(x, bucket_size):
    return x.reshape(x.size // bucket_size, bucket_size).sum(axis=1)
bucket_me = np.array([3, 4, 5, 5, 1, 3, 2, 3])
print(bucket(bucket_me, bucket_size=2)) #[ 7 10 4 5]
...however, I get confused easily for the multidimensional case, and I end up rolling my own buggy, half-assed solution to this "easy" problem over and over again. I'd love it if we could establish a nice N-dimensional reference implementation.
Preferably the function call would allow different bin sizes along different axes (perhaps something like bucket(x, bucket_size=(2, 2, 3)))
Preferably the solution would be reasonably efficient (reshape and sum are fairly quick in numpy)
Bonus points for handling edge effects when the array doesn't divide nicely into an integer number of buckets.
Bonus points for allowing the user to choose the initial bin edge offset.
As suggested by Divakar, here's my desired behavior in a sample 2-D case:
x = np.array([[1, 2, 3, 4],
[2, 3, 7, 9],
[8, 9, 1, 0],
[0, 0, 3, 4]])
bucket(x, bucket_size=(2, 2))
= [[1 + 2 + 2 + 3, 3 + 4 + 7 + 9],
[8 + 9 + 0 + 0, 1 + 0 + 3 + 4]]
= [[8, 23],
[17, 8]]
...hopefully I did my arithmetic correctly ;)
I think you can do most of the fiddly work with skimage's view_as_blocks. This function is implemented using as_strided so it is very efficient (it just changes the stride information to reshape the array). Because it's written in Python/NumPy, you can always copy the code if you don't have skimage installed.
After applying that function, you just need to sum the N trailing axes of the reshaped array (where N is the length of the bucket_size tuple). Here's a new bucket() function:
from skimage.util import view_as_blocks
def bucket(x, bucket_size):
    blocks = view_as_blocks(x, bucket_size)
    tup = tuple(range(-len(bucket_size), 0))
    return blocks.sum(axis=tup)
Then for example:
>>> x = np.array([1, 3, 7, 3, 2, 9])
>>> bucket(x, bucket_size=(2,))
array([ 4, 10, 11])
>>> x = np.array([[1, 2, 3, 4],
...               [2, 3, 7, 9],
...               [8, 9, 1, 0],
...               [0, 0, 3, 4]])
>>> bucket(x, bucket_size=(2, 2))
array([[ 8, 23],
[17, 8]])
>>> y = np.arange(6*6*6).reshape(6,6,6)
>>> bucket(y, bucket_size=(2, 2, 3))
array([[[ 264,  300],
        [ 408,  444],
        [ 552,  588]],

       [[1128, 1164],
        [1272, 1308],
        [1416, 1452]],

       [[1992, 2028],
        [2136, 2172],
        [2280, 2316]]])
Natively, from as_strided:
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.array([[1, 2, 3, 4],
              [2, 3, 7, 9],
              [8, 9, 1, 0],
              [0, 0, 3, 4]])

def bucket(x, bucket_size):
    x = np.ascontiguousarray(x)
    oldshape = np.array(x.shape)
    newshape = np.concatenate((oldshape // bucket_size, bucket_size))
    oldstrides = np.array(x.strides)
    newstrides = np.concatenate((oldstrides * bucket_size, oldstrides))
    axis = tuple(range(x.ndim, 2 * x.ndim))
    return as_strided(x, newshape, newstrides).sum(axis)
If a bucket size does not divide evenly into the corresponding dimension of x, the remaining elements are lost.
verification :
In [9]: bucket(x,(2,2))
Out[9]:
array([[ 8, 23],
[17, 8]])
To specify different bin sizes along each axis for the ndarray case, you can iteratively use np.add.reduceat along each axis, like so -
def bucket(x, bin_size):
    ndims = x.ndim
    out = x.copy()
    for i in range(ndims):
        idx = np.append(0, np.cumsum(bin_size[i][:-1]))
        out = np.add.reduceat(out, idx, axis=i)
    return out
Sample run -
In [126]: x
Out[126]:
array([[165, 107, 133, 82, 199],
[ 35, 138, 91, 100, 207],
[ 75, 99, 40, 240, 208],
[166, 171, 78, 7, 141]])
In [127]: bucket(x, bin_size = [[2, 2],[3, 2]])
Out[127]:
array([[669, 588],
[629, 596]])
# [2, 2] are the bin sizes along axis=0
# [3, 2] are the bin sizes along axis=1
# array([[165, 107, 133, | 82, 199],
# [ 35, 138, 91, | 100, 207],
# -------------------------------------
# [ 75, 99, 40, | 240, 208],
# [166, 171, 78, | 7, 141]])
In [128]: x[:2,:3].sum()
Out[128]: 669
In [129]: x[:2,3:].sum()
Out[129]: 588
In [130]: x[2:,:3].sum()
Out[130]: 629
In [131]: x[2:,3:].sum()
Out[131]: 596
I have a np array of arrays of arrays:
arr1 = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2 = np.array([[10,20,30],[40,50,60],[70,80,90]])
arr3 = np.array([[15,25,35],[45,55,65],[75,85,95]])
list_arr = np.array([arr1,arr2,arr3])
and indices array:
indices_array = np.array([1,0,2])
I want to get the array at index 1 from the first (array of arrays), the array at index 0 from the second, and the array at index 2 from the third.
expected output:
#[[ 4 5 6]
#[10 20 30]
#[75 85 95]]
I am looking for a numpy way to do it. As I have large arrays, I prefer not to use list comprehensions.
Basically, for each position along the first axis, you are selecting the element along the second axis given by indices_array, across all the elements along the third axis. As such, you can do -
list_arr[np.arange(list_arr.shape[0]),indices_array,:]
Sample run -
In [16]: list_arr
Out[16]:
array([[[ 1,  2,  3],
        [ 4,  5,  6],
        [ 7,  8,  9]],

       [[10, 20, 30],
        [40, 50, 60],
        [70, 80, 90]],

       [[15, 25, 35],
        [45, 55, 65],
        [75, 85, 95]]])
In [17]: indices_array
Out[17]: array([1, 0, 2])
In [18]: list_arr[np.arange(list_arr.shape[0]),indices_array,:]
Out[18]:
array([[ 4, 5, 6],
[10, 20, 30],
[75, 85, 95]])
Just access by linking positions to desired indexes (0-1, 1-0, 2-2) as follows:
desired_array = np.array([list_arr[x][y] for x,y in enumerate([1,0,2])])