Python NumPy: Performing different column operations over every N rows

Python NumPy: Performing different column operations over every N rows - python

I have a large NumPy array (OriginalArray) with many rows and 8 columns.
I want to create a new array (NewArray) in which each row has the following properties:
Columns 1, 3, 5, and 7 of NewArray are the sum over N rows of columns 1, 3, 5, and 7 of OriginalArray
Columns 2, 4, 6, and 8 of NewArray are the mean over N rows of columns 2, 4, 6, and 8 of OriginalArray
So, the NewArray has 1/N as many rows as the OriginalArray.
For example:
Original Array = [1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 ]
with N = 2
NewArray = [2 1 2 1 2 1 2 1
2 1 2 1 2 1 2 1]
Please excuse the messy formatting. I'm still very new at this (my first question here, actually).
Thanks!

Here's a vectorized approach making heavy usage of slicing -
nrows = a.shape[0]//N # a is input array
out = np.empty((nrows,8))
out[:,::2] = a[:,::2].reshape(-1,N,4).sum(1)
out[:,1::2] = a[:,1::2].reshape(-1,N,4).mean(1)
Sample run -
In [64]: a # Input array
Out[64]:
array([[5, 1, 5, 8, 5, 0, 3, 1],
[0, 7, 8, 7, 0, 3, 5, 1],
[8, 6, 6, 4, 1, 6, 1, 2],
[4, 5, 5, 7, 5, 2, 1, 2]])
In [65]: N = 2 # Summing/averaging length
In [66]: a[:,::2] # Select [1,3,5,7] cols
Out[66]:
array([[5, 5, 5, 3],
[0, 8, 0, 5],
[8, 6, 1, 1],
[4, 5, 5, 1]])
In [67]: a[:,::2].reshape(-1,N,4).sum(1) # Sum N rows by splitting axis
Out[67]:
array([[ 5, 13, 5, 8],
[12, 11, 6, 2]])
In [68]: a[:,1::2] # Select [2,4,6,8] cols
Out[68]:
array([[1, 8, 0, 1],
[7, 7, 3, 1],
[6, 4, 6, 2],
[5, 7, 2, 2]])
In [69]: a[:,1::2].reshape(-1,N,4).mean(1) # Similarly average across N rows
Out[69]:
array([[ 4. , 7.5, 1.5, 1. ],
[ 5.5, 5.5, 4. , 2. ]])

I'm assuming that your original_array (note the PEP8 style) is already formatted in rows and columns. By this I mean, original_array = np.array([[1,1...],[1,...],[1,...],[1,...]])
An easy one-liner to create a single row of new_array would be as follows:
import numpy as np
row = [np.sum(original_array[:,x]) if x%2==1 else np.mean(test[:,x]) for x in range(len(original_array[0]))]
And then to copy the row, simply:
new_array = [row]*N

Related

Python program to delete all the rows and columns with all zeros and print the remaining matrix

Is there any program possible with a complexity less than O(mn)? The input is in the form as the first line contains MN and the next M lines each containing N integers
For example
4 4
1 0 3 4
0 0 0 0
4 0 6 8
4 0 2 4
The output should be:
1 3 4
4 6 8
4 2 4

You can do this by individually filtering rows and columns with all values equal to 1 but checking if set(row or column)!={0}
arr = [[1, 0, 3, 4],
[0, 0, 0, 0],
[4, 0, 6, 8],
[4, 0, 2, 4]]
rows = [i for i in arr if set(i)!={0}]
cols = [i for i in zip(*rows) if set(i)!={0}]
arr_new = [list(i) for i in zip(*cols)]
print(arr_new)
[[1, 3, 4],
[4, 6, 8],
[4, 2, 4]]
EDIT:
If you are ok with using numpy then you can do this a bit more easily -
import numpy as np
arr = np.array(arr)
arr[~(arr==0).all(0)][:,~(arr==0).all(1)]
array([[1, 3, 4],
[4, 6, 8],
[4, 2, 4]])

how to understand such shuffling data code in Numpy

I am learning at Numpy and I want to understand such shuffling data code as following:
# x is a m*n np.array
# return a shuffled-rows array
def shuffle_col_vals(x):
rand_x = np.array([np.random.choice(x.shape[0], size=x.shape[0], replace=False) for i in range(x.shape[1])]).T
grid = np.indices(x.shape)
rand_y = grid[1]
return x[(rand_x, rand_y)]
So I input an np.array object as following:
x1 = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]], dtype=int)
And I get a output of shuffle_col_vals(x1) like comments as following:
array([[ 1, 5, 11, 15],
[ 3, 8, 9, 14],
[ 4, 6, 12, 16],
[ 2, 7, 10, 13]], dtype=int64)
I get confused about the initial way of rand_x and I didn't get such way in numpy.array
And I have been thinking it a long time, but I still don't understand why return x[(rand_x, rand_y)] will get a shuffled-rows array.
If not mind, could anyone explain the code to me?
Thanks in advance.

In indexing Numpy arrays, you can take single elements. Let's use a 3x4 array to be able to differentiate between the axes:
In [1]: x1 = np.array([[1, 2, 3, 4],
...: [5, 6, 7, 8],
...: [9, 10, 11, 12]], dtype=int)
In [2]: x1[0, 0]
Out[2]: 1
If you review Numpy Advanced indexing, you will find that you can do more in indexing, by providing lists for each dimension. Consider indexing with x1[rows..., cols...], let's take two elements.
Pick from the first and second row, but always from the first column:
In [3]: x1[[0, 1], [0, 0]]
Out[3]: array([1, 5])
You can even index with arrays:
In [4]: x1[[[0, 0], [1, 1]], [[0, 1], [0, 1]]]
Out[4]:
array([[1, 2],
[5, 6]])
np.indices creates a row and col array, that if used for indexing, give back the original array:
In [5]: grid = np.indices(x1.shape)
In [6]: np.alltrue(x1[grid[0], grid[1]] == x1)
Out[6]: True
Now if you shuffle the values of grid[0] col-wise, but keep grid[1] as-is, and then use these for indexing, you get an array with the values of the columns shuffled.
Each column index vector is [0, 1, 2]. The code now shuffles these column index vectors for each column individually, and stacks them together into rand_x into the same shape as x1.
Create a single shuffled column index vector:
In [7]: np.random.seed(0)
In [8]: np.random.choice(x1.shape[0], size=x1.shape[0], replace=False)
Out[8]: array([2, 1, 0])
The stacking works by (pseudo-code) stacking with [random-index-col-vec for cols in range(x1.shape[1])] and then transposing (.T).
To make it a little clearer we can rewrite i as col and use column_stack instead of np.array([... for col]).T:
In [9]: np.random.seed(0)
In [10]: col_list = [np.random.choice(x1.shape[0], size=x1.shape[0], replace=False)
for col in range(x1.shape[1])]
In [11]: col_list
Out[11]: [array([2, 1, 0]), array([2, 0, 1]), array([0, 2, 1]), array([2, 0, 1])]
In [12]: rand_x = np.column_stack(col_list)
In [13]: rand_x
Out[13]:
array([[2, 2, 0, 2],
[1, 0, 2, 0],
[0, 1, 1, 1]])
In [14]: x1[rand_x, grid[1]]
Out[14]:
array([[ 9, 10, 3, 12],
[ 5, 2, 11, 4],
[ 1, 6, 7, 8]])
Details to note:
the example output you give is different from what the function you provide does. It seems to be transposed.
the use of rand_x and rand_y in the sample code can be confusing when being used to the convention of x=column index, y=row index

See output:
import numpy as np
def shuffle_col_val(x):
print("----------------------------\n A rand_x\n")
f = np.random.choice(x.shape[0], size=x.shape[0], replace=False)
print(f, "\nNow I transpose an array.")
rand_x = np.array([f]).T
print(rand_x)
print("----------------------------\n B rand_y\n")
print("Grid gives you two possibilities\n you choose second:")
grid = np.indices(x.shape)
print(format(grid))
rand_y = grid[1]
print("\n----------------------------\n C Our rand_x, rand_y:")
print("\nThe order of values in the column CHANGE:\n has random order\n{}".format(rand_x))
print("\nThe order of values in the row NO CHANGE:\n has normal order 0, 1, 2, 3\n{}".format(rand_y))
return x[(rand_x, rand_y)]
x1 = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16]], dtype=int)
print("\n----------------------------\n D Our shuffled-rows: \n{}\n".format(shuffle_col_val(x1)))
Output:
A rand_x
[2 3 0 1]
Now I transpose an array.
[[2]
[3]
[0]
[1]]
----------------------------
B rand_y
Grid gives you two possibilities, you choose second:
[[[0 0 0 0]
[1 1 1 1]
[2 2 2 2]
[3 3 3 3]]
[[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]]]
----------------------------
C Our rand_x, rand_y:
The order of values in the column CHANGE: has random order
[[2]
[3]
[0]
[1]]
The order of values in the row NO CHANGE: has normal order 0, 1, 2, 3
[[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]]
----------------------------
D Our shuffled-rows:
[[ 9 10 11 12]
[13 14 15 16]
[ 1 2 3 4]
[ 5 6 7 8]]

numpy: convert multiple assignments to a single one using OR

taxi_modified is a two-dimensional ndarray.
Code below works, but seems un-pythonic:
taxi_modified[taxi_modified[:, 5] == 2, 15] = 1
taxi_modified[taxi_modified[:, 5] == 3, 15] = 1
taxi_modified[taxi_modified[:, 5] == 5, 15] = 1
Need to assign 1 to col at index 15 if col at index 5 is 2, 3, or 5.
The below didn't work:
taxi_modified[taxi_modified[:, 5] == 2 | 3 | 5, 15] = 1

You can use fancy indexing with np.isin (NumPy v1.13+), or np.in1d for older versions.
Here's a demo:
# example input array
A = np.arange(16).reshape((4, 4))
# calculate Boolean mask for rows
mask = np.isin(A[:, 1], [1, 5, 13])
# assign values, converting mask to integers
A[np.where(mask), 2] = -1
print(A)
array([[ 0, 1, -1, 3],
[ 4, 5, -1, 7],
[ 8, 9, 10, 11],
[12, 13, -1, 15]])
In one line, this can be written:
A[np.where(np.isin(A[:, 1], [1, 5, 13])), 2] = -1

How to select value from array that is closest to value in array using vectorization?

I have an array of values that I want to replace with from an array of choices based on which choice is linearly closest.
The catch is the size of the choices is defined at runtime.
import numpy as np
a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
choices = np.array([1, 5, 10])
If choices was static in size, I would simply use np.where
d = np.where(np.abs(a - choices[0]) > np.abs(a - choices[1]),
np.where(np.abs(a - choices[0]) > np.abs(a - choices[2]), choices[0], choices[2]),
np.where(np.abs(a - choices[1]) > np.abs(a - choices[2]), choices[1], choices[2]))
To get the output:
>>d
>>[[1, 1, 1], [5, 5, 5], [10, 10, 10]]
Is there a way to do this more dynamically while still preserving the vectorization.

Subtract choices from a, find the index of the minimum of the result, substitute.
a = np.array([[0, 0, 0], [4, 4, 4], [9, 9, 9]])
choices = np.array([1, 5, 10])
b = a[:,:,None] - choices
np.absolute(b,b)
i = np.argmin(b, axis = -1)
a = choices[i]
print a
>>>
[[ 1 1 1]
[ 5 5 5]
[10 10 10]]
a = np.array([[0, 3, 0], [4, 8, 4], [9, 1, 9]])
choices = np.array([1, 5, 10])
b = a[:,:,None] - choices
np.absolute(b,b)
i = np.argmin(b, axis = -1)
a = choices[i]
print a
>>>
[[ 1 1 1]
[ 5 10 5]
[10 1 10]]
>>>
The extra dimension was added to a so that each element of choices would be subtracted from each element of a. choices was broadcast against a in the third dimension, This link has a decent graphic. b.shape is (3,3,3). EricsBroadcastingDoc is a pretty good explanation and has a graphic 3-d example at the end.
For the second example:
>>> print b
[[[ 1 5 10]
[ 2 2 7]
[ 1 5 10]]
[[ 3 1 6]
[ 7 3 2]
[ 3 1 6]]
[[ 8 4 1]
[ 0 4 9]
[ 8 4 1]]]
>>> print i
[[0 0 0]
[1 2 1]
[2 0 2]]
>>>
The final assignment uses an Index Array or Integer Array Indexing.
In the second example, notice that there was a tie for element a[0,1] , either one or five could have been substituted.

To explain wwii's excellent answer in a little more detail:
The idea is to create a new dimension which does the job of comparing each element of a to each element in choices using numpy broadcasting. This is easily done for an arbitrary number of dimensions in a using the ellipsis syntax:
>>> b = np.abs(a[..., np.newaxis] - choices)
array([[[ 1, 5, 10],
[ 1, 5, 10],
[ 1, 5, 10]],
[[ 3, 1, 6],
[ 3, 1, 6],
[ 3, 1, 6]],
[[ 8, 4, 1],
[ 8, 4, 1],
[ 8, 4, 1]]])
Taking argmin along the axis you just created (the last axis, with label -1) gives you the desired index in choices that you want to substitute:
>>> np.argmin(b, axis=-1)
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]])
Which finally allows you to choose those elements from choices:
>>> d = choices[np.argmin(b, axis=-1)]
>>> d
array([[ 1, 1, 1],
[ 5, 5, 5],
[10, 10, 10]])
For a non-symmetric shape:
Let's say a had shape (2, 5):
>>> a = np.arange(10).reshape((2, 5))
>>> a
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
Then you'd get:
>>> b = np.abs(a[..., np.newaxis] - choices)
>>> b
array([[[ 1, 5, 10],
[ 0, 4, 9],
[ 1, 3, 8],
[ 2, 2, 7],
[ 3, 1, 6]],
[[ 4, 0, 5],
[ 5, 1, 4],
[ 6, 2, 3],
[ 7, 3, 2],
[ 8, 4, 1]]])
This is hard to read, but what it's saying is, b has shape:
>>> b.shape
(2, 5, 3)
The first two dimensions came from the shape of a, which is also (2, 5). The last dimension is the one you just created. To get a better idea:
>>> b[:, :, 0] # = abs(a - 1)
array([[1, 0, 1, 2, 3],
[4, 5, 6, 7, 8]])
>>> b[:, :, 1] # = abs(a - 5)
array([[5, 4, 3, 2, 1],
[0, 1, 2, 3, 4]])
>>> b[:, :, 2] # = abs(a - 10)
array([[10, 9, 8, 7, 6],
[ 5, 4, 3, 2, 1]])
Note how b[:, :, i] is the absolute difference between a and choices[i], for each i = 1, 2, 3.
Hope that helps explain this a little more clearly.

I love broadcasting and would have gone that way myself too. But, with large arrays, I would like to suggest another approach with np.searchsorted that keeps it memory efficient and thus achieves performance benefits, like so -
def searchsorted_app(a, choices):
lidx = np.searchsorted(choices, a, 'left').clip(max=choices.size-1)
ridx = (np.searchsorted(choices, a, 'right')-1).clip(min=0)
cl = np.take(choices,lidx) # Or choices[lidx]
cr = np.take(choices,ridx) # Or choices[ridx]
mask = np.abs(a - cl) > np.abs(a - cr)
cl[mask] = cr[mask]
return cl
Please note that if the elements in choices are not sorted, we need to add in the additional argument sorter with np.searchsorted.
Runtime test -
In [160]: # Setup inputs
...: a = np.random.rand(100,100)
...: choices = np.sort(np.random.rand(100))
...:
In [161]: def broadcasting_app(a, choices): # #wwii's solution
...: return choices[np.argmin(np.abs(a[:,:,None] - choices),-1)]
...:
In [162]: np.allclose(broadcasting_app(a,choices),searchsorted_app(a,choices))
Out[162]: True
In [163]: %timeit broadcasting_app(a, choices)
100 loops, best of 3: 9.3 ms per loop
In [164]: %timeit searchsorted_app(a, choices)
1000 loops, best of 3: 1.78 ms per loop
Related post : Find elements of array one nearest to elements of array two

numpy.tile did not work as Matlab repmat

According to What is the equivalent of MATLAB's repmat in NumPy, I tried to build 3x3x5 array from 3x3 array using python.
In Matlab, this work as I expected.
a = [1,1,1;1,2,1;1,1,1];
a_= repmat(a,[1,1,5]);
size(a_) = 3 3 5
But for numpy.tile
b = numpy.array([[1,1,1],[1,2,1],[1,1,1]])
b_ = numpy.tile(b, [1,1,5])
b_.shape = (1, 3, 15)
If I want to generate the same array as in Matlab, what is the equivalent?
Edit 1
The output I would expect to get is
b_(:,:,1) =
1 1 1
1 2 1
1 1 1
b_(:,:,2) =
1 1 1
1 2 1
1 1 1
b_(:,:,3) =
1 1 1
1 2 1
1 1 1
b_(:,:,4) =
1 1 1
1 2 1
1 1 1
b_(:,:,5) =
1 1 1
1 2 1
1 1 1
but what #farenorth and the numpy.dstack give is
[[[1 1 1 1 1]
[1 1 1 1 1]
[1 1 1 1 1]]
[[1 1 1 1 1]
[2 2 2 2 2]
[1 1 1 1 1]]
[[1 1 1 1 1]
[1 1 1 1 1]
[1 1 1 1 1]]]

NumPy functions are not, in general, 'drop-in' replacements for matlab functions. Often times there are subtle difference to how the 'equivalent' functions are used. It does take time to adapt, but I've found the transition to be very worthwhile.
In this case, the np.tile documentation indicates what happens when you are trying to tile an array to higher dimensions than it is defined,
numpy.tile(A, reps)
Construct an array by repeating A the number of times given by reps.
If reps has length d, the result will have dimension of max(d, A.ndim).
If A.ndim < d, A is promoted to be d-dimensional by prepending new axes. So a shape (3,) array is promoted to (1, 3) for 2-D replication, or shape (1, 1, 3) for 3-D replication. If this is not the desired behavior, promote A to d-dimensions manually before calling this function.
In this case, your array is being cast to a shape of [1, 3, 3], then being tiled. So, to get your desired behavior just be sure to append a new singleton-dimension to the array where you want it,
>>> b_ = numpy.tile(b[..., None], [1, 1, 5])
>>> print(b_.shape)
(3, 3, 5)
Note here that I've used None (i.e. np.newaxis) and ellipses notation to specify a new dimension at the end of the array. You can find out more about these capabilities here.
Another option, which is inspired by the OP's comment would be:
b_ = np.dstack((b, ) * 5)
In this case, I've used tuple multiplication to 'repmat' the array, which is then constructed by np.dstack.
As #hpaulj indicated, Matlab and NumPy display matrices differently. To replicate the Matlab output you can do something like:
>>> for idx in xrange(b_.shape[2]):
... print 'b_[:, :, {}] = \n{}\n'.format(idx, str(b_[:, :, idx]))
...
b_[:, :, 0] =
[[1 1 1]
[1 2 1]
[1 1 1]]
b_[:, :, 1] =
[[1 1 1]
[1 2 1]
[1 1 1]]
b_[:, :, 2] =
[[1 1 1]
[1 2 1]
[1 1 1]]
b_[:, :, 3] =
[[1 1 1]
[1 2 1]
[1 1 1]]
b_[:, :, 4] =
[[1 1 1]
[1 2 1]
[1 1 1]]
Good luck!

Let's try the comparison, taking care to diversify the shapes and values.
octave:7> a=reshape(0:11,3,4)
a =
0 3 6 9
1 4 7 10
2 5 8 11
octave:8> repmat(a,[1,1,2])
ans =
ans(:,:,1) =
0 3 6 9
1 4 7 10
2 5 8 11
ans(:,:,2) =
0 3 6 9
1 4 7 10
2 5 8 11
numpy equivalent - more or less:
In [61]: a=np.arange(12).reshape(3,4)
In [62]: np.tile(a,[2,1,1])
Out[62]:
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]]])
numpy again, but with order F to better match the MATLAB Fortran-derived layout
In [63]: a=np.arange(12).reshape(3,4,order='F')
In [64]: np.tile(a,[2,1,1])
Out[64]:
array([[[ 0, 3, 6, 9],
[ 1, 4, 7, 10],
[ 2, 5, 8, 11]],
[[ 0, 3, 6, 9],
[ 1, 4, 7, 10],
[ 2, 5, 8, 11]]])
I'm adding the new numpy dimension at the start, because in many ways it better replicates the MATLAB practice of adding it at the end.
Try adding the new dimension at the end. The shape is (3,4,5), but you might not like the display.
np.tile(a[:,:,None],[1,1,2])
Another consideration - what happens when you flatten the tile?
octave:10> repmat(a,[1,1,2])(:).'
ans =
0 1 2 3 4 5 6 7 8 9 10 11
0 1 2 3 4 5 6 7 8 9 10 11
with the order F a
In [78]: np.tile(a[:,:,None],[1,1,2]).flatten()
Out[78]:
array([ 0, 0, 3, 3, 6, 6, 9, 9, 1, 1, 4, 4, 7, 7, 10, 10, 2,
2, 5, 5, 8, 8, 11, 11])
In [79]: np.tile(a,[2,1,1]).flatten()
Out[79]:
array([ 0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11, 0, 3, 6, 9, 1,
4, 7, 10, 2, 5, 8, 11])
with a C order array:
In [80]: a=np.arange(12).reshape(3,4)
In [81]: np.tile(a,[2,1,1]).flatten()
Out[81]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11])
This last one matches the Octave layout.
So does:
In [83]: a=np.arange(12).reshape(3,4,order='F')
In [84]: np.tile(a[:,:,None],[1,1,2]).flatten(order='F')
Out[84]:
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11])
Confused yet?

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python NumPy: Performing different column operations over every N rows - python

Related

Python program to delete all the rows and columns with all zeros and print the remaining matrix

how to understand such shuffling data code in Numpy

numpy: convert multiple assignments to a single one using OR

How to select value from array that is closest to value in array using vectorization?

numpy.tile did not work as Matlab repmat

Categories

Resources