Grouping elements of a numpy array using an array of group counts

Given two arrays, one representing a stream of data, and another representing group counts, such as:
import numpy as np
# given group counts: 3 4 3 2
# given flattened data:[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 ]
group_counts = np.array([3,4,3,2])
data = np.arange(group_counts.sum()) # placeholder data, real life application will be a very large array
I want to generate matrices based on the group counts for the streamed data, such as:
target_count = 3 # I want to make a matrix of all data items whose group count == target_count
# Expected result
# [[ 0 1 2]
# [ 7 8 9]]
To do this I wrote the following:
# Find all matches
match = np.where(group_counts == target_count)[0]
i1 = np.cumsum(group_counts)[match] # end index for slicing
i0 = i1 - group_counts[match] # start index for slicing
# Prep the blank matrix and fill with results
matched_matrix = np.empty((match.size, target_count), dtype=data.dtype)
# Is it possible to get rid of this loop?
for i in range(match.size):
    matched_matrix[i] = data[i0[i]:i1[i]]
matched_matrix
# Result: array([[0, 1, 2],
#                [7, 8, 9]])
This works, but I would like to get rid of the loop and I can't figure out how.
Doing some research I did find numpy.split and numpy.array_split:
match = np.where(group_counts == target_count)[0]
match = np.array(np.split(data, np.cumsum(group_counts)[:-1]), dtype=object)[match]
# Result: array([array([0, 1, 2]), array([7, 8, 9])], dtype=object) #
But numpy.split produces a list of arrays, and wrapping it gives an array of dtype=object that I have to convert.
Is there an elegant way to produce the desired result without a loop?

You can repeat group_counts so it has the same size as data, then filter and reshape based on the target:
group_counts = np.array([3,4,3,2])
data = np.arange(group_counts.sum())
target = 3
data[np.repeat(group_counts, group_counts) == target].reshape(-1, target)
#array([[0, 1, 2],
# [7, 8, 9]])
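The one-liner above can be wrapped in a small helper to make each step visible. This is only a sketch: the function name rows_with_count is made up for illustration, and it assumes data holds exactly group_counts.sum() elements.

```python
import numpy as np

def rows_with_count(data, group_counts, target):
    """Return a 2-D array with one row per group whose size equals target."""
    # Label every element of data with the size of the group it belongs to.
    per_element_count = np.repeat(group_counts, group_counts)
    # Keep the elements from groups of the requested size; each such group
    # contributes exactly `target` consecutive elements, so reshape is safe.
    return data[per_element_count == target].reshape(-1, target)

group_counts = np.array([3, 4, 3, 2])
data = np.arange(group_counts.sum())
result = rows_with_count(data, group_counts, 3)
print(result)
```

The boolean mask has the same length as data, so the cost is one extra array of that size rather than a Python-level loop.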

Related

Python Numpy - Slicing assignment not assigning correctly

I have a 2d numpy array called arm_resets that has positive integers. The first column has all positive integers < 360. For all columns other than the first, I need to replace all values over 360 with the value that is in the same row in the 1st column. I thought this would be a relatively easy thing to do, here's what I have:
i = 300
over_360 = arm_resets[:, [i]] >= 360
print(arm_resets[:, [i]][over_360])
print(arm_resets[:, [0]][over_360])
arm_resets[:, [i]][over_360] = arm_resets[:, [0]][over_360]
print(arm_resets[:, [i]][over_360])
And here's what prints:
[3600 3609 3608 ... 3600 3611 3605]
[ 0 9 8 ... 0 11 5]
[3600 3609 3608 ... 3600 3611 3605]
Since all numbers that are being shown in the first print (first 3 and last 3) are above 360, they should be getting replaced by the 2nd print in the 3rd print. Why is this not working?
edit: reproducible example:
import numpy as np
import pandas as pd
df = pd.DataFrame({"start":[1,2,3,4],"freq":[1,5,6,9]})
periods = 6
arm_resets = df[["start"]].values
freq = df[["freq"]].values
arm_resets = np.pad(arm_resets,((0,0),(0,periods-1)))
for i in range(1,periods):
    arm_resets[:,[i]] = arm_resets[:,[i-1]] + freq
    #over_360 = arm_resets[:,[i]] >= periods
    #arm_resets[:,[i]][over_360] = arm_resets[:,[0]][over_360]
arm_resets
With those two lines commented out, here's what prints:
array([[ 1, 2, 3, 4, 5, 6],
[ 2, 7, 12, 17, 22, 27],
[ 3, 9, 15, 21, 27, 33],
[ 4, 13, 22, 31, 40, 49]])
What I would expect:
array([[ 1, 2, 3, 4, 5, 1],
[ 2, 2, 2, 2, 2, 2],
[ 3, 3, 3, 3, 3, 3],
[ 4, 4, 4, 4, 4, 4]])
Now if it helps, the final 2d array I'm actually trying to create is a 1/0 array that indicates which are filled in, so in this example I'd want this:
array([[ 0, 1, 1, 1, 1, 1],
[ 0, 0, 1, 0, 0, 0],
[ 0, 0, 0, 1, 0, 0],
[ 0, 0, 0, 0, 1, 0]])
The code I use to achieve this from the above arm_resets is this:
fin = np.zeros((len(arm_resets),periods),dtype=int)
for i in range(len(arm_resets)):
    fin[i,a[i]] = 1

The slice arm_resets[:, [i]] is a fancy index, and therefore makes a copy of the ith column of the data. arm_resets[:, [i]][over_360] = ... therefore calls __setitem__ on a temporary array that is discarded as soon as the statement executes. If you want to assign to the mask, call __setitem__ on the sliced object directly:
arm_resets[over_360, [i]] = ...
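You also don't need to make the index into a list. It's generally better to use simple indices, especially when doing assignments, since they create views rather than copies. In both forms over_360 has to be a 1-D mask (for example arm_resets[:, i] >= 360); the (n, 1) mask built in the question would be treated as a 2-D index and consume both axes: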
arm_resets[over_360, i] = ...
With slicing, even the following should work, since it calls __setitem__ on a view:
arm_resets[:, i][over_360] = ...
This index does not help you process each row of the data, since i is a column. In fact, you can process the entire matrix in one step, without looping, if you use indices rather than a boolean mask. The reason that indices are useful is that you can match the item from the correct row in the first column:
rows, cols = np.nonzero(arm_resets[:, 1:] >= 360)
arm_resets[rows, cols + 1] = arm_resets[rows, 0]
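The difference between the copy and the view can be checked on a small array. The sketch below uses a made-up 3x4 arm_resets (not the asker's data) and keeps 360 as the threshold. Note that the column positions returned by np.nonzero are relative to the [:, 1:] slice, so they are shifted by one before assigning.

```python
import numpy as np

# Hypothetical 3x4 example; 360 is the threshold from the question.
arm_resets = np.array([[10, 400, 20, 500],
                       [20, 30, 700, 40],
                       [30, 360, 50, 60]])

# Fancy indexing makes a copy, so this assignment is silently lost.
lost = arm_resets.copy()
mask_2d = lost[:, [1]] >= 360
lost[:, [1]][mask_2d] = lost[:, [0]][mask_2d]

# Indexing the array directly with a 1-D mask assigns in place.
fixed = arm_resets.copy()
mask_1d = fixed[:, 1] >= 360
fixed[mask_1d, 1] = fixed[mask_1d, 0]

# All columns at once; cols is relative to the [:, 1:] slice, hence the + 1.
whole = arm_resets.copy()
rows, cols = np.nonzero(whole[:, 1:] >= 360)
whole[rows, cols + 1] = whole[rows, 0]
print(whole)
```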

You can use np.where()
first_col = arm_resets[:,0] # first col
first_col = first_col.reshape(first_col.size,1) # Transform into a 2d column array
arm_resets = np.where(arm_resets >= 360,first_col,arm_resets)
You can see in detail how np.where works here, but basically it evaluates arm_resets >= 360 element by element: where the condition is true it takes the value from first_col (which is broadcast across the columns), and where it is false it keeps the value from arm_resets.
Edit: As suggested by Mad Physicist, you can use arm_resets[:,0,None] directly instead of creating the first_col variable.
arm_resets = np.where(arm_resets >= 360,arm_resets[:,0,None],arm_resets)
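A small runnable check of the broadcasting involved, using a made-up 3x4 array rather than the asker's data:

```python
import numpy as np

arm_resets = np.array([[10, 400, 20, 500],
                       [20, 30, 700, 40],
                       [30, 360, 50, 60]])

# arm_resets[:, 0, None] has shape (3, 1) and broadcasts across the columns,
# so every replaced value comes from column 0 of the same row.
first_col = arm_resets[:, 0, None]
result = np.where(arm_resets >= 360, first_col, arm_resets)
print(result)
```

Unlike the in-place assignments, np.where returns a new array and leaves arm_resets untouched, which is why the answer rebinds the name.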

how to delete rows and columns in numpy python?

I am having trouble creating a function which takes a matrix M as an input and deletes BOTH rows and columns containing the number 0 and giving an output containing the remaining numbers. Any help is much appreciated as I have my programming exam coming up soon.
By "deleting both rows and columns" this is what I mean:

import numpy as np
x = np.array([[1,2,3,4,5],
[6,0,8,9,10],
[11,12,13,14,15],
[16,0,0,19,20]])
idxs_array = list(np.where(x==0))
idxs_array = [list(dict.fromkeys(x)) for x in idxs_array]
for axis, idxs in enumerate(idxs_array):
    sub_factor = 0
    for idx in idxs:
        x = np.delete(x,idx-sub_factor,axis)
        sub_factor += 1
print(x)
# x = [[ 1, 4, 5],
# [11, 14, 15]]

1. Locate zero elements
First of all, we need to identify the location of the zero elements in the matrix, which can be done easily with np.where().
np.where will return the row/column indices of the elements that match a specific condition (doc).
row_idx, col_idx = np.where(arr == 0)
2. Remove corresponding rows/columns
To remove the corresponding rows and columns, an easy way is boolean indexing (doc).
That is, mark each row (or column) you want to keep with True and the others with False.
print(np.arange(4)[[True, False, True, False]])
# [0 2]
3. Put two things together
Here is a minimal example.
arr = np.array([[ 1, 2, 3, 4, 5],
[ 6, 0, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 0, 0, 19, 20]])
row_idx, col_idx = np.where(arr == 0)
rm_row_idx = set(row_idx.tolist())
rm_col_idx = set(col_idx.tolist())
row_mask = [i not in rm_row_idx for i in range(arr.shape[0])]
col_mask = [i not in rm_col_idx for i in range(arr.shape[1])]
arr = arr[row_mask, :]
arr = arr[:, col_mask]
print(arr)
# Prints:
# [[ 1  4  5]
#  [11 14 15]]
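The two masks can also be built without Python loops. The following is a sketch of that variant, not the answer's own method: it uses any() along each axis and np.ix_ to apply both masks in one indexing step.

```python
import numpy as np

arr = np.array([[ 1,  2,  3,  4,  5],
                [ 6,  0,  8,  9, 10],
                [11, 12, 13, 14, 15],
                [16,  0,  0, 19, 20]])

is_zero = arr == 0
keep_rows = ~is_zero.any(axis=1)  # True for rows without a zero
keep_cols = ~is_zero.any(axis=0)  # True for columns without a zero
# np.ix_ builds an open mesh so both masks are applied in one indexing step.
trimmed = arr[np.ix_(keep_rows, keep_cols)]
print(trimmed)
```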

How to calculate moving average of NumPy array with varying window sizes defined by an array of indices?

Which is the most pythonic way to average the values in a 2d array (axis=1) based on a range in a 1d array?
I am trying to average arrays of environmental variables (my 2d array) based on every 2 degrees of latitude (my id array). I have a latitude array that goes from -33.9 to 29.5. I'd like to average the environmental variables within every 2 degrees from -34 to 30.
The number of elements within each 2 degrees may be different, for example:
arr = array([[5,3,4,5,6,4,2,4,5,8],
[4,5,8,5,2,3,6,4,1,7],
[8,3,5,8,5,2,5,9,9,4]])
idx = array([1,1,1,2,2,3,3,3,3,4])
I would then average the values in arr based on idx[0:3], idx[3:9], idx[9].
I would like to get a result of:
arrAvg = array([[4, 4.3, 8],
                [5.7, 3.5, 7],
                [5.3, 6.3, 4]])

@Andyk already explained in his post how to calculate the average having a list of indices.
I will provide a solution for getting those indices.
Here is a general approach:
from typing import Optional
import numpy as np
def get_split_indices(array: np.ndarray,
                      *,
                      window_size: int,
                      start_value: Optional[int] = None) -> np.ndarray:
    """
    :param array: input array with consecutive integer indices
    :param window_size: specifies range of indices
    which will be included in a separate window
    :param start_value: from which the window will start
    :return: array of indices marking the borders of the windows
    """
    if start_value is None:
        start_value = array[0]
    diff = np.diff(array)
    diff_indices = np.where(diff)[0] + 1
    slice_ = slice(window_size - 1 - (array[0] - start_value) % window_size,
                   None,
                   window_size)
    return diff_indices[slice_]
Examples of usage:
Checking it with your example data:
# indices: 3 9
idx = np.array([1,1,1, 2,2,3,3,3,3, 4])
you can get the indices separating different windows like this:
get_split_indices(idx,
window_size=2,
start_value=0)
>>> array([3, 9])
With this function you can also specify different window sizes:
# indices: 7 11 17
idx = np.array([0,1,1,2,2,3,3, 4,5,6,7, 8,9,10,11,11,11, 12,13])
get_split_indices(idx,
window_size=4,
start_value=0)
>>> array([ 7, 11, 17])
and different starting values:
# indices: 1 7 10 13 18
idx = np.array([0, 1,1,2,2,3,3, 4,5,6, 7,8,9, 10,11,11,11,12, 13])
get_split_indices(idx,
window_size=3,
start_value=-2)
>>> array([ 1, 7, 10, 13, 18])
Note that I made the first element of array a starting value by default.

You could use the np.hsplit function. For your example of indices 0:3, 3:9, 9 it goes like this:
np.hsplit(arr, [3, 9])
which gives you a list of arrays:
[array([[5, 3, 4],
[4, 5, 8],
[8, 3, 5]]),
array([[5, 6, 4, 2, 4, 5],
[5, 2, 3, 6, 4, 1],
[8, 5, 2, 5, 9, 9]]),
array([[8],
[7],
[4]])]
Then you can compute the mean as follows:
m = [np.mean(a, axis=1) for a in np.hsplit(arr, [3, 9])]
And convert it back to an array:
np.vstack(m).T
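If the list comprehension becomes a bottleneck, the same means can be computed without splitting. This is a sketch of an alternative technique (np.add.reduceat), not part of the answer above. It assumes the window borders [3, 9] are already known and that no window is empty.

```python
import numpy as np

arr = np.array([[5, 3, 4, 5, 6, 4, 2, 4, 5, 8],
                [4, 5, 8, 5, 2, 3, 6, 4, 1, 7],
                [8, 3, 5, 8, 5, 2, 5, 9, 9, 4]])
split_at = np.array([3, 9])  # window borders, as in the hsplit call

# Each window starts at 0 or at one of the borders.
starts = np.concatenate(([0], split_at))
# Sum each window along axis 1, then divide by the window lengths.
sums = np.add.reduceat(arr, starts, axis=1)
lengths = np.diff(np.concatenate((starts, [arr.shape[1]])))
means = sums / lengths
print(means.round(2))
```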

How to sum up (W * H) of 3D matrix and store it in 1D matrix with length=depth(third dimension of input matrix)

I want to sum up all elements (W * H) of 3D matrix and store it in 1D matrix with length=depth(third dimension of input matrix)
To make myself clear:
Input dimension = 1D in the form of (W * H * D).
Required output = 1D again with length=D
let's consider below 3D Matrix : 2 x 3 x 2.
Layer 1      Layer 2
[1, 2, 3     [7, 8, 9
 4, 5, 6]     10, 11, 12]
output is 1D : [21, 57]
I am new to python and wrote like this:
def test(w, h, c, image_inp):
    output = [image_inp[j * w + k] for i in enumerate(image_inp)
              for j in range(0,h)
              for k in range(0,w)
              #image_inp[j * w + k] comment
              ]
    printout(output)
I know this will copy the input array as it is to output array.
also output array length is not equal to Depth.
Some one please help me in getting this right
def test(w, h, c, image_inp):
output = [hwsum for i in enumerate(image_inp)
hwsum += wsum for j in range(0,h)
wsum += image_inp[j*w + k] for k in range(0,w)
#image_inp[j * w + k]
]
print "Calling outprint"
printout(output)
Note: I do not want to use numpy(with this it is working) or any math libraries.
The reason is that I am writing test code in python to evaluate a language I am working on.
EDIT:
input matrix will be entering the test function as 1D with w, h, c as arguments,
so it takes the form as:
[1,2,3,4,5,6,7,8,9,10,11,12],
with w, h, c have to compute considering input1D as 3D matrix.
thanks

Numpy is very suitable for slicing and manipulating single and multiple dimensional data. It is fast, easy to use and very "pythonic".
Following your example, you can just do:
>>> import numpy
>>> img3d=numpy.array([[[1,2,3],[4,5,6]],[[7,8,9],[10,11,12]]])
>>> img3d.shape
(2, 2, 3)
You can see here that img3d has 2 layers, 2 rows and 3 columns. You can just slice using indexing like this:
>>> img3d[0,:,:]
array([[1, 2, 3],
       [4, 5, 6]])
To go from 3D to 1D, just use the array's flatten() method:
>>> f=img3d.flatten()
>>> f
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])
And reversed, use reshape():
>>> f.reshape((2,2,3))
array([[[ 1,  2,  3],
        [ 4,  5,  6]],
       [[ 7,  8,  9],
        [10, 11, 12]]])
Now just add using numpy.sum, giving the dimensions you want to sum over (in your case dimensions 1 and 2, dimensions being 0-indexed):
>>> numpy.sum(img3d,(1,2))
array([21, 57])
Just to summarize in a one-liner, you can do (variable names from your question; the depth c comes first because the layers are stored one after another):
>>> numpy.sum(numpy.array(image_inp).reshape(c,h,w),(1,2))
From the numpy manual on numpy.sum:
numpy.sum
numpy.sum(a, axis=None, dtype=None, out=None, keepdims=<class numpy._globals._NoValue>)
Sum of array elements over a given axis.
Parameters:
a : array_like
    Elements to sum.
axis : None or int or tuple of ints, optional
    Axis or axes along which a sum is performed. The default, axis=None, will sum all of the elements of the input array. If axis is negative it counts from the last to the first axis.
    New in version 1.7.0.: If axis is a tuple of ints, a sum is performed on all of the axes specified in the tuple instead of a single axis or all the axes as before.
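Applied to the flat input from the question, the numpy route looks roughly like this. It is a sketch under one assumption: the layers are stored one after another, each row-major with width w = 3 and height h = 2, so depth has to be the first axis when reshaping.

```python
import numpy as np

image_inp = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
w, h, c = 3, 2, 2  # assumed: width 3, height 2, depth 2

# Layers are stored one after another, so depth is the first axis.
layer_sums = np.asarray(image_inp).reshape(c, h, w).sum(axis=(1, 2))
print(layer_sums)
```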

If your matrix is set as your post implies with your "3D" matrix as an array of arrays:
M = [ [1, 2, 3,
4, 5, 6],
[ 7, 8, 9,
10,11,12],
]
array_of_sums = []
for pseudo_2D_matrix in M:
    array_of_sums.append(sum(pseudo_2D_matrix))
If your 3D matrix, as a real three dimensional object, is set up as:
M = [
    [ [ 1, 2, 3],
      [ 4, 5, 6],
    ],
    [ [ 7, 8, 9],
      [10,11,12],
    ],
]
You could create a 1D array of sums by doing the following:
array_of_sums = []
for matrix_2D in M:
    s = 0
    for row in matrix_2D:
        s += sum(row)
    array_of_sums.append(s)
It's a bit unclear how your data are formatted, but hopefully you get the idea from these two examples.
EDIT:
In light of clarification on input you could easily accomplish this:
If the dimensions w,h,c describe how the flat array [1,2,3,4,5,6,7,8,9,10,11,12] is laid out, with one layer of w*h elements after another, then you simply need to slice out those regions and sum each one:
input_array = [1,2,3,4,5,6,7,8,9,10,11,12]
w,h,c = 2,3,2
array_of_sums = []
i = 0
while i < c:
    array_of_sums.append(sum(input_array[i*w*h:(i+1)*w*h]))
    i += 1
as a simplified method:
def sum_2D_slices(w,h,c,matrix_3D):
    return [sum(matrix_3D[i*w*h:(i+1)*w*h]) for i in range(c)]
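For a version that avoids slicing and sum() altogether, which may be closer to what a port to another language would look like, here is a sketch with explicit loops. The function name sum_layers is made up, and the layout is assumed to be layer-major with the question's j * w + k offset inside each layer.

```python
def sum_layers(w, h, c, image_inp):
    """Sum the w * h elements of each of the c layers of a flat list."""
    if len(image_inp) != w * h * c:
        raise ValueError("input length does not match w * h * c")
    output = []
    for layer in range(c):
        total = 0
        for j in range(h):
            for k in range(w):
                # Same j * w + k offset as the question, shifted per layer.
                total += image_inp[layer * w * h + j * w + k]
        output.append(total)
    return output

print(sum_layers(3, 2, 2, list(range(1, 13))))
```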

How to slice a multidimensional array in python/numpy in a way to select specific row, column and depth?

I'm trying to convert my MATLAB code to python but I'm having some issues. This code is supposed to segment letters from a picture.
Here's the whole code in MATLAB:
he = imread('r.jpg');
imshow(he);
%C = makecform(type) creates the color transformation structure C that defines the color space conversion specified by type.
cform = makecform('srgb2lab');
%To perform the transformation, pass the color transformation structure as an argument to the applycform function.
lab_he = applycform(he,cform);
%convert to double precision
ab = double(lab_he(:,:,2:3));
%size of dimension in 2D array
nrows = size(ab,1);
ncols = size(ab,2);
%B = reshape(A,sz1,...,szN) reshapes A into a sz1-by-...-by-szN array where
%sz1,...,szN indicates the size of each dimension. You can specify a single
% dimension size of [] to have the dimension size automatically calculated,
% such that the number of elements in B matches the number of elements in A.
% For example, if A is a 10-by-10 matrix, then reshape(A,2,2,[]) reshapes
% the 100 elements of A into a 2-by-2-by-25 array.
ab = reshape(ab,nrows*ncols,2);
% repeat the clustering 3 times to avoid local minima
nColors = 3;
[cluster_idx, cluster_center] = kmeans(ab,nColors,'distance','sqEuclidean', ...
'Replicates',3);
pixel_labels = reshape(cluster_idx,nrows,ncols);
imshow(pixel_labels,[]), title('image labeled by cluster index');
segmented_images = cell(1,3);
rgb_label = repmat(pixel_labels,[1 1 3]);
for k = 1:nColors
color = he;
color(rgb_label ~= k) = 0;
segmented_images{k} = color;
end
figure,imshow(segmented_images{1}), title('objects in cluster 1');
figure,imshow(segmented_images{2}), title('objects in cluster 2');
figure,imshow(segmented_images{3}), title('objects in cluster 3');
mean_cluster_value = mean(cluster_center,2);
[tmp, idx] = sort(mean_cluster_value);
blue_cluster_num = idx(1);
L = lab_he(:,:,1);
blue_idx = find(pixel_labels == blue_cluster_num);
L_blue = L(blue_idx);
is_light_blue = im2bw(L_blue,graythresh(L_blue));
% target_labels = repmat(uint8(0),[nrows ncols]);
% target_labels(blue_idx(is_light_blue==false)) = 1;
% target_labels = repmat(target_labels,[1 1 3]);
% blue_target = he;
% blue_target(target_labels ~= 1) = 0;
% figure,imshow(blue_target), title('blue');
Here's what I have in Python so far:
import cv2
import numpy as np
from matplotlib import pyplot as plt
import sys
img = cv2.imread('r.jpg',1)
print "original image: ", img.shape
cv2.imshow('BGR', img)
img1 = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img2 = cv2.cvtColor(img1, cv2.COLOR_RGB2LAB)
cv2.imshow('RGB', img1)
cv2.imshow('LAB', img2) #differs from the LAB color space in MATLAB (need to patch maybe?)
print "LAB converted image: ", img2.shape
print "LAB converted image dimension", img2.ndim #says the image is a 3 dimensional array
img2a = img2.shape[2][1:2]
print "Slicing the LAB converted image", img2a
#we need to convert that to double precision
print img2.dtype
img2a = img2.astype(np.uint64) #convert to double precision
print img2a.dtype
#print img2a
row = img2a.shape[0] #returns number of rows of img2a
column = img2a.shape[1] #returns number of columns of img2a
print "row: ", row #matches the MATLAB version
print "column: ", column #matchees the MATLAB version
rowcol = row * column
k = cv2.waitKey(0)
if k == 27: # wait for ESC key to exit
    cv2.destroyAllWindows()
elif k == ord('s'): # wait for 's' key to save and exit
    cv2.imwrite('final image',final_image)
    cv2.destroyAllWindows()
The part I'm currently stuck on is this: in the MATLAB code I have lab_he(:,:,2:3), which means all the rows and all the columns from depths 2 and 3. When I try to replicate that in Python with img2a = img2.shape[2][1:2], it doesn't work or make sense. Please help.

In Octave/MATLAB
octave:29> x=reshape(1:(2*3*4),3,2,4);
octave:30> x(:,:,2:3)
ans =
ans(:,:,1) =
7 10
8 11
9 12
ans(:,:,2) =
13 16
14 17
15 18
octave:31> size(x(:,:,2:3))
ans =
3 2 2
octave:33> x(:,:,2:3)(2,2,:)
ans(:,:,1) = 11
ans(:,:,2) = 17
In numpy:
In [13]: x=np.arange(1,1+2*3*4).reshape(3,2,4,order='F')
In [14]: x[:,:,1:3]
Out[14]:
array([[[ 7, 13],
[10, 16]],
[[ 8, 14],
[11, 17]],
[[ 9, 15],
[12, 18]]])
In [15]: _.shape
Out[15]: (3, 2, 2)
In [17]: x[:,:,1:3][1,1,:]
Out[17]: array([11, 17])
Or with numpy normal 'C' ordering, and indexing on the 1st dimension
In [18]: y=np.arange(1,1+2*3*4).reshape(4,2,3)
In [19]: y[1:3,:,:]
Out[19]:
array([[[ 7, 8, 9],
[10, 11, 12]],
[[13, 14, 15],
[16, 17, 18]]])
In [20]: y[1:3,:,:][:,1,1]
Out[20]: array([11, 17])
Same indexing ideas, though matching numbers and shapes requires some care, not only with the 0 vs 1 index base. A 3d array is also displayed in a different arrangement: Octave divides it into blocks on the last index (its primary iterator), while numpy iterates on the first index.
In 3d it makes more sense to talk about first, 2nd, 3rd dimensions rather than row,col,depth. In 4d you run out of names. :)
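Applied to the question, the MATLAB expression lab_he(:,:,2:3) becomes a slice on the last axis. The sketch below uses a small synthetic array in place of the real LAB image, so the values are placeholders. Only the indexing and the dtype conversion are the point.

```python
import numpy as np

# Stand-in for the LAB image; a real one would come from cv2.cvtColor.
img2 = np.arange(4 * 5 * 3, dtype=np.uint8).reshape(4, 5, 3)

# MATLAB lab_he(:,:,2:3) -> channels 1 and 2 with a 0-based, end-exclusive slice.
# float64 is the numpy counterpart of MATLAB's double.
ab = img2[:, :, 1:3].astype(np.float64)
nrows, ncols = ab.shape[:2]
# Counterpart of MATLAB reshape(ab, nrows*ncols, 2): one row per pixel.
ab_flat = ab.reshape(nrows * ncols, 2)
print(ab.shape, ab_flat.shape)
```

numpy reshapes in row-major order while MATLAB uses column-major order, so the pixels appear in a different sequence. That does not matter for clustering as long as the labels are reshaped back with the same convention.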

I had to cut an array at a specific depth, and I wrote a little recursive function to do so:
def recursive_array_cutting(tab, depth, i, min, max):
    if(i==depth):
        tab = tab[min:max]
        return tab
    temp_list = []
    nb_subtab = len(tab)
    for index in range(nb_subtab):
        temp_list.append(recursive_array_cutting(tab[index], depth, i+1, min, max))
    return np.asanyarray(temp_list)
It returns all arrays/values between min and max at a specific depth. For instance, if you have a (3, 4) tab = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]] and only want the first two values of the deepest arrays, you call it like this: tab = recursive_array_cutting(tab, 1, 0, 0, 2) to get the output: [[0 1][0 1][0 1]].
If you have a more complex array like tab = [[[0, 1, 2, 3], [1, 1, 2, 3], [2, 1, 2, 3]], [[0, 1, 2, 3], [1, 1, 2, 3], [2, 1, 2, 3]], [[0, 1, 2, 3], [1, 1, 2, 3], [2, 1, 2, 3]]] with shape (3, 3, 4) and want a (3, 2, 4) array, you can call it like this: tab = recursive_array_cutting(tab, 1, 0, 0, 2), which drops the last entry at depth 1.
A function like this surely exists in numpy, but I did not find it.
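In numpy the same cut can be expressed with a tuple of slice objects, without recursion. A minimal sketch follows. The helper name cut_axis is made up, and it assumes the input is rectangular so that it converts to a regular array.

```python
import numpy as np

def cut_axis(tab, axis, start, stop):
    """Keep entries start:stop along one axis and everything on the others."""
    tab = np.asanyarray(tab)
    # slice(None) is the programmatic spelling of ":".
    index = (slice(None),) * axis + (slice(start, stop),)
    return tab[index]

tab = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]
print(cut_axis(tab, 1, 0, 2))
```

For a fixed number of dimensions this is just ordinary slicing, e.g. tab[:, 0:2]. The tuple form is only needed when the axis is chosen at run time.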
