Sum Rows of 2D np array with List of Indices - python

I have a 2d numpy array and a list of numbers. If the list is [1, 3, 1, 8] where the list sums to the number of rows, I want to output an array with the first row unchanged, the next three rows summed, the fifth row unchanged, and the remaining eight rows summed.
As an example:
A = [[0,0], [1,2], [3,4]] and l = [1, 2] would output [[0,0], [4,6]]
I looked through np.sum and other functions but could not find this functionality. Thank you.

You can just iterate over the indices of l and based on the position either take that row or sum over a range of rows.
import numpy as np
A = [[0,0], [1,2], [3,4]]
l = [1, 2]
ans = []
for i in range(len(l)):
    if i%2 == 0:
        ans.append(A[ l[i] ])
    else:
        ans.append( np.sum( A[ l[i-1]:l[i-1] + l[i] ], axis=0 ) )
ans = np.array(ans)
print(ans)
[[1 2]
[4 6]]
N.B:
If the list is [1, 3, 1, 8] where the list sums to the number of rows,
I want to output an array with the first row unchanged, the next three
rows summed, the fifth row unchanged, and the remaining eight rows
summed.
I think you meant [1, 3, 5, 8]

If the number of elements in l is relatively large, you might get better performance by using groupby from pandas, e.g.
import numpy as np
import pandas as pd
labels = np.repeat(np.arange(1, len(l) + 1), l)
# [1, 2, 2]
df = pd.DataFrame(A)
df['label'] = labels
result = df.groupby('label').sum().values

I ended up coming up with my own solution when I realized I could sort my list without affecting my desired output. I used np.unique to determine the first indices of each element in the sorted list and then summed the rows between those indices. See below.
elements, indices = np.unique(data, return_counts=True)
row_summing = np.append([0], np.cumsum(indices))[:-1] #[0, index1, index2,...]
output = np.add.reduceat(matrix, row_summing, axis=0)
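For the example in the question, where l holds group sizes rather than labels, the same np.add.reduceat idea can be sketched as follows. This is only a minimal sketch: it assumes every entry of l is positive (reduceat does not return zeros for empty groups), and the name starts is introduced here for illustration.

```python
import numpy as np

A = np.array([[0, 0], [1, 2], [3, 4]])
l = [1, 2]  # group sizes; assumed positive and summing to len(A)

# Start index of each group: 0, then the running total of the sizes
starts = np.concatenate(([0], np.cumsum(l)[:-1]))

# Sum the rows of A between consecutive start indices
result = np.add.reduceat(A, starts, axis=0)
print(result)
```

A group of size 1 is left unchanged, because summing a single row returns that row.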

Related

Add repeated elements of array indexed by another array

I have a relatively simple problem that I cannot solve without using loops. It is difficult for me to figure out the correct title for this problem.
Let's say we have two numpy arrays:
array_1 = np.array([[0, 1, 2],
[3, 3, 3],
[3, 3, 4],
[3, 6, 2]])
array_2 = np.array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2],
[3, 3, 3],
[4, 4, 4],
[5, 5, 5],
[6, 6, 6]])
array_1 represents indices of the rows in array_2 that we want to sum. So, for example, the 4th row of the result array should contain the sum of all rows in array_2 whose row indices match the rows of the 3s in array_1.
It is much easier to understand it in the code:
# Start from zeros: np.empty would leave arbitrary values in the accumulator
result = np.zeros_like(array_2)
for i in range(array_1.shape[0]):
    for j in range(array_1.shape[1]):
        index = array_1[i, j]
        result[index] = result[index] + array_2[i]
Result should be:
[[ 0 0 0]
[ 0 0 0]
[ 3 3 3]
[10 10 10]
[ 2 2 2]
[ 0 0 0]
[ 3 3 3]]
I tried to use np.einsum but I need to use both elements in array as indices and also its rows as indices so I'm not sure if np.einsum is the best path here.
This is a problem I have in graphics: array_1 represents the vertex indices of the triangles, and array_2 represents normals, where the index of a row corresponds to the index of the vertex.
Any time you're adding something from a repeated index, normal ufuncs like np.add don't work out of the box because they only process a repeated fancy index once. Instead, you have to use the unbuffered version, which is np.add.at.
Here, you have a pair of indices: the row in array_1 is the row index into array_2, and the element of array_1 is the row index into the output.
First, construct the indices explicitly as fancy indices. This will make it much simpler to use them:
output_row = array_1.ravel()
input_row = np.repeat(np.arange(array_1.shape[0]), array_1.shape[1]).ravel()
You can apply input_row directly to array_2, but you need add.at to use output_row:
output = np.zeros_like(array_2)
np.add.at(output, output_row, array_2[input_row])
You really only use the first four rows of array_2, so it could be truncated to
array_2 = array_2[:array_1.shape[0]]
In that case, you would want to initialize the output as:
output = np.zeros_like(array_2, shape=(output_row.max() + 1, array_2.shape[1]))
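Put together, the np.add.at approach can be checked against the expected result from the question. This is a sketch using the question's arrays without truncating array_2; building array_2 with np.repeat is just a shorthand chosen here.

```python
import numpy as np

array_1 = np.array([[0, 1, 2],
                    [3, 3, 3],
                    [3, 3, 4],
                    [3, 6, 2]])
# Row i of array_2 is [i, i, i], as in the question
array_2 = np.repeat(np.arange(7)[:, None], 3, axis=1)

# Each element of array_1 names an output row; its row number names the input row
output_row = array_1.ravel()
input_row = np.repeat(np.arange(array_1.shape[0]), array_1.shape[1])

output = np.zeros_like(array_2)
# Unbuffered add: repeated output rows accumulate instead of overwriting
np.add.at(output, output_row, array_2[input_row])
print(output)
```

With a plain `output[output_row] += array_2[input_row]`, each repeated index would only be applied once, which is why the unbuffered form is needed.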

I need to remove every point that has the same Y coordinate in an array

Basically I have an array of [x, y] points that goes: [0,1][1,2][2,4][3,1][4,3], and the list goes on. I want to remove the points that share a y coordinate with an earlier point, keeping only the first occurrence in order. The output I would like is: [0,1][1,2][2,4][4,3]. How can I do this? I have tried using np.unique, but I can't manage to keep the first appearance or to remove based on the y coordinate.
Thanks
You can use HYRY's solution from numpy.unique with order preserved, you just need to select the Y column.
import numpy as np
a = np.array([[0,1], [1,2], [2,4], [3,1], [4,3]])
_, idx = np.unique(a[:, 1], return_index=True)
a[np.sort(idx)]
result:
[[0 1]
[1 2]
[2 4]
[4 3]]
array = [[0,1],[1,2],[2,4],[3,1],[4,3]]
occurred = set()  # y values seen so far
result = []
for element in array:
    if element[1] not in occurred:
        result.append(element)
        occurred.add(element[1])
array.clear()
array.extend(result)
print(array)
>> [[0, 1], [1, 2], [2, 4], [4, 3]]

Fastest way to get max frequency element for every row of numpy matrix

Given a 2d numpy matrix, X of shape [m,n] whose all values are guaranteed to be integers between 0 and 9 inclusive, I wish to calculate for every row, the value which occurs most frequently in that particular row (to break ties, return the highest value), and output this max-value array of length m. A short example would be as follows:
X = [[1,2,3,4],
[0,0,6,9],
[5,7,7,5],
[1,0,0,0],
[1,8,1,8]]
The output for the above matrix should be:
y = [4,0,7,0,8]
Consider the first row - all elements occur with same frequency, hence the numerically greatest value with highest frequency is 4. In the second row, there is only one number 0 with the highest frequency. In the third row, both 5 and 7 occur twice, hence, 7 is chosen and so on.
I could do this by maintaining collections.Counter objects for each row and then choosing the number satisfying the criteria. A naive implementation which I tried:
import numpy as np
from collections import Counter
X = np.array([[1,2,3,4],[0,0,6,9],[5,7,7,5],[1,0,0,0],[1,8,1,8]])
y = np.zeros(len(X), dtype=int)
for i in range(len(X)):
    freq_count = Counter(X[i])
    max_freq, max_freq_val = 0, -1
    for val in range(10):
        # >= so that ties go to the larger value
        if freq_count.get(val, 0) >= max_freq:
            max_freq = freq_count.get(val, 0)
            max_freq_val = val
    y[i] = max_freq_val
print(y)  # prints [4 0 7 0 8]
But using Counters is not fast enough. Is it possible to improve the running time? Maybe by also using vectorization? It is given that m = O(5e4) and n = 45.
Given that the numbers are always integers between 0 and 9, you could use numpy.bincount to count the number of occurrences, then use numpy.argmax to find the last appearance (using a reversed view [::-1]):
import numpy as np
X = np.array([[1, 2, 3, 4],
[0, 0, 6, 9],
[5, 7, 7, 5],
[1, 0, 0, 0],
[1, 8, 1, 8]])
res = [9 - np.bincount(row, minlength=10)[::-1].argmax() for row in X]
print(res)
Output
[4, 0, 7, 0, 8]
According to the timings here np.bincount is pretty fast. For more details on using argmax to find the last occurrence of the max value read this
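The per-row list comprehension can probably be replaced by a single bincount call. The sketch below is one possible way, not the answer's own code: it assumes all values lie in 0..9, shifts each row into its own block of ten bins, and the names offsets and counts are introduced here for illustration.

```python
import numpy as np

X = np.array([[1, 2, 3, 4],
              [0, 0, 6, 9],
              [5, 7, 7, 5],
              [1, 0, 0, 0],
              [1, 8, 1, 8]])
m = X.shape[0]

# Shift row i by 10*i so each row counts into its own block of 10 bins
offsets = np.arange(m)[:, None] * 10
counts = np.bincount((X + offsets).ravel(), minlength=m * 10).reshape(m, 10)

# argmax on the reversed columns returns the largest value among ties
y = 9 - counts[:, ::-1].argmax(axis=1)
print(y)
```

Whether this is actually faster for m around 5e4 and n = 45 would need to be measured.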

python - Adding combinations of adjacent rows in a matrix

This is my first post here and I'm a python beginner - all help is appreciated!
I'm trying to add all combinations of adjacent rows in a numpy matrix. i.e. row 1 + row 2, row 2 + row 3, row 3 + row 4, etc... with output to a list
I will then look for the smallest of these outputs and select that item in the list to be printed
I believe I need to use a for loop of some sort but I really am a novice...
Just iterate over the length of the array - 1 and add the pairs as you go into a new list. Then, select the one you want. For example:
>>> x = np.array([[1,2,3], [4,5,6], [7,8,9]])
>>> print([x[i] + x[i+1] for i in range(len(x)-1)])
[array([5, 7, 9]), array([11, 13, 15])]
Suppose you have this
import numpy as np
arr = np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
You can first calculate the sum of each row by using np.sum(arr, axis=1); the argument axis=1 sums the entries across the columns of each row.
In this case, sums = np.sum(arr, axis=1) = array([ 6, 15, 24]).
Then you can iterate over this array to add the adjacent sums:
lst_sums = []
for i in range(len(sums)-1):
    lst_sums.append(sums[i]+sums[i+1])
Then you can sort the list or take np.min(lst_sums).
If you need more details, you can look at the NumPy function docs; the same goes for lists.
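Both answers can also be written without an explicit loop by adding two shifted slices of the array. This is a small sketch; it assumes "smallest" means the pair with the smallest total, which the question does not spell out, and the names pair_sums, totals and best are illustrative.

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Row i of pair_sums is x[i] + x[i+1]
pair_sums = x[:-1] + x[1:]

# Total of each summed pair, used here to decide which pair is "smallest"
totals = pair_sums.sum(axis=1)
best = totals.argmin()

print(pair_sums[best])  # the adjacent-row sum with the smallest total
```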

Eliminating redundant numpy rows

If I have an array
arr = [[0,1]
[1,2]
[2,3]
[4,3]
[5,6]
[3,4]
[2,1]
[6,7]]
how could I eliminate redundant rows where columns values may be swapped? In the example above, the code would reduce the array to
arr = [[0,1]
[1,2]
[2,3]
[4,3]
[5,6]
[6,7]]
I have thought about using a combination of slicing arr[:,::-1], np.all, and np.any, but what I have come up with so far simply gives me True or False per row when comparing rows, and this doesn't discriminate between similar rows.
j = np.any([np.all(y==x, axis=1) for y in x[:,::-1]], axis=0)
which yields [False, True, False, True, False, True, True, False].
Thanks in advance.
Basically you want to Find Unique Rows, and these answers borrow heavily from the top two answers there - but you need to sort the rows first to eliminate different orders.
If you don't care about order of rows at the end, this is the short way (but slower than below):
np.vstack(list({tuple(row) for row in np.sort(arr,-1)}))
If you do want to maintain order, you can turn each sorted row into a void object and use np.unique with return_index
b = np.ascontiguousarray(np.sort(arr,-1)).view(np.dtype((np.void, arr.dtype.itemsize * arr.shape[1])))
_, idx = np.unique(b, return_index=True)
unique_arr = arr[np.sort(idx)]  # sort idx to keep the original row order
It might be tempting to use set row-wise instead of using np.sort(arr,-1) and np.void to make an object array, but this only works if there are no repeated values in rows. If there are, a row of [1,2,2] will be considered equivalent to a row of [1,1,2]: both become the set {1, 2}.
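On NumPy versions where np.unique accepts an axis argument, the void view can likely be avoided. The following is a sketch under that assumption, using the array from the question; canonical is a name introduced here for illustration.

```python
import numpy as np

arr = np.array([[0, 1], [1, 2], [2, 3], [4, 3],
                [5, 6], [3, 4], [2, 1], [6, 7]])

# Sort within each row so [4, 3] and [3, 4] compare equal
canonical = np.sort(arr, axis=1)

# Index of the first occurrence of each distinct sorted row
_, idx = np.unique(canonical, axis=0, return_index=True)

# Sorting idx restores the original row order
unique_arr = arr[np.sort(idx)]
print(unique_arr)
```

Rows are taken from arr rather than canonical, so the first occurrence keeps its original column order (for example [4, 3] rather than [3, 4]).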
A solution without using numpy,
In [27]: result_ = set(([tuple(sorted(row)) for row in arr]))
In [28]: result = [list(i) for i in result_]
In [29]: result
Out[29]: [[0, 1], [1, 2], [6, 7], [5, 6], [2, 3], [3, 4]]
The solution using numpy.lexsort routine:
import numpy as np
arr = np.array([
[0,1], [1,2], [2,3], [4,3], [5,6], [3,4], [2,1], [6,7]
])
order = np.lexsort(arr.T)
a = arr[order] # sorted rows
arr= a[[i for i,r in enumerate(a) if i == len(a)-1 or set(a[i]) != set(a[i+1])]]
print(arr)
The output:
[[0 1]
[1 2]
[2 3]
[3 4]
[5 6]
[6 7]]
After getting the Boolean list, you can use the following technique to filter out the rows flagged as having x and y swapped.
To remove identical rows, you can use the last line of the block.
# This block removes elements where x and y are swapped, given the list j
j=[True,False..] #Your Boolean List
finalArray=[]
for (flag,value) in zip(j,arr):
    if not flag:
        finalArray.append(value)
# This code removes identical rows
finalArray= [list(x) for x in set(tuple(x) for x in arr)]
