User defined tie breaker for argsort() in numpy - python

I have two arrays v and c (can read as value and cost).
I need to perform argsort() on v such that if 2 elements in v are the same, then they need to be sorted according to their corresponding elements in c.
Example
v = [4,1,4,4] # Here 0th, 2nd and 3rd elemnt are equal
c = [5,0,30,10]
numpy.argsort(v) = [1,0,2,3] # equal values sorted by index
Required output
[1,0,3,2] # c[0] < c[3] < c[2]
How to achieve this in Python?

The function argsort receives an order parameter, from the docs:
When a is an array with fields defined, this argument specifies which
fields to compare first, second, etc.
So you could create a structured array from the two values, and the pass the fields in order:
import numpy as np
v = [4, 1, 4, 4]
c = [5, 0, 30, 10]
s = np.array(list(zip(v, c)), dtype=[('value', 'i4'), ('cost', 'i4')])
result = np.argsort(s, order=['value', 'cost'])
print(result)
Output
[1 0 3 2]

Related

compare each row with each column in matrix using only pure python

I have a certain function that I made and I want to run it on each column and each row of a matrix, to check if there are rows and columns that produce the same output.
for example:
matrix = [[1,2,3],
[7,8,9]]
I want to run the function, lets call it myfun, on each column [1,7], [2,8] and [3,9] separatly, and also run it on each row [1,2,3] and [7,8,9]. If there is a row and a column that produce the same result, the counter ct would go up 1. All of this is found in another function, called count_good, which basically counts rows and columns that produce the same result.
here is the code so far:
def count_good(mat):
ct = 0
for i in mat:
for j in mat:
if myfun(i) == myfun(j):
ct += 1
return ct
However, when I use print to check my code I get this:
mat = [[1,2,3],[7,8,9]]
​
for i in mat:
for j in mat:
print(i,j)
​
[1, 2, 3] [1, 2, 3]
[1, 2, 3] [7, 8, 9]
[7, 8, 9] [1, 2, 3]
[7, 8, 9] [7, 8, 9]
I see that the code does not return what I need' which means that the count_good function won't work. How can I run a function on each row and each column? I need to do it without any help of outside libraries, no map,zip or stuff like that, only very pure python.
Let's start by using itertools and collections for this, then translate it back to "pure" python.
from itertools import product, starmap, chain # combinations?
from collections import Counter
To iterate in a nested loop efficiently, you can use itertools.product. You can use starmap to expand the arguments of a function as well. Here is a generator of the values of myfun over the rows:
starmap(myfun, product(matrix, repeat=2))
To transpose the matrix and iterate over the columns, use the zip(* idiom:
starmap(myfun, product(zip(*matrix), repeat=2))
You can use collections.Counter to map all the repeats for each possible return value:
Counter(starmap(myfun, chain(product(matrix, repeat=2), product(zip(*matrix), repeat=2))))
If you want to avoid running myfun on the same elements, replace product(..., repeat=2) with combinations(..., 2).
Now that you have the layout of how to do this, replace all the external library stuff with equivalent builtins:
counter = {}
for i in range(len(matrix)):
for j in range(len(matrix)):
result = myfun(matrix[i], matrix[j])
counter[result] = counter.get(result, 0) + 1
for i in range(len(matrix[0])):
for j in range(len(matrix[0])):
c1 = [matrix[row][i] for row in range(len(matrix))]
c2 = [matrix[row][j] for row in range(len(matrix))]
result = myfun(c1, c2)
counter[result] = counter.get(result, 0) + 1
If you want combinations instead, replace the loop pairs with
for i in range(len(...) - 1):
for j in range(i + 1, len(...)):
Using native python:
def count_good(mat):
ct = 0
columns = [[row[col_idx] for row in mat] for col_idx in range(len(mat[0]))]
for row in mat:
for column in columns:
if myfun(row) == myfun(column):
ct += 1
return ct
However, this is very inefficient as it is a triple nested for-loop. I would suggest using numpy instead.
e.g.
def count_good(mat):
ct = 0
mat = np.array(mat)
for row in mat:
for column in mat.T:
if myfun(row) == myfun(column):
ct += 1
return ct
TL;DR
To get a column from a 2D list of N lists of M elements, first flatten the list to a 1D list of N×M elements, then choosing elements from the 1D list with a stride equal to M, the number of columns, gives you a column of the original 2D list.
First, I create a matrix of random integers, as a list of lists of equal
length — Here I take some liberty from the objective of "pure" Python, the OP
will probably input by hand some assigned matrix.
from random import randrange, seed
seed(20220914)
dim = 5
matrix = [[randrange(dim) for column in range(dim)] for row in range(dim)]
print(*matrix, sep='\n')
We need a function to be applied to each row and each column of the matrix,
that I intend must be supplied as a list. Here I choose a simple summation of
the elements.
def myfun(l_st):
the_sum = 0
for value in l_st: the_sum = the_sum+value
return the_sum
To proceed, we are going to do something unexpected, that is we unwrap the
matrix, starting from an empty list we do a loop on the rows and "sum" the
current row to unwrapped, note that summing two lists gives you a single
list containing all the elements of the two lists.
unwrapped = []
for row in matrix: unwrapped = unwrapped+row
In the following we will need the number of columns in the matrix, this number
can be computed counting the elements in the last row of the matrix.
ncols = 0
for value in row: ncols = ncols+1
Now, we can compute the values produced applying myfunc to each column,
counting how many times we have the same value.
We use an auxiliary variable, start, that is initialized to zero and
incremented in every iteration of the following loop, that scans, using a
dummy variable, all the elements of the current row, hence start has the
values 0, 1, ..., ncols-1, so that unwrapped[start::ncols] is a list
containing exactly one of the columns of the matrix.
count_of_column_values = {}
start = 0
for dummy in row:
column_value = myfun(unwrapped[start::ncols])
if column_value not in count_of_column_values:
count_of_column_values[column_value] = 1
else:
count_of_column_values[column_value] = count_of_column_values[column_value] + 1
start = start+1
At this point, we are ready to apply myfun to the rows
count = 0
for row in matrix:
row_value = myfun(row)
if row_value in count_of_column_values: count = count+count_of_column_values[row_value]
print(count)
Executing the code above prints
[1, 4, 4, 1, 0]
[1, 2, 4, 1, 4]
[1, 4, 4, 0, 1]
[4, 0, 3, 1, 2]
[0, 0, 4, 2, 2]
3

Get indices of top N values from list in the order of the values

I have a list like a=[3,5,7,12,4,1,5] and need to get the indices of the top K(=3) elements of this list, in the order of the elements. So in this case, result should be
[3,2,1]
since top 3 elements are 12, 7, 5 (last one is tie so first index is returned).
What is the simplest way to get this?
As this is tagged numpy, you can use numpy.argsort on the opposite values (for reverse sorting), then slice to get the K desired values:
a = np.array([3,5,7,12,4,18,1,5,18])
K = 3
out = np.argsort(-a)[:K]
output: array([3, 2, 1])
If you want the indices in order of the original array but not necessarily sorted themselves in order of the values, you can also use numpy.argpartition:
out = np.argpartition(a, K)[-K:]
I assume you mean 3,2,1 right?
[a.index(i) for i in sorted(a)[:3:-1]]
Personally, I would create a sorted list through the sorted method, then compare the first K values with the .index attribute of a. This will yield you your desired result.
K = 3 #Number of elemens
a = [3,5,7,12,4,1,5]
a_sorted = sorted(a,reverse=True)
b = [a.index(v) for v in a_sorted[:K]]
print(b)
>>> [3, 2, 1]

NumPy - descending stable arg-sort of arrays of any dtype

NumPy's np.argsort is able to do stable sorting through passing kind = 'stable' argument.
Also np.argsort doesn't support reverse (descending) order.
If non-stable behavior is needed then descending order can be easily modeled through desc_ix = np.argsort(a)[::-1].
I'm looking for efficient/easy solution to descending-stable-sort NumPy's array a of any comparable dtype. See my meaning of "stability" in the last paragraph.
For the case when dtype is any numerical then stable descending arg-sorting can be easily done through sorting negated version of array:
print(np.argsort(-np.array([1, 2, 2, 3, 3, 3]), kind = 'stable'))
# prints: array([3, 4, 5, 1, 2, 0], dtype=int64)
But I need to support any comparable dtype including np.str_ and np.object_.
Just for clarification - maybe for descending orders classical meaning of stable means that equal elements are enumerated right to left. If so then in my question meaning of stable + descending is something different - equal ranges of elements should be enumerated left to right, while equal ranges between each other are ordered in descending order. I.e. same behavior should be achieved like in the last code above. I.e. I want stability in a sense same like Python achieves in next code:
print([e[0] for e in sorted(enumerate([1,2,2,3,3,3]), key = lambda e: e[1], reverse = True)])
# prints: [3, 4, 5, 1, 2, 0]
I think this formula should work:
import numpy as np
a = np.array([1, 2, 2, 3, 3, 3])
s = len(a) - 1 - np.argsort(a[::-1], kind='stable')[::-1]
print(s)
# [3 4 5 1 2 0]
We can make use of np.unique(..., return_inverse=True) -
u,tags = np.unique(a, return_inverse=True)
out = np.argsort(-tags, kind='stable')
One simplest solution would be through mapping sorted unique elements of any dtype to ascending integers and then stable ascending arg-sorting of negated integers.
Try it online!
import numpy as np
a = np.array(['a', 'b', 'b', 'c', 'c', 'c'])
u = np.unique(a)
i = np.searchsorted(u, a)
desc_ix = np.argsort(-i, kind = 'stable')
print(desc_ix)
# prints [3 4 5 1 2 0]
Similar to #jdehesa's clean solution, this solution allows specifying an axis.
indices = np.flip(
np.argsort(np.flip(x, axis=axis), axis=axis, kind="stable"), axis=axis
)
normalised_axis = axis if axis >= 0 else x.ndim + axis
max_i = x.shape[normalised_axis] - 1
indices = max_i - indices

Loop over clump_masked indices

I have an array y_filtered that contains some masked values. I want to replace these values by some value I calculate based on their neighbouring values. I can get the indices of the masked values by using masked_slices = ma.clump_masked(y_filtered). This returns a list of slices, e.g. [slice(194, 196, None)].
I can easily get the values from my masked array, by using y_filtered[masked_slices], and even loop over them. However, I need to access the index of the values as well, so i can calculate its new value based on its neighbours. Enumerate (logically) returns 0, 1, etc. instead of the indices I need.
Here's the solution I came up with.
# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)
y_enum = [(i, y_i) for i, y_i in zip(range(len(y_filtered)), y_filtered)]
for sl in masked_slices:
for i, y_i in y_enum[sl]:
# simplified example calculation
y_filtered[i] = np.average(y_filtered[i-2:i+2])
It is very ugly method i.m.o. and I think there has to be a better way to do this. Any suggestions?
Thanks!
EDIT:
I figured out a better way to achieve what I think you want to do. This code picks every window of 5 elements and compute its (masked) average, then uses those values to fill the gaps in the original array. If some index does not have any unmasked value close enough it will just leave it as masked:
import numpy as np
from numpy.lib.stride_tricks import as_strided
SMOOTH_MARGIN = 2
x = np.ma.array(data=[1, 2, 3, 4, 5, 6, 8, 9, 10],
mask=[0, 1, 0, 0, 1, 1, 1, 1, 0])
print(x)
# [1 -- 3 4 -- -- -- -- 10]
pad_data = np.pad(x.data, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant')
pad_mask = np.pad(x.mask, (SMOOTH_MARGIN, SMOOTH_MARGIN), mode='constant',
constant_values=True)
k = 2 * SMOOTH_MARGIN + 1
isize = x.dtype.itemsize
msize = x.mask.dtype.itemsize
x_pad = np.ma.array(
data=as_strided(pad_data, (len(x), k), (isize, isize), writeable=False),
mask=as_strided(pad_mask, (len(x), k), (msize, msize), writeable=False))
x_avg = np.ma.average(x_pad, axis=1).astype(x_pad.dtype)
fill_mask = ~x_avg.mask & x.mask
result = x.copy()
result[fill_mask] = x_avg[fill_mask]
print(result)
# [1 2 3 4 3 4 10 10 10]
(note all the values are integers here because x was originally of integer type)
The original posted code has a few errors, firstly it both reads and writes values from y_filtered in the loop, so the results of later indices are affected by the previous iterations, this could be fixed with a copy of the original y_filtered. Second, [i-2:i+2] should probably be [max(i-2, 0):i+3], in order to have a symmetric window starting at zero or later always.
You could do this:
from itertools import chain
# get indices of masked data
masked_slices = ma.clump_masked(y_filtered)
for idx in chain.from_iterable(range(s.start, s.stop) for s in masked_slices):
y_filtered[idx] = np.average(y_filtered[max(idx - 2, 0):idx + 3])

Replace elements in a matrix (Need help to make this faster)

C is composed of elements of the array B and i want to change each element which would correspond to A
#Program Block starts
import numpy as np
A= np.array([1, 2, 3, 4, 5])
B= np.array([1, 5, 3, 9, 15] )
# I have a 3*3 matrix
C = [[0 for x in range(3)] for x in range(3)]
C[0][:]=[1,5,3]
C[1][:]=[7,9,15]
C[2][:]=[2,9,15]
flag=(A==B).astype(int) # comparing for equality of 2 arrays A and B, and storing as binary
C_new=np.copy(C)
flag_ind=[i for i, e in enumerate(flag) if e==0] # storing the indices of non differing elements
for x in flag_ind:
C_new[C_new==B[x]]=A[x]
The output will be C_new=[1, 2, 3; 7 4 5; 2 4 5]
The actual sizes of A and B are ~ 600000 , size of C is 4000000*4.. time for simulation is taking ~ 14 hrs.. If there is a way to do the same operation with greater speed ..kindly let me know
This way you are iterating over the entire C array as many times as the intersection between A and B.
What I suggest is creating a dictionary that maps the B values to the A values so that you can retrieve every equivalent element in approximately constant time.
This is what I did, it took 9s to run with arrays of the same size as you specified.
A_dict = dict((k, v) for k, v in zip(B, A) if k != v)
map_c = np.vectorize(lambda x: A_dict.get(x, x))
C_new = map_c(C)
First I created the dictionary to map every value from B that has a different equivalent on A, then I created the function that will use this dictionary on the C array.

Categories

Resources