Numpy array operation to shift index - python

I have a very specific situation: I have a long 1-D numpy array (arr). I am interested in the elements that are greater than a number (n), so I am using idx = np.argwhere(arr > n) and val = arr[idx] to get the elements and their indices. Now the problem: I am adding an integer offset (ofs) to the indices (idx) and wrapping the overflowing indices back to the front using idx = (idx + ofs) % len(arr) (as if the original array (arr) were rolled and argwhere applied again). If that is correct so far, what exactly should I use to get the updated val (the array of values that corresponds to the new idx)? Thanks in advance.
Ex: Let arr=[2,5,8,4,9], n=4, so idx=[1,2,4] and val=[5,8,9]. Now let ofs=3, then idx=[4,5,7]%5=[4,0,2]. I expect val=[8,9,5].

I don't know if I understand the aim of this question correctly, but if the goal is to rearrange val so that it matches the order of the shifted idx, it can be done with np.argsort:
mask_idx = np.where(arr > n)[0]                 # indices in arr whose elements are greater than n
val = arr[mask_idx]                             # the corresponding values
mask_updated_idx = (mask_idx + ofs) % len(arr)  # --> [4 0 2]
idx_sorted = mask_updated_idx.argsort()         # --> [1 2 0], the order that sorts the shifted indices
val = val[idx_sorted]                           # --> [8 9 5]
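For completeness (not part of the original answer), here is a minimal check that this matches rolling the array first and running argwhere on the result, using the numbers from the question:
import numpy as np
arr = np.array([2, 5, 8, 4, 9])
n, ofs = 4, 3
# answer's approach: shift the indices, then reorder the values to match
mask_idx = np.where(arr > n)[0]
mask_updated_idx = (mask_idx + ofs) % len(arr)
val = arr[mask_idx][mask_updated_idx.argsort()]
# reference: roll the array and redo the selection
rolled = np.roll(arr, ofs)
ref = rolled[np.argwhere(rolled > n).ravel()]
print(val, ref)   # [8 9 5] [8 9 5]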

Related

find top_k element of numpy ndarray and ignore zero

Given a numpy ndarray like the following
x = [[4.,0.,2.,0.,8.],
     [1.,3.,0.,9.,5.],
     [0.,0.,4.,0.,1.]]
I want to find the indices of the top k (e.g. k=3) elements of each row, excluding 0, if possible. If there are fewer than k positive elements, then just return their indices (in sorted order).
The result should look like (a list of array)
res = [[4, 0, 2],
       [3, 4, 1],
       [2, 4]]
or just one flattened array
res = [4,0,2,3,4,1,2,4]
I know argsort can find the indices of the top k elements in sorted order, but I am not sure how to filter out the 0s.
You can use numpy.argsort on (-x) to get the indices in descending order, then numpy.take_along_axis to gather the values of the 2D array in that sorted order.
Because you want to ignore zeros, you can set the columns after the first k (three, as you mention in the question) to zero in the sorted values. At the end, return the indices whose sorted value is not zero.
x = np.array([[4.,0.,2.,0.,8.],[1.,3.,0.,9.,5.],[0.,0.,4.,0.,1.]])
idx_srt = np.argsort(-x)                           # column indices of each row, in descending order of value
val_srt = np.take_along_axis(x, idx_srt, axis=-1)  # the row values in that same order
val_srt[:, 3:] = 0                                 # zero out everything past the top k (= 3) columns
res = idx_srt[val_srt != 0]                        # keep only indices whose sorted value is non-zero
print(res)
[4 0 2 3 4 1 2 4]
Try one of these two:
k = 3
# pure Python: sort each row's indices by value (descending) and keep at most min(k, number of positive entries)
res = [sorted(range(len(r)), key=(lambda i: r[i]), reverse=True)[:min(k, len([n for n in r if n > 0]))] for r in x]
or
# numpy variant: argsort ascending, reverse, then truncate the same way
res1 = [np.argsort(r)[::-1][:min(k, len([n for n in r if n > 0]))] for r in x]
I came up with the following solution:
top_index = score.argsort(axis=1) # score here is my x
positive = (score > 0).sum(axis=1)
positive = np.minimum(positive, k) # top k
# broadcasting trick to get mask matrix that selects top k (k = min(2000, num of positive scores))
r = np.arange(score.shape[1])
mask = (positive[:,None] > r)
top_index_flatten = top_index[:, ::-1][mask]
I compared my result with the one suggested by @I'mahdi and they are consistent.
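For reference (not in the original answer), dropping the question's x in for score with k = 3 reproduces the same flattened result:
import numpy as np
score = np.array([[4.,0.,2.,0.,8.],[1.,3.,0.,9.,5.],[0.,0.,4.,0.,1.]])
k = 3
top_index = score.argsort(axis=1)                      # ascending per-row argsort
positive = np.minimum((score > 0).sum(axis=1), k)      # how many entries to keep per row
mask = positive[:, None] > np.arange(score.shape[1])   # leading True block of that length per row
print(top_index[:, ::-1][mask])                        # [4 0 2 3 4 1 2 4]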

Problem with python while loop going over every element in 2d array

I am having problems with while loops in python right now:
while(j < len(firstx)):
    trainingset[j][0] = firstx[j]
    trainingset[j][1] = firsty[j]
    trainingset[j][2] = 1  # top
    print(trainingset)
    j += 1
print("j is " + str(j))
i = 0
while(i < len(firsty)):
    trainingset[j+i][0] = twox[i]
    trainingset[j+i][1] = twoy[i]
    trainingset[j+i][2] = 0  # bottom
    i += 1
Here trainingset = [[0,0,0]]*points*2, where points is a number. Also, firstx, firsty, twox, and twoy are all numpy arrays.
I want the training set to have 2*points entries, going from [firstx[0], firsty[0], 1] all the way to [twox[points-1], twoy[points-1], 0].
After some debugging, I am realizing that on each iteration every single value in the training set is being changed, so that when j = 0 all the training set rows are replaced with firstx[0], firsty[0], and 1.
What am I doing wrong?
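A likely cause, not spelled out explicitly in the answer below: [[0,0,0]]*points*2 repeats a reference to one and the same inner list, so every "row" of trainingset is the same object and writing to one row writes to all of them. A minimal sketch of the effect and a common fix, assuming trainingset really is the plain Python list shown above:
rows = [[0, 0, 0]] * 3                  # three references to ONE inner list
rows[0][0] = 99
print(rows)                             # [[99, 0, 0], [99, 0, 0], [99, 0, 0]]
rows = [[0, 0, 0] for _ in range(3)]    # three independent inner lists
rows[0][0] = 99
print(rows)                             # [[99, 0, 0], [0, 0, 0], [0, 0, 0]]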
In this case, I would recommend a for loop instead of a while loop; for loops are great for when you want the index to increment and you know what the last increment value should be, which is true here.
I've had to make some assumptions about the shape of your arrays based on your code. I'm assuming that:
firstx, firsty, twox, and twoy are 1D NumPy arrays with either shape (length,) or (length, 1).
trainingset is a 2D NumPy array with at least len(firstx) + len(firsty) rows and at least 3 columns.
j starts at 0 before the while loop begins.
Given these assumptions, here's some code that gives you the output you want:
len_firstx = len(firstx)
# Replace each row with index j with the desired values
for j in range(len(firstx)):
    trainingset[j][0:3] = [firstx[j], firsty[j], 1]
# Because i starts at 0, len_firstx needs to be added to the trainingset row index
for i in range(len(firsty)):
    trainingset[i + len_firstx][0:3] = [twox[i], twoy[i], 0]
Let me know if you have any questions.
EDIT: Alright, looks like the above doesn't work correctly on rare occasions, not sure why, so if it's being fickle, you can change trainingset[j][0:3] and trainingset[i + len_firstx][0:3] to trainingset[j, 0:3] and trainingset[i + len_firstx, 0:3].
I think it has something to do with the shape of the trainingset array, but I'm not quite sure.
EDIT 2: There's also a more Pythonic way to do what you want instead of using loops. It standardizes the shapes of the four arrays assumed to be 1D (firstx, twox, etc. -- also, if you could let me know exactly what shape these arrays have, that would be super helpful and I could simplify the code!) and then makes the appropriate rows and columns in trainingset have the corresponding values.
# Function to reshape the 1D arrays.
# Works for shapes (length,), (length, 1), and (1, length).
def reshape_1d_array(arr):
    shape = arr.shape
    if len(shape) == 1:
        return arr[:, None]
    elif shape[0] >= shape[1]:
        return arr
    else:
        return arr.T
# Reshape the 1D arrays
firstx = reshape_1d_array(firstx)
firsty = reshape_1d_array(firsty)
twox = reshape_1d_array(twox)
twoy = reshape_1d_array(twoy)
len_firstx = len(firstx)
# The following 2 lines do what the loops did, but in 1 step.
arr1 = np.concatenate((firstx, firsty[0:len_firstx], np.array([[1]*len_firstx]).T), axis=1)
arr2 = np.concatenate((twox, twoy[0:len_firstx], np.array([[0]*len_firstx]).T), axis=1)
# Now put arr1 and arr2 where they're supposed to go in trainingset.
trainingset[:len_firstx, 0:3] = arr1
trainingset[len_firstx:len_firstx + len(firsty), 0:3] = arr2
This gives the same result as the two for loops I wrote before, but is faster if firstx has more than ~50 elements.

Compare and multiply elements in a list

I'm trying to write a function that:
Checks if all elements in a list are different
Multiplies all elements in the list, except the zeros
But I can't find a way to compare all the elements in one list, do you have any idea?
Thanks!
PS: I use arr = np.random.randint(10, size=a) to create a random list
EDIT:
More precisely, I'm trying to check whether, in a numpy array, all the elements are the same or different; if they are all different, the function should return True.
Also, once that is done, multiply all the elements in the array except the zeros.
For example:
If I have the array [4,2,6,8,9,0], the algorithm first returns True because all the elements are different, then multiplies them, 4*2*6*8*9, skipping the 0.
To check if all elements in a list are different you can convert the list into a set which removes duplicates and compare the length of the set to the original list. If the length of the set is different than the length of the list, then there are duplicates.
x = np.random.randint(10, size=10)
len(set(x)) == len(x)
To multiply all values except 0, you can use a list comprehension to remove the 0s and np.prod to multiply the values in the new list.
np.prod([i for i in x if i != 0])
Example:
x = np.random.randint(10, size=10)
if len(set(x)) == len(x):
    print(np.prod([i for i in x if i != 0]))
else:
    print("array is not unique")
You can use numpy.unique.
The following code snippet checks if all elements in the array are unique (different from each other) and, if so, multiplies the non-zero values by the factor factor:
import numpy as np
factor = 5
if np.unique(arr).size == arr.size:
    arr[arr != 0] = arr[arr != 0] * factor
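If the goal is the product of the non-zero elements rather than scaling them by a factor (as the question asks), the same np.unique check combines with np.prod; a small sketch, not part of the original answer:
import numpy as np
arr = np.array([4, 2, 6, 8, 9, 0])
if np.unique(arr).size == arr.size:   # all elements are different
    print(np.prod(arr[arr != 0]))     # 3456 == 4*2*6*8*9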
You can use collections.Counter to find the unique numbers. I have included code that solves your problem.
import numpy as np
from collections import Counter
a = 5
arr = np.random.randint(10, size=a)
result = 1  # Variable that will store the product
flag = 0    # The counter
# Check if all the numbers are unique
for i in Counter(arr):
    if Counter(arr)[i] > 1:
        flag = 1
        break
# Convert the dictionary into a list
l = [i for i in Counter(arr)]
# Return the product of all the numbers in the list except 0
if flag == 0:
    for i in l:
        if i != 0:
            result = result * i
else:
    print("The numbers are not unique")
Just for fun, here's a one-liner:
arr = np.array([1, 2, 3, 4, 0])
np.prod(arr[arr!=0]) if np.unique(arr).size == arr.size else False
>>> 24
If the array is [1, 2, 3, 4, 4] the result is False

Interleave numpy arrays with numeric comparison quickly

I have 2 Python lists of integers. The lists are possibly different sizes. One is a list of indices of all the maxima in a dataset, and the other is a list of indices of all the minima. I want to make a list of consecutive maxes and mins in order, and skipping cases where, say, 2 mins come between 2 maxes.
Speed matters most, so I'm asking how the following can be done most quickly (using Numpy, I assume, a la this answer): what numpy code can make up some_function() below to do this calculation?
>>> min_idx = [1,5,7]
>>> max_idx = [2,4,6,8]
>>> some_function(min_idx, max_idx)
[1, 2, 5, 6, 7, 8]
In the above example, we looked to see which *_idx list started with the lower value and chose it to be "first" (min_idx). From there, we hop back and forth between min_idx and max_idx to pick "the next biggest number":
Start with 1 from min_idx
Look at max_idx to find the first unused number which is larger than 1: 2
Go back to min_idx to find the first unused number which is larger than 2: 5
Again for max_idx: we skip 4 because it's less than 5 and choose 6
Continue this process until we run out of values in either list.
As another example, for min_idx = [1,3,5,7,21] and max_idx = [4,6,8,50], the expected result is [1,4,5,6,7,8,21,50]
My current non-Numpy solution looks like this where idx is the output:
# Ensure we use alternating mins and maxes
idx = []
max_bookmark = 0
if min_idx[0] < max_idx[0]:
    first_idx = min_idx
    second_idx = max_idx
else:
    first_idx = max_idx
    second_idx = min_idx
for i, v in enumerate(first_idx):
    if not idx:
        # We just started, so put our 1st value in idx
        idx.append(v)
    elif v > idx[-1]:
        idx.append(v)
    else:
        # Go on to next value in first_idx until we're bigger than the last (max) value
        continue
    # We just added a value from first_idx, so now look for one from second_idx
    for j, k in enumerate(second_idx[max_bookmark:]):
        if k > v:
            idx.append(k)
            max_bookmark += j + 1
            break
Unlike other answers about merging Numpy arrays, the difficulty here is comparing element values as one hops between the two lists along the way.
Background: Min/Max List
The 2 input lists to my problem above are generated by scipy.signal.argrelextrema, which has to be used twice: once to get indices of maxima and again to get indices of minima. I ultimately just want a single list of indices of alternating maxes and mins, so if there's some scipy or numpy function which can find the maxes and mins of a dataset and return a list of indices indicating alternating maxes and mins, that would solve what I'm looking for too.
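For reference (not from the original post), producing the two index lists with SciPy typically looks like the sketch below; the data array is just a made-up example:
import numpy as np
from scipy.signal import argrelextrema
data = np.array([3, 1, 4, 2, 6, 5, 8, 7, 9])   # hypothetical data
min_idx = argrelextrema(data, np.less)[0]      # indices of the local minima
max_idx = argrelextrema(data, np.greater)[0]   # indices of the local maxima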
Here is a much simpler logic without using Numpy (note: this assumes that max(min_idx) < max(max_idx)):
min_idx = [1,3,5,7,21]
max_idx = [4,6,8,50]
res = []
for i in min_idx:
    if not res or i > res[-1]:
        pair = min([m for m in max_idx if m > i])
        res.extend([i, pair])
print(res)
>>> [1, 4, 5, 6, 7, 8, 21, 50]
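As an additional illustration (not from the original answers), here is a sketch of some_function that keeps the same alternation logic but uses np.searchsorted for the "first unused value greater than x" lookups; it assumes both input lists are already sorted ascending:
import numpy as np

def some_function(min_idx, max_idx):
    # a sketch, assuming both inputs are sorted 1-D index sequences
    a, b = np.asarray(min_idx), np.asarray(max_idx)
    if a[0] > b[0]:
        a, b = b, a                     # start from whichever list has the lower first value
    out = [int(a[0])]
    take_from_b = True                  # alternate between the two lists
    while True:
        src = b if take_from_b else a
        pos = np.searchsorted(src, out[-1], side='right')  # first value strictly greater than the last pick
        if pos == len(src):
            break                       # ran out of values in one list
        out.append(int(src[pos]))
        take_from_b = not take_from_b
    return out

print(some_function([1, 5, 7], [2, 4, 6, 8]))          # [1, 2, 5, 6, 7, 8]
print(some_function([1, 3, 5, 7, 21], [4, 6, 8, 50]))  # [1, 4, 5, 6, 7, 8, 21, 50]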

How to select not only the maximum of a `numpy.ndarray` but the top 3 maximal values in python?

I have a list of float values (positive and negative ones) stored in a variable row of type <type 'numpy.ndarray'>.
max_value = max(row)
gives me the maximal value of row. Is there an elegant way to select the top 3 (5, 10,...) values?
I came up with
selecting the maximum value from row
deleting the maximal value in row
selecting the maximum value from row
deleting the maximal value in row
and so on
But that's certainly an ugly style and not pythonic at all. What do the pythonistas say to that? :)
Edit
I need not only the three maximal values, but their positions (indices in row), too. Sorry, I forgot to mention that...
I would use np.argsort
a = np.arange(10)
a[np.argsort(a)[-3:]]
EDIT
To also get the position, just use:
ii = np.argsort(a)[-3:] # positions
vals = a[ii] # values
Why not just sort the numpy array and then read off the values you need:
In [33]: np.sort(np.array([1,5,4,6,7,2,3,9]))[-3:]
Out[33]: array([6, 7, 9])
EDIT: seeing as the question has now changed and you need the positions as well as values, use numpy.argsort to obtain the indices instead of values:
In [43]: a=np.array([1,5,4,6,7,2,3,9])
In [44]: idx=np.argsort(a)
In [45]: topvals=idx[-3:]
In [46]: print topvals
[3 4 7]
In [47]: print a[topvals]
[6 7 9]
This ugly trick is somewhat faster than argsort()[-3:], at least in numpy 1.5.1 on my old mac ppc.
argpartsort in Bottleneck, a set of NumPy array functions written in Cython, would be waaay faster.
#!/bin/sh
python -mtimeit -s '
import numpy as np
def max3( A ):
    j = A.argmax(); aj = A[j]; A[j] = - np.inf
    j2 = A.argmax(); aj2 = A[j2]; A[j2] = - np.inf
    j3 = A.argmax()
    A[j] = aj
    A[j2] = aj2
    return [j, j2, j3]

N = '${N-1e6}'
A = np.arange(N)
' '
j3 = A.argsort()[-3:]  # N 1e6: 405 msec per loop
# j3 = max3( A )  # N 1e6: 105 msec per loop
'
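For what it's worth (not part of the original answers), current NumPy also has np.argpartition, which selects the top k without a full sort; a minimal sketch for the top 3 indices and values of the array used above:
import numpy as np
a = np.array([1, 5, 4, 6, 7, 2, 3, 9])
top3 = np.argpartition(a, -3)[-3:]    # indices of the 3 largest values, in no particular order
top3 = top3[np.argsort(a[top3])]      # order them by value, ascending
print(top3, a[top3])                  # [3 4 7] [6 7 9]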
