Interleave numpy arrays with numeric comparison quickly


I have 2 Python lists of integers. The lists may have different sizes. One is a list of indices of all the maxima in a dataset, and the other is a list of indices of all the minima. I want to make a list of consecutive maxes and mins in order, skipping cases where, say, 2 mins come between 2 maxes.
Speed matters most, so I'm asking how the following can be done most quickly (using NumPy, I assume, a la this answer): what NumPy code can make up some_function() below to do this calculation?
>>> min_idx = [1,5,7]
>>> max_idx = [2,4,6,8]
>>> some_function(min_idx, max_idx)
[1, 2, 5, 6, 7, 8]
In the above example, we looked to see which *_idx list started with the lower value and chose it to be "first" (min_idx). From there, we hop back and forth between min_idx and max_idx to pick "the next biggest number":
Start with 1 from min_idx
Look at max_idx to find the first unused number which is larger than 1: 2
Go back to min_idx to find the first unused number which is larger than 2: 5
Again for max_idx: we skip 4 because it's less than 5 and choose 6
Continue the process until we run out of values in either list.
As another example, for min_idx = [1,3,5,7,21] and max_idx = [4,6,8,50], the expected result is [1,4,5,6,7,8,21,50]
My current non-Numpy solution looks like this where idx is the output:
# Ensure we use alternating mins and maxes
idx = []
max_bookmark = 0
if min_idx[0] < max_idx[0]:
    first_idx = min_idx
    second_idx = max_idx
else:
    first_idx = max_idx
    second_idx = min_idx
for i, v in enumerate(first_idx):
    if not idx:
        # We just started, so put our 1st value in idx
        idx.append(v)
    elif v > idx[-1]:
        idx.append(v)
    else:
        # Go on to next value in first_idx until we're bigger than the last (max) value
        continue
    # We just added a value from first_idx, so now look for one from second_idx
    for j, k in enumerate(second_idx[max_bookmark:]):
        if k > v:
            idx.append(k)
            max_bookmark += j + 1
            break
Unlike other answers about merging Numpy arrays, the difficulty here is comparing element values as one hops between the two lists along the way.
Background: Min/Max List
The 2 input lists to my problem above are generated by scipy.argrelextrema which has to be used twice: once to get indices of maxima and again to get indices of minima. I ultimately just want a single list of indices of alternating maxes and mins, so if there's some scipy or numpy function which can find maxes and mins of a dataset, and return a list of indices indicating alternating maxes and mins, that would solve what I'm looking for too.

Here is much simpler logic without using NumPy (note: this assumes that max(min_idx) < max(max_idx)):
min_idx = [1,3,5,7,21]
max_idx = [4,6,8,50]
res = []
for i in min_idx:
    if not res or i > res[-1]:
        pair = min([m for m in max_idx if m > i])
        res.extend([i, pair])
print(res)
>>> [1, 4, 5, 6, 7, 8, 21, 50]
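For a vectorised take on the question, one possible sketch (not taken from the posts above) merges both index arrays, tags each value with the list it came from, and keeps only the first value of every run that comes from the same list. It assumes both inputs are sorted and share no values, which should hold for indices of minima and maxima. The name alternate_extrema is hypothetical.

```python
import numpy as np

def alternate_extrema(min_idx, max_idx):
    """Return alternating min/max indices (hypothetical helper).

    Assumes both inputs are sorted in increasing order and disjoint.
    """
    min_idx = np.asarray(min_idx)
    max_idx = np.asarray(max_idx)
    values = np.concatenate([min_idx, max_idx])
    # False marks a value from min_idx, True a value from max_idx
    is_max = np.concatenate([
        np.zeros(len(min_idx), dtype=bool),
        np.ones(len(max_idx), dtype=bool),
    ])
    order = np.argsort(values, kind="stable")
    values, is_max = values[order], is_max[order]
    # Keep a value only when its source differs from the previous value's source
    keep = np.ones(len(values), dtype=bool)
    keep[1:] = is_max[1:] != is_max[:-1]
    return values[keep].tolist()

print(alternate_extrema([1, 5, 7], [2, 4, 6, 8]))
print(alternate_extrema([1, 3, 5, 7, 21], [4, 6, 8, 50]))
```

Keeping the first value of each run matches the hop-back-and-forth rule from the question: after taking a value from one list, the next value taken is the smallest larger value from the other list. Whether this is faster than the pure-Python loops depends on the input sizes and would need to be timed.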

Related

Numpy array operation to shift index

I have a very specific situation: I have a long 1-D numpy array (arr). I am interested in those elements that are greater than a number (n). So I am using idx = np.argwhere(arr > n) and val = arr[idx] to get the elements and their indices. Now the problem: I am adding an integer offset (ofs) to the indices (idx) and wrapping the overflowing indices back to the front using idx = (idx + ofs) % len(arr) (as if the original array (arr) were rolled and argwhere applied again). If this is correct so far, what exactly should I use to get the updated val (the array that corresponds to the new idx)? Thanks in advance.
Ex: Let arr=[2,5,8,4,9], n=4, so idx=[1,2,4] and val=[5,8,9]. Now let ofs=3, then idx=[4,5,7]%5=[4,0,2]. I expect val=[8,9,5].

I don't know if I understand the aim of this question correctly, but if we want to rearrange val with orders in idx, it can be done by np.argsort as:
mask_idx = np.where(arr > n)[0] # satisfied indices in arr, where elements are bigger than the specified value
val = arr[mask_idx] # satisfied corresponding values
mask_updated_idx = (mask_idx + ofs) % len(arr) # --> [4 0 2]
idx_sorted = mask_updated_idx.argsort() # --> [1 2 0] indices rearranging order array
val = val[idx_sorted] # --> [8 9 5]
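As a sanity check, the argsort approach can be compared against actually rolling the array with np.roll and selecting again. This is a small sketch using the example values from the question; the function name shifted_selection is hypothetical.

```python
import numpy as np

def shifted_selection(arr, n, ofs):
    """Shift the indices of elements > n by ofs (with wrap-around) and
    return the shifted indices in increasing order with matching values."""
    arr = np.asarray(arr)
    idx = np.flatnonzero(arr > n)          # original positions of the selected elements
    new_idx = (idx + ofs) % len(arr)       # positions after the wrap-around shift
    order = np.argsort(new_idx)            # order that sorts the shifted positions
    return new_idx[order], arr[idx][order]

arr = np.array([2, 5, 8, 4, 9])
new_idx, val = shifted_selection(arr, n=4, ofs=3)

# Reference result: roll the array itself, then select as before
rolled = np.roll(arr, 3)
print(new_idx, val)
print(np.flatnonzero(rolled > 4), rolled[rolled > 4])
```

Both routes give the same indices and values for this example. The argsort route avoids building the rolled copy of arr.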

Find missing elements in a list created from a sequence of consecutive integers with duplicates in O(n)

This is a Find All Numbers Disappeared in an Array problem from LeetCode:
Given an array of integers where 1 ≤ a[i] ≤ n (n = size of array),
some elements appear twice and others appear once.
Find all the elements of [1, n] inclusive that do not appear in this array.
Could you do it without extra space and in O(n) runtime? You may
assume the returned list does not count as extra space.
Example:
Input:
[4,3,2,7,8,2,3,1]
Output:
[5,6]
My code is below. I think it's O(N), but the interviewer disagrees:
def findDisappearedNumbers(self, nums: List[int]) -> List[int]:
    results_list=[]
    for i in range(1,len(nums)+1):
        if i not in nums:
            results_list.append(i)
    return results_list

You can implement an algorithm that loops through the list and, for every value i it finds, negates the element at index i - 1 (if it is not already negative). Afterwards, every index whose element is still positive corresponds to a missing value (index + 1), so you add those to your list of missing items. It doesn't take any additional space and uses at most 3 for loops (not nested), which makes the complexity O(3*n), which is O(n). This site explains it much better and also provides the source code.
edit- I have added the code in case someone wants it:
#The input list and the output list
input = [4, 5, 3, 3, 1, 7, 10, 4, 5, 3]
missing_elements = []
#Loop through each element i and set input[i - 1] to -input[i - 1]. abs() is necessary for
#this or it shows an error
for i in input:
    if(input[abs(i) - 1] > 0):
        input[abs(i) - 1] = -input[abs(i) - 1]
#Loop through the list again and append each positive value to output list
for i in range(0, len(input)):
    if input[i] > 0:
        missing_elements.append(i + 1)
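The same sign-marking idea can be wrapped in a function. The sketch below is one possible arrangement, assuming every value lies in [1, len(nums)]; the name find_disappeared is hypothetical, and the final loop that restores the signs is an addition so that the caller's list is left unchanged.

```python
def find_disappeared(nums):
    """Return the values in [1, len(nums)] that do not occur in nums.

    Uses the sign of nums[v - 1] as a 'value v was seen' flag, so no
    extra storage is needed apart from the returned list.
    """
    for v in nums:
        i = abs(v) - 1            # abs() because v may already have been negated
        if nums[i] > 0:
            nums[i] = -nums[i]    # mark value i + 1 as seen
    missing = [i + 1 for i, v in enumerate(nums) if v > 0]
    for i in range(len(nums)):    # undo the marking
        nums[i] = abs(nums[i])
    return missing

print(find_disappeared([4, 3, 2, 7, 8, 2, 3, 1]))
print(find_disappeared([4, 5, 3, 3, 1, 7, 10, 4, 5, 3]))
```

Each of the three passes touches every element once, so the running time is linear. The original attempt is quadratic because `i not in nums` scans the whole list for every i.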

For me, explicit loops are not the best way to do this because they make the code harder to read. You can try doing it with sets (note that this uses O(n) extra space):
def findMissingNums(input_arr):
    n = len(input_arr) # values lie in [1, n], where n is the size of the input list/array
    input_set = set(input_arr) # convert input array into a set
    set_num = set(range(1, n + 1)) # create a set of all numbers from 1 to n
    missing_nums = sorted(set_num - input_set) # take difference of both sets and convert to a sorted list
    return missing_nums
input_arr = [4,3,2,7,8,2,3,1] # 1 <= input_arr[i] <= n
print(findMissingNums(input_arr)) # outputs [5, 6]

Use a hash table, or dictionary in Python:
def findDisappearedNumbers(self, nums):
    hash_table={}
    for i in range(1,len(nums)+1):
        hash_table[i] = False
    for num in nums:
        hash_table[num] = True
    for i in range(1,len(nums)+1):
        if not hash_table[i]:
            print("missing..",i)

Try the following:
a=[4,3,2,7,8,2,3,1] # the input list
b=[x for x in range(1,len(a)+1)]
c,d=set(a),set(b)
print(list(d-c))

How can I re-write this while loop using nested for loops?

I followed an algorithm with a while loop, but one of the parameters of the question was that I use nested for loops, and I'm not sure how to do that.
This is the while loop:
i = len(lst)
while i > 0:
    big = lst.index(max(lst[0:i]))
    lst[big], lst[i-1] = lst[i-1], lst[big]
    i = i - 1
return lst
This is the question it's answering:
Input: [5,1,7,3]
First, find the largest number, which is 7.
Swap it and the number currently at the end of the list, which is 3. Now we have: [5,1,3,7]
Now, find the largest number, not including the 7, which is 5.
Swap it and the second to last number, which is 3. Now we have: [3,1,5,7].
Now, find the third largest number (excluding the first two), which is 3.
Swap it and the third to last number, which is 1.
Output: [1, 3, 5, 7]
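The steps above can be traced with two nested for loops. This is a minimal sketch, not the asker's code; the function name selection_sort_trace is hypothetical, and it records the list after every swap so the intermediate states can be compared with the walkthrough.

```python
def selection_sort_trace(lst):
    """Sort a copy of lst by repeatedly moving the largest remaining
    element to the end; return the list state after each swap."""
    lst = list(lst)
    states = []
    # Outer loop: 'end' is the last position of the unsorted part
    for end in range(len(lst) - 1, 0, -1):
        big = 0
        # Inner loop: find the index of the largest element in lst[0..end]
        for k in range(1, end + 1):
            if lst[k] > lst[big]:
                big = k
        lst[big], lst[end] = lst[end], lst[big]
        states.append(list(lst))
    return states

for state in selection_sort_trace([5, 1, 7, 3]):
    print(state)
# prints [5, 1, 3, 7], then [3, 1, 5, 7], then [1, 3, 5, 7]
```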

What you're seeing in the algorithm is a selection sort. Here is the nested-for-loop version you asked for:
def selection_sort(arr):
    l = len(arr)
    for i in range(l-1, -1, -1):
        m = float('-inf') # lower than any element of arr
        idx = -1
        for key, val in enumerate(arr[:i+1]):
            if m < val:
                m = val
                idx = key
        if idx != -1:
            arr[i], arr[idx] = arr[idx], arr[i]
    return arr
And a quick test:
arr = list(range(10))[::-1]
print(arr)
# prints [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
result = selection_sort(arr)
print(result)
# prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

This looks like a (rather slow) sorting algorithm, namely selection sort. It iterates from the end of the list lst: on each pass it searches the first i elements for the maximum value and swaps it with the element at position i-1. When the maximum is already at position i-1, the swap leaves the list unchanged, so no extra check is needed.
At first glance I'm not sure whether i is defined beforehand, but let's assume it starts at the length of the list lst, as it seems to. Let's start with the outer loop: we have a while loop that counts down from i to 0. This is the opposite of an increasing for loop, so we can iterate over a reversed range:
for j in reversed(range(len(lst))):
    # perform the sort
We now have the outer loop for the counting-down while loop. The sort itself iterates forward until it finds the maximum. This is a forward for loop.
# sorting
max_val_so_far_index = j
# range(j) covers the indices of the first j elements of the list
for k in range(j):
    if lst[k] > lst[max_val_so_far_index]:
        max_val_so_far_index = k
# now we have the index of the maximum value
# swap
temp = lst[j]
lst[j] = lst[max_val_so_far_index]
lst[max_val_so_far_index] = temp
Let's put the two components together to get:
for j in reversed(range(len(lst))):
    # perform the sort
    # sorting
    max_val_so_far_index = j
    # get the first j items
    for k in range(j):
        if lst[k] > lst[max_val_so_far_index]:
            max_val_so_far_index = k
    # now we have the index of the maximum value
    # swap
    temp = lst[j]
    lst[j] = lst[max_val_so_far_index]
    lst[max_val_so_far_index] = temp
At the end lst is sorted.

The algorithm in the question is a form of selection sort. The original algorithm uses two nested for loops. You can find a good explanation here.

How to retrieve subset in partitioning algorithm?

I have an array and I would like to split it into two parts such that their sums are equal. For example, [10, 30, 20, 40] can be split into [10, 40] and [20, 30]; both have a sum of 50. This is essentially the partition problem, but I'd like to retrieve the subsets, not just identify whether the array is partitionable. So, I went ahead and did the following:
Update: updated script to handle duplicates
from collections import Counter
def is_partitionable(a):
    possible_sums = [a[0]]
    corresponding_subsets = [[a[0]]]
    target_value = sum(a)/2
    if a[0] == target_value:
        print("yes",[a[0]],a[1:])
        return
    for x in a[1:]:
        temp_possible_sums = []
        for (ind, t) in enumerate(possible_sums):
            cursum = t + x
            if cursum < target_value:
                corresponding_subsets.append(corresponding_subsets[ind] + [x])
                temp_possible_sums.append(cursum)
            if cursum == target_value:
                one_subset = corresponding_subsets[ind] + [x]
                another_subset = list((Counter(a) - Counter(one_subset)).elements())
                print("yes", one_subset,another_subset)
                return
        possible_sums.extend(temp_possible_sums)
    print("no")
    return
is_partitionable(list(map(int, input().split())))
Sample Input & Output:
>>> is_partitionable([10,30,20,40])
yes [10, 40] [30, 20]
>>> is_partitionable([10,30,20,20])
yes [10, 30] [20, 20]
>>> is_partitionable([10,30,20,10])
no
I'm essentially storing the corresponding values that were added to get a value in corresponding_subsets. But, as the size of a increases, it's obvious that the corresponding_subsets would have way too many sub-lists (equal to the number of elements in possible_sums). Is there a better/more efficient way to do this?

Though it is still a hard problem, you could try the following. Assume there are n elements stored in an array named arr (with 1-based indexing). We form two teams, A and B, and want to distribute the elements of arr between them so that both teams have the same sum. Each element goes either to team A or to team B. If the ith element goes to team A, we count it as -arr[i]; if it goes to team B, we count it as arr[i]. After every element has been assigned, a total of 0 means the two teams have equal sums. We build n sets (sets do not store duplicates). I will work with the example arr = {10,20,30,40}. Proceed as follows:
set_1 = {10,-10} # -10 if it goes to Team A and 10 if goes to B
set_2 = {30,-10,10,-30} # four options as we add -20 and 20
set_3 = {60,0,20,-40,40,-20,-60} # note we don't need to store duplicates
set_4 = {100,20,40,-40,60,-20,-80,0,80,-60,-100} # the zero here means our task is possible
Now all you have to do is backtrack from the 0 in the last set to see if the ith element a[i] was added as a[i] or as -a[i], ie. whether it is added to Team A or B.
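The construction of these sets can be sketched in a few lines. This is an illustrative helper rather than the answerer's code; the name signed_sum_sets is hypothetical, and it uses a 0-based input list with a dummy set {0} in front so that the set at position k holds the totals after assigning the first k elements.

```python
def signed_sum_sets(arr):
    """Build the sets of signed totals: sets[k] holds every total
    reachable by adding or subtracting each of the first k elements."""
    sets = [{0}]                      # set_0: nothing assigned yet
    for x in arr:
        prev = sets[-1]
        # +x puts the element in team B, -x puts it in team A
        sets.append({s + x for s in prev} | {s - x for s in prev})
    return sets

sets = signed_sum_sets([10, 20, 30, 40])
for k, s in enumerate(sets):
    print(k, sorted(s))
print("partition possible:", 0 in sets[-1])
```

The number of distinct totals is bounded by twice the sum of the elements plus one, so the sets stay manageable when the values are small integers, even though the number of assignments grows exponentially.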
EDIT
The backtracking routine. So we have n sets from set_1 to set_n. Let us make two lists list_A to push the elements that belong to team A and similarly list_B. We start from set_n , thus using a variable current_set initially having value n. Also we are focusing at element 0 in the last list, thus using a variable current_element initially having value 0. Follow the approach in the code below ( I assume all sets 1 to n have been formed, for sake of ease I have stored them as list of list, but you should use set data structure ). Also the code below assumes a 0 is seen in the last list ie. our task is possible.
sets = [ [0], # this dummy set is important, it is set_0,
         # because initially we add -arr[1] or arr[1] to 0
         [10,-10],
         [30,-10,10,-30],
         [60,0,20,-40,40,-20,-60],
         [100,20,40,-40,60,-20,-80,0,80,-60,-100]]
# my array is 1 based so ignore the zero
arr = [0,10,20,30,40]
list_A = []
list_B = []
current_element = 0
current_set = 4 # Total number of sets in this case is n=4
while current_set >= 1:
    print(current_set, current_element)
    for element in sets[current_set-1]:
        if element + arr[current_set] == current_element:
            list_B.append(arr[current_set])
            current_element = element
            current_set -= 1
            break
        elif element - arr[current_set] == current_element:
            list_A.append(arr[current_set])
            current_element = element
            current_set -= 1
            break
print(list_A, list_B)
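An alternative way to recover the subsets without storing a sub-list per reachable sum is to remember, for each sum, only how it was first reached. The sketch below swaps in a standard subset-sum table with parent pointers; it is not the method from the answer above. It assumes non-negative integers, and the name split_equal_sum is hypothetical.

```python
def split_equal_sum(a):
    """Return two lists with equal sums that together contain all of a,
    or None if no such split exists. Assumes non-negative integers."""
    total = sum(a)
    if total % 2:
        return None
    target = total // 2
    # parent[s] = (previous sum, index of the element added to reach s)
    parent = {0: None}
    for i, x in enumerate(a):
        for s in list(parent):        # snapshot: each element is used at most once
            t = s + x
            if t <= target and t not in parent:
                parent[t] = (s, i)
        if target in parent:
            break
    if target not in parent:
        return None
    # Walk the parent pointers back from the target to collect the indices
    chosen = set()
    s = target
    while parent[s] is not None:
        s, i = parent[s]
        chosen.add(i)
    first = [a[i] for i in range(len(a)) if i in chosen]
    second = [a[i] for i in range(len(a)) if i not in chosen]
    return first, second

print(split_equal_sum([10, 30, 20, 40]))
print(split_equal_sum([10, 30, 20, 20]))
print(split_equal_sum([10, 30, 20, 10]))
```

The table holds one entry per reachable sum up to the target, so memory is bounded by the target value rather than by the number of subsets.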

This is my implementation of @sasha's algorithm for the feasibility check.
def my_part(my_list):
    my_list = list(my_list) # work on a copy so the caller's list is not emptied
    item = my_list.pop()
    temp = {item, -item}
    while my_list:
        new_player = my_list.pop()
        balance = set() # signed sums reachable after assigning new_player
        for items in temp:
            balance.add(items + new_player)
            balance.add(items - new_player)
        temp = balance
    if 0 in temp:
        return 'YES'
    else:
        return 'NO'
I am working on the backtracking too.

Improving the execution time of matrix calculations in Python

I work with a large amount of data and the execution time of this piece of code is very very important. The results in each iteration are interdependent, so it's hard to make it in parallel. It would be awesome if there is a faster way to implement some parts of this code, like:
finding the max element in the matrix and its indices
changing the values in a row/column with the max from another row/column
removing a specific row and column
Filling the weights matrix is pretty fast.
The code does the following:
it contains a list of lists of words word_list, with count elements in it. At the beginning each word is a separate list.
it contains a two-dimensional list (count x count) of float values, weights (a lower triangular matrix; the values for which i <= j are zeros)
in each iteration it does the following:
it finds the two words with the most similar value (the max element in the matrix and its indices)
it merges their row and column, saving the larger value from the two in each cell
it merges the corresponding word lists in word_list. It saves both lists in the one with the smaller index (max_j) and it removes the one with the larger index (max_i).
it stops if the largest value is less than a given THRESHOLD
I might think of a different algorithm to do this task, but I have no ideas for now and it would be great if there is at least a small performance improvement.
I tried using NumPy but it performed worse.
weights = fill_matrix(count, N, word_list)
while 1:
    # find the max element in the matrix and its indices
    max_element = 0
    for i in range(count):
        max_e = max(weights[i])
        if max_e > max_element:
            max_element = max_e
            max_i = i
            max_j = weights[i].index(max_e)
    if max_element < THRESHOLD:
        break
    # reset the value of the max element
    weights[max_i][max_j] = 0
    # here it is important that max_j is always less than max_i (since it's a lower triangular matrix)
    for j in range(count):
        weights[max_j][j] = max(weights[max_i][j], weights[max_j][j])
    for i in range(count):
        weights[i][max_j] = max(weights[i][max_j], weights[i][max_i])
    # compare the symmetrical elements, set the ones above to 0
    for i in range(count):
        for j in range(count):
            if i <= j:
                if weights[i][j] > weights[j][i]:
                    weights[j][i] = weights[i][j]
                weights[i][j] = 0
    # remove the max_i-th column
    for i in range(len(weights)):
        weights[i].pop(max_i)
    # remove the max_i-th row
    weights.pop(max_i)
    new_list = word_list[max_j]
    new_list += word_list[max_i]
    word_list[max_j] = new_list
    # remove the element that was recently merged into a cluster
    word_list.pop(max_i)
    count -= 1

This might help:
def max_ij(A):
    t1 = [max(list(enumerate(row)), key=lambda r: r[1]) for row in A]
    t2 = max(list(enumerate(t1)), key=lambda r:r[1][1])
    i, (j, max_) = t2
    return max_, i, j
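If the weights are kept in a NumPy array for the whole run, instead of being converted from nested lists on every iteration, the same lookup can be written with np.argmax and np.unravel_index. This is only a sketch: the asker reports that their NumPy attempt was slower, so it would need to be timed on the real data. The name max_ij_np is hypothetical.

```python
import numpy as np

def max_ij_np(weights):
    """Return (max value, row index, column index) of a 2-D array."""
    w = np.asarray(weights)
    flat_pos = np.argmax(w)                   # position in the flattened array
    i, j = np.unravel_index(flat_pos, w.shape)
    return float(w[i, j]), int(i), int(j)

weights = np.array([
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],
    [0.2, 0.9, 0.0],
])
print(max_ij_np(weights))
# prints (0.9, 2, 1)
```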

It depends on how much work you want to put into it but if you're really concerned about speed you should look into Cython. The quick start tutorial gives a few examples ranging from a 35% speedup to an amazing 150x speedup (with some added effort on your part).
