Extracting significant values from an array - python

I'm looking for an efficient way to extract only the significant values from an array in Python, for instance, only those 10 times bigger than the rest. The logic (no code), using a very simple case, is something like this:
array = [5000, 400, 40, 10, 1, 35] # here the significant value will be 5000.
from i=0 to len.array # to run the procedure in all the array components
    delta = array[i] / array[i+1] # to confirm that array[i] is significant or not.
    if delta >= 10 : # assuming a rule of 10X significance, i.e. significance = 10 times bigger than the rest of elements in the array.
        new_array = array[i] # Insert to new_array the significant value
    elif delta <= 0.1 : # in this case the second element is the significant.
        new_array = array[i+1] # Insert to new_array the significant value
At the end, new_array will be composed of the significant values, in this case new_array = [5000], but the approach must apply to any kind of array.
Thanks for your help!
UPDATE!!!
Thanks to all for your answers!!! in particular to Copperfield who gave me a good idea about how to do it. Here is the code that's working for the purpose!
array_o = [5000, 4500, 400, 4, 1, 30, 2000]
array = sorted(array_o)
new_array = []
max_array = max(array)
new_array.append(max_array)
array.remove(max_array)
for i in range(0, len(array)):
    delta = max_array / array[i]
    if delta <= 10:
        new_array.append(array[i])
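For reuse, the working code above could be wrapped in a function. This is only a sketch: the name significant_values and the factor parameter are made up for illustration, it assumes positive numbers, and unlike the code above it keeps the values in their original order.

```python
def significant_values(values, factor=10):
    """Keep the values that are within `factor` of the maximum.

    Hypothetical helper; assumes positive numbers and skips
    non-positive ones to avoid dividing by zero.
    """
    if not values:
        return []
    largest = max(values)
    return [v for v in values if v > 0 and largest / v <= factor]


print(significant_values([5000, 4500, 400, 4, 1, 30, 2000]))
print(significant_values([5000, 400, 40, 10, 1, 35]))
```

The first call keeps 5000, 4500 and 2000; the second keeps only 5000.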

Does this answer your question?
maxNum = max(array)
array.remove(maxNum)
SecMaxNum = max(array)
if maxNum / SecMaxNum >= 10:
    pass  # take action accordingly
else:
    pass  # take action accordingly

Your pseudocode can be translated into this function:
def function(array):
    new_array = []
    for i in range(1, len(array)):
        delta = array[i-1] / array[i]
        if delta >= 10:
            new_array.append(array[i-1])
        elif delta <= 0.1:
            new_array.append(array[i])
    return new_array
This gives the following result:
>>> function([5000, 400, 40, 10, 1, 35])
[5000, 400, 10, 35]
>>>
Now, what you describe can be done like this in Python 3 (using extended unpacking):
*rest, secondMax, maxNum = sorted(array)
if maxNum / secondMax >= 10:
    pass  # take action accordingly
else:
    pass  # take action accordingly
or, in previous versions:
sortedArray = sorted(array)
if sortedArray[-1] / sortedArray[-2] >= 10:
    pass  # take action accordingly
else:
    pass  # take action accordingly
(A negative index accesses the elements from last to first, so -1 is the last one, -2 the second to last, etc.)

I would not adopt the approach of only comparing each value to the one next to it. If the array is unsorted then obviously that's a disaster, but even if it is sorted:
a = [531441, 59049, 6561, 729, 81, 9, 9, 8, 6, 6, 5, 4, 4, 4, 3, 3, 1, 1, 1, 1]
In that example, the "rest" (i.e. majority) of the values are <10, but I've managed to get up into the 6-digit range very quickly with each number only being 9 times the one next to it (so, your rule would not be triggered).
One approach to outlier detection is to subtract the median from your distribution and divide by a non-parametric statistic that reflects the spread of the distribution (below, I've chosen a denominator that would be equivalent to the standard deviation if the numbers were normally distributed). That gives you an "atypicality" score on a standardized scale. Find the large values, and you have found your outliers (any score larger than, say, 3—but you may need to play around a bit to find the cutoff that works nicely for your problem).
import numpy
npstd = numpy.diff(numpy.percentile(a, [16, 84]))/2.0 # non-parametric "standard deviation" equivalent
score = (a - numpy.median(a)) / npstd
outlier_locations, = numpy.where(score > 3) # 3, 4 or 5 might work well as cut-offs
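If NumPy is not available, a similar score can be sketched with the standard library alone. Note that this swaps in a different spread statistic from the percentile-based one above: the median absolute deviation (MAD), scaled by 1.4826 so that it is comparable to a standard deviation for normally distributed data. The function name robust_outliers and the cutoff parameter are illustrative assumptions.

```python
from statistics import median


def robust_outliers(values, cutoff=3.0):
    """Return the values whose robust score exceeds `cutoff`.

    Sketch only: uses the scaled median absolute deviation as the
    spread, not the percentile-based statistic from the answer.
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    spread = 1.4826 * mad
    if spread == 0:
        return []  # at least half the values are identical; no usable scale
    return [v for v in values if (v - center) / spread > cutoff]


a = [531441, 59049, 6561, 729, 81, 9, 9, 8, 6, 6, 5, 4, 4, 4, 3, 3, 1, 1, 1, 1]
print(robust_outliers(a))  # [531441, 59049, 6561, 729, 81]
```

As with the NumPy version, the cutoff is something you may need to tune for your data.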

Related

Finding the minimum difference between two elements with recursion

I'm trying to make a "shortest distance algorithm for 1D".
However, I'm confused on the recursive case. I don't know how to get the value back after the recursive calls (lines 14 and 15). How can I fix the following code?
def recCPairDist(points):
    if len(points) == 1:
        return 0
    elif len(points)== 2:
        abs(points[1]-points[0])
        #how do i assign the result final value back to "leftDist rightDist"
        #since its a recurisive, the result can be more than 1, should i store all the result in a list first?
        #then get the min(list)?
    else:
        mid = len(points) // 2
        first_half = points[:mid]
        second_half = points[mid:]
        leftDist = recCPairDist(first_half)
        rightDist = recCPairDist(second_half)
        midDist = abs(second_half[0] - first_half[1]) #i dont think this is correct since i didnt consider the recursion
        return min(leftDist,rightDist,midDist)
def cPairDist(points):
    points.sort()
    return recCPairDist(points)
P1 = [7, 4, 12, 14, 2, 10, 16, 6]
cPairDist(P1)
The expected result for P1 should be 1, since the shortest distance would be between 7 and 6.
You're really close! There are three things you have to do:
For the case where there's only one point to consider, you should not return 0. For example, for the array [3, 6, 9], the answer is 3, but your given base case will return 0. This is because one of the resulting subarrays will be of length 1 for odd-length arrays, and the zero return value will propagate when you return from each recursive call.
You need to return the value abs(points[1]-points[0]) in the len == 2 base case explicitly using the return keyword.
For your recursive case, the minimum difference must be between two consecutive elements in the left half, two consecutive elements in the right half, or between the last element of the first half and the first element of the second half (two consecutive elements in the original array, but not covered in the two recursive cases). So, your midDist should compute this value.
Here is a code snippet that resolves all three of these issues:
def recCPairDist(points):
    if len(points) == 1:
        return float('inf')
    elif len(points) == 2:
        return abs(points[1] - points[0])
    else:
        mid = len(points) // 2
        first_half = points[:mid]
        second_half = points[mid:]
        leftDist = recCPairDist(first_half)
        rightDist = recCPairDist(second_half)
        midDist = abs(first_half[-1] - second_half[0])
        return min(leftDist, rightDist, midDist)
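To check the three fixes end to end, here is a self-contained sketch that includes the sorting wrapper. The names min_gap and min_gap_sorted are made up for this example, and it treats lists shorter than two points as having an infinite gap (an assumption that also covers the empty list).

```python
def min_gap_sorted(points):
    """Smallest difference between neighbours in an already sorted list."""
    if len(points) < 2:
        return float('inf')  # no pair to compare
    if len(points) == 2:
        return abs(points[1] - points[0])
    mid = len(points) // 2
    left, right = points[:mid], points[mid:]
    # The pair straddling the split is the last of `left` and the first of `right`.
    across = abs(right[0] - left[-1])
    return min(min_gap_sorted(left), min_gap_sorted(right), across)


def min_gap(points):
    # Sort a copy so the caller's list is left untouched.
    return min_gap_sorted(sorted(points))


print(min_gap([7, 4, 12, 14, 2, 10, 16, 6]))  # 1
print(min_gap([3, 6, 9]))  # 3
```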

Determine if an array is decreasing-increasing O(log(n)) and find the "switch" value

I'm trying to determine whether an array is decreasing first then increasing.
I also need to find the value where the pattern changes from descending to ascending, which would be the minimum value in the array.
Let's say for example I have the following array:
[10, 10, 10, 10, 10, 10, 8, 8, 8, 6, 6, 6, 6, 6, 4, 3, 12, 13, 22, 31, 40, 59, 78]
and this one
[-1, 1, 2, 4, 8, 16, 32, 64]
Edit: for simplicity you can assume the values won't be repeated like the first example I showed you
I mainly want to write a program that takes O(logn) time.
I'm trying to use binary search and this is what I have come up with. The input to the function is a sorted list or a descending-ascending list, which can have repeated values as shown in the examples above.
def find_middle(ls):
    start = 0
    end = len(ls) - 1
    idx = -1
    while start < end:
        mid = start + (end - start) // 2
        middle = ls[mid]
        if ls[mid-1] <= middle and middle > ls[mid+1]:
            return middle
        elif ls[mid-1] < middle:
            start = mid
        else:
            end = mid
    return idx
Sorry for the messy code, I have tinkered with it a lot and at this point I've just given up on finding a solution.
If the array is JUST decreasing or increasing, I want the function to return -1.
Any help will be appreciated!
In your example there is repetition. Without repetition this is easy. Check the values at three different indices. Calling these x, y, and z, we can see that the boundary can only be in one of the three intervals (x,y), (y,z), (z,x). So if we look at the ordering of these pairs, either 2 will be ascending and 1 descending or the reverse. The majority matches the array. Then, you can use binary search to find the boundary. This is O(log n).
With repetition this requires (in the worst case) linear time. A bad example is where you have an array where all but 2 elements are identical. In this case it is O(n) just to find the different elements. In general, with repetition the same algorithm applies but you have the added work of finding different elements.
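For the no-repetition case, the binary search could look roughly like the sketch below. It is not the exact three-index scheme described above: it assumes the input is either monotonic or decreasing-then-increasing with distinct values, and checks the first and last steps to rule out the monotonic cases. The name find_switch is made up, and it returns None rather than -1 because -1 can be a legitimate minimum (as in the second example array).

```python
def find_switch(values):
    """Return the minimum of a decreasing-then-increasing list, else None.

    Assumes distinct values and an input that is either monotonic
    or decreasing-then-increasing.
    """
    n = len(values)
    # A valid input must start with a descending step and end with an ascending one.
    if n < 3 or values[0] < values[1] or values[-2] > values[-1]:
        return None
    lo, hi = 0, n - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if values[mid] > values[mid + 1]:
            lo = mid + 1  # still descending: the minimum is to the right
        else:
            hi = mid  # ascending: the minimum is at mid or to the left
    return values[lo]


print(find_switch([10, 8, 6, 4, 3, 12, 13, 22]))  # 3
print(find_switch([-1, 1, 2, 4, 8, 16, 32, 64]))  # None
```

Each iteration halves the search interval, so the loop runs O(log n) times.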

Increment first n list elements given a condition

I have a list for example
l = [10, 20, 30, 40, 50, 60]
I need to increment the first n elements of the list given a condition. The condition is independent of the list. For example if n = 3, the list l should become :
l = [11, 21, 31, 40, 50, 60]
I understand that I can do it with a for loop on each element of the list. But I need to do such operation around 150 million times. So, I am looking for a faster method to do this. Any help is highly appreciated. Thanks in advance
Here's an operation-aggregating implementation in NumPy:
import numpy
initial_array = numpy.array(l)  # whatever your l is, but as a NumPy array
increments = numpy.zeros_like(initial_array)
...
# every time you want to increment the first n elements
if n:
    increments[n-1] += 1
...
# to apply the increments
initial_array += increments[::-1].cumsum()[::-1]
This is O(ops + len(initial_array)), where ops is the number of increment operations. Unless you're only doing a small number of increments over a very small portion of the list, this should be much faster. Unlike the naive implementation, it doesn't let you retrieve element values until the increments are applied; if you need to do that, you might need a solution based on a BST or BST-like structure to track increments.
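If you do need to read values between increments, one BST-like option is a Fenwick (binary indexed) tree over a difference array. The sketch below is an illustration under that assumption, not part of the NumPy solution above; the class name PrefixIncrementer and its methods are made up. Both operations take O(log n).

```python
class PrefixIncrementer:
    """Increment the first n elements and read any element in O(log n)."""

    def __init__(self, values):
        self.base = list(values)
        # Fenwick tree (1-based) holding a difference array of increments.
        self.tree = [0] * (len(self.base) + 1)

    def _add(self, index, amount):
        # Point update of the difference array at 0-based `index`.
        i = index + 1
        while i < len(self.tree):
            self.tree[i] += amount
            i += i & -i

    def increment_first(self, n, amount=1):
        # Raising elements 0..n-1 is +amount at position 0 and -amount at position n.
        if n <= 0:
            return
        self._add(0, amount)
        if n < len(self.base):
            self._add(n, -amount)

    def get(self, index):
        # The prefix sum of the difference array is the total increment at `index`.
        i = index + 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return self.base[index] + total


p = PrefixIncrementer([10, 20, 30, 40, 50, 60])
p.increment_first(3)
print([p.get(i) for i in range(6)])  # [11, 21, 31, 40, 50, 60]
```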
Let m be the number of queries and n the length of the list to increment. Here is an O(n + m) algorithm idea:
Since you only have to increment from the start up to some k-th element, you get ranges of increments. Let an increment be a pair (up to position, increment by). Example:
(1, 2) - increment positions 0 and 1 by 2
To calculate the value at position k, we add to the current value at position k all increments whose position is greater than or equal to k. How can we quickly calculate the sum of those increments? We can compute the values starting from the back of the list, keeping a running sum of the increments seen so far.
Proof of concept:
# list to increment
a = [1, 2, 5, 1, 6]
# (up to and including k-th index, increment by value)
queries = [(1, 2), (0, 10), (3, 11), (4, 3)]
# described algorithm implementation
increments = [0]*len(a)
for position, inc in queries:
    increments[position] += inc
got = list(a)
increments_sum = 0
for i in range(len(increments) - 1, -1, -1):
    increments_sum += increments[i]
    got[i] += increments_sum
# verify that solution is correct using slow but correct algorithm
expected = list(a)
for position, inc in queries:
    for i in range(position + 1):
        expected[i] += inc
print('Expected:', expected)
print('Got:', got)
output:
Expected: [27, 18, 19, 15, 9]
Got: [27, 18, 19, 15, 9]
You can create a simple data structure on top of your list which stores the start and end range of each increment operation. The start would be 0 in your case so you can just store the end.
This way you don't have to actually traverse the list to increment the elements, but you only retain that you performed increments on ranges for example {0 to 2} and {0 to 3}. Furthermore, you can also collate some operations, so that if multiple operations increment until the same index, you only need to store one entry.
The worst case complexity of this solution is O(q + g x qlogq + n) where g is the number of get operations, q is the number of updates and n is the length of the list. Since we can have at most n distinct endings for the intervals this reduces to O(q + nlogn + n) = O(q + nlogn). A naive solution using an update for each query would be O(q * l) where l (the length of a query) could be up to the size of n giving O(q * n). So we can expect this solution to be better when q > log n.
Working python example below:
import collections

class RangeStructure(object):
    def __init__(self, l):
        self.ranges = collections.defaultdict(int)
        self.l = l
    def incToPosition(self, k):
        # record one increment of the first k elements (indices 0..k-1)
        self.ranges[k] += 1
    def get(self):
        res = list(self.l)  # copy, so repeated calls don't re-apply increments
        sorted_keys = sorted(self.ranges)
        last = len(sorted_keys) - 1
        to_add = 0
        while last >= 0:
            start = 0 if last < 1 else sorted_keys[last - 1]
            end = sorted_keys[last]
            to_add += self.ranges[end]
            for i in range(start, end):
                res[i] += to_add
            last -= 1
        return res
rs = RangeStructure([10, 20, 30, 40, 50, 60])
rs.incToPosition(2)
rs.incToPosition(2)
rs.incToPosition(3)
rs.incToPosition(4)
print(rs.get())
And an explanation:
after the inc operations, ranges will contain (start, end, inc) tuples of the form (0, 2, 2), (0, 3, 1), (0, 4, 1); these will be represented in the dict as {2: 2, 3: 1, 4: 1} since the start is always 0 and can be omitted
during the get operation, we ensure that we only operate on any list element once; we sort the ranges in increasing order of their end point, and traverse them in reverse order updating the contained list elements and the sum (to_add) to be added to subsequent ranges
This prints, as expected:
[14, 24, 32, 41, 50, 60]
You can use list comprehension and add the remaining list
[x + 1 for x in a[:n]]+a[n:]

Rounding Numbers that fall within variable number of ranges in Python

I have an input list of numbers:
lst = [3.253, -11.348, 6.576, 2.145, -11.559, 7.733, 5.825]
I am trying to think of a way to replace each number in a list with a given number if it falls into a range. I want to create multiple ranges based on min and max of input list and a input number that will control how many ranges there is.
Example, if i said i want 3 ranges equally divided between min and max.
numRanges = 3
lstMin = min(lst)
lstMax = max(lst)
step = (lstMax - lstMin) / numRanges
range1 = range(lstMin, lstMin + step)
range2 = range(range1 + step)
range3 = range(range2 + step)
Right away here, is there a way to make the number of ranges be driven by the numRanges variable?
Later i want to take the input list and for example if:
for i in lst:
    if i in range1:
        finalLst.append(1) #1 comes from range1 and will be growing if more ranges
    elif i in range2:
        finalLst.append(2) #2 comes from range2 and will be growing if more ranges
    elif i in range3:
        finalLst.append(3) #3 comes from range3 and will be growing if more ranges
The way i see this now it is all "manual" and I am not sure how to make it a little more flexible where i can just specify how many ranges and a list of numbers and let the code do the rest. Thank you for help in advance.
finalLst = [3, 1, 3, 3, 1, 3, 3]
This is easy to do with basic mathematical operations in a list comprehension:
numRanges = 3
lstMin = min(lst)
lstMax = max(lst) + 1e-12 # small value added to avoid floating point rounding issues
step = (lstMax - lstMin) / numRanges
range_numbers = [int((x-lstMin) / step) for x in lst]
This will give an integer for each value in the original list, with 0 indicating that the value falls in the first range, 1 being the second, and so on. It's almost the same as your code, but the numbers start at 0 rather than 1 (you could stick a + 1 in the calculation if you really want 1-indexing).
The small value I've added to lstMax is there for two reasons. The first is to make sure that floating point rounding issues don't make the largest value in the list yield numRanges as its range index rather than numRanges-1 (the index of the last range). The other reason is to avoid a division by zero error if the list only contains a single value (possibly repeated multiple times) such that min(lst) and max(lst) return the same thing.
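Wrapped up as a function with 1-based range numbers, the calculation above could be sketched like this. The name range_numbers_for and its parameters are illustrative assumptions.

```python
def range_numbers_for(values, num_ranges):
    """Return, for each value, the 1-based number of the equal-width range it falls in."""
    lo = min(values)
    hi = max(values) + 1e-12  # small value added, as explained above
    step = (hi - lo) / num_ranges
    return [int((x - lo) / step) + 1 for x in values]


lst = [3.253, -11.348, 6.576, 2.145, -11.559, 7.733, 5.825]
print(range_numbers_for(lst, 3))  # [3, 1, 3, 3, 1, 3, 3]
```

This reproduces the finalLst from the question and works for any number of ranges.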
Python has a very nice tool for doing exactly this kind of work called bisect. Let's say your range list is defined as such:
ranges = [-15, -10, -5, 5, 10, 15]
For your input list, you simply call bisect, like so:
from bisect import bisect
lst = [3.253, -11.348, 6.576, 2.145, -11.559, 7.733, 5.825]
results = [ranges[bisect(ranges, element)] for element in lst]
Which results in:
[5, -10, 10, 5, -10, 10, 10]
You can then extend this to any arbitrary list of ranges using ranges = range(start,stop,step) in python 2.7 or ranges = list(range(start,stop,step)) in python 3.X
Update
Reread your question, and this is probably closer to what you're looking for (still using bisect):
from numpy import linspace
from bisect import bisect_left
def find_range(numbers, segments):
    mx = max(numbers)
    mn = min(numbers)
    ranges = linspace(mn, mx, segments)
    return [bisect_left(ranges, element)+1 for element in numbers]
>>> find_range(lst, 3)
[3, 2, 3, 3, 1, 3, 3]

Algorithm for finding if an array is balanced [closed]

Closed 9 years ago.
I'm trying to create a program that will create a 10-element array and then assign random values to each element. I then want the program to tell if the array is balanced. By balanced I mean: is there a position in the array at which the sum of the values up to and including that element equals the sum of the values in the elements after it?
Example
Element (1,2,3,4) Values (2,1,3,0)
The program would then display that elements 1-2 are balanced with elements 3-4, because they both sum to 3.
So far I have
import random
size = 10
mean = 0
lists = [0] * size
for i in range(size):
    var = random.randint(0,4)
    lists[i] = var
for i in lists:
    mean += i
avg = (mean)/(size)
I figured the only way the elements could be balanced is if the values average is equal to 2, so I figured that's how I should start.
I'd appreciate any help in the right direction.
If I understand the question, the simplest solution is something like this:
def balanced(numbers):
    for pivot in range(len(numbers)):
        left_total = sum(numbers[:pivot])
        right_total = sum(numbers[pivot:])
        if left_total == right_total:
            return pivot
    return None
For example:
>>> numbers = [2, 1, 3, 0]
>>> balanced(numbers)
2
>>> more_numbers = [2, 1, 3, 4]
>>> balanced(more_numbers)
(That didn't print anything, because it returned None, meaning there is no pivot to balance the list around.)
While this is the simplest solution, it's obviously not the most efficient, because you keep adding the same numbers up over and over.
If you think about it, it should be pretty easy to figure out how to keep running totals for left_total and right_total, only calling sum once.
def balanced(numbers):
    left_total, right_total = 0, sum(numbers)
    for pivot, value in enumerate(numbers):
        if left_total == right_total:
            return pivot
        left_total += value
        right_total -= value
    return None
Finally, here's how you can build a program around it:
import random
size = 10
numbers = [random.randrange(5) for _ in range(size)]  # values 0..4, as in the question
pivot = balanced(numbers)
if pivot is None:
    print('{} is not balanced'.format(numbers))
else:
    # pivot is a 0-based split point: elements 1..pivot balance pivot+1..size
    print('{} is balanced, because elements 1-{} equal {}-{}'.format(
        numbers, pivot, pivot+1, size))
A good data structure to know about for this kind of problem is an array that holds the cumulative sum. element[j] - element[i] is the sum from i to j in the original series. If you have the original series [1, 2, 3, 4], the cumulative series is [0, 1, 3, 6, 10]. The sum up to the i position in the original series is element[i] - element[0]. For this problem, we are interested only in sums starting at 0, so this is a bit of overkill, but it is more generally useful for other problems.
Here is code to make a cumulative sum:
def cumulative_sum(series):
    s = [0]
    for element in series:
        s.append(element + s[-1])
    return s
Given that, we can find the pivot point with this code:
def find_pivot(series):
    cs = cumulative_sum(series)
    total = cs[-1]
    even_total = not (total & 1)
    if even_total:
        target = total // 2
        for i, element in enumerate(cs[1:]):
            if element == target:
                return i + 1
    return -1
Notice that it is not necessary to try dividing the series if we know the series sums to an odd number: there cannot be a pivot point then.
Alternatively, you can write find_pivot like this:
def find_pivot(series):
    cs = cumulative_sum(series)
    total = cs[-1]
    even_total = not (total & 1)
    if even_total:
        target = total // 2
        try:
            return cs.index(target)
        except ValueError:
            return -1
    return -1
It has the advantage that the looping is not done explicitly in python but in C code in the standard library.
Trying the code out:
def test():
    for i in range(1, 30):
        test_values = list(range(i))
        j = find_pivot(test_values)
        if j >= 0:
            print("{0} == {1}".format(test_values[:j], test_values[j:]))
And we get this output:
[0] == []
[0, 1, 2] == [3]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14] == [15, 16, 17, 18, 19, 20]
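As a side note, the cumulative series can also be built with itertools.accumulate from the standard library. The sketch below is a variation rather than the code above: the name find_pivot_accumulate is made up, and it compares 2 * left against the total instead of testing for an even total first. It returns the first matching split point, so for all-zero prefixes it can return a smaller index than the loop version.

```python
from itertools import accumulate


def find_pivot_accumulate(series):
    """Return an index p with sum(series[:p]) == sum(series[p:]), or -1."""
    prefix = [0] + list(accumulate(series))  # prefix[p] == sum(series[:p])
    total = prefix[-1]
    for p, left in enumerate(prefix):
        if 2 * left == total:  # left half equals right half
            return p
    return -1


print(find_pivot_accumulate([2, 1, 3, 0]))  # 2
print(find_pivot_accumulate([2, 1, 3, 4]))  # -1
```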
