Efficiently find the range of an array in python?

Efficiently find the range of an array in python? - python

Is there an accepted efficient way to find the range (ie. max value - min value) of a list of numbers in python? I have tried using a loop and I know I can use the min and max functions with subtraction. I am just wondering if there is some kind of built-in that is faster.

If you really need high performance, try Numpy. The function numpy.ptp computes the range of values (i.e. max - min) across an array.

You're unlikely to find anything faster than the min and max functions.
You could possibly code up a minmax function which did a single pass to calculate the two values rather than two passes but you should benchmark this to ensure it's faster. It may not be if it's written in Python itself but a C routine added to Python may do it. Something like (pseudo-code, even though it looks like Python):
def minmax (arr):
if arr is empty:
return (None, None)
themin = arr[0]
themax = arr[0]
for each value in arr[1:]:
if value < themin:
themin = value
else:
if value > themax:
themax = value
return (themin, themax)
Another possibility is to interpose your own class around the array (this may not be possible if you want to work on real arrays directly). This would basically perform the following steps:
mark the initial empty array clean.
if adding the first element to an array, set themin and themax to that value.
if adding element to a non-empty array, set themin and themax depending on how the new value compares to them.
if deleting an element that is equal to themin or themax, mark the array dirty.
if requesting min and max from a clean array, return themin and themax.
if requesting min and max from a dirty array, calculate themin and themax using loop in above pseudo-code, then set array to be clean.
What this does is to cache the minimum and maximum values so that, at worst, you only need to do the big calculation infrequently (after deletion of an element which was either the minimum or maximum). All other requests use cached information.
In addition, the adding of elements keep themin and themax up to date without a big calculation.
And, possibly even better, you could maintain a dirty flag for each of themin and themax, so that dirtying one would still allow you to use the cached value of the nother.

If you use Numpy and you have an 1-D array (or can create one quickly from a list), then there's the function numpy.ptp():
http://docs.scipy.org/doc/numpy/reference/generated/numpy.ptp.html

Related

Reducing the number of comparisons to find min/max of an array

I was looking into something simple, as finding the max and min elements of an array (no specific language). I know there are built-in functions in many languages, and also you can write your own, such as:
Initialize max = min = firstElement in Array
Loop through each element, and check if it's less than min or more than max
Update accordingly
Return
This of course, results in k comparisons for an array of size k. Is there a way to reduce the number of comparisons that we would have to do? Just assuming the array is unsorted. I've tagged it in Python as the basic algorithm I coded was in Python.

Printing final list in variable

I'm working my way through Elements of Programming Interview in Python and I'm trying to find an alternative solution for the first problem in the Arrays chapter.
The idea is that you are to write a program that takes an array A and index i into A and rearranges the elements such that all elements less than A[i] appear first, followed by elements equal to the pivot followed by elements greater than the pivot. In the book, they already provided a solution but I'm trying to figure out an alternative one. In a nutshell, I'm creating subarrays for each aspect such as smaller, equal, and larger than A[i]. For right now I'm working on storing the integer values of A that are less than the pivot. I want to iterate over the list of A and store the values of all elements less than the pivot in the smaller (variable). The idea is to eventually return the variable smaller which will contain all values smaller than the pivot. To check my work I used the print function just to check the value of smaller. It stored each iteration smaller than the pivot in that variable. Ideally using this approach I just want to return the final iteration of the smaller variable instead of each iteration. What should be my next steps? Hopefully, that makes sense, I really don't mind elaborating on any part. Thanks in advance.
def properArray(pivot_index, A):
pivot = A[pivot_index]
smaller = []
for i in range(len(A)):
if A[i] < pivot:
smaller.append(A[i])
print (smaller)
resized_array =properArray
resized_array(3, [1,5,6,9,3,4,6])

I guess this is what you are trying to achieve
def properArray(pivot_index, A):
pivot = A[pivot_index]
smaller = []
for i in range(len(A)):
if A[i] < pivot:
smaller.append(A[i])
print (smaller)
resized_array =properArray
resized_array(3, [1,5,6,9,3,4,6])
Instead of printing smaller array on every iteration, you need to print it once, after the for loop is complete

Algos - Delete Extremes From A List of Integers in Python?

I want to eliminate extremes from a list of integers in Python. I'd say that my problem is one of design. Here's what I cooked up so far:
listToTest = [120,130,140,160,200]
def function(l):
length = len(l)
for x in xrange(0,length - 1):
if l[x] < (l[x+1] - l[x]) * 4:
l.remove(l[x+1])
return l
print function(listToTest)
So the output of this should be: 120,130,140,160 without 200, since that's way too far ahead from the others.
And this works, given 200 is the last one or there's only one extreme. Though, it gets problematic with a list like this:
listToTest = [120,200,130,140,160,200]
Or
listToTest = [120,130,140,160,200,140,130,120,200]
So, the output for the last list should be: 120,130,140,160,140,130,120. 200 should be gone, since it's a lot bigger than the "usual", which revolved around ~130-140.
To illustrate it, here's an image:
Obviously, my method doesn't work. Some thoughts:
- I need to somehow do a comparison between x and x+1, see if the next two pairs have a bigger difference than the last pair, then if it does, the pair that has a bigger difference should have one element eliminated (the biggest one), then, recursively do this again. I think I should also have an "acceptable difference", so it knows when the difference is acceptable and not break the recursivity so I end up with only 2 values.
I tried writting it, but no luck so far.

You can use statistics here, eliminating values that fall beyond n standard deviations from the mean:
import numpy as np
test = [120,130,140,160,200,140,130,120,200]
n = 1
output = [x for x in test if abs(x - np.mean(test)) < np.std(test) * n]
# output is [120, 130, 140, 160, 140, 130, 120]

Your problem statement is not clear. If you simply want to remove the max and min then that is a simple
O(N) with 2 extra memory- which is O(1)
operation. This is achieved by retaining the current min/max value and comparing it to each entry in the list in turn.
If you want the min/max K items it is still
O(N + KlogK) with O(k) extra memory
operation. This is achieved by two priorityqueue's of size K: one for the mins, one for the max's.
Or did you intend a different output/outcome from your algorithm?
UPDATE the OP has updated the question: it appears they want a moving (/windowed) average and to delete outliers.
The following is an online algorithm -i.e. it can handle streaming data http://en.wikipedia.org/wiki/Online_algorithm
We can retain a moving average: let's say you keep K entries for the average.
Then create a linked list of size K and a pointer to the head and tail. Now: handling items within the first K entries needs to be thought out separately. After the first K retained items the algo can proceed as follows:
check the next item in the input list against the running k-average. If the value exceeds the acceptable ratio threshold then put its list index into a separate "deletion queue" list. Otherwise: update the running windowed sum as follows:
(a) remove the head entry from the linked list and subtract its value from the running sum
(b) add the latest list entry as the tail of the linked list and add its value to the running sum
(c) recalculate the running average as the running sum /K
Now: how to handle the first K entries? - i.e. before we have a properly initialized running sum?
You will need to make some hard-coded decisions here. A possibility:
run through all first K+2D (D << K) entries.
Keep d max/min values
Remove the d (<< K) max/min values from that list

Memoized to DP solution - Making Change

Recently I read a problem to practice DP. I wasn't able to come up with one, so I tried a recursive solution which I later modified to use memoization. The problem statement is as follows :-
Making Change. You are given n types of coin denominations of values
v(1) < v(2) < ... < v(n) (all integers). Assume v(1) = 1, so you can
always make change for any amount of money C. Give an algorithm which
makes change for an amount of money C with as few coins as possible.
[on problem set 4]
I got the question from here
My solution was as follows :-
def memoized_make_change(L, index, cost, d):
if index == 0:
return cost
if (index, cost) in d:
return d[(index, cost)]
count = cost / L[index]
val1 = memoized_make_change(L, index-1, cost%L[index], d) + count
val2 = memoized_make_change(L, index-1, cost, d)
x = min(val1, val2)
d[(index, cost)] = x
return x
This is how I've understood my solution to the problem. Assume that the denominations are stored in L in ascending order. As I iterate from the end to the beginning, I have a choice to either choose a denomination or not choose it. If I choose it, I then recurse to satisfy the remaining amount with lower denominations. If I do not choose it, I recurse to satisfy the current amount with lower denominations.
Either way, at a given function call, I find the best(lowest count) to satisfy a given amount.
Could I have some help in bridging the thought process from here onward to reach a DP solution? I'm not doing this as any HW, this is just for fun and practice. I don't really need any code either, just some help in explaining the thought process would be perfect.
[EDIT]
I recall reading that function calls are expensive and is the reason why bottom up(based on iteration) might be preferred. Is that possible for this problem?

Here is a general approach for converting memoized recursive solutions to "traditional" bottom-up DP ones, in cases where this is possible.
First, let's express our general "memoized recursive solution". Here, x represents all the parameters that change on each recursive call. We want this to be a tuple of positive integers - in your case, (index, cost). I omit anything that's constant across the recursion (in your case, L), and I suppose that I have a global cache. (But FWIW, in Python you should just use the lru_cache decorator from the standard library functools module rather than managing the cache yourself.)
To solve for(x):
If x in cache: return cache[x]
Handle base cases, i.e. where one or more components of x is zero
Otherwise:
Make one or more recursive calls
Combine those results into `result`
cache[x] = result
return result
The basic idea in dynamic programming is simply to evaluate the base cases first and work upward:
To solve for(x):
For y starting at (0, 0, ...) and increasing towards x:
Do all the stuff from above
However, two neat things happen when we arrange the code this way:
As long as the order of y values is chosen properly (this is trivial when there's only one vector component, of course), we can arrange that the results for the recursive call are always in cache (i.e. we already calculated them earlier, because y had that value on a previous iteration of the loop). So instead of actually making the recursive call, we replace it directly with a cache lookup.
Since every component of y will use consecutively increasing values, and will be placed in the cache in order, we can use a multidimensional array (nested lists, or else a Numpy array) to store the values instead of a dictionary.
So we get something like:
To solve for(x):
cache = multidimensional array sized according to x
for i in range(first component of x):
for j in ...:
(as many loops as needed; better yet use `itertools.product`)
If this is a base case, write the appropriate value to cache
Otherwise, compute "recursive" index values to use, look up
the values, perform the computation and store the result
return the appropriate ("last") value from cache

I suggest considering the relationship between the value you are constructing and the values you need for it.
In this case you are constructing a value for index, cost based on:
index-1 and cost
index-1 and cost%L[index]
What you are searching for is a way of iterating over the choices such that you will always have precalculated everything you need.
In this case you can simply change the code to the iterative approach:
for each choice of index 0 upwards:
for each choice of cost:
compute value corresponding to index,cost
In practice, I find that the iterative approach can be significantly faster (e.g. *4 perhaps) for simple problems as it avoids the overhead of function calls and checking the cache for preexisting values.

Given a list L labeled 1 to N, and a process that "removes" a random element from consideration, how can one efficiently keep track of min(L)?

The question is pretty much in the title, but say I have a list L
L = [1,2,3,4,5]
min(L) = 1 here. Now I remove 4. The min is still 1. Then I remove 2. The min is still 1. Then I remove 1. The min is now 3. Then I remove 3. The min is now 5, and so on.
I am wondering if there is a good way to keep track of the min of the list at all times without needing to do min(L) or scanning through the entire list, etc.
There is an efficiency cost to actually removing the items from the list because it has to move everything else over. Re-sorting the list each time is expensive, too. Is there a way around this?

To remove a random element you need to know what elements have not been removed yet.
To know the minimum element, you need to sort or scan the items.
A min heap implemented as an array neatly solves both problems. The cost to remove an item is O(log N) and the cost to find the min is O(1). The items are stored contiguously in an array, so choosing one at random is very easy, O(1).
The min heap is described on this Wikipedia page
BTW, if the data are large, you can leave them in place and store pointers or indexes in the min heap and adjust the comparison operator accordingly.

Google for self-balancing binary search trees. Building one from the initial list takes O(n lg n) time, and finding and removing an arbitrary item will take O(lg n) (instead of O(n) for finding/removing from a simple list). A smallest item will always appear in the root of the tree.
This question may be useful. It provides links to several implementation of various balanced binary search trees. The advice to use a hash table does not apply well to your case, since it does not address maintaining a minimum item.

Here's a solution that need O(N lg N) preprocessing time + O(lg N) update time and O(lg(n)*lg(n)) delete time.
Preprocessing:
step 1: sort the L
step 2: for each item L[i], map L[i]->i
step 3: Build a Binary Indexed Tree or segment tree where for every 1<=i<=length of L, BIT[i]=1 and keep the sum of the ranges.
Query type delete:
Step 1: if an item x is said to be removed, with a binary search on array L (where L is sorted) or from the mapping find its index. set BIT[index[x]] = 0 and update all the ranges. Runtime: O(lg N)
Query type findMin:
Step 1: do a binary search over array L. for every mid, find the sum on BIT from 1-mid. if BIT[mid]>0 then we know some value<=mid is still alive. So we set hi=mid-1. otherwise we set low=mid+1. Runtime: O(lg**2N)
Same can be done with Segment tree.
Edit: If I'm not wrong per query can be processed in O(1) with Linked List

If sorting isn't in your best interest, I would suggest only do comparisons where you need to do them. If you remove elements that are not the old minimum, and you aren't inserting any new elements, there isn't a re-scan necessary for a minimum value.
Can you give us some more information about the processing going on that you are trying to do?
Comment answer: You don't have to compute min(L). Just keep track of its index and then only re-run the scan for min(L) when you remove at(or below) the old index (and make sure you track it accordingly).

Your current approach of rescanning when the minimum is removed is O(1)-time in expectation for each removal (assuming every item is equally likely to be removed).
Given a list of n items, a rescan is necessary with probability 1/n, so the expected work at each step is n * 1/n = O(1).

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.