Sum of array excluding the maximum and the minimum number - python

I need to sum a list of numbers while excluding its minimum and maximum.
This code works, but is there a better solution that uses only two ifs and does not rely on a built-in min/max function or a pre-existing sum function?
def total(array):
    sum = 0
    min = array[0]
    max = array[0]
    for x in array:
        if x > max:
            if max != min:
                sum += max
            max = x
        elif x == max:
            pass
        elif x < min:
            if min != max:
                sum += min
            min = x
        elif x == min:
            pass
        else:
            sum += x
    print(min)
    print(max)
    return sum

This solution uses more off-the-shelf functions rather than fewer. That said, it only relies on the traditional scientific computing stack, which is worth becoming familiar with at some point depending on your goals. Most importantly, it makes the solution much more readable.
import numpy as np
import pandas as pd

xs = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 12])
# inclusive='neither' excludes both endpoints (older pandas versions used inclusive=False)
xs[xs.between(np.min(xs), np.max(xs), inclusive='neither')].sum()
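For comparison, a minimal plain-Python sketch of the single-pass idea the question asks about: two comparisons per element, no built-in min/max/sum, assuming exactly one occurrence of each extreme should be dropped.
def total_excluding_extremes(array):
    lo = hi = array[0]
    running = 0
    for x in array:
        running += x
        if x < lo:
            lo = x
        if x > hi:
            hi = x
    # subtract one occurrence each of the minimum and the maximum
    return running - lo - hi

print(total_excluding_extremes([6, 2, 9, 4]))  # 10, i.e. 4 + 6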


Find the indices of the two largest and two smallest values in a matrix in python

I am attempting to find the indices of the two smallest and the two largest values in python:
I have
import sklearn
euclidean_matrix=sklearn.metrics.pairwise_distances(H10.T,metric='euclidean')
max_index =np.where(euclidean_matrix==np.max(euclidean_matrix[np.nonzero(euclidean_matrix)]))
min_index=np.where(euclidean_matrix==np.min(euclidean_matrix[np.nonzero(euclidean_matrix)]))
min_index
max_index
I get the following output
(array([158, 272]), array([272, 158]))
(array([ 31, 150]), array([150, 31]))
The above code only returns the indices of the single smallest and single largest values of the matrix. I would also like to find the indices of the next smallest value and of the next largest value. Ideally, I would like to return the indices of the two largest values of the matrix and the indices of the two smallest values. How can I do this?
I can think of a couple ways of doing this. Some of these depend on how much data you need to search through.
A couple of caveats: you will have to decide what to do when there are only 1, 2, or 3 elements. If all values are the same, do you want min, max, etc. to be identical? If multiple items tie for max, min, max2, or min2, which should be selected?
Run min, remove that element, then run min on the rest. Run max, remove that element, then run max on the rest (note that this runs on the original matrix, not the one with the min removed). This is the least efficient method since it requires searching four times and copying twice (actually eight searches, because we find each min/max value and then find its index). Something like the pseudo code below.
PSEUDO CODE:
max_index = np.where(euclidean_matrix == np.max(euclidean_matrix[np.nonzero(euclidean_matrix)]))
tmp_euclidean_matrix = euclidean_matrix.copy()  # make sure this is a real copy
tmp_euclidean_matrix[max_index] = 0             # "remove" the max by zeroing it out
max_index2 = np.where(tmp_euclidean_matrix == np.max(tmp_euclidean_matrix[np.nonzero(tmp_euclidean_matrix)]))

min_index = np.where(euclidean_matrix == np.min(euclidean_matrix[np.nonzero(euclidean_matrix)]))
tmp_euclidean_matrix = euclidean_matrix.copy()  # again, a real copy
tmp_euclidean_matrix[min_index] = 0             # "remove" the min; the nonzero filter now skips it
min_index2 = np.where(tmp_euclidean_matrix == np.min(tmp_euclidean_matrix[np.nonzero(tmp_euclidean_matrix)]))
Sort the data (if you need it sorted anyway, this is a good option), then just grab the two smallest and two largest. This isn't great unless you need it sorted anyway, because of the many copies and comparisons sorting requires.
PSEUDO CODE:
euclidean_matrix.sort()
min_index = 0
min_index2 = 1
max_index = len(euclidean_matrix) - 1
max_index2 = max_index - 1
The best option would be to roll your own search function that runs over the data; this is the most efficient approach because you only go through the data once to collect everything.
This is just a simple iterative approach, other algorithms may be more efficient. You will want to validate this works though.
PSEUDO CODE:
def minmax2(array):
    """ returns (minimum, second minimum, second maximum, maximum)
    """
    if len(array) == 0:
        raise Exception('Empty List')
    elif len(array) == 1:
        # special case: only 1 element; need at least 2 to have distinct indices
        minimum = 0
        minimum2 = 0
        maximum2 = 0
        maximum = 0
    else:
        minimum = 0
        minimum2 = 1
        maximum2 = 1
        maximum = 0
        for i in range(1, len(array)):
            if array[i] <= array[minimum]:
                # a new minimum (or tie) shifts the old minimum to second place
                minimum2 = minimum
                minimum = i
            elif array[i] < array[minimum2]:
                minimum2 = i
            elif array[i] >= array[maximum]:
                # a new maximum (or tie) shifts the old maximum to second place
                maximum2 = maximum
                maximum = i
            elif array[i] > array[maximum2]:
                maximum2 = i
    return (minimum, minimum2, maximum2, maximum)
edit: Added pseudo code
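For completeness, a rough NumPy sketch of the same task using np.argpartition on the flattened matrix; the small matrix m here is a hypothetical stand-in for euclidean_matrix, and the nonzero filter mirrors the one used in the question.
import numpy as np

m = np.array([[0.0, 2.0, 9.0],
              [2.0, 0.0, 5.0],
              [9.0, 5.0, 0.0]])

flat = m.ravel()
# positions of the two largest values (in a symmetric distance matrix each value appears twice)
two_largest = np.argpartition(flat, -2)[-2:]
# positions of the two smallest nonzero values, mirroring the np.nonzero filter in the question
nonzero = np.flatnonzero(flat)
two_smallest = nonzero[np.argpartition(flat[nonzero], 1)[:2]]

print(np.unravel_index(two_largest, m.shape))   # row/col indices of the two largest entries
print(np.unravel_index(two_smallest, m.shape))  # row/col indices of the two smallest nonzero entries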

How to produce a time series data with deterministic random walk and drift, given the mean and variance for the series?

I have been given the mean and variance. I need to produce a deterministic random walk from the given variables. These are the expected properties for the time series data:
Mean: 27.57020098
Median: 27.815
Std Dev.: 5.106888439
Variance: 26.08030952
Maximum: 43.92
Minimum: 0
Range: 43.92
I've tried the following,
import numpy as np
import matplotlib.pyplot as plt

steps = np.random.normal(loc=0, scale=5.034, size=1000)  # std dev roughly 5.03
steps[0] = 0
P = 27.50 + np.cumsum(steps)  # mean roughly 27.5
plt.plot(P)
plt.title("Simulated Random Walk")
plt.show()
Which produces a plot of the simulated random walk.
Let's say the number of possible values is n. Your function should return a natural number i with 0 <= i < n, which selects the i'th possible value. If you take the current timestamp, convert it to milliseconds, and call the result t, then a possible deterministic "random" choice would be
i = t mod n
Now take the i'th possible value. You will need to add the minimal offset (1000) to the result to get the i'th number.
EDIT:
If you know that the minimal number is 1000 and the gap between the i'th and (i + 1)'th element is exactly k (normal distribution), then the formula to use would be
result = 1000 + i * k
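A minimal sketch of that idea, where n and k are hypothetical placeholder values:
import time

n = 100                      # number of possible values
k = 0.5                      # assumed constant gap between consecutive values
t = int(time.time() * 1000)  # current timestamp in milliseconds
i = t % n                    # index chosen deterministically from the timestamp
result = 1000 + i * k        # the i'th value, offset by the minimal value of 1000
print(i, result)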
With the standard library you can seed the random generator with the same value each time, which makes the sequence deterministic. I have included an example that illustrates the principle.
import random

seed = 'pseudo-random'  # Can be anything
random.seed(a=seed, version=2)  # Use the same seed every time
float_list = []  # List to contain a small sample of values
mean = 27.57020098
dev = 5.106888439
for x in range(20):
    value = random.gauss(mean, dev)
    if value > 43.92: value = 43.92  # Limit max value
    if value < 0: value = 0  # Limit min value
    float_list.append('{:.3f}'.format(value))
print(float_list)
NumPy supports the same approach: seed the generator (for example with np.random.seed or np.random.default_rng(seed)) and the generated sequence will be identical on every run.
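For example, a sketch of the seeded approach with NumPy; note that clipping and the cumulative sum mean the stated mean and variance are only approximated, not reproduced exactly:
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)                  # fixed seed, so the walk is identical on every run
steps = rng.normal(loc=0, scale=5.106888439, size=1000)
steps[0] = 0
walk = 27.57020098 + np.cumsum(steps)            # start the walk at the target mean
walk = np.clip(walk, 0, 43.92)                   # clamp to the stated minimum and maximum

plt.plot(walk)
plt.title("Seeded (deterministic) random walk")
plt.show()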

Python find numbers between range in list or array

I have a list with millions of numbers that are always increasing towards the end. I need to find and return the numbers within a specified range, e.g. numbers greater than X but less than Y. Both the numbers in the list and the values I'm searching for can change.
I have been using the method below. Please note this is a basic example; the numbers in my program are not uniform or the same as shown here.
l = [i for i in range(2000000)]
nums = []
for element in l:
    if element > 950004:
        break
    if element > 950000:
        nums.append(element)
# [950001, 950002, 950003, 950004]
Although this is fast, I need it to be a bit faster for what my program is doing. Since the numbers change a lot, I'm wondering if there's a better way to do this with a pandas Series or a NumPy array. So far all I've done is build an example in NumPy:
a = numpy.array(l,dtype=numpy.int64)
Would a pandas Series be more suitable, perhaps making use of query()? What would be the best way to approach this with an array as opposed to a Python list of Python objects?
Here is a solution using binary search. You are speaking of millions of numbers, so binary search makes the lookup faster by reducing the runtime complexity to O(log n), neglecting the final slicing step.
import bisect
l = [i for i in range(2000000)]
lower_bound = 950000
upper_bound = 950004
lower_bound_i = bisect.bisect_left(l, lower_bound)
upper_bound_i = bisect.bisect_right(l, upper_bound, lo=lower_bound_i)
nums = l[lower_bound_i:upper_bound_i]
The following are two implementations of binary search (based on code from here): one that searches for an upper limit and one that searches for a lower limit. Does this work better for you?
def binary_search_upper(seq, limit):
    lo = 0
    hi = len(seq) - 1
    while True:
        if hi < lo:
            return -1
        m = (lo + hi) // 2  # integer midpoint
        if m == (len(seq) - 1) or (seq[m] <= limit and seq[m + 1] > limit):
            return m
        elif seq[m] < limit:
            lo = m + 1
        else:
            hi = m - 1

def binary_search_lower(seq, limit):
    lo = 0
    hi = len(seq) - 1
    while True:
        if hi < lo:
            return -1
        m = (lo + hi) // 2  # integer midpoint
        if m == 0 or (seq[m] >= limit and seq[m - 1] < limit):
            return m
        elif seq[m] < limit:
            lo = m + 1
        else:
            hi = m - 1

l = [i for i in range(2000000)]
print(binary_search_upper(l, 950004))
print(binary_search_lower(l, 950000))
You could use numpy to get a subset of your list using a boolean slice.
import numpy as np
a = np.arange(2000000)
nums = a[(950000<a) & (a<=950004)]
nums
# returns
array([950001, 950002, 950003, 950004])
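Since the list is already sorted (always increasing), np.searchsorted gives the same binary-search behavior as bisect directly on the array; a small sketch using the example bounds from the question:
import numpy as np

a = np.arange(2000000)                         # sorted data, as in the example
lo = np.searchsorted(a, 950000, side='right')  # first index with value > 950000
hi = np.searchsorted(a, 950004, side='right')  # first index with value > 950004
nums = a[lo:hi]
print(nums)                                    # [950001 950002 950003 950004]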

finding the min value using loop python

I can find the max value and the average, but I just can't seem to find the min. I know there is a way to find both the max and the min in one loop, but right now I can only find the max.
def large(s)
    sum = 0
    n = 0
    for number in s:
        if number > n:
            n = number
    return n
Is there a way to find the min value using this function?
You can use Python's built-in sum(), min(), and max() functions for this kind of analysis.
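For instance, a quick builtin-only sketch of the same statistics (added here for illustration):
def stats_builtin(values):
    values = list(values)  # materialize in case an iterator is passed
    return min(values), sum(values) / len(values), max(values)

print(stats_builtin([20, 50, 30, 40]))  # (20, 35.0, 50)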
However, if you want to do it all in one pass, or just want to learn how to write it yourself, the process is: 1) iterate over the input and 2) keep track of the cumulative sum, the minimum value seen so far, and the maximum value seen so far:
def stats(iterable):
    '''Return a tuple of the minimum, average, and maximum values
    >>> stats([20, 50, 30, 40])
    (20, 35.0, 50)
    '''
    it = iter(iterable)
    first = next(it)  # Raises an exception if the input is empty
    minimum = maximum = cumsum = first
    n = 1
    for x in it:
        n += 1
        cumsum += x
        if x < minimum:
            minimum = x
        if x > maximum:
            maximum = x
    average = cumsum / float(n)
    return minimum, average, maximum

if __name__ == '__main__':
    import doctest
    print(doctest.testmod())
The code has one other nuance: it uses the first value from the input iterable as the starting value for the minimum, maximum, and cumulative sum. This is preferable to using positive or negative infinity as the initial values. FWIW, Python's own builtin functions are written this way.
Finding the minimum takes the same algorithm as finding the maximum, but with the comparison reversed. < becomes > and vice versa. Initialize the minimum to the largest value possible, which is float("inf"), or to the first element of the list.
FYI, Python has a builtin min function for this purpose.
You must initialize n to a very high number (higher than any expected value), or take one element from the list to start the comparison:
def large(s):
    n = s.pop()  # note: this removes an element from s
    for number in s:
        if number < n:
            n = number
    return n
Obviously, Python already has the built-in max and min functions for this purpose.
A straightforward solution:
def minimum(lst):
    n = float('+inf')
    for num in lst:
        if num < n:
            n = num
    return n
Explanation: first, you initialize n (the minimum) to a very large value, such that any other number will be smaller than it - for example, infinity. This initialization also handles an empty list: the function then returns infinity, signalling that the list was empty and contained no minimum value.
After that, we iterate over all the values in the list, checking each one to see if it is smaller than the value we assumed to be the minimum. If a new minimum is found, we update the value of n.
At the end, we return the minimum value found.
Why not just rename large to small and replace > with <? Also, you should not initialize n to 0 when looking for the smallest value; your large function only works for lists of positive numbers for the same reason. Also, you're missing a ":" after your def line.
def small(s):
    if len(s) == 0: return None
    n = s[0]
    for number in s[1:]:
        if number < n:
            n = number
    return n
This handles empty lists by returning None.
You can also find the minimum by negating every element, taking the maximum, and negating the result:
minimum = -large([-x for x in s])
The logic is that the maximum of the negated list corresponds to the minimum of the original list (note that large must start its comparison from an element of the list rather than from 0 for this to work).
You can use the same function and loop; just use n = L[0] instead of n = 0:
def min(L):
    n = L[0]
    for x in L:
        if x < n:
            n = x
    return n
def min(s):
    n = s[0]
    for number in s:
        if number < n:
            n = number
    return n

Subset sum Problem

Recently I became interested in the subset-sum problem, which is finding a zero-sum subset of a superset. I found some solutions on SO, and in addition I came across a particular solution that uses the dynamic programming approach. I translated that solution into Python based on its qualitative description. I'm trying to optimize this for larger lists, which eats up a lot of my memory. Can someone recommend optimizations or other techniques to solve this particular problem? Here's my attempt in Python:
import random
from time import time
from itertools import product

time0 = time()

# create a zero matrix of size a (rows) by b (cols)
def create_zero_matrix(a, b):
    return [[0]*b for x in xrange(a)]

# generate a list of size num with random integers between a lower and upper bound
def random_ints(num, lower=-1000, upper=1000):
    return [random.randrange(lower, upper+1) for i in range(num)]

# split a list up into N and P, where N is the sum of the negative values and P the sum of the positive values.
# 0 does not count because of additive identity
def split_sum(A):
    N_list = []
    P_list = []
    for x in A:
        if x < 0:
            N_list.append(x)
        elif x > 0:
            P_list.append(x)
    return [sum(N_list), sum(P_list)]

# since the column indexes are in the range from 0 to P - N
# we would like to retrieve them based on the index in the range N to P
# n := row, m := col
def get_element(table, n, m, N):
    if n < 0:
        return 0
    try:
        return table[n][m - N]
    except:
        return 0

# same definition as above
def set_element(table, n, m, N, value):
    table[n][m - N] = value

# input array
#A = [1, -3, 2, 4]
A = random_ints(200)
[N, P] = split_sum(A)

# create a zero matrix of size m (rows) by n (cols)
#
# m := the number of elements in A
# n := P - N + 1 (by definition N <= s <= P)
#
# each element in the matrix will be a value of either 0 (false) or 1 (true)
m = len(A)
n = P - N + 1
table = create_zero_matrix(m, n)

# set first element at index (0, A[0]) to be true
# Definition: Q(1,s) := (x1 == s). Note that the index starts at 0 instead of 1.
set_element(table, 0, A[0], N, 1)

# iterate through each table element
#for i in xrange(1, m): #row
#    for s in xrange(N, P + 1): #col
for i, s in product(xrange(1, m), xrange(N, P + 1)):
    if get_element(table, i - 1, s, N) or A[i] == s or get_element(table, i - 1, s - A[i], N):
        #set_element(table, i, s, N, 1)
        table[i][s - N] = 1

# find zero-sum subset solution
s = 0
solution = []
for i in reversed(xrange(0, m)):
    if get_element(table, i - 1, s, N) == 0 and get_element(table, i, s, N) == 1:
        s = s - A[i]
        solution.append(A[i])
print "Solution: ", solution

time1 = time()
print "Time execution: ", time1 - time0
I'm not quite sure if your solution is exact or a PTA (poly-time approximation).
But, as someone pointed out, this problem is indeed NP-Complete.
Meaning, every known (exact) algorithm has exponential time behavior in the size of the input.
Meaning, if you can process one operation every 0.1 nanosecond (about 10,000,000,000 operations per second), then for a list of 59 elements it'll take roughly:
2^59 ops --> 2^59 / 10,000,000,000 seconds ≈ 2^26 seconds --> 2^26 / (3600 x 24 x 365) years ≈ 1 year
You can find heuristics, which give you just a CHANCE of finding an exact solution in polynomial time.
On the other hand, if you restrict the problem (turning it into a different one) by bounding the values of the numbers in the set, then the complexity reduces to polynomial time. But even then the memory consumed will be a polynomial of VERY high order.
The memory consumed will be much larger than the few gigabytes you have in RAM,
and even much larger than the few terabytes on your hard drive
(and that's for small values of the bound on the elements in the set).
Maybe this is the case with your dynamic programming algorithm.
It seemed to me that you were using a bound of 1000 when building your initialization matrix.
You could try a smaller bound, that is, if your input consistently consists of small values.
Good Luck!
Someone on Hacker News came up with the following solution to the problem, which I quite liked. It just happens to be in python :):
def subset_summing_to_zero(activities):
    subsets = {0: []}
    for (activity, cost) in activities.iteritems():
        old_subsets = subsets
        subsets = {}
        for (prev_sum, subset) in old_subsets.iteritems():
            subsets[prev_sum] = subset
            new_sum = prev_sum + cost
            new_subset = subset + [activity]
            if 0 == new_sum:
                new_subset.sort()
                return new_subset
            else:
                subsets[new_sum] = new_subset
    return []
I spent a few minutes with it and it worked very well.
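A quick usage sketch with hypothetical values: activities maps each label to its signed cost, and an empty list means no zero-sum subset was found.
activities = {'a': 5, 'b': -3, 'c': -2, 'd': 7}
print(subset_summing_to_zero(activities))  # should print ['a', 'b', 'c'] since 5 - 3 - 2 == 0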
An interesting article on optimizing Python code is available here. The main takeaway is that you should inline your hot loops; in your case, instead of calling get_element twice per iteration, put the body of that function inside the loop to avoid the function call overhead.
Hope that helps! Cheers
First thing that caught my eye:
def split_sum(A):
    N_list = 0
    P_list = 0
    for x in A:
        if x < 0:
            N_list += x
        elif x > 0:
            P_list += x
    return [N_list, P_list]
Some advice:
Try to use a 1D list and use bitarray to reduce the memory footprint to a minimum (http://pypi.python.org/pypi/bitarray); you would only need to change the get/set functions. This should reduce your memory footprint by a factor of at least 64 (an integer in a list is a pointer to a typed integer object, so it can be a factor of 3*32).
Avoid using try/except; instead, figure out the proper ranges at the beginning. You may find that you gain a huge speedup.
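As a rough sketch of the single-row idea, here keeping one set of reachable sums instead of a bitarray row (the set never holds more than P - N + 1 values):
def has_zero_subset(A):
    reachable = set()  # sums achievable by some non-empty subset seen so far
    for x in A:
        # each previously reachable sum may or may not include x; x by itself is also a subset
        reachable |= {s + x for s in reachable} | {x}
        if 0 in reachable:
            return True
    return False

print(has_zero_subset([1, -3, 2, 4]))  # True: 1 - 3 + 2 == 0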
The following code works for Python 3.3+. I have used the itertools module, which has some great functions for this.
from itertools import chain, combinations

def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

nums = input("Enter the Elements").strip().split()
inputSum = int(input("Enter the Sum You want"))

for i, combo in enumerate(powerset(nums), 1):
    sum = 0
    for num in combo:
        sum += int(num)
    if sum == inputSum:
        print(combo)
The input/output is as follows:
Enter the Elements 1 2 3 4
Enter the Sum You want 5
('1', '4')
('2', '3')
Just change the values in your set w, make the array x as long as w, and pass the target sum as the last argument to subsetsum; then you're done (if you want to check it with your own values).
def subsetsum(cs, k, r, x, w, d):
    x[k] = 1
    if cs + w[k] == d:
        for i in range(0, k+1):
            if x[i] == 1:
                print(w[i], end=" ")
        print()
    elif cs + w[k] + w[k+1] <= d:
        subsetsum(cs + w[k], k+1, r - w[k], x, w, d)
    if (cs + r - w[k] >= d) and (cs + w[k] <= d):
        x[k] = 0
        subsetsum(cs, k+1, r - w[k], x, w, d)

# driver for the above code
w = [2, 3, 4, 5, 0]
x = [0, 0, 0, 0, 0]
subsetsum(0, 0, sum(w), x, w, 7)
