Finding the 2 numbers closest to the number needed - python

I am trying to find the 2 values in avg_list that are closest to avg (the average of all six numbers).
Also, is there another way to rewrite my code so that x can contain 10 items instead of 6?
x1 = 461
x2 = 336
x3 = 267
x4 = 262
x5 = 212
x6 = 318
avg = (x1 + x2 + x3 + x4 + x5 + x6)/6
avg1 = (x1 + x2 + x3)/3
avg2 = (x1 + x2 + x4)/3
avg3 = (x1 + x2 + x5)/3
avg4 = (x1 + x2 + x6)/3
avg5 = (x1 + x3 + x4)/3
avg6 = (x1 + x3 + x5)/3
avg7 = (x1 + x3 + x6)/3
avg8 = (x1 + x4 + x5)/3
avg9 = (x1 + x4 + x6)/3
avg10 = (x1 + x5 + x6)/3
avg11 = (x2 + x3 + x4)/3
avg12 = (x2 + x3 + x5)/3
avg13 = (x2 + x3 + x6)/3
avg14 = (x2 + x4 + x5)/3
avg15 = (x2 + x4 + x6)/3
avg16 = (x2 + x5 + x6)/3
avg17 = (x3 + x4 + x5)/3
avg18 = (x3 + x4 + x6)/3
avg19 = (x3 + x5 + x6)/3
avg20 = (x4 + x5 + x6)/3
avg_list = [avg1, avg2, avg3, avg4, avg5, avg6, avg7, avg8, avg9, avg10, avg11, avg12, avg13, avg14, avg15,
avg16, avg17, avg18, avg19, avg20]

Getting combinations with itertools.combinations will help. I've also used numpy (fast arrays) instead of lists where convenient. Note that if your list of combinations becomes very large, further efficiencies are possible (for example, don't sort the full output).
from itertools import combinations
import numpy as np
def get_closest_partial_avgs(x, n=3):
    # returns the 2 partial averages (over n elements) closest to the overall average
    ave = np.mean(x)
    partial_aves = []
    for vals in combinations(x, n):
        partial_avg = sum(vals)/n
        partial_aves.append(partial_avg)
    partial_aves = sorted(partial_aves, key=lambda x: np.abs(x - ave))[0:2]
    return partial_aves
# example
x = [461, 336, 267, 262, 212, 318]
best_aves = get_closest_partial_avgs(x, n=3)
# best_aves = [307.0, 311.6666666666667]
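The note about not sorting the full output can be sketched with the standard library's heapq.nsmallest, which scans a generator and keeps only the k best items, so the list of all partial averages is never built or fully sorted. This is an alternative to the sorted(...)[0:2] call, not the answer's own code, and the helper name closest_partial_avgs is made up for illustration.

```python
from heapq import nsmallest
from itertools import combinations
from statistics import mean


def closest_partial_avgs(x, n=3, k=2):
    """Return the k averages of n-element subsets of x closest to mean(x).

    Hypothetical helper: the partial averages are generated lazily and
    heapq.nsmallest keeps only the k best instead of sorting all of them.
    """
    target = mean(x)
    partial = (sum(vals) / n for vals in combinations(x, n))
    return nsmallest(k, partial, key=lambda avg: abs(avg - target))


x = [461, 336, 267, 262, 212, 318]
print(closest_partial_avgs(x, n=3))
```

For six numbers the difference is negligible; it only starts to matter when the number of combinations grows large. For this data the two winners (307.0 and 311.666...) are equally far from the overall average, so their order in the output is not meaningful.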

import itertools
import statistics as sc
import numpy as np
def nearest_numbers_avg(x_list, n):
    # returns the n sub-averages closest to the overall average
    avg_list = []
    lst = []
    tot_avg = sc.mean(x_list)
    # every group of len(x_list)//2 items; order does not matter, so use combinations
    x_sublist = list(itertools.combinations(x_list, len(x_list) // 2))
    for i in x_sublist:
        avg = sc.mean(i)
        avg_list.append(avg)
    diff = abs(np.asarray(avg_list) - tot_avg)
    # indices of the n smallest differences (ties do not repeat an index)
    for i in np.argsort(diff)[:n]:
        lst.append(avg_list[i])
    return lst

There are things in computer programming named "arrays" (the built-in Python version is called a list).
An "array" is like a row of squares on a sheet of graph paper.
There are many ways to create an array.
One of them is shown below:
x = [None]*6
print(x)
The console output is as follows:
[None, None, None, None, None, None]
If you write x = [None]*6 then x will contain 6 things.
If you write x = [None]*10 then x will contain 10 things.
You should not write variable names like x1, x2, x3.
Instead, use square brackets [] like this:
x = [None]*7
x[1] = 461
x[2] = 336
x[3] = 267
x[4] = 262
x[5] = 212
x[6] = 318
print(x)
If you print x, you will notice that x is [None, 461, 336, 267, 262, 212, 318]
The left-most square in the row of squares of graph paper is x[0]
The first thing in the row of graph paper is named x[0], not x[1]
If you want to compute an average, there are different ways to do it:
average_of_x = sum(x)/len(x)
Sometimes you will get the following error:
Traceback (most recent call last):
  File "D:/python_sandbox/fgfnxxfgh.py", line 12, in <module>
    average_of_x = sum(x)/len(x)
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
That error means that some of the things in your row of squares are not numbers.
The computer does not understand 4 + "egg" or 98 + a house.
The computer usually adds numbers to other numbers: 1 + 92 + 81 + 1 + 5
The following makes an error:
data = [None, 443, None, 27, 19, 19 , 8, 19]
total = sum(data)
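If some squares are still None (for example the unused x[0]), one way to average only the numbers is to skip the None entries first. A small sketch using the data list above; the helper name average_of_numbers is made up for illustration.

```python
def average_of_numbers(values):
    """Average only the entries that are not None (hypothetical helper)."""
    numbers = [v for v in values if v is not None]
    if not numbers:
        raise ValueError("no numbers to average")
    return sum(numbers) / len(numbers)


data = [None, 443, None, 27, 19, 19, 8, 19]
print(sum(v for v in data if v is not None))  # 535
print(average_of_numbers(data))               # 535 / 6
```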
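The following are almost the same (the second line only works once every square of x holds a number: while x[0] is still None it raises the TypeError above, and len(x) also counts that extra square):
avg = (x[1] + x[2] + x[3] + x[4] + x[5] + x[6])/6
avg = sum(x)/len(x)
You should never write avg1, avg2, avg3, etc.
Instead, write avg[1], avg[2], avg[3].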
The following is not very good code, but it is better than what you wrote originally (relatively speaking):
import operator as op
from functools import reduce
def n_choose_r(n, r):
    """
    If n == 6, and r == 3, then this function computes
    the number of ways to draw three numbers from the set
    {1, 2, 3, 4, 5, 6}
    examples of sets of size 3 drawn from {1...6} are shown below:
    {1, 2, 3}
    {1, 2, 4}
    {1, 2, 5}
    {1, 2, 6}
    {1, 3, 4}
    """
    r = min(r, n-r)
    numer = reduce(op.mul, range(n, n-r, -1), 1)
    denom = reduce(op.mul, range(1, r+1), 1)
    return numer // denom
# x[0] is None, so the numbers live in x[1] .. x[len(x) - 1]
max_ = len(x) - 1
avg = [None]*n_choose_r(max_, 3)
i = 0
for k1 in range(1, 1 + max_):
    for k2 in range(k1 + 1, 1 + max_):
        for k3 in range(k2 + 1, 1 + max_):
            avg[i] = (x[k1] + x[k2] + x[k3])/3
            i = i + 1
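The three nested loops visit every way to pick 3 different positions, which is what itertools.combinations does, and n_choose_r(n, r) gives the same count as math.comb(n, r) (Python 3.8+). A sketch with 10 values, which also covers the 10-item part of the question; the last four values are made up for illustration, and the list starts at index 0 instead of padding x[0] with None.

```python
import math
from itertools import combinations

# First six values from the question; the last four are made-up examples.
x = [461, 336, 267, 262, 212, 318, 300, 275, 410, 190]

# Same order as nested k1 < k2 < k3 loops, but over 0-based positions.
avg = [(a + b + c) / 3 for a, b, c in combinations(x, 3)]

print(len(x), len(avg), math.comb(len(x), 3))  # 10 120 120
```

Changing the length of x is all it takes; nothing else in the code depends on how many values there are.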

Related

SymPy - Is there a way to write a summation, that has a variable with an incrementing subscript?

I wanted to write this expression, in code:
x1 + x2 + x3 + x4 + x5
I can currently do this via:
import sympy as sp
x1, x2, x3, x4, x5 = sp.symbols('x1 x2 x3 x4 x5')
x1 + x2 + x3 + x4 + x5
But unfortunately, this doesn't scale very well in case I wanted to go from x1 to, say, x10,000.
Any help would be sincerely appreciated.
I tried using SymPy's summation function, like this:
summation(x(i), (i, 0, n))
But unfortunately got a TypeError, stating:
'Symbol' object is not callable
You can use a generic IndexedBase symbol with a summation:
>>> from sympy import Symbol, IndexedBase, Sum
>>> i = Symbol('i'); x = IndexedBase('x')
>>> Sum(x[i],(i,1,10))
Sum(x[i], (i, 1, 10))
>>> _.doit()
x[10] + x[1] + x[2] + x[3] + x[4] + x[5] + x[6] + x[7] + x[8] + x[9]
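A self-contained version of the same idea as a script. Here the upper limit n is kept symbolic (the answer uses a fixed 10), and doit() is only called after substituting a concrete value; declaring i and n as integers is an optional assumption added for this sketch.

```python
from sympy import IndexedBase, Sum, Symbol

x = IndexedBase('x')
i = Symbol('i', integer=True)
n = Symbol('n', integer=True, positive=True)

expr = Sum(x[i], (i, 1, n))        # unevaluated Sum(x[i], (i, 1, n))
expanded = expr.subs(n, 5).doit()  # x[1] + x[2] + x[3] + x[4] + x[5]
print(expanded)

# Plug in numbers for the indexed terms, e.g. x[j] = j
value = expanded.subs({x[j]: j for j in range(1, 6)})
print(value)  # 15
```

The same expr works for n = 10000 without writing any of the terms by hand.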

Multivariate curve fit in python

Can somebody please point me in the right direction...
I need to find the parameters a,b,c,d of two functions:
Y1 = ( (a * X1 + b) * p0 + (c * X2 + d) * p1 ) / (a * X1 + b + c * X2 + d)
Y2 = ( (a * X2 + b) * p2 + (c * X2 + d) * p3 ) / (a * X1 + b + c * X2 + d)
X1, X2 (independent variables) and Y1, Y2 (dependent variables) are observations, i.e. one-dimensional arrays with thousands of entries each.
p0, p1, p2, p3 are known constants (scalars).
I successfully solved the problem for the first function alone with curve_fit (see below), but how do I solve it for Y1 and Y2 together?
Thank you.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
X = [X1,X2]
def fitFunc(X, a,b,c,d):
    X1, X2 = X
    return ((a * X1 + b) * p0 + (c * X2 + d) * p1) / (a * X1 + b + c * X2 + d)
fitPar, fitCov = curve_fit(fitFunc, X, Y1)
print(fitPar)
One way would be to minimize both functions together using scipy.optimize.minimize. In the example below, minimize starts from the initial guesses for a, b, c and d in x0 and repeatedly calls the function residual with candidate values. For each candidate, Y1 and Y2 are evaluated, and the mean squared error between the data and the predicted values of each function is computed. The error returned is the mean of the two functions' errors. The optimized set of parameters is stored in res as res.x.
import numpy as np
from scipy.optimize import minimize
#p0 = ... known
#p1 = ... known
#p2 = ... known
#p3 = ... known
def Y1(X, a,b,c,d):
    X1, X2 = X
    return ((a * X1 + b) * p0 + (c * X2 + d) * p1) / (a * X1 + b + c * X2 + d)
def Y2(X, a,b,c,d):
    X1, X2 = X
    return ((a * X1 + b) * p2 + (c * X2 + d) * p3) / (a * X1 + b + c * X2 + d)
X1 = np.asarray(X1) # your X1 array
X2 = np.asarray(X2) # your X2 array
X = np.array([X1, X2])
y1_data = np.asarray(y1_data) # your y1 data
y2_data = np.asarray(y2_data) # your y2 data
def residual(x):
    a = x[0]
    b = x[1]
    c = x[2]
    d = x[3]
    y1_pred = Y1(X,a,b,c,d)
    y2_pred = Y2(X,a,b,c,d)
    err1 = np.mean((y1_data - y1_pred)**2)
    err2 = np.mean((y2_data - y2_pred)**2)
    error = (err1 + err2) / 2
    return error
x0 = [1, 1, 1, 1] # Initial guess for a, b, c, and d respectively
res = minimize(residual, x0, method="Nelder-Mead")
print(res.x)
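Another option, not from the answer above, is to stack both outputs into one vector and let curve_fit fit them together: the model function returns Y1 followed by Y2 for the same a, b, c, d (using the answer's form of Y2, with a * X1 + b in the numerator). The constants and data below are synthetic placeholders. One caveat: Y1 and Y2 are both unchanged if a, b, c and d are multiplied by the same nonzero factor, so only their ratios can be recovered from data (fixing one parameter, e.g. d = 1, removes that redundancy), and curve_fit may warn that the covariance could not be estimated.

```python
import numpy as np
from scipy.optimize import curve_fit

# Made-up constants and synthetic observations, for illustration only.
p0, p1, p2, p3 = 1.0, 2.0, 3.0, 4.0
rng = np.random.default_rng(0)
X1 = rng.uniform(1.0, 10.0, 500)
X2 = rng.uniform(1.0, 10.0, 500)


def stacked_model(X, a, b, c, d):
    """Evaluate Y1 and Y2 for the same parameters and join them end to end."""
    X1, X2 = X
    w1 = a * X1 + b
    w2 = c * X2 + d
    y1 = (w1 * p0 + w2 * p1) / (w1 + w2)
    y2 = (w1 * p2 + w2 * p3) / (w1 + w2)
    return np.concatenate([y1, y2])


# Noiseless "observations" generated from known parameters.
true_params = (1.0, 2.0, 1.5, 1.0)
Y = stacked_model((X1, X2), *true_params)
Y1, Y2 = Y[:X1.size], Y[X1.size:]

# xdata is a 2 x N array; ydata is Y1 followed by Y2.
# Note: p0= here is curve_fit's initial guess, not the constant p0.
fit_params, fit_cov = curve_fit(
    stacked_model, np.vstack([X1, X2]), np.concatenate([Y1, Y2]), p0=[1, 1, 1, 1]
)
print(fit_params / fit_params[0])  # compare with true_params / true_params[0]
```

With real data you would pass your measured Y1 and Y2 instead of the generated ones; if the two outputs have very different scales or noise levels, you may want to weight them (for example via curve_fit's sigma argument).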

How can I call for values from pandas dataframe to a function?

I have a large dataframe with two columns, and a function that takes the values from each row; I want to apply it to every row of the dataframe. Below is the head of the dataframe.
xG_Team1 xG_Team2
0 1.440539 1.380095
1 2.123673 0.946116
2 1.819697 0.921660
3 1.132676 1.375717
4 1.244837 1.269933
x1, x2, x3 are constants.
x1 = [1,0,0]
x2 = [0,1,0]
x3 = [0,0,1]
For index 0,
y = np.array([1-(xG_Team1[0] + xG_Team2[0])/k, xG_Team1[0]/k, xG_Team2[0]/k])
i.e. y = np.array([1-(1.440539 + 1.380095)/k, 1.440539/k, 1.380095/k])
For index 1,
y = np.array([1-(xG_Team1[1] + xG_Team2[1])/k, xG_Team1[1]/k, xG_Team2[1]/k])
Where k is the total_timeslot and a constant.
total_timeslot = 180
Home_Goal = [] # No Goal
Away_Goal = [] # No Goal
def sum_squared_diff(x1, x2, x3, y):
    ssd=[]
    for k in range(total_timeslot):
        if k in Home_Goal:
            ssd.append( sum((x2 - y)**2))
        elif k in Away_Goal:
            ssd.append(sum((x3 - y)**2))
        else:
            ssd.append(sum((x1 - y)**2))
    return ssd
y_0 = sum_squared_diff(x1, x2, x3, y)
The plan is to sum up the output of sum_squared_diff for every y, i.e. sum(y_i) for all i.
So for i = 0 (with y being the index-0 vector above),
y_0 = sum_squared_diff(x1, x2, x3, y)
len(y_0) = 180
sum(y_0) = 0.0663099498972334
Then I will have n values of sum(y_i) for n rows of xG.
Using Dillon's code on the above dataframe, n = 5:
sum(results.sum()) = 0.31885730707076826
import numpy as np
import pandas as pd
data = {'xG_Team1': {0: 1.440539, 1: 2.123673, 2: 1.819697, 3: 1.132676, 4: 1.244837},
        'xG_Team2': {0: 1.380095, 1: 0.946116, 2: 0.92166, 3: 1.375717, 4: 1.269933}}
df = pd.DataFrame(data)
x1 = [1,0,0]
x2 = [0,1,0]
x3 = [0,0,1]
# Constants
total_timeslot = 180
k = 180
# Measures
Home_Goal = [] # No Goal
Away_Goal = [] # No Goal
def sum_squared_diff(x1, x2, x3, y):
    ssd = []
    for k in range(total_timeslot): # local loop variable; the global k used in my_function is unchanged
        if k in Home_Goal:
            ssd.append(sum((x2 - y) ** 2))
        elif k in Away_Goal:
            ssd.append(sum((x3 - y) ** 2))
        else:
            ssd.append(sum((x1 - y) ** 2))
    return ssd
def my_function(row):
    xG_Team1 = row.xG_Team1
    xG_Team2 = row.xG_Team2
    return np.array([1-(xG_Team1 + xG_Team2)/k, xG_Team1/k, xG_Team2/k])
# You can use the apply function
results = df.apply(lambda row: sum_squared_diff(x1, x2, x3, my_function(row)), axis=1)
# Each item in results is a 180 item list
results
Out[]:
0 [0.0003683886105401867, 0.0003683886105401867,...
1 [0.0004576767592872215, 0.0004576767592872215,...
2 [0.00036036396694006056, 0.0003603639669400605...
3 [0.00029220949467635905, 0.0002922094946763590...
4 [0.00029279065228265494, 0.0002927906522826549...
# For each list, calculate the sum
results.map(lambda x: sum(x))
Out[]:
0 0.066310
1 0.082382
2 0.064866
3 0.052598
4 0.052702
# Get the sum of all these values
results.map(lambda x: sum(x)).sum()
Out[]:
0.3188573070707662
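Because every row is compared against the same 180 targets, the per-row lists can also be computed in one go with NumPy broadcasting instead of apply. A sketch with the same constants and the same empty Home_Goal and Away_Goal; the column name ssd_sum is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'xG_Team1': [1.440539, 2.123673, 1.819697, 1.132676, 1.244837],
                   'xG_Team2': [1.380095, 0.946116, 0.92166, 1.375717, 1.269933]})

total_timeslot = 180
k = 180
Home_Goal = []  # No Goal
Away_Goal = []  # No Goal
x1 = np.array([1, 0, 0])
x2 = np.array([0, 1, 0])
x3 = np.array([0, 0, 1])

# One target vector per timeslot, chosen by the same rules as sum_squared_diff.
targets = np.array([x2 if t in Home_Goal else x3 if t in Away_Goal else x1
                    for t in range(total_timeslot)])        # shape (180, 3)

t1 = df['xG_Team1'].to_numpy()
t2 = df['xG_Team2'].to_numpy()
Y = np.column_stack([1 - (t1 + t2) / k, t1 / k, t2 / k])     # shape (n_rows, 3)

# (1, 180, 3) - (n_rows, 1, 3) -> (n_rows, 180, 3), then sum over the 3 components
ssd = ((targets[None, :, :] - Y[:, None, :]) ** 2).sum(axis=2)  # shape (n_rows, 180)
per_row = ssd.sum(axis=1)
df['ssd_sum'] = per_row
total = per_row.sum()
print(total)  # about 0.318857, matching the apply() result
```

Each row of ssd holds the same 180 values that sum_squared_diff returns for that row, so per_row corresponds to results.map(lambda x: sum(x)).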

Solving an Equation System in SciPy

I am trying to fit an exponential curve through three given points, but fsolve only gives me very wrong results, or exactly 0. I need this for my bachelor's thesis, so if anyone knows a better solution to the problem, I would be grateful to hear it.
from numpy import *
from scipy.optimize import *
def myFunction(variables):
    x1 = 1
    y1 = 100
    x2 = 5
    y2 = 50
    x3 = 10
    y3 = 1
    (a,k,b) = variables
    y1 = a*exp(-x1*k)+b
    y2 = a*exp(-x2*k)+b
    y3 = a*exp(-x3*k)+b
    #0 = a*k**2 * exp(-x1+k)
    return ([a, k, b])
z = fsolve(myFunction,(1,0.1,5))
print(z)
This is my problem: I need to fit an exponential function through these 3 given points, and in addition the second derivative of the formula should be 0.
edit: 06.12.17
In some way I now have an improvement with a polynomial, but it does not really fit the way it should.
The second maximum should not be there. :D
from numpy import *
from scipy.optimize import *
import matplotlib.pyplot as plt
def myFunction(z):
    a = z[0]
    b = z[1]
    c = z[2]
    d = z[3]
    e = z[4]
    f = z[5]
    g = z[6]
    x = [0, 10 ,15 ,20 ,50 ,100]
    y = [10 ,90 ,100 ,90 ,50 ,10]
    s = [0, 10, 1, 0, 0, 0]
    F = empty((8))
    F[0] = a*x[0]**6 + b*x[0]**5 + c*x[0]**4 + d*x[0]**3 + e*x[0]**2 + f*x[0]**1 + g - y[0]
    F[1] = a*x[1]**6 + b*x[1]**5 + c*x[1]**4 + d*x[1]**3 + e*x[1]**2 + f*x[1]**1 + g - y[1]
    F[2] = a*x[2]**6 + b*x[2]**5 + c*x[2]**4 + d*x[2]**3 + e*x[2]**2 + f*x[2]**1 + g - y[2]
    F[3] = a*x[3]**6 + b*x[3]**5 + c*x[3]**4 + d*x[3]**3 + e*x[3]**2 + f*x[3]**1 + g - y[3]
    F[4] = a*x[4]**6 + b*x[4]**5 + c*x[4]**4 + d*x[4]**3 + e*x[4]**2 + f*x[4]**1 + g - y[4]
    F[5] = a*x[5]**6 + b*x[5]**5 + c*x[5]**4 + d*x[5]**3 + e*x[5]**2 + f*x[5]**1 + g - y[5]
    F[6] = 6*a*x[3]**5 + 5*b*x[3]**4 + 4*c*x[3]**3 + 3*d*x[3]**2 + 2*e*x[3]**1 + f - s[3]
    F[7] = 6*a*x[5]**5 + 5*b*x[5]**4 + 4*c*x[5]**3 + 3*d*x[5]**2 + 2*e*x[5]**1 + f - s[5]
    return F
zGuess = array([1,1,1,1,1,1,1,1])
z = fsolve(myFunction,zGuess)
print(z)
x_axis = linspace(0,100,100)
y_axis = z[0]*x_axis**6 + z[1]*x_axis**5 + z[2]*x_axis**4 + z[3]*x_axis**3 + z[4]*x_axis**2 + z[5]*x_axis**1 + z[6]
plt.plot(x_axis, y_axis)
plt.show()
edit 07.12.17
The whole signal should look like the data of the second example, but the difficulty is in the part covered by the first example. My suggestion was to use two polynomials, but my professor would prefer a polynomial for x < 20 and an exponential function for x > 20. The transition between the two should also be very smooth.
Well, fsolve finds the roots of a function; it does not really do a nonlinear fit. I must admit I don't quite get what you want to achieve with your code. If you want to do a nonlinear fit (since you are talking about exponential functions here), you may want to check my notebook here https://github.com/michelucci/Regression-with-Python/blob/master/(Non)%20linear%20fit%20in%20Python.ipynb which I hope can point you in the right direction. It contains first a part on linear regression and then a nonlinear tutorial.
You can check the curve_fit() function in the scipy.optimize library. That should help you with what you want to do.
Let me know if that helps you.
You may also want to check this link to better understand what a non-linear fit is https://en.wikipedia.org/wiki/Nonlinear_regression
Best, Umberto
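For the first part of the question, the three equations y_i = a*exp(-k*x_i) + b have three unknowns, so they can be solved exactly rather than fitted. The reason fsolve returns 0 in the question is that myFunction returns [a, k, b] instead of the three residuals, so fsolve simply finds a = k = b = 0. The sketch below is a swapped-in approach, not from the answer: for a fixed k, a and b are linear and follow from the first two points, which leaves a one-dimensional root search in k that scipy.optimize.brentq can do from a bracket instead of a starting guess. The bracket [1e-6, 5] was chosen for these particular points. Note that the extra condition of a zero second derivative cannot be met by this model: a*k**2*exp(-k*x) is never 0 unless a = 0 or k = 0.

```python
import numpy as np
from scipy.optimize import brentq

# The three points from the question.
xs = np.array([1.0, 5.0, 10.0])
ys = np.array([100.0, 50.0, 1.0])


def a_b_for(k):
    """For a fixed k, a and b follow from the first two points (linear in a, b)."""
    e1, e2 = np.exp(-k * xs[0]), np.exp(-k * xs[1])
    a = (ys[0] - ys[1]) / (e1 - e2)
    b = ys[0] - a * e1
    return a, b


def third_point_residual(k):
    """Mismatch at the third point once a and b are fixed by the first two."""
    a, b = a_b_for(k)
    return a * np.exp(-k * xs[2]) + b - ys[2]


# The residual is negative near k = 0 and positive for large k, so brentq can bracket it.
k = brentq(third_point_residual, 1e-6, 5.0)
a, b = a_b_for(k)
print(a, k, b)
print(a * np.exp(-k * xs) + b)  # reproduces ys up to rounding
```

With more than three points the system is overdetermined, and that is where a least-squares routine such as curve_fit becomes the right tool.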

How do I get Gurobi to give only integer solutions?

I'm trying to optimize the following problem in python using Gurobi and the answer comes out as a decimal. How do I get the output to solve for optimal integers?
from gurobipy import *
def main():
    pass
if __name__ == '__main__':
    main()
try:
    #Create a new model
    m = Model("Investment")
    #Create variables
    x1 = m.addVar(vtype=GRB.CONTINUOUS, name="x1")
    x2 = m.addVar(vtype=GRB.CONTINUOUS, name="x2")
    x3 = m.addVar(vtype=GRB.CONTINUOUS, name="x3")
    x4 = m.addVar(vtype=GRB.CONTINUOUS, name="x4")
    x5 = m.addVar(vtype=GRB.CONTINUOUS, name="x5")
    #Integrate new variables
    m.update()
    #Set Objective
    m.setObjective(160*x1 + 160*x2 + 160*x3 + 75*x4 + 75*x5, GRB.MINIMIZE)
    m.addConstr( x1 + x2 + x3 >= 3, "c0")
    m.addConstr( x1 >= 1, "c1")
    m.addConstr( x2 >= 0, "c2")
    m.addConstr( x3 >= 1, "c3")
    m.addConstr( x4 >= 0, "c4")
    m.addConstr( x5 >= 0, "c5")
    m.addConstr(40*x1 + 40*x2 + 40*x3 + 25*x4 + 25*x5 >= 365,"c6")
    m.optimize()
    for v in m.getVars():
        print(v.varName, v.x)
    print("Obj:", m.objVal)
except GurobiError:
    print("Error reported")
Use .addVar(vtype=GRB.INTEGER, ...).
See http://www.gurobi.com/documentation/5.6/reference-manual/py_model_addvar
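That is, vtype=GRB.INTEGER.
For binary variables use vtype=GRB.BINARY. There are 5 variable types in total: GRB.CONTINUOUS, GRB.BINARY, GRB.INTEGER, GRB.SEMICONT and GRB.SEMIINT.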
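To see what switching to integer variables should do to this model without needing a Gurobi license, here is the same problem solved with SciPy's milp (SciPy 1.9 or later). This is a different solver swapped in for illustration, not gurobipy: integrality of ones plays the role of GRB.INTEGER, and zeros gives the continuous relaxation, like GRB.CONTINUOUS.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

# Objective: 160*x1 + 160*x2 + 160*x3 + 75*x4 + 75*x5 (milp minimises).
c = np.array([160, 160, 160, 75, 75])

# c0: x1 + x2 + x3 >= 3;  c6: 40*(x1 + x2 + x3) + 25*(x4 + x5) >= 365
A = np.array([[1, 1, 1, 0, 0],
              [40, 40, 40, 25, 25]])
constraints = LinearConstraint(A, lb=[3, 365], ub=[np.inf, np.inf])

# c1..c5 as variable bounds: x1 >= 1, x3 >= 1, everything else >= 0.
bounds = Bounds(lb=[1, 0, 1, 0, 0], ub=[np.inf] * 5)

relaxed = milp(c, constraints=constraints, bounds=bounds, integrality=np.zeros(5))
integer = milp(c, constraints=constraints, bounds=bounds, integrality=np.ones(5))

print(relaxed.fun, relaxed.x)  # continuous optimum, with fractional values
print(integer.fun, integer.x)  # every entry is a whole number
```

For this model the continuous optimum is 1215 (x1 + x2 + x3 = 3 and x4 + x5 = 9.8), while the integer optimum is 1230 (x4 + x5 rounded up to 10), so with GRB.INTEGER Gurobi should report an objective of 1230.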
