Algorithm to find least sum of squares of differences

The algorithm I'm writing takes a list L as input and should find the number x that minimizes the sum of the squared differences between x and every item L[i], i.e. the x that minimizes the sum of abs(L[i]-x)**2. So far it does what it's supposed to when the answer is an integer, but not when the answer is a floating-point number, and I'm not sure how to handle that. For example, [2, 2, 3, 4] should ideally yield 2.75, but my algorithm can currently only return integers.
def minimize_square(L):
    sumsqdiff = 0
    sumsqdiffs = {}
    for j in range(min(L), max(L) + 1):  # include max(L) as a candidate
        for i in range(len(L)):  # visit every item, including the last one
            sumsqdiff += abs(L[i]-j)**2
        sumsqdiffs[j] = sumsqdiff
        sumsqdiff = 0
    return min(sumsqdiffs, key=sumsqdiffs.get)

It is easy to prove [*] that the number that minimizes the sum of squared differences is the arithmetic mean of L. This gives the following simple solution:
In [26]: L = [2, 2, 3, 4]
In [27]: sum(L) / float(len(L))
Out[27]: 2.75
or, using NumPy:
In [28]: numpy.mean(L)
Out[28]: 2.75
[*] Here is an outline of the proof:
We need to find x that minimizes f(x) = sum((x - L[i])**2) where the sum is taken over i=0..n-1.
Take the derivative of f(x) and set it to zero:
2*sum(x - L[i]) = 0
Using simple algebra, the above can be transformed into
x = sum(L[i]) / n
which is none other than the arithmetic mean of L. QED.
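The claim can also be checked numerically. The sketch below is only an illustration, not part of the proof; the helper name `sum_sq_diff` and the grid of candidate values are assumptions made for this example.

```python
def sum_sq_diff(L, x):
    """Sum of squared differences between each item of L and x."""
    return sum((item - x) ** 2 for item in L)

L = [2, 2, 3, 4]
mean = sum(L) / len(L)

# Compare the mean against a grid of candidates from 2.00 to 4.00 in steps of 0.01.
candidates = [2 + step / 100 for step in range(201)]
best = min(candidates, key=lambda x: sum_sq_diff(L, x))

print(mean, sum_sq_diff(L, mean))
print(best)
```

The best candidate on the grid coincides with the arithmetic mean, as the derivative argument predicts.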

I am not 100% sure this is the most efficient way to do it, but you could keep the algorithm you have and modify the return statement:
min_int = min(sumsqdiffs, key=sumsqdiffs.get)
return bisection(L, min_int-1, min_int+1)
where bisection implements the following method: Bisection Method
This works only if the function has a single minimum in the analyzed interval.
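The answer does not show `bisection`, so the signature below is hypothetical. As one possible reading, this sketch applies bisection to the sign of the derivative 2*sum(x - L[i]), which changes from negative to positive at the minimum; the tolerance parameter `tol` is also an assumption.

```python
def bisection(L, lo, hi, tol=1e-9):
    """Locate the minimum of sum((x - v)**2 for v in L) on [lo, hi]."""
    def derivative(x):
        return 2 * sum(x - v for v in L)

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if derivative(mid) > 0:
            hi = mid  # the minimum lies to the left of mid
        else:
            lo = mid  # the minimum lies at or to the right of mid
    return (lo + hi) / 2

print(round(bisection([2, 2, 3, 4], 2, 4), 6))
```

For [2, 2, 3, 4] the best integer is 3, so the search interval is [2, 4], and the result converges to the mean.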

Related

How to find the number that is closest to the average of the numbers?

For example, most_average([1, 2, 3, 4, 5]) should return 3 (the average of the numbers in the list is 3.0, and 3 is clearly closest to this).
most_average([3, 4, 3, 1]) should also return 3 (the average is 2.75, and 3 is closer to 2.75 than is any other number in the list).
This is what I have right now:
def most_average(numbers):
    sum = 0
    for num in numbers:
        sum += num
    result = sum / len(numbers)
    return result
I can only get the ordinary average, but I don't know how to find the number in the list that is closest to it.

Combining Python's min function with its key option, this is a one-liner:
numbers = [1, 2, 3, 4, 5]
closest_to_avg = min(numbers, key=lambda x: abs(x - sum(numbers)/len(numbers)))
print(closest_to_avg)
# 3
Explanation, via break-down to more lines:
avg_of_numbers = sum(numbers) / len(numbers)
print(avg_of_numbers)
# 3.0
So the average can be calculated without any (explicit) loops.
Now what you need is to just calculate the absolute difference between each of numbers and the average:
abs_diffs_from_avg = [abs(x - avg_of_numbers) for x in numbers]
The number in numbers minimizing this diff is the number you want, and you can see this by looking at each number and its corresponding diff:
print([(x, abs(x - avg_of_numbers)) for x in numbers])
# contains the pair (3, 0.0), which is indeed the minimum in this case
So you just pass this diff as the key to the min function...
(Regarding the key argument: it is defined as "a function to customize the sort order" and is used by sort, max, and other Python functions. There are many explanations of this functionality, but for min it basically means "order the items by the value this function returns, in ascending order, and then take the first one".)
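A small sketch of that reading of key, using the second example from the question. The variable names are made up for the illustration.

```python
numbers = [3, 4, 3, 1]
avg = sum(numbers) / len(numbers)  # 2.75

# min compares the key values, not the items themselves.
closest = min(numbers, key=lambda x: abs(x - avg))

# Equivalent view: sort by the same key and take the first item.
closest_via_sort = sorted(numbers, key=lambda x: abs(x - avg))[0]

print(closest, closest_via_sort)
```

Both expressions pick 3, whose distance of 0.25 from the average is the smallest in the list.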
EDIT:
As recommended in the comments, the average should be calculated outside the min call, so that it is not recalculated for every element. So now it's a two-liner ;-)
numbers = [1, 2, 3, 4, 5]
avg_of_numbers = sum(numbers) / len(numbers)
closest_to_avg = min(numbers, key=lambda x: abs(x - avg_of_numbers))
print(closest_to_avg)
# 3

My idea is to subtract the average from the list of numbers, get the absolute value, and find the index of the minimum.
import numpy as np
a = [1, 2, 3, 4, 6]
avg = np.average(a)
print(f"Average of {a} is : {avg}")
dist_from_avg = np.abs(a - avg)
#get the index of the minimum
closest_idx = np.argmin(dist_from_avg)
print(f"Closest to average is : {a[closest_idx]}")
Which prints
Average of [1, 2, 3, 4, 6] is : 3.2
Closest to average is : 3

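This is pretty simple: get the average (mean) of your numbers, compute how far each number is from the mean (the code below calls this distance the "variance", although it is the absolute difference rather than the statistical variance), and find the smallest of those distances. Then return the element at the index of that smallest distance.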
def most_average(ls: list[int]) -> int:
    mean = sum(ls) / len(ls) # Figures out where the mean average is.
    variance = [abs(v - mean) for v in ls] # Figures out how far from the mean each element in the list is.
    minimum = min(variance) # Figures out what the smallest variance is (this is the number closest to the mean).
    return ls[variance.index(minimum)] # Returns the element that has the minimal variance.
In the repl:
>>> most_average([1,2,3,4,5])
3
I will say that the expense here is that you're creating an entire new list in order to calculate and record the variance of every member of the original list. But, absent other constraints, this is the most straightforward way to think about it.
Some key functions that will help you here:
sum(<some list or iterable>) -> adds it all up
len(<some list or iterable>) -> the length of the iterable
abs(<some value>) -> If it is negative, make it positive
min(<some list or iterable>) -> Find the smallest value and return it
<list>.index(<value>) -> Get the index of the value you pass
The last is interesting here, because once you have calculated all the variances, you can ask your variance list where the smallest value is and use that index in the original list. Because the two lists map one to one, an index into the variance list is also an index into your original list.
There is one last caveat to mention: this cannot decide whether 2 or 3 is the closest value to the mean in the list [1,2,3,4], since both are equally far from 2.5. It returns 2, the first of the tied values, so you'll have to make a modification if that is not the result you want.
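One way to make the tie-breaking explicit is a tuple key, where the second component decides between equally distant values. This is a sketch under that assumption; the `prefer_larger` parameter is hypothetical and not part of the answer's function.

```python
def most_average(ls, prefer_larger=False):
    mean = sum(ls) / len(ls)
    if prefer_larger:
        # Distance first, then the negated value so the larger number wins ties.
        return min(ls, key=lambda v: (abs(v - mean), -v))
    # Distance first, then the value itself so the smaller number wins ties.
    return min(ls, key=lambda v: (abs(v - mean), v))

print(most_average([1, 2, 3, 4]))
print(most_average([1, 2, 3, 4], prefer_larger=True))
```

The first call returns 2 and the second returns 3, so the choice between tied values no longer depends on list order.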

Optimizing a factorial function in python

I have written this function with an unpacked parameter (*x), but I want it to display the result rather than return it, and I still need it to be a two-line function:
def fac(*x):
    return (fac(list(x)[0], list(x)[1] - 1)*list(x)[1]) if list(x)[1] > 0 else 1  # here I need the one line to print the factorial
I tried to achieve this with a lambda, but I didn't know how to pass the *x parameter.

Your factorial lambda is correct. I take it that you would like to calculate the factorials for a list, say [1, 2, 3], and output the results. This is how you can achieve that:
fact = lambda x: x*fact(x-1) if x > 0 else 1
print(*[fact(i) for i in [1, 2, 3]])
Which will output: 1 2 6
Another option, if you have Python 3.8 or later, is to use a list comprehension with the walrus operator (:=). This is a bit trickier, but it calculates and outputs all factorials up to n inclusive while still fitting in your required two lines.
fac, n = 1, 5
print(*[fac for i in range(1, n+1) if (fac := fac*i)])
Which will output: 1 2 6 24 120

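The function below computes the factorial as a single product over the factors 1 to n.
from functools import reduce  # reduce is not a builtin in Python 3
def fact(n):
    list_fact = []
    if n > 1:
        list_fact.extend(range(1, n + 1))
    # The initial value 1 makes fact(0) and fact(1) return 1 instead of failing on an empty list.
    return reduce(lambda x, y: x * y, list_fact, 1)
print(fact(9000)) # computed quickly; on Python 3.11+ printing an integer this large first requires raising the limit with sys.set_int_max_str_digits
Note:
The list holds the factors 1 to n that are multiplied together. It is rebuilt on every call, so previously computed values are not reused between calls.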
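If the goal is for repeated calls to reuse earlier work, the previously computed values have to live outside the function. The following is a minimal sketch of that idea, not the answer's original code; the names `_known` and `cached_fact` are made up for the illustration.

```python
_known = [1]  # _known[k] holds k!, starting with 0! = 1

def cached_fact(n):
    """Return n!, extending the shared table only as far as needed."""
    while len(_known) <= n:
        # The next entry is the previous factorial times the next integer.
        _known.append(_known[-1] * len(_known))
    return _known[n]

print(cached_fact(5))
print(cached_fact(3))  # served from the table, no new multiplications
```

The trade-off is memory: the table keeps every factorial up to the largest n requested so far.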

How does enumerate work in polynomial function (Python)?

I am a Python beginner and a bit confused about how the enumerate function is used to evaluate a polynomial in the following SO thread:
Evaluating Polynomial coefficients
The thread includes several ways to solve the summing of polynomials. I understand the following version well:
def evalP(lst, x):
    total = 0
    for power in range(len(lst)):
        total += (x**power) * lst[power] # lst[power] is the coefficient
    return total
E.g. if I take third degree polynomial with x = 2, the program returns 15 as I expected based on pen and paper calculations:
evalP([1,1,1,1],2)
Out[64]:
15
But there is another, neater version of doing this that uses enumerate function:
evalPoly = lambda lst, x: sum((x**power) * coeff for power, coeff in enumerate(lst))
The problem is that I just can't get that previous result replicated with that. This is what I've tried:
coeff = 1
power = 3
lst = (power,coeff)
x = 2
evalPoly(lst,x)
And this is what the program returns:
Out[68]:
5
Not what I expected. I think I have misunderstood how that enumerate version takes on the coefficient. Could anyone tell me how I am thinking this wrong?
The previous version seems more general as it allows for differing coefficients in the list, whereas I am not sure what that scalar in enumerate version represents.

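You should call evalPoly with the same arguments as evalP, e.g. evalPoly([1,1,1,1],2). enumerate(lst) yields (index, item) pairs, so the position of each item in the list is its power and the item itself is the coefficient.
Your call passes lst = (3, 1), which is the same as evalPoly([3,1],2): it returns 3*2^0 + 1*2^1, which equals 5.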
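To make the pairing visible, the following sketch prints what enumerate produces and evaluates a few inputs with the lambda from the question. The last call is an added illustration of how a single x**3 term with coefficient 1 would be written as a coefficient list.

```python
evalPoly = lambda lst, x: sum((x**power) * coeff for power, coeff in enumerate(lst))

print(list(enumerate([1, 1, 1, 1])))  # (power, coeff) pairs
print(evalPoly([1, 1, 1, 1], 2))      # 1 + 2 + 4 + 8
print(evalPoly((3, 1), 2))            # 3*2**0 + 1*2**1
print(evalPoly([0, 0, 0, 1], 2))      # only the x**3 term, coefficient 1
```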

Random contiguous slice of list in Python based on a single random integer

Using a single random number and a list, how would you return a random slice of that list?
For example, given the list [0,1,2] there are seven possibilities of random contiguous slices:
[ ]
[ 0 ]
[ 0, 1 ]
[ 0, 1, 2 ]
[ 1 ]
[ 1, 2]
[ 2 ]
Rather than getting a random starting index and a random end index, there must be a way to generate a single random number and use that one value to figure out both starting index and end/length.
I need it that way, to ensure these 7 possibilities have equal probability.

Simply fix one order in which you would sort all possible slices, then work out a way to turn an index in that list of all slices back into the slice endpoints. For example, the order you used could be described by
The empty slice is before all other slices
Non-empty slices are ordered by their starting point
Slices with the same starting point are ordered by their endpoint
So the index 0 should return the empty list. Indices 1 through n should return [0:1] through [0:n]. Indices n+1 through n+(n-1)=2n-1 would be [1:2] through [1:n]; 2n through n+(n-1)+(n-2)=3n-3 would be [2:3] through [2:n], and so on.
You see a pattern here: the last index for a given starting point is of the form n+(n-1)+(n-2)+(n-3)+…+(n-k), where k is the starting index of the sequence. That's an arithmetic series, so that sum is (k+1)(2n-k)/2=(2n+(2n-1)k-k²)/2.
If you set that term equal to a given index and solve for k, you get a formula involving square roots. You can then use the ceiling function to turn the result into an integer value for k, the starting point whose range of indices contains the given index. And once you know k, computing the end point is rather easy.
But the quadratic equation in the solution above makes things really ugly. So you might be better off using some other order. Right now I can't think of a way which would avoid such a quadratic term. The order Douglas used in his answer doesn't avoid square roots, but at least his square root is a bit simpler due to the fact that he sorts by end point first. The order in your question and my answer is called lexicographical order, his would be called reverse lexicographical and is often easier to handle since it doesn't depend on n. But since most people think about normal (forward) lexicographical order first, this answer might be more intuitive to many and might even be the required way for some applications.
Here is a bit of Python code which lists all sequence elements in order, and does the conversion from index i to endpoints [k:m] the way I described above:
from math import ceil, sqrt
n = 3
print("{:3} []".format(0))
for i in range(1, n*(n+1)//2 + 1):
    b = 1 - 2*n
    c = 2*(i - n) - 1
    # solve k^2 + b*k + c = 0
    k = int(ceil((- b - sqrt(b*b - 4*c))/2.))
    m = k + i - k*(2*n-k+1)//2
    print("{:3} [{}:{}]".format(i, k, m))
The - 1 term in c doesn't come from the mathematical formula I presented above. It's more like subtracting 0.5 from each value of i. This ensures that even if the result of sqrt is slightly too large, you won't end up with a k which is too large. So that term accounts for numeric imprecision and should make the whole thing pretty robust.
The term k*(2*n-k+1)//2 is the last index belonging to starting point k-1, so i minus that term is the length of the subsequence under consideration.
You can simplify things further. You can perform some computation outside the loop, which might be important if you have to choose random sequences repeatedly. You can divide b by a factor of 2 and then get rid of that factor in a number of other places. The result could look like this:
from math import ceil, sqrt
n = 3
b = n - 0.5
bbc = b*b + 2*n + 1
print("{:3} []".format(0))
for i in range(1, n*(n+1)//2 + 1):
    k = int(ceil(b - sqrt(bbc - 2*i)))
    m = k + i - k*(2*n-k+1)//2
    print("{:3} [{}:{}]".format(i, k, m))
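To actually draw a random slice, the conversion can be wrapped in a function and fed a single random integer. This is a sketch built on the simplified formula above; the function names `slice_from_index` and `random_slice` are made up for the illustration.

```python
import random
from math import ceil, sqrt

def slice_from_index(n, i):
    """Map an index i in 0..n*(n+1)//2 to slice endpoints (k, m)."""
    if i == 0:
        return (0, 0)  # the empty slice
    b = n - 0.5
    k = int(ceil(b - sqrt(b*b + 2*n + 1 - 2*i)))
    m = k + i - k*(2*n - k + 1)//2
    return (k, m)

def random_slice(lst):
    n = len(lst)
    i = random.randint(0, n*(n + 1)//2)  # one random number, both ends inclusive
    k, m = slice_from_index(n, i)
    return lst[k:m]

print([slice_from_index(3, i) for i in range(7)])
print(random_slice([0, 1, 2]))
```

Because every index maps to a different slice and the index is drawn uniformly, each of the seven slices of a length-3 list is equally likely.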

It is a little strange to give the empty list equal weight with the others. It is more natural for the empty list to be given weight 0 or n+1 times the others, if there are n elements on the list. But if you want it to have equal weight, you can do that.
There are n*(n+1)/2 nonempty contiguous sublists. You can specify these by the end point, from 0 to n-1, and the starting point, from 0 to the endpoint.
Generate a random integer x from 0 to n*(n+1)/2, inclusive.
If x=0, return the empty list. Otherwise, x is uniformly distributed from 1 through n(n+1)/2.
Compute e = floor(sqrt(2*x)-1/2). This takes the values 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, etc.
Compute s = (x-1) - e*(e+1)/2. This takes the values 0, 0, 1, 0, 1, 2, 0, 1, 2, 3, ...
Return the interval starting at index s and ending at index e.
(s,e) takes the values (0,0),(0,1),(1,1),(0,2),(1,2),(2,2),...
import random
import math
n = 10
x = random.randint(0, n*(n+1)//2)  # integer division: randint needs integer bounds
if x == 0:
    print(list(range(n))[0:0])  # empty list
else:
    e = int(math.floor(math.sqrt(2*x) - 0.5))
    s = x - 1 - e*(e+1)//2
    print(list(range(n))[s:e+1])  # starting at s, ending at e, inclusive
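As a quick check of the mapping, the sketch below enumerates every value of x for a short list instead of drawing one at random. The helper name `slice_bounds` is an assumption made for this example.

```python
import math

def slice_bounds(x):
    """Map x >= 1 to (s, e): start index s and inclusive end index e."""
    e = int(math.floor(math.sqrt(2*x) - 0.5))
    s = x - 1 - e*(e + 1)//2
    return s, e

n = 3
items = list(range(n))
# x = 0 is the empty list; x = 1..n*(n+1)//2 covers the nonempty slices.
slices = [[]] + [items[s:e + 1] for s, e in map(slice_bounds, range(1, n*(n + 1)//2 + 1))]
print(slices)
```

Each of the seven slices of [0, 1, 2] appears exactly once, ordered by end point first.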

First create all possible slice indexes.
[0:0], [1:1], etc are equivalent, so we include only one of those.
Finally you pick a random index couple, and apply it.
import random
l = [0, 1, 2]
combination_couples = [(0, 0)]
length = len(l)
# Creates all index couples.
for j in range(1, length+1):
    for i in range(j):
        combination_couples.append((i, j))
print(combination_couples)
rand_tuple = random.sample(combination_couples, 1)[0]
final_slice = l[rand_tuple[0]:rand_tuple[1]]
print(final_slice)
To ensure we got them all:
for i in combination_couples:
    print(l[i[0]:i[1]])
Alternatively, with some math...
For a length-3 list the possible index values are 0 to 3, that is n=4 values. A slice uses 2 of them, that is k=2. The first index has to be smaller than the second, therefore we need to count the combinations of k values out of n (n choose k).
from math import factorial as f
def total_combinations(n, k=2):
    result = 1
    for i in range(1, k+1):
        result *= n - k + i
    result //= f(k)  # integer division keeps the count an int
    # We add 1 since we included [0:0] as well.
    return result + 1
print(total_combinations(n=4)) # Prints 7 as expected.
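On Python 3.8 or later the same count can be obtained from `math.comb`. The sketch below assumes that version and cross-checks the formula against a brute-force count; the function name `count_slices` is made up for the illustration.

```python
from math import comb

def count_slices(n):
    """Number of distinct contiguous slices of a length-n list, counting the empty one once."""
    return comb(n + 1, 2) + 1

# Brute-force cross-check for small lengths.
for n in range(6):
    brute = 1 + sum(1 for j in range(1, n + 1) for i in range(j))
    assert count_slices(n) == brute

print(count_slices(3))
```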

there must be a way to generate a single random number and use that one value to figure out both starting index and end/length.
It is difficult to say which method is best, but if you're only interested in binding a single random number to your contiguous slice, you can use modulo.
Given a list l and a single random number r, you can get your contiguous slice like this:
l[r % len(l) : some_sparkling_transformation(r) % len(l)]
where some_sparkling_transformation(r) is the essential part. It depends on your needs, but since I don't see any special requirements in your question, it could be, for example:
l[r % len(l) : (2 * r) % len(l)]
The most important thing here is that both the left and right edges of the slice are correlated with r. This makes it hard to define contiguous slices that won't follow any observable pattern. The example above (with 2 * r) produces slices that are always either empty lists or follow the pattern [a : 2 * a].
Let's use some intuition. We want a good random representation of the number r in the form of a contiguous slice. It turns out that we need to find two numbers, a and b, which are the left and right edges of the slice respectively. Assuming that r is a good random number (we like it in some way), we can say that a = r % len(l) is a good approach.
Let's now try to find b. The best way to generate another nice random number is to use a random number generator (random or numpy) that supports seeding (both do). Example with the random module:
import random
def contiguous_slice(l, r):
    random.seed(r)
    a = int(random.uniform(0, len(l)+1))
    b = int(random.uniform(0, len(l)+1))
    a, b = sorted([a, b])
    return l[a:b]
Good luck and have fun!

middle number without using median function, Python

I have been looking for a way to find the middle number in a list without using the median function, but cannot find any information on how to do that.
I need to write a function middle(L) (which I have to define) that takes a list L as its argument and returns the item in the middle position of L. (So that the middle is well-defined, I should assume that L has odd length.)
This is all I have right now, and I actually have no idea how to proceed.
def middle (L):
    i= len((L)[0:-1])/2
    return i
print (middle)

To find the median, just sort the list and return the number in the middle position or, if the list has an even number of elements, return the average of the 2 elements in the middle:
def middle(L):
    L = sorted(L)
    n = len(L)
    m = n - 1
    return (L[n//2] + L[m//2]) / 2.0  # integer division, since list indices must be ints
Example:
>>> print(middle([1, 2, 3, 4, 5]))
3.0
>>> print(middle([1, 2, 3, 4, 5, 6]))
3.5

As NPE's answer suggests, you just have to take the middle element of the sorted list when the list has an odd number of elements; if it has an even number of elements, you take the average of the middle two elements:
def median(l):
    srt = sorted(l)
    mid = len(l)//2
    if len(l) % 2: # if the list length mod 2 has a remainder, the list has odd length
        return srt[mid]
    else:
        med = (srt[mid] + srt[mid-1]) / 2 # in a list [1,2,3,4] srt[mid] -> 3, srt[mid-1] -> 2
        return med

As an optimization, we can use binary search or a selection algorithm to find the median, rather than sorting all the numbers.
For details, please check:
https://www.quora.com/Given-a-list-of-unsorted-numbers-how-would-you-find-the-median-without-sorting-the-original-array
For the code, please check:
https://medium.com/#nxtchg/calculating-median-without-sorting-eaa639cedb9f
There are two well-known ways to calculate median:
naive way (sort, pick the middle)
using quickselect (or similar algorithm for weighted median)
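The second approach can be sketched as follows. This is a simple list-based quickselect written for illustration, not the code from the linked articles; the function names are assumptions, and a production version would typically partition in place.

```python
import random

def quickselect(items, k):
    """Return the k-th smallest item (0-based) without fully sorting."""
    pivot = random.choice(items)
    lows = [v for v in items if v < pivot]
    pivots = [v for v in items if v == pivot]
    highs = [v for v in items if v > pivot]
    if k < len(lows):
        return quickselect(lows, k)
    if k < len(lows) + len(pivots):
        return pivot
    return quickselect(highs, k - len(lows) - len(pivots))

def median(items):
    n = len(items)
    if n % 2:
        return quickselect(items, n // 2)
    # Even length: average the two middle order statistics.
    return (quickselect(items, n // 2 - 1) + quickselect(items, n // 2)) / 2

print(median([5, 1, 4, 2, 3]))
print(median([6, 1, 4, 2, 3, 5]))
```

Quickselect runs in linear time on average, compared with O(n log n) for sorting the whole list.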
Hope it helps.
