Function to calculate the average distance from a set of tuples (Python)

I need to implement a function that, given a set of points, each specified by a pair of integers, returns the average distance between the points. If there are fewer than 2 points in the set, it raises a ValueError.
The distance is computed using the formula:
d = sqrt((x1-x2)**2 + (y1-y2)**2)
I'm struggling to get the loop to work: it gives me an error that types.GenericAlias has no len(). I realised that this has something to do with the input being a set, but I don't know how to resolve it:
def average_distance(points: set[tuple[int,int]]) -> float:
    from math import sqrt
    points = list[input()]
    list_dist =[]
    for index in range(0, len(points)):
        coordinate = points[index] # tuple in the set points
        x1 = coordinate[0] # first el in the pair
        y1 = coordinate[1] # second el in the pair
        next_coordinate = points[index +1]
        x2 = next_coordinate[0]
        y2 = next_coordinate[1]
        distance = math.sqrt(((x1-x2)**2)+((y1-y2)**2))
        list_dist.append(distance)
    total_dist = 0
    for dist in distance:
        total_dist += dist
    avg_dist = total_dist//(len(distance))
    return avg_dist
So
print (average_distance({(1,2), (3,4), (5,6)}))
Expected output:
3.7712
Would be grateful for your advice on this.
Many thanks

Shorter solution using the library more:
from statistics import mean
from math import dist
from itertools import combinations, starmap
def average_distance(points):
    return mean(starmap(dist, combinations(points, 2)))
print(average_distance({(1,2), (3,4), (5,6)}))
Output:
3.771236166328254
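With fewer than two points, the one-liner above fails inside statistics.mean with a StatisticsError (a ValueError subclass) whose message is about missing data points, not about the input. A minimal sketch that makes the check from the question explicit, assuming Python 3.9+ for the set[...] annotation and math.dist; the error message wording is my own choice:

```python
from itertools import combinations
from math import dist  # available in Python 3.8+
from statistics import mean


def average_distance(points: set[tuple[int, int]]) -> float:
    """Average Euclidean distance over all unordered pairs of points."""
    if len(points) < 2:
        # Message wording is illustrative, not taken from the question.
        raise ValueError("at least two points are required")
    return mean(dist(p, q) for p, q in combinations(points, 2))


print(round(average_distance({(1, 2), (3, 4), (5, 6)}), 4))  # 3.7712
```

Because the input is already a set, it can be passed to combinations directly; there is no need to index into it or convert it to a list first.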

Here is my implementation for both the average distance between every group of two points given sequentially and all combinations of points. Take a look.
from itertools import combinations
from math import sqrt
from typing import List, NamedTuple
class Point(NamedTuple):
    x: float
    y: float
def distance(p1: Point, p2: Point) -> float:
    return sqrt((p2.x - p1.x) ** 2 + (p2.y - p1.y) ** 2)
def avg_dist_between_all_points(points: List[Point]) -> float:
    c = list(combinations(points, 2))
    return sum(distance(*pair) for pair in c) / len(c)
def avg_dist_between_seq_points(points: List[Point]) -> float:
    c = [points[i : i + 2] for i in range(len(points) - 1)]
    return sum(distance(*pair) for pair in c) / len(c)
if __name__ == "__main__":
    input_str = input("points (ex: 1,2 3,4 5,6): ")
    point_strs = input_str.split(" ")
    points: List[Point] = []
    for s in point_strs:
        x, y = s.split(",")
        points.append(Point(float(x), float(y)))
    print(avg_dist_between_all_points(points))
    print(avg_dist_between_seq_points(points))
This yields:
➜ ./avgdist.py
points (ex: 1,2 3,4 5,6): 1,2 3,4 5,6
3.771236166328254
2.8284271247461903

Related

How to probabilistically populate a list in python?

I want to use a basic for loop to populate a list of values in Python, but I would like the values to be calculated probabilistically, such that p% of the time the values are calculated with (toy) equation 1 and (100-p)% of the time with equation 2.
Here's what I've got so far:
# generate list of random probabilities
p_list = np.random.uniform(low=0.0, high=1.0, size=(500,))
my_list = []
# loop through but where to put 'p'? append() should probably only appear once
for p in p_list:
    calc1 = x*y # equation 1
    calc2 = (x-y) # equation 2
    my_list.append(calc1)
    my_list.append(calc2)
You've already generated a list of probabilities, p_list, with one entry for each value in my_list you want to generate. The Pythonic way to use it is a ternary operator inside a list comprehension:
from random import random
my_list = [(x*y if random() < p else x-y) for p in p_list]
If we were to expand this into a proper for loop:
my_list = []
for p in p_list:
    if random() < p:
        my_list.append(x*y)
    else:
        my_list.append(x-y)
If we wanted to be even more pythonic, regarding calc1 and calc2, we could make them into lambdas:
calc1 = lambda x,y: x*y
calc2 = lambda x,y: x-y
...
my_list = [calc1(x,y) if random() < p else calc2(x,y) for p in p_list]
or, depending on how x and y vary for your function (assuming they're not static), you could even do the comprehension in two steps:
calc_list = [calc1 if random() < p else calc2 for p in p_list]
my_list = [calc(x,y) for calc in calc_list]
I took the approach of making minimal changes to the original code and using easy-to-understand syntax:
import numpy as np
p_list = np.random.uniform(low=0.0, high=1.0, size=(500,))
my_list = []
# uncomment the 2 lines below to make this code runnable
#x = 1
#y = 2
for p in p_list:
    # randoms are uniformly distributed over the half-open interval [low, high)
    # so check if p is in [0, 0.5) for equation 1 or [0.5, 1) for equation 2
    if p < 0.5:
        calc1 = x*y # equation 1
        my_list.append(calc1)
    else:
        calc2 = (x-y) # equation 2
        my_list.append(calc2)
The other answers seem to assume you want to keep the calculated chances around. If all you are after is a list of results for which equation 1 was used p% of the time and equation 2 100-p% of the time, this is all you need:
from random import random, seed
inputs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# change the seed to see different 'random' outcomes
seed(1)
results = [x * x if random() > 0.5 else 2 * x for x in inputs]
print(results)
If you are OK with using NumPy, the choice method is worth trying.
https://docs.scipy.org/doc/numpy-1.14.1/reference/generated/numpy.random.choice.html
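As a sketch of the NumPy route, the whole list can be built without a Python loop. The snippet below assumes a single fixed probability p and toy inputs x and y (the function name, the parameter names and the values are all placeholders, as are the two equations from the question). It draws one uniform number per element and uses np.where to take equation 1 where the draw falls below p and equation 2 elsewhere. It uses np.where rather than np.random.choice because it selects between two computed values, not from a fixed pool.

```python
import numpy as np


def populate(x, y, p, size, seed=None):
    """Return `size` values: x*y with probability p, x-y otherwise."""
    rng = np.random.default_rng(seed)
    use_eq1 = rng.random(size) < p  # True with probability p
    return np.where(use_eq1, x * y, x - y)


values = populate(x=3.0, y=2.0, p=0.3, size=500, seed=0)
print((values == 6.0).mean())  # fraction from equation 1, close to 0.3
```

Passing a seed makes the draw reproducible, which helps when checking that the observed fraction is near p.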

Is there a way to get the index of a percentage of a cumulative sum from a sorted list?

Given a sorted list of real numbers, e.g.
x = range(20)
The task is to find the first index at which the running (cumulative) sum of the list passes X% of the total, e.g.
def compute_cumpercent(lint, percent):
    break_point = sum(lint) * percent
    mass = 0
    for i, c in enumerate(lint):
        if mass > break_point:
            return i
        mass += c
To find the index of the number in the input list which is less than and closest to 25% of the cumulative sum,
>>> compute_cumpercent(x, 0.25)
11
Firstly, is there a mathematical name for such a function?
Other than doing it with the simple loop as above, is there a way to do the same with numpy, bisect, or otherwise?
Assume that input list is always sorted.
Something like this maybe?
import numpy as np
x = range(20)
percent = 0.25
cumsum = np.cumsum(x)
break_point = cumsum[-1] * percent
np.argmax(cumsum >= break_point) + 1 # 11
import numpy as np
x = np.arange(20)
Percent = 25
CumSumArray = np.cumsum(x)
ValueToFind = CumSumArray[-1] * Percent / 100
Idx = np.argmax(CumSumArray > ValueToFind) - 1 # argmax returns a scalar here, so no [0]; gives 9
Following this hint, one can use searchsorted to find the index of the element that is closest to (and lower than) a percentile/quantile value.
See the example below:
import numpy as np
def find_index_left(xs, v):
    return np.searchsorted(xs, v, side='left') - 1
def find_index_quantile(xs, q):
    v = np.quantile(xs, q)
    return find_index_left(xs, v)
xs = [5, 10, 11, 15, 20]
assert np.quantile(xs, 0.9) == 18.0
assert find_index_left(xs, 18) == 3 # zero-based index for fourth element
assert find_index_quantile(xs, 0.9) == 3
Note that xs has to be sorted.
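The same searchsorted idea can be applied to the cumulative sum itself, which is what the original question asks for. The sketch below is one way to mirror the question's loop; the function name is hypothetical, and it assumes the values are non-negative so that the cumulative sum is sorted.

```python
import numpy as np


def compute_cumpercent_np(values, percent):
    """Vectorised counterpart of the question's compute_cumpercent loop.

    Assumes `values` are non-negative, so the cumulative sum is sorted.
    """
    cumsum = np.cumsum(values)
    break_point = cumsum[-1] * percent
    # First position whose cumulative sum is strictly above the break point.
    k = int(np.searchsorted(cumsum, break_point, side="right"))
    # The loop reports the index *after* that position, and returns None
    # when the running sum never passes the break point before the end.
    return k + 1 if k + 1 < len(cumsum) else None


print(compute_cumpercent_np(range(20), 0.25))  # 11
```

Here side="right" gives the strict "greater than" comparison used in the loop; side="left" would correspond to "greater than or equal".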

How to calculate sum of two polynomials?

For instance 3x^4 - 17x^2 - 3x + 5. Each term of the polynomial can be represented as a pair of integers (coefficient,exponent). The polynomial itself is then a list of such pairs like
[(3,4), (-17,2), (-3,1), (5,0)] for the polynomial as shown.
Zero polynomial, 0, is represented as the empty list [], since it has no terms with nonzero coefficients.
I want to write two functions to add and multiply two input polynomials with the same representation of tuple (coefficient, exponent):
addpoly(p1, p2)
multpoly(p1, p2)
Test Cases:
addpoly([(4,3),(3,0)], [(-4,3),(2,1)])
should give [(2, 1),(3, 0)]
addpoly([(2,1)],[(-2,1)])
should give []
multpoly([(1,1),(-1,0)], [(1,2),(1,1),(1,0)])
should give [(1, 3),(-1, 0)]
Here is something that I started with, but I got completely stuck!
def addpoly(p1, p2):
    (coeff1, exp1) = p1
    (coeff2, exp2) = p2
    if exp1 == exp2:
        coeff3 = coeff1 + coeff2
As suggested in the comments, it is much simpler to represent polynomials as multisets of exponents.
In Python, the closest thing to a multiset is the Counter data structure. Using a Counter (or even just a plain dictionary) that maps exponents to coefficients will automatically coalesce entries with the same exponent, just as you'd expect when writing a simplified polynomial.
You can perform operations using a Counter, and then convert back to your list-of-pairs representation when finished, using a function like this:
import collections
import itertools
def counter_to_poly(c):
    p = [(coeff, exp) for exp, coeff in c.items() if coeff != 0]
    # sort by exponents in descending order
    p.sort(key = lambda pair: pair[1], reverse = True)
    return p
To add polynomials, you group together like exponents and sum their coefficients.
def addpoly(p, q):
    r = collections.Counter()
    for coeff, exp in (p + q):
        r[exp] += coeff
    return counter_to_poly(r)
(If you were to stick with the Counter representation throughout, be careful with p + q: adding two Counters keeps only entries whose resulting count is positive, so negative coefficients would be lost. Counter.update on a copy adds the counts without discarding anything.)
To multiply polynomials, you multiply each term from one polynomial pairwise with every term from the other. To multiply two terms, you add their exponents and multiply their coefficients.
def multpoly(p, q):
    r = collections.Counter()
    for (c1, e1), (c2, e2) in itertools.product(p, q):
        r[e1 + e2] += c1 * c2
    return counter_to_poly(r)
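Putting the pieces of this approach together, here is a self-contained sketch that can be run against the three test cases from the question. It uses a defaultdict instead of a Counter (either works as an exponent-to-coefficient map), and the helper name to_pairs is my own.

```python
from collections import defaultdict
from itertools import product


def to_pairs(terms):
    """Convert {exponent: coefficient} to [(coefficient, exponent), ...]."""
    return sorted(
        ((c, e) for e, c in terms.items() if c != 0),
        key=lambda pair: pair[1],
        reverse=True,  # highest exponent first
    )


def addpoly(p1, p2):
    terms = defaultdict(int)
    for coeff, exp in p1 + p2:
        terms[exp] += coeff  # like exponents are merged here
    return to_pairs(terms)


def multpoly(p1, p2):
    terms = defaultdict(int)
    for (c1, e1), (c2, e2) in product(p1, p2):
        terms[e1 + e2] += c1 * c2  # add exponents, multiply coefficients
    return to_pairs(terms)


# The three test cases from the question.
print(addpoly([(4, 3), (3, 0)], [(-4, 3), (2, 1)]))           # [(2, 1), (3, 0)]
print(addpoly([(2, 1)], [(-2, 1)]))                           # []
print(multpoly([(1, 1), (-1, 0)], [(1, 2), (1, 1), (1, 0)]))  # [(1, 3), (-1, 0)]
```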
This Python code worked for me; I hope it works for you too.
Addition function
def addpoly(p1,p2):
    i=0
    su=0
    j=0
    c=[]
    if len(p1)==0:
        #if p1 empty
        return p2
    if len(p2)==0:
        #if p2 is empty
        return p1
    while i<len(p1) and j<len(p2):
        if int(p1[i][1])==int(p2[j][1]):
            su=p1[i][0]+p2[j][0]
            if su !=0:
                c.append((su,p1[i][1]))
            i=i+1
            j=j+1
        elif p1[i][1]>p2[j][1]:
            c.append((p1[i]))
            i=i+1
        elif p1[i][1]<p2[j][1]:
            c.append((p2[j]))
            j=j+1
    if p1[i:]!=[]:
        for k in p1[i:]:
            c.append(k)
    if p2[j:]!=[]:
        for k in p2[j:]:
            c.append(k)
    return c
Multiplication function
def multipoly(p1,p2):
    p=[]
    s=0
    for i in p1:
        c=[]
        for j in p2:
            s=i[0]*j[0]
            e=i[1]+j[1]
            c.append((s,e))
        p=addpoly(c,p)
    return p
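I have come up with a solution but I'm unsure that it's optimized!
def addpoly(p1,p2):
    # work on copies so the caller's lists are not modified
    p1, p2 = list(p1), list(p2)
    for i in range(len(p1)):
        for item in p2:
            if p1[i][1] == item[1]:
                p1[i] = ((p1[i][0] + item[0]),p1[i][1])
                p2.remove(item)
                break # exponents are unique; stop iterating p2 after removing from it
    # drop zero terms without removing from a list while iterating over it
    p3 = [item for item in p1 + p2 if item[0] != 0]
    # order by exponent, highest first
    return sorted(p3, key=lambda item: item[1], reverse=True)
and the second one:
def multpoly(p1,p2):
    result = []
    for c1, e1 in p1:
        # multiply one term of p1 by every term of p2, then accumulate
        partial = [(c1 * c2, e1 + e2) for c2, e2 in p2]
        result = addpoly(result, partial)
    return result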

Random function with break command in Python

Write a function that accepts 3 numbers and calculates the average of the 3 numbers and raises the average to the second power (returns the average squared).
Write a loop that finds 3 random uniform numbers (0 to 1); sends the 3 numbers to the function and stops the loop when the value of the function is greater than 0.5625
I tried to figure out these two things, but I am a little confused.
import random
a = random.random ()
b = random.random ()
c = random.random ()
def avenum(x1,x2,x3): # the average of the 3 numbers
    z = (x1+x2+x3)/3.0
    return z
y = avenum(a,b,c)
print 'the average of the 3 numbers = ',y
def avesec(x1,x2,x3): # the average of the second power
    d = ((x1**2)+(x2**2)+(x3**2))/3.0
    return d
y1 = avesec(a,b,c)
print 'the average of the second power = ',y1
The first question:
Write a function that accepts 3 numbers and calculates the average of the 3 numbers and raises the average to the second power (returns the average squared).
def square_of_average(x1, x2, x3):
    z = (x1 + x2 + x3) / 3
    return z ** 2 # This returns the square of the average
Your second question:
Write a loop that finds 3 random uniform numbers (0 to 1); sends the 3 numbers to the function and stops the loop when the value of the function is greater than 0.5625.
Assuming you want to write this in another function:
import random
def three_random_square_average():
    z = 0 # initialize your answer
    while(z <= 0.5625): # While the answer is less than or equal to 0.5625...
        # Generate three random numbers:
        a, b, c = random.random(), random.random(), random.random()
        # Assign the square of the average to your answer variable
        z = square_of_average(a, b, c)
    # When the loop exits, return the answer
    return z
Another option:
import random
def three_random_squared_average():
    while(True):
        a, b, c = random.random(), random.random(), random.random()
        z = square_of_average(a, b, c)
        if(z > 0.5625):
            break
    return z
If you don't want a function:
import random
z = 0
while(z <= 0.5625):
    z = square_of_average(random.random(), random.random(), random.random())
print(z)
Firstly, for 1): you're meant to raise the average to the second power, not each value. Squaring each value first gives you the average of the second powers of the inputs, which is a different quantity.
import random
a = random.random ()
b = random.random ()
c = random.random ()
def avenum1(x1,x2,x3): # the square of the average of the 3 numbers
    z = ((x1+x2+x3)/3.0)**2
    return z
For 2): There are better ways but this is the most obvious.
import random
def avenum1(x1,x2,x3): # the square of the average of the 3 numbers
    z = ((x1+x2+x3)/3.0)**2
    return z
avg = 0
while avg<=0.5625:
    a = random.random ()
    b = random.random ()
    c = random.random ()
    avg = avenum1(a,b,c)
The better way:
avg = 0
while avg<=0.5625:
    list_ = [random.random() for i in range(3)]
    avg = (sum(list_)/3.0)**2
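For completeness, here is a Python 3 sketch of the loop written with an explicit break, as the title of the question suggests. The function name draw_until_above, the rng parameter and the attempt counter are my own additions for illustration. Since 0.5625 = 0.75 ** 2, the loop effectively waits for a triple whose average is above 0.75.

```python
import random


def square_of_average(x1, x2, x3):
    """Return the square of the mean of three numbers."""
    return ((x1 + x2 + x3) / 3) ** 2


def draw_until_above(threshold=0.5625, rng=None):
    """Draw triples of uniform [0, 1) numbers until the squared average
    exceeds `threshold`; return that value and the number of attempts."""
    if rng is None:
        rng = random.Random()
    attempts = 0
    while True:
        attempts += 1
        value = square_of_average(rng.random(), rng.random(), rng.random())
        if value > threshold:
            break  # stop as soon as the condition is met
    return value, attempts


value, attempts = draw_until_above(rng=random.Random(42))
print(value, attempts)
```

Passing a seeded random.Random instance makes a run repeatable, which is convenient when debugging.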

How do you make this code more pythonic?

Could you please tell me how I can make the following code more Pythonic?
The code is correct. Full disclosure: it's problem 1b in Handout #4 of this machine learning course. I'm supposed to use Newton's algorithm on the two data sets to fit a logistic hypothesis, but they use MATLAB and I'm using SciPy.
For example, one question I have: the matrices kept rounding to integers until I initialized one value to 0.0. Is there a better way?
Thanks
import os.path
import math
from numpy import matrix
from scipy.linalg import inv #, det, eig
x = matrix( '0.0;0;1' )
y = 11
grad = matrix( '0.0;0;0' )
hess = matrix('0.0,0,0;0,0,0;0,0,0')
theta = matrix( '0.0;0;0' )
# run until convergence=6or7
for i in range(1, 6):
    #reset
    grad = matrix( '0.0;0;0' )
    hess = matrix('0.0,0,0;0,0,0;0,0,0')
    xfile = open("q1x.dat", "r")
    yfile = open("q1y.dat", "r")
    #over whole set=99 items
    for i in range(1, 100):
        xline = xfile.readline()
        s= xline.split(" ")
        x[0] = float(s[1])
        x[1] = float(s[2])
        y = float(yfile.readline())
        hypoth = 1/ (1+ math.exp(-(theta.transpose() * x)))
        for j in range(0,3):
            grad[j] = grad[j] + (y-hypoth)* x[j]
            for k in range(0,3):
                hess[j,k] = hess[j,k] - (hypoth *(1-hypoth)*x[j]*x[k])
    theta = theta - inv(hess)*grad #update theta after construction
    xfile.close()
    yfile.close()
print "done"
print theta
One obvious change is to get rid of the "for i in range(1, 100):" and just iterate over the file lines. To iterate over both files (xfile and yfile), zip them, i.e. replace that block with something like:
import itertools
for xline, yline in itertools.izip(xfile, yfile):
    s= xline.split(" ")
    x[0] = float(s[1])
    x[1] = float(s[2])
    y = float(yline)
    ...
(This assumes you want the whole file. If you're deliberately restricting to a fixed number of lines, say the first 100, you could use something like the line below. Note that izip is Python 2 only; in Python 3 the built-in zip is already lazy.)
for i, xline, yline in itertools.izip(range(100), xfile, yfile):
However, it's also inefficient to read the same file 6 times. It is better to load it into memory in advance and loop over it there, i.e. outside your loop, have:
xfile = open("q1x.dat", "r")
yfile = open("q1y.dat", "r")
data = zip([line.split(" ")[1:3] for line in xfile], map(float, yfile))
And inside just:
for (x1,x2), y in data:
    x[0] = x1
    x[1] = x2
    ...
# assumes: import math; from numpy import matrix, zeros; from scipy.linalg import inv
x = matrix([[0.],[0],[1]])
theta = matrix(zeros([3,1]))
for i in range(5):
    grad = matrix(zeros([3,1]))
    hess = matrix(zeros([3,3]))
    [xfile, yfile] = [open('q1'+a+'.dat', 'r') for a in 'xy']
    for xline, yline in zip(xfile, yfile):
        x.transpose()[0,:2] = [map(float, xline.split(" ")[1:3])]
        y = float(yline)
        # minus sign in the exponent, as in the question's code
        hypoth = 1 / (1 + math.exp(-(theta.transpose() * x)))
        grad += (y - hypoth) * x
        hess -= hypoth * (1 - hypoth) * x * x.transpose()
    # Newton step: subtract, as in the question's code
    theta -= inv(hess) * grad
print "done"
print theta
the matrixes kept rounding to integers until I initialized one value
to 0.0. Is there a better way?
At the top of your code:
from __future__ import division
In Python 2, dividing two integers with / returns an integer; you only get a float if at least one operand is a floating point number. In Python 3 (and in Python 2 with the __future__ import above), / always performs true division, which is closer to what we humans might expect.
If you still want integer (floor) division after importing from __future__, use a double slash, //. That is:
from __future__ import division
print 1//2 # prints 0
print 5//2 # prints 2
print 1/2 # prints 0.5
print 5/2 # prints 2.5
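For the matrix part of the question specifically, the rounding most likely comes from the element type rather than from the division operator: a NumPy array or matrix built only from integer literals gets an integer dtype, and values assigned into it are then truncated. A minimal sketch, assuming plain NumPy arrays instead of the matrix class (variable names follow the question), that requests floats explicitly:

```python
import numpy as np

# Built from integer literals only: integer dtype, assignments get truncated.
ints = np.array([0, 0, 1])
ints[0] = 2.7
print(ints.dtype.kind, ints[0])  # i 2

# Ask for floats explicitly, or use np.zeros, which defaults to float64.
x = np.array([0, 0, 1], dtype=float)
grad = np.zeros((3, 1))
hess = np.zeros((3, 3))
x[0] = 2.7
print(x.dtype, x[0])  # float64 2.7
```

With the dtype fixed at creation time there is no need to write one element as 0.0 just to force a float matrix.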
You could make use of the with statement.
The code that reads the files into lists could be drastically simpler:
for line in open("q1x.dat", "r"):
    x = map(float,line.split(" ")[1:])
y = map(float, open("q1y.dat", "r").readlines())
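As a sketch of the with-statement suggestion in Python 3, the two files can be opened together, read in lockstep with zip, and closed automatically. The function name load_data is hypothetical, and the demonstration files and their contents are made up; the real q1x.dat and q1y.dat are assumed to hold whitespace-separated numbers, one sample per line.

```python
import os
import tempfile


def load_data(x_path, y_path):
    """Read paired feature rows and labels from two whitespace-separated files."""
    with open(x_path) as xfile, open(y_path) as yfile:
        # Both files are closed automatically when the block exits.
        return [
            ([float(v) for v in xline.split()], float(yline))
            for xline, yline in zip(xfile, yfile)
        ]


# Tiny demonstration with throwaway files (contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    x_path = os.path.join(tmp, "x.dat")
    y_path = os.path.join(tmp, "y.dat")
    with open(x_path, "w") as f:
        f.write("  1.5  2.0\n  3.0  4.5\n")
    with open(y_path, "w") as f:
        f.write("  0.0\n  1.0\n")
    data = load_data(x_path, y_path)

print(data)  # [([1.5, 2.0], 0.0), ([3.0, 4.5], 1.0)]
```

Calling split() with no argument splits on any run of whitespace, so leading spaces do not produce empty strings the way split(" ") does.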
