Is there any way to optimise this function? - python

I was told to make a program that solves a simple python exercise which should run under 12000ms. I managed to get a working piece of code. However it only works for small numbers passed into the n parameter.
def function(n):
    res = [(a, b) for a in range(1, n+1) for b in range(1, n+1) if a*b == sum([i for i in range(1, n+1) if i!=a and i!=b])]
    return res
Is there any way to optimise the code so that it runs under 12000ms for large numbers of n (e.g. n=100000)?
Exercise:
A friend of mine takes the sequence of all numbers from 1 to n (where n > 0).
Within that sequence, he chooses two numbers, a and b.
He says that the product of a and b should be equal to the sum of all numbers in the sequence, excluding a and b.
Given a number n, could you tell me the numbers he excluded from the sequence?
The function takes the parameter: n (n is always strictly greater than 0) and returns an array or a string (depending on the language) of the form:
[(a, b), ...] with all (a, b) which are the possible removed numbers in the sequence 1 to n.
[(a, b), ...] will be sorted in increasing order of the "a".
It happens that there are several possible (a, b). The function returns an empty array (or an empty string) if no possible numbers are found which will prove that my friend has not told the truth! (Go: in this case return nil).
E.g. function(26) should return [(15, 21), (21, 15)]

sum([i for i in range(1, n+1) if i!=a and i!=b])
is pretty easily optimized out. Just put:
basesum = sum(range(1, n+1))
outside the listcomp, then change the test to:
if a*b == basesum - sum({a, b}) # Accounts for possibility of a == b by deduping
or if a==b is not supposed to be allowed, the even simpler:
if a*b == basesum - a - b
That instantly reduces the per element work from O(n) to O(1), which should cut overall work from O(n**3) to O(n**2).
There's other optimizations available, but that's an easy one with a huge impact on big-O runtime.
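A minimal sketch of the hoisted-sum version described above, assuming a == b is not allowed. The function name removed_pairs_quadratic is made up for illustration. It is still O(n**2), so it shows the change rather than meeting the time limit for n = 100000.

```python
def removed_pairs_quadratic(n):
    # Sum of 1..n, computed once instead of once per (a, b) pair.
    basesum = sum(range(1, n + 1))
    return [
        (a, b)
        for a in range(1, n + 1)
        for b in range(1, n + 1)
        if a != b and a * b == basesum - a - b
    ]

print(removed_pairs_quadratic(26))  # [(15, 21), (21, 15)]
```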
If I'm reading the prompt correctly, your a and b are order-insensitive. So if your results can just show (a, b) and not (b, a) as well, you can replace:
for a in range(1, n+1) for b in range(1, n+1)
with:
for a, b in itertools.combinations(range(1, n+1), 2)
or if a == b is allowed:
for a, b in itertools.combinations_with_replacement(range(1, n+1), 2)
which halves the amount of work to do "for free" (and does more of it at the C layer instead of the Python bytecode layer, which often speeds things up a little more). If you must get the results in both orders, you can post-process to produce the reverse of each non-duplicated pair as well (or be a lazy programmer and use for a, b in itertools.permutations(range(1, n+1), 2) or for a, b in itertools.product(range(1, n+1), repeat=2) instead of combinations or combinations_with_replacement respectively, doing most/all of the work of your original nested loop, but shoving more to the C layer so the same theoretical work runs a little faster in practice).

This is more of a math problem than anything else:
Isolate b:
a*b = sum - (a+b)
(a+1)*b = sum - a
b = (sum - a)/(a+1)
Now you can substitute b where needed. With b out of the way, you don't have to iterate over the list for each element in it. You can iterate over the list just once, applying the equation for each element.
In fact, you don't even have to go through the whole list. Verifying its first sqrt(sum) elements is enough, as anything bigger than that has to be multiplied by another smaller than that number.
Here is the code:
import math
n = 26
valid = []
sum_n = (n+1)*n//2  # integer sum of 1..n
limit = int(math.sqrt(sum_n)-0.5)
for a in range(1, (limit+1)):
    if (sum_n-a) % (a+1) == 0:
        b = (sum_n-a) // (a+1)
        if b <= n:  # b must itself be part of the sequence 1..n
            valid.append((a, b))
if valid:
    if valid[-1][0] == valid[-1][1]:
        valid += [(x, y) for y, x in reversed(valid[:-1])]
    else:
        valid += [(x, y) for y, x in reversed(valid)]
print(valid)
And the output:
[(15, 21), (21, 15)]
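The isolation of b above can also be wrapped in a function that simply walks every a from 1 to n. This is a sketch under two assumptions: the name removed_pairs is hypothetical, and a == b is treated as not allowed. It skips the sqrt bound and the mirroring step, stays O(n), and checks that b actually lies in the sequence 1..n.

```python
def removed_pairs(n):
    total = n * (n + 1) // 2  # sum of 1..n, kept as an integer
    pairs = []
    for a in range(1, n + 1):
        # b = (total - a) / (a + 1) must be a whole number...
        b, remainder = divmod(total - a, a + 1)
        # ...and must be a different number from the sequence 1..n.
        if remainder == 0 and 1 <= b <= n and b != a:
            pairs.append((a, b))
    return pairs

print(removed_pairs(26))  # [(15, 21), (21, 15)]
```

Because a runs in increasing order, the result is already sorted by a, as the exercise asks.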

Related

Python Get Random Unique N Pairs

Say I have a range(1, n + 1). I want to get m unique pairs.
What I found is that if the number of pairs is close to n(n-1)/2 (the maximum number of pairs), one can't simply keep generating random pairs, because they start colliding with pairs already drawn. I'm looking for a somewhat lazy solution that will be very efficient (in Python's world).
My attempt so far:
def get_input(n, m):
    res = str(n) + "\n" + str(m) + "\n"
    buffet = range(1, n + 1)
    points = set()
    while len(points) < m:
        x, y = random.sample(buffet, 2)
        points.add((x, y)) if x > y else points.add((y, x)) # meeh
    for (x, y) in points:
        res += "%d %d\n" % (x, y)
    return res
You can use combinations to generate all pairs and use sample to choose randomly. Admittedly only lazy in the "not much to type" sense, and not in the use a generator not a list sense :-)
from itertools import combinations
from random import sample
n = 100
sample(list(combinations(range(1, n + 1), 2)), 5)
If you want to improve performance, you can make it lazy by studying this question:
Python random sample with a generator / iterable / iterator
The generator you want to sample from is combinations(range(1, n + 1), 2).
Here is an approach which works by taking a number in the range 0 to n*(n-1)/2 - 1 and decodes it to a unique pair of items in the range 0 to n-1. I used 0-based math for convenience, but you could of course add 1 to all of the returned pairs if you want:
import math
import random
def decode(i):
    k = math.floor((1+math.sqrt(1+8*i))/2)
    return k,i-k*(k-1)//2
def rand_pair(n):
    return decode(random.randrange(n*(n-1)//2))
def rand_pairs(n,m):
    return [decode(i) for i in random.sample(range(n*(n-1)//2),m)]
For example:
>>> rand_pairs(5,8)
[(2, 1), (3, 1), (4, 2), (2, 0), (3, 2), (4, 1), (1, 0), (4, 0)]
The math is hard to easily explain, but the k in the definition of decode is obtained by solving a quadratic equation which gives the number of triangular numbers which are <= i, and where i falls in the sequence of triangular numbers tells you how to decode a unique pair from it. The interesting thing about this decode is that it doesn't use n at all but implements a one-to-one correspondence from the set of natural numbers (starting at 0) to the set of all pairs of natural numbers.
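To make the one-to-one claim concrete, here is a small check. It is a sketch, not part of the answer: decode_exact is a hypothetical variant that swaps math.sqrt for math.isqrt (Python 3.8+) so that very large i cannot suffer from floating-point rounding. It confirms that the indices 0 to n*(n-1)/2 - 1 decode to every pair exactly once.

```python
import math
from itertools import combinations

def decode_exact(i):
    # Largest k with k*(k-1)//2 <= i, computed with integer arithmetic only.
    k = (1 + math.isqrt(1 + 8 * i)) // 2
    return k, i - k * (k - 1) // 2

n = 6
decoded = {decode_exact(i) for i in range(n * (n - 1) // 2)}
# Every pair (larger, smaller) drawn from 0..n-1.
expected = {(b, a) for a, b in combinations(range(n), 2)}
print(decoded == expected)  # True
```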
I don't think anything in your approach can be improved as it stands. After all, as m gets closer and closer to the limit n(n-1)/2, the chance of drawing a pair you have not seen yet gets smaller and smaller.
I would suggest splitting into two cases: if m is small, use your random approach. But if m is large enough, try
pairs = list(itertools.combinations(buffet, 2))
points = random.sample(pairs, m)
Now you have to determine the threshold of m that decides which code path to take. You need some math here to find the right trade-off.
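The two-case idea could be sketched as follows. The function name random_pairs and the default threshold of half of all possible pairs are assumptions made for illustration, not a tuned value. The right cut-off depends on n and on how expensive it is to build the full list.

```python
import random
from itertools import combinations

def random_pairs(n, m, threshold=0.5):
    total = n * (n - 1) // 2
    if m > total:
        raise ValueError("cannot draw more pairs than exist")
    if m <= threshold * total:
        # Sparse case: rejection sampling rarely hits a duplicate.
        points = set()
        while len(points) < m:
            x, y = random.sample(range(1, n + 1), 2)
            points.add((min(x, y), max(x, y)))
        return list(points)
    # Dense case: enumerate every pair once and sample without replacement.
    return random.sample(list(combinations(range(1, n + 1), 2)), m)

print(len(random_pairs(10, 45)))  # 45
```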

How can I fix this Pythagorean Triplet program?

import sys
def pythTrue(a,b,c):
    (A,B,C) = (a*a,b*b,c*c)
    if A + B == C or B + C == A or A + C == B:
        return True
def smallestTrip(a,b,c):
    if pythTrue(a,b,c) == True:
        if (a+b+c)%12 == 0:
            return True
        else:
            return False
def tuplePyth(n):
    list_=[]
    for x in range(1, n):
        for y in range(1, n):
            for z in range (1, n):
                if x+y+z<=n:
                    if smallestTrip(x, y, z)==False:
                        list_.append([x,y,z])
    print (list_)
tuplePyth(int(sys.argv[1]))
Pythagorean triplets are sets of 3 positive integers a, b, c
satisfying the relationship a^2 + b^2 =
c^2. The smallest and best-known Pythagorean triple is
(a, b, c) = (3, 4, 5). Write a program that reads a command line
argument n and prints to the screen all Pythagorean triplets whose sum
is less than n (i.e., a+b+c < n) and that are not multiple of the (3,
4, 5) triplet. Your program will represent triplets as 3-tuples, and
should consist of three functions:
a function that takes in a tuple
and returns a boolean indicating whether the Pythagorean relationship holds or not.
a function that takes in a tuple and returns
a boolean indicating whether a triplet is a multiple of the smallest
triplet or not.
a function that takes in an integer n and generates
the Pythagorean triplets as specified above. The function should
return a list of tuples.
The main portion of your program pythagore.py will read in the command
line input, call the last function described above, and print the
results one triplet per line.
My problem is that I am getting the same combination in different
orders for example: (5,12,13),(13,12,5)...etc
You're short on logic in your main routine. There is nothing to enforce that the triple comes in only one order: your x and y are interchangeable, and you guarantee that you'll check both.
Instead, force x < y with your loop limits, and then make sure you stop when the value of y or z gets too large to be viable. Note that this gets rid of your check for the sum of the three.
def tuplePyth(n):
    list_=[]
    for x in range(1, n):
        for y in range(1, n):
            for z in range (1, n):
                if x+y+z<=n:
                    if smallestTrip(x, y, z)==False:
                        list_.append([x,y,z])
    print (list_)
Instead:
def tuplePyth(n):
    list_=[]
    for x in range(1, n):
        for y in range(x + 1, (n - x) // 2):
            for z in range (y + 1, n - x - y):
                if smallestTrip(x, y, z)==False:
                    list_.append([x,y,z])
    print (list_)
Output with n=100:
[[5, 12, 13], [7, 24, 25], [8, 15, 17], [9, 40, 41], [15, 36, 39], [16, 30, 34], [20, 21, 29]]
Note that you still have a problem with smallestTrip: your check is not logically equivalent to "smallest triple". Instead, check that the three numbers are relatively prime. Since Stack Overflow allows only one question per posting, and the problem is readily researched on line, I'll leave that as an exercise for the student. :-)
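For reference, here is one literal reading of the exercise text ("not a multiple of the (3, 4, 5) triplet") as a sketch. The function name is_multiple_of_345 is hypothetical, and this is a direct proportionality test rather than the relatively-prime check suggested above. It shows why the sum % 12 test is not equivalent: (12, 35, 37) sums to 84, which is divisible by 12, yet it is not a multiple of (3, 4, 5).

```python
def is_multiple_of_345(triple):
    a, b, c = sorted(triple)
    # (a, b, c) is k * (3, 4, 5) exactly when a = 3k, b = 4k and c = 5k.
    k, remainder = divmod(a, 3)
    return remainder == 0 and k > 0 and (b, c) == (4 * k, 5 * k)

print(is_multiple_of_345((10, 8, 6)))     # True
print(is_multiple_of_345((12, 35, 37)))   # False
```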
An easy solution would be to keep track of the ones already found and add checks to avoid repeating them. The following uses a set to store the ones already produced and sorts the elements in each triple so that their order doesn't matter.
def tuplePyth(n):
    list_=[]
    seen = set()
    for x in range(1, n):
        for y in range(1, n):
            for z in range (1, n):
                key = tuple(sorted((x,y,z)))  # same key for every ordering
                if key not in seen:
                    if x+y+z <= n:
                        if smallestTrip(x, y, z) == False:
                            list_.append([x,y,z])
                            seen.add(key)
    print (list_)
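You can use itertools:
from itertools import combinations_with_replacement as cwr
# smallestTrip returns None for non-Pythagorean triples, so compare with False explicitly
list_ = [triple for triple in cwr(range(1, n), 3) if sum(triple) < n and smallestTrip(*triple) == False]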
You can also force the numbers to be in order with the limits. Also, you can simplify finding a,b, c by realizing that if we define a to be the smallest number, then it must be smaller than n/3 (b and c will both be at least as large as a, so if a were larger than n/3, then the sum of a, b, and c would be more than n). Similarly, b must be smaller than n/2. Once you've found all the combinations of a and b, you can find all the c that are larger than b and smaller than n-a-b.
list_=[]
for x in range(1, n//3):
    for y in range(x+1, n//2):
        for z in range (y+1, n-x-y):  # c is larger than b and smaller than n-a-b
            if smallestTrip(x, y, z) == False:
                list_.append([x,y,z])
Because the three numbers are never the same you can just change the second and the third range from (1,n) to (x+1,n) and (y+1,n) correspondingly.

How to create multiple iterations

I wanna know how do I create an iteration which would iterate several or more than one parameters with different ranges
For example I wanna instantiate several object with iterations that all have different ranges.
Like there is a triangle function which takes three parameters, how do I use iterations to give one parameter one value from a range e.g. 50 to 100 and another parameter a different one altogether.
I know how to instantiate it over one parameter by:
for i in range(100):
But what do I do to instantiate it if it requires more than one parameter for a function.
Looks like you will want to use nested for loops. For example for your three parameter function:
# these are just example ranges, replace with what's meaningful for your problem
range_for_parameter_0 = range(100)
range_for_parameter_1 = range(150)
range_for_parameter_2 = range(75)
# start a nested for loop
for i in range_for_parameter_0:
    for j in range_for_parameter_1:
        for k in range_for_parameter_2:
            # you can print something out to see exactly what's happening
            # feel free to comment out the print statement
            print('Calling triangle_function with parameters {},{},{}'.format(i,j,k))
            # evaluate your triangle_function which takes 3 parameters
            triangle_function(i,j,k)
You can try iterating over a Cartesian product.
Given
import itertools as it
def is_tri(a, b, c):
    """Return True if the sides make a triangle."""
    a, b, c = sorted([a, b, c])
    return (a + b) > c
ranges = range(1, 2), range(1, 3), range(1, 5)
Code
[sides for sides in it.product(*ranges) if is_tri(*sides)]
# [(1, 1, 1), (1, 2, 2)]
Details
If you are unfamiliar with list comprehensions, the latter is equivalent to the following code:
results = []
for x, y, z in it.product(*ranges):
    if is_tri(x, y, z):
        results.append((x, y, z))
results
# [(1, 1, 1), (1, 2, 2)]
Per your comment, is_tri() pre-sorts its arguments, so you can pass them in any order:
assert is_tri(13, 12, 5) == True
assert is_tri(12, 5, 13) == True
assert is_tri(5, 13, 12) == True
If your ranges are the same, you can simplify the input with the repeat parameter, e.g. it.product(range(1, 101), repeat=3).
You can't. You need three iterations.
for i in range(x):
    ...
    for j in range(y):
        ...
        for k in range(z):
            ...
See range() definition here

Project Euler getting smallest multiple in python

I am doing problem five in Project Euler: "2520 is the smallest number that can be divided by each of the numbers from 1 to 10 without any remainder.
What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?"
I have constructed the following code which finds the correct value 2520 when using 1 - 10 as divisors but code seems to be going on forever when using 1 - 20.
Again I don't want the code just a pointer or two on where I am going wrong.
Thanks
def smallestDiv(n):
    end=False
    while end == False:
        divisors = [x for x in range(1,21)] # get divisors
        allDivisions = zip(n % i for i in divisors) # get values for n % all integers in divisors
        check = all(item[0] == 0 for item in allDivisions ) # check if all values of n % i are equal to zero
        if check: # if all values are equal to zero return n
            end = True
            return n
        else: # else increase n by 1
            n +=1
EDIT:
I used some code I found relating to LCM and used reduce to solve the problem:
from functools import reduce  # needed on Python 3
def lcm(*values):
    values = [value for value in values]
    if values:
        n = max(values)
        m = n
        values.remove(n)
        while any( n % value for value in values ):
            n +=m
        return n
    return 0
print(reduce(lcm, range(1,21)))
If a problem is hard, try solving a simpler version. Here, that means working out how to calculate the lowest common multiple of two numbers. If you've read any number theory book (or think about prime factors), you can do that using the greatest common divisor function (as implemented by the Euclidean algorithm).
from math import gcd  # fractions.gcd was removed in Python 3.9
def lcm(a,b):
    "Calculate the lowest common multiple of two integers a and b"
    return a*b//gcd(a,b)
Observing lcm(a,b,c) ≡ lcm(lcm(a,b),c) it's simple to solve your problem with Python's reduce function
>>> from functools import reduce
>>> reduce(lcm, range(1,10+1))
2520
>>> reduce(lcm, range(1,20+1))
232792560
You are doing a brute-force search, so it can take arbitrarily long. You should read about the LCM (least common multiple) in order to code an efficient solution (the answer, I believe, is 232792560).
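#include <stdio.h>

/* Euclidean algorithm; long long so the running lcm is not truncated. */
long long gcd(long long m, long long n)
{
    long long t;
    while (n != 0)
    {
        t = n;
        n = m % n;
        m = t;
    }
    return m;
}

int main(void)
{
    int i, n;
    long long lcm = 1;
    printf("Enter the range:");
    scanf("%d", &n);
    for (i = 1; i <= n; i++)
    {
        lcm = (i * lcm) / gcd(i, lcm);
    }
    printf("smallest multiple : %lld\n", lcm);
    return 0;
}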
This will give you all the factors in the numbers from 1 to 20:
from collections import Counter
def prime_factors(x):
    def factor_this(x, factor):
        factors = []
        while x % factor == 0:
            x //= factor  # integer division keeps x an int on Python 3
            factors.append(factor)
        return x, factors
    x, factors = factor_this(x, 2)
    x, f = factor_this(x, 3)
    factors += f
    i = 5
    while i * i <= x:
        for j in (2, 4):
            x, f = factor_this(x, i)
            factors += f
            i += j
    if x > 1:
        factors.append(x)
    return factors
def factors_in_range(x):
    result = {}
    for i in range(2, x + 1):
        p = prime_factors(i)
        c = Counter(p)
        for k, v in c.items():
            n = result.get(k)
            if n is None or n < v:
                result[k] = v
    return result
print(factors_in_range(20))
If you multiply these numbers together, as many times as they occur in the result, you get the smallest number that divides all the numbers from 1 to 20.
import operator
from functools import reduce
def product(c):
    return reduce(operator.__mul__, [k ** v for k, v in c.items()], 1)
c = factors_in_range(20)
print(product(c))
I think the answer by Colonel Panic is brilliant but I just wanted to expand on it a little bit without editing the concise answer.
The original solution is:
from math import gcd  # fractions.gcd was removed in Python 3.9
def lcm(a,b):
    "Calculate the lowest common multiple of two integers a and b"
    return a*b//gcd(a,b)
>>> from functools import reduce
>>> reduce(lcm, range(1,10+1))
2520
>>> reduce(lcm, range(1,20+1))
232792560
I find it helpful to visualize what the reduce is doing for N = 10:
res = lcm(lcm(lcm(lcm(lcm(lcm(lcm(lcm(lcm(1, 2), 3), 4), 5), 6), 7), 8), 9), 10)
Which evaluates to:
# Evaluates lcm(1, 2)
res = lcm(lcm(lcm(lcm(lcm(lcm(lcm(lcm(lcm(1, 2), 3), 4), 5), 6), 7), 8), 9), 10)
# Evaluates lcm(2, 3)
res = lcm(lcm(lcm(lcm(lcm(lcm(lcm(lcm(2, 3), 4), 5), 6), 7), 8), 9), 10)
# Evaluates lcm(6, 4)
res = lcm(lcm(lcm(lcm(lcm(lcm(lcm(6, 4), 5), 6), 7), 8), 9), 10)
# Evaluates lcm(12, 5)
res = lcm(lcm(lcm(lcm(lcm(lcm(12, 5), 6), 7), 8), 9), 10)
# Evaluates lcm(60, 6)
res = lcm(lcm(lcm(lcm(lcm(60, 6), 7), 8), 9), 10)
# Evaluates lcm(60, 7)
res = lcm(lcm(lcm(lcm(60, 7), 8), 9), 10)
# Evaluates lcm(420, 8)
res = lcm(lcm(lcm(420, 8), 9), 10)
# Evaluates lcm(840, 9)
res = lcm(lcm(840, 9), 10)
# Evaluates lcm(2520, 10)
res = lcm(2520, 10)
print(res)
>>> 2520
The above gets across the intuition of what is happening. When we use reduce we "apply a rolling computation to sequential pairs of values in a list." It does this from the "inside-out" or from the left to the right in range(1, 20+1).
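The same rolling computation can be printed rather than written out by hand. This sketch uses itertools.accumulate, which yields every intermediate value that reduce passes along, and math.gcd in place of fractions.gcd.

```python
from functools import reduce
from itertools import accumulate
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

# Each entry is the lcm of 1..k for k = 1, 2, ..., 10.
steps = list(accumulate(range(1, 11), lcm))
print(steps)  # [1, 2, 6, 12, 60, 60, 420, 840, 2520, 2520]
print(reduce(lcm, range(1, 11)) == steps[-1])  # True
```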
I think it is really important here to point out that you, as a programmer, are NOT expected to intuit this answer as being obvious or readily apparent. It has taken a lot of smart people a long time to learn a great deal about prime numbers, greatest common factors, and least common multiples, etc. However, as a software engineer you ARE expected to know the basics about number theory, gcd, lcm, prime numbers, and how to solve problems with these in your toolkit. Again, you are not expected to re-invent the wheel or re-discover things from number theory each time you solve a problem, but as you go about your business you should be adding tools to your problem solving toolkit.
from itertools import count
def smallestDiv(n):
    divisors = [x for x in range(1,(n+1))] # get divisors
    for i in count(2520, n): # candidates 2520, 2520+n, 2520+2n, ...
        if(all(i%x == 0 for x in divisors)):
            return i
print (smallestDiv(20))
Takes approximately 5 seconds on my 1.7 GHz i7
I based it on the C# code here:
http://www.mathblog.dk/project-euler-problem-5/
facList=[2]
prod=1
for i in range(3,1000):
    n=i
    for j in facList:
        if n % j == 0:
            n//=j
    facList.append(n)
for k in facList:
    prod*=k
print(prod)
I tried this method and compared my time to Colonel Panic's answer and mine started significantly beating his at about n=200 instead of n=20. His is much more elegant in my opinion, but for some reason mine is faster. Maybe someone with better understanding of algorithm runtime can explain why.
The last function below finds the smallest number divisible by every integer from 1 to n. Since factorial(n) is always such a number, the search can stop there, so you need a function that calculates the factorial (math.factorial would also do).
def factoral(n):
    if n > 1:
        return n * factoral(n - 1)
    elif n >= 0:
        return 1
    else:
        return -1
def isMultiple(a, b):
    for i in range(1, b):
        if a % i != 0:
            return False
    return True
def EnkucukBul(n):
    for i in range(n, factoral(n) + 1, n):
        if isMultiple(i, n):
            return i
    return -1
If you can use the math module (Python 3.9 or later), you can use math.lcm:
import math
def smallestMul():
    return math.lcm(*range(1, 21))

Establishing highest score for a set of combinations

I'm programming in python.
I have the data of the following form:
(A, B, C, D, E, F, G, H, I)
Segments of this data are associated with a score, for example:
scores:
(A, B, C, D) = .99
(A, B, C, E) = .77
(A, B, E) = .66
(G,) = 1
(I,) = .03
(H, I) = .55
(I, H) = .15
(E, F, G) = .79
(B,) = .93
(A, C) = .46
(D,) = .23
(D, F, G) = .6
(F, G, H) = .34
(H,) = .09
(Y, Z) = 1
We can get a score for this data as follows:
A B C E + D F G + H I = .77 * .6 * .55 = 0.2541
Another possibility is:
A B C D + E F G + H + I = .99 * .79 * .09 * .03 = 0.00211167
So, the first combination gives the higher score.
I wish to write an algorithm that establishes the highest possible score for the data above. The members of the data should not be used more than once. In other words:
A B C E + E F G + D + H I
is not valid. How would you recommend I go about solving this?
Thanks,
Barry
EDIT:
I should clarify that (H, I) != (I, H) and that (I, H) is not a subsegment for ABCDEFGHI, but is a subsegment of ABIHJ.
Another thing I should mention is that scores is a very large set (millions) and the segment on which we're calculating the score has an average length of around 10. Furthermore, the way in which I calculate the score might change in the future. Maybe I'd like to add the subsegments and take an average instead of multiplying, who knows... For that reason it might be better to separate the code which calculates the possible combinations from the actual calculation of the score. At the moment, I'm inclined to think that itertools.combinations might offer a good starting point.
Brute-forcing, by using recursion (for each segment in order, we recursively find the best score using the segment, and the best score not using the segment. A score of 0 is assigned if there is no possible combination of segments for the remaining items):
segment_scores = (('A', 'B', 'C', 'D'), .99), (('A', 'B', 'C', 'E'), .77) #, ...
def best_score_for(items, segments, subtotal = 1.0):
    if not items: return subtotal
    if not segments: return 0.0
    segment, score = segments[0]
    best_without = best_score_for(items, segments[1:], subtotal)
    return max(
        best_score_for(items.difference(segment), segments[1:], subtotal * score),
        best_without
    ) if items.issuperset(segment) else best_without
best_score_for(set('ABCDEFGHI'), segment_scores) # .430155
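A self-contained variant of the same recursion, filled in with the full score table from the question, is sketched below. The names SEGMENT_SCORES and best_cover are hypothetical, and the function also returns which segments were chosen. Like the code above, it treats segments as unordered sets, so it does not enforce the question's point that (H, I) and (I, H) differ; that is a simplifying assumption.

```python
SEGMENT_SCORES = (
    (("A", "B", "C", "D"), 0.99), (("A", "B", "C", "E"), 0.77),
    (("A", "B", "E"), 0.66), (("G",), 1.0), (("I",), 0.03),
    (("H", "I"), 0.55), (("I", "H"), 0.15), (("E", "F", "G"), 0.79),
    (("B",), 0.93), (("A", "C"), 0.46), (("D",), 0.23),
    (("D", "F", "G"), 0.6), (("F", "G", "H"), 0.34), (("H",), 0.09),
    (("Y", "Z"), 1.0),
)

def best_cover(items, segments, subtotal=1.0, chosen=()):
    """Return (score, segments) for the best exact cover, or (0.0, ()) if none."""
    if not items:
        return subtotal, chosen
    if not segments:
        return 0.0, ()
    (segment, score), rest = segments[0], segments[1:]
    # Option 1: skip this segment.
    best = best_cover(items, rest, subtotal, chosen)
    # Option 2: use it, if all of its members are still uncovered.
    if items.issuperset(segment):
        used = best_cover(items - set(segment), rest,
                          subtotal * score, chosen + (segment,))
        best = max(best, used, key=lambda result: result[0])
    return best

score, chosen = best_cover(set("ABCDEFGHI"), SEGMENT_SCORES)
print(round(score, 6))  # 0.430155
print(sorted(chosen))   # [('A', 'B', 'C', 'D'), ('E', 'F', 'G'), ('H', 'I')]
```

Keeping the search separate from the scoring rule (here a running product) makes it easier to swap in a different rule later, which the question mentions as a possibility.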
This sounds like an NP-complete problem in disguise, a derivative of the Knapsack problem. This means you may have to walk through all possibilities to get an exact solution.
Even though... wait. Your values are between 0 and 1. That is, the result can only get smaller or at most stay equal. Therefore the solution is trivial: get the single group with the highest value, and be done with it. (I'm aware that's probably not what you want, but you might have to add another condition, e.g. all elements have to be used..?)
A beginning of a brute force approach:
import operator
from functools import reduce
segment_scores = {('A', 'B', 'C', 'D'): .99, ('A', 'B', 'C', 'E'): .77} #...
def isvalid(segments):
    """returns True if there are no duplicates
    for i in range(len(segments)-1):
        for element in segments[i]:
            for j in range(len(segments)-i-1):
                othersegment = segments[j+i+1]
                if element in othersegment:
                    return False
    return True
    better way:
    """
    flattened = [item for sublist in segments for item in sublist]
    # http://stackoverflow.com/questions/952914/making-a-flat-list-out-of-list-of-lists-in-python
    return len(set(flattened)) == len(flattened)
def getscore(segments):
    """
    p = 1.0
    for segment in segments:
        p *= segment_scores[segment]
    return p
    better way:
    """
    return reduce(operator.mul, [segment_scores[segment] for segment in segments])
Now, create all 2^(num segments) possible combinations of segments, check for each if it is valid, and if it is, compute the score while keeping the current winner and its highscore. Just a starting point...
OK just another update: There's lots of space for optimizations here, in particular since you're multiplying (I'm assuming now you have to use each element).
Since your total score never increases, you can drop any exploration path [segment0, segment1] that drops below the current high score, because the score can only get worse with any segment2 you add.
If you don't just iterate over all possibilities but start by exploring all segment lists that contain the first segment (by recursively exploring all segment lists that contain in addition the second segment and so on), you can break as soon as, for example, the first and the second segment are invalid, i.e. no need to explore all possibilities of grouping (A,B,C,D) and (A,B,C,D,E)
Since multiplying hurts, trying to minimize the number of segments might be a suitable heuristic, so start with big segments with high scores.
First, I'd suggest assigning a unique symbol to the segments that make sense.
Then you probably want combinations of those symbols (or perhaps permutations, I'm sure you know your problem better than I do), along with a "legal_segment_combination" function you'd use to throw out bad possibilities - based on a matrix of which ones conflict and which don't.
>>> import itertools
>>> itertools.combinations([1,2,3,4], 2)
<itertools.combinations object at 0x7fbac9c709f0>
>>> list(itertools.combinations([1,2,3,4], 2))
[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
>>>
Then max the valid possibilities that make it past legal_segment_combination().
First, you could take the logarithm of each score, since then the problem is to maximize the sum of the scores instead of the product. Then, you can solve the problem as an Assignment Problem, where to each data point you assign one sequence.
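A small sketch of the logarithm step only (not of the Assignment Problem formulation): because log is monotonic, ranking candidate segmentations by the sum of log scores picks the same winner as ranking them by the product. The two candidates are the examples from the question, and log_score is a hypothetical helper name. It assumes every score is strictly positive.

```python
import math

# Two candidate segmentations from the question, as lists of segment scores.
candidate_1 = [0.77, 0.6, 0.55]         # ABCE + DFG + HI
candidate_2 = [0.99, 0.79, 0.09, 0.03]  # ABCD + EFG + H + I

def log_score(scores):
    # Sum of logs; requires every score to be strictly positive.
    return sum(math.log(s) for s in scores)

by_product = max([candidate_1, candidate_2], key=math.prod)
by_log_sum = max([candidate_1, candidate_2], key=log_score)
print(by_product == by_log_sum)  # True
```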
