Below is a loop to find the smallest common multiple of the numbers 1-20:
count = 1
while not all((count % 1 == 0, count % 2 == 0,
               count % 3 == 0, count % 4 == 0,
               ..., count % 20 == 0)):
    count += 1
print(count)
It's quite tedious to type out so many conditions, and this needs improvement, especially if the bound is bigger than 20. However, being new to Python, my knee-jerk reaction was:
while not all(count % range(1,21)==0):
...which doesn't work because python can't read minds. I've thought about putting a list inside the all(), but I'm not sure how to generate a list with variables in it.
Is there a shorthand to input a pattern of conditions like these, or is there a smarter way to do this that I'm missing?
Generator expressions:
while not all(count % i == 0 for i in range(1,21)):
Incidentally, this is pretty easy to work out by hand if you factorize the numbers 1..20 into prime factors. The answer is on the order of 200 million, so the counting loop will take a while.
Use a generator expression:
while not all(count % x == 0 for x in range(1,21)):
You could also use any here:
while any(count % x for x in range(1,21)):
since 0 evaluates to False in Python.
A better solution to your current problem is to use a useful property of the least common multiple function (assuming you implemented it correctly):
lcm(a, b, c) == lcm(lcm(a, b), c)
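For example, a minimal sketch of such an lcm (this implementation is my own, using math.gcd, not necessarily the answerer's):

```python
from functools import reduce
from math import gcd

def lcm(a, b):
    # lcm via the identity lcm(a, b) * gcd(a, b) == a * b
    return a * b // gcd(a, b)

# folding lcm over 1..20 gives the smallest number divisible by all of them
print(reduce(lcm, range(1, 21)))  # 232792560
```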
It runs pretty quickly, even for fairly large inputs (the least common multiple of the first 20,000 numbers has 8,676 digits):
>>> %timeit reduce(lcm, range(1, 20001))
1 loops, best of 3: 240 ms per loop
I'm trying to understand invariants in programming via real examples written in Python. I'm confused about where to place assert statements to check for invariants.
My research has shown different patterns for where to check for invariants. For examples:
before the loop starts
before each iteration of the loop
after the loop terminates
vs
... // the Loop Invariant must be true here
while ( TEST CONDITION ) {
// top of the loop
...
// bottom of the loop
// the Loop Invariant must be true here
}
// Termination + Loop Invariant = Goal
Below I have put code for an invariant example from a maths book. There are two versions, one using a function and one not. I expect it makes no difference, but I want to be thorough.
My questions are:
what is the minimum number of assert statements I need to assure program correctness, in keeping with the invariant?
which of the assert statements in my examples are redundant?
If there are multiple answers to the above question, which would be considered best practice?
Ideally I'd like to see a rewriting of my code to include best practices and attention to any issues I may have overlooked in my work so far.
Any input much appreciated.
Here's the exercise:
E2. Suppose the positive integer n is odd. First Al writes the numbers 1, 2,..., 2n on the blackboard. Then he picks any two numbers a, b, erases them, and writes, instead, |a − b|. Prove that an odd number will remain at the end.
Solution. Suppose S is the sum of all the numbers still on the blackboard. Initially this sum is S = 1+2+···+2n = n(2n+1), an odd number. Each step reduces S by 2 min(a, b), which is an even number. So the parity of S is an invariant. During the whole reduction process we have S ≡ 1 mod 2. Initially the parity is odd. So, it will also be odd at the end.
import random

def invariant_example(n):
    xs = [x for x in range(1, 2*n+1)]
    print(xs)
    assert sum(xs) % 2 == 1
    while len(xs) >= 2:
        assert sum(xs) % 2 == 1
        a, b = random.sample(xs, 2)
        print(f"a: {a}, b: {b}, xs: {xs}")
        xs.remove(a)
        xs.remove(b)
        xs.append(abs(a - b))
        assert sum(xs) % 2 == 1
    assert sum(xs) % 2 == 1
    return xs

print(invariant_example(5))
n = 5
xs = [x for x in range(1, 2*n+1)]
print(xs)
assert sum(xs) % 2 == 1
while len(xs) >= 2:
    assert sum(xs) % 2 == 1
    a, b = random.sample(xs, 2)
    print(f"a: {a}, b: {b}, xs: {xs}")
    xs.remove(a)
    xs.remove(b)
    xs.append(abs(a - b))
    assert sum(xs) % 2 == 1
assert sum(xs) % 2 == 1
print(xs)
The only technically redundant assert statement you have is either of the ones in the loop. As in, you don't really need both of them.
For example:
If you have both of them, the first assert in the while loop will execute immediately after the second (as the code will return to the top of the loop). No values change in between those calls, so the first assert statement will always have the same result as the second.
Best practice would probably be to keep the assert at the top of the loop, to prevent code within the loop from executing if the loop invariant is violated.
EDIT: The final assert statement should also include the loop exit condition, as Kelly Bundy noted. I forgot to mention this above.
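Putting all of that together, a minimal rewrite might look like this (my own sketch, with the asserts placed as discussed above; note n must be odd for the invariant to hold):

```python
import random

def invariant_example(n):
    # n must be odd: the initial sum n*(2n+1) is then odd
    xs = list(range(1, 2 * n + 1))
    assert sum(xs) % 2 == 1              # invariant established before the loop
    while len(xs) >= 2:
        assert sum(xs) % 2 == 1          # invariant at the top of each iteration
        a, b = random.sample(xs, 2)
        xs.remove(a)
        xs.remove(b)
        xs.append(abs(a - b))
    # termination (one element left) + invariant (odd sum) = goal
    assert len(xs) == 1 and xs[0] % 2 == 1
    return xs
```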
Could someone please explain why the following code has so much additional overhead? At 100k iterations, the speed is the same for either case (2.2 sec). When increasing to 1E6 iterations, case "A" never finishes, while case "B" takes only 29 seconds.
Case "A"

while n is not 1:
    foo

Case "B"

while n > 1:
    foo
Complete code, if it's of any help:

def coll(n):
    count = 0
    # while n is not 1:
    while n > 1:
        count += 1
        if not n % 2:
            n /= 2
        else:
            n = 3*n + 1
    return count

for x in range(1, 100000):
    count = coll(x)
First of all, you should use n > 1 or n != 1, not n is not 1. The fact that the latter works is an implementation detail, and obviously it's not working for you.
The reason it's not working is because there are values of x in your code that cause the Collatz sequence to go over the value of sys.maxint, which turns n into a long. Then, even when it ends up going back down to 1, it's actually 1L; a long, not an int.
Try using while n is not 1 and repr(n) != '1L':, and it'll work as you expect. But don't do that; just use n > 1 or n != 1.
Generally in Python, is is extremely fast to check, as it uses referential equality, so all that needs to be checked is that two objects have the same memory location. Note that the only reason your code works is that most implementations of Python generally maintain a pool of the smaller integers, so that every 1 always refers to the same 1, for example, where there may be multiple objects representing 1000. In those cases, n is 1000 would fail, where n == 1000 would work. Your code is relying on this integer pool, which is risky.
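You can see the pool in action in CPython (this is an implementation detail; the exact cache range may vary between implementations):

```python
a, b = 256, int("256")
print(a is b)   # True in CPython: ints in -5..256 are cached, so both names
                # refer to the same object

c, d = 1000, int("1000")
print(c is d)   # False in CPython: 1000 is outside the cache, so int("1000")
                # builds a brand-new object

print(c == d)   # True: == compares values, which is what you almost always want
```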
> involves a function call, which is fairly slow in Python: n > 1 translates to n.__gt__(1).
I am writing a simple Python script that generates 6 numbers at random (from 1 to 100) and a larger number (from 100 to 1000). My goals for this script are to:
Calculate all of the possible combinations using at least 2 numbers and any of the simple math operations (adding, subtracting, multiplying and dividing)
Output all of the combinations whose total is within 10 above or below the larger number as 'matches'
The list of numbers need not be exhausted, but repeating a number isn't allowed. I don't care too much whether the code is efficient (I can post mine so far if anyone needs it; answers preferably in Python); as long as it works, I'm happy to optimize it later.
I have attempted this myself, only to fail as the program quickly ended with a RuntimeError. I also tried putting in a counter to stop the loop after x passes (where x is a small number such as 50), but that just makes matters worse, as it keeps on going infinitely.
I've also done some research, and I found that this (Computing target number from numbers in a set - the second to last answer) is the closest I found to meet my requirements but hasn't got quite there yet.
Thanks for the help! :-)
EDIT: Here is my code:
import random, time, operator

i = 0
numlist = []
while i != 6:
    number = random.randint(1, 100)
    numlist.append(number)
    i += 1

largenumber = random.randint(100, 1000)
print(numlist)
print(largenumber)

def operationTesting():
    a, c, m, total = 0, 0, 0, 0
    totalnums = 0
    operators = ['+', '-', '*', '/']
    while total != largenumber:
        for a in numlist[m]:
            for c in numlist[m+1]:
                print(a)
                print(c)
                if a == c:
                    operationTesting()
                else:
                    b = random.choice(operators)
                    if b == '+':
                        summednums = operator.add(int(a), int(c))
                        print(summednums)
                        totalnums = totalnums + summednums
                    elif b == '-':
                        summednums = operator.sub(int(a), int(c))
                        print(summednums)
                        totalnums = totalnums + summednums
                    elif b == '*':
                        summednums = operator.mul(int(a), int(c))
                        print(summednums)
                        totalnums = totalnums + summednums
                    elif b == '/':
                        summednums = operator.floordiv(int(a), int(c))
                        print(summednums)
                        totalnums = totalnums + summednums
                    print(totalnums)
                    SystemExit(None)

operationTesting()
A very neat way to do it is using Reverse Polish Notation or Postfix notation. This notation avoids the need for brackets that you would probably want if you were doing it using conventional arithmetic with operator precedence etc.
You can do this with brute force if you are not too bothered about time efficiency. You need to consider what you want to do with division too - if two numbers do not divide exactly, do you want to return the result as 'invalid' in some way (I guess so), or really return a floored division? Note the latter might give you some invalid answers...
Consider the test case of numlist = [1,2,3,4,5,6]. In RPN, we could do something like this
RPN            Equivalent to
123456+++++    (1+(2+(3+(4+(5+6)))))
123456++++-    (1-(2+(3+(4+(5+6)))))
123456+++-+    (1+(2-(3+(4+(5+6)))))
...
12345+6+-++    (1+(2+(3-((4+5)+6))))
12345+6-+++    (1+(2+(3+((4+5)-6))))
...
And so on. You can probably see that with sufficient combinations, you can get any combination of numbers, operators and brackets. The brackets are important - taking just 3 numbers, obviously
1+2*6
is normally interpreted
(1 + (2*6)) == 13
and is quite different to
((1+2)*6) == 18
In RPN, these would be 126*+ and 12+6* respectively.
So, you've got to generate all your combinations in RPN, then develop an RPN calculator to evaluate them.
Unfortunately, there are quite a lot of permutations with 6 numbers (or any subset thereof). First, the numbers can be in any order: that's 6! == 720 orderings. You will always need n-1 == 5 operators, and each can be any one of the 4 operators: that's 4**5 == 1024 choices. Finally, those 5 operators can sit in any one of 5 positions (after the first pair of numbers, after the first 3, and so on), with at most 1 operator in the first position, 2 in the second, etc.: that's 5! == 120 arrangements. So in total you have 720*1024*120 == 88473600 permutations, roughly 9 * 10**7. Not beyond the realms of computation at all, but it might take 5 minutes or so to generate them all on a fairly quick computer.
You could significantly improve on this by "chopping" the search tree
Loads of the RPN combinations will be arithmetically identical (e.g. 123456+++++ == 12345+6++++ == 1234+5+6+++ etc) - you could use some prior knowledge to improve generate_RPN_combinations so it didn't generate them
identifying intermediate results that show certain combinations could never satisfy your criterion and not exploring any further combinations down that road.
You then have to send each string to the RPN calculator. These are fairly easy to code and a typical programming exercise - you push values onto a stack and when you come to operators, pop the top two members from the stack, apply the operator and push the result onto the stack. If you don't want to implement that - google minimal python rpn calculator and there are resources there to help you.
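For illustration, here is one minimal RPN evaluator along those lines (my own sketch; it treats division by zero and inexact division as invalid, per the discussion above):

```python
def eval_rpn(tokens):
    """Evaluate a postfix token list; returns None for invalid expressions."""
    stack = []
    for tok in tokens:
        if tok in ('+', '-', '*', '/'):
            # pop the top two operands, apply the operator, push the result
            b = stack.pop()
            a = stack.pop()
            if tok == '+':
                stack.append(a + b)
            elif tok == '-':
                stack.append(a - b)
            elif tok == '*':
                stack.append(a * b)
            else:
                # reject division by zero and inexact division
                if b == 0 or a % b != 0:
                    return None
                stack.append(a // b)
        else:
            stack.append(tok)
    return stack[0] if len(stack) == 1 else None

print(eval_rpn([1, 2, 6, '*', '+']))  # 13, i.e. (1 + (2*6))
print(eval_rpn([1, 2, '+', 6, '*']))  # 18, i.e. ((1+2)*6)
```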
Note, you say you don't have to use all 6 numbers. Rather than implementing that separately, I would suggest checking any intermediate results when evaluating the combinations for all 6 numbers, if they satisfy the criterion, keep them too.
I expected this Python implementation of ThreeSum to be slow:
def count(a):
    """ThreeSum: Given N distinct integers, how many triples sum to exactly zero?"""
    N = len(a)
    cnt = 0
    for i in range(N):
        for j in range(i+1, N):
            for k in range(j+1, N):
                if sum([a[i], a[j], a[k]]) == 0:
                    cnt += 1
    return cnt
But I was shocked that this version looks pretty slow too:
import itertools

def count_python(a):
    """ThreeSum using itertools"""
    return sum(map(lambda X: sum(X) == 0, itertools.combinations(a, r=3)))
Can anyone recommend a faster Python implementation? Both implementations just seem so slow...
Thanks
ANSWER SUMMARY:
Here is how all the various O(N^3) versions provided in this thread (kept for educational purposes, not for real use) ran on my machine:
56 sec RUNNING count_slow...
28 sec RUNNING count_itertools, written by Ashwini Chaudhary...
14 sec RUNNING count_fixed, written by roippi...
11 sec RUNNING count_itertools (faster), written by Veedrak...
08 sec RUNNING count_enumerate, written by roippi...
*Note: Needed to modify Veedrak's solution to this to get the correct count output:
sum(1 for x, y, z in itertools.combinations(a, r=3) if x+y==-z)
Supplying a second answer. From various comments, it looks like you're primarily concerned about why this particular O(n**3) algorithm is slow when being ported over from java. Let's dive in.
def count(a):
    """ThreeSum: Given N distinct integers, how many triples sum to exactly zero?"""
    N = len(a)
    cnt = 0
    for i in range(N):
        for j in range(i+1, N):
            for k in range(j+1, N):
                if sum([a[i], a[j], a[k]]) == 0:
                    cnt += 1
    return cnt
One major problem that immediately pops out is that you're doing something your java code almost certainly isn't doing: materializing a 3-element list just to add three numbers together!
if sum([a[i], a[j], a[k]]) == 0:
Yuck! Just write that as
if a[i] + a[j] + a[k] == 0:
Some benchmarking shows that you're adding 50%+ overhead just by doing that. Yikes.
The other issue here is that you're using indexing where you should be using iteration. In Python, try to avoid writing code like this:

for i in range(len(some_list)):
    do_something(some_list[i])
And instead just write:
for x in some_list:
    do_something(x)
And if you explicitly need the index that you're on (as you actually do in your code), use enumerate:
for i, x in enumerate(some_list):
    # etc
This is, in general, a style thing (though it goes deeper than that, with duck typing and the iterator protocol) - but it is also a performance thing. In order to look up the value of a[i], that call is converted to a.__getitem__(i), then python has to dynamically resolve a __getitem__ method lookup, call it, and return the value. Every time. It's not a crazy amount of overhead - at least on builtin types - but it adds up if you're doing it a lot in a loop. Treating a as an iterable, on the other hand, sidesteps a lot of that overhead.
So taking that change in mind, you can rewrite your function once again:
def count_enumerate(a):
    cnt = 0
    for i, x in enumerate(a):
        for j, y in enumerate(a[i+1:], i+1):
            for z in a[j+1:]:
                if x + y + z == 0:
                    cnt += 1
    return cnt
Let's look at some timings:
%timeit count(range(-100,100))
1 loops, best of 3: 394 ms per loop
%timeit count_fixed(range(-100,100)) #just fixing your sum() line
10 loops, best of 3: 158 ms per loop
%timeit count_enumerate(range(-100,100))
10 loops, best of 3: 88.9 ms per loop
And that's about as fast as it's going to go. You can shave off a percent or so by wrapping everything in a comprehension instead of doing cnt += 1 but that's pretty minor.
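For completeness, such a comprehension-based variant might look like this (my own transcription of count_enumerate into a single generator expression):

```python
def count_comprehension(a):
    # same triple loop as count_enumerate, folded into one generator expression
    return sum(1
               for i, x in enumerate(a)
               for j, y in enumerate(a[i+1:], i+1)
               for z in a[j+1:]
               if x + y + z == 0)
```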
I've toyed around with a few itertools implementations but I actually can't get them to go faster than this explicit loop version. This makes sense if you think about it - for every iteration, the itertools.combinations version has to rebind what all three variables refer to, whereas the explicit loops get to "cheat" and rebind the variables in the outer loops far less often.
Reality check time, though: after everything is said and done, you can still expect cPython to run this algorithm an order of magnitude slower than a modern JVM would. There is simply too much abstraction built in to python that gets in the way of looping quickly. If you care about speed (and you can't fix your algorithm - see my other answer), either use something like numpy to spend all of your time looping in C, or use a different implementation of python.
postscript: pypy
For fun, I ran count_fixed on a 1000-element list, on both cPython and pypy.
cPython:
In [81]: timeit.timeit('count_fixed(range(-500,500))', setup='from __main__ import count_fixed', number = 1)
Out[81]: 19.230753898620605
pypy:
>>>> timeit.timeit('count_fixed(range(-500,500))', setup='from __main__ import count_fixed', number = 1)
0.6961538791656494
Speedy!
I might add some java testing in later to compare :-)
Algorithmically, both versions of your function are O(n**3) - so asymptotically neither is superior. You will find that the itertools version is in practice somewhat faster since it spends more time looping in C rather than in python bytecode. You can get it down a few more percentage points by removing map entirely (especially if you're running py2) but it's still going to be "slow" compared to whatever times you got from running it in a JVM.
Note that there are plenty of python implementations other than cPython out there - for loopy code, pypy tends to be much faster than cPython. So I wouldn't write python-as-a-language off as being slow, necessarily, but I would certainly say that the reference implementation of python is not known for its blazing loop speed. Give other python flavors a shot if that's something you care about.
Specific to your algorithm, an optimization will let you drop it down to O(n**2). Build up a set of your integers, s, and build up all pairs (a,b). You know that you can "zero out" (a+b) if and only if -(a+b) in (s - {a,b}).
Thanks to @Veedrak: unfortunately, constructing s - {a,b} is a slow O(len(s)) operation itself, so simply check whether -(a+b) is equal to either a or b. If it is, you know there's no third c that can fulfill a+b+c == 0, since all numbers in your input are distinct.
def count_python_faster(a):
    s = frozenset(a)
    return sum(1 for x, y in itertools.combinations(a, 2)
               if -(x+y) not in (x, y) and -(x+y) in s) // 3
Note the divide-by-three at the end; this is because each successful combination is triple-counted. It's possible to avoid that but it doesn't actually speed things up and (imo) just complicates the code.
Some timings for the curious:
%timeit count(range(-100,100))
1 loops, best of 3: 407 ms per loop
%timeit count_python(range(-100,100)) #this is about 100ms faster on py3
1 loops, best of 3: 382 ms per loop
%timeit count_python_faster(range(-100,100))
100 loops, best of 3: 5.37 ms per loop
You haven't stated which version of Python you're using.
In Python 3.x, a generator expression is around 10% faster than either of the two implementations you listed. Using a random array of 100 numbers in the range [-100,100] for a:
count(a) -> 8.94 ms # as per your implementation
count_python(a) -> 8.75 ms # as per your implementation
def count_generator(a):
    return sum(sum(x) == 0 for x in itertools.combinations(a, r=3))
count_generator(a) -> 7.63 ms
But other than that, it's the sheer number of combinations that's dominating execution time - O(N^3).
I should add the times shown above are for loops of 10 calls each, averaged over 10 loops. And yeah, my laptop is slow too :)
For any N, let f(N) be the last five digits before the trailing zeroes in N!. For example:

9!  = 362880, so f(9)  = 36288
10! = 3628800, so f(10) = 36288
20! = 2432902008176640000, so f(20) = 17664

Find f(1,000,000,000,000).
I've successfully tackled this question for the given examples, my function can correctly find f(9), f(10), etc. However it struggles with larger numbers, especially the number the problem asks for - f(10^12).
My current optimizations are as follows: I remove trailing zeros from the multiplier and the sum, and shorten the sum to 5 digits after each multiplication. The code in python is as follows:
import re

def SFTR(n):
    sum, a = 1, 2
    while a < n+1:
        mul = int(re.sub("0+$", "", str(a)))
        sum *= mul
        sum = int(re.sub("0+$", "", str(sum))[-5:])
        a += 1
    return sum
Can anyone tell me why this function scales so badly and why it's taking so long? Also, could anyone point me in the right direction to optimize my algorithm (the name of the general topic will suffice)? Thank you.
Update:
I have made some changes for optimization and it is significantly faster, but it is still not fast enough for f(10^12). Can anyone tell me what's making my code slow, or how to make it faster?
def SFTR(n):
    sum, a = 1, 2
    while a < n+1:
        mul = a
        while mul % 10 == 0:
            mul //= 10
        mul = mul % 100000
        sum *= mul
        while sum % 10 == 0:
            sum //= 10
        sum = sum % 100000
        a += 1
    return sum
mul can get very big. Is that necessary? If I asked you to compute the last 5 non-zero digits of 1278348572934847283948561278387487189900038 * 38758
by hand, exactly how many digits of the first number do you actually need to know?
Building strings frequently is expensive. I'd rather use the modulo operator when truncating to the last five digits.
python -m timeit 'x = str(111111111111111111111111111111111)[-5:]'
1000000 loops, best of 3: 1.09 usec per loop
python -m timeit 'x = 111111111111111111111111111111111 % 100000'
1000000 loops, best of 3: 0.277 usec per loop
The same applies to stripping the trailing zeros. There should be a more efficient way to do this, and you probably don't have to do it in every single step.
I didn't check your algorithm for correctness, though, it's just a hint for optimization.
In fact, you might even note that there are only a restricted set of possible trailing non-zero digits. If I recall correctly, there are only a few thousand possible trailing non-zero digit combinations, when you look only at the last 5 digits. For example, is it possible for the final non-zero digit ever to be odd? (Ignore the special cases of 0! and 1! here.)
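As a quick numerical check of that last question (my own sketch; the 10**6 digit headroom is a heuristic, not a proven bound):

```python
def last_nonzero_digit_of_factorial(n):
    # multiply up n!, stripping trailing zeros and keeping only the low
    # digits as we go; 10**6 is heuristic headroom, not a rigorous bound
    f = 1
    for k in range(2, n + 1):
        f *= k
        while f % 10 == 0:
            f //= 10
        f %= 10**6
    return f % 10

# for n >= 2 the last non-zero digit of n! appears to always be even,
# since n! accumulates far more factors of 2 than factors of 5
print([last_nonzero_digit_of_factorial(n) for n in range(2, 12)])
```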