A trainee of mine wrote this short Python script for a Project Euler problem.
We are using Python 3.9.4.
Problem: The series, 1^1 + 2^2 + 3^3 + ... + 10^10 = 10405071317. Find the last ten digits of the series, 1^1 + 2^2 + 3^3 + ... + 1000^1000.
It isn't the best solution, but in theory it should work for small values. (I know it won't work for bigger values: it is too inefficient, and even a double is too small.)
Here is her code:
import math
import numpy as np

x = np.arange(1, 11)
a = []
for y in x:
    z = y**y
    a.append(z)
b = sum(a)
print(a)
Output:
[1, 4, 27, 256, 3125, 46656, 823543, 16777216, 387420489, 1410065408]
The script obviously isn't finished yet, but you can see that every power 1^1, 2^2, 3^3, ... is correct up to 10^10, which does not return the correct value (it should be 10000000000).
Do you see any reason for this problem? It seems quite odd to me, and I could not figure it out.
Yes, I know this isn't the best solution and that the actual solution will end up being different, but it would still be nice to solve this mystery.
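You're probably on a system where numpy defaults to 32-bit integers, such as a 32-bit system (or Windows, where numpy releases before 2.0 default to 32-bit integers even on 64-bit machines). This caused the result to be "truncated" to 32 bits. You can verify this with the following expression, which evaluates to 1410065408, the last value in your output: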
(10 ** 10) % (2 ** 32)
Use Python's built-in int and range unless you need the extra features numpy provides. The built-in int is an arbitrary-precision integer implementation, so it works for all kinds of integer calculations.
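To see where 1410065408 comes from without numpy, here is a minimal pure-Python sketch of signed 32-bit wraparound. The helper name wrap_int32 is made up for this illustration (it is not a numpy function), and it assumes the usual two's-complement behaviour of fixed-width integers.

```python
def wrap_int32(n):
    """Reduce an arbitrary Python int to the value a signed 32-bit integer would hold."""
    n &= 0xFFFFFFFF  # keep only the low 32 bits
    return n - 2**32 if n >= 2**31 else n


# Recompute the powers from the question as if each one were stored in 32 bits.
powers = [wrap_int32(y**y) for y in range(1, 11)]
print(powers[-1])  # 1410065408, the wrong value from the question
```

Every power up to 9^9 fits in 32 bits, which is why only the last entry is affected.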
The simple solution is to not use numpy unnecessarily. The built-in range works very well:
import math

a = []
for y in range(1, 11):
    z = y**y
    a.append(z)
b = sum(a)
print(a)
print(b)
In your case, numpy was using 32-bit integers, which overflowed once a value exceeded their maximum (2**31 - 1 = 2147483647).
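For the full problem only the last ten digits matter, so every term can be reduced modulo 10**10 and the numbers never get large. The following is one possible sketch, not the trainee's solution; the function name last_ten_digits is an assumption for this example. It relies on the three-argument form of the built-in pow, which performs modular exponentiation.

```python
MOD = 10**10  # keeping ten digits means working modulo 10**10


def last_ten_digits(n):
    """Last ten digits of 1^1 + 2^2 + ... + n^n as an int (leading zeros dropped)."""
    return sum(pow(k, k, MOD) for k in range(1, n + 1)) % MOD


# Sanity check against the value given in the problem statement (10405071317).
print(last_ten_digits(10))  # 405071317
# zfill restores any leading zeros when printing the ten digits.
print(str(last_ten_digits(1000)).zfill(10))
```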
In Python, at what stage should round be used? Take this example: 10 * math.log(x) + 5. If I want this to be rounded, which should I use?
round(10 * math.log(x) + 5)
round(10 * math.log(x)) + 5
10 * round(math.log(x)) + 5
My guess would be that rounding early would run the fastest because more arithmetic happens with integers, which seem like they should be faster than floats. Rounding seems less likely to break if some later values change.
Would the answer be the same with int()?
Don't prematurely optimize. In many cases, it's not highly optimized mathematical functions which slow down programs, but the logic, structure or data types used in the calculation.
To that end, I recommend you use cProfile to identify bottlenecks. Note that cProfile itself has an overhead, so it is mostly useful for relative comparisons.
As per @glibdud's comment, you have to understand how rounding will affect your calculation. Try a few examples, or perform a test to see how your error may vary across a large number of inputs.
The earlier you are rounding, the more your result will be affected by this rounding. In my opinion, it all depends on the expectations of your program.
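As a rough sketch of such a test, the snippet below compares the three variants from the question against the unrounded value for x from 1 to 1000. The input range and the names rounding_errors and worst are arbitrary choices for this illustration.

```python
import math


def rounding_errors(x):
    """Absolute error of each rounding strategy against the unrounded value."""
    exact = 10 * math.log(x) + 5
    return {
        "round_whole": abs(round(10 * math.log(x) + 5) - exact),
        "round_product": abs(round(10 * math.log(x)) + 5 - exact),
        "round_log": abs(10 * round(math.log(x)) + 5 - exact),
    }


# Track the worst error seen for each strategy over the whole input range.
worst = {"round_whole": 0.0, "round_product": 0.0, "round_log": 0.0}
for x in range(1, 1001):
    for name, err in rounding_errors(x).items():
        worst[name] = max(worst[name], err)

for name, err in worst.items():
    print(name, err)
```

Rounding last keeps the error within half a unit, while rounding the logarithm first lets that error be multiplied by 10.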
As for the difference between int() and round(), this thread answers it perfectly.
To be more specific about performance: round() is a Python built-in implemented in C, so its cost will be negligible and you shouldn't really worry about it.
Round function
That entirely depends upon how you want your answer to be formatted and interpreted. I would not be hung up on the speed of the round function though (unless the very minor performance gain is crucial to your program). I would think about what I'm trying to accomplish by rounding. If your goal is to produce an output that is rounded to the nearest integer (for simplicity) then I would encompass your entire arithmetic statement into the round function. If your goal is to only use rounded integers in your log calculations (maybe because you don't want to use floats) then you should only round the math.log(x) function. There is no technical reason why you would use either, but there is definitely a logical reason that you would want to choose either of your options.
Please note that Python's math.log() function uses base e by default. From your question it's unclear which base you expect, so I'll assume base 10, as Google does. To make the code match that mathematical function, it would need to be:
import math

# assuming x equals 2
x = 2
function1 = round(10 * math.log(x, 10) + 5)
function2 = round(10 * math.log(x, 10)) + 5
function3 = 10 * round(math.log(x, 10)) + 5
function4 = 10 * math.log(x, 10) + 5
print(function1)
print(function2)
print(function3)
print(function4)
Now, assuming x = 2, the mathematical expression evaluates to 8.01029995664.
Looking at the printed output from the above code:
8
8
5
8.010299956639813
This clearly shows that functions 1, 2 and 4 are roughly equivalent, while function 3 gives a different result. This is because round() rounds to the nearest integer (in Python 3, exact halves go to the nearest even integer). math.log(2, 10) is about 0.301, so rounding it first drops it to zero and the logarithm's contribution is lost.
As for the equivalence of int() and round(), the link referenced by IMCoins is pretty good. In summary, int() removes the fractional part (it truncates toward zero), while round() rounds to the nearest integer, so for positive numbers the two agree whenever the fractional part is below .5.
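A few sample values make the difference concrete. This is only an illustration under Python 3 semantics; the chosen values are arbitrary.

```python
values = [0.5, 1.5, 2.5, 2.7, -2.7]

# Each row is (value, int(value), round(value)).
table = [(v, int(v), round(v)) for v in values]
for row in table:
    print(row)
# (0.5, 0, 0)    tie goes to the even neighbour
# (1.5, 1, 2)
# (2.5, 2, 2)    tie goes to the even neighbour
# (2.7, 2, 3)
# (-2.7, -2, -3) int() truncates toward zero, round() goes to the nearest
```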
As for the speed question: if accuracy is non-negotiable, it is best to round only once the calculation is complete, for the same reason function 3 was wrong above. If you're fairly certain you can safely round at an earlier step, then I agree with the answer above: use cProfile and find the bottlenecks.
Hope this helps.
I have no clue, but let's see :)
import time
import math

n = 1000000
x = 5

def timeit(f):
    # Call f n times and print the average time per call in seconds
    t_0 = time.perf_counter()
    for _ in range(n):
        f()
    t_1 = time.perf_counter()
    print((t_1 - t_0) / n)

def fun1():
    round(10 * math.log(x) + 5)

def fun2():
    round(10 * math.log(x)) + 5

def fun3():
    10 * round(math.log(x)) + 5

for f in (fun1, fun2, fun3):
    timeit(f)
On my computer the last one is slightly faster than the others.
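The same measurement can also be sketched with the standard library's timeit module, which handles the timing loop itself. The labels, the repeat count and the number of calls below are arbitrary choices for this example, and the results will vary from machine to machine.

```python
import math
import timeit

x = 5
CALLS = 100_000  # calls per timing run

candidates = {
    "round whole expression": lambda: round(10 * math.log(x) + 5),
    "round product": lambda: round(10 * math.log(x)) + 5,
    "round log only": lambda: 10 * round(math.log(x)) + 5,
}

timings = {}
for label, func in candidates.items():
    # repeat() returns one total per run; the minimum is the least noisy figure
    best = min(timeit.repeat(func, number=CALLS, repeat=3))
    timings[label] = best / CALLS  # seconds per call
    print(label, timings[label])
```

Whatever the numbers turn out to be, the differences are tiny compared with the effect the rounding position has on the result.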
I am using numpy like this:
>>> import numpy as np
>>> a = np.arange(1, 100000001).sum()
>>> a
987459712
I expected the result to be something like
5000000050000000
I noticed that with up to five digits the result is OK.
Does anyone know what is happening?
Regards
Numpy is not making a mistake here. This phenomenon is known as integer overflow.
import numpy as np

x = np.arange(1, 100000001)
print(x.sum())  # 987459712
print(x.dtype)  # int32
The 32-bit integer type used by arange for the given input simply cannot hold 5000000050000000. The largest value it can hold is 2147483647.
If you explicitly use a larger integer or floating point data type you get the expected result.
a = np.arange(1, 100000001, dtype='int64').sum()
print(a) # 5000000050000000
a = np.arange(1.0, 100000001.0).sum()
print(a) # 5000000050000000.0
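The limits involved can be checked with exact Python integers, no numpy required. The sketch below uses the closed form n(n+1)/2 for the sum; the helper names triangular and largest_safe_n are made up for this illustration, and math.isqrt needs Python 3.8 or later.

```python
import math

INT32_MAX = 2**31 - 1  # 2147483647


def triangular(n):
    """Exact sum 1 + 2 + ... + n using Python's arbitrary-precision ints."""
    return n * (n + 1) // 2


def largest_safe_n(limit=INT32_MAX):
    """Largest n for which the sum 1..n does not exceed limit."""
    # n(n+1)/2 <= limit  is equivalent to  (2n+1)^2 <= 8*limit + 1
    return (math.isqrt(8 * limit + 1) - 1) // 2


n = largest_safe_n()
print(n, triangular(n))                # 65535 2147450880
print(triangular(n + 1) > INT32_MAX)   # True
print(triangular(100_000_000) % 2**32) # 987459712
```

So a 32-bit sum is only safe up to n = 65535, and the value from the question is the true sum reduced modulo 2**32.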
I suspect you are using Windows, where the data type of the result is a 32-bit integer (while for those using, say, Mac OS X or Linux, the data type is 64-bit). Note that 5000000050000000 % (2**32) = 987459712.
Try using
a = np.arange(1, 100000001, dtype=np.int64).sum()
or
a = np.arange(1, 100000001).sum(dtype=np.int64)
P.S. Anyone not using Windows can reproduce the result as follows:
>>> np.arange(1, 100000001).sum(dtype=np.int32)
987459712
While trying to run this code:
l = 1000000
w = [1, 1]
for i in range(2, l):
    w.append(w[-1] + w[-2])
the computer hangs and a Blue Screen of Death appears. The only information I get is about MEMORY MANAGEMENT. The problem occurs in Python 2.7 and in 3.4 as well.
The code works fine for l = 100000.
Can someone explain exactly why? I am using Windows 10 64-bit and Python 2.7.8 64-bit from ActivePython.
EDIT:
Here is R code which works well:
len <- 1000000
fibvals <- numeric(len)
fibvals[1] <- 1
fibvals[2] <- 1
for (i in 3:len) {
  fibvals[i] <- fibvals[i-1] + fibvals[i-2]
}
The numbers you're producing are larger than you might realize. For example, here's the size in memory of the last one:
>>> import sys
>>> a, b = 1, 1
>>> for i in xrange(2, 1000000):
...     a, b = b, a+b
...
>>> sys.getsizeof(b)
92592
That's 92 kilobytes for one integer. All of them put together would be somewhere in the vicinity of 46-ish gigabytes, and you only have 16 gigabytes.
Your R code used 64-bit floating-point numbers, which promptly overflow to infinity at around the 1476th number.
The Fibonacci numbers are HUGE. In R and many other languages, fixed-width numbers overflow (or lose precision), so not much memory is required. But Python integers simply don't overflow; they keep growing. A list of the first 1000000 Fibonacci numbers would require tens of gigabytes of space. Once your OS uses up all the physical RAM, it switches over to hard disk swap. When that runs out too, you get a kernel fault.
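To get a feel for how quickly the storage grows, here is a rough sketch that adds up the size of each Fibonacci number while only ever holding two of them. The function name fib_storage_bytes is hypothetical, sys.getsizeof reports CPython-specific object sizes, and the list's own overhead is ignored, so treat the figures as estimates.

```python
import sys


def fib_storage_bytes(count):
    """Total sys.getsizeof of the first `count` Fibonacci numbers, holding only two at a time."""
    a, b = 1, 1
    total = 0
    for _ in range(count):
        total += sys.getsizeof(a)
        a, b = b, a + b
    return total


# Counts are kept small so this runs quickly; the question used 1000000.
for count in (1_000, 10_000, 100_000):
    print(count, fib_storage_bytes(count))
```

The total grows roughly with the square of the count, because each number is longer than the one before. If only the last value is needed, keeping just a and b as above avoids the problem entirely.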
In Python, a list takes up a lot of memory. Try using a tuple instead.
Example code:
l = 1000000
w = (1, 1)
for i in xrange(2, l):
    w = w + (w[-1] + w[-2],)
The execution time of the program depends on your CPU and main memory.
OK, so I am feeling a little stupid for not knowing this, but a coworker asked, so I am asking here: I have written a Python algorithm that solves his problem. Given x > 0, add all numbers together from 1 to x.
def intsum(x):
    if x > 0:
        return x + intsum(x - 1)
    else:
        return 0

intsum(10)
55
First, what type of equation is this, and what is the correct way to get this answer, as it is clearly easier using some other method?
This is recursion, though for some reason you're labeling it like it's factorial.
In any case, the sum from 1 to n is also simply:
n * ( n + 1 ) / 2
(You can special case it for negative values if you like.)
Transforming recursively-defined sequences of integers into ones that can be expressed in a closed form is a fascinating part of discrete mathematics -- I heartily recommend Concrete Mathematics: A Foundation for Computer Science, by Ronald Graham, Donald Knuth, and Oren Patashnik (see, e.g., the Wikipedia entry about it).
However, the specific sequence you show, fac(x) = fac(x - 1) + x, according to a famous anecdote, was solved by Gauss when he was a child in first grade -- the teacher had given the pupils the task of summing the numbers from 1 to 100 to keep them quiet for a while, but two minutes later there was young Gauss with the answer, 5050, and the explanation: "I noticed that I can sum the first, 1, and the last, 100, that's 101; and the second, 2, and the next-to-last, 99, and that's again 101; and clearly that repeats 50 times, so, 50 times 101, 5050". Not rigorous as proofs go, but quite correct and appropriate for a 6-year-old ;-).
In the same way (plus really elementary algebra) you can see that the general case is, as many have already said, (N * (N+1)) / 2 (the product is always even, since one of the numbers must be odd and one even; so the division by two will always produce an integer, as desired, with no remainder).
Here is how to prove the closed form for an arithmetic progression:
S = 1 + 2 + ... + (n-1) + n
S = n + (n-1) + ... + 2 + 1
2S = (n+1) + (n+1) + ... + (n+1) + (n+1)
^ you'll note that there are n terms there.
2S = n(n+1)
S = n(n+1)/2
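In Python the closed form could look like the sketch below. The function name intsum_closed is made up for this example; it returns 0 for non-positive input to mirror the recursive version from the question.

```python
def intsum_closed(n):
    """Sum of 1..n via the closed form; returns 0 for n <= 0 like the recursive version."""
    if n <= 0:
        return 0
    return n * (n + 1) // 2  # // keeps the result an int in Python 3


# Cross-check the formula against a brute-force sum.
for n in (10, 100, 10_000):
    assert intsum_closed(n) == sum(range(1, n + 1))

print(intsum_closed(10), intsum_closed(100))  # 55 5050
```

Unlike the recursive version, this runs in constant time and cannot hit Python's recursion limit for large n.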
I'm not allowed to comment yet, so I'll just add that you'll want to be careful in using range(), as it starts at 0 and excludes its stop value. You'll need to use range(n+1) to get the desired effect.
Sorry for the duplication...
sum(range(10)) != 55
sum(range(11)) == 55
OP has asked, in a comment, for a link to the story about Gauss as a schoolchild.
He may want to check out this fascinating article by Brian Hayes. It not only rather convincingly suggests that the Gauss story may be a modern fabrication, but outlines how it would be rather difficult not to see the patterns involved in summing the numbers from 1 to 100. That in fact the only way to miss these patterns would be to solve the problem by writing a program.
The article also talks about different ways to sum arithmetic progressions, which is at the heart of OP's question. There is also an ad-free version here.
Larry is quite right with his formula, and it's the fastest way to calculate the sum of all integers up to n.
But for completeness, there are built-in Python functions that perform what you have done on lists with arbitrary elements, e.g.
sum()
>>> sum(range(11))
55
>>> sum([2,4,6])
12
or, more generally, reduce()
>>> import operator
>>> from functools import reduce  # needed in Python 3, where reduce is no longer a builtin
>>> reduce(operator.add, range(11))
55
Consider that N+1, N-1+2, N-2+3, and so on all add up to the same number, and there are approximately N/2 instances like that (exactly N/2 if N is even).
What you have there is called an arithmetic series and, as suggested, you can compute it directly, without the overhead that recursion might introduce.
And I would say this is homework, despite what you say.
I'm trying to compute this:
from scipy import *
3600**3400 * (exp(-3600)) / factorial(3400)
the error: unsupported long and float
Try using logarithms instead of working with the numbers directly. Since none of your operations are addition or subtraction, you could do the whole thing in logarithm form and convert back at the end.
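A minimal sketch of the logarithm approach, using only the standard library: math.lgamma(k + 1) gives the natural log of k!, so the whole expression can be assembled in log space and exponentiated once at the end. The function name poisson_term is my own choice (the expression has the shape of a Poisson probability), not something from the question.

```python
import math


def poisson_term(lam, k):
    """lam**k * exp(-lam) / k!, evaluated via logarithms to avoid overflow."""
    # log(lam**k) = k*log(lam), log(exp(-lam)) = -lam, log(k!) = lgamma(k + 1)
    log_value = k * math.log(lam) - lam - math.lgamma(k + 1)
    return math.exp(log_value)


print(poisson_term(3600, 3400))  # about 2.379e-05
```

None of the intermediate values here is larger than a few tens of thousands in magnitude, so ordinary floats are sufficient.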
Computing with numbers of such magnitude, you just can't use ordinary 64-bit-or-so floats, which is what Python's core runtime supports. Consider gmpy (do not get the sourceforge version, it's aeons out of date) -- with that, math, and some care...:
>>> import math
>>> import gmpy
>>> e = gmpy.mpf(math.exp(1))
>>> gmpy.mpz(3600)**3400 * (e**(-3600)) / gmpy.fac(3400)
mpf('2.37929475533825366213e-5')
(I'm biased about gmpy, of course, since I originated and still participate in that project, but I'd never make strong claims about its floating point abilities... I've been using it mostly for integer stuff... still, it does make this computation possible!-).
You could try using the Decimal object. Calculations will be slower but you won't have trouble with really small numbers.
from decimal import Decimal
I don't know how Decimal interacts with the scipy module, however.
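For what it's worth, here is a rough sketch of how the calculation might look with Decimal alone, without scipy. The precision of 30 digits is an arbitrary choice, and the factorial is built up in a loop so that every intermediate value stays a Decimal.

```python
from decimal import Decimal, getcontext

getcontext().prec = 30  # significant digits kept after every operation

lam, k = 3600, 3400

# Build k! as a Decimal; each multiplication is rounded to the context precision.
factorial = Decimal(1)
for i in range(1, k + 1):
    factorial *= i

# Decimal's exponent range is far wider than a float's, so nothing overflows here.
result = Decimal(lam) ** k * Decimal(-lam).exp() / factorial
print(result)  # about 2.379E-5
```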
This numpy discussion might be relevant.
Well the error is coming about because you are trying to multiply
3600**3400
which is a long with
exp(-3600)
which is a float.
But regardless, the error you are receiving is disguising the true problem. exp(-3600) is too small a number to fit in a float anyway (it underflows to 0.0), while 3600**3400 and factorial(3400) are far too large for one. Python's floating-point math functions cannot represent values this extreme.
exp(-3600) is too small, factorial(3400) is too large:
In [1]: from scipy import exp
In [2]: exp(-3600)
Out[2]: 0.0
In [3]: from scipy import factorial
In [4]: factorial(3400)
Out[4]: array(1.#INF)
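What about calculating it step by step as a workaround? (It also makes sense to check the smallest and biggest intermediate results.)
# Python 2 code
from itertools import izip
from math import exp

output = 1.0
smallest = 1e100
biggest = 0.0
# Pair small divisors (1..1700) with large ones (3400..1701) so that the
# running product stays within float range.
for i, j in izip(xrange(1, 1701), xrange(3400, 1699, -1)):
    # 3600.0 forces float division; in Python 2, -3600/3400 would floor to -2
    output = output * 3600 * exp(-3600.0/3400) / i
    output = output * 3600 * exp(-3600.0/3400) / j
    smallest = min(smallest, output)
    biggest = max(biggest, output)
print "output: ", output
print "smallest: ", smallest
print "biggest: ", biggest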
output is:
output: 2.37929475534e-005
smallest: 2.37929475534e-005
biggest: 1.28724174494e+214