This is my code in Python, but the answer it gives is not correct according to projecteuler.net.
a = 2**1000
total = 0
while a >= 1:
    temp = a % 10
    total = total + temp
    a = int(a/10)
print(total)
It gives an output 1189. Am I making some mistake?
Your logic is fine. The problem is that 2 ** 1000 is too big for all the digits to fit into a float, so the number gets rounded when you do a = int(a/10). A Python float has only 53 bits of precision; you can read about it in the official tutorial article Floating Point Arithmetic: Issues and Limitations, and on Wikipedia: Double-precision floating-point format. Also see Is floating point math broken?.
This is 2 ** 1000
10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376
But print(format(2**1000 / 10, 'f')) gives us this:
1071508607186267380429101388171324322483904737701556012694158454746129413355810495130824665231870799934327252763807170417136096893411236061781867579266085792026680021578208129860941078404632071895251811587214122307926025420797364998626502669722909817741077261714977537247847201331018951634334519394304.000000
You can see that the digits start going wrong after 10715086071862673.
So you need to use integer arithmetic, which in Python has arbitrary precision (only limited by how much memory Python can access). To do that, use the // floor division operator.
a = 2**1000
total = 0
while a >= 1:
    temp = a % 10
    total = total + temp
    a = a // 10
print(total)
output
1366
We can condense that code a little by using augmented assignment operators.
a = 2**1000
total = 0
while a:
    total += a % 10
    a //= 10
print(total)
Here's a faster way. Convert a to a string, then convert each digit back to int and sum them. I use bit shifting to compute a because it's faster than exponentiation.
print(sum(int(u) for u in str(1 << 1000)))
Just wondering if a better solution exists for this sort of problem.
We know that for a X/Y percentage split of an even number we can get an exact split of the data - for example for data size 10:
10 * .6 = 6
10 * .4 = 4
10
Splitting data this way is easy, and we can guarantee we have all of the data and nothing is lost. However, where I am struggling is with less friendly numbers - take 11:
11 * .6 = 6.6
11 * .4 = 4.4
11
However, we can't index into an array at i = 6.6, for example. So we have to decide how to do this. If we take JUST the integer portion, we lose 1 data point -
First set = 0..6
Second set = 6..10
This would be the same case if we floored the numbers.
However, if we take the ceiling of the numbers:
First set = 0..7
Second set = 7..12
And we've read past the end of our array.
This gets even worse when we throw in a 3rd or 4th split (30,30,20,20 for example).
Is there a standard splitting procedure for these kinds of problems? Is data loss accepted? It seems like data loss would be unacceptable for dependent data, such as time series.
Thanks!
EDIT: The values .6 and .4 are chosen by me. They could be any two numbers that sum to 1.
First of all, notice that your problem is not limited to odd-sized arrays as you claim, but any-sized arrays. How would you make the 56%-44% split of a 10 element array? Or a 60%-40% split of a 4 element array?
There is no standard procedure. In many cases, programmers do not care that much about an exact split and they either do it by flooring or rounding one quantity (the size of the first set), while taking the complementary (array length - rounded size) for the other (the size of the second).
This might be OK in most cases, when it is a one-off calculation and accuracy is not required. You have to ask yourself what your requirements are. For example: are you taking thousands of 10-sized arrays, splitting each of them 56%-44%, doing some calculations and returning a result? You have to ask yourself what accuracy you want. Do you care if your result ends up reflecting a 60%-40% split or a 50%-50% split?
As another example imagine that you are doing a 4-way equal split of 25%-25%-25%-25%. If you have 10 elements and you apply the rounding technique you end up with 3,3,3,1 elements. Surely this will mess up your results.
If you do care about all these inaccuracies, then the first step is to consider whether you can adjust either the array size and/or the split ratio(s).
If these are set in stone then the only way to have an accurate split of any ratios of any sized array is to make it probabilistic. You have to split multiple arrays for this to work (meaning you have to apply the same split ratio to same-sized arrays multiple times). The more arrays the better (or you can use the same array multiple times).
So imagine that you have to make a 56%-44% split of a 10 sized array. This means that you need to split it in 5.6 elements and 4.4 elements on the average.
There are many ways you can achieve a 5.6 element average. The easiest one (and the one with the smallest variance in the sequence of tries) is to have 60% of the time a set with 6 elements and 40% of the time a set that has 5 elements.
0.6*6 + 0.4*5 = 5.6
In terms of code this is what you can do to decide on the size of the set each time:
import random

array_size = 10
first_split = 0.56
avg_split_size = array_size * first_split
floored_split_size = int(avg_split_size)

if avg_split_size > floored_split_size:
    # take the larger size with probability equal to the fractional part
    if random.uniform(0, 1) > avg_split_size - floored_split_size:
        this_split_size = floored_split_size
    else:
        this_split_size = floored_split_size + 1
else:
    this_split_size = floored_split_size  # exact split, no randomness needed
You could make the code more compact; I just made an outline here so you get the idea (one condensation is sketched below). I hope this helps.
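For instance, the branching above could be condensed to a single line. This is my own condensation, not the answerer's code, under the same setup:

import random

array_size = 10
first_split = 0.56
avg_split_size = array_size * first_split    # 5.6
floored_split_size = int(avg_split_size)     # 5

# A bool is an int in Python, so this adds 1 with probability equal to the
# fractional part of avg_split_size, matching the branching logic above.
this_split_size = floored_split_size + (random.random() < avg_split_size - floored_split_size)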
Instead of using ceil() or floor(), use round(). For example:
>>> round(6.6)
7.0
The value returned will be of float type. For getting the integer value, type-cast it to int as:
>>> int(round(6.6))
7
This will be the value of your first split. For the second split, calculate it as len(data) - split1_val. That covers the 2-split case.
In the case of a 3-way split, take the round() value for two of the splits and compute the third as len(my_list) - val_split_1 - val_split_2.
Generically, for an N-way split: take the round() value of the first N-1 splits, and for the last one use len(data) minus the sum of those N-1 rounded values, where len() gives the length of the list.
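As a rough sketch, that recipe might look like this (my own illustration; split_sizes is a hypothetical helper name, not from the original answer):

def split_sizes(data_len, ratios):
    """Round the first N-1 split sizes; the last one absorbs the remainder."""
    sizes = [int(round(data_len * r)) for r in ratios[:-1]]
    sizes.append(data_len - sum(sizes))  # no data is lost or duplicated
    return sizes

print(split_sizes(11, [0.6, 0.4]))            # [7, 4]
print(split_sizes(10, [0.3, 0.3, 0.2, 0.2]))  # [3, 3, 2, 2]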
Let's first consider just splitting the set into two pieces.
Let n be the number of elements we are splitting, and p and q be the proportions, so that
p+q == 1
I assert that the parts after the decimal point will always sum to either 1 or 0, so we should use floor on one and ceil on the other, and we will always be right.
Here is a function that does that, along with a test. I left the print statements in but they are commented out.
import math

def simpleSplitN(n, p, q):
    "split n into proportions p and q and return indices"
    np = math.ceil(n*p)
    nq = math.floor(n*q)
    #print n, sum([np, nq])  #np and nq are the proportions
    return [0, np]  #these are the indices we would use

#test for simpleSplitN
for i in range(1, 10):
    p = i/10.0
    q = 1 - p
    simpleSplitN(37, p, q)
For the mathematically inclined, here is the proof that the decimal proportions will sum to 1
-----------------------
We can express p*n as n/(1/p), and so by the division algorithm we get a quotient k and remainder r with

n == k*(1/p) + r, where 0 <= r < (1/p)

Multiplying through by p gives p*n == k + p*r, and since r < 1/p,

p*r < 1

We can do exactly the same for q, getting

q*r < 1 (this is a different r)

It is important to note that p*r and q*r are exactly the parts after the decimal when we divide our n. Now we can add them together (with subscripts to keep the two remainders apart):

0 <= p*(r_1) < 1
0 <= q*(r_2) < 1

=> 0 <= p*r_1 + q*r_2 == (p*n - k_1) + (q*n - k_2) == n - k_1 - k_2 < 2

But by closure of the integers, n - k_1 - k_2 is an integer, and so

0 <= n - k_1 - k_2 < 2

means that p*r_1 + q*r_2 must be either 0 or 1. It will only be 0 in the case that our n is divided evenly.
Otherwise we can now see that our fractional parts will always sum to 1.
-----------------------
We can do a very similar (but slightly more complicated) proof for splitting n into an arbitrary number (say N) parts, but instead of them summing to 1, they will sum to an integer less than N.
Here is the general function, it has uncommented print statements for verification purposes.
import math
import random

def splitN(n, c):
    """Compute indices that can be used to split
    a dataset of n items into a list of proportions c
    by first dividing them naively and then distributing
    the decimal parts of said division randomly
    """
    nc = [n*i for i in c]
    nr = [n*i - int(n*i) for i in c]  #the decimal parts
    N = int(round(sum(nr)))  #sum of all decimal parts
    print N, nc
    for i in range(0, len(nc)):
        nc[i] = math.floor(nc[i])
    for i in range(N):  #randomly distribute leftovers
        nc[random.randint(1, len(nc)) - 1] += 1
    print n, sum(nc)  #nc now contains the proportions
    out = [0]  #compute a cumulative sum
    for i in range(0, len(nc) - 1):
        out.append(out[-1] + nc[i])
    print out
    return out

#test for splitN with various proportions
c = [.1, .2, .3, .4]
c = [.2, .2, .2, .2, .2]
c = [.3, .2, .2, .3]
for n in range(10, 40):
    print splitN(n, c)
If we have leftovers, we will never get an even split, so we distribute them randomly, as @Thanassis said. If you don't like the dependency on random, then you could just add them all at the beginning or at even intervals.
Both of my functions output indices but they compute proportions and thus could be slightly modified to output those instead per user preference.
I'm doing some probability calculations.
In one of my tasks, I need to multiply the number of combinations of choosing 8000 samples from 10000 items by 0.8**8000.
The number of combinations is a very long number, and with the help of numpy, I get the result of 0.8**8000 as 5.2468172239242176864e-776.
But when I try to multiply these two numbers, I get [9] 34845 segmentation fault ipython -i.
How can I do such multiplication then?
PS: This is a piece of my code
import numpy
d2 = numpy.float128(0.8) ** 8000
d1 = 165555575235503558460892983752748337696863078099010763950122624527927836980322780662408249953188062227721112100054260160204180655980717428736444016909193193353770953722788106404786520413339850951599929567643032803416164290936680088121145665954509987077953596641237451927908536624592636591471456488142060812180933761408708169972797751139799352908109763166895772281109195968567911923343187466596002627570139321755043803267091330804414889831229832744256038117150720178689066894068507531026417815624234453195871008113238128934831837842040515600131726096039123279876153916504647241693083829553081901075278042326502699324012014817969085443550523855284341221708045253558716789811929298590803855947461554713178815399150688529048306222786951038548880400191620565711291586700534540755526276938422405001345270278335726581375322976014611332999126216550500951669985289322635729053541565465940744524663726205818866513444952048185208697438054246674199211750006230637806394882672053335493831407089830994135058867370833787098758113596190447219426121568324685764151601296948654893782399960327514764114467176417125060133454019708700782282480571935020898204763471121684913190735908414301826140125010936910161942130277906874552721346626800201093026689035996876035329180150478191582393837824731994055511844267891121846403164857127885959745644323971338513739214928092232132691519007718752719466750891748327404893783451436251805894736392433617289459646429204124129760273396235033220480921175386059331059354409267348067375581516003852060360378571075522650956157791058846993826792047806030332676423336065499519953076910418838626376480202828151673161942289092221049283902410699951912366163469099917310239336454637062482599733606299329923589714875696509548029668358723465427602758225427644633549944802010973352599970041918971524450218727345622721744933664742499521140235707102217164259438766026322532351208348119475549696983427008567651685921355966036780080415723688044325099562693124488758728102729947753752228785786200998322978801432511608341549234067324280214361346940194251357867820535466891356019219904248859277399657389914429390105240751239760865282709465029549690591863591028864648910033430400L
print d1 * d2
When multiplying an extremely large number by an extremely small number, working with floats can introduce huge inaccuracies. In your case, the magnitude of the numbers is causing overflow errors, so you have bigger problems than just inaccuracies!
Whenever you find yourself in this situation, it can be useful to first check if it is possible to stay in the integer domain, and "massage" the numbers a little first. In your case, it is possible and I'll explain how below.
One operand of the multiplication, the extremely large number, is the number of ways to choose 8000 samples from 10000 items. Use the closed-form equation for the number of combinations, where your sample size n is 10000 and the subset size r is 8000. The exclamation mark (!) here is factorial, which you can find as math.factorial in Python.
C(n,r) = n! / (r! * (n - r)!)
The other operand 0.8 ** 8000 is the extremely small number, which by index laws is equal to:
8**8000 / 10**8000
So when we multiply these two numbers together, the answer we want is:
10000! * 8**8000
--------------------------
8000! * 2000! * 10**8000
Let's call this number x and then take logarithms of both sides. Working in the log domain will transform multiplications into additions, and divisions into subtractions, making things more manageable.
from math import log, factorial
numerator = log(factorial(10000)) + 8000*log(8)
denominator = log(factorial(8000)) + log(factorial(2000)) + 8000*log(10)
log_x = numerator - denominator
Now these numbers are of a magnitude that is usable in Python.
You will find that log_x is equal to approximately 3214. You now only need to observe that exp(log_x) == x to find your answer. It is a very large, but finite, number.
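As a side note (my own addition, not part of the original answer): exp(log_x) overflows a float, but you can still display x in scientific notation by converting the natural log into a base-10 exponent:

from math import log, factorial

numerator = log(factorial(10000)) + 8000*log(8)
denominator = log(factorial(8000)) + log(factorial(2000)) + 8000*log(10)
log_x = numerator - denominator        # natural log of the answer, ~3214.27

log10_x = log_x / log(10)              # base-10 log, ~1395.94
exponent = int(log10_x)                # the power of ten
mantissa = 10 ** (log10_x - exponent)  # the leading digits
print('x is approximately %.3fe+%d' % (mantissa, exponent))
# x is approximately 8.686e+1395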
Arbitrary-precision integers aren't really the way to go for this problem, since you're destroying any precision you had by calling log, so I'll just let scipy.special.gammaln speak for itself (but see my edit below):
from math import log, factorial
from scipy.special import gammaln

# p and q are the numerator and denominator of the probability (8 and 10 here)
def comp_integral(n, r, p, q):
    numerator = log(factorial(n)) + r*log(p)
    denominator = log(factorial(r)) + log(factorial(n-r)) + r*log(q)
    return numerator - denominator

def comp_gamma(n, r, p, q):
    comb = gammaln(n+1) - gammaln(n-r+1) - gammaln(r+1)
    expon = r*(log(p) - log(q))
    return comb + expon
In [220]: comp_integral(10000, 8000, 8, 10)
Out[220]: 3214.267963130871
In [221]: comp_gamma(10000, 8000, 8, 10)
Out[221]: 3214.2679631308811
In [222]: %timeit comp_integral(10000, 8000, 8, 10)
10 loops, best of 3: 80.3 ms per loop
In [223]: %timeit comp_gamma(10000, 8000, 8, 10)
100000 loops, best of 3: 11.4 µs per loop
Note that the outputs agree to about 14 digits, but the gammaln version is roughly 7000 times faster (80.3 ms vs. 11.4 µs). If you're going to do this a lot, that will count.
EDIT: What gammaln does is to compute the natural log of the gamma function. The gamma function can be thought of as a generalization of factorial, in that factorial(n) == gamma(n+1). So comb(n,r) == gamma(n+1)/(gamma(n-r+1)*gamma(r+1)). Then taking logs turns it into the form above.
Gamma also has values for fractional inputs and for negative numbers. That doesn't really matter here though.
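As a quick sanity check of that identity (my own addition), math.lgamma, the standard-library counterpart of gammaln, reproduces the exact log-combination on a small case:

from math import lgamma, log, factorial

# log C(10, 3) via lgamma vs. via exact integer arithmetic: both ~4.7875
print(lgamma(11) - lgamma(8) - lgamma(4))
print(log(factorial(10) // (factorial(7) * factorial(3))))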
I maintain the gmpy2 library and it can do this very easily.
>>> import gmpy2
>>> gmpy2.comb(10000,8000) * gmpy2.mpfr('0.8')**8000
mpfr('8.6863984366232171e+1395')
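If you need more significant digits than the default 53-bit precision, gmpy2 lets you raise the working precision of the context before computing; a hedged example:

import gmpy2

# Raising the context precision (in bits) gives more digits in the mpfr result
gmpy2.get_context().precision = 200
print(gmpy2.comb(10000, 8000) * gmpy2.mpfr('0.8') ** 8000)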
Building off of wim's great answer, you can also store this number as a Fraction by building a list of prime factors, doing any cancellations and multiplying everything together.
I've included a rather naive implementation for this problem. It returns a fraction in less than a minute as is but if you implement slightly smarter factorization you can surely make it even faster.
from collections import Counter
from fractions import Fraction
import gmpy2 as gmpy

def get_factors(n):
    factors = Counter()
    factor = 1
    while n != 1:
        factor = int(gmpy.next_prime(factor))
        while not n % factor:
            n //= factor
            factors[factor] += 1
    return factors

factors = Counter()

# multiply by 10000!
for i in range(10000):
    factors += get_factors(i+1)

# multiply by 8^8000
factors[2] += 3*8000

# divide by 2000!
for i in range(2000):
    factors -= get_factors(i+1)

# divide by 8000!
for i in range(8000):
    factors -= get_factors(i+1)

# divide by 10^8000
factors[2] -= 8000
factors[5] -= 8000

# build Fraction
numer = 1
denom = 1
for f, c in factors.items():
    if c > 0:
        numer *= f**c
    elif c < 0:
        denom *= f**-c
frac = Fraction(numer, denom)
Looks like it's around 8.686*10^1395
I'm trying to generate 0 or 1 with 50/50 chance of any using random.uniform instead of random.getrandbits.
Here's what I have
0 if random.uniform(0, 1e-323) == 0.0 else 1
But if I run this long enough, the average is ~70% ones. As seen here:
sum(0 if random.uniform(0, 1e-323) == 0.0
    else 1
    for _ in xrange(1000)) / 1000.0  # --> 0.737
If I change it to 1e-324, it will always be 0. And if I change it to 1e-322, the average will be ~90%.
I made a dirty program that will try to find the sweet spot between 1e-322 and 1e-324, by dividing and multiplying it several times:
v = 1e-323
n_runs = 100000
target = n_runs / 2
result = 0
while True:
    result = sum(0 if random.uniform(0, v) == 0.0 else 1 for _ in xrange(n_runs))
    if result > target:
        v /= 1.5
    elif result < target:
        v *= 1.5 / 1.4
    else:
        break
print v
This ends up with 4.94065645841e-324.
But it will still be wrong if I run it enough times.
Is there a way to find this number without the dirty script I wrote? I know that Python has an internal min float value, shown in sys.float_info.min, which on my PC is 2.22507385851e-308. But I don't see how to use it to solve this problem.
Sorry if this feels more like a puzzle than a proper question, but I'm not able to answer it myself.
I know that Python has an internal min float value, shown in sys.float_info.min, which on my PC is 2.22507385851e-308. But I don't see how to use it to solve this problem.
2.22507385851e-308 is not the smallest positive float value; it is the smallest positive normalized float value. The smallest positive float value is 2^-52 times that, that is, near 5e-324.
2^-52 is called the "machine epsilon", and it is usual for the "min" of a floating-point type to denote a value that is neither the least of all comparable values (that would be -inf), nor the least of the finite values (that would be -max), nor the least of the positive values (that would be the subnormal 5e-324), but rather the least positive normalized value.
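We can verify that relationship directly (a quick check of my own, not part of the original answer):

import sys

smallest = sys.float_info.min * 2 ** -52
print(smallest)      # 5e-324, the smallest positive (subnormal) float
print(smallest / 2)  # 0.0 -- there is nothing between 0 and 5e-324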
Then the next problem you face is that random.uniform is not uniform down to that level. It probably works OK when you pass it a normalized number, but if you pass it the smallest positive representable float, the computation it does internally may be very approximate and lead it to behave differently from what the documentation says. Although it appears to work surprisingly well, according to the results of your "dirty script".
Here's the random.uniform implementation, according to the source:
from os import urandom as _urandom

BPF = 53        # Number of bits in a float
RECIP_BPF = 2**-BPF

def uniform(self, a, b):
    "Get a random number in the range [a, b) or [a, b] depending on rounding."
    return a + (b-a) * self.random()

def random(self):
    """Get the next random number in the range [0.0, 1.0)."""
    return (int.from_bytes(_urandom(7), 'big') >> 3) * RECIP_BPF
So, your problem boils down to finding a number b that will give 0 when multiplied by a number less than 0.5 and another result when multiplied by a number larger than 0.5. I've found out that, on my machine, that number is 5e-324.
To test it, I've made the following script:
from random import uniform

def test():
    runs = 1000000
    results = [0, 0]
    for i in range(runs):
        if uniform(0, 5e-324) == 0:
            results[0] += 1
        else:
            results[1] += 1
    print(results)
Which returned results consistent with a 50% probability:
>>> test()
[499982, 500018]
>>> test()
[499528, 500472]
>>> test()
[500307, 499693]
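To see why 5e-324 in particular works (my own illustration, consistent with the implementation above): it is the smallest subnormal float, so 5e-324 * u can only round to 0.0 or to 5e-324 itself, and the rounding cutoff sits at u = 0.5:

print(5e-324 * 0.4)  # 0.0
print(5e-324 * 0.5)  # 0.0  (exactly halfway; ties round to even)
print(5e-324 * 0.6)  # 5e-324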
So I am a new programmer studying CS in university. I am new to Python and have been solving Project Euler puzzles, and I am wondering why my number 5 takes so long to compute! It looks like 287 seconds. I get the correct answer, but the compute time is very long. Can anyone explain why this is, and how I could optimize it to run faster?
For anyone unfamiliar with project euler, this question is asking me to find the first positive number divisible by all numbers 1 through 20.
Edit: Thanks for all the help guys. I don't know how to comment on comments but your suggestions have been very helpful. Thanks!!
import time

def main():
    time_start = time.clock()
    x = 2
    while True:
        if divBy20(x) == True:
            print(x)
            break
        else:
            x = x + 1
    time_elapsed = (time.clock() - time_start)
    print(time_elapsed)

def divBy20(a):
    for i in range(1, 21):
        if a % i != 0:
            return False
    return True

main()
Your program loops over every possible number one-by-one until it finds the solution. This is the brute force solution. Project Euler questions are designed to foil brute force approaches. It requires cleverness to improve upon the direct approach. Sometimes that means refining your answer. Sometimes it means completely rethinking it.
Refine
This problem is a great example. You could make some incremental improvements to your algorithm. For instance, you know the answer must be even, so why not skip odd numbers?
x = x + 2
In fact, it must be divisible by 3, so we could even count in multiples of 6.
x = x + 6
And it must be divisible by 5, right? Heck, let's count 30 at a time. Now we're cooking!
x = x + 30
You could keep following this line of thinking and make the increment bigger and bigger. But this would be a good time to step back. Let's rethink this whole approach. Do we need to iterate at all? Where's this all headed?
Rethink
If we multiplied together 1×2×3×4×5...×19×20, we'd have a number that is divisible by one through twenty. But it wouldn't be the smallest such number.
Why is that? Well, the reason it's too big is because of the overlap between numbers. We don't have to multiply by 2 if we're going to multiply by 4. We don't have to multiply by 3 if we're going to multiply by 6.
The breakthrough is to multiply just the prime factors. We don't need 6 because we'll already have 2 and 3. We don't need 9 if we multiply two 3's.
The question is, how many of each prime factor do we need? How many 2's? How many 3's? The answer: we need enough to cover the numbers up to 20. We'll need up to four 2's because 16 = 2^4. We don't need five, because no number up to 20 has five 2's in it. We'll need two 3's to handle 9 and 18. And we'll only need one each of 5, 7, 11, 13, 17, and 19: no number up to 20 contains those more than once.
And with that, we can calculate the answer by hand. We don't even need a program!
2^4 × 3^2 × 5 × 7 × 11 × 13 × 17 × 19 = 232,792,560
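If you still want code, the hand calculation is easy to verify with Python's math.gcd (a quick check of my own, not part of the original answer):

from functools import reduce
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

print(reduce(lcm, range(1, 21)))  # 232792560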
Project Euler #5:
Given the prime factors:
1 = 1
2 = 2
3 = 3
4 = 2^2
5 = 5
6 = 2 * 3
7 = 7
8 = 2^3
9 = 3^2
10 = 2 * 5
11 = 11
12 = 2^2 * 3
13 = 13
14 = 2 * 7
15 = 3 * 5
16 = 2^4
17 = 17
18 = 2 * 3^2
19 = 19
20 = 2^2 * 5
Then this problem is really:
Product((a prime factor)**(the largest exponent of that prime appearing in any factorization above), over all primes up to 20)
lcm(1,2,3,4,5,6,7,8,9,10) = 2^3 * 3^2 * 5^1 * 7^1 = 2520
lcm(1,...,20) = 2^4 * 3^2 * 5 * 7 * 11 * 13 * 17 * 19
By the way, it appears you aren't the first person to fall into the brute force trap: Project Euler 5 in Python - How can I optimize my solution?
Now figure out how to do this in code.
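If you get stuck, here is one possible sketch of that idea (my own, using plain trial division; not necessarily the solution the answer intends):

def prime_factors(k):
    """Return {prime: exponent} for k, by trial division."""
    factors = {}
    d = 2
    while d * d <= k:
        while k % d == 0:
            factors[d] = factors.get(d, 0) + 1
            k //= d
        d += 1
    if k > 1:
        factors[k] = factors.get(k, 0) + 1
    return factors

def lcm_upto(n):
    exponents = {}
    for k in range(2, n + 1):
        for p, e in prime_factors(k).items():
            exponents[p] = max(exponents.get(p, 0), e)  # keep the largest power
    result = 1
    for p, e in exponents.items():
        result *= p ** e
    return result

print(lcm_upto(10))  # 2520
print(lcm_upto(20))  # 232792560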
There are obvious fixes that you can do to speed things up, such as:
starting at the lowest possible number (hint: nothing below 20 is divisible by 20, so this skips the first 18 steps),
stepping by 2, which halves the work by skipping odd numbers, but
better, since your number must be divisible by all your factors, stepping by your largest divisor (20) will cut the work by 95%,
etc.
but a better approach is to consider that all possible solutions will be a product of some or all of your factors - so you could just check the possible products of 3-19 of your factors, keep those that meet the requirements and then return the lowest. You can further remove those factors that are present in higher factors, e.g. 2, 4 & 5 are already in 20, 3 in 9, etc. A sketch of that last reduction follows.
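Here is that reduction in code (my own illustration, not the answerer's): any number divisible by all of 11 through 20 is automatically divisible by 1 through 10 as well, so it suffices to step by 20 and test only the larger factors. It is still brute force, and still takes a little while, but it does far less work than counting by 1.

x = 20
while not all(x % i == 0 for i in range(11, 21)):
    x += 20
print(x)  # 232792560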