I have a mathematical Python problem where I have to find all numbers in a range which are the sum of at most 4 square numbers. I can't think of a working algorithm with my basic knowledge. Can you help me out with the algorithm, or an idea of where to start? I'm not asking for the code. Thanks in advance!
According to Lagrange's four-square theorem, you can return the whole non-negative range, because
every natural number can be represented as the sum of four integer
squares.
It means that the algorithm can be written as:
def my_algorithm(integer_range):
    return [i for i in integer_range if i >= 0]
Check this link out.
This Wikipedia page will solve your problem!
Basically, all whole numbers can be represented as a sum of 4 perfect squares.
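If you also want to exhibit a witness decomposition rather than just return the range, a brute-force sketch could look like this (the function name is my own; it iterates sorted triples and checks whether the remainder is a perfect square):

```python
import math

def four_square_decomposition(n):
    """Find non-negative a <= b <= c <= d with a^2 + b^2 + c^2 + d^2 == n."""
    limit = math.isqrt(n)
    for a in range(limit + 1):
        for b in range(a, limit + 1):
            for c in range(b, limit + 1):
                rest = n - a*a - b*b - c*c
                if rest < 0:
                    break                       # c only grows, so rest only shrinks
                d = math.isqrt(rest)
                if d * d == rest and d >= c:    # sorted solution always exists
                    return (a, b, c, d)
    return None  # unreachable for natural n, by Lagrange's theorem
```

Because every sorted solution (a ≤ b ≤ c ≤ d) is eventually visited, the theorem guarantees this never returns `None` for a natural number.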
I'm looking for a pseudo-random number generator (an algorithm where you input a seed number and it outputs a different 'random-looking' number, and the same seed will always generate the same output) for numbers between 1 and 95^1,312,000.
I would use the Linear Feedback Shift Register (LFSR) PRNG, but if I did, I would have to convert the seed number (which could be up to 1.2 million digits long in base-10) into a binary number, which would be so massive that I think it would take too long to compute.
In response to a similar question, the Feistel cipher was recommended, but I didn't understand the vocabulary of the wiki page for that method (I'm going into 10th grade so I don't have a degree in encryption), so if you could use layman's terms, I would strongly appreciate it.
Is there an efficient way of doing this which won't take until the end of time, or is this problem impossible?
Edit: I forgot to mention that the PRNG sequence needs to have a full period. My mistake.
A simple way to do this is to use a linear congruential generator with modulus m = 95^1312000.
The formula for the generator is x_(n+1) = a*x_n + c (mod m). By the Hull-Dobell theorem, it has full period if and only if gcd(m, c) = 1 and a - 1 is divisible by every prime factor of m; since m = 95^1312000 is odd with prime factors 5 and 19, that condition reduces to 95 dividing a - 1. Furthermore, if you want good second values (right after the seed) even for very small seeds, a and c should be fairly large. Also, your code can't store these values as literals (they would be much too big); instead, you need to be able to reliably reproduce them on the fly. After a bit of trial and error to make sure gcd(m, c) = 1, I hit upon:
import random

def get_book(n):
    random.seed(1941)  # Borges' "The Library of Babel" was published in 1941
    m = 95**1312000
    a = 1 + 95 * random.randint(1, m//100)
    c = random.randint(1, m - 1)  # chosen so that math.gcd(c, m) == 1
    return (a*n + c) % m
For example:
>>> book = get_book(42)
>>> book % 10**100
4779746919502753142323572698478137996323206967194197332998517828771427155582287891935067701239737874
shows the last 100 digits of "book" number 42. Given Python's built-in support for large integers, the code runs surprisingly fast (it takes less than a second to grab a book on my machine).
If you have a method that can produce a pseudo-random digit, then you can concatenate as many together as you want. The result is just as repeatable as the underlying PRNG.
However, you'll probably run out of memory scaling that up to millions of digits and attempting to do arithmetic. Normally stuff on that scale isn't done on "numbers". It's done on byte vectors, or something similar.
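For illustration, here is a minimal sketch of the digit-concatenation idea (the helper name is hypothetical); a dedicated random.Random instance keeps it repeatable without touching the global generator:

```python
import random

def big_pseudorandom_number(seed, n_digits):
    """Build an n_digits-long number by concatenating digits from a seeded PRNG.
    Repeatable: the same seed always yields the same number."""
    rng = random.Random(seed)       # independent generator, own internal state
    first = str(rng.randint(1, 9))  # avoid a leading zero
    rest = ''.join(str(rng.randint(0, 9)) for _ in range(n_digits - 1))
    return int(first + rest)
```

Note that the expensive big-integer conversion happens only once at the end; if you only ever need the digits, you can keep the result as a string or byte vector and skip it entirely.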
I'm trying to gather some statistics on prime numbers, among which is the distribution of factors for the number (prime-1)/2. I know there are general formulas for the size of factors of uniformly selected numbers, but I haven't seen anything about the distribution of factors of one less than a prime.
I've written a program to iterate through primes starting at the first prime after 2^63, and then factor the (prime - 1)/2 using trial division by all primes up to 2^32. However, this is extremely slow because that is a lot of primes (and a lot of memory) to iterate through. I store the primes as a single byte each (by storing the increment from one prime to the next). I also use a deterministic variant of the Miller-Rabin primality test for numbers up to 2^64, so I can easily detect when the remaining value (after a successful division) is prime.
I've experimented with a variant of Pollard's rho and with elliptic-curve factorization, but it is hard to find the right balance between trial division and switching to these more complicated methods. Also, I'm not sure I've implemented them correctly, because sometimes they seem to take a very long time to find a factor, and based on their asymptotic behavior, I'd expect them to be quite quick for such small numbers.
I have not found any information on factoring many numbers (vs just trying to factor one), but it seems like there should be some way to speed up the task by taking advantage of this.
Any suggestions, pointers to alternate approaches, or other guidance on this problem is greatly appreciated.
Edit:
The way I store the primes is by storing an 8-bit offset to the next prime, with the implicit first prime being 3. Thus, in my algorithms, I have a separate check for division by 2, then I start a loop:
import collections

factorCounts = collections.Counter()
while N % 2 == 0:
    factorCounts[2] += 1
    N //= 2

pp = 3
for gg in smallPrimeGaps:
    if pp*pp > N:
        break
    if N % pp == 0:
        while N % pp == 0:
            factorCounts[pp] += 1
            N //= pp
    pp += gg
Also, I use a wheel sieve to calculate the primes for trial division, and an algorithm based on remainders modulo several small primes to find the next prime after a given starting point.
I use the following to test whether a given number is prime (I'm porting the code to C++ now):
bool IsPrime(uint64_t n)
{
    if (n < 341531)
        return MillerRabinMulti(n, {9345883071009581737ull});
    else if (n < 1050535501)
        return MillerRabinMulti(n, {336781006125ull, 9639812373923155ull});
    else if (n < 350269456337)
        return MillerRabinMulti(n, {4230279247111683200ull, 14694767155120705706ull, 1664113952636775035ull});
    else if (n < 55245642489451)
        return MillerRabinMulti(n, {2ull, 141889084524735ull, 1199124725622454117ull, 11096072698276303650ull});
    else if (n < 7999252175582851)
        return MillerRabinMulti(n, {2ull, 4130806001517ull, 149795463772692060ull, 186635894390467037ull, 3967304179347715805ull});
    else if (n < 585226005592931977)
        return MillerRabinMulti(n, {2ull, 123635709730000ull, 9233062284813009ull, 43835965440333360ull, 761179012939631437ull, 1263739024124850375ull});
    else
        return MillerRabinMulti(n, {2ull, 325ull, 9375ull, 28178ull, 450775ull, 9780504ull, 1795265022ull});
}
I don't have a definitive answer, but I do have some observations and some suggestions.
There are about 2*10^17 primes between 2^63 and 2^64, so any program you write is going to run for a while.
Let's talk about a primality test for numbers in the range 2^63 to 2^64. Any general-purpose test will do more work than you need, so you can speed things up by writing a special-purpose test. I suggest strong-pseudoprime tests (as in Miller-Rabin) to bases 2 and 3. If either of those tests shows the number is composite, you're done. Otherwise, look up the number (binary search) in a table of strong-pseudoprimes to bases 2 and 3 (ask Google to find those tables for you). Two strong pseudoprime tests followed by a table lookup will certainly be faster than the deterministic Miller-Rabin test you are currently performing, which probably uses six or seven bases.
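A sketch of that idea in Python (the pseudoprime table below is an empty stub; in practice you would load the published list of numbers below 2^64 that are strong pseudoprimes to both bases 2 and 3):

```python
def is_strong_probable_prime(n, a):
    """One strong-pseudoprime (Miller-Rabin) round for odd n > 2, base a."""
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False

# Stub: load the published table of numbers that are strong pseudoprimes
# to BOTH bases 2 and 3 in your range; it is short enough to binary-search.
STRONG_PSEUDOPRIMES_2_3 = set()

def is_prime_spsp23(n):
    """Composite unless it passes bases 2 and 3 and is not a known exception."""
    if not (is_strong_probable_prime(n, 2) and is_strong_probable_prime(n, 3)):
        return False
    return n not in STRONG_PSEUDOPRIMES_2_3
```

Two modular exponentiations plus one set (or binary-search) lookup is the entire cost per candidate.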
For factoring, trial division to 1000 followed by Brent-Rho until the product of the known prime factors exceeds the cube root of the number being factored ought to be fairly fast, a few milliseconds. Then, if the remaining cofactor is composite, it will necessarily have only two factors, so SQUFOF would be a good algorithm to split them, faster than the other methods because all the arithmetic is done with numbers less than the square root of the number being factored, which in your case means the factorization could be done using 32-bit arithmetic instead of 64-bit arithmetic, so it ought to be fast.
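For reference, here is a compact Python sketch of Brent's variant of Pollard's rho (it assumes n is composite; for a prime input it would loop forever):

```python
import math
import random

def brent_rho(n):
    """Return one nontrivial factor of a composite n (Brent's cycle-finding
    variant of Pollard's rho, with batched gcds)."""
    if n % 2 == 0:
        return 2
    while True:
        y, c, m = random.randrange(1, n), random.randrange(1, n), 128
        g, r, q = 1, 1, 1
        while g == 1:
            x = y
            for _ in range(r):
                y = (y * y + c) % n
            k = 0
            while k < r and g == 1:
                ys = y                          # checkpoint for backtracking
                for _ in range(min(m, r - k)):
                    y = (y * y + c) % n
                    q = q * abs(x - y) % n      # batch differences into one gcd
                g = math.gcd(q, n)
                k += m
            r *= 2
        if g == n:                              # batch overshot: backtrack singly
            g = 1
            while g == 1:
                ys = (ys * ys + c) % n
                g = math.gcd(abs(x - ys), n)
        if g != n:
            return g                            # else retry with a new c
```

The batching (multiplying differences together and taking one gcd per 128 steps) is what makes Brent's version noticeably faster than textbook rho.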
Instead of factoring and primality tests, a better method uses a variant of the Sieve of Eratosthenes to factor large blocks of numbers. That will still be slow, as there are 203 million sieving primes less than 2^32, and you will need to deal with the bookkeeping of a segmented sieve, but considering that you factor lots of numbers at once, it's probably the best approach to your task.
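The per-block idea can be sketched like this (names are mine): sieve the small primes across a window of consecutive numbers and divide them out, so each small prime is visited once per block instead of once per number:

```python
def factor_block(lo, width, small_primes):
    """Factor every number in [lo, lo + width) at once: for each small prime p,
    walk its multiples inside the block and divide them out."""
    remaining = list(range(lo, lo + width))
    factors = [[] for _ in range(width)]
    for p in small_primes:
        start = ((lo + p - 1) // p) * p          # first multiple of p >= lo
        for idx in range(start - lo, width, p):
            while remaining[idx] % p == 0:
                remaining[idx] //= p
                factors[idx].append(p)
    for idx, rem in enumerate(remaining):
        if rem > 1:                              # leftover cofactor; it is prime
            factors[idx].append(rem)             # whenever rem < p_max**2
    return factors
```

With small_primes covering everything below 2^32 and blocks of a few million numbers, the inner loops touch each candidate only via its actual small divisors, which is exactly the bookkeeping win of a segmented sieve.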
I have code for everything mentioned above at my blog.
This is how I store primes for later:
(I'm going to assume you want the factors of the number, and not just a primality test).
Copied from my website http://chemicaldevelopment.us/programming/2016/10/03/PGS.html
I’m going to assume you know the binary number system for this part. If not, just think of 1 as a “yes” and 0 as a “no”.
So, there are plenty of algorithms to generate the first few primes. I use the Sieve of Eratosthenes to compute a list.
But, if we stored the primes as an array, like [2, 3, 5, 7] this would take up too much space. How much space exactly?
Well, a 32-bit integer, which can store values up to 2^32 - 1, takes up 4 bytes, because each byte is 8 bits and 32 / 8 = 4.
If we wanted to store every prime under 2,000,000,000, we would have to store over 98,000,000 numbers. That takes more space, and is slower to query at runtime, than the bitset explained below.
This approach takes 98,000,000 integers of space (each is 32 bits, which is 4 bytes), and to check a number at runtime we have to scan through the array until we find it, or until we reach a number greater than it.
For example, say I give you a small list of primes: [2, 3, 5, 7, 11, 13, 17, 19]. I ask you if 15 is prime. How do you tell me?
Well, you would go through the list and compare each to 15.
Is 2 = 15?
Is 3 = 15?
. . .
Is 17 = 15?
At this point, you can stop because you have passed where 15 would be, so you know it isn’t prime.
Now then, let’s say we use a list of bits to tell you if the number is prime. The list above would look like:
001101010001010001010
This starts at 0 and goes to 20.
The 1s mean that the index is prime, so count from the left: 0, 1, 2:
00[1]101010001010001010
The marked bit is 1, which indicates that 2 is prime.
In this case, if I asked you whether 15 is prime, you don't need to walk through all the numbers in the list; all you need to do is jump straight to index 15 and check that single bit.
And for memory usage, the first approach uses 98,000,000 integers, whereas this one can store 32 numbers in a single integer (using the list of 1s and 0s), so we would need
2,000,000,000 / 32 = 62,500,000 integers.
So it uses about 64% as much memory as the first approach, and is much faster to use.
We store the array of integers from the second approach in a file, then read it back at startup.
This uses 250 MB of RAM to hold primality data for all numbers up to 2,000,000,000.
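For the curious, here is a minimal sketch of such a bitset sieve in Python (one bit per number, packed into a bytearray; function names are mine):

```python
def sieve_bitset(limit):
    """Sieve of Eratosthenes with one bit per number: bit i set <=> i is prime."""
    nbytes = limit // 8 + 1
    bits = bytearray(b'\xff' * nbytes)          # start with every bit set
    def clear(i):
        bits[i >> 3] &= ~(1 << (i & 7)) & 0xFF  # mark i composite
    clear(0)
    clear(1)
    for p in range(2, int(limit ** 0.5) + 1):
        if (bits[p >> 3] >> (p & 7)) & 1:       # p still marked prime
            for q in range(p * p, limit + 1, p):
                clear(q)
    return bits

def is_prime_bit(bits, i):
    """Constant-time lookup: is i prime?"""
    return bool((bits[i >> 3] >> (i & 7)) & 1)
```

At one bit per number, limit = 2,000,000,000 needs about 250 MB, matching the figure above.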
You can reduce this further with wheel sieving (like what you did by storing (prime-1)/2).
I'll go a little bit more into wheel sieve.
You got it right by storing (prime - 1)/2, and 2 being a special case.
You can extend this to n# (the primorial: the product of the first n primes).
For example, with 1# = 2, you use numbers of the form (1#)*k + 1 = 2k + 1, which is exactly your (prime - 1)/2 trick.
More generally, you can use the set of linear forms (n#)*k + L, where L is the set of residues modulo n# that are coprime to n#.
So with 2# = 6, you only store info for 6k+1 and 6k+5, because L = {1, 2, 3, 5} \ {2, 3} = {1, 5}.
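A sketch of the index arithmetic for the 6k+1 / 6k+5 wheel (function names are mine): numbers coprime to 6 map to consecutive bit indices, so only one third of the numbers need storage at all.

```python
def wheel6_index(n):
    """Map n coprime to 6 to a compact index: 6k+1 -> 2k, 6k+5 -> 2k+1."""
    k, r = divmod(n, 6)
    if r == 1:
        return 2 * k
    if r == 5:
        return 2 * k + 1
    return None  # n divisible by 2 or 3; handle those as special cases

def wheel6_value(idx):
    """Inverse mapping: compact index back to the number it represents."""
    k, odd = divmod(idx, 2)
    return 6 * k + (5 if odd else 1)
```

Relative to one bit per number, this cuts memory by another factor of 3, since only 2 of every 6 residues get a slot.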
This should give you an understanding of some of the methods behind it.
You will need some way to implement the bitset itself, such as a list of 32-bit integers, or a string.
Look at: https://pypi.python.org/pypi/bitarray for a possible abstraction
Is there any difference whatsoever between using random.randrange to pick 5 digits individually, like this:
import random
a=random.randrange(0,10)
b=random.randrange(0,10)
c=random.randrange(0,10)
d=random.randrange(0,10)
e=random.randrange(0,10)
print (a,b,c,d,e)
...and picking the 5-digit number at once, like this:
x=random.randrange(0, 10000)
print (x)
Any random-number-generator differences (if any; see the section on Randomness Problems below) are minuscule compared to the utility and maintainability drawbacks of the digit-at-a-time method.
For starters, generating each digit would require a lot more code to handle perfectly normal calls like randrange(0, 1024) or randrange(0, 2**32), where the digits do not occur with equal probability. For example, over the inclusive range [0, 1023] (requiring 4 digits), the first of the four digits can never be anything other than 0 or 1, the last digit is slightly more likely to be a 0, 1, 2, or 3, and so on.
Trying to cover all the bases would rapidly make that code slower, more bug-prone, and more brittle than it already is. (The number of annoying little details you've encountered just posting this question should give you an idea what lies farther down that path.)
...and all that grief is before you consider how easily random.randrange handles non-zero start values, the step parameter, and negative arguments.
Randomness Problems
If your RNG is good, your alternative method should produce "equally random" results (assuming you've handled all the problems I mentioned above). However, if your RNG is biased, then the digit-at-a-time method will probably increase its effect on your outputs.
For demonstration purposes, assume your absurdly biased RNG has an off-by-one error, so that it never produces the last value of the given range:
The call randrange(0, 2**32) will never produce 2**32 - 1 (4,294,967,295), but the remaining 4-billion-plus values will appear in very nearly their expected probability. Its output over millions of calls would be very hard to distinguish from a working pseudo-random number generator.
Producing the ten digits of that same supposedly-random number individually will subject each digit to that same off-by-one error, resulting in a ten-digit output that consists entirely of the digits [0,8], with no 9s present... ever. This is vastly "less random" than generating the whole number at once.
And the digit-at-a-time method can never be better than the RNG backing it, even when the requested range is very small: it might magnify any RNG bias, or merely reproduce it, but it will never reduce it.
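A tiny simulation makes the amplification concrete (the biased generator here is hypothetical, purely for demonstration):

```python
import random

def biased_randrange(lo, hi):
    """Hypothetical buggy RNG with an off-by-one error: never yields hi - 1."""
    return random.randrange(lo, hi - 1)

def digit_at_a_time(n_digits):
    """Build a number digit by digit from the biased generator."""
    return int(''.join(str(biased_randrange(0, 10)) for _ in range(n_digits)))

# Every digit comes from 0..8, so the digit 9 can never appear anywhere
# in the output, no matter how many numbers you draw.
```

Drawing whole numbers from the same biased generator would merely skip one value in four billion; drawing digits bans an entire digit from every position.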
Yes, no and no.
Yes: probabilities multiply, so the digit sequences have the same probability:
prob(a and b) = prob(a) * prob(b)
Since each digit has a 0.1 chance of appearing, the probability of two particular digits in order is 0.1**2 = 0.01, which is exactly the probability of one particular number between 0 and 99 inclusive.
No: you have a typo in your second number.
The second form only has four digits; you probably meant randrange(0, 100000)
No: the output will not be the same
The second form will not print leading zeros; you could use print("%05d" % x) to get all five digits. Also, the first form prints spaces between the digits, so you could instead use print("%d%d%d%d%d" % (a, b, c, d, e)).
I have a binary search implemented in Python.
Now I want to check whether the element math.floor(n ** (1/p)) is found by my binary search.
But p is a very, very large number. Using the fractions module, I wrote:
binary_search.search(list,int (n**fractions.Fraction('1'+'/'+str(p))))
But I get the error OverflowError: integer division result too large for a float.
How can I raise n to a fractional power, and do it fast?
Unless your values of n are also incredibly large, floor(n**(1/p)) is going to tend toward 1 for "very, very large" values of p. Since you're only interested in the integer portion, you can get away with a simple loop that tests whether 1**p, 2**p, 3**p, and so on exceed n.
Don't waste time finding exact values if you don't need them.
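The loop idea can be sketched in a few lines (the function name is mine; it assumes n >= 1, and it is fast precisely when the root is tiny, which is the large-p case):

```python
def integer_root_small(n, p):
    """floor(n ** (1/p)) by direct search, entirely in integer arithmetic.
    Fast when the root itself is small, i.e. when p is very large."""
    k = 1
    while (k + 1) ** p <= n:  # exact big-int comparison, no floats involved
        k += 1
    return k
```

Because everything stays in Python's arbitrary-precision integers, there is no OverflowError to hit.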
n^(1/p) = exp(ln(n)/p) ≈ 1 + ln(n)/p for large p.
So you can compare p with the natural logarithm of n: if the ratio p/ln(n) >> 1, you can use the approximation above, and the result tends to 1.
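As a sketch of that comparison (the function name is mine), you can use math.log2, which accepts arbitrarily large Python ints without overflowing: if p > log2(n), then 2**p > n, so the p-th root is below 2 and its floor is exactly 1.

```python
import math

def floor_pth_root_big_p(n, p):
    """If p > log2(n) then 1 <= n**(1/p) < 2, so the floor is 1,
    with no large-power arithmetic at all."""
    if p > math.log2(n):   # beware float rounding if p is near log2(n)
        return 1
    # outside the 'very large p' regime, fall back to an exact integer method
    raise NotImplementedError("p is not large enough for this shortcut")
```

For the question's regime (p astronomically larger than ln(n)), the first branch always fires and the answer is immediate.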