Slow Big Int Output in python - python

Is there anyway to improve performance of "str(bigint)" and "print bigint" in python ? Printing big integer values takes a lot of time. I tried to use the following recursive technique :
def p(x,n):
if n < 10:
sys.stdout.write(str(x))
return
n >>= 1
l = 10**n
k = x/l
p(k,n)
p(x-k*l,n)
n = number of digits,
x = bigint
But the method fails for certain cases where x in a sub call has leading zeros. Is there any alternative to it or any faster method. ( Please do not suggest using any external module or library ).

Conversion from a Python integer to a string has a running of O(n^2) where n is the length of the number. For sufficiently large numbers, it will be slow. For a 1,000,001 digit number, str() takes approximately 24 seconds on my computer.
If you are really needing to convert very large numbers to a string, your recursive algorithm is a good approach.
The following version of your recursive code should work:
def p(x,n=0):
if n == 0:
n = int(x.bit_length() * 0.3)
if n < 100:
return str(x)
n >>= 1
l = 10**n
a,b = divmod(x, l)
upper = p(a,n)
lower = p(b,n).rjust(n, "0")
return upper + lower
It automatically estimates the number of digits in the output. It is about 4x faster for a 1,000,001 digit number.
If you need to go faster, you'll probably need to use an external library.

For interactive applications, the built-in print and str functions run in the blink of an eye.
>>> print(2435**356)
392312129667763499898262143039114894750417507355276682533585134425186395679473824899297157270033375504856169200419790241076407862555973647354250524748912846623242257527142883035360865888685267386832304026227703002862158054991819517588882346178140891206845776401970463656725623839442836540489638768126315244542314683938913576544051925370624663114138982037489687849052948878188837292688265616405774377520006375994949701519494522395226583576242344239113115827276205685762765108568669292303049637000429363186413856005994770187918867698591851295816517558832718248949393330804685089066399603091911285844172167548214009780037628890526044957760742395926235582458565322761344968885262239207421474370777496310304525709023682281880997037864251638836009263968398622163509788100571164918283951366862838187930843528788482813390723672536414889756154950781741921331767254375186751657589782510334001427152820459605953449036021467737998917512341953008677012880972708316862112445813219301272179609511447382276509319506771439679815804130595523836440825373857906867090741932138749478241373687043584739886123984717258259445661838205364797315487681003613101753488707273055848670365977127506840194115511621930636465549468994140625
>>> str(2435**356)
'392312129667763499898262143039114894750417507355276682533585134425186395679473824899297157270033375504856169200419790241076407862555973647354250524748912846623242257527142883035360865888685267386832304026227703002862158054991819517588882346178140891206845776401970463656725623839442836540489638768126315244542314683938913576544051925370624663114138982037489687849052948878188837292688265616405774377520006375994949701519494522395226583576242344239113115827276205685762765108568669292303049637000429363186413856005994770187918867698591851295816517558832718248949393330804685089066399603091911285844172167548214009780037628890526044957760742395926235582458565322761344968885262239207421474370777496310304525709023682281880997037864251638836009263968398622163509788100571164918283951366862838187930843528788482813390723672536414889756154950781741921331767254375186751657589782510334001427152820459605953449036021467737998917512341953008677012880972708316862112445813219301272179609511447382276509319506771439679815804130595523836440825373857906867090741932138749478241373687043584739886123984717258259445661838205364797315487681003613101753488707273055848670365977127506840194115511621930636465549468994140625'
If however you are printing big integers to (standard output, say) so that they can be read (from standard input) by another process, and you are finding the binary-to-decimal operations impacting the overall performance, you can look at Is there a faster way to convert an arbitrary large integer to a big endian sequence of bytes? (although the accepted answer suggests numpy, which is an external library, though there are other suggestions).

Related

Why i**2 is slower than i*i? [duplicate]

I'm curious as to why it's so much faster to multiply than to take powers in python (though from what I've read this may well be true in many other languages too). For example it's much faster to do
x*x
than
x**2
I suppose the ** operator is more general and can also deal with fractional powers. But if that's why it's so much slower, why doesn't it perform a check for an int exponent and then just do the multiplication?
Edit: Here's some example code I tried...
def pow1(r, n):
for i in range(r):
p = i**n
def pow2(r, n):
for i in range(r):
p = 1
for j in range(n):
p *= i
Now, pow2 is just a quick example and is clearly not optimised!
But even so I find that using n = 2 and r = 1,000,000, then pow1 takes ~ 2500ms and pow2 takes ~ 1700ms.
I admit that for large values of n, then pow1 does get much quicker than pow2. But that's not too surprising.
Basically naive multiplication is O(n) with a very low constant factor. Taking the power is O(log n) with a higher constant factor (There are special cases that need to be tested... fractional exponents, negative exponents, etc) . Edit: just to be clear, that's O(n) where n is the exponent.
Of course the naive approach will be faster for small n, you're only really implementing a small subset of exponential math so your constant factor is negligible.
Adding a check is an expense, too. Do you always want that check there? A compiled language could make the check for a constant exponent to see if it's a relatively small integer because there's no run-time cost, just a compile-time cost. An interpreted language might not make that check.
It's up to the particular implementation unless that kind of detail is specified by the language.
Python doesn't know what distribution of exponents you're going to feed it. If it's going to be 99% non-integer values, do you want the code to check for an integer every time, making runtime even slower?
Doing this in the exponent check will slow down the cases where it isn't a simple power of two very slightly, so isn't necessarily a win. However, in cases where the exponent is known in advance( eg. literal 2 is used), the bytecode generated could be optimised with a simple peephole optimisation. Presumably this simply hasn't been considered worth doing (it's a fairly specific case).
Here's a quick proof of concept that does such an optimisation (usable as a decorator). Note: you'll need the byteplay module to run it.
import byteplay, timeit
def optimise(func):
c = byteplay.Code.from_code(func.func_code)
prev=None
for i, (op, arg) in enumerate(c.code):
if op == byteplay.BINARY_POWER:
if c.code[i-1] == (byteplay.LOAD_CONST, 2):
c.code[i-1] = (byteplay.DUP_TOP, None)
c.code[i] = (byteplay.BINARY_MULTIPLY, None)
func.func_code = c.to_code()
return func
def square(x):
return x**2
print "Unoptimised :", timeit.Timer('square(10)','from __main__ import square').timeit(10000000)
square = optimise(square)
print "Optimised :", timeit.Timer('square(10)','from __main__ import square').timeit(10000000)
Which gives the timings:
Unoptimised : 6.42024898529
Optimised : 4.52667593956
[Edit]
Actually, thinking about it a bit more, there's a very good reason why this optimisaton isn't done. There's no guarantee that someone won't create a user defined class that overrides the __mul__ and __pow__ methods and do something different for each. The only way to do it safely is if you can guarantee that the object on the top of the stack is one that has the same result "x**2" and "x*x", but working that out is much harder. Eg. in my example it's impossible, as any object could be passed to the square function.
An implementation of b^p with binary exponentiation
def power(b, p):
"""
Calculates b^p
Complexity O(log p)
b -> double
p -> integer
res -> double
"""
res = 1
while p:
if p & 0x1: res *= b
b *= b
p >>= 1
return res
I'd suspect that nobody was expecting this to be all that important. Typically, if you want to do serious calculations, you do them in Fortran or C or C++ or something like that (and perhaps call them from Python).
Treating everything as exp(n * log(x)) works well in cases where n isn't integral or is pretty large, but is relatively inefficient for small integers. Checking to see if n is a small enough integer does take time, and adds complication.
Whether the check is worth it depends on the expected exponents, how important it is to get best performance here, and the cost of the extra complexity. Apparently, Guido and the rest of the Python gang decided the check wasn't worth doing.
If you like, you could write your own repeated-multiplication function.
how about xxxxx?
is it still faster than x**5?
as int exponents gets larger, taking powers might be faster than multiplication.
but the number where actual crossover occurs depends on various conditions, so in my opinion, that's why the optimization was not done(or couldn't be done) in language/library level. But users can still optimize for some special cases :)

Why Lua and Python factorial output was different

--t.lua
function fact(n)
if n == 0 then
return 1
else
return n * fact(n-1)
end
end
for i=1,100,1 do
print(i,fact(i))
end
# t.py
fact = lambda n:1 if n == 0 else n * fact(n-1)
for i in range(1, 100):
print(i, fact(i))
When I write a factorial code in Lua and in Python, I found that output was different.
Lua as usually configured uses your platform's usual double-precision floating point format to store all numbers (this means all number types). For most desktop platforms today, that will be the 64-bit IEEE-754 format. The conventional wisdom is that integers in the range -1E15 to +1E15 can be safely assumed to be represented exactly. To deal with huge numbers in Lua, key words are "bignum" and "arbitrary precision numbers". You can use pure-Lua modules. for example (bignum and lua-nums) and C-based module lmapm. Also read this thread.
Python supports a such-known "bignum" integer type which can work with arbitrarily large numbers. In Python 2.5+, this type is called long and is separate from the int type, but the interpreter will automatically use whichever is more appropriate. In Python 3.0+, the int type has been dropped completely. In Python usually you don't need to use special tools to deal with huge numbers.
This is basic example with lbn library library
local bn = require "bn"
function bn_fact(n)
if n:tonumber() == 0 then return 1 end
return n * bn_fact(n-1)
end
function fact(n)
return bn_fact(bn.number(n))
end
for i=1,100,1 do
print(i,fact(i))
end
Output for some values
30 265252859812191058636308480000000
31 8222838654177922817725562880000000
32 263130836933693530167218012160000000
33 8683317618811886495518194401280000000
You have an overflow on your first image because values are to big to be stored on that var.

Pseudorandom Algorithm for VERY Large (10^1.2mil) Numbers?

I'm looking for a pseudo-random number generator (an algorithm where you input a seed number and it outputs a different 'random-looking' number, and the same seed will always generate the same output) for numbers between 1 and 951,312,000.
I would use the Linear Feedback Shift Register (LFSR) PRNG, but if I did, I would have to convert the seed number (which could be up to 1.2 million digits long in base-10) into a binary number, which would be so massive that I think it would take too long to compute.
In response to a similar question, the Feistel cipher was recommended, but I didn't understand the vocabulary of the wiki page for that method (I'm going into 10th grade so I don't have a degree in encryption), so if you could use layman's terms, I would strongly appreciate it.
Is there an efficient way of doing this which won't take until the end of time, or is this problem impossible?
Edit: I forgot to mention that the prng sequence needs to have a full period. My mistake.
A simple way to do this is to use a linear congruential generator with modulus m = 95^1312000.
The formula for the generator is x_(n+1) = a*x_n + c (mod m). By the Hull-Dobell Theorem, it will have full period if and only if gcd(m,c) = 1 and 95 divides a-1. Furthermore, if you want good second values (right after the seed) even for very small seeds, a and c should be fairly large. Also, your code can't store these values as literals (they would be much too big). Instead, you need to be able to reliably produce them on the fly. After a bit of trial and error to make sure gcd(m,c) = 1, I hit upon:
import random
def get_book(n):
random.seed(1941) #Borges' Library of Babel was published in 1941
m = 95**1312000
a = 1 + 95 * random.randint(1, m//100)
c = random.randint(1, m - 1) #math.gcd(c,m) = 1
return (a*n + c) % m
For example:
>>> book = get_book(42)
>>> book % 10**100
4779746919502753142323572698478137996323206967194197332998517828771427155582287891935067701239737874
shows the last 100 digits of "book" number 42. Given Python's built-in support for large integers, the code runs surprisingly fast (it takes less than 1 second to grab a book on my machine)
If you have a method that can produce a pseudo-random digit, then you can concatenate as many together as you want. It will be just as repeatable as the underlying prng.
However, you'll probably run out of memory scaling that up to millions of digits and attempting to do arithmetic. Normally stuff on that scale isn't done on "numbers". It's done on byte vectors, or something similar.

How do I represent a string as a number?

I need to represent a string as a number, however it is 8928313 characters long, note this string can contain more than just alphabet letters, and I have to be able to convert it back efficiently too. My current (too slow) code looks like this:
alpha = 'abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ,.?!#()+-=[]/*1234567890^*{}\'"$\\&#;|%<>:`~_'
alphaLeng = len(alpha)
def letterNumber(letters):
letters = str(letters)
cof = 1
nr = 0
for i in range(len(letters)):
nr += cof*alpha.find(letters[i])
cof *= alphaLeng
print(i,' ',len(letters))
return str(nr)
Ok, since other people are giving awful answers, I'm going to step in.
You shouldn't do this.
You shouldn't do this.
An integer and an array of characters are ultimately the same thing: bytes. You can access the values in the same way.
Most number representations cap out at 8 bytes (64-bits). You're looking at 8 MB, or 1 million times the largest integer representation. You shouldn't do this. Really.
You shouldn't do this. Your number will just be a custom, gigantic number type that would be identical under the hood.
If you really want to do this, despite all the reasons above, here's how...
Code
def lshift(a, b):
# bitwise left shift 8
return (a << (8 * b))
def string_to_int(data):
sum_ = 0
r = range(len(data)-1, -1, -1)
for a, b in zip(bytearray(data), r):
sum_ += lshift(a, b)
return sum_;
DONT DO THIS
Explanation
Characters are essentially bytes: they can be encoded in different ways, but ultimately you can treat them within a given encoding as a sequence of bytes. In order to convert them to a number, we can shift them left 8-bits for their position in the sequence, creating a unique number. r, the range value, is the position in reverse order: the 4th element needs to go left 24 bytes (3*8), etc.
After getting the range and converting our data to 8-bit integers, we can then transform the data and take the sum, giving us our unique identifier. It will be identical byte-wise (or in reverse byte-order) of the original number, but just "as a number". This is entirely futile. Don't do it.
Performance
Any performance is going to be outweighed by the fact that you're creating an identical object for no valid reason, but this solution is decently performant.
1,000 elements takes ~486 microseconds, 10,000 elements takes ~20.5 ms, while 100,000 elements takes about 1.5 seconds. It would work, but you shouldn't do it. This means it's scaled as O(n**2), which is likely due to memory overhead of reallocating the data each time the integer size gets larger. This might take ~4 hours to process all 8e6 elements (14365 seconds, calculated fitting the lower-order data to ax**2+bx+c). Remember, this is all to get the identical byte representation as the original data.
Futility
Remember, there are ~1e78 to 1e82 atoms in the entire universe, on current estimates. This is ~2^275. Your value will be able to represent 2^71426504, or about 260,000 times as many bits as you need to represent every atom in the universe. You don't need such a number. You never will.
If there are only ANSII characters. You can use ord() and chr().
built-in functions
There are several optimizations you can perform. For example, the find method requires searching through your string for the corresponding letter. A dictionary would be faster. Even faster might be (benchmark!) the chr function (if you're not too picky about the letter ordering) and the ord function to reverse the chr. But if you're not picky about ordering, it might be better if you just left-NULL-padded your string and treated it as a big binary number in memory if you don't need to display the value in any particular format.
You might get some speedup by iterating over characters instead of character indices. If you're using Python 2, a large range will be slow since a list needs to be generated (use xrange instead for Python 2); Python 3 uses a generator, so it's better.
Your print function is going to slow down output a fair bit, especially if you're outputting to a tty.
A big number library may also buy you speed-up: Handling big numbers in code
Your alpha.find() function needs to iterate through alpha on each loop.
You can probably speed things up by using a dict, as dictionary lookups are O(1):
alpha = 'abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ,.?!#()+-=[]/*1234567890^*{}\'"$\\&#;|%<>:`~_'
alpha_dict = { letter: index for index, letter in enumerate(alpha)}
print(alpha.find('$'))
# 83
print(alpha_dict['$'])
# 83
Store your strings in an array of distinct values; i.e. a string table. In your dataset, use a reference number. A reference number of n corresponds to the nth element of the string table array.

Reversing pow function - finding the power [duplicate]

Given positive integers b, c, m where (b < m) is True it is to find a positive integer e such that
(b**e % m == c) is True
where ** is exponentiation (e.g. in Ruby, Python or ^ in some other languages) and % is modulo operation. What is the most effective algorithm (with the lowest big-O complexity) to solve it?
Example:
Given b=5; c=8; m=13 this algorithm must find e=7 because 5**7%13 = 8
From the % operator I'm assuming that you are working with integers.
You are trying to solve the Discrete Logarithm problem. A reasonable algorithm is Baby step, giant step, although there are many others, none of which are particularly fast.
The difficulty of finding a fast solution to the discrete logarithm problem is a fundamental part of some popular cryptographic algorithms, so if you find a better solution than any of those on Wikipedia please let me know!
This isn't a simple problem at all. It is called calculating the discrete logarithm and it is the inverse operation to a modular exponentation.
There is no efficient algorithm known. That is, if N denotes the number of bits in m, all known algorithms run in O(2^(N^C)) where C>0.
Python 3 Solution:
Thankfully, SymPy has implemented this for you!
SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python.
This is the documentation on the discrete_log function. Use this to import it:
from sympy.ntheory import discrete_log
Their example computes \log_7(15) (mod 41):
>>> discrete_log(41, 15, 7)
3
Because of the (state-of-the-art, mind you) algorithms it employs to solve it, you'll get O(\sqrt{n}) on most inputs you try. It's considerably faster when your prime modulus has the property where p - 1 factors into a lot of small primes.
Consider a prime on the order of 100 bits: (~ 2^{100}). With \sqrt{n} complexity, that's still 2^{50} iterations. That being said, don't reinvent the wheel. This does a pretty good job. I might also add that it was almost 4x times more memory efficient than Mathematica's MultiplicativeOrder function when I ran with large-ish inputs (44 MiB vs. 173 MiB).
Since a duplicate of this question was asked under the Python tag, here is a Python implementation of baby step, giant step, which, as #MarkBeyers points out, is a reasonable approach (as long as the modulus isn't too large):
def baby_steps_giant_steps(a,b,p,N = None):
if not N: N = 1 + int(math.sqrt(p))
#initialize baby_steps table
baby_steps = {}
baby_step = 1
for r in range(N+1):
baby_steps[baby_step] = r
baby_step = baby_step * a % p
#now take the giant steps
giant_stride = pow(a,(p-2)*N,p)
giant_step = b
for q in range(N+1):
if giant_step in baby_steps:
return q*N + baby_steps[giant_step]
else:
giant_step = giant_step * giant_stride % p
return "No Match"
In the above implementation, an explicit N can be passed to fish for a small exponent even if p is cryptographically large. It will find the exponent as long as the exponent is smaller than N**2. When N is omitted, the exponent will always be found, but not necessarily in your lifetime or with your machine's memory if p is too large.
For example, if
p = 70606432933607
a = 100001
b = 54696545758787
then 'pow(a,b,p)' evaluates to 67385023448517
and
>>> baby_steps_giant_steps(a,67385023448517,p)
54696545758787
This took about 5 seconds on my machine. For the exponent and the modulus of those sizes, I estimate (based on timing experiments) that brute force would have taken several months.
Discrete logarithm is a hard problem
Computing discrete logarithms is believed to be difficult. No
efficient general method for computing discrete logarithms on
conventional computers is known.
I will add here a simple bruteforce algorithm which tries every possible value from 1 to m and outputs a solution if it was found. Note that there may be more than one solution to the problem or zero solutions at all. This algorithm will return you the smallest possible value or -1 if it does not exist.
def bruteLog(b, c, m):
s = 1
for i in xrange(m):
s = (s * b) % m
if s == c:
return i + 1
return -1
print bruteLog(5, 8, 13)
and here you can see that 3 is in fact the solution:
print 5**3 % 13
There is a better algorithm, but because it is often asked to be implemented in programming competitions, I will just give you a link to explanation.
as said the general problem is hard. however a prcatical way to find e if and only if you know e is going to be small (like in your example) would be just to try each e from 1.
btw e==3 is the first solution to your example, and you can obviously find that in 3 steps, compare to solving the non discrete version, and naively looking for integer solutions i.e.
e = log(c + n*m)/log(b) where n is a non-negative integer
which finds e==3 in 9 steps

Categories

Resources