RSA algorithm fails for some strings - python

Exercise 11.7 from Chapter 11 of Think Python, by Allen B. Downey, prompts the following:
Exponentiation of large integers is the basis of common algorithms for public-key encryption. Read the Wikipedia page on the RSA algorithm and write functions to encode and decode messages.
I translated the plain English algorithm from the Key Generation section of the Wikipedia article. The function RSA takes a string and encodes it as an ASCII integer, creates public and private keys, encrypts the encoded integer using the public key, decrypts it using the private key, and then decodes the integer back into a string:
import math
def encode(s):
    m = ''
    for c in s:
        # each character in the message is converted to an ASCII decimal value
        m += str(ord(c))
    return int(m)
def decode(i):
    # convert decrypted int back to string
    temp = str(i)
    decoded = ''
    while len(temp):
        if temp[:1] == '1':
            letter = chr(int(temp[:3]))
            temp = temp[3:]
        else:
            letter = chr(int(temp[0:2]))
            temp = temp[2:]
        decoded += letter
    return decoded
def generate_keys(m):
    # requires an int m that is an int representation (in ascii) of our message
    # 1. Choose two distinct prime numbers p and q.
    ##### TEST WIKIPEDIA EXAMPLE #####
    # p, q = 61, 53
    # find two prime numbers whose product n will be greater than m
    i = math.ceil(math.sqrt(m))
    primes = []
    while len(primes) < 2:
        if isPrime(i):
            primes += [i]
        i += 1
    p, q = primes[0], primes[1]
    # 2. Compute n = pq.
    n = p * q
    in_range = int(m) < n
    if not in_range:
        print('m must be less than n')
        exit()
    # 3. Compute λ(n), where λ is Carmichael's totient function. Since n = pq, λ(n) = lcm(λ(p),λ(q)), and since p and q are prime, λ(p) = φ(p) = p − 1 and likewise λ(q) = q − 1. Hence λ(n) = lcm(p − 1, q − 1).
    # The lcm may be calculated through the Euclidean algorithm, since lcm(a,b) = |ab|/gcd(a,b).
    a, b = (p - 1), (q - 1)
    lam = int(abs(a * b) / math.gcd(a, b))
    # print('lam:', lam)
    # 4. Choose an integer e such that 1 < e < λ(n) and gcd(e, λ(n)) = 1; that is, e and λ(n) are coprime.
    ##### TEST WIKIPEDIA EXAMPLE #####
    # e = 17
    # according to wikipedia, the smallest (and fastest) possible value for e is 3: https://en.wikipedia.org/wiki/RSA_(cryptosystem)#Key_generation
    i = 3
    while i < lam:
        if math.gcd(i, int(lam)) == 1:
            e = i
            break
        i += 1
    # 5. compute d where d * e ≡ 1 (mod lam)
    d = int(modInverse(e, lam))
    return {'public': [n, e], 'private': d}
def encrypt_message(m, public):
    n, e = public[0], public[1]
    # generate ciphertext
    return pow(m, e, n)
def decrypt_message(cipher, n, d):
    # decrypt ciphertext
    return pow(cipher, d, n)
def RSA(s):
    m = encode(s)
    keys = generate_keys(m)
    cipher = encrypt_message(m, keys['public'])
    message = decrypt_message(cipher, keys['public'][0], keys['private'])
    messages_match = message == m
    if not messages_match:
        print('the decoded integer does not equal the encoded integer')
        exit()
    return decode(message)
# taken from https://www.geeksforgeeks.org/python-program-to-check-whether-a-number-is-prime-or-not/
def isPrime(n) :
    # Corner cases
    if (n <= 1) :
        return False
    if (n <= 3) :
        return True
    # This is checked so that we can skip
    # middle five numbers in below loop
    if (n % 2 == 0 or n % 3 == 0) :
        return False
    i = 5
    while(i * i <= n) :
        if (n % i == 0 or n % (i + 2) == 0) :
            return False
        i = i + 6
    return True
# modInverse(a, m) taken from https://www.geeksforgeeks.org/multiplicative-inverse-under-modulo-m/
def modInverse(a, m) :
    m0 = m
    y = 0
    x = 1
    if (m == 1) :
        return 0
    while (a > 1) :
        # q is quotient
        q = a // m
        t = m
        # m is remainder now, process
        # same as Euclid's algo
        m = a % m
        a = t
        t = y
        # Update x and y
        y = x - q * y
        x = t
    # Make x positive
    if (x < 0) :
        x = x + m0
    return x
if __name__ == "__main__":
    print(RSA('python')) # encrypts and decrypts message 'python'
    print(RSA('cat')) # encrypts and decrypts message 'cat'
    print(RSA('dog')) # encrypts and decrypts message 'dog'
    print(RSA('a 1')) # (to rule out spaces as the culprit) encrypts and decrypts message 'a 1'
    print(RSA('pythons')) # FAILS - 7 characters in string seems to be the limit
    print(RSA('hello world')) # FAILS - encoded string does not equal decoded string
At the end of the script, I test RSA with several strings. The output equals the input for 'python', 'cat', 'dog', and 'a 1' (the intended behavior). But the encoded and decoded integers are different for 'pythons' and 'hello world'.
I suspect that it is the input string length that is causing the problem, but I'm not sure where that is causing the breakdown. My guess is that pow(cipher, d, n) in the function decrypt_message is returning an unexpected result if cipher is of too great a length.
Why does RSA work for some strings but not others? Is it the length of the input string or something else?
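One possible explanation, offered as a hypothesis rather than a confirmed diagnosis: in Python 3 the / operator always returns a float, so int(abs(a * b) / math.gcd(a, b)) can silently round once the value exceeds 2**53. A wrong λ(n) gives a wrong d, and decryption then returns a different integer. math.sqrt(m) is subject to the same float limit. The sketch below contrasts the two ways of computing the lcm; the helper names lcm_float and lcm_exact are made up for this illustration.

```python
import math

def lcm_float(a, b):
    # Mirrors the expression in generate_keys: true division goes through a float.
    return int(abs(a * b) / math.gcd(a, b))

def lcm_exact(a, b):
    # Floor division keeps every intermediate value an exact Python int.
    return abs(a * b) // math.gcd(a, b)

big = 2**53 + 1  # smallest positive integer a float cannot represent exactly
print(lcm_exact(big, 1) == big)   # exact arithmetic preserves the value
print(lcm_float(big, 1) == big)   # the float round-trip does not
```

If this is indeed the cause, using floor division (//) for the lcm and math.isqrt (Python 3.8+) for the square root would keep the key generation in exact integer arithmetic.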

Related

how do I identify sequence equation Python

I am able to identify the type of a sequence, but not its formula.
Here is the whole code:
def analyse_sequence_type(y:list[int]):
    if len(y) >= 5:
        res = {"linear":[],"quadratic":[],"exponential":[],"cubic":[]}
        for i in reversed(range(len(y))):
            if i-2>=0 and (y[i] + y[i-2] == 2*y[i-1]): res["linear"].append(True)
            elif i-3>=0 and (y[i] - 2*y[i-1] + y[i-2] == y[i-1] - 2*y[i-2] + y[i-3]): res["quadratic"].append(True)
        for k, v in res.items():
            if v:
                if k == "linear" and len(v)+2 == len(y): return k
                elif k == "quadratic" and len(v)+3 == len(y): return k
        return
    print(f"A relation cannot be made with just {len(y)} values.\nPlease enter a minimum of 5 values!")
    return
I can identify linear and quadratic sequences, but how do I make a function that returns the formula?
First, we need two functions, one for linear and one for quadratic sequences (formulae below).
def linear(y):
    """
    Returns equation in format (str)
    y = mx + c
    """
    d = y[1]-y[0] # common difference (the slope m)
    c = y[0]-d # intercept, since f(1) = d + c
    if d == 0: return f"f(x) = {c} ; f(1) = {y[0]}" # constant sequence: no x term
    m = "" if d == 1 else "-" if d == -1 else d # hide a coefficient of 1
    tail = f" {c:+}" if c else "" # drop a zero intercept
    return f"f(x) = {m}x{tail} ; f(1) = {y[0]}"
We apply a similar logic for quadratic:
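def quadratic(y):
    """
    Returns equation in format (str)
    y = ax² + bx + c
    """
    a = logic_round((y[2] - 2*y[1] + y[0])/2) # half the second difference
    b = logic_round(y[1] - y[0] - 3*a) # from f(2) - f(1) = 3a + b
    c = logic_round(y[0]-a-b) # from f(1) = a + b + c
    lead = "" if a == 1 else "-" if a == -1 else a # hide a leading coefficient of 1
    mid = "" if b == 0 else " +x" if b == 1 else " -x" if b == -1 else f" {b:+}x"
    tail = f" {c:+}" if c else "" # drop a zero constant term
    return f"f(x) = {lead}x²{mid}{tail} ; f(1) = {y[0]}"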
If you try the code with float values such as 5.0, you will get output like 5.0x + 4. To avoid that, try:
def logic_round(num):
    splitted = str(num).split('.') # split decimal
    if len(splitted)>1 and len(set(splitted[-1])) == 1 and splitted[-1].startswith('0'): return int(splitted[0]) # check if it is int.0 or similar
    elif len(splitted)>1: return float(num) # else returns float
    return int(num)
The above functions work provided that y is a list of the values f(1), f(2), f(3), ... (that is, the domain starts at x = 1).
Hope this helps :) Also give cubic a try.
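As a starting point for the cubic case, here is a minimal sketch based on finite differences: a sequence generated by a polynomial of degree k has constant k-th differences. This is an alternative to the element-by-element checks in the question, not the answerer's method, and the function name sequence_degree is hypothetical.

```python
def sequence_degree(y):
    """Return the polynomial degree suggested by finite differences, or None."""
    diffs = list(y)
    degree = 0
    # Stop while at least two values remain; a single value proves nothing.
    while len(diffs) > 1:
        if len(set(diffs)) == 1:      # all differences equal: degree found
            return degree
        # Replace the list by the differences of neighbouring terms.
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
        degree += 1
    return None

print(sequence_degree([2, 4, 6, 8, 10]))      # linear
print(sequence_degree([1, 4, 9, 16, 25]))     # quadratic
print(sequence_degree([1, 8, 27, 64, 125]))   # cubic
```

A result of 1, 2 or 3 could then be dispatched to linear, quadratic or a cubic function written along the same lines.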

How to take numbers from a .txt list and save the result in another .txt?

This script evaluates a formula in which the numbers x1 and x2 are set at the beginning of the code.
I need to change the code so that the value of x1 is taken from a list in a
pre-prepared text document,
for example 'List.txt'.
In other words, it seems I need to put something like:
with open("List.txt", "r") as f:
in place of the value x1 = 6 in the code. But how do I do that systematically? I don't know much Python.
List of numbers:
1
4
2
15
6
8
13
3
12
5
10
7
14
9
11
Code: (Powered by Python 2.7)
import sys
a=0
b=7
p=37
x1=6
x2=8
if (len(sys.argv)>1):
    x1=int(sys.argv[1])
if (len(sys.argv)>2):
    x2=int(sys.argv[2])
if (len(sys.argv)>3):
    p=int(sys.argv[3])
if (len(sys.argv)>4):
    a=int(sys.argv[4])
if (len(sys.argv)>5):
    b=int(sys.argv[5])
def modular_sqrt(a, p):
    """ Find a quadratic residue (mod p) of 'a'. p
    must be an odd prime.
    Solve the congruence of the form:
    x^2 = a (mod p)
    And returns x. Note that p - x is also a root.
    0 is returned if no square root exists for
    these a and p.
    The Tonelli-Shanks algorithm is used (except
    for some simple cases in which the solution
    is known from an identity). This algorithm
    runs in polynomial time (unless the
    generalized Riemann hypothesis is false).
    """
    # Simple cases
    #
    if legendre_symbol(a, p) != 1:
        return 0
    elif a == 0:
        return 0
    elif p == 2:
        return p
    elif p % 4 == 3:
        return pow(a, (p + 1) / 4, p)
    # Partition p-1 to s * 2^e for an odd s (i.e.
    # reduce all the powers of 2 from p-1)
    #
    s = p - 1
    e = 0
    while s % 2 == 0:
        s /= 2
        e += 1
    # Find some 'n' with a legendre symbol n|p = -1.
    # Shouldn't take long.
    #
    n = 2
    while legendre_symbol(n, p) != -1:
        n += 1
    x = pow(a, (s + 1) / 2, p)
    b = pow(a, s, p)
    g = pow(n, s, p)
    r = e
    while True:
        t = b
        m = 0
        for m in xrange(r):
            if t == 1:
                break
            t = pow(t, 2, p)
        if m == 0:
            return x
        gs = pow(g, 2 ** (r - m - 1), p)
        g = (gs * gs) % p
        x = (x * gs) % p
        b = (b * g) % p
        r = m
def legendre_symbol(a, p):
    """ Compute the Legendre symbol a|p using
    Euler's criterion. p is a prime, a is
    relatively prime to p (if p divides
    a, then a|p = 0)
    Returns 1 if a has a square root modulo
    p, -1 otherwise.
    """
    ls = pow(a, (p - 1) / 2, p)
    return -1 if ls == p - 1 else ls
def egcd(a, b):
    if a == 0:
        return (b, 0, 1)
    else:
        g, y, x = egcd(b % a, a)
        return (g, x - (b // a) * y, y)
def modinv(a, m):
    g, x, y = egcd(a, m)
    if g != 1:
        print ("x")
    else:
        return x % m
print "a=",a
print "b=",b
print "p=",p
print "x-point=",x1
print "x-point=",x2
z=(x1**3 + a*x1 +b) % p
y1=modular_sqrt(z, p)
z=(x2**3 + a*x2 +b) % p
y2=modular_sqrt(z, p)
print "\nP1\t(%d,%d)" % (x1,y1)
print "P2\t(%d,%d)" % (x2,y2)
s=((-y2)-y1)* modinv(x2-x1,p)
x3=(s**2-x2-x1) % p
y3=((s*(x2-x3)+y2)) % p
result = "Q\t(%d,%d)" % (x3,y3)
f = open('Result01.txt', 'w')
f.write(result)
f.close()
Earlier, I saw scripts where numbers are taken from one text document, perform a function, and the result is saved in another text document.
Try using the pandas library to read, process and write your numbers.
import pandas as pd # import pandas module and call it pd for short
x2 = 6
df = pd.read_csv('input_file.txt') # read the data from a text file into a dataframe (the first line of the file must be the column header, here x1)
df['x1 times x2'] = df['x1'] * x2 # create new column in your dataframe with result of your function
df.to_csv('output_file.txt', index=False) # output result of your calculations (dropping the dataframe index column)
Although you're hard coding the values of x1, x2, in your code, they can be redefined, as you're doing here:
if (len(sys.argv)>1):
x1=int(sys.argv[1])
if (len(sys.argv)>2):
x2=int(sys.argv[2])
So if you call your script from the command line, like C:\Users\test.py x1value x2value, you can redefine x1 and x2. If you really want a text file to contain your x1 and x2, just use the following snippet somewhere near the top:
import io, json
# io.open accepts an encoding argument on both Python 2.7 and Python 3
with io.open("input.json","r",encoding="utf-8") as stream:
    parsed = json.load(stream)
x1,x2 = parsed["x1"],parsed["x2"]
Contents of "input.json":
{"x1":1,"x2":2}
With plain Python and no extra dependencies, you can read List.txt as follows:
with open("List.txt","r") as f:
    arrX1 = list(map(int,f.readlines()))
print (arrX1)
The above reads all the lines in f and converts/maps them to integers. The list function then gives you a list you can loop through to compute a result for each x1 and write it to the result file.
The above prints
[1, 4, 2, 15, 6, 8, 13, 3, 12, 5, 10, 7, 14, 9, 11]
So in your code, replace all lines from 125 downward with
# Read numbers from file and put them in a list
with open("List.txt","r") as f:
    arrX1 = list(map(int,f.readlines()))
# Open the result file to write to
f = open('Result01.txt', 'w')
# Now get x1 for each item in the list of numbers from the file
# then do the calculations
# and write the result
for x1 in arrX1:
    dx = (x2 - x1) % p # reduce mod p first: egcd misreports the gcd for negative input
    if dx == 0:
        continue # x1 equals x2 mod p, so there is no inverse; skip this value
    z=(x1**3 + a*x1 +b) % p
    y1=modular_sqrt(z, p)
    z=(x2**3 + a*x2 +b) % p
    y2=modular_sqrt(z, p)
    print "\nP1\t(%d,%d)" % (x1,y1)
    print "P2\t(%d,%d)" % (x2,y2)
    s=((-y2)-y1)* modinv(dx,p)
    x3=(s**2-x2-x1) % p
    y3=((s*(x2-x3)+y2)) % p
    result = "Q\t(%d,%d)" % (x3,y3)
    f.write(result + "\n") # one result per line
f.close()
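The general pattern behind these answers (read one number per line, apply a function, write one result per line) can be sketched as below. This is a Python 3 illustration rather than a drop-in for the Python 2.7 script; the names transform_lines and process_file are hypothetical, and the squaring function in the demo merely stands in for the real calculation.

```python
def transform_lines(lines, func):
    """Parse one integer per line, apply func, and return formatted result lines."""
    results = []
    for line in lines:
        line = line.strip()
        if not line:                      # tolerate blank lines in the input
            continue
        x1 = int(line)
        results.append("%d\t%s\n" % (x1, func(x1)))
    return results

def process_file(in_path, out_path, func):
    """Read in_path, transform every number with func, and write out_path."""
    with open(in_path, "r") as src:
        out_lines = transform_lines(src, func)
    with open(out_path, "w") as dst:
        dst.writelines(out_lines)

# In-memory demo; a real run would look like process_file("List.txt", "Result01.txt", some_function).
print(transform_lines(["1\n", "4\n", "2\n"], lambda x: x * x))
```

Keeping the parsing separate from the file handling makes the transformation easy to check without touching the disk.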

Convert a number to Excel’s base 26

OK, I'm stuck on something seemingly simple. I am trying to convert a number to base 26 (i.e. 3 = C, 27 = AA, etc.). I am guessing my problem has to do with not having a 0 in the model? Not sure. But if you run the code, you will see that the results for 52, 104 and especially the numbers around 676 are really weird. Can anyone give me a hint as to what I am not seeing? I would appreciate it. (Just in case, to avoid wasting your time: @ is ASCII char 64, A is ASCII char 65.)
def toBase26(x):
    x = int(x)
    if x == 0:
        return '0'
    if x < 0:
        negative = True
        x = abs(x)
    else:
        negative = False
    def digit_value (val):
        return str(chr(int(val)+64))
    digits = 1
    base26 = ""
    while 26**digits < x:
        digits += 1
    while digits != 0:
        remainder = x%(26**(digits-1))
        base26 += digit_value((x-remainder)/(26**(digits-1)))
        x = remainder
        digits -= 1
    if negative:
        return '-'+base26
    else:
        return base26
import io
with io.open('numbers.txt','w') as f:
    for i in range(1000):
        f.write('{} is {}\n'.format(i,toBase26(i)))
So, I found a temporary workaround by making a couple of changes to my function (the two if statements in the second while loop). My columns are limited to 500 anyway, and the following change seems to do the trick up to x = 676, so I am satisfied. However, if any of you find a general solution for any x (maybe my code will help), that would be pretty cool!
def toBase26(x):
    x = int(x)
    if x == 0:
        return '0'
    if x < 0:
        negative = True
        x = abs(x)
    else:
        negative = False
    def digit_value (val):
        return str(chr(int(val)+64))
    digits = 1
    base26 = ""
    while 26**digits < x:
        digits += 1
    while digits != 0:
        remainder = x%(26**(digits-1))
        if remainder == 0:
            remainder += 26**(digits-1)
        if digits == 1:
            remainder -= 1
        base26 += digit_value((x-remainder)/(26**(digits-1)))
        x = remainder
        digits -= 1
    if negative:
        return '-'+base26
    else:
        return base26
The problem when converting to Excel’s “base 26” is that for Excel, the number ZZ is actually 26 * 26**1 + 26 * 26**0 = 702, while a normal base-26 number system (digits 0 to 25, with A as 0) would write 702 as 1 * 26**2 + 1 * 26**1 + 0 * 26**0, that is, BBA. So we cannot use the usual conversion methods here.
Instead, we have to roll our own divmod_excel function:
def divmod_excel(n):
    a, b = divmod(n, 26)
    if b == 0:
        return a - 1, b + 26
    return a, b
With that, we can create a to_excel function:
import string
def to_excel(num):
    chars = []
    while num > 0:
        num, d = divmod_excel(num)
        chars.append(string.ascii_uppercase[d - 1])
    return ''.join(reversed(chars))
The other direction is a bit simpler:
import string
from functools import reduce
def from_excel(chars):
    return reduce(lambda r, x: r * 26 + x + 1, map(string.ascii_uppercase.index, chars), 0)
This set of functions does the right thing:
>>> to_excel(26)
'Z'
>>> to_excel(27)
'AA'
>>> to_excel(702)
'ZZ'
>>> to_excel(703)
'AAA'
>>> from_excel('Z')
26
>>> from_excel('AA')
27
>>> from_excel('ZZ')
702
>>> from_excel('AAA')
703
And we can confirm that the two functions are inverses of each other by checking that chaining them reproduces the original number:
for i in range(100000):
    if from_excel(to_excel(i)) != i:
        print(i)
# (prints nothing)
Simplest way, if you do not want to do it yourself:
from openpyxl.utils import get_column_letter
proper_excel_column_letter = get_column_letter(5)
# will equal "E"
Sorry, I wrote this in Pascal and know no Python
function NumeralBase26Excel(numero: Integer): string;
var
  algarismo: Integer;
begin
  Result := '';
  numero := numero - 1;
  if numero >= 0 then
  begin
    algarismo := numero mod 26;
    if numero < 26 then
      Result := Chr(Ord('A') + algarismo)
    else
      Result := NumeralBase26Excel(numero div 26) + Chr(Ord('A') + algarismo);
  end;
end;
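For readers who prefer Python, the Pascal function above translates roughly as follows. This is a line-by-line rendering, not code from the original answer; the snake_case function name is an adaptation, while the variable names are kept from the Pascal.

```python
def numeral_base26_excel(numero):
    """Return the Excel column letters for a 1-based column number."""
    numero -= 1                      # shift to 0-based so that A..Z map to 0..25
    if numero < 0:
        return ''                    # mirrors the empty Result for numero < 1
    algarismo = numero % 26          # value of the last letter
    if numero < 26:
        return chr(ord('A') + algarismo)
    # Recurse on the remaining leading letters, then append the last one.
    return numeral_base26_excel(numero // 26) + chr(ord('A') + algarismo)

print(numeral_base26_excel(27))
```

Subtracting 1 before every division is what replaces the "borrow" step used by the iterative versions.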
You can do it in one line (with line continuations for easier reading). Written here in VBA:
Function sColumn(nColumn As Integer) As String
' Return Excel column letter for a given column number.
' 703 = 26^2 + 26^1 + 26^0
' 64 = Asc("A") - 1
sColumn = _
IIf(nColumn < 703, "", Chr(Int((Int((nColumn - 1) / 26) - 1) / 26) + 64)) & _
IIf(nColumn < 27, "", Chr( ((Int((nColumn - 1) / 26) - 1) Mod 26) + 1 + 64)) & _
Chr( ( (nColumn - 1) Mod 26) + 1 + 64)
End Function
Or you can do it in the worksheet:
=if(<col num> < 703, "", char(floor((floor((<col num> - 1) / 26, 1) - 1) / 26, 1) + 64)) &
if(<col num> < 27, "", char(mod( floor((<col num> - 1) / 26, 1) - 1, 26) + 1 + 64)) &
char(mod( <col num> - 1 , 26) + 1 + 64)
I've also posted the inverse operation done similarly.
Based on @TheUltimateOptimist's answer, I looked into the openpyxl implementation and found the actual algorithm used by openpyxl==3.0.10.
Be warned: it only supports values between 1 and 18278 (inclusive).
def _get_column_letter(col_idx):
    """Convert a column number into a column letter (3 -> 'C')
    Right shift the column col_idx by 26 to find column letters in reverse
    order. These numbers are 1-based, and can be converted to ASCII
    ordinals by adding 64.
    """
    # these indices correspond to A -> ZZZ and include all allowed
    # columns
    if not 1 <= col_idx <= 18278:
        raise ValueError("Invalid column index {0}".format(col_idx))
    letters = []
    while col_idx > 0:
        col_idx, remainder = divmod(col_idx, 26)
        # check for exact division and borrow if needed
        if remainder == 0:
            remainder = 26
            col_idx -= 1
        letters.append(chr(remainder+64))
    return ''.join(reversed(letters))

What's wrong with my Extended Euclidean Algorithm (python)?

My algorithm to find the HCF of two numbers, with a displayed justification in the form r = a*aqr + b*bqr, is only partially working, even though I'm pretty sure that I have entered all the correct formulae. It can and will find the HCF, but I am also trying to demonstrate Bezout's Lemma, so I need the displayed justification to be correct as well. The program:
# twonumbers.py
inp = 0
a = 0
b = 0
mul = 0
s = 1
r = 1
q = 0
res = 0
aqc = 1
bqc = 0
aqd = 0
bqd = 1
aqr = 0
bqr = 0
res = 0
temp = 0
fin_hcf = 0
fin_lcd = 0
seq = []
inp = input('Please enter the first number, "a":\n')
a = inp
inp = input('Please enter the second number, "b":\n')
b = inp
mul = a * b # Will come in handy later!
if a < b:
    print 'As you have entered the first number as smaller than the second, the program will swap a and b before proceeding.'
    temp = a
    a = b
    b = temp
else:
    print 'As the inputted value a is larger than or equal to b, the program has not swapped the values a and b.'
print 'Thank you. The program will now compute the HCF and simultaneously demonstrate Bezout\'s Lemma.'
print `a`+' = ('+`aqc`+' x '+`a`+') + ('+`bqc`+' x '+`b`+').'
print `b`+' = ('+`aqd`+' x '+`a`+') + ('+`bqd`+' x '+`b`+').'
seq.append(a)
seq.append(b)
c = a
d = b
while r != 0:
    if s != 1:
        c = seq[s-1]
        d = seq[s]
    res = divmod(c,d)
    q = res[0]
    r = res[1]
    aqr = aqc - (q * aqd)#These two lines are the main part of the justification
    bqr = bqc - (q * aqd)#-/
    print `r`+' = ('+`aqr`+' x '+`a`+') + ('+`bqr`+' x '+`b`+').'
    aqd = aqr
    bqd = bqr
    aqc = aqd
    bqc = bqd
    s = s + 1
    seq.append(r)
fin_hcf = seq[-2] # Finally, the HCF.
fin_lcd = mul / fin_hcf
print 'Using Euclid\'s Algorithm, we have now found the HCF of '+`a`+' and '+`b`+': it is '+`fin_hcf`+'.'
print 'We can now also find the LCD (LCM) of '+`a`+' and '+`b`+' using the following method:'
print `a`+' x '+`b`+' = '+`mul`+';'
print `mul`+' / '+`fin_hcf`+' (the HCF) = '+`fin_lcd`+'.'
print 'So, to conclude, the HCF of '+`a`+' and '+`b`+' is '+`fin_hcf`+' and the LCD (LCM) of '+`a`+' and '+`b`+' is '+`fin_lcd`+'.'
I would greatly appreciate it if you could help me to find out what is going wrong with this.
Hmm, your program is rather verbose and hence hard to read. For example, you don't need to initialise lots of those variables in the first few lines. And there is no need to assign to the inp variable and then copy that into a and then b. And you don't really need the seq list or the s variable at all.
Anyway, that's not the problem. There are two bugs. I think that if you had compared the printed intermediate answers to a hand-worked example, you would have found the problems.
The first problem is that you have a typo in the second line here:
aqr = aqc - (q * aqd)#These two lines are the main part of the justification
bqr = bqc - (q * aqd)#-/
In the second line, aqd should be bqd.
The second problem is that in this bit of code
aqd = aqr
bqd = bqr
aqc = aqd
bqc = bqd
you make aqd be aqr and then aqc be aqd. So aqc and aqd end up the same. Whereas you actually want the assignments in the other order:
aqc = aqd
bqc = bqd
aqd = aqr
bqd = bqr
Then the code works. But I would prefer to see it written more like this, which I think is a lot clearer. I have left out the prints, but I'm sure you can add them back:
a = input('Please enter the first number, "a":\n')
b = input('Please enter the second number, "b":\n')
if a < b:
    a,b = b,a
r1,r2 = a,b
s1,s2 = 1,0
t1,t2 = 0,1
while r2 > 0:
    q,r = divmod(r1,r2)
    r1,r2 = r2,r
    s1,s2 = s2,s1 - q * s2
    t1,t2 = t2,t1 - q * t2
print r1,s1,t1 # the HCF and the two Bezout coefficients
Finally, it might be worth looking at a recursive version which expresses the structure of the solution even more clearly, I think.
Hope this helps.
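For reference, the same loop can be packaged as a Python 3 function that also checks Bezout's identity on its result. This is a sketch rather than the answerer's code; the function name extended_gcd and the sample inputs 240 and 46 are chosen here for illustration.

```python
def extended_gcd(a, b):
    """Return (g, s, t) such that g = gcd(a, b) and a*s + b*t == g."""
    r1, r2 = a, b
    s1, s2 = 1, 0          # coefficients of a
    t1, t2 = 0, 1          # coefficients of b
    while r2 != 0:
        q, r = divmod(r1, r2)
        r1, r2 = r2, r
        # Each remainder keeps the invariant r = a*s + b*t.
        s1, s2 = s2, s1 - q * s2
        t1, t2 = t2, t1 - q * t2
    return r1, s1, t1

g, s, t = extended_gcd(240, 46)
print("%d = (%d x 240) + (%d x 46)" % (g, s, t))
```

Updating both coefficient pairs in a single tuple assignment avoids the ordering mistake described above, because the old values are read before any of them is overwritten.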
Here is a simple version of Bezout's identity; given a and b, it returns x, y, and g = gcd(a, b):
function bezout(a, b)
    if b == 0
        return 1, 0, a
    else
        q, r := divide(a, b)
        x, y, g := bezout(b, r)
        return y, x - q * y, g
The divide function returns both the quotient and remainder.
The Python program that does what you want (please note that the extended Euclidean algorithm gives only one pair of Bezout coefficients) might be:
import sys
def egcd(a, b):
    if a == 0:
        return (b, 0, 1)
    g, y, x = egcd(b % a, a)
    return (g, x - (b // a) * y, y)
def main():
    if len(sys.argv) != 3:
        print '''%s program calculates HCF, LCM and Bezout identity of two integers
usage %s a b''' % (sys.argv[0], sys.argv[0])
        sys.exit(1)
    a = int(sys.argv[1])
    b = int(sys.argv[2])
    g, x, y = egcd(a, b)
    print 'HCF =', g
    print 'LCM =', a*b/g
    print 'Bezout identity: %i * (%i) + %i * (%i) = %i' % (a, x, b, y, g)
main()

Encoding a 128-bit integer in Python?

Inspired by the "encoding scheme" of the answer to this question, I implemented my own encoding algorithm in Python.
Here is what it looks like:
import random
from math import pow
from string import ascii_letters, digits
# RFC 2396 unreserved URI characters
unreserved = '-_.!~*\'()'
characters = ascii_letters + digits + unreserved
size = len(characters)
seq = range(0,size)
# Seed random generator with same randomly generated number
random.seed(914576904)
random.shuffle(seq)
dictionary = dict(zip(seq, characters))
reverse_dictionary = dict((v,k) for k,v in dictionary.iteritems())
def encode(n):
    d = []
    n = n
    while n > 0:
        qr = divmod(n, size)
        n = qr[0]
        d.append(qr[1])
    chars = ''
    for i in d:
        chars += dictionary[i]
    return chars
def decode(str):
    d = []
    for c in str:
        d.append(reverse_dictionary[c])
    value = 0
    for i in range(0, len(d)):
        value += d[i] * pow(size, i)
    return value
The issue I'm running into is encoding and decoding very large integers. For example, this is how a large number is currently encoded and decoded:
s = encode(88291326719355847026813766449910520462)
# print s -> "3_r(AUqqMvPRkf~JXaWj8"
i = decode(s)
# print i -> "8.82913267194e+37"
# print long(i) -> "88291326719355843047833376688611262464"
The highest 16 places match up perfectly, but after those the number deviates from its original.
I assume this is a problem with the precision of extremely large integers when dividing in Python. Is there any way to circumvent this problem? Or is there another issue that I'm not aware of?
The problem lies within this line:
value += d[i] * pow(size, i)
Because of the line from math import pow, you're using math.pow here instead of the built-in pow function. math.pow returns a floating-point number, so you lose accuracy for large numbers. You should use the built-in pow or the ** operator or, even better, keep the current power of the base in an integer variable:
def decode(s):
    d = [reverse_dictionary[c] for c in s]
    result, power = 0, 1
    for x in d:
        result += x * power
        power *= size
    return result
It gives me the following result now:
print decode(encode(88291326719355847026813766449910520462))
# => 88291326719355847026813766449910520462
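The difference between the two approaches can be reproduced in isolation. The sketch below is written for Python 3 and leaves out the shuffled character dictionary; the helper names to_digits, from_digits_float and from_digits_int are hypothetical, and the base 71 matches the size of the character set in the question (52 letters, 10 digits, 9 unreserved characters).

```python
import math

def to_digits(n, base):
    """Split n into base-`base` digits, least significant first."""
    digits = []
    while n > 0:
        n, r = divmod(n, base)
        digits.append(r)
    return digits

def from_digits_float(digits, base):
    """Mirrors the question's decode: math.pow turns the sum into a float."""
    value = 0
    for i, d in enumerate(digits):
        value += d * math.pow(base, i)
    return value

def from_digits_int(digits, base):
    """Mirrors the fixed decode: the running power stays an exact int."""
    result, power = 0, 1
    for d in digits:
        result += d * power
        power *= base
    return result

n = 88291326719355847026813766449910520462
digits = to_digits(n, 71)
print(from_digits_int(digits, 71) == n)          # exact round trip
print(int(from_digits_float(digits, 71)) == n)   # precision lost in the float
```

A float carries only 53 bits of mantissa, so a value of this size cannot survive the trip through math.pow, whatever the order of the additions.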
