Encoding a 128-bit integer in Python? - python

Inspired by the "encoding scheme" of the answer to this question, I implemented my own encoding algorithm in Python.
Here is what it looks like:
import random
from math import pow
from string import ascii_letters, digits
# RFC 2396 unreserved URI characters
unreserved = '-_.!~*\'()'
characters = ascii_letters + digits + unreserved
size = len(characters)
seq = range(0,size)
# Seed random generator with same randomly generated number
random.seed(914576904)
random.shuffle(seq)
dictionary = dict(zip(seq, characters))
reverse_dictionary = dict((v,k) for k,v in dictionary.iteritems())
def encode(n):
    d = []
    n = n
    while n > 0:
        qr = divmod(n, size)
        n = qr[0]
        d.append(qr[1])
    chars = ''
    for i in d:
        chars += dictionary[i]
    return chars
def decode(str):
    d = []
    for c in str:
        d.append(reverse_dictionary[c])
    value = 0
    for i in range(0, len(d)):
        value += d[i] * pow(size, i)
    return value
The issue I'm running into is encoding and decoding very large integers. For example, this is how a large number is currently encoded and decoded:
s = encode(88291326719355847026813766449910520462)
# print s -> "3_r(AUqqMvPRkf~JXaWj8"
i = decode(s)
# print i -> "8.82913267194e+37"
# print long(i) -> "88291326719355843047833376688611262464"
The highest 16 places match up perfectly, but after those the number deviates from its original.
I assume this is a problem with the precision of extremely large integers when dividing in Python. Is there any way to circumvent this problem? Or is there another issue that I'm not aware of?

The problem lies within this line:
value += d[i] * pow(size, i)
You are using math.pow here instead of the built-in pow function. math.pow always returns a floating point number, so you lose accuracy for large numbers. Use the built-in pow or the ** operator or, even better, keep the current power of the base in an integer variable:
def decode(s):
    d = [reverse_dictionary[c] for c in s]
    result, power = 0, 1
    for x in d:
        result += x * power
        power *= size
    return result
It gives me the following result now:
print decode(encode(88291326719355847026813766449910520462))
# => 88291326719355847026813766449910520462
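The float rounding can be seen in isolation. The snippet below is a minimal sketch, assuming an alphabet of 71 symbols as in the question (52 letters, 10 digits, 9 unreserved characters); the variable names are only illustrative.

```python
import math

size = 71  # assumed alphabet size: 52 letters + 10 digits + 9 unreserved characters

exact = size ** 20           # integer arithmetic, arbitrary precision
approx = math.pow(size, 20)  # math.pow always returns a float (53-bit mantissa)

print(type(approx).__name__)   # float
print(int(approx) == exact)    # False: the low-order digits were rounded away
print(pow(size, 20) == exact)  # True: the built-in pow stays in integers
```

Any sum that mixes such floats into the result is itself a float, which is why only the leading 16 or so digits of the decoded value survive.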

Related

sequential counting using letters instead of numbers [duplicate]

I need a method that 'increments' the string a to z, then aa to az, then ba to bz and so on, like the columns in an Excel sheet. I will feed the method the previous string and it should increment to the next letter.
PSEUDO CODE
def get_next_letter(last_letter):
return last_letter += 1
So I could use it like so:
>>> get_next_letter('a')
'b'
>>> get_next_letter('b')
'c'
>>> get_next_letter('c')
'd'
...
>>> get_next_letter('z')
'aa'
>>> get_next_letter('aa')
'ab'
>>> get_next_letter('ab')
'ac'
...
>>> get_next_letter('az')
'ba'
>>> get_next_letter('ba')
'bb'
...
>>> get_next_letter('zz')
'aaa'
I believe there are better ways to handle this, but you can implement the algorithm for adding two numbers on paper...
def get_next_letter(string):
    x = list(map(ord, string))  # convert to list of character codes
    x[-1] += 1  # increment last element
    result = ''
    carry = 0
    for c in reversed(x):
        c += carry  # propagate the carry from the previous (lower) position
        result = chr(c) + result  # an overflowed 'z' shows up as '{' here
        carry = c > ord('z')
    if carry:  # add a new letter at the beginning if there is still a carry
        result = 'a' + result
    return result.replace('{', 'a')  # replace overflowed 'z' with 'a'
All the proposed solutions are way too complicated.
I came up with the version below, which uses a recursive call:
def getNextLetter(previous_letter):
    """
    'increments' the provided string to the next letter recursively
    raises TypeError if previous_letter is not a string
    returns "a" if provided previous_letter was an empty string
    """
    if not isinstance(previous_letter, str):
        raise TypeError("the previous letter should be a letter, doh")
    if previous_letter == '':
        return "a"
    for letter_location in range(len(previous_letter) - 1, -1, -1):
        if previous_letter[letter_location] == "z":
            return getNextLetter(previous_letter[:-1])+"a"
        else:
            characters = "abcdefghijklmnopqrstuvwxyz"
            return (previous_letter[:-1])\
                   +characters[characters.find(previous_letter[letter_location])+1]
# EOF
Why not use openpyxl's get_column_letter and column_index_from_string
from openpyxl.utils import get_column_letter, column_index_from_string
# or `from openpyxl.utils.cell import get_column_letter, column_index_from_string`
def get_next_letter(s: str) -> str:
return get_column_letter(
column_index_from_string(s) + 1
).lower()
and then
>>> get_next_letter('aab')
'aac'
>>> get_next_letter('zz')
'aaa'
?
Keeping in mind that this solution only works in [A, ZZZ[.
In fact, what you want to achieve is to increment a number expressed in base 26 (using the 26 letters of the alphabet as symbols).
We all know the decimal base that we use daily.
We know hexadecimal, which is base 16 with symbols that include the digits and a, b, c, d, e, f.
Example: 0xff equals 255.
An approach is to convert into base10, increment the result decimal number, then convert it back to base26.
Let me explain.
I define 2 functions.
A first function to convert a string (base26) into a base10 (decimal) number.
str_tobase10("abcd") # 19010
The inverse function to convert a base10 number (decimal) to a string (base26).
base10_tostr(19010) # abcd
get_next_letter() just has to convert the string to a number, increment by one and converts back to a string.
Advantages :
pure Python, no extra lib/dependency required.
works with very long strings
Example :
get_next_letter("abcdefghijz") # abcdefghika
def str_tobase10(value: str) -> int:
    n = 0
    for letter in value:
        n *= 26
        n += ord(letter)-ord("a")+1
    return n
def base10_tostr(value: int) -> str:
    s = ""
    n = value
    while n > 0:
        # subtract 1 first so that 'z' (26) maps to remainder 25 instead of 0
        n, r = divmod(n - 1, 26)
        s = chr(ord("a") + r) + s
    return s
def get_next_letter(value: str):
    n = str_tobase10(value)
    return base10_tostr(n+1)
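An increment function like this is easy to get wrong around 'z', so it helps to check it against a brute-force enumeration. The sketch below is self-contained; `labels` and `next_label` are hypothetical names introduced here, and `next_label` uses the "subtract one before dividing" form of base-26 conversion.

```python
from itertools import count, islice, product
from string import ascii_lowercase

def labels():
    """Yield 'a', 'b', ..., 'z', 'aa', 'ab', ... in spreadsheet-column order."""
    for length in count(1):
        for letters in product(ascii_lowercase, repeat=length):
            yield "".join(letters)

def next_label(s):
    """Increment a letter string by converting to an integer and back."""
    n = 0
    for ch in s:
        n = n * 26 + (ord(ch) - ord("a") + 1)
    n += 1
    out = ""
    while n > 0:
        n, r = divmod(n - 1, 26)  # shift by one so that 'z' maps to 25, not 0
        out = chr(ord("a") + r) + out
    return out

# Compare every label with its successor in the enumerated sequence.
expected = list(islice(labels(), 20000))
assert all(next_label(a) == b for a, b in zip(expected, expected[1:]))
```

The enumeration covers all one-, two- and three-letter strings plus the first four-letter ones, so every kind of carry ('z', 'az', 'zz', 'zzz') is exercised.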

Converting ASCII values of a string to base 3 number representation in Python [duplicate]

Python allows easy creation of an integer from a string of a given base via
int(str, base).
I want to perform the inverse: creation of a string from an integer,
i.e. I want some function int2base(num, base), such that:
int(int2base(x, b), b) == x
The function name/argument order is unimportant.
For any number x and base b that int() will accept.
This is an easy function to write: in fact it's easier than describing it in this question. However, I feel like I must be missing something.
I know about the functions bin, oct, hex, but I cannot use them for a few reasons:
Those functions are not available on older versions of Python, with which I need compatibility (2.2)
I want a general solution that can be called the same way for different bases
I want to allow bases other than 2, 8, 16
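The round-trip requirement stated above can be written down as an executable check. This is only a sketch under stated assumptions: the `int2base` below is one possible implementation (not taken from any particular answer), restricted to the bases 2 to 36 that `int()` accepts, and it uses modern Python syntax.

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # the 36 symbols int() understands

def int2base(num, base):
    """Render an integer (negative values allowed) in the given base, 2..36."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be in 2..36")
    if num == 0:
        return DIGITS[0]
    sign, num = ("-", -num) if num < 0 else ("", num)
    out = []
    while num:
        num, r = divmod(num, base)  # peel off the least significant digit
        out.append(DIGITS[r])
    return sign + "".join(reversed(out))

# The property from the question: int() must invert int2base().
for b in range(2, 37):
    for x in (0, 1, -1, 255, -4096, 10**30):
        assert int(int2base(x, b), b) == x
```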
Related
Python elegant inverse function of int(string, base)
Integer to base-x system using recursion in python
Base 62 conversion in Python
How to convert an integer to the shortest url-safe string in Python?
Surprisingly, people were giving only solutions that convert to small bases (smaller than the length of the English alphabet). There was no attempt to give a solution which converts to any arbitrary base from 2 to infinity.
So here is a super simple solution:
def numberToBase(n, b):
if n == 0:
return [0]
digits = []
while n:
digits.append(int(n % b))
n //= b
return digits[::-1]
So if you need to convert some super huge number to base 577,
numberToBase(67854 ** 15 - 102, 577) will give you a correct solution:
[4, 473, 131, 96, 431, 285, 524, 486, 28, 23, 16, 82, 292, 538, 149, 25, 41, 483, 100, 517, 131, 28, 0, 435, 197, 264, 455],
which you can later convert to any base you want.
At some point you will notice that sometimes there is no built-in library function to do what you want, so you need to write your own. If you disagree, post your own solution with a built-in function that can convert a base 10 number to base 577.
This is due to a lack of understanding of what a number in some base means.
I encourage you to think for a little bit about why base in your method works only for n <= 36. Once you are done, it will be obvious why my function returns a list and has the signature it has.
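A list of digits is only useful if it can be turned back into a number. The sketch below pairs a digit-list conversion with its inverse; `number_to_base` mirrors the function above, while `digits_to_number` is a hypothetical helper added here for illustration.

```python
def number_to_base(n, b):
    """Return the digits of non-negative n in base b, most significant first."""
    if n == 0:
        return [0]
    digits = []
    while n:
        n, r = divmod(n, b)
        digits.append(r)
    return digits[::-1]

def digits_to_number(digits, b):
    """Inverse of number_to_base, using Horner's rule (integers only)."""
    value = 0
    for d in digits:
        value = value * b + d
    return value

n = 67854 ** 15 - 102
digits = number_to_base(n, 577)
assert all(0 <= d < 577 for d in digits)
assert digits_to_number(digits, 577) == n

# One possible textual rendering when the base exceeds any alphabet:
print(":".join(str(d) for d in digits))
```

Because each digit is an integer rather than a character, no alphabet is needed and the base is limited only by the size of the number.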
If you need compatibility with ancient versions of Python, you can either use gmpy (which does include a fast, completely general int-to-string conversion function, and can be built for such ancient versions – you may need to try older releases since the recent ones have not been tested for venerable Python and GMP releases, only somewhat recent ones), or, for less speed but more convenience, use Python code – e.g., for Python 2, most simply:
import string
digs = string.digits + string.ascii_letters
def int2base(x, base):
if x < 0:
sign = -1
elif x == 0:
return digs[0]
else:
sign = 1
x *= sign
digits = []
while x:
digits.append(digs[int(x % base)])
x = int(x / base)
if sign < 0:
digits.append('-')
digits.reverse()
return ''.join(digits)
For Python 3, int(x / base) leads to incorrect results, and must be changed to x // base:
import string
digs = string.digits + string.ascii_letters
def int2base(x, base):
if x < 0:
sign = -1
elif x == 0:
return digs[0]
else:
sign = 1
x *= sign
digits = []
while x:
digits.append(digs[x % base])
x = x // base
if sign < 0:
digits.append('-')
digits.reverse()
return ''.join(digits)
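The reason int(x / base) fails in Python 3 is that / on two integers performs true division and returns a float, which cannot hold more than 53 significant bits. A minimal demonstration, with values chosen only for illustration:

```python
x = 3 ** 80         # far larger than 2**53, the limit of exactly representable floats

exact = x // 3      # floor division stays in arbitrary-precision integers
lossy = int(x / 3)  # true division rounds to a float before int() sees it

print(exact == 3 ** 79)  # True
print(lossy == 3 ** 79)  # False
```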
"{0:b}".format(100) # bin: 1100100
"{0:x}".format(100) # hex: 64
"{0:o}".format(100) # oct: 144
def baseN(num,b,numerals="0123456789abcdefghijklmnopqrstuvwxyz"):
return ((num == 0) and numerals[0]) or (baseN(num // b, b, numerals).lstrip(numerals[0]) + numerals[num % b])
ref:
http://code.activestate.com/recipes/65212/
Please be aware that this may lead to
RuntimeError: maximum recursion depth exceeded in cmp
for very big integers.
>>> numpy.base_repr(10, base=3)
'101'
Note that numpy.base_repr() has a limit of 36 as its base. Otherwise it throws a ValueError
Recursive
I would simplify the most voted answer to:
BS="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
def to_base(n, b):
return "0" if not n else to_base(n//b, b).lstrip("0") + BS[n%b]
The same caveat applies regarding RuntimeError: maximum recursion depth exceeded in cmp for very large integers and for negative numbers. (You could use sys.setrecursionlimit(new_limit).)
Iterative
To avoid recursion problems:
BS="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
def to_base(s, b):
res = ""
while s:
res+=BS[s%b]
s//= b
return res[::-1] or "0"
Great answers!
I guess the answer to my question was "no" I was not missing some obvious solution.
Here is the function I will use that condenses the good ideas expressed in the answers.
allow caller-supplied mapping of characters (allows base64 encode)
checks for negative and zero
maps complex numbers into tuples of strings
def int2base(x,b,alphabet='0123456789abcdefghijklmnopqrstuvwxyz'):
'convert an integer to its string representation in a given base'
if b<2 or b>len(alphabet):
if b==64: # assume base64 rather than raise error
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
else:
raise AssertionError("int2base base out of range")
if isinstance(x,complex): # return a tuple
return ( int2base(x.real,b,alphabet) , int2base(x.imag,b,alphabet) )
if x<=0:
if x==0:
return alphabet[0]
else:
return '-' + int2base(-x,b,alphabet)
# else x is non-negative real
rets=''
while x>0:
x,idx = divmod(x,b)
rets = alphabet[idx] + rets
return rets
You could use baseconv.py from my project: https://github.com/semente/python-baseconv
Sample usage:
>>> from baseconv import BaseConverter
>>> base20 = BaseConverter('0123456789abcdefghij')
>>> base20.encode(1234)
'31e'
>>> base20.decode('31e')
'1234'
>>> base20.encode(-1234)
'-31e'
>>> base20.decode('-31e')
'-1234'
>>> base11 = BaseConverter('0123456789-', sign='$')
>>> base11.encode('$1234')
'$-22'
>>> base11.decode('$-22')
'$1234'
There are some built-in converters, for example baseconv.base2, baseconv.base16 and baseconv.base64.
def base(decimal, base):
    symbols = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    other_base = ""
    while decimal != 0:
        other_base = symbols[decimal % base] + other_base
        decimal = decimal // base  # floor division keeps the value an integer
    if other_base == "":
        other_base = "0"
    return other_base
print(base(31, 16))
output:
1F
def base_conversion(num, base):
digits = []
while num > 0:
num, remainder = divmod(num, base)
digits.append(remainder)
return digits[::-1]
http://code.activestate.com/recipes/65212/
def base10toN(num,n):
"""Change a to a base-n number.
Up to base-36 is supported without special notation."""
num_rep={10:'a',
11:'b',
12:'c',
13:'d',
14:'e',
15:'f',
16:'g',
17:'h',
18:'i',
19:'j',
20:'k',
21:'l',
22:'m',
23:'n',
24:'o',
25:'p',
26:'q',
27:'r',
28:'s',
29:'t',
30:'u',
31:'v',
32:'w',
33:'x',
34:'y',
35:'z'}
new_num_string=''
current=num
while current!=0:
remainder=current%n
if 36>remainder>9:
remainder_string=num_rep[remainder]
elif remainder>=36:
remainder_string='('+str(remainder)+')'
else:
remainder_string=str(remainder)
new_num_string=remainder_string+new_num_string
current=current//n
return new_num_string
Here's another one from the same link
def baseconvert(n, base):
"""convert positive decimal integer n to equivalent in another base (2-36)"""
digits = "0123456789abcdefghijklmnopqrstuvwxyz"
try:
n = int(n)
base = int(base)
except:
return ""
if n < 0 or base < 2 or base > 36:
return ""
s = ""
while 1:
r = n % base
s = digits[r] + s
n = n // base
if n == 0:
break
return s
I made a pip package for this.
I recommend you use my bases.py https://github.com/kamijoutouma/bases.py which was inspired by bases.js
from bases import Bases
bases = Bases()
bases.toBase16(200) # => 'c8'
bases.toBase(200, 16) # => 'c8'
bases.toBase62(99999) # => 'q0T'
bases.toBase(200, 62) # => 'q0T'
bases.toAlphabet(300, 'aAbBcC') # => 'Abba'
bases.fromBase16('c8') # => 200
bases.fromBase('c8', 16) # => 200
bases.fromBase62('q0T') # => 99999
bases.fromBase('q0T', 62) # => 99999
bases.fromAlphabet('Abba', 'aAbBcC') # => 300
refer to https://github.com/kamijoutouma/bases.py#known-basesalphabets
for what bases are usable
EDIT:
pip link https://pypi.python.org/pypi/bases.py/0.2.2
def int2base(a, base, numerals="0123456789abcdefghijklmnopqrstuvwxyz"):
baseit = lambda a=a, b=base: (not a) and numerals[0] or baseit(a-a%b,b*base)+numerals[a%b%(base-1) or (a%b) and (base-1)]
return baseit()
explanation
In any base, every number equals a0 + a1*base + a2*base**2 + a3*base**3 + ... The "mission" is to find all the a's.
For every N = 0, 1, 2, ... the code isolates aN*base**N by taking the remainder modulo b, with b = base**(N+1), which cuts off all the a's whose index is larger than N. The a's whose index is smaller than N are already gone, because a is decreased by the current aN*base**N every time the function is called.
base % (base-1) == 1, therefore base**p % (base-1) == 1, and therefore q*base**p % (base-1) == q, with only one exception: q == base-1 gives 0.
To fix that, when the result is 0 the function checks whether a % b was 0 from the beginning.
advantages
In this sample there is only one multiplication (instead of a division) and some modulo operations per digit, which take a relatively small amount of time.
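The modular identity the explanation relies on can be checked directly. The helper below is a hypothetical illustration (the name `digit_via_mod` is introduced here): it recovers a single digit q from the term q * base**p using only a remainder modulo base - 1, including the special case where the remainder collapses to 0.

```python
def digit_via_mod(q, p, base):
    """Recover digit q from q * base**p using a modulo by (base - 1)."""
    term = q * base ** p
    # term % (base - 1) equals q, except that q == base - 1 yields 0;
    # in that case a non-zero term means the digit was base - 1.
    return term % (base - 1) or (term and base - 1)

for base in range(2, 21):
    for q in range(base):
        for p in range(5):
            assert digit_via_mod(q, p, base) == q
```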
While the currently top answer is definitely an awesome solution, there remains more customization users might like.
Basencode adds some of these features, including conversions of floating point numbers, modifying digits (in the linked answer, only numbers can be used).
Here's a possible use case:
>>> from basencode import *
>>> n1 = Number(12345)
>>> n1.repr_in_base(64) # convert to base 64
'30V'
>>> Number('30V', 64) # construct Integer from base 64
Integer(12345)
>>> n1.repr_in_base(8)
'30071'
>>> n1.repr_in_octal() # shortcuts
'30071'
>>> n1.repr_in_bin() # equivalent to `n1.repr_in_base(2)`
'11000000111001'
>>> n1.repr_in_base(2, digits=list('-+')) # override default digits: use `-` and `+` in place of `0` and `1`
'++------+++--+'
>>> n1.repr_in_base(33) # yet another base - all bases from 2 to 64 are supported from the start
'bb3'
How would you add any bases you want? Let me replicate the example of the currently most upvoted answer: the digits parameter allows you to override the default digits for bases 2 to 64, and to provide digits for any base higher than that. The mode parameter determines how (as a list or as a string) the answer is returned.
>>> n2 = Number(67854 ** 15 - 102)
>>> n2.repr_in_base(577, digits=[str(i) for i in range(577)], mode="l")
['4', '473', '131', '96', '431', '285', '524', '486', '28', '23', '16', '82', '292', '538', '149', '25', '41', '483', '100', '517', '131', '28', '0', '435', '197', '264', '455']
>>> n2.repr_in_base(577, mode="l") # the program remembers the digits for base 577 now
['4', '473', '131', '96', '431', '285', '524', '486', '28', '23', '16', '82', '292', '538', '149', '25', '41', '483', '100', '517', '131', '28', '0', '435', '197', '264', '455']
Operations can be done: the Number class returns an instance of basencode.Integer if the provided number is an Integer, else it returns a basencode.Float
>>> n3 = Number(54321) # the Number class returns an instance of `basencode.Integer` if the provided number is an Integer, otherwise it returns a `basencode.Float`.
>>> n1 + n3
Integer(66666)
>>> n3 - n1
Integer(41976)
>>> n1 * n3
Integer(670592745)
>>> n3 // n1
Integer(4)
>>> n3 / n1 # a basencode.Float class allows conversion of floating point numbers
Float(4.400243013365735)
>>> (n3 / n1).repr_in_base(32)
'4.cpr56v6rnc4oitoblha2r11sus0dheqd4pgechfcjklo74b2bgom7j8ih86mipdvss0068sehi9f3791mdo4uotfujq66cf0jkgo'
>>> n4 = Number(0.5) # returns a basencode.Float
>>> n4.repr_in_bin() # binary version of 0.5
'0.1'
Disclaimer: this project is under active maintenance, and I'm a contributor.
>>> import string
>>> def int2base(integer, base):
if not integer: return '0'
sign = 1 if integer > 0 else -1
alphanum = string.digits + string.ascii_lowercase
nums = alphanum[:base]
res = ''
integer *= sign
while integer:
integer, mod = divmod(integer, base)
res += nums[mod]
return ('' if sign == 1 else '-') + res[::-1]
>>> int2base(-15645, 23)
'-16d5'
>>> int2base(213, 21)
'a3'
A recursive solution for those interested. Of course, this will not work with negative binary values. You would need to implement Two's Complement.
def generateBase36Alphabet():
return ''.join([str(i) for i in range(10)]+[chr(i+65) for i in range(26)])
def generateAlphabet(base):
return generateBase36Alphabet()[:base]
def intToStr(n, base, alphabet):
def toStr(n, base, alphabet):
return alphabet[n] if n < base else toStr(n//base,base,alphabet) + alphabet[n%base]
return ('-' if n < 0 else '') + toStr(abs(n), base, alphabet)
print('{} -> {}'.format(-31, intToStr(-31, 16, generateAlphabet(16)))) # -31 -> -1F
def base_changer(number,base):
buff=97+abs(base-10)
dic={};buff2='';buff3=10
for i in range(97,buff+1):
dic[buff3]=chr(i)
buff3+=1
while(number>=base):
mod=int(number%base)
number=int(number//base)
if (mod) in dic.keys():
buff2+=dic[mod]
continue
buff2+=str(mod)
if (number) in dic.keys():
buff2+=dic[number]
else:
buff2+=str(number)
return buff2[::-1]
Here is an example of how to convert a number of any base to another base.
from collections import namedtuple
Test = namedtuple("Test", ["n", "from_base", "to_base", "expected"])
def convert(n: int, from_base: int, to_base: int) -> int:
digits = []
while n:
(n, r) = divmod(n, to_base)
digits.append(r)
return sum(from_base ** i * v for i, v in enumerate(digits))
if __name__ == "__main__":
tests = [
Test(32, 16, 10, 50),
Test(32, 20, 10, 62),
Test(1010, 2, 10, 10),
Test(8, 10, 8, 10),
Test(150, 100, 1000, 150),
Test(1500, 100, 10, 1050000),
]
for test in tests:
result = convert(*test[:-1])
assert result == test.expected, f"{test=}, {result=}"
print("PASSED!!!")
Say we want to convert 14 to base 2. We repeatedly apply the division algorithm until the quotient is 0:
14 = 2 x 7 + 0
7 = 2 x 3 + 1
3 = 2 x 1 + 1
1 = 2 x 0 + 1
The binary representation is just the remainders read from bottom to top: 1110. This can be proved by expanding
14 = 2 x 7 = 2 x (2 x 3 + 1) = 2 x (2 x (2 x 1 + 1) + 1) = 2 x (2 x (2 x (2 x 0 + 1) + 1) + 1) = 2^3 + 2^2 + 2
The code below implements the above algorithm.
def toBaseX(n, X):
strbin = ""
while n != 0:
strbin += str(n % X)
n = n // X
return strbin[::-1]
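To see the repeated-division steps described above rather than only the final string, the intermediate quotients and remainders can be recorded. This is a small sketch; `division_steps` is a hypothetical helper name, and it assumes a positive integer n.

```python
def division_steps(n, base):
    """Return (dividend, quotient, remainder) for each division until the quotient is 0."""
    steps = []
    while n != 0:
        q, r = divmod(n, base)
        steps.append((n, q, r))
        n = q
    return steps

for n, q, r in division_steps(14, 2):
    print(f"{n} = 2 x {q} + {r}")

# Reading the remainders from the last step to the first gives the digits.
digits = "".join(str(r) for _, _, r in reversed(division_steps(14, 2)))
print(digits)  # 1110
```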
This is my approach. At first converting the number then casting it to string.
def to_base(n, base):
if base == 10:
return n
result = 0
counter = 0
while n:
r = n % base
n //= base
result += r * 10**counter
counter+=1
return str(result)
I have written this function which I use to encode in different bases. I also provided the way to shift the result by a value 'offset'. This is useful if you'd like to encode to bases above 64, but keeping displayable chars (like a base 95).
I also tried to avoid reversing the output 'list' and tried to minimize computing operations. The array of pow(base) is computed on demand and kept for additional calls to the function.
The output is a binary string
pows = {}
######################################################
def encode_base(value,
base = 10,
offset = 0) :
"""
Encode value into a binary string, according to the desired base.
Input :
value : Any positive integer value
offset : Shift the encoding (eg : Starting at chr(32))
base : The base in which we'd like to encode the value
Return : Binary string
Example : with : offset = 32, base = 64
100 -> !D
200 -> #(
"""
# Determine the number of loops
try :
pb = pows[base]
except KeyError :
pb = pows[base] = {n : base ** n for n in range(0, 8) if n < 2 ** 48 -1}
for n in pb :
if value < pb[n] :
n -= 1
break
out = []
while n + 1 :
b = pb[n]
out.append(chr(offset + value // b))
n -= 1
value %= b
return ''.join(out).encode()
This function converts any integer from any base to any base
def baseconvert(number, srcbase, destbase):
if srcbase != 10:
sum = 0
for _ in range(len(str(number))):
sum += int(str(number)[_]) * pow(srcbase, len(str(number)) - _ - 1)
b10 = sum
return baseconvert(b10, 10, destbase)
end = ''
q = number
while(True):
r = q % destbase
q = q // destbase
end = str(r) + end
if(q<destbase):
end = str(q) + end
return int(end)
The Python code below converts a Python integer to a string in an arbitrary base (from 2 up to infinity) and works in both directions: every string it creates can be converted back to a Python integer by passing that string as N instead of an integer.
The code works only on positive numbers by intention (in my eyes there is some hassle with negative values and their bit representations that I don't want to dig into). Just pick from this code what you need, want or like, or just have fun learning about the available options. Much of it is there only to document the various available approaches (e.g. the Oneliner does not seem to be fast, even though it is promised to be).
I like the format proposed by Salvador Dali for infinitely large bases. It is a nice proposal that also works well visually for simple binary bit representations. Notice that for a string formatted with infiniteBase=True the width=x padding parameter applies to each digit and not to the whole number. It seems that the code handling the infiniteBase digits format even runs a bit faster than the other options, which is another reason for using it.
I don't like the idea of using Unicode to extend the number of symbols available for digits, so don't look for it in the code below, because it's not there. Use the proposed infiniteBase format instead, or store integers as bytes for compression purposes.
def inumToStr( N, base=2, width=1, infiniteBase=False,\
useNumpy=False, useRecursion=False, useOneliner=False, \
useGmpy=False, verbose=True):
''' Positive numbers only, but works in BOTH directions.
For strings in infiniteBase notation set for bases <= 62
infiniteBase=True . Examples of use:
inumToStr( 17, 2, 1, 1) # [1,0,0,0,1]
inumToStr( 17, 3, 5) # 00122
inumToStr(245, 16, 4) # 00F5
inumToStr(245, 36, 4,0,1) # 006T
inumToStr(245245245245,36,10,0,1) # 0034NWOQBH
inumToStr(245245245245,62) # 4JhA3Th
245245245245 == int(gmpy2.mpz('4JhA3Th',62))
inumToStr(245245245245,99,2) # [25,78, 5,23,70,44]
----------------------------------------------------
inumToStr( '[1,0,0,0,1]',2, infiniteBase=True ) # 17
inumToStr( '[25,78, 5,23,70,44]', 99) # 245245245245
inumToStr( '0034NWOQBH', 36 ) # 245245245245
inumToStr( '4JhA3Th' , 62 ) # 245245245245
----------------------------------------------------
--- Timings for N = 2**4096, base=36:
standard: 0.0023
infinite: 0.0017
numpy : 0.1277
recursio; 0.0022
oneliner: 0.0146
For N = 2**8192:
standard: 0.0075
infinite: 0.0053
numpy : 0.1369
max. recursion depth exceeded: recursio/oneliner
'''
show = print
if type(N) is str and ( infiniteBase is True or base > 62 ):
lstN = eval(N)
if verbose: show(' converting a non-standard infiniteBase bits string to Python integer')
return sum( [ item*base**pow for pow, item in enumerate(lstN[::-1]) ] )
if type(N) is str and base <= 36:
if verbose: show('base <= 36. Returning Python int(N, base)')
return int(N, base)
if type(N) is str and base <= 62:
if useGmpy:
if verbose: show(' base <= 62, useGmpy=True, returning int(gmpy2.mpz(N,base))')
return int(gmpy2.mpz(N,base))
else:
if verbose: show(' base <= 62, useGmpy=False, self-calculating return value)')
lstStrOfDigits="0123456789"+ \
"abcdefghijklmnopqrstuvwxyz".upper() + \
"abcdefghijklmnopqrstuvwxyz"
dictCharToPow = {}
for index, char in enumerate(lstStrOfDigits):
dictCharToPow.update({char : index})
return sum( dictCharToPow[item]*base**pow for pow, item in enumerate(N[::-1]) )
#:if
#:if
if useOneliner and base <= 36:
if verbose: show(' base <= 36, useOneliner=True, running the Oneliner code')
d="0123456789abcdefghijklmnopqrstuvwxyz"
baseit = lambda a=N, b=base: (not a) and d[0] or \
baseit(a-a%b,b*base)+d[a%b%(base-1) or (a%b) and (base-1)]
return baseit().rjust(width, d[0])[1:]
if useRecursion and base <= 36:
if verbose: show(' base <= 36, useRecursion=True, running recursion algorythm')
BS="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
def to_base(n, b):
return "0" if not n else to_base(n//b, b).lstrip("0") + BS[n%b]
return to_base(N, base).rjust(width,BS[0])
if base > 62 or infiniteBase:
if verbose: show(' base > 62 or infiniteBase=True, returning a non-standard digits string')
# Allows arbitrary large base with 'width=...'
# applied to each digit (useful also for bits )
N, digit = divmod(N, base)
strN = str(digit).rjust(width, ' ')+']'
while N:
N, digit = divmod(N, base)
strN = str(digit).rjust(width, ' ') + ',' + strN
return '[' + strN
#:if
if base == 2:
if verbose: show(" base = 2, returning Python str(f'{N:0{width}b}')")
return str(f'{N:0{width}b}')
if base == 8:
if verbose: show(" base = 8, returning Python str(f'{N:0{width}o}')")
return str(f'{N:0{width}o}')
if base == 16:
if verbose: show(" base = 16, returning Python str(f'{N:0{width}X}')")
return str(f'{N:0{width}X}')
if base <= 36:
if useNumpy:
if verbose: show(" base <= 36, useNumpy=True, returning np.base_repr(N, base)")
import numpy as np
strN = np.base_repr(N, base)
return strN.rjust(width, '0')
else:
if verbose: show(' base <= 36, useNumpy=False, self-calculating return value)')
lstStrOfDigits="0123456789"+"abcdefghijklmnopqrstuvwxyz".upper()
strN = lstStrOfDigits[N % base] # rightmost digit
while N >= base:
N //= base # consume already converted digit
strN = lstStrOfDigits[N % base] + strN # add digits to the left
#:while
return strN.rjust(width, lstStrOfDigits[0])
#:if
#:if
if base <= 62:
if useGmpy:
if verbose: show(" base <= 62, useGmpy=True, returning gmpy2.digits(N, base)")
import gmpy2
strN = gmpy2.digits(N, base)
return strN.rjust(width, '0')
# back to Python int from gmpy2.mpz with
# int(gmpy2.mpz('4JhA3Th',62))
else:
if verbose: show(' base <= 62, useGmpy=False, self-calculating return value)')
lstStrOfDigits= "0123456789" + \
"abcdefghijklmnopqrstuvwxyz".upper() + \
"abcdefghijklmnopqrstuvwxyz"
strN = lstStrOfDigits[N % base] # rightmost digit
while N >= base:
N //= base # consume already converted digit
strN = lstStrOfDigits[N % base] + strN # add digits to the left
#:while
return strN.rjust(width, lstStrOfDigits[0])
#:if
#:if
#:def
I'm presenting an "unoptimized" solution for bases between 2 and 9:
def to_base(N, base=2):
N_in_base = ''
while True:
N_in_base = str(N % base) + N_in_base
N //= base
if N == 0:
break
return N_in_base
This solution does not require reversing the final result, but it's actually not optimized. Refer to this answer to see why: https://stackoverflow.com/a/37133870/7896998
Simple base transformation
def int_to_str(x, b):
s = ""
while x:
s = str(x % b) + s
x //= b
return s
Example of output with no 0 to base 9
s = ""
x = int(input())
while x:
if x % 9 == 0:
s = "9" + s
x -= x % 10
x = x // 9
else:
s = str(x % 9) + s
x = x // 9
print(s)
def dec_to_radix(input, to_radix=2, power=None):
if not isinstance(input, int):
raise TypeError('Not an integer!')
elif power is None:
power = 1
if input == 0:
return 0
else:
remainder = input % to_radix**power
digit = str(remainder // to_radix**(power-1))
return int(str(dec_to_radix(input-remainder, to_radix, power+1)) + digit)
def radix_to_dec(input, from_radix):
if not isinstance(input, int):
raise TypeError('Not an integer!')
return sum(int(digit)*(from_radix**power) for power, digit in enumerate(str(input)[::-1]))
def radix_to_radix(input, from_radix=10, to_radix=2, power=None):
dec = radix_to_dec(input, from_radix)
return dec_to_radix(dec, to_radix, power)
Another short one (and easier to understand imo):
def int_to_str(n, b, symbols='0123456789abcdefghijklmnopqrstuvwxyz'):
    return (int_to_str(n//b, b, symbols) if n >= b else "") + symbols[n%b]
And with proper exception handling:
def int_to_str(n, b, symbols='0123456789abcdefghijklmnopqrstuvwxyz'):
    try:
        return (int_to_str(n//b, b, symbols) if n >= b else "") + symbols[n%b]
    except IndexError:
        raise ValueError(
            "The symbols provided are not enough to represent this number in "
            "this base")
Here is a recursive version that handles signed integers and custom digits.
import string
def base_convert(x, base, digits=None):
"""Convert integer `x` from base 10 to base `base` using `digits` characters as digits.
If `digits` is omitted, it will use decimal digits + lowercase letters + uppercase letters.
"""
digits = digits or (string.digits + string.ascii_letters)
assert 2 <= base <= len(digits), "Unsupported base: {}".format(base)
if x == 0:
return digits[0]
sign = '-' if x < 0 else ''
x = abs(x)
first_digits = base_convert(x // base, base, digits).lstrip(digits[0])
return sign + first_digits + digits[x % base]
Strings aren't the only choice for representing numbers: you can use a list of integers to represent the order of each digit. Those can easily be converted to a string.
None of the answers reject base < 2; and most will run very slowly or crash with stack overflows for very large numbers (such as 56789 ** 43210). To avoid such failures, reduce quickly like this:
def n_to_base(n, b):
    if b < 2: raise ValueError("invalid base")
    if abs(n) < b: return [n]
    ret = [y for d in n_to_base(n, b*b) for y in divmod(d, b)]
    return ret[1:] if ret[0] == 0 else ret # remove leading zeros
def base_to_n(v, b):
    h = len(v) // 2
    if h == 0: return v[0]
    return base_to_n(v[:-h], b) * (b**h) + base_to_n(v[-h:], b)
assert ''.join(['0123456789'[x] for x in n_to_base(56789**43210,10)])==str(56789**43210)
Speedwise, n_to_base is comparable with str for large numbers (about 0.3s on my machine), but if you compare against hex you may be surprised (about 0.3ms on my machine, or 1000x faster). The reason is because the large integer is stored in memory in base 256 (bytes). Each byte can simply be converted to a two-character hex string. This alignment only happens for bases that are powers of two, which is why there are special cases for 2,8, and 16 (and base64, ascii, utf16, utf32).
Consider the last digit of a decimal string. How does it relate to the sequence of bytes that forms its integer? Let's label the bytes s[i] with s[0] being the least significant (little endian). Then the last digit is sum(s[i]*(256**i) for i in range(n)) % 10. Well, it happens that 256**i ends with a 6 for i > 0 (6*6=36), so that last digit is (s[0]*5 + sum(s)*6)%10. From this, you can see that the last digit depends on the sum of all the bytes. This nonlocal property is what makes converting to decimal harder.
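The last-digit formula can be verified against Python's own arithmetic. The sketch below assumes non-negative integers and uses `int.to_bytes` to obtain the little-endian bytes; `last_decimal_digit` is a hypothetical name chosen for this illustration.

```python
def last_decimal_digit(n):
    """Last decimal digit of non-negative n, computed from its little-endian bytes."""
    length = max(1, (n.bit_length() + 7) // 8)  # at least one byte, so 0 works too
    s = n.to_bytes(length, "little")
    # 256**i ends in 6 for every i > 0, so n % 10 == (s[0] + 6 * (sum(s) - s[0])) % 10,
    # which is congruent to (5 * s[0] + 6 * sum(s)) % 10.
    return (s[0] * 5 + sum(s) * 6) % 10

for n in list(range(1000)) + [56789 ** 43, 2 ** 200 + 7]:
    assert last_decimal_digit(n) == n % 10
```

Even this single digit needs every byte of the number, whereas a hex digit needs only four bits of one byte.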
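import string
def baseConverter(x, b):
    s = ""
    d = string.printable.upper()  # digits, then letters: usable for bases up to 36
    while x > 0:
        s += d[x % b]
        x = x // b  # integer division; x / b would produce a float in Python 3
    return s[::-1]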

How can I make this binary to float algorithm more efficient / shortened

So this algorithm takes the user's binary input as xxxx.xxxx and then outputs its decimal equivalent. Keeping the same format and style, how can I shorten it or make it more efficient?
import math
binary = {"Input":input("Enter your binary value here in the format of x.x : ").split("."), "Int":0, "Float":0}
for k, v in enumerate(binary["Input"][0][::-1]):
    if int(v) == 1:
        binary["Int"]= binary["Int"] + (2**(k))
for k, v in enumerate(binary["Input"][1]):
    if int(v) == 1:
        binary["Float"] = binary["Float"]+ (1/math.pow(2,k+1))
print(binary["Float"]+binary["Int"])
For efficiency, it would be better if you only did one pass over the string.
Currently, you do three passes: split, reverse (partially), and compute.
Also, don't do this binary[...] stuff, just use variables.
Here is an implementation that does exactly one pass:
def bin2float(s):
    result = exp = 0
    for k in s:
        if k == ".":
            exp = 1
            continue
        result *= 2
        exp *= 2
        if k == '1':
            result += 1
    exp = max(exp, 1)  # no '.' seen: divide by 1
    return result / exp
print(bin2float(input('Enter your binary value here in the format of x.x : ')))
If you're only after shortening, use individual variables and +=. Also, don't use int(v) == 1 but v == '1', and use math.pow(2, -k-1) instead of (1/math.pow(2,k+1)).
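Applied to the original two-loop code, those shortening suggestions might look roughly like this. It is only a sketch: the function name bin_to_float_short is invented here, and it assumes the input always contains exactly one '.':

```python
import math


def bin_to_float_short(text):
    """Shortened two-loop version; text looks like '101.011'."""
    int_part, frac_part = text.split(".")
    total = 0
    # Integer bits, least significant first.
    for k, v in enumerate(int_part[::-1]):
        if v == '1':
            total += 2 ** k
    # Fractional bits: the k-th bit after the point is worth 2**(-k-1).
    for k, v in enumerate(frac_part):
        if v == '1':
            total += math.pow(2, -k - 1)
    return total


print(bin_to_float_short('010010010.010010101'))
```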
def bin2Dec(bin_value):
    bin_value = '0' + bin_value  # to avoid the input format case, like '.11'
    parts = bin_value.split(".")
    integer_part = int(parts[0], 2)
    fraction_part = 0
    if len(parts) == 2:
        # the bit at position id after the point is worth 2 ** (-id), not 10 ** (-id)
        fraction_part = sum([int(val) * (2 ** (-id)) for id, val in enumerate(parts[1], start=1)])
    return integer_part + fraction_part
The integer part can be done by the built-in function int.
A version using hex instead of doing the calculation in a loop (not tested!):
def bin_str_to_hex_str(bin_str):
    '''
    extend the binary number after the '.' such that its length is a multiple
    of 8; convert to hex.
    '''
    a, b = bin_str.split('.')
    while len(b)%8 != 0:
        b += '0'
    # zero-pad the hex fraction so that leading zero bits are not dropped
    frac_hex = '{:0{}x}'.format(int(b,2), len(b)//4)
    hex_str = '{}.{}p+0'.format(hex(int(a,2)), frac_hex)
    return hex_str
def bin_str_to_float(bin_str):
    hex_str = bin_str_to_hex_str(bin_str)
    return float.fromhex(hex_str)
print(bin_str_to_float('010010010.010010101')) # -> 146.291015625
Admittedly this converts twice (once to build the hex string, then once to parse the hex string itself), which is not very nice. There is probably a cleverer way to assemble the parts.
from math import pow
import timeit
def bin2float(bin_str):
    index = bin_str.index('.')
    high = bin_str[:index]
    low = bin_str[index+1:]
    high_f = int(high,2)
    low_f = int(low,2)/pow(2,len(low))
    return high_f+low_f
def b2f(bin):
    int_s,float_s =bin.split(".")
    int_v = 0
    float_v =0
    for k, v in enumerate(int_s[::-1]):
        if int(v) == 1:
            int_v += (2**(k))
    for k, v in enumerate(float_s):
        if int(v) == 1:
            float_v += (1/pow(2,k+1))
    return int_v+float_v
if __name__ == "__main__":
    bin ='010010010.010010101'
    stmt1 = "bin2float('{0}')".format(bin)
    stmt2= "b2f('{0}')".format(bin)
    print(timeit.timeit(stmt1,"from bin2float import bin2float",number =10000 ))
    print(timeit.timeit(stmt2,"from bin2float import b2f",number =10000 ))
test result:
0.015675368406093453
0.08317904950635754
In fact, your way of handling the fractional part is equivalent to
int(low,2)/pow(2,len(low))
which handles the part as a whole.
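A quick way to convince yourself of that equivalence is to compare both approaches on a few fractional bit strings. The two helper names below are made up for this sketch, and it assumes Python 3 division:

```python
def frac_bitwise(bits):
    """Sum each fractional bit separately, as in the question's loop."""
    return sum(2.0 ** -(k + 1) for k, v in enumerate(bits) if v == '1')


def frac_whole(bits):
    """Treat the fractional bits as one integer scaled by 2**len(bits)."""
    return int(bits, 2) / 2 ** len(bits) if bits else 0.0


for bits in ('010010101', '1', '0001', '11111111'):
    # Both routes give the same float for these short inputs.
    assert frac_bitwise(bits) == frac_whole(bits)
```

The second form does a single int() parse and one division, which is where the speedup in the timings above comes from.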

Is there a faster way of converting a number to a name?

The following code defines a sequence of names that are mapped to numbers. It is designed to take a number and retrieve a specific name. The class operates by ensuring the name exists in its cache, and then returns the name by indexing into its cache. The question is this: how can the name be calculated based on the number without storing a cache?
The name can be thought of as a base 63 number, except for the first digit which is always in base 53.
import itertools
import string
class NumberToName:
    def __generate_name():
        def generate_tail(length):
            if length > 0:
                for char in NumberToName.CHARS:
                    for extension in generate_tail(length - 1):
                        yield char + extension
            else:
                yield ''
        for length in itertools.count():
            for char in NumberToName.FIRST:
                for extension in generate_tail(length):
                    yield char + extension
    FIRST = ''.join(sorted(string.ascii_letters + '_'))
    CHARS = ''.join(sorted(string.digits + FIRST))
    CACHE = []
    NAMES = __generate_name()
    @classmethod
    def convert(cls, number):
        for _ in range(number - len(cls.CACHE) + 1):
            cls.CACHE.append(next(cls.NAMES))
        return cls.CACHE[number]
    def __init__(self, *args, **kwargs):
        raise NotImplementedError()
The following interactive sessions show some of the values that are expected to be returned in order.
>>> NumberToName.convert(0)
'A'
>>> NumberToName.convert(26)
'_'
>>> NumberToName.convert(52)
'z'
>>> NumberToName.convert(53)
'A0'
>>> NumberToName.convert(1692)
'_1'
>>> NumberToName.convert(23893)
'FAQ'
Unfortunately, these numbers need to be mapped to these exact names (to allow a reverse conversion).
Please note: A variable number of bits are received and converted unambiguously into a number. This number should be converted unambiguously to a name in the Python identifier namespace. Eventually, valid Python names will be converted to numbers, and these numbers will be converted to a variable number of bits.
Final solution:
import string
HEAD_CHAR = ''.join(sorted(string.ascii_letters + '_'))
TAIL_CHAR = ''.join(sorted(string.digits + HEAD_CHAR))
HEAD_BASE, TAIL_BASE = len(HEAD_CHAR), len(TAIL_CHAR)
def convert_number_to_name(number):
    if number < HEAD_BASE: return HEAD_CHAR[number]
    q, r = divmod(number - HEAD_BASE, TAIL_BASE)
    return convert_number_to_name(q) + TAIL_CHAR[r]
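Since the question also mentions a reverse conversion, here is a sketch of what the inverse could look like. The function convert_name_to_number is not part of the original solution; it is an assumption that simply undoes the divmod step one character at a time:

```python
import string

HEAD_CHAR = ''.join(sorted(string.ascii_letters + '_'))
TAIL_CHAR = ''.join(sorted(string.digits + HEAD_CHAR))
HEAD_BASE, TAIL_BASE = len(HEAD_CHAR), len(TAIL_CHAR)


def convert_name_to_number(name):
    """Hypothetical inverse of convert_number_to_name (not from the post)."""
    number = HEAD_CHAR.index(name[0])
    for char in name[1:]:
        # Undo one step of: q, r = divmod(number - HEAD_BASE, TAIL_BASE)
        number = number * TAIL_BASE + TAIL_CHAR.index(char) + HEAD_BASE
    return number


for name in ('A', '_', 'z', 'A0', '_1', 'FAQ'):
    print(name, convert_name_to_number(name))
```

With the names from the question this gives back 0, 26, 52, 53, 1692 and 23893 respectively.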
This is a fun little problem full of off by 1 errors.
Without loops:
import string
first_digits = sorted(string.ascii_letters + '_')
rest_digits = sorted(string.digits + string.ascii_letters + '_')
def convert(number):
    if number < len(first_digits):
        return first_digits[number]
    current_base = len(rest_digits)
    remain = number - len(first_digits)
    # use // so the recursion gets an int on both Python 2 and Python 3
    return convert(remain // current_base) + rest_digits[remain % current_base]
And the tests:
print(convert(0))
print(convert(26))
print(convert(52))
print(convert(53))
print(convert(1692))
print(convert(23893))
Output:
A
_
z
A0
_1
FAQ
What you've got is a corrupted form of bijective numeration (the usual example being spreadsheet column names, which are bijective base-26).
One way to generate bijective numeration:
def bijective(n, digits=string.ascii_uppercase):
    result = []
    while n > 0:
        n, mod = divmod(n - 1, len(digits))
        result += digits[mod]
    return ''.join(reversed(result))
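For instance, with the default uppercase digits this reproduces spreadsheet column names. A small self-contained sketch (the name column_name is only illustrative; the body is the same algorithm as above):

```python
import string


def column_name(n, digits=string.ascii_uppercase):
    """Bijective base-len(digits) numeral for n >= 1 (empty string for 0)."""
    result = []
    while n > 0:
        # Subtracting 1 first is what removes the need for a zero digit.
        n, mod = divmod(n - 1, len(digits))
        result.append(digits[mod])
    return ''.join(reversed(result))


print([column_name(n) for n in (1, 26, 27, 52, 53, 702, 703)])
```

This prints ['A', 'Z', 'AA', 'AZ', 'BA', 'ZZ', 'AAA'], so there is no gap and no leading-zero ambiguity between lengths.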
All you need to do is supply a different set of digits for the case where 53 >= n > 0. You will also need to increment n by 1, as properly the bijective 0 is the empty string, not "A":
def name(n, first=sorted(string.ascii_letters + '_'), digits=sorted(string.ascii_letters + '_' + string.digits)):
    result = []
    while n >= len(first):
        n, mod = divmod(n - len(first), len(digits))
        result += digits[mod]
    result += first[n]
    return ''.join(reversed(result))
Tested for the first 10,000 names:
first_chars = sorted(string.ascii_letters + '_')
later_chars = sorted(list(string.digits) + first_chars)
def f(n):
    # first, determine length by subtracting the number of items of length l
    # also determines the index into the list of names of length l
    ix = n
    l = 1
    while ix >= 53 * (63 ** (l-1)):
        ix -= 53 * (63 ** (l-1))
        l += 1
    # determine first character
    first = first_chars[ix // (63 ** (l-1))]
    # rest of string is just a base 63 number
    s = ''
    rem = ix % (63 ** (l-1))
    for i in range(l-1):
        s = later_chars[rem % 63] + s
        rem //= 63
    return first+s
You can use the code in this answer to the question "Base 62 conversion in Python" (or perhaps one of the other answers).
Using the referenced code, I think the answer to your real question, which was "how can the name be calculated based on the number without storing a cache?", would be to make the name the simple base 62 conversion of the number, possibly with a leading underscore if the first character of the name is a digit (the underscore is simply ignored when converting the name back into a number).
Here's sample code illustrating what I propose:
from base62 import base62_encode, base62_decode
def NumberToName(num):
    ret = base62_encode(num)
    return ('_' + ret) if ret[0] in '0123456789' else ret
def NameToNumber(name):
    # compare with !=; "is not" tests identity, not equality
    return base62_decode(name if name[0] != '_' else name[1:])
if __name__ == '__main__':
    def test(num):
        name = NumberToName(num)
        num2 = NameToNumber(name)
        print('NumberToName({0:5d}) -> {1!r:>6s}, NameToNumber({2!r:>6s}) -> {3:5d}'
              .format(num, name, name, num2))
    test(26)
    test(52)
    test(53)
    test(1692)
    test(23893)
Output:
NumberToName(   26) ->    'q', NameToNumber(   'q') ->    26
NumberToName(   52) ->    'Q', NameToNumber(   'Q') ->    52
NumberToName(   53) ->    'R', NameToNumber(   'R') ->    53
NumberToName( 1692) ->   'ri', NameToNumber(  'ri') ->  1692
NumberToName(23893) -> '_6dn', NameToNumber('_6dn') -> 23893
If the numbers could be negative, you might have to modify the code from the referenced answer (and there is some discussion there on how to do it).

How to split big numbers?

I have a big number, which I need to split into smaller numbers in Python. I wrote the following code to swap between the two:
def split_number (num, part_size):
    string = str(num)
    string_size = len(string)
    arr = []
    pointer = 0
    while pointer < string_size:
        e = pointer + part_size
        arr.append(int(string[pointer:e]))
        pointer += part_size
    return arr
def join_number(arr):
    num = ""
    for x in arr:
        num += str(x)
    return int(num)
But the number comes back different. It's hard to debug because the number is so large, so before I go into that I thought I would post it here to see if there is a better way to do it or whether I'm missing something obvious.
Thanks a lot.
Clearly, any leading 0s in the "parts" can't be preserved by this operation. Can't join_number also receive the part_size argument, so that it can reconstruct the string formats with all the leading zeros?
Without some information such as part_size that's known to both the sender and receiver, or the equivalent (such as the base number to use for a similar split and join based on arithmetic, roughly equivalent to 10**part_size given the way you're using part_size), the task becomes quite a bit harder. If the receiver is initially clueless about this, why not just place the part_size (or base, etc) as the very first int in the arr list that's being sent and received? That way, the encoding trivially becomes "self-sufficient", i.e., doesn't need any supplementary parameter known to both sender and receiver.
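A rough sketch of that "self-sufficient" encoding, with the part size stored as the first element of the list. The function names split_with_header and join_with_header are invented for this example, and it assumes non-negative integers and arithmetic splitting with base 10**part_size:

```python
def split_with_header(num, part_size):
    """Split a non-negative int into decimal chunks, prefixed by part_size."""
    base = 10 ** part_size
    parts = []
    while num:
        num, part = divmod(num, base)
        parts.append(part)
    # Most significant chunk first; the leading element is the chunk width.
    return [part_size] + parts[::-1]


def join_with_header(arr):
    """Rebuild the int; the first element says how wide each chunk is."""
    base = 10 ** arr[0]
    num = 0
    for part in arr[1:]:
        num = num * base + part
    return num


encoded = split_with_header(1000005, 3)
print(encoded)                    # [3, 1, 0, 5]
print(join_with_header(encoded))  # 1000005
```

The receiver needs no extra parameter here, because the chunk width travels with the data.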
There is no need to convert to and from strings, which can be very time-consuming for really large numbers:
>>> def split_number(n, part_size):
...     base = 10**part_size
...     L = []
...     while n:
...         n,part = divmod(n,base)
...         L.append(part)
...     return L[::-1]
...
>>> def join_number(L, part_size):
...     base = 10**part_size
...     n = 0
...     L = L[::-1]
...     while L:
...         n = n*base+L.pop()
...     return n
...
>>> print(split_number(1000005,3))
[1, 0, 5]
>>> print(join_number([1,0,5],3))
1000005
>>>
Here you can see that just converting the number to a str takes longer than my entire function!
>>> from time import time
>>> t=time();b = split_number(2**100000,3000);print(time()-t)
0.204252004623
>>> t=time();b = split_number(2**100000,30);print(time()-t)
0.486856222153
>>> t=time();b = str(2**100000);print(time()-t)
0.730905056
You should think of the following number split into 3-sized chunks:
1000005 -> 100 000 5
You have two problems. The first is that if you put those integers back together, you'll get:
100 0 5 -> 100005
(i.e., the middle one is 0, not 000) which is not what you started with. Second problem is that you're not sure what size the last part should be.
I would ensure that you're first using a string whose length is an exact multiple of the part size so you know exactly how big each part should be:
def split_number (num, part_size):
    string = str(num)
    string_size = len(string)
    while string_size % part_size != 0:
        string = "0%s"%(string)
        string_size = string_size + 1
    arr = []
    pointer = 0
    while pointer < string_size:
        e = pointer + part_size
        arr.append(int(string[pointer:e]))
        pointer += part_size
    return arr
Secondly, make sure that you put the parts back together with the right length for each part (ensuring you don't put leading zeros on the first part of course):
def join_number(arr, part_size):
    fmt_str = "%%s%%0%dd"%(part_size)
    num = arr[0]
    for x in arr[1:]:
        num = fmt_str%(num,int(x))
    return int(num)
Tying it all together, the following complete program:
#!/usr/bin/python
def split_number (num, part_size):
    string = str(num)
    string_size = len(string)
    # left-pad with zeros so the length is an exact multiple of part_size
    while string_size % part_size != 0:
        string = "0%s"%(string)
        string_size = string_size + 1
    arr = []
    pointer = 0
    while pointer < string_size:
        e = pointer + part_size
        arr.append(int(string[pointer:e]))
        pointer += part_size
    return arr
def join_number(arr, part_size):
    # builds e.g. "%s%03d": every part after the first is zero-padded to part_size
    fmt_str = "%%s%%0%dd"%(part_size)
    num = arr[0]
    for x in arr[1:]:
        num = fmt_str%(num,int(x))
    return int(num)
x = 1000005
print(x)
y = split_number(x,3)
print(y)
z = join_number(y,3)
print(z)
produces the output:
1000005
[1, 0, 5]
1000005
which shows that it goes back together.
Just keep in mind I haven't done Python for a few years. There's almost certainly a more "Pythonic" way to do it with those new-fangled lambdas and things (or whatever Python calls them) but, since your code was of the basic form, I just answered with the minimal changes required to get it working. Oh yeah, and be wary of negative numbers :-)
Here's some code for Alex Martelli's answer.
def digits(n, base):
    while n:
        yield n % base
        n //= base
def split_number(n, part_size):
    base = 10 ** part_size
    return list(digits(n, base))
def join_number(digits, part_size):
    base = 10 ** part_size
    return sum(d * (base ** i) for i, d in enumerate(digits))
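One caveat with the generator above: for a negative n the "while n" loop never terminates, because floor division keeps -1 at -1. A possible workaround, sketched here under the assumption that carrying the sign separately is acceptable (split_signed and join_signed are hypothetical names, not part of the answers above):

```python
def split_signed(n, part_size):
    """Return (sign, chunks) with chunks least significant first."""
    base = 10 ** part_size
    sign = -1 if n < 0 else 1
    n = abs(n)  # split the magnitude only, so the loop always terminates
    chunks = []
    while n:
        n, chunk = divmod(n, base)
        chunks.append(chunk)
    return sign, chunks


def join_signed(sign, chunks, part_size):
    """Inverse of split_signed."""
    base = 10 ** part_size
    return sign * sum(d * base ** i for i, d in enumerate(chunks))


for value in (-1000005, 0, 1000005, -(2 ** 100)):
    sign, chunks = split_signed(value, 3)
    assert join_signed(sign, chunks, 3) == value
```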
