Is there a faster way of converting a number to a name? - python

The following code defines a sequence of names that are mapped to numbers. It is designed to take a number and retrieve a specific name. The class operates by ensuring the name exists in its cache, and then returns the name by indexing into its cache. The question in this: how can the name be calculated based on the number without storing a cache?
The name can be thought of as a base 63 number, except for the first digit which is always in base 53.
class NumberToName:
def __generate_name():
def generate_tail(length):
if length > 0:
for char in NumberToName.CHARS:
for extension in generate_tail(length - 1):
yield char + extension
else:
yield ''
for length in itertools.count():
for char in NumberToName.FIRST:
for extension in generate_tail(length):
yield char + extension
FIRST = ''.join(sorted(string.ascii_letters + '_'))
CHARS = ''.join(sorted(string.digits + FIRST))
CACHE = []
NAMES = __generate_name()
#classmethod
def convert(cls, number):
for _ in range(number - len(cls.CACHE) + 1):
cls.CACHE.append(next(cls.NAMES))
return cls.CACHE[number]
def __init__(self, *args, **kwargs):
raise NotImplementedError()
The following interactive sessions show some of the values that are expected to be returned in order.
>>> NumberToName.convert(0)
'A'
>>> NumberToName.convert(26)
'_'
>>> NumberToName.convert(52)
'z'
>>> NumberToName.convert(53)
'A0'
>>> NumberToName.convert(1692)
'_1'
>>> NumberToName.convert(23893)
'FAQ'
Unfortunately, these numbers need to be mapped to these exact names (to allow a reverse conversion).
Please note: A variable number of bits are received and converted unambiguously into a number. This number should be converted unambiguously to a name in the Python identifier namespace. Eventually, valid Python names will be converted to numbers, and these numbers will be converted to a variable number of bits.
Final solution:
import string
HEAD_CHAR = ''.join(sorted(string.ascii_letters + '_'))
TAIL_CHAR = ''.join(sorted(string.digits + HEAD_CHAR))
HEAD_BASE, TAIL_BASE = len(HEAD_CHAR), len(TAIL_CHAR)
def convert_number_to_name(number):
if number < HEAD_BASE: return HEAD_CHAR[number]
q, r = divmod(number - HEAD_BASE, TAIL_BASE)
return convert_number_to_name(q) + TAIL_CHAR[r]

This is a fun little problem full of off by 1 errors.
Without loops:
import string
first_digits = sorted(string.ascii_letters + '_')
rest_digits = sorted(string.digits + string.ascii_letters + '_')
def convert(number):
if number < len(first_digits):
return first_digits[number]
current_base = len(rest_digits)
remain = number - len(first_digits)
return convert(remain / current_base) + rest_digits[remain % current_base]
And the tests:
print convert(0)
print convert(26)
print convert(52)
print convert(53)
print convert(1692)
print convert(23893)
Output:
A
_
z
A0
_1
FAQ

What you've got is a corrupted form of bijective numeration (the usual example being spreadsheet column names, which are bijective base-26).
One way to generate bijective numeration:
def bijective(n, digits=string.ascii_uppercase):
result = []
while n > 0:
n, mod = divmod(n - 1, len(digits))
result += digits[mod]
return ''.join(reversed(result))
All you need to do is supply a different set of digits for the case where 53 >= n > 0. You will also need to increment n by 1, as properly the bijective 0 is the empty string, not "A":
def name(n, first=sorted(string.ascii_letters + '_'), digits=sorted(string.ascii_letters + '_' + string.digits)):
result = []
while n >= len(first):
n, mod = divmod(n - len(first), len(digits))
result += digits[mod]
result += first[n]
return ''.join(reversed(result))

Tested for the first 10,000 names:
first_chars = sorted(string.ascii_letters + '_')
later_chars = sorted(list(string.digits) + first_chars)
def f(n):
# first, determine length by subtracting the number of items of length l
# also determines the index into the list of names of length l
ix = n
l = 1
while ix >= 53 * (63 ** (l-1)):
ix -= 53 * (63 ** (l-1))
l += 1
# determine first character
first = first_chars[ix // (63 ** (l-1))]
# rest of string is just a base 63 number
s = ''
rem = ix % (63 ** (l-1))
for i in range(l-1):
s = later_chars[rem % 63] + s
rem //= 63
return first+s

You can use the code in this answer to the question "Base 62 conversion in Python" (or perhaps one of the other answers).
Using the referenced code, I think the answer your real question which was "how can the name be calculated based on the number without storing a cache?" would be to make the name the simple base 62 conversion of the number possibly with a leading underscore if the first character of the name is a digit (which is simply ignored when converting the name back into a number).
Here's sample code illustrating what I propose:
from base62 import base62_encode, base62_decode
def NumberToName(num):
ret = base62_encode(num)
return ('_' + ret) if ret[0] in '0123456789' else ret
def NameToNumber(name):
return base62_decode(name if name[0] is not '_' else name[1:])
if __name__ == '__main__':
def test(num):
name = NumberToName(num)
num2 = NameToNumber(name)
print 'NumberToName({0:5d}) -> {1!r:>6s}, NameToNumber({2!r:>6s}) -> {3:5d}' \
.format(num, name, name, num2)
test(26)
test(52)
test(53)
test(1692)
test(23893)
Output:
NumberToName( 26) -> 'q', NameToNumber( 'q') -> 26
NumberToName( 52) -> 'Q', NameToNumber( 'Q') -> 52
NumberToName( 53) -> 'R', NameToNumber( 'R') -> 53
NumberToName( 1692) -> 'ri', NameToNumber( 'ri') -> 1692
NumberToName(23893) -> '_6dn', NameToNumber('_6dn') -> 23893
If the numbers could be negative, you might have to modify the code from the referenced answer (and there is some discussion there on how to do it).

Related

Python: How to find all ways to decode a string?

I'm trying to solve this problem but it fails with input "226".
Problem:
A message containing letters from A-Z is being encoded to numbers using the following mapping:
'A' -> 1
'B' -> 2
...
'Z' -> 26
Given a non-empty string containing only digits, determine the total number of ways to decode it.
My Code:
class Solution:
def numDecodings(self, s: str) -> int:
decode =[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26]
ways = []
for d in decode:
for i in s:
if str(d) == s or str(d) in s:
ways.append(d)
if int(i) in decode:
ways.append(str(i))
return len(ways)
My code returns 2. It only takes care of combinations (22,6) and (2,26).
It should be returning 3, so I'm not sure how to take care of the (2,2,6) combination.
Looks like this problem can be broken down into many subproblems thus can be solved recursively
Subproblem 1 = when the last digit of the string is valid ( i.e. non zero number ) for that you can just recur for (n-1) digits left
if s[n-1] > "0":
count = number_of_decodings(s,n-1)
Subproblem 2 = when last 2 digits form a valid number ( less then 27 ) for that you can just recur for remaining (n-2) digits
if (s[n - 2] == '1' or (s[n - 2] == '2' and s[n - 1] < '7') ) :
count += number_of_decodings(s, n - 2)
Base Case = length of the string is 0 or 1
if n == 0 or n == 1 :
return 1
EDIT: A quick searching on internet , I found another ( more interesting ) method to solve this particular problem which uses dynamic programming to solve this problem
# A Dynamic Programming based function
# to count decodings
def countDecodingDP(digits, n):
count = [0] * (n + 1); # A table to store
# results of subproblems
count[0] = 1;
count[1] = 1;
for i in range(2, n + 1):
count[i] = 0;
# If the last digit is not 0, then last
# digit must add to the number of words
if (digits[i - 1] > '0'):
count[i] = count[i - 1];
# If second last digit is smaller than 2
# and last digit is smaller than 7, then
# last two digits form a valid character
if (digits[i - 2] == '1' or
(digits[i - 2] == '2' and
digits[i - 1] < '7') ):
count[i] += count[i - 2];
return count[n];
the above solution solves the problem in complexity of O(n) and uses the similar method as that of fibonacci number problem
source: https://www.geeksforgeeks.org/count-possible-decodings-given-digit-sequence/
This seemed like a natural for recursion. Since I was bored, and the first answer didn't use recursion and didn't return the actual decodings, I thought there was room for improvement. For what it's worth...
def encodings(str, prefix = ''):
encs = []
if len(str) > 0:
es = encodings(str[1:], (prefix + ',' if prefix else '') + str[0])
encs.extend(es)
if len(str) > 1 and int(str[0:2]) <= 26:
es = encodings(str[2:], (prefix + ',' if prefix else '') + str[0:2])
encs.extend(es)
return encs if len(str) else [prefix]
This returns a list of the possible decodings. To get the count, you just take the length of the list. Here a sample run:
encs = encodings("123")
print("{} {}".format(len(encs), encs))
with result:
3 ['1,2,3', '1,23', '12,3']
Another sample run:
encs = encodings("123123")
print("{} {}".format(len(encs), encs))
with result:
9 ['1,2,3,1,2,3', '1,2,3,1,23', '1,2,3,12,3', '1,23,1,2,3', '1,23,1,23', '1,23,12,3', '12,3,1,2,3', '12,3,1,23', '12,3,12,3']

converitng ASCII values of a string to base 3 number representation in Python [duplicate]

Python allows easy creation of an integer from a string of a given base via
int(str, base).
I want to perform the inverse: creation of a string from an integer,
i.e. I want some function int2base(num, base), such that:
int(int2base(x, b), b) == x
The function name/argument order is unimportant.
For any number x and base b that int() will accept.
This is an easy function to write: in fact it's easier than describing it in this question. However, I feel like I must be missing something.
I know about the functions bin, oct, hex, but I cannot use them for a few reasons:
Those functions are not available on older versions of Python, with which I need compatibility with (2.2)
I want a general solution that can be called the same way for different bases
I want to allow bases other than 2, 8, 16
Related
Python elegant inverse function of int(string, base)
Integer to base-x system using recursion in python
Base 62 conversion in Python
How to convert an integer to the shortest url-safe string in Python?
Surprisingly, people were giving only solutions that convert to small bases (smaller than the length of the English alphabet). There was no attempt to give a solution which converts to any arbitrary base from 2 to infinity.
So here is a super simple solution:
def numberToBase(n, b):
if n == 0:
return [0]
digits = []
while n:
digits.append(int(n % b))
n //= b
return digits[::-1]
so if you need to convert some super huge number to the base 577,
numberToBase(67854 ** 15 - 102, 577), will give you a correct solution:
[4, 473, 131, 96, 431, 285, 524, 486, 28, 23, 16, 82, 292, 538, 149, 25, 41, 483, 100, 517, 131, 28, 0, 435, 197, 264, 455],
Which you can later convert to any base you want
at some point of time you will notice that sometimes there is no built-in library function to do things that you want, so you need to write your own. If you disagree, post you own solution with a built-in function which can convert a base 10 number to base 577.
this is due to lack of understanding what a number in some base means.
I encourage you to think for a little bit why base in your method works only for n <= 36. Once you are done, it will be obvious why my function returns a list and has the signature it has.
If you need compatibility with ancient versions of Python, you can either use gmpy (which does include a fast, completely general int-to-string conversion function, and can be built for such ancient versions – you may need to try older releases since the recent ones have not been tested for venerable Python and GMP releases, only somewhat recent ones), or, for less speed but more convenience, use Python code – e.g., for Python 2, most simply:
import string
digs = string.digits + string.ascii_letters
def int2base(x, base):
if x < 0:
sign = -1
elif x == 0:
return digs[0]
else:
sign = 1
x *= sign
digits = []
while x:
digits.append(digs[int(x % base)])
x = int(x / base)
if sign < 0:
digits.append('-')
digits.reverse()
return ''.join(digits)
For Python 3, int(x / base) leads to incorrect results, and must be changed to x // base:
import string
digs = string.digits + string.ascii_letters
def int2base(x, base):
if x < 0:
sign = -1
elif x == 0:
return digs[0]
else:
sign = 1
x *= sign
digits = []
while x:
digits.append(digs[x % base])
x = x // base
if sign < 0:
digits.append('-')
digits.reverse()
return ''.join(digits)
"{0:b}".format(100) # bin: 1100100
"{0:x}".format(100) # hex: 64
"{0:o}".format(100) # oct: 144
def baseN(num,b,numerals="0123456789abcdefghijklmnopqrstuvwxyz"):
return ((num == 0) and numerals[0]) or (baseN(num // b, b, numerals).lstrip(numerals[0]) + numerals[num % b])
ref:
http://code.activestate.com/recipes/65212/
Please be aware that this may lead to
RuntimeError: maximum recursion depth exceeded in cmp
for very big integers.
>>> numpy.base_repr(10, base=3)
'101'
Note that numpy.base_repr() has a limit of 36 as its base. Otherwise it throws a ValueError
Recursive
I would simplify the most voted answer to:
BS="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
def to_base(n, b):
return "0" if not n else to_base(n//b, b).lstrip("0") + BS[n%b]
With the same advice for RuntimeError: maximum recursion depth exceeded in cmp on very large integers and negative numbers. (You could usesys.setrecursionlimit(new_limit))
Iterative
To avoid recursion problems:
BS="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
def to_base(s, b):
res = ""
while s:
res+=BS[s%b]
s//= b
return res[::-1] or "0"
Great answers!
I guess the answer to my question was "no" I was not missing some obvious solution.
Here is the function I will use that condenses the good ideas expressed in the answers.
allow caller-supplied mapping of characters (allows base64 encode)
checks for negative and zero
maps complex numbers into tuples of strings
def int2base(x,b,alphabet='0123456789abcdefghijklmnopqrstuvwxyz'):
'convert an integer to its string representation in a given base'
if b<2 or b>len(alphabet):
if b==64: # assume base64 rather than raise error
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
else:
raise AssertionError("int2base base out of range")
if isinstance(x,complex): # return a tuple
return ( int2base(x.real,b,alphabet) , int2base(x.imag,b,alphabet) )
if x<=0:
if x==0:
return alphabet[0]
else:
return '-' + int2base(-x,b,alphabet)
# else x is non-negative real
rets=''
while x>0:
x,idx = divmod(x,b)
rets = alphabet[idx] + rets
return rets
You could use baseconv.py from my project: https://github.com/semente/python-baseconv
Sample usage:
>>> from baseconv import BaseConverter
>>> base20 = BaseConverter('0123456789abcdefghij')
>>> base20.encode(1234)
'31e'
>>> base20.decode('31e')
'1234'
>>> base20.encode(-1234)
'-31e'
>>> base20.decode('-31e')
'-1234'
>>> base11 = BaseConverter('0123456789-', sign='$')
>>> base11.encode('$1234')
'$-22'
>>> base11.decode('$-22')
'$1234'
There is some bultin converters as for example baseconv.base2, baseconv.base16 and baseconv.base64.
def base(decimal ,base) :
list = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
other_base = ""
while decimal != 0 :
other_base = list[decimal % base] + other_base
decimal = decimal / base
if other_base == "":
other_base = "0"
return other_base
print base(31 ,16)
output:
"1F"
def base_conversion(num, base):
digits = []
while num > 0:
num, remainder = divmod(num, base)
digits.append(remainder)
return digits[::-1]
http://code.activestate.com/recipes/65212/
def base10toN(num,n):
"""Change a to a base-n number.
Up to base-36 is supported without special notation."""
num_rep={10:'a',
11:'b',
12:'c',
13:'d',
14:'e',
15:'f',
16:'g',
17:'h',
18:'i',
19:'j',
20:'k',
21:'l',
22:'m',
23:'n',
24:'o',
25:'p',
26:'q',
27:'r',
28:'s',
29:'t',
30:'u',
31:'v',
32:'w',
33:'x',
34:'y',
35:'z'}
new_num_string=''
current=num
while current!=0:
remainder=current%n
if 36>remainder>9:
remainder_string=num_rep[remainder]
elif remainder>=36:
remainder_string='('+str(remainder)+')'
else:
remainder_string=str(remainder)
new_num_string=remainder_string+new_num_string
current=current/n
return new_num_string
Here's another one from the same link
def baseconvert(n, base):
"""convert positive decimal integer n to equivalent in another base (2-36)"""
digits = "0123456789abcdefghijklmnopqrstuvwxyz"
try:
n = int(n)
base = int(base)
except:
return ""
if n < 0 or base < 2 or base > 36:
return ""
s = ""
while 1:
r = n % base
s = digits[r] + s
n = n / base
if n == 0:
break
return s
I made a pip package for this.
I recommend you use my bases.py https://github.com/kamijoutouma/bases.py which was inspired by bases.js
from bases import Bases
bases = Bases()
bases.toBase16(200) // => 'c8'
bases.toBase(200, 16) // => 'c8'
bases.toBase62(99999) // => 'q0T'
bases.toBase(200, 62) // => 'q0T'
bases.toAlphabet(300, 'aAbBcC') // => 'Abba'
bases.fromBase16('c8') // => 200
bases.fromBase('c8', 16) // => 200
bases.fromBase62('q0T') // => 99999
bases.fromBase('q0T', 62) // => 99999
bases.fromAlphabet('Abba', 'aAbBcC') // => 300
refer to https://github.com/kamijoutouma/bases.py#known-basesalphabets
for what bases are usable
EDIT:
pip link https://pypi.python.org/pypi/bases.py/0.2.2
def int2base(a, base, numerals="0123456789abcdefghijklmnopqrstuvwxyz"):
baseit = lambda a=a, b=base: (not a) and numerals[0] or baseit(a-a%b,b*base)+numerals[a%b%(base-1) or (a%b) and (base-1)]
return baseit()
explanation
In any base every number is equal to a1+a2*base**2+a3*base**3... The "mission" is to find all a 's.
For everyN=1,2,3... the code is isolating the aN*base**N by "mouduling" by b for b=base**(N+1) which slice all a 's bigger than N, and slicing all the a 's that their serial is smaller than N by decreasing a everytime the func is called by the current aN*base**N .
Base%(base-1)==1 therefor base**p%(base-1)==1 and therefor q*base^p%(base-1)==q with only one exception when q=base-1 which returns 0.
To fix that in case it returns 0 the func is checking is it 0 from the beggining.
advantages
in this sample theres only one multiplications (instead of division) and some moudulueses which relatively takes small amounts of time.
While the currently top answer is definitely an awesome solution, there remains more customization users might like.
Basencode adds some of these features, including conversions of floating point numbers, modifying digits (in the linked answer, only numbers can be used).
Here's a possible use case:
>>> from basencode import *
>>> n1 = Number(12345)
>> n1.repr_in_base(64) # convert to base 64
'30V'
>>> Number('30V', 64) # construct Integer from base 64
Integer(12345)
>>> n1.repr_in_base(8)
'30071'
>>> n1.repr_in_octal() # shortcuts
'30071'
>>> n1.repr_in_bin() # equivelant to `n1.repr_in_base(2)`
'11000000111001'
>>> n1.repr_in_base(2, digits=list('-+')) # override default digits: use `-` and `+` in place of `0` and `1`
'++------+++--+'
>>> n1.repr_in_base(33) # yet another base - all bases from 2 to 64 are supported from the start
'bb3'
How would you add any bases you want? Let me replicate the example of the currently most upvoted answer: the digits parameter allows you to override the default digits from base 2 to 64, and provide digits for any base higher than that. The mode parameter determines how the value of the representation will determine how (list or string) the answer will be returned.
>>> n2 = Number(67854 ** 15 - 102)
>>> n2.repr_in_base(577, digits=[str(i) for i in range(577)], mode="l")
['4', '473', '131', '96', '431', '285', '524', '486', '28', '23', '16', '82', '292', '538', '149', '25', '41', '483', '100', '517', '131', '28', '0', '435', '197', '264', '455']
>>> n2.repr_in_base(577, mode="l") # the program remembers the digits for base 577 now
['4', '473', '131', '96', '431', '285', '524', '486', '28', '23', '16', '82', '292', '538', '149', '25', '41', '483', '100', '517', '131', '28', '0', '435', '197', '264', '455']
Operations can be done: the Number class returns an instance of basencode.Integer if the provided number is an Integer, else it returns a basencode.Float
>>> n3 = Number(54321) # the Number class returns an instance of `basencode.Integer` if the provided number is an Integer, otherwise it returns a `basencode.Float`.
>>> n1 + n3
Integer(66666)
>>> n3 - n1
Integer(41976)
>>> n1 * n3
Integer(670592745)
>>> n3 // n1
Integer(4)
>>> n3 / n1 # a basencode.Float class allows conversion of floating point numbers
Float(4.400243013365735)
>>> (n3 / n1).repr_in_base(32)
'4.cpr56v6rnc4oitoblha2r11sus0dheqd4pgechfcjklo74b2bgom7j8ih86mipdvss0068sehi9f3791mdo4uotfujq66cf0jkgo'
>>> n4 = Number(0.5) # returns a basencode.Float
>>> n4.repr_in_bin() # binary version of 0.5
'0.1'
Disclaimer: this project is under active maintenance, and I'm a contributor.
>>> import string
>>> def int2base(integer, base):
if not integer: return '0'
sign = 1 if integer > 0 else -1
alphanum = string.digits + string.ascii_lowercase
nums = alphanum[:base]
res = ''
integer *= sign
while integer:
integer, mod = divmod(integer, base)
res += nums[mod]
return ('' if sign == 1 else '-') + res[::-1]
>>> int2base(-15645, 23)
'-16d5'
>>> int2base(213, 21)
'a3'
A recursive solution for those interested. Of course, this will not work with negative binary values. You would need to implement Two's Complement.
def generateBase36Alphabet():
return ''.join([str(i) for i in range(10)]+[chr(i+65) for i in range(26)])
def generateAlphabet(base):
return generateBase36Alphabet()[:base]
def intToStr(n, base, alphabet):
def toStr(n, base, alphabet):
return alphabet[n] if n < base else toStr(n//base,base,alphabet) + alphabet[n%base]
return ('-' if n < 0 else '') + toStr(abs(n), base, alphabet)
print('{} -> {}'.format(-31, intToStr(-31, 16, generateAlphabet(16)))) # -31 -> -1F
def base_changer(number,base):
buff=97+abs(base-10)
dic={};buff2='';buff3=10
for i in range(97,buff+1):
dic[buff3]=chr(i)
buff3+=1
while(number>=base):
mod=int(number%base)
number=int(number//base)
if (mod) in dic.keys():
buff2+=dic[mod]
continue
buff2+=str(mod)
if (number) in dic.keys():
buff2+=dic[number]
else:
buff2+=str(number)
return buff2[::-1]
Here is an example of how to convert a number of any base to another base.
from collections import namedtuple
Test = namedtuple("Test", ["n", "from_base", "to_base", "expected"])
def convert(n: int, from_base: int, to_base: int) -> int:
digits = []
while n:
(n, r) = divmod(n, to_base)
digits.append(r)
return sum(from_base ** i * v for i, v in enumerate(digits))
if __name__ == "__main__":
tests = [
Test(32, 16, 10, 50),
Test(32, 20, 10, 62),
Test(1010, 2, 10, 10),
Test(8, 10, 8, 10),
Test(150, 100, 1000, 150),
Test(1500, 100, 10, 1050000),
]
for test in tests:
result = convert(*test[:-1])
assert result == test.expected, f"{test=}, {result=}"
print("PASSED!!!")
Say we want to convert 14 to base 2. We repeatedly apply the division algorithm until the quotient is 0:
14 = 2 x 7
7 = 2 x 3 + 1
3 = 2 x 1 + 1
1 = 2 x 0 + 1
The binary representation is just the remainder read from bottom to top. This can be proved by expanding
14 = 2 x 7 = 2 x (2 x 3 + 1) = 2 x (2 x (2 x 1 + 1) + 1) = 2 x (2 x (2 x (2 x 0 + 1) + 1) + 1) = 2^3 + 2^2 + 2
The code is the implementation of the above algorithm.
def toBaseX(n, X):
strbin = ""
while n != 0:
strbin += str(n % X)
n = n // X
return strbin[::-1]
This is my approach. At first converting the number then casting it to string.
def to_base(n, base):
if base == 10:
return n
result = 0
counter = 0
while n:
r = n % base
n //= base
result += r * 10**counter
counter+=1
return str(result)
I have written this function which I use to encode in different bases. I also provided the way to shift the result by a value 'offset'. This is useful if you'd like to encode to bases above 64, but keeping displayable chars (like a base 95).
I also tried to avoid reversing the output 'list' and tried to minimize computing operations. The array of pow(base) is computed on demand and kept for additional calls to the function.
The output is a binary string
pows = {}
######################################################
def encode_base(value,
base = 10,
offset = 0) :
"""
Encode value into a binary string, according to the desired base.
Input :
value : Any positive integer value
offset : Shift the encoding (eg : Starting at chr(32))
base : The base in which we'd like to encode the value
Return : Binary string
Example : with : offset = 32, base = 64
100 -> !D
200 -> #(
"""
# Determine the number of loops
try :
pb = pows[base]
except KeyError :
pb = pows[base] = {n : base ** n for n in range(0, 8) if n < 2 ** 48 -1}
for n in pb :
if value < pb[n] :
n -= 1
break
out = []
while n + 1 :
b = pb[n]
out.append(chr(offset + value // b))
n -= 1
value %= b
return ''.join(out).encode()
This function converts any integer from any base to any base
def baseconvert(number, srcbase, destbase):
if srcbase != 10:
sum = 0
for _ in range(len(str(number))):
sum += int(str(number)[_]) * pow(srcbase, len(str(number)) - _ - 1)
b10 = sum
return baseconvert(b10, 10, destbase)
end = ''
q = number
while(True):
r = q % destbase
q = q // destbase
end = str(r) + end
if(q<destbase):
end = str(q) + end
return int(end)
The below provided Python code converts a Python integer to a string in arbitrary base ( from 2 up to infinity ) and works in both directions. So all the created strings can be converted back to Python integers by providing a string for N instead of an integer.
The code works only on positive numbers by intention (there is in my eyes some hassle about negative values and their bit representations I don't want to dig into). Just pick from this code what you need, want or like, or just have fun learning about available options. Much is there only for the purpose of documenting all the various available approaches ( e.g. the Oneliner seems not to be fast, even if promised to be ).
I like the by Salvador Dali proposed format for infinite large bases. A nice proposal which works optically well even for simple binary bit representations. Notice that the width=x padding parameter in case of infiniteBase=True formatted string applies to the digits and not to the whole number. It seems, that code handling infiniteBase digits format runs even a bit faster than the other options - another reason for using it?
I don't like the idea of using Unicode for extending the number of symbols available for digits, so don't look in the code below for it, because it's not there. Use the proposed infiniteBase format instead or store integers as bytes for compression purposes.
def inumToStr( N, base=2, width=1, infiniteBase=False,\
useNumpy=False, useRecursion=False, useOneliner=False, \
useGmpy=False, verbose=True):
''' Positive numbers only, but works in BOTH directions.
For strings in infiniteBase notation set for bases <= 62
infiniteBase=True . Examples of use:
inumToStr( 17, 2, 1, 1) # [1,0,0,0,1]
inumToStr( 17, 3, 5) # 00122
inumToStr(245, 16, 4) # 00F5
inumToStr(245, 36, 4,0,1) # 006T
inumToStr(245245245245,36,10,0,1) # 0034NWOQBH
inumToStr(245245245245,62) # 4JhA3Th
245245245245 == int(gmpy2.mpz('4JhA3Th',62))
inumToStr(245245245245,99,2) # [25,78, 5,23,70,44]
----------------------------------------------------
inumToStr( '[1,0,0,0,1]',2, infiniteBase=True ) # 17
inumToStr( '[25,78, 5,23,70,44]', 99) # 245245245245
inumToStr( '0034NWOQBH', 36 ) # 245245245245
inumToStr( '4JhA3Th' , 62 ) # 245245245245
----------------------------------------------------
--- Timings for N = 2**4096, base=36:
standard: 0.0023
infinite: 0.0017
numpy : 0.1277
recursio; 0.0022
oneliner: 0.0146
For N = 2**8192:
standard: 0.0075
infinite: 0.0053
numpy : 0.1369
max. recursion depth exceeded: recursio/oneliner
'''
show = print
if type(N) is str and ( infiniteBase is True or base > 62 ):
lstN = eval(N)
if verbose: show(' converting a non-standard infiniteBase bits string to Python integer')
return sum( [ item*base**pow for pow, item in enumerate(lstN[::-1]) ] )
if type(N) is str and base <= 36:
if verbose: show('base <= 36. Returning Python int(N, base)')
return int(N, base)
if type(N) is str and base <= 62:
if useGmpy:
if verbose: show(' base <= 62, useGmpy=True, returning int(gmpy2.mpz(N,base))')
return int(gmpy2.mpz(N,base))
else:
if verbose: show(' base <= 62, useGmpy=False, self-calculating return value)')
lstStrOfDigits="0123456789"+ \
"abcdefghijklmnopqrstuvwxyz".upper() + \
"abcdefghijklmnopqrstuvwxyz"
dictCharToPow = {}
for index, char in enumerate(lstStrOfDigits):
dictCharToPow.update({char : index})
return sum( dictCharToPow[item]*base**pow for pow, item in enumerate(N[::-1]) )
#:if
#:if
if useOneliner and base <= 36:
if verbose: show(' base <= 36, useOneliner=True, running the Oneliner code')
d="0123456789abcdefghijklmnopqrstuvwxyz"
baseit = lambda a=N, b=base: (not a) and d[0] or \
baseit(a-a%b,b*base)+d[a%b%(base-1) or (a%b) and (base-1)]
return baseit().rjust(width, d[0])[1:]
if useRecursion and base <= 36:
if verbose: show(' base <= 36, useRecursion=True, running recursion algorythm')
BS="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
def to_base(n, b):
return "0" if not n else to_base(n//b, b).lstrip("0") + BS[n%b]
return to_base(N, base).rjust(width,BS[0])
if base > 62 or infiniteBase:
if verbose: show(' base > 62 or infiniteBase=True, returning a non-standard digits string')
# Allows arbitrary large base with 'width=...'
# applied to each digit (useful also for bits )
N, digit = divmod(N, base)
strN = str(digit).rjust(width, ' ')+']'
while N:
N, digit = divmod(N, base)
strN = str(digit).rjust(width, ' ') + ',' + strN
return '[' + strN
#:if
if base == 2:
if verbose: show(" base = 2, returning Python str(f'{N:0{width}b}')")
return str(f'{N:0{width}b}')
if base == 8:
if verbose: show(" base = 8, returning Python str(f'{N:0{width}o}')")
return str(f'{N:0{width}o}')
if base == 16:
if verbose: show(" base = 16, returning Python str(f'{N:0{width}X}')")
return str(f'{N:0{width}X}')
if base <= 36:
if useNumpy:
if verbose: show(" base <= 36, useNumpy=True, returning np.base_repr(N, base)")
import numpy as np
strN = np.base_repr(N, base)
return strN.rjust(width, '0')
else:
if verbose: show(' base <= 36, useNumpy=False, self-calculating return value)')
lstStrOfDigits="0123456789"+"abcdefghijklmnopqrstuvwxyz".upper()
strN = lstStrOfDigits[N % base] # rightmost digit
while N >= base:
N //= base # consume already converted digit
strN = lstStrOfDigits[N % base] + strN # add digits to the left
#:while
return strN.rjust(width, lstStrOfDigits[0])
#:if
#:if
if base <= 62:
if useGmpy:
if verbose: show(" base <= 62, useGmpy=True, returning gmpy2.digits(N, base)")
import gmpy2
strN = gmpy2.digits(N, base)
return strN.rjust(width, '0')
# back to Python int from gmpy2.mpz with
# int(gmpy2.mpz('4JhA3Th',62))
else:
if verbose: show(' base <= 62, useGmpy=False, self-calculating return value)')
lstStrOfDigits= "0123456789" + \
"abcdefghijklmnopqrstuvwxyz".upper() + \
"abcdefghijklmnopqrstuvwxyz"
strN = lstStrOfDigits[N % base] # rightmost digit
while N >= base:
N //= base # consume already converted digit
strN = lstStrOfDigits[N % base] + strN # add digits to the left
#:while
return strN.rjust(width, lstStrOfDigits[0])
#:if
#:if
#:def
I'm presenting a "unoptimized" solution for bases between 2 and 9:
def to_base(N, base=2):
N_in_base = ''
while True:
N_in_base = str(N % base) + N_in_base
N //= base
if N == 0:
break
return N_in_base
This solution does not require reversing the final result, but it's actually not optimized. Refer to this answer to see why: https://stackoverflow.com/a/37133870/7896998
Simple base transformation
def int_to_str(x, b):
s = ""
while x:
s = str(x % b) + s
x //= b
return s
Example of output with no 0 to base 9
s = ""
x = int(input())
while x:
if x % 9 == 0:
s = "9" + s
x -= x % 10
x = x // 9
else:
s = str(x % 9) + s
x = x // 9
print(s)
def dec_to_radix(input, to_radix=2, power=None):
if not isinstance(input, int):
raise TypeError('Not an integer!')
elif power is None:
power = 1
if input == 0:
return 0
else:
remainder = input % to_radix**power
digit = str(int(remainder/to_radix**(power-1)))
return int(str(dec_to_radix(input-remainder, to_radix, power+1)) + digit)
def radix_to_dec(input, from_radix):
if not isinstance(input, int):
raise TypeError('Not an integer!')
return sum(int(digit)*(from_radix**power) for power, digit in enumerate(str(input)[::-1]))
def radix_to_radix(input, from_radix=10, to_radix=2, power=None):
dec = radix_to_dec(input, from_radix)
return dec_to_radix(dec, to_radix, power)
Another short one (and easier to understand imo):
def int_to_str(n, b, symbols='0123456789abcdefghijklmnopqrstuvwxyz'):
return (int_to_str(n/b, b, symbols) if n >= b else "") + symbols[n%b]
And with proper exception handling:
def int_to_str(n, b, symbols='0123456789abcdefghijklmnopqrstuvwxyz'):
try:
return (int_to_str(n/b, b) if n >= b else "") + symbols[n%b]
except IndexError:
raise ValueError(
"The symbols provided are not enough to represent this number in "
"this base")
Here is a recursive version that handles signed integers and custom digits.
import string
def base_convert(x, base, digits=None):
"""Convert integer `x` from base 10 to base `base` using `digits` characters as digits.
If `digits` is omitted, it will use decimal digits + lowercase letters + uppercase letters.
"""
digits = digits or (string.digits + string.ascii_letters)
assert 2 <= base <= len(digits), "Unsupported base: {}".format(base)
if x == 0:
return digits[0]
sign = '-' if x < 0 else ''
x = abs(x)
first_digits = base_convert(x // base, base, digits).lstrip(digits[0])
return sign + first_digits + digits[x % base]
Strings aren't the only choice for representing numbers: you can use a list of integers to represent the order of each digit. Those can easily be converted to a string.
None of the answers reject base < 2; and most will run very slowly or crash with stack overflows for very large numbers (such as 56789 ** 43210). To avoid such failures, reduce quickly like this:
def n_to_base(n, b):
if b < 2: raise # invalid base
if abs(n) < b: return [n]
ret = [y for d in n_to_base(n, b*b) for y in divmod(d, b)]
return ret[1:] if ret[0] == 0 else ret # remove leading zeros
def base_to_n(v, b):
h = len(v) // 2
if h == 0: return v[0]
return base_to_n(v[:-h], b) * (b**h) + base_to_n(v[-h:], b)
assert ''.join(['0123456789'[x] for x in n_to_base(56789**43210,10)])==str(56789**43210)
Speedwise, n_to_base is comparable with str for large numbers (about 0.3s on my machine), but if you compare against hex you may be surprised (about 0.3ms on my machine, or 1000x faster). The reason is because the large integer is stored in memory in base 256 (bytes). Each byte can simply be converted to a two-character hex string. This alignment only happens for bases that are powers of two, which is why there are special cases for 2,8, and 16 (and base64, ascii, utf16, utf32).
Consider the last digit of a decimal string. How does it relate to the sequence of bytes that forms its integer? Let's label the bytes s[i] with s[0] being the least significant (little endian). Then the last digit is sum([s[i]*(256**i) % 10 for i in range(n)]). Well, it happens that 256**i ends with a 6 for i > 0 (6*6=36) so that last digit is (s[0]*5 + sum(s)*6)%10. From this, you can see that the last digit depends on the sum of all the bytes. This nonlocal property is what makes converting to decimal harder.
def baseConverter(x, b):
s = ""
d = string.printable.upper()
while x > 0:
s += d[x%b]
x = x / b
return s[::-1]

Returning a string for the number with prefix 0s

I don't understand this question and confused. can anyone show me? It's an exercise in a python book. Only can use loop and function. And based on the question, have to ask the user to enter the number and width.
def format(number, width):
The function returns a string for the number with prefix 0s. The size
of the string is the width. For example, format(34, 4) returns "0034"
and format(34, 5) returns "00034". If the number is longer than the
width, the function returns the string representation for the number.
For example, format(34, 1) returns "34".
I don't quite understand what you mean by "Only can use loop and function". Since your can use function, you can use almost anything in Python.
The simplest solution:
def format(n,w):
s = str(n)
return ('0' * w + s)[-max(w,len(s)):]
>>> format(34,4)
'0034'
Or you can use a loop:
def format(n,w):
s = str(n)
result = ''
for i in range(w - len(s)):
result += '0'
return result + s
>>> format(34,1)
'34'
>>> format(34,4)
'0034'
Try:
def format(number, width):
numstr = str(number)
result = ''
numstrlen = len(numstr)
for i in range(width - numstrlen):
result += '0'
result += numstr
return result
I would have simply done subtraction, but you said it needed to be a loop.
If you can't use len:
def format(number, width):
numstr = str(number)
result = ''
numstrlen = 0
for c in numstr:
numstrlen += 1
for i in range(width - numstrlen):
result += '0'
result += numstr
return result

Encoding a 128-bit integer in Python?

Inspired by the "encoding scheme" of the answer to this question, I implemented my own encoding algorithm in Python.
Here is what it looks like:
import random
from math import pow
from string import ascii_letters, digits
# RFC 2396 unreserved URI characters
unreserved = '-_.!~*\'()'
characters = ascii_letters + digits + unreserved
size = len(characters)
seq = range(0,size)
# Seed random generator with same randomly generated number
random.seed(914576904)
random.shuffle(seq)
dictionary = dict(zip(seq, characters))
reverse_dictionary = dict((v,k) for k,v in dictionary.iteritems())
def encode(n):
d = []
n = n
while n > 0:
qr = divmod(n, size)
n = qr[0]
d.append(qr[1])
chars = ''
for i in d:
chars += dictionary[i]
return chars
def decode(str):
d = []
for c in str:
d.append(reverse_dictionary[c])
value = 0
for i in range(0, len(d)):
value += d[i] * pow(size, i)
return value
The issue I'm running into is encoding and decoding very large integers. For example, this is how a large number is currently encoded and decoded:
s = encode(88291326719355847026813766449910520462)
# print s -> "3_r(AUqqMvPRkf~JXaWj8"
i = decode(s)
# print i -> "8.82913267194e+37"
# print long(i) -> "88291326719355843047833376688611262464"
The highest 16 places match up perfectly, but after those the number deviates from its original.
I assume this is a problem with the precision of extremely large integers when dividing in Python. Is there any way to circumvent this problem? Or is there another issue that I'm not aware of?
The problem lies within this line:
value += d[i] * pow(size, i)
It seems like you're using math.pow here instead of the built-in pow method. It returns a floating point number, so you lose accuracy for your large numbers. You should use the built-in pow or the ** operator or, even better, keep the current power of the base in an integer variable:
def decode(s):
d = [reverse_dictionary[c] for c in s]
result, power = 0, 1
for x in d:
result += x * power
power *= size
return result
It gives me the following result now:
print decode(encode(88291326719355847026813766449910520462))
# => 88291326719355847026813766449910520462

How to split big numbers?

I have a big number, which I need to split into smaller numbers in Python. I wrote the following code to swap between the two:
def split_number (num, part_size):
string = str(num)
string_size = len(string)
arr = []
pointer = 0
while pointer < string_size:
e = pointer + part_size
arr.append(int(string[pointer:e]))
pointer += part_size
return arr
def join_number(arr):
num = ""
for x in arr:
num += str(x)
return int(num)
But the number comes back different. It's hard to debug because the number is so large so before I go into that I thought I would post it here to see if there is a better way to do it or whether I'm missing something obvious.
Thanks a lot.
Clearly, any leading 0s in the "parts" can't be preserved by this operation. Can't join_number also receive the part_size argument, so that it can reconstruct the string formats with all the leading zeros?
Without some information such as part_size that's known to both the sender and receiver, or the equivalent (such as the base number to use for a similar split and join based on arithmetic, roughly equivalent to 10**part_size given the way you're using part_size), the task becomes quite a bit harder. If the receiver is initially clueless about this, why not just place the part_size (or base, etc) as the very first int in the arr list that's being sent and received? That way, the encoding trivially becomes "self-sufficient", i.e., doesn't need any supplementary parameter known to both sender and receiver.
There is no need to convert to and from strings, which can be very time consuming for really large numbers
>>> def split_number(n, part_size):
... base = 10**part_size
... L = []
... while n:
... n,part = divmod(n,base)
... L.append(part)
... return L[::-1]
...
>>> def join_number(L, part_size):
... base = 10**part_size
... n = 0
... L = L[::-1]
... while L:
... n = n*base+L.pop()
... return n
...
>>> print split_number(1000005,3)
[1, 0, 5]
>>> print join_number([1,0,5],3)
1000005
>>>
Here you can see that just converting the number to a str takes longer than my entire function!
>>> from time import time
>>> t=time();b = split_number(2**100000,3000);print time()-t
0.204252004623
>>> t=time();b = split_number(2**100000,30);print time()-t
0.486856222153
>>> t=time();b = str(2**100000);print time()-t
0.730905056
You should think of the following number split into 3-sized chunks:
1000005 -> 100 000 5
You have two problems. The first is that if you put those integers back together, you'll get:
100 0 5 -> 100005
(i.e., the middle one is 0, not 000) which is not what you started with. Second problem is that you're not sure what size the last part should be.
I would ensure that you're first using a string whose length is an exact multiple of the part size so you know exactly how big each part should be:
def split_number (num, part_size):
string = str(num)
string_size = len(string)
while string_size % part_size != 0:
string = "0%s"%(string)
string_size = string_size + 1
arr = []
pointer = 0
while pointer < string_size:
e = pointer + part_size
arr.append(int(string[pointer:e]))
pointer += part_size
return arr
Secondly, make sure that you put the parts back together with the right length for each part (ensuring you don't put leading zeros on the first part of course):
def join_number(arr, part_size):
fmt_str = "%%s%%0%dd"%(part_size)
num = arr[0]
for x in arr[1:]:
num = fmt_str%(num,int(x))
return int(num)
Tying it all together, the following complete program:
#!/usr/bin/python
def split_number (num, part_size):
string = str(num)
string_size = len(string)
while string_size % part_size != 0:
string = "0%s"%(string)
string_size = string_size + 1
arr = []
pointer = 0
while pointer < string_size:
e = pointer + part_size
arr.append(int(string[pointer:e]))
pointer += part_size
return arr
def join_number(arr, part_size):
fmt_str = "%%s%%0%dd"%(part_size)
num = arr[0]
for x in arr[1:]:
num = fmt_str%(num,int(x))
return int(num)
x = 1000005
print x
y = split_number(x,3)
print y
z = join_number(y,3)
print z
produces the output:
1000005
[1, 0, 5]
1000005
which shows that it goes back together.
Just keep in mind I haven't done Python for a few years. There's almost certainly a more "Pythonic" way to do it with those new-fangled lambdas and things (or whatever Python calls them) but, since your code was of the basic form, I just answered with the minimal changes required to get it working. Oh yeah, and be wary of negative numbers :-)
Here's some code for Alex Martelli's answer.
def digits(n, base):
while n:
yield n % base
n //= base
def split_number(n, part_size):
base = 10 ** part_size
return list(digits(n, base))
def join_number(digits, part_size):
base = 10 ** part_size
return sum(d * (base ** i) for i, d in enumerate(digits))

Categories

Resources