Base 62 conversion

Base 62 conversion - python

How would you convert an integer to base 62 (like hexadecimal, but with these digits: '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ')?
I have been trying to find a good Python library for it, but they all seem to be occupied with converting strings. The Python base64 module only accepts strings and turns a single digit into four characters. I was looking for something akin to what URL shorteners use.

There is no standard module for this, but I have written my own functions to achieve that.
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
def encode(num, alphabet=BASE62):
    """Encode a positive number into Base X and return the string.
    Arguments:
    - `num`: The number to encode
    - `alphabet`: The alphabet to use for encoding
    """
    if num == 0:
        return alphabet[0]
    arr = []
    arr_append = arr.append  # Extract bound-method for faster access.
    _divmod = divmod  # Access to locals is faster.
    base = len(alphabet)
    while num:
        num, rem = _divmod(num, base)
        arr_append(alphabet[rem])
    arr.reverse()
    return ''.join(arr)
def decode(string, alphabet=BASE62):
    """Decode a Base X encoded string into the number
    Arguments:
    - `string`: The encoded string
    - `alphabet`: The alphabet to use for decoding
    """
    base = len(alphabet)
    strlen = len(string)
    num = 0
    idx = 0
    for char in string:
        power = (strlen - (idx + 1))
        num += alphabet.index(char) * (base ** power)
        idx += 1
    return num
Notice the fact that you can give it any alphabet to use for encoding and decoding. If you leave the alphabet argument out, you are going to get the 62 character alphabet defined on the first line of code, and hence encoding/decoding to/from 62 base.
PS - For URL shorteners, I have found that it's better to leave out a few confusing characters like 0Ol1oI etc. Thus I use this alphabet for my URL shortening needs - "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"
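As a rough illustration of the point about confusing characters, here is a minimal round-trip sketch using the reduced alphabet quoted above. The names `SAFE_ALPHABET`, `encode_safe` and `decode_safe` are made up for this example; note that the reduced alphabet has 56 symbols, so this is really base 56 rather than base 62.

```python
# Alphabet from the post above, without the easily confused 0, 1, l, o, I, O.
SAFE_ALPHABET = "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"
# Hypothetical reverse lookup table so decoding avoids repeated str.index calls.
SAFE_LOOKUP = {char: value for value, char in enumerate(SAFE_ALPHABET)}


def encode_safe(num, alphabet=SAFE_ALPHABET):
    """Encode a non-negative integer using the given alphabet."""
    if num < 0:
        raise ValueError("num must be non-negative")
    if num == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while num:
        num, rem = divmod(num, base)
        digits.append(alphabet[rem])
    return "".join(reversed(digits))


def decode_safe(text, lookup=SAFE_LOOKUP):
    """Decode a string produced by encode_safe back into an integer."""
    base = len(lookup)
    num = 0
    for char in text:
        num = num * base + lookup[char]
    return num


print(len(SAFE_ALPHABET))                   # number of symbols in the alphabet
print(encode_safe(1234567890))
print(decode_safe(encode_safe(1234567890)))
```

Because the first symbol is '2', zero encodes as '2' here, which is worth keeping in mind if the IDs are ever compared with ordinary base 62 output.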

I once wrote a script to do this as well; I think it's quite elegant :)
import string
# Remove the `_#` below for base62; with it the alphabet has 64 characters
BASE_LIST = string.digits + string.ascii_letters + '_#'
BASE_DICT = dict((c, i) for i, c in enumerate(BASE_LIST))
def base_decode(string, reverse_base=BASE_DICT):
    length = len(reverse_base)
    ret = 0
    for i, c in enumerate(string[::-1]):
        ret += (length ** i) * reverse_base[c]
    return ret
def base_encode(integer, base=BASE_LIST):
    if integer == 0:
        return base[0]
    length = len(base)
    ret = ''
    while integer != 0:
        ret = base[integer % length] + ret
        integer //= length  # floor division keeps the value an int on Python 3
    return ret
Example usage:
for i in range(100):
    print(i, base_decode(base_encode(i)), base_encode(i))

The following decoder-maker works with any reasonable base, has a much tidier loop, and gives an explicit error message when it meets an invalid character.
def base_n_decoder(alphabet):
    """Return a decoder for a base-n encoded string
    Argument:
    - `alphabet`: The alphabet used for encoding
    """
    base = len(alphabet)
    char_value = dict(((c, v) for v, c in enumerate(alphabet)))
    def f(string):
        num = 0
        try:
            for char in string:
                num = num * base + char_value[char]
        except KeyError:
            raise ValueError('Unexpected character %r' % char)
        return num
    return f
if __name__ == "__main__":
    func = base_n_decoder('0123456789abcdef')
    # The last test string contains 'q' and triggers the ValueError on purpose.
    for test in ('0', 'f', '2020', 'ffff', 'abqdef'):
        print(test)
        print(func(test))
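For symmetry, a matching encoder-maker could look roughly like the sketch below. It is not part of the answer above; `base_n_encoder` is a hypothetical name, and the handling of negative numbers (rejecting them) is an assumption.

```python
def base_n_encoder(alphabet):
    """Return an encoder for non-negative integers (hypothetical counterpart
    to a base-n decoder built from the same alphabet)."""
    base = len(alphabet)
    if base < 2:
        raise ValueError("alphabet needs at least two symbols")

    def f(num):
        if num < 0:
            raise ValueError("Cannot encode negative number %r" % num)
        if num == 0:
            return alphabet[0]
        digits = []
        while num:
            # divmod yields the next higher part and the current lowest digit.
            num, rem = divmod(num, base)
            digits.append(alphabet[rem])
        return "".join(reversed(digits))

    return f


to_hex = base_n_encoder("0123456789abcdef")
to_base62 = base_n_encoder(
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
print(to_hex(0x2020))
print(to_base62(61))
```

Building the closure once per alphabet keeps the per-call work down to the digit loop itself.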

If you're looking for the highest efficiency (like django), you'll want something like the following. This code is a combination of efficient methods from Baishampayan Ghose and WoLpH and John Machin.
# Edit this list of characters as desired.
BASE_ALPH = tuple("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
BASE_DICT = dict((c, v) for v, c in enumerate(BASE_ALPH))
BASE_LEN = len(BASE_ALPH)
def base_decode(string):
    num = 0
    for char in string:
        num = num * BASE_LEN + BASE_DICT[char]
    return num
def base_encode(num):
    if not num:
        return BASE_ALPH[0]
    encoding = ""
    while num:
        num, rem = divmod(num, BASE_LEN)
        encoding = BASE_ALPH[rem] + encoding
    return encoding
You may want to also calculate your dictionary in advance. (Note: Encoding with a string shows more efficiency than with a list, even with very long numbers.)
>>> timeit.timeit("for i in xrange(1000000): base.base_decode(base.base_encode(i))", setup="import base", number=1)
2.3302059173583984
Encoded and decoded 1 million numbers in under 2.5 seconds (2.2 GHz i7-2670QM).

If you use the Django framework, you can use the django.utils.baseconv module.
>>> from django.utils import baseconv
>>> baseconv.base62.encode(1234567890)
'1LY7VK'
In addition to base62, baseconv also defines base2, base16, base36, base56 and base64.

If all you need is to generate a short ID (since you mention URL shorteners) rather than encode/decode something, this module might help:
https://github.com/stochastic-technologies/shortuuid/

You probably want base64, not base62. There's a URL-compatible version of it floating around, so the extra two filler characters shouldn't be a problem.
The process is fairly simple; consider that base64 represents 6 bits and a regular byte represents 8. Assign a value from 000000 to 111111 to each of the 64 characters chosen, and put the 4 values together to match a set of 3 base256 bytes. Repeat for each set of 3 bytes, padding at the end with your choice of padding character (0 is generally useful).
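To make the base64 route concrete for integers, here is a minimal sketch using the standard library's URL-safe variant. The helper names `int_to_urlsafe_b64` and `urlsafe_b64_to_int` are invented for this example, and converting the integer to big-endian bytes first is one possible convention, not the only one.

```python
import base64


def int_to_urlsafe_b64(num):
    """Encode a non-negative integer as unpadded URL-safe base64 text."""
    if num < 0:
        raise ValueError("num must be non-negative")
    # Use at least one byte so that zero still produces output.
    length = max(1, (num.bit_length() + 7) // 8)
    raw = num.to_bytes(length, "big")
    # Strip the '=' padding; it is restored before decoding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def urlsafe_b64_to_int(text):
    """Inverse of int_to_urlsafe_b64."""
    padded = text + "=" * (-len(text) % 4)
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")


token = int_to_urlsafe_b64(1234567890)
print(token, urlsafe_b64_to_int(token))
```

The output works on whole bytes, so short numbers come out slightly longer than a digit-by-digit base 62 or base 64 conversion would give.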

There is now a Python library for this, and I'm working on making a pip package for it.
I recommend you use my bases.py (https://github.com/kamijoutouma/bases.py), which was inspired by bases.js.
from bases import Bases
bases = Bases()
bases.toBase16(200)                     # => 'c8'
bases.toBase(200, 16)                   # => 'c8'
bases.toBase62(99999)                   # => 'q0T'
bases.toBase(99999, 62)                 # => 'q0T'
bases.toAlphabet(300, 'aAbBcC')         # => 'Abba'
bases.fromBase16('c8')                  # => 200
bases.fromBase('c8', 16)                # => 200
bases.fromBase62('q0T')                 # => 99999
bases.fromBase('q0T', 62)               # => 99999
bases.fromAlphabet('Abba', 'aAbBcC')    # => 300
Refer to https://github.com/kamijoutouma/bases.py#known-basesalphabets for which bases are usable.

You can download the zbase62 module from PyPI, e.g.:
>>> import zbase62
>>> zbase62.b2a("abcd")
'1mZPsa'

I have benefited greatly from others' posts here. I needed the Python code originally for a Django project, but since then I have turned to Node.js, so here's a JavaScript version of the code (the encoding part) that Baishampayan Ghose provided.
var ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
function base62_encode(n, alpha) {
    var num = n || 0;
    var alphabet = alpha || ALPHABET;
    if (num == 0) return alphabet[0];
    var arr = [];
    var base = alphabet.length;
    var rem; // declared locally so it does not leak into the global scope
    while (num) {
        rem = num % base;
        num = (num - rem) / base;
        arr.push(alphabet.substring(rem, rem + 1));
    }
    return arr.reverse().join('');
}
console.log(base62_encode(2390687438976, "123456789ABCDEFGHIJKLMNPQRSTUVWXYZ"));

I hope the following snippet helps.
def num2sym(num, sym, join_symbol=''):
    if num == 0:
        return sym[0]
    if not isinstance(num, int) or num < 0:
        raise ValueError('num must be a non-negative integer')
    l = len(sym)  # target number base
    r = []
    div = num
    while div != 0:  # base conversion
        div, mod = divmod(div, l)
        r.append(sym[mod])
    return join_symbol.join(reversed(r))
Usage for your case:
number = 367891
alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
print(num2sym(number, alphabet))  # will print '1xHJ'
You can of course pass a different alphabet with fewer or more symbols, and the number will be converted to the correspondingly smaller or larger base. For example, passing '01' as the alphabet outputs the input number as a binary string.
You may shuffle the alphabet up front to get your own unique representation of the numbers. This can be helpful if you're building a URL shortener service.
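The shuffling idea can be sketched as follows. This is only an illustration: `shuffled_alphabet`, `encode_with` and `decode_with` are hypothetical names, and seeding `random.Random` is just one way to get a repeatable permutation. Since the exact permutation produced for a seed is not something to rely on across Python versions, it is probably safer to generate the shuffled alphabet once and store the resulting string. A shuffled alphabet only obscures the numbering; it is not encryption.

```python
import random

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"


def shuffled_alphabet(seed, alphabet=BASE62):
    """Return a permutation of `alphabet` derived from `seed`."""
    chars = list(alphabet)
    random.Random(seed).shuffle(chars)
    return "".join(chars)


def encode_with(num, alphabet):
    """Encode a non-negative integer with an arbitrary alphabet."""
    if num == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while num:
        num, rem = divmod(num, base)
        digits.append(alphabet[rem])
    return "".join(reversed(digits))


def decode_with(text, alphabet):
    """Decode text produced by encode_with using the same alphabet."""
    lookup = {char: value for value, char in enumerate(alphabet)}
    num = 0
    for char in text:
        num = num * len(alphabet) + lookup[char]
    return num


my_alphabet = shuffled_alphabet(seed=2024)   # generate once, then store it
short_id = encode_with(367891, my_alphabet)
print(short_id, decode_with(short_id, my_alphabet))
```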

Simplest ever.
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
def encode_base62(num):
    s = ""
    while num > 0:
        num, r = divmod(num, 62)
        s = BASE62[r] + s
    return s or BASE62[0]  # encode 0 as "0" instead of an empty string
def decode_base62(num):
    x, s = 1, 0
    for i in range(len(num) - 1, -1, -1):
        s = int(BASE62.index(num[i])) * x + s
        x *= 62
    return s
print(encode_base62(123))
print(decode_base62("1Z"))

Python does not have a built-in solution.
The chosen solution is probably the most readable one, but we might be able to scrape out a bit of performance.
from string import digits, ascii_lowercase, ascii_uppercase
base_chars = digits + ascii_lowercase + ascii_uppercase
def base_it(number, base=62):
    def iterate(moving_number=number):
        if not moving_number:
            return ''
        # Recurse on the higher digits first, then append the lowest digit.
        return iterate(moving_number // base) + base_chars[moving_number % base]
    return iterate() or base_chars[0]
Explanation
In any base, every number is equal to a0 + a1*base + a2*base**2 + a3*base**3... So the goal is to find all the a's.
Each call isolates the lowest remaining digit with a modulo by base, then floor-divides by base so that the next call sees the next digit. The recursion stops when nothing is left, and the digits are concatenated most significant first as the calls return.
Advantages and discussion
In this sample, each digit costs only one floor division and one modulus operation, which are both relatively fast.
If you really want performance, though, you'd probably do better using a library implemented in C.

Personally I like the solution from Baishampayan, mostly because of stripping the confusing characters.
For completeness, and solution with better performance, this post shows a way to use the Python base64 module.

I wrote this a while back and it's worked pretty well (negatives and all included).
def code(number, base):
    try:
        number, base = int(number), int(base)
    except ValueError:
        raise ValueError('code(number,base): number and base must be in base10')
    base = max(2, min(base, 62))  # clamp the base to the supported range
    numbers = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    if number == 0:
        return numbers[0]
    final = ""
    loc = 0
    if number < 0:
        final = "-"
        number = abs(number)
    while base**loc <= number:
        loc = loc + 1
    for x in range(loc-1, -1, -1):
        for y in range(base-1, -1, -1):
            if y*(base**x) <= number:
                final = "{}{}".format(final, numbers[y])
                number = number - y*(base**x)
                break
    return final
def decode(number, base):
    try:
        base = int(base)
    except ValueError:
        raise ValueError('decode(value,base): base must be in base10')
    base = max(2, min(base, 62))  # clamp the base to the supported range
    number = str(number)
    numbers = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    final = 0
    neg = number.startswith("-")
    if neg:
        number = number[1:]
    loc = len(number) - 1
    for x in number:
        if numbers.index(x) >= base:  # a digit equal to the base is also out of range
            raise ValueError('{} is out of base{} range'.format(x, str(base)))
        final = final + (numbers.index(x) * (base**loc))
        loc = loc - 1
    return -final if neg else final
Sorry about the length of it all.

BASE_LIST = tuple("23456789ABCDEFGHJKLMNOPQRSTUVWXYZabcdefghjkmnpqrstuvwxyz")
BASE_DICT = dict((c, v) for v, c in enumerate(BASE_LIST))
BASE_LEN = len(BASE_LIST)
def nice_decode(str):
    num = 0
    for char in str[::-1]:
        num = num * BASE_LEN + BASE_DICT[char]
    return num
def nice_encode(num):
    if not num:
        return BASE_LIST[0]
    encoding = ""
    while num:
        num, rem = divmod(num, BASE_LEN)
        encoding += BASE_LIST[rem]
    return encoding

Here is a recursive and an iterative way to do that. The iterative one is a little faster, depending on the number of executions.
def base62_encode_r(dec):
    s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    return s[dec] if dec < 62 else base62_encode_r(dec // 62) + s[dec % 62]
print(base62_encode_r(2347878234))
def base62_encode_i(dec):
    s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    ret = ''
    while dec > 0:
        ret = s[dec % 62] + ret
        dec //= 62
    return ret
print(base62_encode_i(2347878234))
def base62_decode_r(b62):
    s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    if len(b62) == 1:
        return s.index(b62)
    x = base62_decode_r(b62[:-1]) * 62 + s.index(b62[-1:]) % 62
    return x
print(base62_decode_r("2yTsnM"))
def base62_decode_i(b62):
    s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    ret = 0
    for i in range(len(b62)-1, -1, -1):
        ret = ret + s.index(b62[i]) * (62**(len(b62)-i-1))
    return ret
print(base62_decode_i("2yTsnM"))
if __name__ == '__main__':
    import timeit
    print(timeit.timeit(stmt="base62_encode_r(2347878234)", setup="from __main__ import base62_encode_r", number=100000))
    print(timeit.timeit(stmt="base62_encode_i(2347878234)", setup="from __main__ import base62_encode_i", number=100000))
    print(timeit.timeit(stmt="base62_decode_r('2yTsnM')", setup="from __main__ import base62_decode_r", number=100000))
    print(timeit.timeit(stmt="base62_decode_i('2yTsnM')", setup="from __main__ import base62_decode_i", number=100000))
0.270266867033
0.260915645986
0.344734796766
0.311662500262

Python 3.7.x
I found a PhD's github for some algorithms when looking for an existing base62 script. It didn't work for the current max-version of Python 3 at this time so I went ahead and fixed where needed and did a little refactoring. I don't usually work with Python and have always used it ad-hoc so YMMV. All credit goes to Dr. Zhihua Lai. I just worked the kinks out for this version of Python.
file base62.py
# modified from Dr. Zhihua Lai's original on GitHub
base = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
b = 62
def toBase10(b62: str) -> int:
    limit = len(b62)
    res = 0
    for i in range(limit):
        res = b * res + base.find(b62[i])
    return res
def toBase62(b10: int) -> str:
    if b <= 0 or b > 62:
        return 0
    r = b10 % b
    res = base[r]
    # Integer floor division; floor(b10 / b) goes through a float and
    # loses precision for large integers.
    q = b10 // b
    while q:
        r = q % b
        q = q // b
        res = base[r] + res
    return res
file try_base62.py
import base62
print("Base10 ==> Base62")
for i in range(999):
    print(f'{i} => {base62.toBase62(i)}')
base62_samples = ["gud", "GA", "mE", "lo", "lz", "OMFGWTFLMFAOENCODING"]
print("Base62 ==> Base10")
for i in range(len(base62_samples)):
    print(f'{base62_samples[i]} => {base62.toBase10(base62_samples[i])}')
output of try_base62.py
Base10 ==> Base62
0 => 0
[...]
998 => g6
Base62 ==> Base10
gud => 63377
GA => 2640
mE => 1404
lo => 1326
lz => 1337
OMFGWTFLMFAOENCODING => 577002768656147353068189971419611424
Since there was no licensing info in the repo I did submit a PR so the original author at least knows other people are using and modifying their code.

All of the solutions above define the alphabet explicitly, but you can also derive the digits directly from the ASCII codes. Note that this uses the 62 consecutive code points starting at '0' (up to 'm'), which include punctuation such as ':' and '@', so the output is not the usual alphanumeric alphabet.
def converter_base62(count) -> str:
    result = ''
    start = ord('0')
    while count > 0:
        result = chr(count % 62 + start) + result
        count //= 62
    return result
def decode_base62(string_to_decode: str):
    result = 0
    start = ord('0')
    for char in string_to_decode:
        result = result * 62 + (ord(char)-start)
    return result
import tqdm
n = 10_000_000
for i in tqdm.tqdm(range(n)):
    assert decode_base62(converter_base62(i)) == i

Sorry, I can't help you with a library here. I would prefer using base64 and just adding two extra characters of your choice, if possible!
Then you can use the base64 module.
If this is really, really not possible, you can do it yourself this way (this is pseudo-code):
base62vals = []
myBase = 62
while num > 0:
    remainder = num % myBase
    num = num // myBase
    base62vals.insert(0, remainder)

With simple recursion:
"""
This module contains functions to transform a number to string and vice-versa
"""
BASE = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
LEN_BASE = len(BASE)
def encode(num):
    """
    This function encodes the given number into alpha numeric string
    """
    if num < LEN_BASE:
        return BASE[num]
    # The least significant digit comes first in the output string.
    return BASE[num % LEN_BASE] + encode(num//LEN_BASE)
def decode_recursive(string, index):
    """
    recursive util function for decode
    """
    if not string or index >= len(string):
        return 0
    return (BASE.index(string[index]) * LEN_BASE ** index) + decode_recursive(string, index + 1)
def decode(string):
    """
    This function decodes given string to number
    """
    return decode_recursive(string, 0)

Benchmarking answers that worked for Python3 (machine: i7-8565U):
"""
us per enc()+dec() # test
(4.477935791015625, 2, '3Tx16Db2JPSS4ZdQ4dp6oW')
(6.073190927505493, 5, '3Tx16Db2JPSS4ZdQ4dp6oW')
(9.051250696182251, 9, '3Tx16Db2JPSS4ZdQ4dp6oW')
(9.864609956741333, 6, '3Tx16Db2JOOqeo6GCGscmW')
(10.868197917938232, 1, '3Tx16Db2JPSS4ZdQ4dp6oW')
(11.018349647521973, 10, '3Tx16Db2JPSS4ZdQ4dp6oW')
(12.448230504989624, 4, '03Tx16Db2JPSS4ZdQ4dp6oW')
(13.016672611236572, 7, '3Tx16Db2JPSS4ZdQ4dp6oW')
(13.212724447250366, 8, '3Tx16Db2JPSS4ZdQ4dp6oW')
(24.119479656219482, 3, '3tX16dB2jpss4zDq4DP6Ow')
"""
from time import time
half = 2 ** 127
results = []
def bench(n, enc, dec):
    start = time()
    for i in range(half, half + 1_000_000):
        dec(enc(i))
    end = time()
    results.append(tuple([end - start, n, enc(half + 1234134134134314)]))
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
def encode(num, alphabet=BASE62):
    """Encode a positive number into Base X and return the string.
    Arguments:
    - `num`: The number to encode
    - `alphabet`: The alphabet to use for encoding
    """
    if num == 0:
        return alphabet[0]
    arr = []
    arr_append = arr.append  # Extract bound-method for faster access.
    _divmod = divmod  # Access to locals is faster.
    base = len(alphabet)
    while num:
        num, rem = _divmod(num, base)
        arr_append(alphabet[rem])
    arr.reverse()
    return ''.join(arr)
def decode(string, alphabet=BASE62):
    """Decode a Base X encoded string into the number
    Arguments:
    - `string`: The encoded string
    - `alphabet`: The alphabet to use for decoding
    """
    base = len(alphabet)
    strlen = len(string)
    num = 0
    idx = 0
    for char in string:
        power = (strlen - (idx + 1))
        num += alphabet.index(char) * (base ** power)
        idx += 1
    return num
bench(1, encode, decode)
###########################################################################################################
# Remove the `_#` below for base62, now it has 64 characters
BASE_ALPH = tuple(BASE62)
BASE_LIST = BASE62
BASE_DICT = dict((c, v) for v, c in enumerate(BASE_ALPH))
###########################################################################################################
BASE_LEN = len(BASE_ALPH)
def decode(string):
    num = 0
    for char in string:
        num = num * BASE_LEN + BASE_DICT[char]
    return num
def encode(num):
    if not num:
        return BASE_ALPH[0]
    encoding = ""
    while num:
        num, rem = divmod(num, BASE_LEN)
        encoding = BASE_ALPH[rem] + encoding
    return encoding
bench(2, encode, decode)
###########################################################################################################
from django.utils import baseconv
bench(3, baseconv.base62.encode, baseconv.base62.decode)
###########################################################################################################
def encode(a):
    baseit = (lambda a=a, b=62: (not a) and '0' or
              baseit(a - a % b, b * 62) + '0123456789abcdefghijklmnopqrstuvwxyz'
                                          'ABCDEFGHIJKLMNOPQRSTUVWXYZ'[
                  a % b % 61 or -1 * bool(a % b)])
    return baseit()
bench(4, encode, decode)
###########################################################################################################
def encode(num, sym=BASE62, join_symbol=''):
    if num == 0:
        return sym[0]
    l = len(sym)  # target number base
    r = []
    div = num
    while div != 0:  # base conversion
        div, mod = divmod(div, l)
        r.append(sym[mod])
    return join_symbol.join([x for x in reversed(r)])
bench(5, encode, decode)
###########################################################################################################
from math import floor
base = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
b = 62;
def decode(b62: str) -> int:
    limit = len(b62)
    res = 0
    for i in range(limit):
        res = b * res + base.find(b62[i])
    return res
def encode(b10: int) -> str:
    if b <= 0 or b > 62:
        return 0
    r = b10 % b
    res = base[r];
    q = floor(b10 / b)
    while q:
        r = q % b
        q = floor(q / b)
        res = base[int(r)] + res
    return res
bench(6, encode, decode)
###########################################################################################################
def encode(dec):
    s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    return s[dec] if dec < 62 else encode(dec // 62) + s[int(dec % 62)]
def decode(b62):
    s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    if len(b62) == 1:
        return s.index(b62)
    x = decode(b62[:-1]) * 62 + s.index(b62[-1:]) % 62
    return x
bench(7, encode, decode)
def encode(dec):
    s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    ret = ''
    while dec > 0:
        ret = s[dec % 62] + ret
        dec //= 62
    return ret
def decode(b62):
    s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    ret = 0
    for i in range(len(b62) - 1, -1, -1):
        ret = ret + s.index(b62[i]) * (62 ** (len(b62) - i - 1))
    return ret
bench(8, encode, decode)
###########################################################################################################
def encode(num):
    s = ""
    while num > 0:
        num, r = divmod(num, 62)
        s = BASE62[r] + s
    return s
def decode(num):
    x, s = 1, 0
    for i in range(len(num) - 1, -1, -1):
        s = int(BASE62.index(num[i])) * x + s
        x *= 62
    return s
bench(9, encode, decode)
###########################################################################################################
def encode(number: int, alphabet=BASE62, padding: int = 22) -> str:
    l = len(alphabet)
    res = []
    while number > 0:
        number, rem = divmod(number, l)
        res.append(alphabet[rem])
        if number == 0:
            break
    return "".join(res)[::-1]  # .rjust(padding, "0")
def decode(digits: str, lookup=BASE_DICT) -> int:
    res = 0
    last = len(digits) - 1
    base = len(lookup)
    for i, d in enumerate(digits):
        res += lookup[d] * pow(base, last - i)
    return res
bench(10, encode, decode)
###########################################################################################################
for row in sorted(results):
    print(row)

Original JavaScript version:
var hash = "", alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", alphabetLength =
    alphabet.length;
do {
    hash = alphabet[input % alphabetLength] + hash;
    input = parseInt(input / alphabetLength, 10);
} while (input);
Source: https://hashids.org/
Python:
def to_base62(number):
    alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    alphabetLength = len(alphabet)
    result = ""
    while True:
        result = alphabet[number % alphabetLength] + result
        number = number // alphabetLength  # integer division stays exact for large numbers
        if number == 0:
            break
    return result
print(to_base62(59*(62**2) + 60*(62) + 61))
# result: XYZ

Related

Recursive function to reduce zeros from number

I have an assignment to write a recursive function named Reduce. This function should remove all the zeros from a number and return the new number.
For example:
Reduce(-160760) => -1676
Reduce(1020034000) => 1234
I started to do something but I got stuck on the condition. Here's the code I wrote so far:
def Reduce(num):
    while num != 0:
        if num % 10 != 0:
            newNum = (num % 10) +
            Reduce(num//10)

def reduce(num):
    if num == 0: return 0
    if num < 0: return -reduce(-num)
    if num % 10 == 0:
        return reduce(num // 10)
    else:
        return num % 10 + 10 * reduce(num // 10)

String version of recursive function:
def reduce(n):
    return int(reduce_recursive(str(n), ''))
def reduce_recursive(num, res):
    if not num:  # if we've recursed on the whole input, nothing left to do
        return res
    if num[0] == '0':  # if the character is '0', ignore it and recurse on the next character
        return reduce_recursive(num[1:], res)
    return reduce_recursive(num[1:], res+num[0])  # num[0] is not a '0' so we add it to the result and we move to the next character
>>> reduce(1200530060)
12536

Why does my hexadecimal to base64 converter break down on large numbers?

I'm working on a cryptopals problem. Specifically, the first one. I have a, what I feel, decent solution for it, in that it works for given inputs, and for the example they give, but I've been looking at further testing to see if it holds up, and it doesn't seem to. Here's my code:
hex="0123456789abcdef"
base64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
def hexNumeralToBinary(hexNumeral):
    index = hex.index(hexNumeral)
    binaryNumber = []
    while index > 0:
        if (index % 2) == 0:
            binaryNumber = ["0"] + binaryNumber
            index = index / 2
        else:
            binaryNumber = ["1"] + binaryNumber
            index = (index-1)/2
    while len(binaryNumber) < 4:
        binaryNumber = ["0"] + binaryNumber
    return ''.join(binaryNumber)
def hexToBinary(hexNumber):
    length = len(hexNumber) + 1
    binaryNumber = []
    for i in range(1, length):
        hexNumeral = hexNumber[-1*i]
        binaryNumeral = hexNumeralToBinary(hexNumeral)
        binaryNumber = [binaryNumeral] + binaryNumber
    return ''.join(binaryNumber)
def splitString(binaryNumber):
    while (len(binaryNumber) % 6) != 0:
        binaryNumber = "0" + binaryNumber
    binaryNumberSplit = []
    while binaryNumber != "":
        binaryNumberSplit.append(binaryNumber[0:6])
        binaryNumber = binaryNumber[6:]
    return binaryNumberSplit
def hexToBase64(hexNumber):
    base64Number = []
    binaryNumber = hexToBinary(hexNumber)
    binaryNumberSplit = splitString(binaryNumber)
    for sixBitNum in binaryNumberSplit:
        #Convert 6 bit binary number into the base64 index
        index = 0
        for i in range(1, 7):
            index += int(sixBitNum[-1*i]) * (2**(i-1))
        base64Digit = base64[index]
        base64Number = base64Number + [base64Digit]
    return ''.join(base64Number)
hexNumber = input("Please enter a hexadecimal number to convert to base64: ")
print(hexToBase64(hexNumber))
It seems like the hexadecimal number needs to have a number of digits divisible by 3, or it just gets stuck while running. I don't know where, and I'm really stumped. I'm a total beginner to programming, and this is easily the most complex thing I've done, just trying to work it out as I go, and this isn't coming to me.
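One way to check a hand-written converter like this is to compare its output against the standard library. The sketch below is a reference for comparison only, not a diagnosis of where the code above gets stuck; the function name `hex_to_base64` is made up here. It assumes the input is an even-length hex string and treats it as a sequence of bytes, which means short inputs are padded on the right with '=' rather than with leading zero bits.

```python
import base64


def hex_to_base64(hex_text):
    """Reference conversion: hex string -> raw bytes -> base64 text."""
    raw = bytes.fromhex(hex_text)   # raises ValueError on odd length or bad digits
    return base64.b64encode(raw).decode("ascii")


print(hex_to_base64("4d616e"))   # three bytes map to exactly four characters
print(hex_to_base64("4d61"))     # two bytes need one '=' of padding
```

Inputs whose length is a multiple of 6 hex digits (3 bytes) need no padding at all, so those are the cases where a left-padding and a right-padding converter agree.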

Convert between str and int without using built-in typecasting

NB: You may not use built-in typecasting: code this yourself.
def str2int(s):
    result = 0
    if s[0] == '-':
        sign = -1
        i = 1
        while i < len(s):
            num = ord(s[i]) - ord('0')
            result = result * 10 + num
            i += 1
        result = sign * result
        return result
    else:
        i = 0
        while i < len(s):
            num = ord(s[i]) - ord('0')
            result = result * 10 + num
            i += 1
        return result
NB: You may not use built-in str() or string template. Code this yourself.
def int2str(i):
    strng = ""
    if i > 0:
        while i != 0:
            num = i % 10
            strng += chr(48+num)
            i = i / 10
        return strng[::-1]
    else:
        while i != 0:
            num = abs(i) % 10
            strng += chr(48+num)
            i = abs(i) / 10
        return '-' + strng[::-1]
I am a newbie and I have to write this code using only basic constructs. I wrote these functions myself, but they look weird. Can you help me improve the code? Thank you.

This may be a better question for https://codereview.stackexchange.com/.
Setting aside the lack of error checking, one obvious comment is that you have common code that can be factored out. Only capture in the if/else what is unique, rather than repeating the while loop:
def str2int(s):
    if s[0] == '-':
        sign = -1
        i = 1
    else:
        sign = 1
        i = 0
    result = 0
    while i < len(s):
        num = ord(s[i]) - ord('0')
        result = result * 10 + num
        i += 1
    return sign * result
It is generally considered better form in Python to iterate over the sequence itself rather than over indices:
def str2int(s):
    sign = 1
    if s[0] == '-':
        sign = -1
        s = s[1:]
    result = 0
    for c in s:
        num = ord(c) - ord('0')
        result = result * 10 + num
    return sign * result
These last lines are equivalent to a standard map and reduce (reduce is in functools for Python 3), though some would argue against it:
from functools import reduce  # Py3
def str2int(s):
    sign = 1
    if s[0] == '-':
        sign = -1
        s = s[1:]
    return sign * reduce(lambda x,y: x*10+y, map(lambda c: ord(c) - ord('0'), s))
There are similar opportunities to do the same for int2str().
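Applying the same factoring to int2str() might look like the sketch below. This is one possible shape rather than the answerer's code; it keeps the original function name, additionally handles zero, and still avoids str() and string formatting.

```python
def int2str(i):
    """Convert an integer to its decimal string without str() or formatting."""
    if i == 0:
        return "0"
    sign = "-" if i < 0 else ""
    i = abs(i)
    digits = []
    while i != 0:
        # divmod gives the remaining number and the lowest decimal digit.
        i, num = divmod(i, 10)
        digits.append(chr(ord("0") + num))
    # Digits were collected lowest first, so reverse them for the result.
    return sign + "".join(reversed(digits))


print(int2str(-120), int2str(0), int2str(907))
```

Handling the sign once up front removes the duplicated loop, in the same way as in the str2int() rewrite above.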

Encryption code in python, call upon a function, python returns nothing. No error messages show

Okay, so I've written the following series of functions in Python 3.6.0:
def code_char(c, key):
    result = ord(c) + key
    if c.isupper():
        while result > ord("Z"):
            result -= 26
        while result < ord("A"):
            result += 26
        return chr(result)
    else:
        while result > ord("z"):
            result -= 26
        while result < ord("a"):
            result += 26
        result = chr(result)
        return result
def isletter(char):
    if 65 <= ord(char) <= 90 or 97<= ord(char) <= 122:
        return True
    else:
        return False
def encrypt(string, key):
    result = ""
    length = len(string)
    key = key * (length // len(key)) + key[0:(length % len(key))]
    for i in range(0,length):
        if (isletter for i in string):
            c = string[i]
            num = int("".join("".join(i) for i in key))
            result += code_char(c, num)
        else:
            c = string[i]
            result += i
    return result
Then I try to call on the functions with:
encrypt("This is a secret message!!", "12345678")
When python runs the program absolutely nothing happens. Nothing gets returned, and in the shell python forces me onto a blank line without indents, or >>>. i don't know what is right or wrong with the code as no error messages appear, and no results appear. Any kind of advice would be appreciated.
Thank you.

Looking at your code, I don't think this is an infinite loop. The loop will terminate, but it will run for a very long time: the value of key is huge, so subtracting 26 at a time until the result reaches the ASCII value of an English letter takes practically forever (though not literally forever).
>>> key = '12345678'
>>> length = len("This is a secret message!!")
>>> key * (length // len(key)) + key[0:(length % len(key))]
'12345678123456781234567812'
There might be a problem in your logic, maybe in the part that generates the key, but if this is indeed the logic you want, how about using modulus rather than iterating:
def code_char(c, key):
    result = ord(c) + key
    if c.isupper():
        if result > ord("Z"):
            result = ord("Z") + result % 26 - 26
        if result < ord("A"):
            result = ord("A") - result % 26 + 26
        return chr(result)
    else:
        if result > ord("z"):
            result = ord("z") + result % 26 - 26
        if result < ord("a"):
            result = ord("a") - result % 26 + 26
        return chr(result)
def isletter(char):
    if 65 <= ord(char) <= 90 or 97<= ord(char) <= 122:
        return True
    else:
        return False
def encrypt(string, key):
    result = ""
    length = len(string)
    key = key * (length // len(key)) + key[0:(length % len(key))]
    for i in range(0,length):
        if (isletter for i in string):
            c = string[i]
            num = int("".join("".join(i) for i in key))
            result += code_char(c, num)
        else:
            c = string[i]
            result += i
    return result
>>> encrypt("This is a secret message!!", "12345678")
'Rlmwrmwrerwigvixrqiwwekiss'
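For comparison, the wrap-around can also be written relative to the start of the alphabet, which keeps every result inside A-Z or a-z no matter how large the key is. This is a sketch of that general technique under the assumption that the key is a plain integer shift; `shift_letter` and `shift_text` are hypothetical names, and it does not reproduce the key-expansion logic from the question.

```python
def shift_letter(c, key):
    """Shift an ASCII letter by `key` places, wrapping within its own case."""
    if "A" <= c <= "Z":
        start = ord("A")
    elif "a" <= c <= "z":
        start = ord("a")
    else:
        return c  # leave spaces, digits and punctuation unchanged
    # Work with an offset in 0..25 so the modulo does the wrapping in one step.
    return chr((ord(c) - start + key) % 26 + start)


def shift_text(text, key):
    """Apply the same integer shift to every letter of `text`."""
    return "".join(shift_letter(c, key) for c in text)


print(shift_text("Hello, World!", 3))
print(shift_text("This is a secret message!!", 12345678123456781234567812))
```

Because Python's % always returns a non-negative result for a positive modulus, negative keys wrap correctly as well.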

Did you intend a while loop here, or an if? I don't see any exit for the while loop, so that may be where your code is hanging.
if c.isupper():
    while result > ord("Z"):
        result -= 26
    while result < ord("A"):
        result += 26
    return chr(result)
else:
    while result > ord("z"):
        result -= 26
    while result < ord("a"):
        result += 26
Also, if I replace while with if above, it gives me an overflow error.
OverflowError: Python int too large to convert to C long
EDIT
After looking at @polo's comment and taking a second look at the code, I believe @polo is correct. I put the while loop back and added print statements. I have commented them out, but you can uncomment them at your end.
I've also reduced the key's complexity to just key = key and shortened the key from 12345678 to just 1234 to see if the code works and completes in reasonable time. You can make it as complex as you want once the code runs smoothly.
Here is result I got after:
>>>
key =1234
coding char = T
coding char = h
coding char = i
coding char = s
coding char =
coding char = i
coding char = s
coding char =
coding char = a
coding char =
coding char = s
coding char = e
coding char = c
coding char = r
coding char = e
coding char = t
coding char =
coding char = m
coding char = e
coding char = s
coding char = s
coding char = a
coding char = g
coding char = e
coding char = !
coding char = !
encrypted_message = Ftuezuezmzeqodqfzyqeemsqaa
Modified code below:
def code_char(c, key):
    result = ord(c) + key
    if c.isupper():
        while result > ord("Z"):
            #print("result1 = {}",format(result))
            result -= 26
        while result < ord("A"):
            #print("result2 = {}",format(result))
            result += 26
        return chr(result)
    else:
        while result > ord("z"):
            #print("result3 = {}",format(result))
            result -= 26
        while result < ord("a"):
            #print("result4 = {}",format(result))
            result += 26
        result = chr(result)
        return result
def isletter(char):
    if 65 <= ord(char) <= 90 or 97<= ord(char) <= 122:
        return True
    else:
        return False
def encrypt(string, key):
    result = ""
    length = len(string)
    #key = key * (length // len(key)) + key[0:(length % len(key))]
    key = key
    print("key ={}".format(key))
    for i in range(0,length):
        if (isletter for i in string):
            c = string[i]
            num = int("".join("".join(i) for i in key))
            print("coding char = {}".format(c))
            result += code_char(c, num)
        else:
            c = string[i]
            result += i
    return result
#encrypt("This is a secret message!!", "12345678")
encrypted_message = encrypt("This is a secret message!!", "1234")
print("encrypted_message = {}".format(encrypted_message))

Proper way to benchmark python code

I have the following modulo exponentiation code and I would like to benchmark a few lines in the function.
One line is:
temp = square(temp)
But python complains that global name 'square' is not defined. Also how can I benchmark the line with
ret = temp % n
Do I also need to write it into a function?
import math
import timeit
def int2baseTwo(x):
    """x is a positive integer. Convert it to base two as a list of integers
    in reverse order as a list."""
    # repeating x >>= 1 and x & 1 will do the trick
    assert x >= 0
    bitInverse = []
    while x != 0:
        bitInverse.append(x & 1)
        x >>= 1
    return bitInverse
def square(a):
    return a ** 2
def modExp(a, d, n):
    """returns a ** d (mod n)"""
    assert d >= 0
    assert n >= 0
    base2D = int2baseTwo(d)
    #print 'base2D = ', base2D
    base2DLength = len(base2D)
    #print 'base2DLength = ', base2DLength
    modArray = []
    result = 1
    temp = 1
    for i in range(0, base2DLength):
        if i == 0:
            temp = a
            continue
        print(timeit.timeit("temp = square(temp)", setup="from __main__ import modExp"))
        if base2D[i] == 1:
            temp = temp * a
    ret = temp % n
    return ret
if __name__=="__main__":
    print(timeit.timeit("modExp(1000,100,59)", setup="from __main__ import modExp"))
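The "global name 'square' is not defined" error comes from the fact that timeit runs the statement in its own namespace, so every name the statement uses has to be supplied through `setup` or, on Python 3.5 and later, through the `globals` argument. A minimal sketch of timing the two individual operations is shown below; the sample values for `temp` and `n` are arbitrary placeholders, and no wrapper function is needed for the modulo line.

```python
import timeit


def square(a):
    return a ** 2


# Names used inside the timed statement are passed in explicitly.
elapsed_square = timeit.timeit(
    "square(temp)",
    globals={"square": square, "temp": 12345},
    number=100_000,
)

# A bare expression can be timed the same way.
elapsed_mod = timeit.timeit(
    "temp % n",
    globals={"temp": 12345 ** 2, "n": 59},
    number=100_000,
)

print("square:", elapsed_square)
print("modulo:", elapsed_mod)
```

Timing the operations outside of modExp also avoids running a full benchmark on every loop iteration, which is what the timeit call inside the loop above would do.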
