Is there a way to autogenerate valid arithmetic expressions?


I'm currently trying to create a Python script that will autogenerate space-delimited arithmetic expressions which are valid. However, I get sample output that looks like this: ( 32 - 42 / 95 + 24 ( ) ( 53 ) + ) 21
While the empty parentheses are perfectly OK by me, I can't use this autogenerated expression in calculations since there's no operator between the 24 and the 53, and the + before the 21 at the end has no second argument.
What I want to know is, is there a way to account for/fix these errors using a Pythonic solution? (And before anyone points it out, I'll be the first to acknowledge that the code I posted below is probably the worst code I've pushed and conforms to...well, very few of Python's core tenets.)
import random
parentheses = ['(',')']
ops = ['+','-','*','/'] + parentheses
lines = 0
while lines < 1000:
    fname = open('test.txt','a')
    expr = []
    numExpr = lines
    if (numExpr % 2 == 0):
        numExpr += 1
    isDiv = False # Boolean var, makes sure there's no Div by 0
    # isNumber, isParentheses, isOp determine whether next element is a number, parentheses, or operator, respectively
    isNumber = random.randint(0,1) == 0 # determines whether to start sequence with number or parentheses
    isParentheses = not isNumber
    isOp = False
    # Counts parentheses to ensure parentheses are matching
    numParentheses = 0
    while (numExpr > 0 or numParentheses > 0):
        if (numExpr < 0 and numParentheses > 0):
            isDiv = False
            expr.append(')')
            numParentheses -= 1
        elif (isOp and numParentheses > 0):
            rand = random.randint(0,5)
            expr.append(ops[rand])
            isDiv = (rand == 3) # True if div op was just appended
            # Checks to see if ')' was appended
            if (rand == 5):
                isNumber = False
                isOp = True
                numParentheses -= 1
            # Checks to see if '(' was appended
            elif (rand == 4):
                isNumber = True
                isOp = False
                numParentheses += 1
            # All other operations go here
            else:
                isNumber = True
                isOp = False
        # Didn't add parentheses possibility here in case expression in parentheses somehow reaches 0
        elif (isNumber and isDiv):
            expr.append(str(random.randint(1,100)))
            isDiv = False
            isNumber = False
            isOp = True
        # If a number's up, decides whether to append parentheses or a number
        elif (isNumber):
            rand = random.randint(0,1)
            if (rand == 0):
                expr.append(str(random.randint(0,100)))
                isNumber = False
                isOp = True
            elif (rand == 1):
                if (numParentheses == 0):
                    expr.append('(')
                    numParentheses += 1
                else:
                    rand = random.randint(0,1)
                    expr.append(parentheses[rand])
                    if rand == 0:
                        numParentheses += 1
                    else:
                        numParentheses -= 1
            isDiv = False
        numExpr -= 1
    fname.write(' '.join(expr) + '\n')
    fname.close()
    lines += 1

Yes, you can generate random arithmetic expressions in a Pythonic way. You need to change your approach, though. Don't try to generate a string and count parens. Instead generate a random expression tree, then output that.
By an expression tree, I mean an instance of a class called, say, Expression with subclasses Number, PlusExpression, MinusExpression, TimesExpression, DivideExpression, and ParenthesizedExpression. Each of these, except Number, will have fields of type Expression. Give each a suitable __str__ method. Generate some random expression objects and just print the "root."
Can you take it from here or would you like me to code it up?
ADDENDUM: Some sample starter code. Doesn't generate random expressions (yet?) but this can be added....
# This is just the very beginning of a script that can be used to process
# arithmetic expressions. At the moment it just defines a few classes
# and prints a couple example expressions.
# Possible additions include methods to evaluate expressions and generate
# some random expressions.
class Expression:
    pass
class Number(Expression):
    def __init__(self, num):
        self.num = num
    def __str__(self):
        return str(self.num)
class BinaryExpression(Expression):
    def __init__(self, left, op, right):
        self.left = left
        self.op = op
        self.right = right
    def __str__(self):
        return str(self.left) + " " + self.op + " " + str(self.right)
class ParenthesizedExpression(Expression):
    def __init__(self, exp):
        self.exp = exp
    def __str__(self):
        return "(" + str(self.exp) + ")"
e1 = Number(5)
print(e1)
e2 = BinaryExpression(Number(8), "+", ParenthesizedExpression(BinaryExpression(Number(7), "*", e1)))
print(e2)
** ADDENDUM 2 **
Getting back into Python is really fun. I couldn't resist implementing the random expression generator. It is built on the code above. SORRY ABOUT THE HARDCODING!!
from random import random, randint, choice
def randomExpression(prob):
    p = random()
    if p > prob:
        return Number(randint(1, 100))
    elif randint(0, 1) == 0:
        return ParenthesizedExpression(randomExpression(prob / 1.2))
    else:
        left = randomExpression(prob / 1.2)
        op = choice(["+", "-", "*", "/"])
        right = randomExpression(prob / 1.2)
        return BinaryExpression(left, op, right)
for i in range(10):
    print(randomExpression(1))
Here is the output I got:
(23)
86 + 84 + 87 / (96 - 46) / 59
((((49)))) + ((46))
76 + 18 + 4 - (98) - 7 / 15
(((73)))
(55) - (54) * 55 + 92 - 13 - ((36))
(78) - (7 / 56 * 33)
(81) - 18 * (((8)) * 59 - 14)
(((89)))
(59)
Ain't tooooo pretty. I think it puts out too many parentheses. Changing the probability of choosing between parenthesized expressions and binary expressions might work well....
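A minimal sketch of that tweak, assuming a hypothetical `paren_prob` parameter (not part of the code above) that replaces the 50/50 coin flip. The classes are repeated so the snippet runs on its own, and `random_expression` is just an illustrative variant of `randomExpression`:

```python
from random import random, randint, choice


class Number:
    def __init__(self, num):
        self.num = num

    def __str__(self):
        return str(self.num)


class BinaryExpression:
    def __init__(self, left, op, right):
        self.left = left
        self.op = op
        self.right = right

    def __str__(self):
        return "{} {} {}".format(self.left, self.op, self.right)


class ParenthesizedExpression:
    def __init__(self, exp):
        self.exp = exp

    def __str__(self):
        return "({})".format(self.exp)


def random_expression(prob, paren_prob=0.2):
    """Like randomExpression, but parentheses are chosen with probability
    paren_prob (a hypothetical knob) instead of a fixed 50%."""
    if random() > prob:
        return Number(randint(1, 100))
    if random() < paren_prob:
        return ParenthesizedExpression(random_expression(prob / 1.2, paren_prob))
    left = random_expression(prob / 1.2, paren_prob)
    op = choice(["+", "-", "*", "/"])
    right = random_expression(prob / 1.2, paren_prob)
    return BinaryExpression(left, op, right)


for _ in range(5):
    print(random_expression(1, paren_prob=0.2))
```

Lower values of `paren_prob` should give fewer nested parentheses; 0.2 is only a starting guess, not a tuned value.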

Actually, although Ray Toal's response is formally correct, for such a simple problem you don't have to subclass each operator*. I came up with the following code, which works pretty well:
import random
import math
class Expression(object):
    OPS = ['+', '-', '*', '/']
    GROUP_PROB = 0.3
    MIN_NUM, MAX_NUM = 0, 20
    def __init__(self, maxNumbers, _maxdepth=None, _depth=0):
        """
        maxNumbers has to be a power of 2
        """
        if _maxdepth is None:
            # Convert to int: random.randint() needs integer bounds
            _maxdepth = int(round(math.log(maxNumbers, 2))) - 1
        if _depth < _maxdepth and random.randint(0, _maxdepth) > _depth:
            self.left = Expression(maxNumbers, _maxdepth, _depth + 1)
        else:
            self.left = random.randint(Expression.MIN_NUM, Expression.MAX_NUM)
        if _depth < _maxdepth and random.randint(0, _maxdepth) > _depth:
            self.right = Expression(maxNumbers, _maxdepth, _depth + 1)
        else:
            self.right = random.randint(Expression.MIN_NUM, Expression.MAX_NUM)
        self.grouped = random.random() < Expression.GROUP_PROB
        self.operator = random.choice(Expression.OPS)
    def __str__(self):
        s = '{0!s} {1} {2!s}'.format(self.left, self.operator, self.right)
        if self.grouped:
            return '({0})'.format(s)
        else:
            return s
for i in range(10):
    print(Expression(4))
It can, however, be improved to take into consideration things like division by zero (not handled currently), customization of all parameters through attributes, allowing any value for the maxNumbers argument, and so on.
* By "simple problem" I mean "generating valid expressions"; if you are adding any other functionality (for example, expression evaluation), then Ray's approach will pay off because you can define the behavior of each subclass in a much cleaner way.
Edit (output):
(5 * 12 / 16)
6 * 3 + 14 + 0
13 + 15 - 1
19 + (8 / 8)
(12 + 3 - 5)
(4 * 0 / 4)
1 - 18 / (3 * 15)
(3 * 16 + 3 * 1)
(6 + 16) / 16
(8 * 10)

I found this thread on a similar quest, namely to generate random expressions for unit testing of symbolic calculations. In my version, I included unary functions and allowed the symbols to be arbitrary strings, i.e. you can use numbers or variable names.
from random import random, choice
UNARIES = ["sqrt(%s)", "exp(%s)", "log(%s)", "sin(%s)", "cos(%s)", "tan(%s)",
           "sinh(%s)", "cosh(%s)", "tanh(%s)", "asin(%s)", "acos(%s)",
           "atan(%s)", "-%s"]
BINARIES = ["%s + %s", "%s - %s", "%s * %s", "%s / %s", "%s ** %s"]
PROP_PARANTHESIS = 0.3
PROP_BINARY = 0.7
def generate_expressions(scope, num_exp, num_ops):
    scope = list(scope) # make a copy first, append as we go
    for _ in range(num_ops):
        if random() < PROP_BINARY: # decide unary or binary operator
            ex = choice(BINARIES) % (choice(scope), choice(scope))
            if random() < PROP_PARANTHESIS:
                ex = "(%s)" % ex
            scope.append(ex)
        else:
            scope.append(choice(UNARIES) % choice(scope))
    return scope[-num_exp:] # return most recent expressions
As in the previous answers, I just throw in some parentheses around binary operators with probability PROP_PARANTHESIS (that is a bit of cheating). Binary operators are more common than unary ones, so I made that ratio configurable too (PROP_BINARY). Example usage:
scope = [c for c in "abcde"]
for expression in generate_expressions(scope, 10, 50):
    print(expression)
This will generate something like:
e / acos(tan(a)) / a * acos(tan(a)) ** (acos(tan(a)) / a + a) + (d ** b + a)
(a + (a ** sqrt(e)))
acos((b / acos(tan(a)) / a + d) / (a ** sqrt(e)) * (a ** sinh(b) / b))
sin(atan(acos(tan(a)) ** (acos(tan(a)) / a + a) + (d ** b + a)))
sin((b / acos(tan(a)) / a + d)) / (a ** sinh(b) / b)
exp(acos(tan(a)) / a + acos(e))
tan((b / acos(tan(a)) / a + d))
acos(tan(a)) / a * acos(tan(a)) ** (acos(tan(a)) / a + a) + (d ** b + a) + cos(sqrt(e))
(acos(tan(a)) / a + acos(e) * a + e)
((b / acos(tan(a)) / a + d) - cos(sqrt(e))) + sinh(b)
Putting PROP_BINARY = 1.0 and applying with
scope = range(100)
brings us back to output like
43 * (50 * 83)
34 / (29 / 24)
66 / 47 - 52
((88 ** 38) ** 40)
34 / (29 / 24) - 27
(16 + 36 ** 29)
55 ** 95
70 + 28
6 * 32
(52 * 2 ** 37)

Ok, I couldn't resist adding my own implementation using some of the ideas we discussed in Ray's answer. I approached a few things differently than Ray did.
I added some handling of the probability of the incidence of each operator. The operators are biased so that the lower priority operators (larger precedence values) are more common than the higher order ones.
I also implemented parentheses only when precedence requires. Since the integers have the highest priority (lowest precedence value) they never get wrapped in parentheses. There is no need for a parenthesized expression as a node in the expression tree.
The probability of using an operator is biased towards the initial levels (using a quadratic function) to get a nicer distribution of operators. Choosing a different exponent gives more potential control of the quality of the output, but I didn't play with the possibilities much.
I further implemented an evaluator for fun and also to filter out indeterminate expressions.
import sys
import random
# dictionary of operator precedence and incidence probability, with an
# evaluator added just for fun.
operators = {
    '^': {'prec': 10, 'prob': .1, 'eval': lambda a, b: pow(a, b)},
    '*': {'prec': 20, 'prob': .2, 'eval': lambda a, b: a*b},
    '/': {'prec': 20, 'prob': .2, 'eval': lambda a, b: a/b},
    '+': {'prec': 30, 'prob': .25, 'eval': lambda a, b: a+b},
    '-': {'prec': 30, 'prob': .25, 'eval': lambda a, b: a-b}}
max_levels = 3
integer_range = (-100, 100)
random.seed()
# A node in an expression tree
class expression(object):
    def __init__(self):
        super(expression, self).__init__()
    def precedence(self):
        return -1
    def eval(self):
        return 0
    @classmethod
    def create_random(cls, level):
        if level == 0:
            is_op = True
        elif level == max_levels:
            is_op = False
        else:
            is_op = random.random() <= 1.0 - pow(level/max_levels, 2.0)
        if is_op:
            return binary_expression.create_random(level)
        else:
            return integer_expression.create_random(level)
class integer_expression(expression):
    def __init__(self, value):
        super(integer_expression, self).__init__()
        self.value = value
    def __str__(self):
        return self.value.__str__()
    def precedence(self):
        return 0
    def eval(self):
        return self.value
    @classmethod
    def create_random(cls, level):
        return integer_expression(random.randint(integer_range[0],
                                                 integer_range[1]))
class binary_expression(expression):
    def __init__(self, symbol, left_expression, right_expression):
        super(binary_expression, self).__init__()
        self.symbol = symbol
        self.left = left_expression
        self.right = right_expression
    def eval(self):
        f = operators[self.symbol]['eval']
        return f(self.left.eval(), self.right.eval())
    @classmethod
    def create_random(cls, level):
        symbol = None
        # Choose an operator based on its probability distribution
        r = random.random()
        cumulative = 0.0
        for k, v in operators.items():
            cumulative += v['prob']
            if r <= cumulative:
                symbol = k
                break
        assert symbol != None
        left = expression.create_random(level + 1)
        right = expression.create_random(level + 1)
        return binary_expression(symbol, left, right)
    def precedence(self):
        return operators[self.symbol]['prec']
    def __str__(self):
        left_str = self.left.__str__()
        right_str = self.right.__str__()
        op_str = self.symbol
        # Use precedence to determine if we need to put the sub expressions in
        # parentheses
        if self.left.precedence() > self.precedence():
            left_str = '('+left_str+')'
        if self.right.precedence() > self.precedence():
            right_str = '('+right_str+')'
        # Nice to have space around low precedence operators
        if operators[self.symbol]['prec'] >= 30:
            op_str = ' ' + op_str + ' '
        return left_str + op_str + right_str
max_result = pow(10, 10)
for i in range(10):
    expr = expression.create_random(0)
    try:
        value = float(expr.eval())
    except:
        value = 'indeterminate'
    print(expr, '=', value)
I got these results:
(4 + 100)*41/46 - 31 - 18 - 2^-83 = -13.0
(43 - -77)/37^-94 + (-66*67)^(-24*49) = 3.09131533541e+149
-32 - -1 + 74 + 74 - 15 + 64 - -22/98 = 37.0
(-91*-4*45*-55)^(-9^2/(82 - -53)) = 1.0
-72*-85*(75 - 65) + -100*19/48*22 = 61198.0
-57 - -76 - -54*76 + -38 - -23 + -17 - 3 = 4088.0
(84*-19)^(13 - 87) - -10*-84*(-28 + -49) = 64680.0
-69 - -8 - -81^-51 + (53 + 80)^(99 - 48) = 2.07220963807e+108
(-42*-45)^(12/87) - -98 + -23 + -67 - -37 = 152.0
-31/-2*-58^-60 - 33 - -49 - 46/12 = -79.0
The program does a couple of things that, although valid, a human wouldn't do. For example:
It can create long strings of sequential divides (e.g. 1/2/3/4/5).
+/- of a negative number is common (e.g. 1 - -2)
These can be corrected with a clean-up pass.
Also, there is no guarantee that the answer is determinate. Divides by 0 and 0^0 are possible, although with the exception handling these can be filtered out.

import random
def expr(depth):
    if depth==1 or random.random()<1.0/(2**depth-1):
        return str(int(random.random() * 100))
    return '(' + expr(depth-1) + random.choice(['+','-','*','/']) + expr(depth-1) + ')'
for i in range(10):
    print(expr(4))

Generate an array at random in RPN with mixtures of operators and numbers (always valid). Then start from middle of the array and generate the corresponding evaluation tree.
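A rough sketch of the RPN idea, under a couple of assumptions: the names `random_rpn` and `rpn_to_infix` are made up for illustration, and instead of building a tree "from the middle" this version uses the usual left-to-right stack pass to turn the RPN tokens into a fully parenthesized, space-delimited infix string. Validity comes from never emitting an operator unless at least two operands are waiting on the stack.

```python
import random

OPS = ["+", "-", "*", "/"]


def random_rpn(num_ops, rng=random):
    """Return a random, always-valid RPN token list with num_ops operators."""
    tokens = []
    depth = 0                 # operands currently waiting on the stack
    nums_left = num_ops + 1   # a valid RPN needs one more number than operators
    ops_left = num_ops
    while nums_left or ops_left:
        can_op = ops_left > 0 and depth >= 2
        can_num = nums_left > 0
        if can_op and (not can_num or rng.random() < 0.5):
            tokens.append(rng.choice(OPS))
            ops_left -= 1
            depth -= 1        # two operands consumed, one result pushed
        else:
            tokens.append(str(rng.randint(1, 100)))
            nums_left -= 1
            depth += 1
    return tokens


def rpn_to_infix(tokens):
    """Convert RPN tokens to a fully parenthesized infix string."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append("( {} {} {} )".format(left, tok, right))
        else:
            stack.append(tok)
    if len(stack) != 1:
        raise ValueError("not a valid RPN sequence")
    return stack[0]


for _ in range(5):
    print(rpn_to_infix(random_rpn(4)))
```

The result is syntactically valid by construction, but it can still divide by zero when a subexpression happens to evaluate to 0.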

Related

Getting Wrong ancestor for the right side of the tree

So I was trying to implement LCA using Euler tour and RMQ with Sparse Table in python. My code is working fine, if I insert number from left side of the tree. The code for building the tree is:
The node class has data, left_child and right_child:
class Node:
    def __init__(self, value):
        self.data = value
        self.right_child = None
        self.left_child = None
Then I have a tree class which I used to build the tree:
class Tree:
    content = []
    def __init__(self):
        self.root = None
    def add_child(self, arr, root, i, n):
        if i < n:
            temp = Node(arr[i])
            root = temp
            root.left_child = self.add_child(arr, root.left_child, 2 * i + 1, n)
            root.right_child = self.add_child(arr, root.right_child, 2 * i + 2, n)
        return root
    def pre_order(self, root):
        if root != None:
            Tree.content.append(root.data)
            self.pre_order(root.left_child)
            self.pre_order(root.right_child)
        return Tree.content
Finally, I got an LCA class which mainly contains the function for building the sparse table, then do a RMQ on the sparse table, then a function that does the part of the euler tour and finally the function for finding LCA.
class LCA:
    def __init__(self, root, length):
        self.pre_process_array = [[0] * (math.floor(math.log2(length) + 2)) for _ in range((length * 2) - 1)]
        self.euler = [0] * (length * 2 - 1)
        self.height = [0] * (length * 2 - 1)
        self.index = [-1 for _ in range(length + 1)]
        self.tour = 0
        self.euler_tour(root, 0)
        self.build_sparse_table(self.height)
        print(self.pre_process_array)
    def build_sparse_table(self, height_array):
        for i in range(len(height_array)):
            self.pre_process_array[i][0] = height_array[i]
        j = 1
        while (1 << j) <= len(height_array):
            i = 0
            while (i + (1 << j) - 1) < len(height_array):
                if self.pre_process_array[i][j - 1] < self.pre_process_array[i + (1 << (j - 1))][j-1]:
                    self.pre_process_array[i][j] = self.pre_process_array[i][j - 1]
                else:
                    self.pre_process_array[i][j] = self.pre_process_array[i + (1 << (j - 1))][j - 1]
                i += 1
            j += 1
    def rmq(self, l, h):
        j = int(math.log2(h - l + 1))
        if self.pre_process_array[l][j] <= self.pre_process_array[h - (1 << j) + 1][j]:
            return self.pre_process_array[l][j]
        else:
            return self.pre_process_array[h - (1 << j) + 1][j]
    def euler_tour(self, root, level):
        if root is not None:
            self.euler[self.tour] = root.data
            self.height[self.tour] = level
            self.tour += 1
            if self.index[root.data] == -1:
                self.index[root.data] = self.tour - 1
            if root.left_child is not None:
                self.euler_tour(root.left_child, level + 1)
                self.euler[self.tour] = root.data
                self.height[self.tour] = level
                self.tour += 1
            if root.right_child is not None:
                self.euler_tour(root.right_child, level + 1)
                self.euler[self.tour] = root.data
                self.height[self.tour] = level
                self.tour += 1
    def find_LCA(self, val1, val2):
        if val1 >= len(self.index) or val2 >= len(self.index) or self.index[val1] == -1 or self.index[val2] == -1:
            return -1
        if self.index[val1] > self.index[val2]:
            return self.euler[self.rmq(self.index[val2], self.index[val1])]
        elif self.index[val2] > self.index[val1]:
            return self.euler[self.rmq(self.index[val1], self.index[val2])]
And my driver code is as follow:
tree_data = [1,2,3,4,5,6,7,8,9]
length = len(tree_data)
tree = Tree()
root = tree.root
tree.root = tree.add_child(tree_data, root, 0, length)
l = LCA(tree.root, length)
print(l.find_LCA(6, 3))
So, my tree should look something like:
          1
        /   \
       2     3
      / \   / \
     4   5 6   7
    / \
   8   9
Now, if I try to find LCA(8, 9) or LCA(4, 3), I get the correct output, but whenever I try LCA(6, 7) or LCA(3, 7), it keeps returning 2 as the answer where it should return 3 in both cases.
I don't have any idea where the mistake is; from my side it looks pretty fine.
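One thing that may be worth checking in the code above: rmq returns a minimum height, and find_LCA then uses that height as an index into self.euler. A height of 1 therefore always reads euler[1], which is 2 for this tree, and that would match the symptom described. The sketch below is one possible way around it, assuming the sparse table stores positions in the Euler tour rather than heights; the function names are illustrative, and the `euler` and `height` lists are written out by hand for the nine-node tree shown.

```python
def build_sparse_table(height):
    """table[j][i] is the position of the minimum of height[i : i + 2**j]."""
    n = len(height)
    table = [list(range(n))]          # windows of length 1: the position itself
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        half = 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            a, b = prev[i], prev[i + half]
            row.append(a if height[a] <= height[b] else b)
        table.append(row)
        j += 1
    return table


def rmq_position(table, height, lo, hi):
    """Position of the minimum height in the inclusive range [lo, hi]."""
    j = (hi - lo + 1).bit_length() - 1
    a = table[j][lo]
    b = table[j][hi - (1 << j) + 1]
    return a if height[a] <= height[b] else b


# Euler tour and heights for the tree 1..9 from the question
euler = [1, 2, 4, 8, 4, 9, 4, 2, 5, 2, 1, 3, 6, 3, 7, 3, 1]
height = [0, 1, 2, 3, 2, 3, 2, 1, 2, 1, 0, 1, 2, 1, 2, 1, 0]
first = {}
for pos, node in enumerate(euler):
    first.setdefault(node, pos)       # first occurrence of each node


def lca(u, v):
    lo, hi = sorted((first[u], first[v]))
    return euler[rmq_position(table, height, lo, hi)]


table = build_sparse_table(height)
print(lca(6, 7), lca(3, 7), lca(8, 9), lca(4, 3))
```

Using sorted positions also covers the case where both values are the same node, which the original find_LCA leaves unhandled.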

How can I modify code by excluding libraries

I am writing a program to do the following:
Read a given file name and print a quick summary of statistics.
Print a table of word length frequencies, and graphs.
Print graphs of word length frequencies. Print a blank line.
Print a graphical representation of the relative frequency of each word length.
Here is the text file data used for testing the code:
This is before the start and should be ignored.
So should this
and this
*** START OF SYNTHETIC TEST CASE ***
a blah ba ba
*** END OF SYNTHETIC TEST CASE ***
This is after the end and should be ignored too.
Have a nice day.
Here's my code so far:
import os
from collections import Counter
TABLE_TITLE = " Len Freq"
FREQ_TABLE_TEMPLATE = "{:>4}{:>6}"
GRAPH_TITLE = " Len Freq Graph"
GRAPH_LINE_TEMPLATE = "{:>4}{:>5}% {}"
def get_filename():
    filename = input("Please enter filename: ")
    while not os.path.isfile(filename):
        print(filename, "not found...")
        filename = input("Please enter filename: ")
    return filename
def get_words_from_file(filename):
    lines = open_and_read(filename)
    stripped = strip_data(lines)
    return stripped
def open_and_read(filename):
    should_add = False
    processed_data = []
    infile = open(filename, 'r', encoding='utf-8')
    raw_data = infile.readlines()
    for line in raw_data:
        if line.startswith("*** START"):
            should_add = True
        elif line.startswith("*** END OF"):
            should_add = False
            break
        if should_add:
            processed_data.append(line)
    processed_data.pop(0)
    return processed_data
def strip_data(raw_data):
    stripped_list = get_words(raw_data)
    processed_data = remove_punctuation(stripped_list)
    return processed_data
def get_words(raw_data):
    """
    Takes a list, raw_data, splits and strips words.
    returns a list stripped_list
    """
    stripped_list = []
    for word in raw_data:
        word = word.strip('\n"-:\';,.').split(' ')
        for bit in word:
            bit = bit.strip('\n"-:\';,.').split(' ')
            stripped_list.append(bit)
    return stripped_list
def remove_punctuation(stripped_list):
    """
    Takes a list, stripped_list, removes the all non alpha words.
    returns a list, processed_data
    """
    processed_data = []
    for piece in stripped_list:
        for chunk in piece:
            if chunk.isalpha():
                chunk = chunk.lower()
                processed_data.append(chunk)
    return processed_data
def avg_word_length(words):
    """
    Takes a list, words and counts the average length of the words in the list.
    Returns list average_leng
    """
    sum_lengths = 0
    for word in words:
        sum_lengths += len(word)
    average_leng = sum_lengths / len(words)
    return average_leng
def max_word_length(words):
    """Returns the length of the longest word in the list of words.
    Or 0 if there are no words in the list.
    """
    if len(words) > 0:
        max_length = len(words[0])
        for word in words:
            length = len(word)
            if length > max_length:
                max_length = length
    else:
        max_length = 0
    return max_length
def max_frequency(words):
    count = Counter(words).most_common(1)
    freq_count = count[0][1]
    return freq_count
def length_freq(words):
    """
    takes a list(words), and counts the amount of times the frequecny of a word appears
    Returns a list of the frequecny of a words length(len_freq)
    """
    words_length = [len(word) for word in words]
    len_freq = Counter(words_length).most_common()
    for i in range(1, max(words_length)): #gets the first value of the tuple
        test_set = [len_freq[x][0] for x in range(len(len_freq))] #and checks if already in the set
        if i not in test_set: #if not adds it as a tuple (i,0)
            len_freq.append((i, 0))
    return len_freq
def print_length_table(words):
    freq_dict = length_freq(words)
    print()
    print(TABLE_TITLE)
    for pair in sorted(freq_dict):
        print(FREQ_TABLE_TEMPLATE.format(pair[0], pair[1]))
def print_length_graph_hori(words):
    print()
    print(GRAPH_TITLE)
    relative_freq = get_percentage(words)
    for i in range(len(relative_freq)):
        number = relative_freq[i][0]
        percent = relative_freq[i][1]
        graph_line = "=" * percent
        print(GRAPH_LINE_TEMPLATE.format(number, percent, graph_line))
def get_percentage(words):
    """
    Returns a sorted list (relative_freq)
    """
    lengths = length_freq(words)
    relative_freq = []
    for value in lengths:
        percentage = int(value[1] / len(words) * 100)
        relative_freq.append((value[0], percentage))
    relative_freq = sorted(relative_freq)
    return relative_freq
def print_length_graph_vert(words):
    relative_freq = get_percentage(words)
    bars = [percent[1] for percent in relative_freq]
    next_10 = to_next_10(bars)
    print("\n% frequency")
    for percentage in range(next_10, 0, -1):
        if percentage < 10:
            print(" {} ".format(percentage), end="")
        else:
            print(" {} ".format(percentage), end="")
        for point in bars:
            if int(point) >= percentage:
                print(" ** ", end="")
            else:
                print(" " * 4, end="")
        print()
    print(" " * 5, end="")
    for i in range(len(relative_freq)):
        if i < 9:
            print(" 0{} ".format(i + 1), end="")
        else:
            print(" {} ".format(i + 1), end="")
    print("\n" + " " * (len(relative_freq) * 4 - 7) + "word length")
def to_next_10(bars):
    """
    Takes a list(bars)
    Maps the value of bars to a new list(bars_sort) and rounds to nearest 10
    Returns int(next_10)
    """
    bars_sort = bars[:]
    bars_sort = sorted(bars_sort)
    next_10 = bars_sort[-1]
    is_not_x10 = True
    while is_not_x10:
        next_10 += 1
        if next_10 % 10 == 0:
            is_not_x10 = False
    return next_10
def print_results(words):
    average_length = avg_word_length(words)
    max_length = max_word_length(words)
    max_freq = max_frequency(words)
    print()
    print("Word summary (all words):")
    print(" Number of words = {}".format(len(words)))
    print(" Avg word length = {:.2f}".format(average_length))
    print(" Max word length = {}".format(max_length))
    print(" Max frequency = {}".format(max_freq))
    print_length_table(words)
    print_length_graph_hori(words)
    print_length_graph_vert(words)
def main():
    """ Gets the job done """
    text = get_filename()
    print(" {} loaded ok.".format(text))
    words = get_words_from_file(text)
    print_results(words)
main()
Example terminal input/output:
Please enter filename: blah.txt
blah.txt loaded ok.
Word summary (all words):
Number of words = 4
Avg word length = 2.25
Max word length = 4
Max frequency = 2
Len Freq
1 1
2 2
3 0
4 1
Len Freq Graph
1 25% =========================
2 50% ==================================================
3 0%
4 25% =========================
% frequency
60
59
58
57
56
55
54
53
52
51
50 **
49 **
48 **
47 **
46 **
45 **
44 **
43 **
42 **
41 **
40 **
39 **
38 **
37 **
36 **
35 **
34 **
33 **
32 **
31 **
30 **
29 **
28 **
27 **
26 **
25 ** ** **
24 ** ** **
23 ** ** **
22 ** ** **
21 ** ** **
20 ** ** **
19 ** ** **
18 ** ** **
17 ** ** **
16 ** ** **
15 ** ** **
14 ** ** **
13 ** ** **
12 ** ** **
11 ** ** **
10 ** ** **
9 ** ** **
8 ** ** **
7 ** ** **
6 ** ** **
5 ** ** **
4 ** ** **
3 ** ** **
2 ** ** **
1 ** ** **
01 02 03 04
word length
I now need to change the code to enforce the following rules:
I may import only re and os libraries. No other libraries
The code must now use the pattern "[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+"
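A minimal sketch of how those two rules might fit together, assuming the pattern is meant to pull words out of lowercased text with re.findall (the assignment text does not say so explicitly) and that a plain dict can stand in for Counter. The helper names `words_from_lines` and `count_items` are made up for illustration:

```python
import re

# The pattern required by the rules above
WORD_PATTERN = re.compile(r"[a-z]+[-'][a-z]+|[a-z]+[']?|[a-z]+")


def words_from_lines(lines):
    """Extract lowercase words from an iterable of text lines."""
    words = []
    for line in lines:
        words.extend(WORD_PATTERN.findall(line.lower()))
    return words


def count_items(items):
    """Dict-based stand-in for collections.Counter: item -> occurrences."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts


words = words_from_lines(["a blah ba ba\n"])
print(words)
print(count_items(words))
print(count_items(len(word) for word in words))
```

With this in place, get_words and remove_punctuation could be replaced by words_from_lines, and the two Counter calls by count_items plus a sort on the resulting dict items.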

If you're not allowed to use the "collections" module, you could re-implement the bits of the Counter class (at least the parts you are using) yourself, which would be the __init__() method and the most_common() method.
I don't understand what the regular expression is supposed to be used for.
EDIT: OK, here's a brain-dead implementation of collections.Counter.
class MyCounter(object):
    def __init__(self, iterable):
        """
        initialize a counter object with something iterable
        """
        self._data = dict()
        # set up a dictionary that counts how many of each item we have
        for item in iterable:
            try:
                self._data[item] += 1
            except KeyError:
                self._data[item] = 1
    def most_common(self, n=None):
        """
        return the most common items from the object, along with their count.
        If n=None, return the whole list
        """
        # build a list of (item, count) pairs
        list_of_counts = list(self._data.items())
        # sort the list in descending order of count. Ordinarily, we would use
        # sorted() along with operator.itemgetter, but since we are not allowed
        # to use anything but re and os, we can just do a selection sort.
        for i in range(len(list_of_counts)):
            for j in range(i+1, len(list_of_counts)):
                if list_of_counts[i][1] < list_of_counts[j][1]:
                    temp = list_of_counts[j]
                    list_of_counts[j] = list_of_counts[i]
                    list_of_counts[i] = temp
        # return what is needed.
        if n is None:
            return list_of_counts
        return list_of_counts[:n]
##############################################################################
## the code from here down is not part of the solution, it is proof that the
## solution works
import unittest
from collections import Counter
class MyCounterTest(unittest.TestCase):
    def test_single_most_common(self):
        """
        check when we have a single most-common value
        """
        # illustrate the behavior of collections.Counter
        system_counter = Counter(['a','a','b','c'])
        system_common = system_counter.most_common(n=1)[0]
        self.assertEqual(system_common[0], 'a')
        self.assertEqual(system_common[1], 2)
        # confirm we get the same results from our Counter
        my_counter = MyCounter(['a','a','b','c'])
        my_common = my_counter.most_common(n=1)[0]
        self.assertEqual(my_common[0], 'a')
        self.assertEqual(my_common[1], 2)
    def test_with_none(self):
        system_counter = Counter(['a','a','b','c'])
        self.assertEqual(len(system_counter.most_common()), 3)
        my_counter = MyCounter(['a','a','b','c'])
        self.assertEqual(len(my_counter.most_common()), 3)
if __name__ == '__main__':
    unittest.main()

Convert between str and int without using built-in typecasting

NB: You may not use built-in typecasting: code this yourself.
def str2int(s):
    result = 0
    if s[0] == '-':
        sign = -1
        i = 1
        while i < len(s):
            num = ord(s[i]) - ord('0')
            result = result * 10 + num
            i += 1
        result = sign * result
        return result
    else:
        i = 0
        while i < len(s):
            num = ord(s[i]) - ord('0')
            result = result * 10 + num
            i += 1
        return result
NB: You may not use built-in str() or string template. Code this yourself.
def int2str(i):
    strng = ""
    if i > 0:
        while i != 0:
            num = i % 10
            strng += chr(48+num)
            i = i // 10
        return strng[::-1]
    else:
        while i != 0:
            num = abs(i) % 10
            strng += chr(48+num)
            i = abs(i) // 10
        return '-' + strng[::-1]
I am a newbie and I have to write this code using only the basics. I wrote these functions myself, but they look weird. Can you help me improve the code? Thank you.

This may be a better question for https://codereview.stackexchange.com/.
Notwithstanding the lack of error checking, one obvious comment is that you have common code that can be factored out. Only capture in the if/else what is unique rather than repeating the while loop:
def str2int(s):
    if s[0] == '-':
        sign = -1
        i = 1
    else:
        sign = 1
        i = 0
    result = 0
    while i < len(s):
        num = ord(s[i]) - ord('0')
        result = result * 10 + num
        i += 1
    return sign * result
It is generally considered better form in Python to iterate over the sequence itself rather than over indices:
def str2int(s):
    sign = 1
    if s[0] == '-':
        sign = -1
        s = s[1:]
    result = 0
    for c in s:
        num = ord(c) - ord('0')
        result = result * 10 + num
    return sign * result
These last lines are equivalent to a standard map and reduce (reduce is in functools for py3). Though some would argue against it:
from functools import reduce # Py3
def str2int(s):
    sign = 1
    if s[0] == '-':
        sign = -1
        s = s[1:]
    return sign * reduce(lambda x,y: x*10+y, map(lambda c: ord(c) - ord('0'), s))
There are similar opportunities to do the same for int2str().
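As a sketch of what that could look like for int2str(), here is one possible refactoring that handles the sign once and shares a single loop. It is not the only way to do it; it additionally assumes that 0 should come back as "0" (the original returns "-" for that input) and that divmod, chr and str.join do not count as built-in typecasting.

```python
def int2str(i):
    """Convert an integer to its decimal string without using str()."""
    if i == 0:
        return "0"
    sign = ""
    if i < 0:
        sign = "-"
        i = -i
    digits = []
    while i > 0:
        i, num = divmod(i, 10)                # peel off the last digit
        digits.append(chr(ord("0") + num))    # digit value -> character
    return sign + "".join(reversed(digits))


print(int2str(1203), int2str(-45), int2str(0))
```

Collecting the digits in a list and joining once avoids the repeated string concatenation and the final reverse slice of the original.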

Project Euler 461 - Genetic Algorithm

Someone told me that this problem should be easy to solve with a genetic algorithm.
I read some stuff about this topic (I hadn't heard about it before), and wrote (and copied) some code.
The results I get are close to optimum, but not close enough.
I'd like to have some help with it:
import time
import math
import random
def f(n, k):
return math.exp(k / n) - 1
def individual(length, min, max):
'Create a member of the population.'
return [random.randint(min, max) for x in range(length)]
def population(count, length, min, max):
"""
Create a number of individuals (i.e. a population).
count: the number of individuals in the population
length: the number of values per individual
min: the minimum possible value in an individual's list of values
max: the maximum possible value in an individual's list of values
"""
return [individual(length, min, max) for x in range(count)]
def fitness(individual, target):
def get_best_last_element(a, b, c):
s = math.pi - f(eu461.BASE, a) - f(eu461.BASE, b) - f(eu461.BASE, c)
s += 1
if s > 1:
return round(math.log(s) * eu461.BASE)
else:
return 0
def getg():
return get_best_last_element
"""
Determine the fitness of an individual. Higher is better.
individual: the individual to evaluate
target: the target number individuals are aiming for
"""
l = get_best_last_element(individual[0], individual[1], individual[2])
return abs(target - sum([f(eu461.BASE, k) for k in individual]) - f(eu461.BASE, l))
def grade(pop, target):
'Find average fitness for a population.'
return sum([fitness(x, target) for x in pop]) / (len(pop))
def evolve(pop, target, retain=0.2, random_select=0.05, mutate=0.01):
graded = [(fitness(x, target), x) for x in pop]
graded = [x[1] for x in sorted(graded)]
retain_length = int(len(graded) * retain)
parents = graded[:retain_length]
# randomly add other individuals to
# promote genetic diversity
for individual in graded[retain_length:]:
if random_select > random.random():
parents.append(individual)
# mutate some individuals
for individual in parents:
if mutate > random.random():
pos_to_mutate = random.randint(0, len(individual) - 1)
# this mutation is not ideal, because it
# restricts the range of possible values,
# but the function is unaware of the min/max
# values used to create the individuals,
individual[pos_to_mutate] = random.randint(min(individual), max(individual))
# crossover parents to create children
parents_length = len(parents)
desired_length = len(pop) - parents_length
children = []
while len(children) < desired_length:
male = random.randint(0, parents_length - 1)
female = random.randint(0, parents_length - 1)
if male != female:
male = parents[male]
female = parents[female]
half = len(male) // 2
if random.randint(0, 1):
child = male[:half] + female[half:]
else:
child = female[:half] + male[half:]
children.append(child)
parents.extend(children)
return parents
def get_best_last_element(a, b, c):
s = math.pi - f(eu461.BASE, a) - f(eu461.BASE, b) - f(eu461.BASE, c)
s += 1
if s > 0:
return round(math.log(s) * eu461.BASE)
else:
return 0
def eu461():
target = math.pi
p_count = 10000
i_length = 3
i_min = 0
i_max = round(eu461.BASE * math.log(math.pi + 1))
p = population(p_count, i_length, i_min, i_max)
fitness_history = [grade(p, target),]
for i in range(150):
p = evolve(p, target)
fitness_history.append(grade(p, target))
for datum in fitness_history:
pass #print (datum)
return p[0], get_best_last_element(p[0][0], p[0][1], p[0][2]), sum([f(eu461.BASE, k) for k in p[0]]) + f(eu461.BASE, get_best_last_element(p[0][0], p[0][1], p[0][2]))
eu461.BASE = 200
if __name__ == "__main__":
    startTime = time.perf_counter()  # time.clock() was removed in Python 3.8
    print(eu461())
    elapsedTime = time.perf_counter() - startTime
    print("Time spent in (", __name__, ") is: ", elapsedTime, " sec")

Base 62 conversion

How would you convert an integer to base 62 (like hexadecimal, but with these digits: '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ')?
I have been trying to find a good Python library for it, but they all seem to be occupied with converting strings. The Python base64 module only accepts strings and turns a single digit into four characters. I was looking for something akin to what URL shorteners use.

There is no standard module for this, but I have written my own functions to achieve that.
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
def encode(num, alphabet=BASE62):
    """Encode a positive number into Base X and return the string.
    Arguments:
    - `num`: The number to encode
    - `alphabet`: The alphabet to use for encoding
    """
    if num == 0:
        return alphabet[0]
    arr = []
    arr_append = arr.append  # Extract bound-method for faster access.
    _divmod = divmod  # Access to locals is faster.
    base = len(alphabet)
    while num:
        num, rem = _divmod(num, base)
        arr_append(alphabet[rem])
    arr.reverse()
    return ''.join(arr)
def decode(string, alphabet=BASE62):
    """Decode a Base X encoded string into the number
    Arguments:
    - `string`: The encoded string
    - `alphabet`: The alphabet to use for decoding
    """
    base = len(alphabet)
    strlen = len(string)
    num = 0
    idx = 0
    for char in string:
        power = (strlen - (idx + 1))
        num += alphabet.index(char) * (base ** power)
        idx += 1
    return num
Note that you can pass any alphabet for encoding and decoding. If you leave the alphabet argument out, you get the 62-character alphabet defined on the first line of code, and hence encoding/decoding to/from base 62.
PS - For URL shorteners, I have found that it's better to leave out a few confusing characters like 0Ol1oI etc. Thus I use this alphabet for my URL shortening needs - "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"
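
As a quick check of the idea in the answer above, here is a minimal round-trip sketch. It re-declares simplified versions of `encode` and `decode` so it runs on its own; the name `UNAMBIGUOUS` and the negative-number check are assumptions added for illustration, not part of the original answer.

```python
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
# Alphabet quoted in the answer, with look-alike characters left out.
UNAMBIGUOUS = "23456789abcdefghijkmnpqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ"

def encode(num, alphabet=BASE62):
    """Encode a non-negative integer using the given alphabet."""
    if num < 0:
        raise ValueError("num must be non-negative")  # assumption: reject negatives
    if num == 0:
        return alphabet[0]
    base = len(alphabet)
    digits = []
    while num:
        num, rem = divmod(num, base)
        digits.append(alphabet[rem])
    return "".join(reversed(digits))

def decode(string, alphabet=BASE62):
    """Decode a string produced by encode() with the same alphabet."""
    base = len(alphabet)
    num = 0
    for char in string:
        num = num * base + alphabet.index(char)
    return num

print(encode(1234567890))               # 1ly7vk
print(decode(encode(1234567890)))       # 1234567890
short = encode(1234567890, UNAMBIGUOUS)
print(decode(short, UNAMBIGUOUS))       # 1234567890
```

The same alphabet has to be passed to both functions; mixing alphabets silently decodes to a different number.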

I once wrote a script to do this as well, I think it's quite elegant :)
import string
# Remove the `_#` below for base62, now it has 64 characters
BASE_LIST = string.digits + string.ascii_letters + '_#'
BASE_DICT = dict((c, i) for i, c in enumerate(BASE_LIST))
def base_decode(string, reverse_base=BASE_DICT):
    length = len(reverse_base)
    ret = 0
    for i, c in enumerate(string[::-1]):
        ret += (length ** i) * reverse_base[c]
    return ret
def base_encode(integer, base=BASE_LIST):
    if integer == 0:
        return base[0]
    length = len(base)
    ret = ''
    while integer != 0:
        ret = base[integer % length] + ret
        integer //= length  # floor division keeps the value an int
    return ret
Example usage:
for i in range(100):
    print(i, base_decode(base_encode(i)), base_encode(i))

The following decoder-maker works with any reasonable base, has a much tidier loop, and gives an explicit error message when it meets an invalid character.
def base_n_decoder(alphabet):
    """Return a decoder for a base-n encoded string
    Argument:
    - `alphabet`: The alphabet used for encoding
    """
    base = len(alphabet)
    char_value = dict(((c, v) for v, c in enumerate(alphabet)))
    def f(string):
        num = 0
        try:
            for char in string:
                num = num * base + char_value[char]
        except KeyError:
            raise ValueError('Unexpected character %r' % char)
        return num
    return f
if __name__ == "__main__":
    func = base_n_decoder('0123456789abcdef')
    # The last test string contains 'q' and raises ValueError on purpose.
    for test in ('0', 'f', '2020', 'ffff', 'abqdef'):
        print(test)
        print(func(test))
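
The decoder-maker above has no encoding counterpart in the answer. A matching encoder could be sketched along the same lines; `base_n_encoder` is a hypothetical name chosen here for symmetry, and the error handling is an assumption.

```python
def base_n_encoder(alphabet):
    """Return an encoder that maps non-negative integers to base-n strings.

    Hypothetical companion to base_n_decoder; not part of the original answer.
    """
    base = len(alphabet)
    if base < 2:
        raise ValueError("alphabet needs at least two characters")

    def f(num):
        if num < 0:
            raise ValueError("Cannot encode negative number %r" % num)
        if num == 0:
            return alphabet[0]
        digits = []
        while num:
            num, rem = divmod(num, base)
            digits.append(alphabet[rem])  # least significant digit first
        return "".join(reversed(digits))
    return f

to_hex = base_n_encoder("0123456789abcdef")
print(to_hex(8224))    # 2020
print(to_hex(65535))   # ffff
```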

If you're looking for the highest efficiency (like Django), you'll want something like the following. This code is a combination of efficient methods from Baishampayan Ghose and WoLpH and John Machin.
# Edit this list of characters as desired.
BASE_ALPH = tuple("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
BASE_DICT = dict((c, v) for v, c in enumerate(BASE_ALPH))
BASE_LEN = len(BASE_ALPH)
def base_decode(string):
    num = 0
    for char in string:
        num = num * BASE_LEN + BASE_DICT[char]
    return num
def base_encode(num):
    if not num:
        return BASE_ALPH[0]
    encoding = ""
    while num:
        num, rem = divmod(num, BASE_LEN)
        encoding = BASE_ALPH[rem] + encoding
    return encoding
You may want to also calculate your dictionary in advance. (Note: Encoding with a string shows more efficiency than with a list, even with very long numbers.)
>>> timeit.timeit("for i in xrange(1000000): base.base_decode(base.base_encode(i))", setup="import base", number=1)
2.3302059173583984
Encoded and decoded 1 million numbers in under 2.5 seconds. (2.2GHz i7-2670QM)

If you use the Django framework, you can use the django.utils.baseconv module.
>>> from django.utils import baseconv
>>> baseconv.base62.encode(1234567890)
'1LY7VK'
In addition to base62, baseconv also defines base2/base16/base36/base56/base64.

If all you need is to generate a short ID (since you mention URL shorteners) rather than encode/decode something, this module might help:
https://github.com/stochastic-technologies/shortuuid/

You probably want base64, not base62. There's a URL-compatible version of it floating around, so the extra two filler characters shouldn't be a problem.
The process is fairly simple; consider that base64 represents 6 bits and a regular byte represents 8. Assign a value from 000000 to 111111 to each of the 64 characters chosen, and put the 4 values together to match a set of 3 base256 bytes. Repeat for each set of 3 bytes, padding at the end with your choice of padding character (0 is generally useful).
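
For integers specifically, the base64 route described above can be sketched with the standard library: convert the integer to bytes, then encode with the URL-safe alphabet. The helper names `int_to_b64` and `b64_to_int` are hypothetical, and stripping the `=` padding is a choice made here for shorter IDs rather than something the answer prescribes.

```python
import base64

def int_to_b64(num):
    """Encode a non-negative integer as unpadded URL-safe base64 text."""
    if num < 0:
        raise ValueError("num must be non-negative")
    length = max(1, (num.bit_length() + 7) // 8)   # at least one byte, even for 0
    raw = num.to_bytes(length, "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def b64_to_int(text):
    """Inverse of int_to_b64: restore the padding, then decode."""
    padded = text + "=" * (-len(text) % 4)
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")

print(int_to_b64(255))                        # _w
print(b64_to_int(int_to_b64(1234567890)))     # 1234567890
```

Because base64 works on whole bytes, the result can be slightly longer than a digit-by-digit base 62 or base 64 conversion of the same number.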

There is now a Python library for this.
I'm working on making a pip package for this.
I recommend you use my bases.py https://github.com/kamijoutouma/bases.py which was inspired by bases.js
from bases import Bases
bases = Bases()
bases.toBase16(200)  # => 'c8'
bases.toBase(200, 16)  # => 'c8'
bases.toBase62(99999)  # => 'q0T'
bases.toBase(99999, 62)  # => 'q0T'
bases.toAlphabet(300, 'aAbBcC')  # => 'Abba'
bases.fromBase16('c8')  # => 200
bases.fromBase('c8', 16)  # => 200
bases.fromBase62('q0T')  # => 99999
bases.fromBase('q0T', 62)  # => 99999
bases.fromAlphabet('Abba', 'aAbBcC')  # => 300
Refer to https://github.com/kamijoutouma/bases.py#known-basesalphabets for which bases are usable.

you can download zbase62 module from pypi
eg
>>> import zbase62
>>> zbase62.b2a("abcd")
'1mZPsa'

I have benefited greatly from others' posts here. I needed the Python code originally for a Django project, but since then I have turned to node.js, so here's a JavaScript version of the code (the encoding part) that Baishampayan Ghose provided.
var ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
function base62_encode(n, alpha) {
  var num = n || 0;
  var alphabet = alpha || ALPHABET;
  if (num == 0) return alphabet[0];
  var arr = [];
  var base = alphabet.length;
  var rem; // declared locally so it does not leak into the global scope
  while (num) {
    rem = num % base;
    num = (num - rem) / base;
    arr.push(alphabet.substring(rem, rem + 1));
  }
  return arr.reverse().join('');
}
console.log(base62_encode(2390687438976, "123456789ABCDEFGHIJKLMNPQRSTUVWXYZ"));

I hope the following snippet could help.
def num2sym(num, sym, join_symbol=''):
    if num == 0:
        return sym[0]
    if not isinstance(num, int) or num < 0:
        raise ValueError('num must be positive integer')
    l = len(sym)  # target number base
    r = []
    div = num
    while div != 0:  # base conversion
        div, mod = divmod(div, l)
        r.append(sym[mod])
    return join_symbol.join([x for x in reversed(r)])
Usage for your case:
number = 367891
alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
print(num2sym(number, alphabet))  # will print '1xHJ'
Obviously, you can specify another alphabet with fewer or more symbols, and your number will be converted to the correspondingly smaller or larger base. For example, passing '01' as the alphabet outputs a string representing the input number in binary.
You may shuffle the alphabet initially to get your own unique representation of the numbers. This can be helpful if you're building a URL shortener service.
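
The alphabet-shuffling idea can be sketched as follows. `shuffled_alphabet` and `sym2num` are hypothetical helpers introduced here, and the seed value is arbitrary. A fixed seed makes the shuffle repeatable so that previously issued IDs stay decodable, though the exact ordering may differ between Python versions, and this only obscures the numbering rather than securing it.

```python
import random

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def shuffled_alphabet(seed, alphabet=BASE62):
    """Return the same characters in a seed-dependent, repeatable order."""
    chars = list(alphabet)
    random.Random(seed).shuffle(chars)   # private generator, global state untouched
    return "".join(chars)

def num2sym(num, sym):
    """Simplified encoder: non-negative integer to a string over `sym`."""
    if num == 0:
        return sym[0]
    out = []
    while num:
        num, mod = divmod(num, len(sym))
        out.append(sym[mod])
    return "".join(reversed(out))

def sym2num(text, sym):
    """Decode a string produced by num2sym with the same alphabet."""
    num = 0
    for ch in text:
        num = num * len(sym) + sym.index(ch)
    return num

my_alphabet = shuffled_alphabet(seed=2024)   # assumption: any private seed works
token = num2sym(367891, my_alphabet)
print(token, sym2num(token, my_alphabet))
```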

Simplest ever.
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
def encode_base62(num):
    if num == 0:
        return BASE62[0]  # without this, 0 would encode to an empty string
    s = ""
    while num > 0:
        num, r = divmod(num, 62)
        s = BASE62[r] + s
    return s
def decode_base62(num):
    x, s = 1, 0
    for i in range(len(num) - 1, -1, -1):
        s = int(BASE62.index(num[i])) * x + s
        x *= 62
    return s
print(encode_base62(123))
print(decode_base62("1Z"))

Python does not have a built-in solution.
The chosen solution is probably the most readable one, but we might be able to squeeze out a bit of performance.
from string import digits, ascii_lowercase, ascii_uppercase
base_chars = digits + ascii_lowercase + ascii_uppercase
def base_it(number, base=62):
def iterate(moving_number=number, moving_base=base):
if not moving_number:
return ''
return iterate(moving_number // moving_base, moving_base * base) + base_chars[moving_number % base]
return iterate() or base_chars[0]
Explanation
In any base, every number is equal to a0 + a1*base + a2*base**2 + a3*base**3 + ..., so the goal is to find all the coefficients a.
For every N=1,2,3... the code isolates the aN*base**N by "modulo" by base for base = base**(N+1) which slices all numbers bigger than N, and slicing all the numbers so that their serial is smaller than N by decreasing a every time the function is called recursively by the current aN*base**N.
Advantages and discussion
In this sample, there's only one multiplication (instead of a division) and some modulus operations, which are all relatively fast.
If you really want performance, though, you'd probably do better using a CPython library.

Personally I like the solution from Baishampayan, mostly because of stripping the confusing characters.
For completeness, and solution with better performance, this post shows a way to use the Python base64 module.

I wrote this a while back and it's worked pretty well (negatives and all included)
def code(number,base):
try:
int(number),int(base)
except ValueError:
raise ValueError('code(number,base): number and base must be in base10')
else:
number,base = int(number),int(base)
if base < 2:
base = 2
if base > 62:
base = 62
numbers = [0,1,2,3,4,5,6,7,8,9,"a","b","c","d","e","f","g","h","i","j",
"k","l","m","n","o","p","q","r","s","t","u","v","w","x","y",
"z","A","B","C","D","E","F","G","H","I","J","K","L","M","N",
"O","P","Q","R","S","T","U","V","W","X","Y","Z"]
final = ""
loc = 0
if number < 0:
final = "-"
number = abs(number)
while base**loc <= number:
loc = loc + 1
for x in range(loc-1,-1,-1):
for y in range(base-1,-1,-1):
if y*(base**x) <= number:
final = "{}{}".format(final,numbers[y])
number = number - y*(base**x)
break
return final
def decode(number,base):
try:
int(base)
except ValueError:
raise ValueError('decode(value,base): base must be in base10')
else:
base = int(base)
number = str(number)
if base < 2:
base = 2
if base > 62:
base = 62
numbers = ["0","1","2","3","4","5","6","7","8","9","a","b","c","d","e","f",
"g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v",
"w","x","y","z","A","B","C","D","E","F","G","H","I","J","K","L",
"M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"]
final = 0
if number.startswith("-"):
neg = True
number = list(number)
del(number[0])
temp = number
number = ""
for x in temp:
number = "{}{}".format(number,x)
else:
neg = False
loc = len(number)-1
number = str(number)
for x in number:
if numbers.index(x) >= base:
raise ValueError('{} is out of base{} range'.format(x,str(base)))
final = final+(numbers.index(x)*(base**loc))
loc = loc - 1
if neg:
return -final
else:
return final
sorry about the length of it all

BASE_LIST = tuple("23456789ABCDEFGHJKLMNOPQRSTUVWXYZabcdefghjkmnpqrstuvwxyz")
BASE_DICT = dict((c, v) for v, c in enumerate(BASE_LIST))
BASE_LEN = len(BASE_LIST)
def nice_decode(str):
    num = 0
    for char in str[::-1]:
        num = num * BASE_LEN + BASE_DICT[char]
    return num
def nice_encode(num):
    if not num:
        return BASE_LIST[0]
    encoding = ""
    while num:
        num, rem = divmod(num, BASE_LEN)
        encoding += BASE_LIST[rem]
    return encoding

Here is a recursive and an iterative way to do that. The iterative one is a little faster, depending on the number of executions.
def base62_encode_r(dec):
    s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    return s[dec] if dec < 62 else base62_encode_r(dec // 62) + s[dec % 62]
print(base62_encode_r(2347878234))
def base62_encode_i(dec):
    s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    ret = ''
    while dec > 0:
        ret = s[dec % 62] + ret
        dec //= 62
    return ret
print(base62_encode_i(2347878234))
def base62_decode_r(b62):
    s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    if len(b62) == 1:
        return s.index(b62)
    x = base62_decode_r(b62[:-1]) * 62 + s.index(b62[-1:]) % 62
    return x
print(base62_decode_r("2yTsnM"))
def base62_decode_i(b62):
    s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    ret = 0
    for i in range(len(b62) - 1, -1, -1):
        ret = ret + s.index(b62[i]) * (62 ** (len(b62) - i - 1))
    return ret
print(base62_decode_i("2yTsnM"))
if __name__ == '__main__':
import timeit
print(timeit.timeit(stmt="base62_encode_r(2347878234)", setup="from __main__ import base62_encode_r", number=100000))
print(timeit.timeit(stmt="base62_encode_i(2347878234)", setup="from __main__ import base62_encode_i", number=100000))
print(timeit.timeit(stmt="base62_decode_r('2yTsnM')", setup="from __main__ import base62_decode_r", number=100000))
print(timeit.timeit(stmt="base62_decode_i('2yTsnM')", setup="from __main__ import base62_decode_i", number=100000))
0.270266867033
0.260915645986
0.344734796766
0.311662500262

Python 3.7.x
I found a PhD's github for some algorithms when looking for an existing base62 script. It didn't work for the current max-version of Python 3 at this time so I went ahead and fixed where needed and did a little refactoring. I don't usually work with Python and have always used it ad-hoc so YMMV. All credit goes to Dr. Zhihua Lai. I just worked the kinks out for this version of Python.
file base62.py
# modified from Dr. Zhihua Lai's original on GitHub
base = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
b = 62
def toBase10(b62: str) -> int:
    limit = len(b62)
    res = 0
    for i in range(limit):
        res = b * res + base.find(b62[i])
    return res
def toBase62(b10: int) -> str:
    if b <= 0 or b > 62:
        return 0
    r = b10 % b
    res = base[r]
    q = b10 // b  # integer division stays exact for very large values
    while q:
        r = q % b
        q = q // b
        res = base[r] + res
    return res
file try_base62.py
import base62
print("Base10 ==> Base62")
for i in range(999):
print(f'{i} => {base62.toBase62(i)}')
base62_samples = ["gud", "GA", "mE", "lo", "lz", "OMFGWTFLMFAOENCODING"]
print("Base62 ==> Base10")
for i in range(len(base62_samples)):
print(f'{base62_samples[i]} => {base62.toBase10(base62_samples[i])}')
output of try_base62.py
Base10 ==> Base62
0 => 0
[...]
998 => g6
Base62 ==> Base10
gud => 63377
GA => 2640
mE => 1404
lo => 1326
lz => 1337
OMFGWTFLMFAOENCODING => 577002768656147353068189971419611424
Since there was no licensing info in the repo I did submit a PR so the original author at least knows other people are using and modifying their code.

All the solutions above define the alphabet explicitly, but a contiguous range of ASCII codes can serve instead. Note that the 62 consecutive characters starting at '0' run through 'm' and include punctuation such as ':', '@' and '[', so the output is not limited to letters and digits.
def converter_base62(count) -> str:
    result = ''
    start = ord('0')
    while count > 0:
        result = chr(count % 62 + start) + result
        count //= 62
    return result
def decode_base62(string_to_decode: str):
    result = 0
    start = ord('0')
    for char in string_to_decode:
        result = result * 62 + (ord(char) - start)
    return result
import tqdm
n = 10_000_000
for i in tqdm.tqdm(range(n)):
    assert decode_base62(converter_base62(i)) == i

Sorry, I can't help you with a library here. I would prefer using base64 and just adding two extra characters of your choice -- if possible!
Then you can use the base64 module.
If this is really, really not possible:
You can do it yourself this way (this is pseudo-code):
base62vals = []
myBase = 62
while num > 0:
    remainder = num % myBase
    num = num // myBase
    base62vals.insert(0, remainder)

With simple recursion:
"""
This module contains functions to transform a number to string and vice-versa
"""
BASE = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
LEN_BASE = len(BASE)
def encode(num):
    """
    This function encodes the given number into alpha numeric string
    """
    if num < LEN_BASE:
        return BASE[num]
    return BASE[num % LEN_BASE] + encode(num//LEN_BASE)
def decode_recursive(string, index):
    """
    recursive util function for decode
    """
    if not string or index >= len(string):
        return 0
    return (BASE.index(string[index]) * LEN_BASE ** index) + decode_recursive(string, index + 1)
def decode(string):
    """
    This function decodes given string to number
    """
    return decode_recursive(string, 0)

Benchmarking answers that worked for Python3 (machine: i7-8565U):
"""
us per enc()+dec() # test
(4.477935791015625, 2, '3Tx16Db2JPSS4ZdQ4dp6oW')
(6.073190927505493, 5, '3Tx16Db2JPSS4ZdQ4dp6oW')
(9.051250696182251, 9, '3Tx16Db2JPSS4ZdQ4dp6oW')
(9.864609956741333, 6, '3Tx16Db2JOOqeo6GCGscmW')
(10.868197917938232, 1, '3Tx16Db2JPSS4ZdQ4dp6oW')
(11.018349647521973, 10, '3Tx16Db2JPSS4ZdQ4dp6oW')
(12.448230504989624, 4, '03Tx16Db2JPSS4ZdQ4dp6oW')
(13.016672611236572, 7, '3Tx16Db2JPSS4ZdQ4dp6oW')
(13.212724447250366, 8, '3Tx16Db2JPSS4ZdQ4dp6oW')
(24.119479656219482, 3, '3tX16dB2jpss4zDq4DP6Ow')
"""
from time import time
half = 2 ** 127
results = []
def bench(n, enc, dec):
start = time()
for i in range(half, half + 1_000_000):
dec(enc(i))
end = time()
results.append(tuple([end - start, n, enc(half + 1234134134134314)]))
BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
def encode(num, alphabet=BASE62):
"""Encode a positive number into Base X and return the string.
Arguments:
- `num`: The number to encode
- `alphabet`: The alphabet to use for encoding
"""
if num == 0:
return alphabet[0]
arr = []
arr_append = arr.append # Extract bound-method for faster access.
_divmod = divmod # Access to locals is faster.
base = len(alphabet)
while num:
num, rem = _divmod(num, base)
arr_append(alphabet[rem])
arr.reverse()
return ''.join(arr)
def decode(string, alphabet=BASE62):
"""Decode a Base X encoded string into the number
Arguments:
- `string`: The encoded string
- `alphabet`: The alphabet to use for decoding
"""
base = len(alphabet)
strlen = len(string)
num = 0
idx = 0
for char in string:
power = (strlen - (idx + 1))
num += alphabet.index(char) * (base ** power)
idx += 1
return num
bench(1, encode, decode)
###########################################################################################################
# Remove the `_#` below for base62, now it has 64 characters
BASE_ALPH = tuple(BASE62)
BASE_LIST = BASE62
BASE_DICT = dict((c, v) for v, c in enumerate(BASE_ALPH))
###########################################################################################################
BASE_LEN = len(BASE_ALPH)
def decode(string):
num = 0
for char in string:
num = num * BASE_LEN + BASE_DICT[char]
return num
def encode(num):
if not num:
return BASE_ALPH[0]
encoding = ""
while num:
num, rem = divmod(num, BASE_LEN)
encoding = BASE_ALPH[rem] + encoding
return encoding
bench(2, encode, decode)
###########################################################################################################
from django.utils import baseconv
bench(3, baseconv.base62.encode, baseconv.base62.decode)
###########################################################################################################
def encode(a):
baseit = (lambda a=a, b=62: (not a) and '0' or
baseit(a - a % b, b * 62) + '0123456789abcdefghijklmnopqrstuvwxyz'
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'[
a % b % 61 or -1 * bool(a % b)])
return baseit()
bench(4, encode, decode)
###########################################################################################################
def encode(num, sym=BASE62, join_symbol=''):
if num == 0:
return sym[0]
l = len(sym) # target number base
r = []
div = num
while div != 0: # base conversion
div, mod = divmod(div, l)
r.append(sym[mod])
return join_symbol.join([x for x in reversed(r)])
bench(5, encode, decode)
###########################################################################################################
from math import floor
base = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
b = 62;
def decode(b62: str) -> int:
limit = len(b62)
res = 0
for i in range(limit):
res = b * res + base.find(b62[i])
return res
def encode(b10: int) -> str:
if b <= 0 or b > 62:
return 0
r = b10 % b
res = base[r];
q = floor(b10 / b)
while q:
r = q % b
q = floor(q / b)
res = base[int(r)] + res
return res
bench(6, encode, decode)
###########################################################################################################
def encode(dec):
s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
return s[dec] if dec < 62 else encode(dec // 62) + s[int(dec % 62)]
def decode(b62):
s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
if len(b62) == 1:
return s.index(b62)
x = decode(b62[:-1]) * 62 + s.index(b62[-1:]) % 62
return x
bench(7, encode, decode)
def encode(dec):
s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
ret = ''
while dec > 0:
ret = s[dec % 62] + ret
dec //= 62
return ret
def decode(b62):
s = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
ret = 0
for i in range(len(b62) - 1, -1, -1):
ret = ret + s.index(b62[i]) * (62 ** (len(b62) - i - 1))
return ret
bench(8, encode, decode)
###########################################################################################################
def encode(num):
s = ""
while num > 0:
num, r = divmod(num, 62)
s = BASE62[r] + s
return s
def decode(num):
x, s = 1, 0
for i in range(len(num) - 1, -1, -1):
s = int(BASE62.index(num[i])) * x + s
x *= 62
return s
bench(9, encode, decode)
###########################################################################################################
def encode(number: int, alphabet=BASE62, padding: int = 22) -> str:
l = len(alphabet)
res = []
while number > 0:
number, rem = divmod(number, l)
res.append(alphabet[rem])
if number == 0:
break
return "".join(res)[::-1] # .rjust(padding, "0")
def decode(digits: str, lookup=BASE_DICT) -> int:
res = 0
last = len(digits) - 1
base = len(lookup)
for i, d in enumerate(digits):
res += lookup[d] * pow(base, last - i)
return res
bench(10, encode, decode)
###########################################################################################################
for row in sorted(results):
print(row)
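
In the results listed at the top of this benchmark, variant 6 prints a different string from most of the others. A likely explanation is that it computes the quotient with floor(b10 / b), which goes through a float and loses the low bits of a 128-bit integer. The sketch below compares that approach with integer division; `encode_float` and `encode_int` are names made up for this comparison.

```python
from math import floor

BASE62 = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_float(num):
    """Quotient via true division and floor(), as in benchmark variant 6."""
    res = BASE62[num % 62]
    q = floor(num / 62)       # num / 62 is a float with only 53 bits of precision
    while q:
        res = BASE62[q % 62] + res
        q = floor(q / 62)
    return res

def encode_int(num):
    """Quotient via integer division, exact for arbitrarily large ints."""
    res = BASE62[num % 62]
    q = num // 62
    while q:
        res = BASE62[q % 62] + res
        q //= 62
    return res

def decode(text):
    num = 0
    for ch in text:
        num = num * 62 + BASE62.index(ch)
    return num

n = 2 ** 127 + 1234134134134314        # same test value as the benchmark
print(encode_int(n), decode(encode_int(n)) == n)
print(encode_float(n), encode_float(n) == encode_int(n))
```

For small inputs both functions agree, which is why the difference is easy to miss in tests that only use short numbers.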

Original javascript version:
var hash = "", alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", alphabetLength =
alphabet.length;
do {
hash = alphabet[input % alphabetLength] + hash;
input = parseInt(input / alphabetLength, 10);
} while (input);
Source: https://hashids.org/
Python:
def to_base62(number):
    alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    alphabetLength = len(alphabet)
    result = ""
    while True:
        result = alphabet[number % alphabetLength] + result
        number = number // alphabetLength  # integer division avoids float rounding
        if number == 0:
            break
    return result
print(to_base62(59*(62**2) + 60*(62) + 61))
# result: XYZ


Is there a way to autogenerate valid arithmetic expressions? - python

import random
def expr(depth):
    if depth==1 or random.random()<1.0/(2**depth-1):
        return str(int(random.random() * 100))
    return '(' + expr(depth-1) + random.choice(['+','-','*','/']) + expr(depth-1) + ')'
for i in range(10):
    print(expr(4))

Generate a random array in RPN with a mixture of operators and numbers (always valid). Then start from the middle of the array and generate the corresponding evaluation tree.
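
A minimal sketch of the RPN idea, under a few assumptions: the helper names `random_rpn` and `rpn_to_infix` are made up here, operands are drawn from 1 to 99, and the RPN is converted to infix with a stack instead of building the tree from the middle of the array. An operator is only emitted when at least two values are on the stack, which is what keeps the sequence valid. A subexpression can still evaluate to zero, so division by zero remains possible at evaluation time.

```python
import random

OPS = ["+", "-", "*", "/"]

def random_rpn(num_operands, rng=random):
    """Return a random RPN token list that is valid by construction."""
    tokens = []
    depth = 0                        # values currently on the evaluation stack
    operands_left = num_operands
    while operands_left > 0 or depth > 1:
        can_push = operands_left > 0
        can_apply = depth >= 2       # a binary operator needs two values
        if can_push and (not can_apply or rng.random() < 0.5):
            tokens.append(str(rng.randint(1, 99)))
            operands_left -= 1
            depth += 1
        else:
            tokens.append(rng.choice(OPS))
            depth -= 1               # two values consumed, one result pushed
    return tokens

def rpn_to_infix(tokens):
    """Convert RPN tokens to a space-delimited, fully parenthesized expression."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append("( %s %s %s )" % (left, tok, right))
        else:
            stack.append(tok)
    return stack[0]

for _ in range(5):
    print(rpn_to_infix(random_rpn(4)))
```

Every generated expression has exactly one operator fewer than it has operands, so the dangling `+` and the missing operator from the question cannot occur.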
