Including commas, question marks, exclamation marks in output (python)

The goal of this program is to take user input and convert it to leet-style ASCII text.
The code works as it should, except that the output doesn't include commas, periods, exclamation marks or question marks.
I tried putting !, ?, ' and commas in a separate list and checking the input against it, but I wasn't sure how to do it.
At the moment I just use a chain of else-if statements. It works, but I feel there must be a simpler way and I can't figure out what it is. Tips are much appreciated!
def asciiToLeet(c):
    l33tLetters = ["#", "8", "(", "|)", "3", "#", "6", "[-]", "|", "_|", "|<", "1", "[]\/[]", "[]\[]", "0", "|D", "(,)", "|Z", "$", "']['",
                   "|_|", "\/", "\/\/", "}{", "`/", "2"]
    if c == ' ': return ' '
    elif c == '.': return '.'
    elif c == ',': return ','
    elif c == '?': return '?'
    elif c == '!': return '!'
    elif c == "'": return "'"
    asciiCode = ord(c)
    if asciiCode >= ord('a') and asciiCode <= ord('z'):
        return l33tLetters[asciiCode - ord('a')]
    if asciiCode >= ord('A') and asciiCode <= ord('Z'):
        return l33tLetters[asciiCode - ord('A')]
    return ""

if __name__ == "__main__":
    inputString = input()
    outputString = ""
    for c in inputString:
        outputString += asciiToLeet(c)
    print(outputString)
My expectation is for the output to include the punctuation without needing the if-else statements.

You have return "" at the end of your method. Thus, if all of your lookups fail, it discards the input character. Instead, do return c. This will cause the input character to be returned as-is if the lookups to make it "leet" don't match it.
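A minimal sketch of that change, with the punctuation branches removed because the final return c now covers them. The names LEET_LETTERS, ascii_to_leet and to_leet are illustrative renames rather than the asker's identifiers, and the backslashes are doubled to avoid invalid escape sequence warnings.

```python
# Leet replacements for a..z, copied from the question (backslashes escaped).
LEET_LETTERS = ["#", "8", "(", "|)", "3", "#", "6", "[-]", "|", "_|", "|<",
                "1", "[]\\/[]", "[]\\[]", "0", "|D", "(,)", "|Z", "$", "']['",
                "|_|", "\\/", "\\/\\/", "}{", "`/", "2"]


def ascii_to_leet(c):
    """Return the leet form of an ASCII letter, or c unchanged otherwise."""
    code = ord(c)
    if ord("a") <= code <= ord("z"):
        return LEET_LETTERS[code - ord("a")]
    if ord("A") <= code <= ord("Z"):
        return LEET_LETTERS[code - ord("A")]
    # Spaces, punctuation, digits and anything else pass through as-is.
    return c


def to_leet(text):
    """Convert a whole string character by character."""
    return "".join(ascii_to_leet(c) for c in text)
```

Because unknown characters fall through to return c, no separate list of punctuation marks is needed.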

Related

More efficient way to replace special chars with their unicode name in pandas df

I have a large pandas dataframe and would like to perform a thorough text cleaning on it. For this, I have written the code below, which checks whether a character is an emoji, a number, a Roman numeral, or a currency symbol, and replaces it with its Unicode name from the unicodedata package.
The code uses a double for loop, though, and I believe there must be far more efficient solutions, but I haven't yet figured out how to implement it in a vectorized manner.
My current code is as follows:
from unicodedata import category, name as unicodename

def clean_text(text):
    newtext = ""
    for item in text:
        for char in item:
            # Simple space
            if char == ' ':
                newtext += char
            # Letters
            elif category(char)[0] == 'L':
                newtext += char
            # Other symbols: emojis
            elif category(char) == 'So':
                newtext += f" {unicodename(char)} "
            # Decimal numbers
            elif category(char) == 'Nd':
                newtext += f" {unicodename(char).replace('DIGIT ', '').lower()} "
            # Letterlike numbers e.g. Roman numerals
            elif category(char) == 'Nl':
                newtext += f" {unicodename(char)} "
            # Currency symbols
            elif category(char) == 'Sc':
                newtext += f" {unicodename(char).replace(' SIGN', '').lower()} "
            # Punctuation, invisibles (separator, control chars), maths symbols...
            else:
                newtext += " "
    return newtext
At the moment I am using this function on my dataframe with an apply:
df['Texts'] = df['Texts'].apply(lambda x: clean_text(x))
Sample data:
l = [
"thumbs ups should be replaced: 👍👍👍",
"hearts also should be replaced: ❤️️❤️️❤️️❤️️",
"also other emojis: ☺️☺️",
"numbers and digits should also go: 40/40",
"Ⅰ, Ⅱ, Ⅲ these are roman numerals, change 'em"
]
df = pd.DataFrame(l, columns=['Texts'])
A good start would be to not do as much work:
- Once you've resolved the representation for a character, cache it (lru_cache() does that for you).
- Don't call category() and name() more times than you need to.
from functools import lru_cache
from unicodedata import name as unicodename, category

@lru_cache(maxsize=None)
def map_char(char: str) -> str:
    if char == " ":  # Simple space
        return char
    cat = category(char)
    if cat[0] == "L":  # Letters
        return char
    # The "" default avoids a ValueError for unnamed characters (e.g. control chars)
    name = unicodename(char, "")
    if cat == "So":  # Other symbols: emojis
        return f" {name} "
    if cat == "Nd":  # Decimal numbers
        return f" {name.replace('DIGIT ', '').lower()} "
    if cat == "Nl":  # Letterlike numbers e.g. Roman numerals
        return f" {name} "
    if cat == "Sc":  # Currency symbols
        return f" {name.replace(' SIGN', '').lower()} "
    # Punctuation, invisibles (separator, control chars), maths symbols...
    return " "

def clean_text(text):
    for item in text:
        new_text = "".join(map_char(char) for char in item)
        # ...
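For illustration, here is one way the cached mapping might be used end to end. This is a sketch under the assumption that each DataFrame cell is a single string, so clean_text receives one string per call; the body of clean_text is my guess at the elided part, not the answer's own code. It uses only the standard library so it can be run without pandas.

```python
from functools import lru_cache
from unicodedata import category, name as unicodename


@lru_cache(maxsize=None)
def map_char(char: str) -> str:
    """Map one character to its cleaned representation (cached per character)."""
    if char == " ":
        return char
    cat = category(char)
    if cat[0] == "L":  # letters are kept unchanged
        return char
    name = unicodename(char, "")  # "" for characters without a Unicode name
    if cat in ("So", "Nl"):  # emojis and letterlike numbers: full name
        return f" {name} "
    if cat == "Nd":  # decimal digits: "DIGIT FOUR" becomes "four"
        return f" {name.replace('DIGIT ', '').lower()} "
    if cat == "Sc":  # currency symbols: "DOLLAR SIGN" becomes "dollar"
        return f" {name.replace(' SIGN', '').lower()} "
    return " "  # punctuation, control characters, maths symbols, ...


def clean_text(text: str) -> str:
    """Assumed completion: clean a single string, one character at a time."""
    return "".join(map_char(char) for char in text)


print(clean_text("numbers and digits should also go: 40/40"))
print(map_char.cache_info())  # hits grow as characters repeat
```

With pandas, the same function could be applied as df['Texts'].apply(clean_text); the lambda wrapper in the question is not needed.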

How to format an AST parse

I would like to format the following ast parse:
>>> import ast
>>> print(ast.dump(ast.parse('-a+b')))
Module(body=[Expr(value=BinOp(left=UnaryOp(op=USub(), operand=Name(id='a', ctx=Load())), op=Add(), right=Name(id='b', ctx=Load())))])
It seems like the indent option was introduced in Python 3.9, but I don't see an option to pretty-print before then. What options are there for printing nicely formatted output for the syntax tree?
If you need to pretty-print the AST in an earlier Python version and are happy with the indent argument in Python 3.9, why not take the dump function from 3.9 and include it in your project? The source code is here: https://github.com/python/cpython/blob/e56d54e447694c6ced2093d2273c3e3d60b36b6f/Lib/ast.py#L111-L175
It doesn't look very complicated and doesn't seem to use any features specific to 3.9.
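If copying the CPython function is not an option, a small recursive formatter is another possibility. The following is a minimal sketch, not CPython's implementation: dump_indented is a hypothetical helper name, and its output layout only approximates what ast.dump(..., indent=n) produces. It relies on ast.iter_fields, which is available in older Python 3 versions as well.

```python
import ast


def dump_indented(node, indent=2, level=0):
    """Render an AST node, list, or constant with one field per line."""
    pad = " " * (indent * (level + 1))    # indentation for children
    close_pad = " " * (indent * level)    # indentation for the closing bracket
    if isinstance(node, ast.AST):
        fields = list(ast.iter_fields(node))
        if not fields:
            return f"{type(node).__name__}()"
        lines = [
            f"{pad}{name}={dump_indented(value, indent, level + 1)}"
            for name, value in fields
        ]
        return f"{type(node).__name__}(\n" + ",\n".join(lines) + f"\n{close_pad})"
    if isinstance(node, list):
        if not node:
            return "[]"
        lines = [f"{pad}{dump_indented(item, indent, level + 1)}" for item in node]
        return "[\n" + ",\n".join(lines) + f"\n{close_pad}]"
    # Plain values such as identifiers and constants
    return repr(node)


print(dump_indented(ast.parse("-a+b")))
```

Empty nodes such as Load() and empty lists are kept on one line so the output stays compact.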
I had one use case in which I couldn't upgrade to Python 3.9 (where indent argument was added), yet I needed a way to prettify the result of ast.dump.
I wrote the following method that can take an unformatted ast.dump output and print that in a way that is easier on the eyes.
def prettify(ast_tree_str, indent=4):
    ret = []
    stack = []
    in_string = False
    curr_indent = 0

    for i in range(len(ast_tree_str)):
        char = ast_tree_str[i]
        if in_string and char != '\'' and char != '"':
            ret.append(char)
        elif char == '(' or char == '[':
            ret.append(char)

            if i < len(ast_tree_str) - 1:
                next_char = ast_tree_str[i+1]
                if next_char == ')' or next_char == ']':
                    curr_indent += indent
                    stack.append(char)
                    continue

            print(''.join(ret))
            ret.clear()

            curr_indent += indent
            ret.append(' ' * curr_indent)
            stack.append(char)
        elif char == ',':
            ret.append(char)

            print(''.join(ret))
            ret.clear()
            ret.append(' ' * curr_indent)
        elif char == ')' or char == ']':
            ret.append(char)
            curr_indent -= indent
            stack.pop()
        elif char == '\'' or char == '"':
            if (len(ret) > 0 and ret[-1] == '\\') or (in_string and stack[-1] != char):
                ret.append(char)
                continue

            if len(stack) > 0 and stack[-1] == char:
                ret.append(char)
                in_string = False
                stack.pop()
                continue

            in_string = True
            ret.append(char)
            stack.append(char)
        elif char == ' ':
            pass
        else:
            ret.append(char)

    print(''.join(ret))
Usage:
import ast

if __name__ == '__main__':
    content = """
@testdecorator
def my_method(a, b):
    def ola():
        print("Hello")

    ola()

    return (a + b) * 5 + "dasdas,da'sda\\'asdas\\'\\'"
"""
    ast_tree = ast.parse(source=content)
    prettify(ast.dump(ast_tree))
PS: It's not 100% equivalent to what you would get from the Python 3.9 ast.dump(...., indent=n), but it should be enough for now. Feel free to improve it.

Balanced String Recursion Returns Improperly

I'm currently working on a problem to write a recursive program to remove all the balanced bracket operators from a string or return False if the string is not balanced. I can get the program to remove all the brackets but, according to the debugger, when the program does its final base case check to verify that the string is empty, the program jumps from return True in line 3 to isBalanced recursive call in line 10. I don't understand why this is happening. Code is the following:
def isBalanced(string):
    if not string: # Base Case. If the string is empty then return True
        return True
    else:
        j = 0
        for i in string: # Iterate thru the str looking for (), {}. and [] pairs, looking for closed bracket first
            if (i == ')') or (i == ']') or (i == '}'):
                if (i == ')') and (string[j-1] == '('):
                    new_string = string[:j-1] + string[j+1:] # Remove ()
                    isBalanced(new_string)
                elif (i == ']') and (string[j-1] == '['):
                    new_string = string[:j-1] + string[j+1:] # Remove []
                    isBalanced(new_string)
                elif (i == '}') and (string[j-1] == '{'):
                    new_string = string[:j-1] + string[j+1:] # Remove {}
                    isBalanced(new_string)
                else: # Did not find an open bracket to match a closed bracket operator
                    print('Program failed at:', string)
                    return False
            else:
                j += 1 # Index counter
test_str = "({[]()})"
print(isBalanced(test_str))
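A likely explanation for the jump the debugger shows: the value of each recursive isBalanced(new_string) call is discarded, so when the innermost call reaches the base case and returns True, control resumes in the caller's for loop right after the call instead of ending the computation. The sketch below returns the result of the recursive call. The names PAIRS and is_balanced are illustrative, and it assumes the input contains only bracket characters.

```python
# Maps each closing bracket to the opening bracket it must follow.
PAIRS = {")": "(", "]": "[", "}": "{"}


def is_balanced(string):
    """Recursively remove adjacent bracket pairs; True if nothing is left."""
    if not string:  # base case: every bracket was matched and removed
        return True
    for j, ch in enumerate(string):
        if ch in PAIRS:
            # The first closing bracket must directly follow its opener.
            if j > 0 and string[j - 1] == PAIRS[ch]:
                # Propagate the result of the recursive call to the caller.
                return is_balanced(string[:j - 1] + string[j + 1:])
            return False
    # No closing bracket found, so unmatched openers remain.
    return False


print(is_balanced("({[]()})"))
```

Using enumerate also removes the need for the manual j counter, which in the original only advances on non-closing characters.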

implement my own strip method in Python

I am trying to implement my own strip method in Python, so without using the built-in method, I'd like my function to strip out all the whitespace from the left and the right.
What I am trying to do here is create a list, remove all the blank characters before the first non-space character, then do the same in reverse, and finally turn the list back into a string. But what I wrote doesn't remove even one whitespace character.
I know what I am trying to do might not even work, so I would also like to see the best way to do this. I am really new to programming, so I would take any piece of advice that makes my program better. Thanks!
# main function
inputString = input("Enter here: ")
print(my_strip(inputString))

def my_strip(inputString):
    newString = []
    for ch in inputString:
        newString.append(ch)
    print(newString)
    i = 0
    while i < len(newString):
        if i == " ":
            del newString[i]
        elif i != " ":
            return newString
        i += 1
    print(newString)
Instead of doing a bunch of string operations, let's just get the beginning and ending indices of the non-whitespace portion and return a string slice.
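def strip_2(s):
    start = 0
    end = -1
    while start < len(s) and s[start].isspace():
        start += 1
    if start == len(s):  # empty or all-whitespace input: nothing left to slice
        return ""
    while s[end].isspace():
        end -= 1
    end += 1
    return s[start:end or None]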
How about using a regular expression?
import re

def my_strip(s):
    return re.sub(r'\s+$', '', re.sub(r'^\s+', '', s))
>>> my_strip(' a c d ')
'a c d'
What you seem to be doing is an ltrim for spaces, since you return from the function when you get a non-space character.
Some changes are needed:
# main function
inputString = input("Enter here: ")
print(my_strip(inputString))

def my_strip(inputString):
    newString = []
    for ch in inputString:
        newString.append(ch)
    print(newString)
    i = 0
    while i < len(newString):
        if i == " ": # <== this should be newString[i] == " "
            del newString[i]
        elif i != " ": # <== this should be newString[i] != " "
            return newString
        i += 1 # <== this is not needed as the char is deleted, so the next char has the same index
    print(newString)
So the updated code will be:
def my_strip(inputString):
    newString = []
    for ch in inputString:
        newString.append(ch)
    print(newString)
    i = 0
    while i < len(newString):
        if newString[i] == " ":
            del newString[i]
        else:
            # first non-space character reached: the left trim is done
            return "".join(newString)
    # the input was empty or contained only spaces
    return "".join(newString)

# main function (the call must come after the function definition)
inputString = input("Enter here: ")
print(my_strip(inputString))
Good luck with the rest of the exercise (implementation of rtrim).
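For reference, one possible shape of the finished exercise is sketched below. It works with indices and slices instead of deleting list elements, and it only treats the space character as whitespace, as the question's code does. The helper names my_lstrip and my_rstrip are illustrative, not from the answers above.

```python
def my_lstrip(text):
    """Drop leading spaces by finding the first non-space index."""
    i = 0
    while i < len(text) and text[i] == " ":
        i += 1
    return text[i:]


def my_rstrip(text):
    """Drop trailing spaces by walking back from the end."""
    end = len(text)
    while end > 0 and text[end - 1] == " ":
        end -= 1
    return text[:end]


def my_strip(text):
    """Strip spaces from both ends without using str.strip."""
    return my_rstrip(my_lstrip(text))


print(repr(my_strip("   a c d  ")))
```

The bounds checks in both loops keep empty and all-space inputs from raising an IndexError.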

converting infix to prefix in python

I am trying to write an Infix to Prefix Converter where e.g. I would like to convert this:
1 + ((C + A ) * (B - F))
to something like:
add(1, multiply(add(C, A), subtract(B, F)))
but I get this instead :
multiply(add(1, add(C, A), subtract(B, F)))
This is the code I have so far
postfix = []
temp = []
newTemp = []

def textOperator(s):
    if s is '+':
        return 'add('
    elif s is '-':
        return 'subtract('
    elif s is '*':
        return 'multiply('
    else:
        return ""

def typeof(s):
    if s is '(':
        return leftparentheses
    elif s is ')':
        return rightparentheses
    elif s is '+' or s is '-' or s is '*' or s is '%' or s is '/':
        return operator
    elif s is ' ':
        return empty
    else :
        return operand

infix = "1 + ((C + A ) * (B - F))"
for i in infix :
    type = typeof(i)
    if type is operand:
        newTemp.append(i)
    elif type is operator:
        postfix.append(textOperator(i))
        postfix.append(newTemp.pop())
        postfix.append(', ')
    elif type is leftparentheses :
        newTemp.append(i)
    elif type is rightparentheses :
        next = newTemp.pop()
        while next is not '(':
            postfix.append(next)
            next = newTemp.pop()
        postfix.append(')')
        newTemp.append(''.join(postfix))
        while len(postfix) > 0 :
            postfix.pop()
    elif type is empty:
        continue
    print("newTemp = ", newTemp)
    print("postfix = ", postfix)
while len(newTemp) > 0 :
    postfix.append(newTemp.pop())
postfix.append(')')
print(''.join(postfix))
Can someone please help me figure out how to fix this?
What I see, with the parenthetical clauses, is a recursive problem crying out for a recursive solution. The following is a rethink of your program that might give you some ideas of how to restructure it, even if you don't buy into my recursion argument:
import sys
from enum import Enum

class Type(Enum):  # This could also be done with individual classes
    leftparentheses = 0
    rightparentheses = 1
    operator = 2
    empty = 3
    operand = 4

OPERATORS = {  # get your data out of your code...
    "+": "add",
    "-": "subtract",
    "*": "multiply",
    "%": "modulus",
    "/": "divide",
}

def textOperator(string):
    if string not in OPERATORS:
        sys.exit("Unknown operator: " + string)
    return OPERATORS[string]

def typeof(string):
    if string == '(':
        return Type.leftparentheses
    elif string == ')':
        return Type.rightparentheses
    elif string in OPERATORS:
        return Type.operator
    elif string == ' ':
        return Type.empty
    else:
        return Type.operand

def process(tokens):
    stack = []
    while tokens:
        token = tokens.pop()
        category = typeof(token)
        print("token = ", token, " (" + str(category) + ")")
        if category == Type.operand:
            stack.append(token)
        elif category == Type.operator:
            stack.append((textOperator(token), stack.pop(), process(tokens)))
        elif category == Type.leftparentheses:
            stack.append(process(tokens))
        elif category == Type.rightparentheses:
            return stack.pop()
        elif category == Type.empty:
            continue
        print("stack = ", stack)
    return stack.pop()

INFIX = "1 + ((C + A ) * (B - F))"
# pop/append work from right, so reverse, and require a real list
postfix = process(list(INFIX[::-1]))
print(postfix)
The result of this program is a structure like:
('add', '1', ('multiply', ('add', 'C', 'A'), ('subtract', 'B', 'F')))
Which you should be able to post-process into the string form you desire (again, recursively...)
PS: type and next are Python built-ins (not reserved words), so using them as variable names shadows the built-ins; pick other names.
PPS: replace INFIX[::-1] with sys.argv[1][::-1] and you can pass test cases into the program to see what it does with them.
PPPS: like your original, this only handles single digit numbers (or single letter variables), you'll need to provide a better tokenizer than list() to get that working right.
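The recursive post-processing step mentioned above might look like the following sketch. The function name to_call_string is hypothetical, and it assumes the nested tuples have the shape (operator_name, operand, operand) shown in the result, with plain strings as leaves.

```python
def to_call_string(node):
    """Turn ('add', '1', ('multiply', ...)) into 'add(1, multiply(...))'."""
    if isinstance(node, tuple):
        name, *args = node
        # Format each argument recursively, then wrap them in a call.
        return f"{name}({', '.join(to_call_string(arg) for arg in args)})"
    # Leaves are operands such as '1' or 'C'.
    return str(node)


tree = ('add', '1', ('multiply', ('add', 'C', 'A'), ('subtract', 'B', 'F')))
print(to_call_string(tree))
```

Keeping parsing and formatting separate means the same tuple structure could also be evaluated or printed in another notation.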
