Replace some accented letters from word in python - python

I'm trying to replace some accented letters from Portuguese words in Python.
accentedLetters = ['à', 'á', 'â', 'ã', 'é', 'ê', 'í', 'ó', 'ô', 'õ', 'ú', 'ü']
letters = ['a', 'a', 'a', 'a', 'e', 'e', 'i', 'o', 'o', 'o', 'u', 'u']
Each letter in accentedLetters should be replaced by the letter at the same position in the letters list.
My expected results are, for example:
ação => açao
frações => fraçoes
How can I do that?

A simple translation dictionary should do the trick. For each letter, if the letter is in the dictionary, use its translation. Otherwise, use the original. Join the individual characters back into a word.
def removeAccents(word):
    repl = {'à': 'a', 'á': 'a', 'â': 'a', 'ã': 'a',
            'é': 'e', 'ê': 'e',
            'í': 'i',
            'ó': 'o', 'ô': 'o', 'õ': 'o',
            'ú': 'u', 'ü': 'u'}
    new_word = ''.join([repl[c] if c in repl else c for c in word])
    return new_word
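In Python 3 the same mapping can probably be expressed more compactly with str.maketrans and str.translate. The following is a minimal sketch and not part of the original answer; the names ACCENT_TABLE and remove_accents are chosen here for illustration, and it assumes the input uses precomposed characters (the usual case for typed text).

```python
# Translation table built from two equal-length strings: the character at
# position i of the first string maps to the character at position i of the second.
ACCENT_TABLE = str.maketrans('àáâãéêíóôõúü', 'aaaaeeiooouu')

def remove_accents(word):
    """Replace the listed accented vowels; any other character (e.g. 'ç') is kept."""
    return word.translate(ACCENT_TABLE)

print(remove_accents('ação'))     # açao
print(remove_accents('frações'))  # fraçoes
```

Because only the twelve listed characters are in the table, ç is left untouched, which matches the expected results in the question.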

You can use the Unidecode library for Python 3. Note that it transliterates every non-ASCII character to ASCII, so ç would also become c.
For example:
from unidecode import unidecode
a = ['à', 'á', 'â', 'ã', 'é', 'ê', 'í', 'ó', 'ô', 'õ', 'ú', 'ü']
for k in a:
    print(unidecode(k))
Result:
a
a
a
a
e
e
i
o
o
o
u
u

I have finally solved my problem:
#! /usr/bin/python
# -*- coding: utf-8 -*-
# Python 2 only: str literals and sys.argv are byte strings, hence the .decode calls.
import sys
def removeAccents(word):
    replaceDict = {'à'.decode('utf-8'): 'a',
                   'á'.decode('utf-8'): 'a',
                   'â'.decode('utf-8'): 'a',
                   'ã'.decode('utf-8'): 'a',
                   'é'.decode('utf-8'): 'e',
                   'ê'.decode('utf-8'): 'e',
                   'í'.decode('utf-8'): 'i',
                   'ó'.decode('utf-8'): 'o',
                   'ô'.decode('utf-8'): 'o',
                   'õ'.decode('utf-8'): 'o',
                   'ú'.decode('utf-8'): 'u',
                   'ü'.decode('utf-8'): 'u'}
    finalWord = ''
    for letter in word:
        if letter in replaceDict:
            finalWord += replaceDict[letter]
        else:
            finalWord += letter
    return finalWord
word = (sys.argv[1]).decode('utf-8')
print removeAccents(word)
This just works as I expected.

Another simple option using regex:
import re
def remove_accents(string):
    if type(string) is not unicode:
        string = unicode(string, encoding='utf-8')
    string = re.sub(u"[àáâãäå]", 'a', string)
    string = re.sub(u"[èéêë]", 'e', string)
    string = re.sub(u"[ìíîï]", 'i', string)
    string = re.sub(u"[òóôõö]", 'o', string)
    string = re.sub(u"[ùúûü]", 'u', string)
    string = re.sub(u"[ýÿ]", 'y', string)
    return string
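A different technique from the answers above, shown here only as a sketch for Python 3: decompose the text with unicodedata.normalize, drop selected combining marks, and recompose. The names MARKS_TO_STRIP and strip_vowel_accents are hypothetical. Leaving the cedilla out of the set is what keeps ç intact; note that the listed marks are removed from any base letter, so ñ, for instance, would become n.

```python
import unicodedata

# Combining marks to drop: grave, acute, circumflex, tilde, diaeresis.
# The combining cedilla (U+0327) is deliberately not listed, so 'ç' survives.
MARKS_TO_STRIP = {'\u0300', '\u0301', '\u0302', '\u0303', '\u0308'}

def strip_vowel_accents(text):
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize('NFD', text)
    kept = ''.join(ch for ch in decomposed if ch not in MARKS_TO_STRIP)
    # NFC recombines what is left, e.g. 'c' + cedilla back into 'ç'.
    return unicodedata.normalize('NFC', kept)

print(strip_vowel_accents('ação'))     # açao
print(strip_vowel_accents('frações'))  # fraçoes
```

Compared with a fixed lookup table, this covers accented capitals and vowels that were not listed explicitly, at the cost of being less selective.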

Related

Python ignore punctuation and white space

string = "Python, program!"
result = []
for x in string:
    if x not in result:
        result.append(x)
print(result)
This program makes sure that a character that occurs more than once in the string appears only once in the list. In this case, the string "Python, program!" will appear as
['P', 'y', 't', 'h', 'o', 'n', ',', ' ', 'p', 'r', 'g', 'a', 'm', '!']
My question is, how do I make it so the program ignores punctuation such as ". , ; ? ! -", and also white spaces? So the final output would look like this instead:
['P', 'y', 't', 'h', 'o', 'n', 'p', 'r', 'g', 'a', 'm']
Just check if the string (letter) is alphanumeric using str.isalnum as an additional condition before appending the character to the list:
string = "Python, program!"
result = []
for x in string:
    if x.isalnum() and x not in result:
        result.append(x)
print(result)
Output:
['P', 'y', 't', 'h', 'o', 'n', 'p', 'r', 'g', 'a', 'm']
If you don't want numbers in your output, try str.isalpha() instead (returns True if the character is alphabetic).
You can filter them out using the string module. This built-in library contains several constants that refer to collections of characters in order, like letters and whitespace.
import string
start = "Python, program!" #Can't name it string since that's the module's name
result = []
for x in start:
    if x not in result and (x in string.ascii_letters):
        result.append(x)
print(result)
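The filter-then-deduplicate loop above can also be written without an explicit membership check on the result list. This is a sketch only, assuming Python 3.7 or later (where dict preserves insertion order); unique_letters is a name made up for this example.

```python
def unique_letters(text):
    # dict keys keep insertion order (guaranteed since Python 3.7), so
    # duplicates are dropped while the first-seen order is preserved.
    return list(dict.fromkeys(ch for ch in text if ch.isalpha()))

print(unique_letters("Python, program!"))
# ['P', 'y', 't', 'h', 'o', 'n', 'p', 'r', 'g', 'a', 'm']
```

For long inputs this avoids the repeated linear scan of `x not in result`, since dictionary lookups take roughly constant time.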

TypeError: can only concatenate str (not "int") to str (I don't think that should happen)

I decided it would be a cool idea to make a translator to a custom language, so I tried making one. However, I am fairly new to python, and I cannot figure out why it is expecting a string instead of an integer. What I am trying to do is make it so if you enter in a word such as 'bin', it will go to the next consonant/vowel for each, so 'bin' ends up as 'cop' as the next consonant after 'b' is 'c', the next vowel after 'i' is 'o' and the next consonant after 'n' is 'p'.
consonants = ['b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q', 'r', 's', 't', 'v', 'w', 'x', 'y', 'z']
vowels = ['a', 'e', 'i', 'o', 'u']
translated_word = ''
word_to_translate = input('Enter in the word to translate! ')
for letter in range(len(word_to_translate)):
    new_letter = word_to_translate[letter - 1]
    if new_letter in consonants:
        l = (consonants[:new_letter + 1])
        translated_word = translated_word + str(l)
    elif new_letter in vowels:
        l = (vowels[:new_letter + 1])
        translated_word = translated_word + str(l)
print(translated_word)
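The error comes from new_letter + 1: new_letter is a string (one character of the input), so Python cannot add the integer 1 to it. Look up the letter's position with list.index and take the element at the next position instead:
consonants = ['b', 'c', 'd', 'f', 'g', 'h', 'j', 'k', 'l', 'm', 'n', 'p', 'q', 'r', 's', 't', 'v', 'w', 'x', 'y', 'z']
vowels = ['a', 'e', 'i', 'o', 'u']
translated_word = ''
word_to_translate = input('Enter in the word to translate! ')
for i in word_to_translate:
    if i in consonants:
        ind = consonants.index(i)
        # note: ind+1 is out of range for the last consonant, 'z'
        translated_word += consonants[ind+1]
    elif i in vowels:
        ind = vowels.index(i)
        # note: ind+1 is out of range for the last vowel, 'u'
        translated_word += vowels[ind+1]
print(translated_word)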
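The question does not say what should follow the last consonant or vowel. As a sketch under the assumption that the sequence wraps around ('z' to 'b', 'u' to 'a'), the lookup can be precomputed in a dictionary. The names build_shift_map, SHIFT and translate are hypothetical, and passing other characters through unchanged is also an assumption.

```python
CONSONANTS = 'bcdfghjklmnpqrstvwxyz'
VOWELS = 'aeiou'

def build_shift_map(letters):
    # Pair each letter with its successor; the modulo wraps the last letter
    # back to the first (assumed behaviour, not stated in the question).
    return {ch: letters[(i + 1) % len(letters)] for i, ch in enumerate(letters)}

SHIFT = {**build_shift_map(CONSONANTS), **build_shift_map(VOWELS)}

def translate(word):
    # Characters without an entry (digits, spaces, uppercase) pass through unchanged.
    return ''.join(SHIFT.get(ch, ch) for ch in word)

print(translate('bin'))  # cop
```

Building the map once also avoids calling list.index for every character of the input.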

Check if a string is in a list of letters - Python3

I have this list which contains letters, and I need to check if a pre-determined word located in another list is horizontally inside this list of letters.
i.e.:
mat_input = [['v', 'e', 'd', 'j', 'n', 'a', 'e', 'o'], ['i', 'p', 'y', 't', 'h', 'o', 'n', 'u'], ['s', 'u', 'e', 'w', 'e', 't', 'a', 'e']]
words_to_search = ['python', 'fox']
I don't need to tell if a word was not found, but if it was I need to tell which one.
My problem is that so far I've tried to compare letter by letter, in a loop similar to this:
for i in range(n): # n = number of words
    for j in range(len(word_to_search[i])): # size of the word I'm searching
        for k in range(h): # h = height of crossword
            for m in range(l): # l = length of crossword
But it's not working, inside the last loop I tried several if/else conditions to tell if the whole word was found. How can I solve this?
You can use str.join:
mat_input = [['v', 'e', 'd', 'j', 'n', 'a', 'e', 'o'], ['i', 'p', 'y', 't', 'h', 'o', 'n', 'u'], ['s', 'u', 'e', 'w', 'e', 't', 'a', 'e']]
words_to_search = ['python', 'fox']
joined_input = list(map(''.join, mat_input))
results = {i:any(i in b or i in b[::-1] for b in joined_input) for i in words_to_search}
Output:
{'python': True, 'fox': False}
I'd start by joining each sublist in mat_input into one string:
mat_input_joined = [''.join(x) for x in mat_input]
Then loop over your words to search and simply use the in operator to see if the word is contained in each string:
for word_to_search in words_to_search:
    result = [word_to_search in x for x in mat_input_joined]
    print('Word:', word_to_search, 'found in indices:', [i for i, x in enumerate(result) if x])
Result:
Word: python found in indices: [1]
Word: fox found in indices: []
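If the starting column is needed as well as the row, str.find can report it. The following is a small sketch building on the joined-rows idea; find_words is a hypothetical helper, it only looks left to right, and it reports just the first occurrence of a word in each row.

```python
def find_words(grid, words):
    """Return (word, row, column) for each left-to-right horizontal match."""
    rows = [''.join(row) for row in grid]
    hits = []
    for word in words:
        for row_index, row in enumerate(rows):
            col = row.find(word)  # -1 when the word is absent from this row
            if col != -1:
                hits.append((word, row_index, col))
    return hits

mat_input = [['v', 'e', 'd', 'j', 'n', 'a', 'e', 'o'],
             ['i', 'p', 'y', 't', 'h', 'o', 'n', 'u'],
             ['s', 'u', 'e', 'w', 'e', 't', 'a', 'e']]
print(find_words(mat_input, ['python', 'fox']))  # [('python', 1, 1)]
```

Words that are not found simply produce no entry, which matches the requirement to report only the words that were found.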

Python: using indices and str

I am attempting to learn Python and am working on an assignment for fun that involves translating "encrypted" messages (it's just the alphabet in reverse). My function is supposed to be able to read in an encoded string and then print out its decoded string equivalent. However, as I am new to Python, I find myself continually running into a type error with trying to use the indices of my lists to give the values. If anyone has any pointers on a better approach or if there is something that I just plain missed, that would be awesome.
def answer(s):
    '''
    All lowercase letters [a-z] have been swapped with their corresponding values
    (e.g. a=z, b=y, c=x, etc.) Uppercase and punctuation characters are unchanged.
    Write a program that can take in encrypted input and give the decrypted output
    correctly.
    '''
    word = ""
    capsETC = 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',\
              'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',\
              ' ', '?', '\'', '\"', '#', '!', '#', '$', '%', '&', '*', '(', \
              ') ', '-', '_', '+', '=', '<', '>', '/', '\\'
    alphF = 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',\
            'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'
    alphB = 'z', 'y', 'x', 'w', 'v', 'u', 't', 's', 'r', 'q', 'p', 'o', 'n', 'm',\
            'l', 'k', 'j', 'i', 'h', 'g', 'f', 'e', 'd', 'c', 'b', 'a'
    for i in s:
        if i in capsETC: # if letter is uppercase or punctuation
            word = word + i # do nothing
        elif i in alphB: # else, do check
            for x in alphB: # for each index in alphB
                if i == alphB[x]: # if i and index are equal (same letter)
                    if alphB[x] == alphF[x]: # if indices are equal
                        newLetter = alphF[x] # new letter equals alpf at index x
                        str(newLetter) # convert to str?
                        word = word + newLetter # add to word
    print(word)
s = "Yvzs!"
answer(s)
Your code is almost fine; it needs just a few changes (I left your old lines as comments):
def answer(s):
    word = ""
    capsETC = 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',\
              'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',\
              ' ', '?', '\'', '\"', '#', '!', '#', '$', '%', '&', '*', '(', \
              ') ', '-', '_', '+', '=', '<', '>', '/', '\\'
    alphF = 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',\
            'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'
    alphB = 'z', 'y', 'x', 'w', 'v', 'u', 't', 's', 'r', 'q', 'p', 'o', 'n', 'm',\
            'l', 'k', 'j', 'i', 'h', 'g', 'f', 'e', 'd', 'c', 'b', 'a'
    for i in s:
        if i in capsETC: # if letter is uppercase or punctuation
            word = word + i # do nothing
        elif i in alphB: # else, do check
            for x in range(len(alphB)): # for each index in alphB
                if i == alphB[x]: # if i and index are equal (same letter)
                    # if alphB[x] == alphF[x]: # if indices are equal
                    newLetter = alphF[x] # new letter equals alpf at index x
                    # str(newLetter) # convert to str?
                    word = word + newLetter # add to word
    return word
s = "Yvzs!"
print(s)
print(answer(s))
Output:
Yvzs!
Yeah!
Of course, you can make it a lot simpler and more Pythonic, but I wanted to change your code as little as possible.
Your current issue is that you are trying to use letters as indices. To fix your current approach, you could use enumerate while looping through each of your strings.
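The enumerate suggestion could look roughly like the sketch below. It is not the original poster's code: the alphabets come from string.ascii_lowercase rather than hand-written tuples, and any character that is not a lowercase letter is passed through unchanged instead of being checked against a fixed list.

```python
import string

def answer(s):
    forward = string.ascii_lowercase   # abcdefghijklmnopqrstuvwxyz
    backward = forward[::-1]           # zyxwvutsrqponmlkjihgfedcba
    decoded = []
    for ch in s:
        # enumerate yields (index, letter) pairs, so idx is an int that can
        # safely be used to index into the other alphabet.
        for idx, enc in enumerate(backward):
            if ch == enc:
                decoded.append(forward[idx])
                break
        else:
            # No lowercase match: keep uppercase letters and punctuation as-is.
            decoded.append(ch)
    return ''.join(decoded)

print(answer("Yvzs!"))  # Yeah!
```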
If you want a much simpler approach, you can make use of str.maketrans and str.translate. These two built-in string methods solve this problem easily:
import string
unenc = string.ascii_lowercase # abcdefghijklmnopqrstuvwxyz
decd = unenc[::-1] # zyxwvutsrqponmlkjihgfedcba
secrets = str.maketrans(unenc, decd)
s = "Yvzs!"
print(s.translate(secrets))
Output:
Yeah!
If you want a looping approach, you can use try and except along with str.index() to achieve a much simpler loop:
import string
unenc = string.ascii_lowercase # abcdefghijklmnopqrstuvwxyz
decd = unenc[::-1] # zyxwvutsrqponmlkjihgfedcba
s = "Yvzs!"
word = ''
for i in s:
    try:
        idx = unenc.index(i)
    except ValueError: # raised when i is not a lowercase letter
        idx = -1
    word += decd[idx] if idx != -1 else i
print(word)
Output:
Yeah!

Python newspace persists

I'm trying to make a name generator:
from random import randint
relation = {
    'A' : ['B', 'C', 'D', 'F', 'R', 'Y'],
    'B' : ['E', 'O', 'I'],
    'C' : ['A', 'E', 'H', 'R', 'O', 'I'],
    'D' : ['A', 'E', 'H', 'R', 'O', 'I'],
    'E' : ['R', 'T', 'P', 'S', 'F', 'L', 'X'],
    'F' : ['E', 'U', 'I', 'O', 'A'],
    'G' : ['R', 'O', 'A'],
    'H' : ['E', 'I', 'O', 'A'],
    'I' : ['N', 'X', 'S', 'E', 'T', 'P', 'L', 'M'],
    'J' : ['A', 'I', 'O', 'Y'],
    'K' : ['I', 'E', 'A'],
    'L' : ['I', 'E'],
    'M' : ['O', 'Y', 'I'],
    'N' : ['E', 'I', 'O', 'A'],
    'O' : ['V', 'T', 'N'],
    'P' : ['I', 'A', 'E', 'O'],
    'Q' : ['U', 'E', 'I'],
    'R' : ['E', 'I', 'A'],
    'S' : ['T', 'I', 'O', 'A', 'H'],
    'T' : ['H', 'E', 'I'],
    'U' : ['B', 'G', 'L'],
    'V' : ['E', 'U', 'I', 'A'],
    'X' : ['I', 'O'],
    'Y' : ['E', 'L'],
    'Z' : ['O', 'I']
}
char = (raw_input("Enter an English alphabet: ")).upper()
letters = int(raw_input("How many letters: "))
for i in range(0, letters):
    if i==0:
        print char,
    else:
        print char.lower(),
    char = (relation[char])[randint(0, len(relation[char])-1)]
print ''
raw_input("Press [ENTER] to exit...")
But the problem is that there is a whitespace when it prints the name.
For example:
Enter an English alphabet: T
How many letters: 5
T i p a y
Press [ENTER] to exit...
How to remove the whitespace?
P.S: I'm a beginner :)
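It's the commas that cause this. In a Python 2 print statement, items separated by commas are printed with a space between them, and a trailing comma replaces the newline with a space before the next output:
print "a", "b"
Prints a b
print "a",
Prints a followed by a space (and no newline)
print "a"
Prints a followed by a newline (and no space)
You can, however, change your code to build the name in a variable: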
name = ''
for i in range(0, letters):
    if i==0:
        name += char
    else:
        name += char.lower()
    char = (relation[char])[randint(0, len(relation[char])-1)]
print name
print ''
Or shorter and more efficient:
letter_list = []
for i in range(0, letters):
    letter_list.append(char)
    char = (relation[char])[randint(0, len(relation[char])-1)]
name = ''.join(letter_list)
print name.lower().capitalize()
print ''
I am not sure why you print as you go along, but you could keep a word variable, append each char you come up with to it, and print it once at the end. The output will then have no spaces between the letters:
word = ''
for i in range(0, letters):
    word += char
    char = (relation[char])[randint(0, len(relation[char])-1)]
print word.lower().capitalize()
print ''
Based on PM 2 Ring's suggestion, you could also do it this way:
charList = []
for i in range(0, letters):
    charList.append(char)
    char = (relation[char])[randint(0, len(relation[char])-1)]
print ''.join(charList).lower().capitalize()
print ''
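The answers above target Python 2. For Python 3, a comparable sketch might use random.choice and str.join as follows. The RELATION table here is a shortened, hypothetical stand-in for the one in the question, and generate_name and its rng parameter are names invented for this example.

```python
import random

# Hypothetical, shortened follow-up table; the full table from the question
# can be dropped in unchanged. Every listed letter must itself be a key.
RELATION = {
    'T': ['H', 'E', 'I'],
    'H': ['E', 'I', 'O', 'A'],
    'E': ['R', 'T'],
    'I': ['T', 'E'],
    'O': ['T'],
    'A': ['R'],
    'R': ['E', 'I', 'A'],
}

def generate_name(first, length, rng=random):
    letters = []
    char = first.upper()
    for _ in range(length):
        letters.append(char)
        # random.choice replaces the manual randint indexing.
        char = rng.choice(RELATION[char])
    # Joining once at the end avoids any separator between letters.
    return ''.join(letters).capitalize()

print(generate_name('T', 5))
```

Returning the finished string instead of printing inside the loop sidesteps the separator problem entirely; in Python 3, print(char, end='') would be the direct equivalent of printing as you go.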
