A translator that replaces vowels with a string - python

For those who don't know, replacing vowels with 'ooba' has become a popular trend on https://reddit.com/r/prequelmemes . I would like to automate this by writing a Python 2.7 program that replaces vowels with 'ooba'. I have no idea where to get started.

You could use a simple regular expression:
import re
my_string = 'Hello!'
my_other_string = re.sub(r'[aeiou]', 'ooba', my_string)
print(my_other_string) # Hooballooba!

The following method is fine if the text is short; otherwise I would prefer a regex. It assumes your text is in s.
s = ''.join(['ooba' if i in ['a', 'e', 'i', 'o', 'u'] else i for i in s])
Regex approach:
import re
s = re.sub(r'a|e|i|o|u', "ooba", s)

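For a quick and simple answer, you could feed the string meme into this loop:
result = ''
for c in meme:
    if c in 'aeiou':
        result += 'ooba'  # swap the vowel for 'ooba'
    else:
        result += c       # keep every other character
It goes over each character in the string and checks whether it is a vowel. If it is, 'ooba' is appended to result in its place; otherwise the character is copied unchanged. Strings are immutable in Python, so the output is built as a new string rather than by assigning into slices of meme.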
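The snippets above only touch lowercase vowels. Here is a minimal sketch of a case-insensitive variant, assuming uppercase vowels should be replaced as well; the function name oobify is made up for illustration:

```python
import re

def oobify(text, replacement='ooba'):
    """Replace every vowel, upper or lower case, with `replacement`."""
    # re.IGNORECASE makes [aeiou] match 'A', 'E', 'I', 'O', 'U' too.
    return re.sub(r'[aeiou]', replacement, text, flags=re.IGNORECASE)

print(oobify('Hello!'))  # Hooballooba!
```

The flags keyword of re.sub is available from Python 2.7 onward, so this should run on the version mentioned in the question.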

Related

Drop Duplicate Substrings from String with NO Spaces

Given a Pandas DF column that looks like this:
...how can I turn it into this:
XOM
ZM
AAPL
SOFI
NKLA
TIGR
Although these strings appear to be 4 characters in length maximum, I can't rely on that, I want to be able to have a string like ABCDEFGHIJABCDEFGHIJ and still be able to turn it into ABCDEFGHIJ in one column calculation. Preferably WITHOUT for looping/iterating through the rows.
You can use a regex pattern like r'\b(\w+)\1\b' with str.extract, as below:
import pandas as pd
df = pd.DataFrame({'Symbol': ['ZOMZOM', 'ZMZM', 'SOFISOFI',
                              'ABCDEFGHIJABCDEFGHIJ', 'NOTDUPLICATED']})
print(df['Symbol'].str.extract(r'\b(\w+)\1\b'))
Output:
            0
0         ZOM
1          ZM
2        SOFI
3  ABCDEFGHIJ
4         NaN  # <- from `NOTDUPLICATED`
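Explanation:
\b is a word boundary
(\w+) captures one or more word characters as group 1
\1 is a backreference to whatever group 1 captured, so the pattern only matches when that text appears twice in a row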
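The same pattern can be tried outside pandas with the standard re module. This is only a rough sketch, not part of the answer above: the helper name collapse_doubled is hypothetical, and it returns the input unchanged (rather than NaN) when nothing is doubled.

```python
import re

# Same pattern as above: a run of word characters immediately repeated once.
DOUBLED = re.compile(r'\b(\w+)\1\b')

def collapse_doubled(symbol):
    """Return the repeated half of `symbol`, or `symbol` itself if not doubled."""
    match = DOUBLED.search(symbol)
    return match.group(1) if match else symbol

for sym in ['ZOMZOM', 'ZMZM', 'ABCDEFGHIJABCDEFGHIJ', 'NOTDUPLICATED']:
    print(sym, '->', collapse_doubled(sym))
```

Within pandas, str.replace with regex=True and r'\1' as the replacement should give the same keep-unchanged behaviour in one column operation.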
An alternative approach, which does involve iteration but also uses regular expressions: evaluate the longest possible substrings first, getting progressively shorter. Use the substring to compile a regex that looks for the substring repeated two or more times. If it finds that, replace it with a single occurrence of the substring.
It does not handle leading or trailing characters that are not part of the repetition.
When it performs a removal, it returns, breaking the loop. Going with the longest substrings first ensures that things like 'AAPLAAPL' leave the double A intact.
import re
def remove_repeated(str):
    for i in range(len(str)):
        substr = str[i:]
        pattern = re.compile(f"({substr}){{2,}}")
        if pattern.search(str):
            return pattern.sub(substr, str)
    return str
>>> remove_repeated('abcdabcd')
'abcd'
>>> remove_repeated('abcdabcdabcd')
'abcd'
>>> remove_repeated('aabcdaabcdaabcd')
'aabcd'
If we want to make this more flexible, we can write a helper function that yields all of the substrings in a string, starting with the longest. It returns a generator expression, so we don't have to generate more substrings than we need.
def substrings(str):
    return (str[i:i+l] for l in range(len(str), 0, -1)
                       for i in range(len(str) - l + 1))
>>> list(substrings("hello"))
['hello', 'hell', 'ello', 'hel', 'ell', 'llo', 'he', 'el', 'll', 'lo', 'h', 'e', 'l', 'l', 'o']
But there's no way 'hello' is going to be repeated in 'hello', so we can make this at least somewhat more efficient by looking at only substrings at most half the length of the input string.
def substrings(str):
    return (str[i:i+l] for l in range(len(str)//2, 0, -1)
                       for i in range(len(str) - l + 1))
>>> list(substrings("hello"))
['he', 'el', 'll', 'lo', 'h', 'e', 'l', 'l', 'o']
Now, a little tweak to the original function:
def remove_repeated(str):
    for s in substrings(str):
        pattern = re.compile(f"({s}){{2,}}")
        if pattern.search(str):
            return pattern.sub(s, str)
    return str
And now:
>>> remove_repeated('AAPLAAPL')
'AAPL'
>>> remove_repeated('fooAAPLAAPLbar')
'fooAAPLbar'
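One caveat: the substring is interpolated into the pattern as-is, so input containing regex metacharacters (for example '+' or '.') can be misread or raise an error. A hedged sketch of a variant that escapes the substring with re.escape follows; the name remove_repeated_safe is invented here, and the helper is repeated so the block runs on its own:

```python
import re

def substrings(text):
    # Longest candidates first, at most half the length of the input.
    return (text[i:i + n] for n in range(len(text) // 2, 0, -1)
                          for i in range(len(text) - n + 1))

def remove_repeated_safe(text):
    for s in substrings(text):
        # re.escape makes characters such as '+' match literally.
        pattern = re.compile(f"(?:{re.escape(s)}){{2,}}")
        if pattern.search(text):
            # A function replacement avoids backslash processing in `s`.
            return pattern.sub(lambda m: s, text)
    return text

print(remove_repeated_safe('C++C++'))  # C++
```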

python - how to use the join method and sort method

My goal is to take a string as input and return a list of its lowercase letters, without repeats, without punctuation, in alphabetical order. For example, the input "happy!" would give ['a','h','p','y']. I tried to use the join function to get rid of the punctuation, but somehow it doesn't work. Does anybody know why? Also, can .sort() sort letters alphabetically? Am I using it the right way? Thanks!
def split(a):
    a.lower()
    return [char for char in a]
def f(a):
    i=split(a)
    s=set(i)
    l=list(s)
    v=l.join(u for u in l if u not in ("?", ".", ";", ":", "!"))
    v.sort()
    return v
.join() is a string method, but here it is being called on a list, so the code raises an exception. join isn't really needed here anyway.
You're on the right track with set(). It only stores unique items, so create a set of your input and compute the intersection (&) with the lowercase letters. Sort the result:
>>> import string
>>> s = 'Happy!'
>>> sorted(set(s.lower()) & set(string.ascii_lowercase))
['a', 'h', 'p', 'y']
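You could use the following, as long as the punctuation only appears at the start or end of the string (strip does not touch characters in the middle):
def f(a):
    return sorted(set(a.lower().strip('?.;:!')))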
>>> f('Happy!')
['a', 'h', 'p', 'y']
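Because strip() only removes characters from the two ends of a string, punctuation in the middle (an apostrophe, say) would survive in the function above. A small sketch that filters every character instead, assuming any character for which str.isalpha() is true counts as a letter (this includes non-ASCII letters); the name unique_letters is hypothetical:

```python
def unique_letters(text):
    """Sorted list of the distinct letters in `text`, lowercased."""
    # The set comprehension drops repeats; isalpha() drops punctuation,
    # digits and whitespace wherever they occur.
    return sorted({ch for ch in text.lower() if ch.isalpha()})

print(unique_letters("Don't stop!"))  # ['d', 'n', 'o', 'p', 's', 't']
```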
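You could also use a regex for this. Note that this pattern drops uppercase letters too, because the string is not lowercased first:
import re
pattern = re.compile(r'[^a-z]')
string = 'Hello# W0rld!!##'
print(sorted(set(pattern.sub('', string))))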
Output:
['d', 'e', 'l', 'o', 'r']

How to convert a string to a list if the string has wild characters for a group of characters like [] or {}, ()

I have a string of this sort
s = 'a,s,[c,f],[f,t]'
I want to convert this to a list
S = ['a','s',['c','f'],['f','t']]
I tried using strip()
d = s.strip('][').split(',')
But it is not giving me the desired output:
output = ['a', 's', '[c', 'f]', '[f', 't']
You could use ast.literal_eval(), having first enclosed each element in quotes:
>>> import ast, re
>>> qs = re.sub(r'(\w+)', r'"\1"', s) # add quotes
>>> ast.literal_eval('[' + qs + ']') # enclose in brackets & safely eval
['a', 's', ['c', 'f'], ['f', 't']]
You may need to tweak the regex if your elements can contain non-word characters.
This only works if your input string follows Python expression syntax or is sufficiently close to be mechanically converted to Python syntax (as we did above by adding quotes and brackets). If this assumption does not hold, you might need to look into using a parsing library. (You could also hand-code a recursive descent parser, but that'll probably be more work to do correctly than just using a parsing library.)
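To give a feel for the hand-coded route, here is a minimal recursive-descent sketch. It is an illustration under narrow assumptions, not a robust parser: it only understands commas and square brackets, keeps whitespace as part of the elements, and does little validation. The name parse_nested is made up.

```python
def parse_nested(text):
    """Parse 'a,s,[c,f]' into ['a', 's', ['c', 'f']]."""
    pos = 0

    def parse_items(closing):
        nonlocal pos
        items, token = [], ''
        while pos < len(text):
            ch = text[pos]
            pos += 1
            if ch == '[':
                # Recurse: the nested call consumes up to the matching ']'.
                items.append(parse_items(']'))
            elif ch == ',' or ch == closing:
                if token:
                    items.append(token)
                    token = ''
                if ch == closing:
                    return items
            else:
                token += ch
        if closing is not None:
            raise ValueError("missing closing ']'")
        if token:
            items.append(token)
        return items

    return parse_items(None)

print(parse_nested('a,s,[c,f],[f,t]'))  # ['a', 's', ['c', 'f'], ['f', 't']]
```

Supporting {} or () would mean passing the matching closer for each opener, which is where a parsing library starts to pay off.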
As an alternative to ast.literal_eval, you can use the json package, with more or less the same restrictions as NPE's answer:
import re
import json
qs = re.sub(r'(\w+)', r'"\1"', s) # add quotes
ls = json.loads('[' + qs + ']')
print(ls)
# ['a', 's', ['c', 'f'], ['f', 't']]

Anti_vowel function does not remove a vowel [duplicate]

This question already has answers here:
How to remove items from a list while iterating?
(25 answers)
Closed 7 years ago.
def anti_vowel(text):
    upper_vowels = ['A', 'E', 'I', 'O', 'U']
    lower_vowels = ['a', 'e', 'i', 'o', 'u']
    char_list = list(text)
    for char in char_list:
        if char in upper_vowels or char in lower_vowels:
            char_list.remove(char)
    new_word = ''
    for char in char_list:
        new_word += char
    return new_word
If I pass anti_vowel("Hey look Words!"), I get the result 'Hy lk Words!'. The 'o' in 'Words!' is not removed.
As others have noted, you're modifying the list while you iterate over it. That's not going to work: the list changes size while you iterate, causing you to skip over some characters. You could use a list copy to avoid this, but maybe consider just using a better approach.
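The skipping is easy to reproduce on a tiny input. This is just an illustrative sketch (the variable names are arbitrary): after a removal, the remaining items shift left, so the loop's internal index steps over the element that moved into the freed slot.

```python
chars = list("look")
for ch in chars:            # iterating over the list being modified
    if ch == "o":
        chars.remove(ch)    # the second 'o' shifts left and is skipped
skipped = "".join(chars)

chars = list("look")
for ch in chars[:]:         # iterating over a copy instead
    if ch == "o":
        chars.remove(ch)
safe = "".join(chars)

print(skipped, safe)  # lok lk
```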
You can build the string up character-by-character, instead:
def anti_vowel(text):
    vowels = 'AEIOUaeiou'
    result = []
    for c in text:
        if c not in vowels:
            result.append(c)
    return ''.join(result)
This is faster, safer, shorter and easier to read than the original approach.
Alternately, you can use Python's powerful built-in string functions to do this very simply:
def anti_vowel(text):
    return text.translate(dict.fromkeys(map(ord, 'aeiouAEIOU')))
This pushes the text through a translation table that deletes all the vowels. Easy!
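In Python 3 the deletion table can also be built with str.maketrans, whose third argument lists characters to delete. A short sketch, assuming Python 3 (the constant name VOWEL_TABLE is arbitrary):

```python
# Third argument of str.maketrans: characters mapped to None, i.e. deleted.
VOWEL_TABLE = str.maketrans('', '', 'aeiouAEIOU')

def anti_vowel(text):
    return text.translate(VOWEL_TABLE)

print(anti_vowel("Hey look Words!"))  # Hy lk Wrds!
```

Building the table once at module level avoids recreating it on every call.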
A few things are going on here.
Never modify a list while iterating through it. You could iterate over a simple copy made with list[:] instead, if you'd like to continue using your technique.
Use a list comprehension to simplify your code further.
Simplifying the code a bit:
def anti_vowel(text):
    upper_vowels = ['A', 'E', 'I', 'O', 'U']
    lower_vowels = [s.lower() for s in upper_vowels]
    char_list = list(text)
    for char in char_list[:]:
        if char in upper_vowels or char in lower_vowels:
            char_list.remove(char)
    return ''.join(char_list)
If you want to simplify everything under a single lambda then use the following:
anti_vowel = lambda text: ''.join([s for s in text if s.upper() not in ['A', 'E', 'I', 'O', 'U']])
You can call it like a regular function, anti_vowel(). The idea behind it is as follows:
Iterate through every character, taking its upper case
Use a list comprehension to build a new list from those characters, keeping only the ones that are not vowels ([x for x in list])
Finally, join the characters back together (''.join([...]))

Python, splitting strings on middle characters with overlapping matches using regex

In Python, I am using regular expressions to retrieve strings from a dictionary which show a specific pattern, such as some repetitions of characters, then a specific character, and then another repetitive part (e.g. ^(\w{0,2})o(\w{0,2})$).
This works as expected, but now I'd like to split the string into two substrings (one of which might be empty) using the central character as the delimiter. The issue I am having stems from the possibility of multiple overlapping matches inside a string (e.g. I'd want to use the previous regex to split the string room in two different ways, (r, om) and (ro, m)).
Neither re.search().groups() nor re.findall() solved this issue, and the docs on the re module seem to indicate that overlapping matches are not returned by these methods.
Here is a snippet showing the undesired behaviour:
import re
dictionary = ('room', 'door', 'window', 'desk', 'for')
regex = re.compile(r'^(\w{0,2})o(\w{0,2})$')
halves = []
for word in dictionary:
    matches = regex.findall(word)
    if matches:
        halves.append(matches)
I am posting this as an answer mainly so the question is not left unanswered in case someone stumbles on it in the future. I've managed to reach the desired behaviour, albeit probably not in a very pythonic way, so this might be useful as a starting point for someone else. Notes on how to improve this answer (i.e. making it more "pythonic" or simply more efficient) would be very welcome.
The only way I found of getting all the possible splits of words whose length is in a certain range and which have a character in a certain range of positions, using the characters in the "legal" positions as delimiters, with either the re module or the new regex module, involves multiple regexes. This snippet builds an appropriate regex at runtime, given the length range of the word, the character to look for and the range of possible positions of that character.
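import re
dictionary = ('room', 'roam', 'flow', 'door', 'window',
              'desk', 'for', 'fo', 'foo', 'of', 'sorrow')
char = 'o'
word_len = (3, 6)  # allowed word length (min, max)
char_pos = (2, 3)  # allowed 1-based positions of char (first, last)
# Two lookaheads: the word length is in range, and char sits at a legal position.
regex_str = (r'(?=^\w{' + str(word_len[0]) + ',' + str(word_len[1]) + r'}$)(?=\w{'
             + str(char_pos[0]-1) + ',' + str(char_pos[1]-1) + '}' + char + ')')
halves = []
for word in dictionary:
    matches = re.match(regex_str, word)
    if matches:
        matched_halves = []
        # range works on both Python 2 and 3 (the original used xrange)
        for pos in range(char_pos[0]-1, char_pos[1]):
            # Split on char only when exactly pos characters precede it.
            split_regex_str = r'(?<=^\w{' + str(pos) + '})' + char
            split_word = re.split(split_regex_str, word)
            if len(split_word) == 2:
                matched_halves.append(split_word)
        halves.append(matched_halves)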
The output is:
[[['r', 'om'], ['ro', 'm']], [['r', 'am']], [['fl', 'w']], [['d', 'or'], ['do', 'r']], [['f', 'r']], [['f', 'o'], ['fo', '']], [['s', 'rrow']]]
At this point I might start considering using a regex just to find the words to be split, and then doing the splitting in a 'dumb way', just checking whether the characters in the range of positions are equal to char. Anyhow, any remark is greatly appreciated.
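For comparison, the 'dumb way' might look roughly like the sketch below. It skips the regexes entirely and so does not check that the words consist of \w characters, which is an assumption about the input; the function name split_on_char is invented for illustration.

```python
def split_on_char(words, char, word_len, char_pos):
    """Split each word at every legal position where `char` occurs.

    word_len is (min, max) word length; char_pos is the (first, last)
    1-based position at which `char` may act as the delimiter.
    """
    halves = []
    for word in words:
        if not word_len[0] <= len(word) <= word_len[1]:
            continue
        splits = [[word[:i], word[i + 1:]]
                  for i in range(char_pos[0] - 1, char_pos[1])
                  if i < len(word) and word[i] == char]
        if splits:
            halves.append(splits)
    return halves

dictionary = ('room', 'roam', 'flow', 'door', 'window',
              'desk', 'for', 'fo', 'foo', 'of', 'sorrow')
print(split_on_char(dictionary, 'o', (3, 6), (2, 3)))
```

On this word list it should reproduce the output shown above.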
EDIT: Fixed.
Does a simple while loop work?
What you want is re.search and then loop with a 1 shift:
https://docs.python.org/2/library/re.html
>>> import re
>>> dictionary = ('room', 'door', 'window', 'desk', 'for')
>>> regex = re.compile(r'(\w{0,2})o(\w{0,2})')
>>> halves = []
>>> for word in dictionary:
...     start = 0
...     while start < len(word):
...         match = regex.search(word, start)
...         if match:
...             # restart the search one character after this match began
...             start = match.start() + 1
...             halves.append([match.group(1), match.group(2)])
...         else:
...             # no matches left
...             break
...
>>> print(halves)
[['ro', 'm'], ['o', 'm'], ['', 'm'], ['do', 'r'], ['o', 'r'], ['', 'r'], ['nd', 'w'], ['d', 'w'], ['', 'w'], ['f', 'r'], ['', 'r']]
