I'm new to Python and I am trying to replace all uppercase-letters within a word to underscores, for example:
ThisIsAGoodExample
should become
this_is_a_good_example
Any ideas/tips/links/tutorials on how to achieve this?
Here's a regex way:
import re
example = "ThisIsAGoodExample"
print re.sub( '(?<!^)(?=[A-Z])', '_', example ).lower()
This is saying, "Find points in the string that aren't preceeded by a start of line, and are followed by an uppercase character, and substitute an underscore. Then we lower()case the whole thing.
import re
"_".join(l.lower() for l in re.findall('[A-Z][^A-Z]*', 'ThisIsAGoodExample'))
EDIT:
Actually, this only works, if the first letter is uppercase. Otherwise this (taken from here) does the right thing:
def convert(name):
s1 = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', name)
return re.sub('([a-z0-9])([A-Z])', r'\1_\2', s1).lower()
This generates a list of items, where each item is "_" followed by the lowercased letter if the character was originally an uppercase letter, or the character itself if it wasn't. Then it joins them together into a string and removes any leading underscores that might have been added by the process:
print ''.join('_' + char.lower() if char.isupper() else char
for char in inputstring).lstrip('_')
BTW, you haven't specified what to do with underscores that are already present in the string. I wasn't sure how to handle that case so I punted.
As no-one else has offered a solution using a generator, here's one:
>>> sample = "ThisIsAGoodExample"
>>> def upperSplit(data):
... buff = ''
... for item in data:
... if item.isupper():
... if buff:
... yield buff
... buff = ''
... buff += item
... yield buff
...
>>> list(upperSplit(sample))
['This', 'Is', 'A', 'Good', 'Example']
>>> "_".join(upperSplit(sample)).lower()
'this_is_a_good_example'
example = 'ThisIsAGoodExample'
# Don't put an underscore before first character.
new_example = example[0].lower()
for character in example[1:]:
# Append an underscore if the character is uppercase.
if character.isupper():
new_example += '_'
new_example += character.lower()
Parse your string, each time you encounter an upper case letter, insert an _ before it and then switch the found character to lower case
An attempt at a readable version:
import re
_uppercase_part = re.compile('[A-Z][^A-Z]*')
def uppercase_to_underscore(name):
result = ''
for match in _uppercase_part.finditer(name):
if match.span()[0] > 0:
result += '_'
result += match.group().lower()
return result
Related
I have a python challenge that if given a string with '_' or '-' in between each word such as the_big_red_apple or the-big-red-apple to convert it to camel case. Also if the first word is uppercase keep it as uppercase. This is my code. Im not allowed to use the re library in the challenge however but I didn't know how else to do it.
from re import sub
def to_camel_case(text):
if text[0].isupper():
text = sub(r"(_|-)+"," ", text).title().replace(" ", "")
else:
text = sub(r"(_|-)+"," ", text).title().replace(" ", "")
text = text[0].lower() + text[1:]
return print(text)
Word delimiters can be - dash or _ underscore.
Let's simplify, making them all underscores:
text = text.replace('-', '_')
Now we can break out words:
words = text.split('_')
With that in hand it's simple to put them back together:
text = ''.join(map(str.capitalize, words))
or more verbosely, with a generator expression,
assign ''.join(word.capitalize() for word in words).
I leave "finesse the 1st character"
as an exercise to the reader.
If you RTFM you'll find it contains a wealth of knowledge.
https://docs.python.org/3/library/re.html#raw-string-notation
'+'
Causes the resulting RE to match 1 or more repetitions of the preceding RE. ab+ will match ‘a’ followed by any non-zero number of ‘b’s
The effect of + is turn both
db_rows_read and
db__rows_read
into DbRowsRead.
Also,
Raw string notation (r"text") keeps regular expressions sane.
The regex in your question doesn't exactly
need a raw string, as it has no crazy
punctuation like \ backwhacks.
But it's a very good habit to always put
a regex in an r-string, Just In Case.
You never know when code maintenance
will tack on additional elements,
and who wants a subtle regex bug on their hands?
You can try it like this :
def to_camel_case(text):
s = text.replace("-", " ").replace("_", " ")
s = s.split()
if len(text) == 0:
return text
return s[0] + ''.join(i.capitalize() for i in s[1:])
print(to_camel_case('momo_es-es'))
the output of print(to_camel_case('momo_es-es')) is momoEsEs
r"..." refers to Raw String in Python which simply means treating backlash \ as literal instead of escape character.
And (_|-)[+] is a Regular Expression that match the string containing one or more - or _ characters.
(_|-) means matching the string that contains - or _.
+ means matching the above character (- or _) than occur one or more times in the string.
In case you cannot use re library for this solution:
def to_camel_case(text):
# Since delimiters can have 2 possible answers, let's simplify it to one.
# In this case, I replace all `_` characters with `-`, to make sure we have only one delimiter.
text = text.replace("_", "-") # the_big-red_apple => the-big-red-apple
# Next, we should split our text into words in order for us to iterate through and modify it later.
words = text.split("-") # the-big-red-apple => ["the", "big", "red", "apple"]
# Now, for each word (except the first word) we have to turn its first character to uppercase.
for i in range(1, len(words)):
# `i`start from 1, which means the first word IS NOT INCLUDED in this loop.
word = words[i]
# word[1:] means the rest of the characters except the first one
# (e.g. w = "apple" => w[1:] = "pple")
words[i] = word[0].upper() + word[1:].lower()
# you can also use Python built-in method for this:
# words[i] = word.capitalize()
# After this loop, ["the", "big", "red", "apple"] => ["the", "Big", "Red", "Apple"]
# Finally, we put the words back together and return it
# ["the", "Big", "Red", "Apple"] => theBigRedApple
return "".join(words)
print(to_camel_case("the_big-red_apple"))
Try this:
First, replace all the delimiters into a single one, i.e. str.replace('_', '-')
Split the string on the str.split('-') standardized delimiter
Capitalize each string in list, i.e. str.capitilize()
Join the capitalize string with str.join
>>> s = "the_big_red_apple"
>>> s.replace('_', '-').split('-')
['the', 'big', 'red', 'apple']
>>> ''.join(map(str.capitalize, s.replace('_', '-').split('-')))
'TheBigRedApple'
>> ''.join(word.capitalize() for word in s.replace('_', '-').split('-'))
'TheBigRedApple'
If you need to lowercase the first char, then:
>>> camel_mile = lambda x: x[0].lower() + x[1:]
>>> s = 'TheBigRedApple'
>>> camel_mile(s)
'theBigRedApple'
Alternative,
First replace all delimiters to space str.replace('_', ' ')
Titlecase the string str.title()
Remove space from string, i.e. str.replace(' ', '')
>>> s = "the_big_red_apple"
>>> s.replace('_', ' ').title().replace(' ', '')
'TheBigRedApple'
Another alternative,
Iterate through the characters and then keep a pointer/note on previous character, i.e. for prev, curr in zip(s, s[1:])
check if the previous character is one of your delimiter, if so, uppercase the current character, i.e. curr.upper() if prev in ['-', '_'] else curr
skip whitepace characters, i.e. if curr != " "
Then add the first character in lowercase, [s[0].lower()]
>>> chars = [s[0].lower()] + [curr.upper() if prev in ['-', '_'] else curr for prev, curr in zip(s, s[1:]) if curr != " "]
>>> "".join(chars)
'theBigRedApple'
Yet another alternative,
Replace/Normalize all delimiters into a single one, s.replace('-', '_')
Convert it into a list of chars, list(s.replace('-', '_'))
While there is still '_' in the list of chars, keep
find the position of the next '_'
replacing the character after '_' with its uppercase
replacing the '_' with ''
>>> s = 'the_big_red_apple'
>>> s_list = list(s.replace('-', '_'))
>>> while '_' in s_list:
... where_underscore = s_list.index('_')
... s_list[where_underscore+1] = s_list[where_underscore+1].upper()
... s_list[where_underscore] = ""
...
>>> "".join(s_list)
'theBigRedApple'
or
>>> s = 'the_big_red_apple'
>>> s_list = list(s.replace('-', '_'))
>>> while '_' in s_list:
... where_underscore = s_list.index('_')
... s_list[where_underscore:where_underscore+2] = ["", s_list[where_underscore+1].upper()]
...
>>> "".join(s_list)
'theBigRedApple'
Note: Why do we need to convert the string to list of chars? Cos strings are immutable, 'str' object does not support item assignment
BTW, the regex solution can make use of some group catching, e.g.
>>> import re
>>> s = "the_big_red_apple"
>>> upper_regex_group = lambda x: x.group(1).upper()
>>> re.sub("[_|-](\w)", upper_regex_group, s)
'theBigRedApple'
>>> re.sub("[_|-](\w)", lambda x: x.group(1).upper(), s)
'theBigRedApple'
I need to take the initial letter of every word, moving it to the end of the word and adding 'arg'. For such I tried the following way
def pirate(str):
list_str = str.split(' ')
print(list_str)
new_str = ''
for lstr in list_str:
first_element = lstr[0]
second_element = lstr[1:]
new_str += second_element + first_element + 'arg' + ' '
return new_str
print(pirate('Hello! how are, you!!'))
The expected output is: elloHarg! owharg reaarg, ouyarg!!
However, I am getting following output: ello!Harg owharg re,aarg ou!!yarg
How can I make it work the following usecase?
Punctuations should remain at the end of the word even after translation. Assume Punctuations wont appear after than end of the word. Punctuations to be considered are .,:;?! There could be multiple punctuations present (e.g yes!!)
Here is a short and efficient solution using a regex:
import re
re.sub(r'(\w)(\w+)', r'\2\1arg', 'Hello! how are, you!!')
This is literally: replace each single letter followed by more letters by the more letters first, then the single letter and 'arg'
Output:
'elloHarg! owharg reaarg, ouyarg!!'
As a function:
def pirate(s):
return re.sub(r'(\w)(\w+)', r'\2\1arg', s)
I made a function that replaces multiple instances of a single character with multiple patterns depending on the character location.
There were two ways I found to accomplish this:
This one looks horrible but it works:
def xSubstitution(target_string):
while target_string.casefold().find('x') != -1:
x_finded = target_string.casefold().find('x')
if (x_finded == 0 and target_string[1] == ' ') or (target_string[x_finded-1] == ' ' and
((target_string[-1] == 'x' or 'X') or target_string[x_finded+1] == ' ')):
target_string = target_string.replace(target_string[x_finded], 'ecks', 1)
elif (target_string[x_finded+1] != ' '):
target_string = target_string.replace(target_string[x_finded], 'z', 1)
else:
target_string = target_string.replace(target_string[x_finded], 'cks', 1)
return(target_string)
This one technically works, but I just can't get the regex patterns right:
import re
def multipleRegexSubstitutions(sentence):
patterns = {(r'^[xX]\s'): 'ecks ', (r'[^\w]\s?[xX](?!\w)'): 'ecks',
(r'[\w][xX]'): 'cks', (r'[\w][xX][\w]'): 'cks',
(r'^[xX][\w]'): 'z',(r'\s[xX][\w]'): 'z'}
regexes = [
re.compile(p)
for p in patterns
]
for regex in regexes:
for match in re.finditer(regex, sentence):
match_location = sentence.casefold().find('x', match.start(), match.end())
sentence = sentence.replace(sentence[match_location], patterns.get(regex.pattern), 1)
return sentence
From what I figured it out, the only problem in the second function is the regex patterns. Could someone help me?
EDIT: Sorry I forgot to tell that the regexes are looking for the different x characters in a string, and replace an X in the beggining of a word for a 'Z', in the middle or end of a word for 'cks' and if it is a lone 'x' char replace with 'ecks'
You need \b (word boundary) and \B (position other than word boundary):
Replace an X in the beggining of a word for a 'Z'
re.sub(r'\bX\B', 'Z', s, flags=re.I)
In the middle or end of a word for 'cks'
re.sub(r'\BX', 'cks', s, flags=re.I)
If it is a lone 'x' char replace with 'ecks'
re.sub(r'\bX\b', 'ecks', s, flags=re.I)
I would just use the following set of substitutions for this:
string = re.sub(r"\b[Xx]\b", "ecks", string)
string = re.sub(r"\b[Xx](?!\s)", "Z", string)
string = re.sub(r"(?<=\w)[Xx](?=\w)", "cks", string)
Here,
(?!\s)
just asserts that the regex does not match any whitespace character,
\b
Edit: The last regex would also match x or X at the beginning of a word. So we can use the following, instead,
(?<=\w)[xX](?=\w)
to make sure there must be a character \w before/after x or X.
How to check special symbols such as !?,(). in the words ending? For example Hello??? or Hello,, or Hello! returns True but H!??llo or Hel,lo returns False.
I know how to check the only last symbol of string but how to check if two or more last characters are symbols?
You may have to use regex for this.
import re
def checkword(word):
m = re.match("\w+[!?,().]+$", word)
if m is not None:
return True
return False
That regex is:
\w+ # one or more word characters (a-zA-z)
[!?,().]+ # one or more of the characters inside the brackets
# (this is called a character class)
$ # assert end of string
Using re.match forces the match to begin at the beginning of the string, or else we'd have to use ^ before the regular expression.
You can try something like this:
word = "Hello!"
def checkSym(word):
return word[-1] in "!?,()."
print(checkSym(word))
The result is:
True
Try giving different strings as input and check the results.
In case you want to find every symbol from the end of the string, you can use:
def symbolCount(word):
i = len(word)-1
c = 0
while word[i] in "!?,().":
c = c + 1
i = i - 1
return c
Testing it with word = "Hello!?.":
print(symbolCount(word))
The result is:
3
If you want to get a count of the 'special' characters at the end of a given string.
special = '!?,().'
s = 'Hello???'
count = 0
for c in s[::-1]:
if c in special:
count += 1
else:
break
print("Found {} special characters at the end of the string.".format(count))
You can use re.findall:
import re
s = "Hello???"
if re.findall('\W+$', s):
pass
You could try this.
string="gffrwr."
print(string[-1] in "!?,().")
I have a function isvowel that returns either True or False, depending on whether a character ch is a vowel.
def isvowel(ch):
if "aeiou".count(ch) >= 1:
return True
else:
return False
I want to know how to use that to get the index of the first occurrence of any vowel in a string. I want to be able to take the characters before the first vowel and add them end of the string. Of course, I can't do s.find(isvowel) because isvowel gives a boolean response. I need a way to look at each character, find the first vowel, and give the index of that vowel.
How should I go about doing this?
You can always try something like this:
import re
def first_vowel(s):
i = re.search("[aeiou]", s, re.IGNORECASE)
return -1 if i == None else i.start()
s = "hello world"
print first_vowel(s)
Or, if you don't want to use regular expressions:
def first_vowel(s):
for i in range(len(s)):
if isvowel(s[i].lower()):
return i
return -1
s = "hello world"
print first_vowel(s)
[isvowel(ch) for ch in string].index(True)
(ch for ch in string if isvowel(ch)).next()
or for just the index (as asked):
(index for ch, index in itertools.izip(string, itertools.count()) if isvowel(ch)).next()
This will create an iterator and only return the first vowel element. Warning: a string with no vowels will throw StopIteration, recommend handling that.
Here is my take:
>>> vowel_str = "aeiou"
>>> def isVowel(ch,string):
... if ch in vowel_str and ch in string:
... print string.index(ch)
... else:
... print "notfound"
...
>>> isVowel("a","hello")
not found
>>> isVowel("e","hello")
1
>>> isVowel("l","hello")
not found
>>> isVowel("o","hello")
4
Using next for a generator is quite efficient, it means you don't iterate though the entire string (once you have found a string).
first_vowel(word):
"index of first vowel in word, if no vowels in word return None"
return next( (i for i, ch in enumerate(word) if is_vowel(ch), None)
is_vowel(ch):
return ch in 'aeiou'
my_string = 'Bla bla'
vowels = 'aeyuioa'
def find(my_string):
for i in range(len(my_string)):
if my_string[i].lower() in vowels:
return i
break
print(find(my_string))