Split on separators not giving exact output - python

def split_on_separators(word, separators):
"""
Return a list of non-empty, non-blank strings from the original string
determined by splitting the string on any of the separators.
separators is a string of single-character separators.
>>> split_on_separators("Wow! Fantastic, you're done.", "!,")
['Wow', ' Fantastic', " you're done."]
"""
word_list = []
for ch in word.split():
stripped = ch.strip(separators)
word_list.append(stripped)
return word_list
#output
['Wow', 'Fantastic', "you're", 'done.']
The separators are removed but I can't seem to get the white space in front of the F. Secondly 'you're done.' is not in a single string
Any Help would be greatly appreciated :)
I'm using python 3

One solution could be:
def split_on_separators(word, separators):
word_list = [word]
auxList = []
for sep in separators:
for w in word_list:
auxList.extend(w.split(sep))
word_list = auxList
auxList = list()
return word_list
Out[76]: ['Wow', ' Fantastic', " you're done."]

This should do the job:
def split_on_separators(word, separators):
for sep in separators:
word = word.replace(sep, '^#^')
return [x.strip() for x in word.split('^#^')]
The ^#^ is just a placeholder. I made it a weird character combination to make sure it doesn't appear in a normal sentence. You can replace it, if you want.

Another crazy solution:
import itertools
def split_on_separators(word, separators):
groups = itertools.groupby(word, lambda char: char in separators)
return [''.join(letters) for is_sep, letters in groups if not is_sep]
For the case where two separators can be adjacent (and you want to represent it as an empty word):
import itertools
def split_on_separators(word, separators):
groups = itertools.groupby(word, lambda char: char in separators)
seps2words = lambda letters: [''] * (len(tuple(letters)) - 1)
return [word for is_sep, letters in groups
for word in ([''.join(letters)] if not is_sep else seps2words(letters))]

You're splitting prior to stripping it. Splitting is done based on whitespace between words; therefore, you won't see it in your final output. That is also the reason "you're done" is not a single string. It splits it based on white space between the words.
You might try a delimiter for the split:
str.split([sep[, maxsplit]]) Return a list of the words in the string,
using sep as the delimiter string.

Related

Python challenge to convert a string to camelCase

I have a python challenge that if given a string with '_' or '-' in between each word such as the_big_red_apple or the-big-red-apple to convert it to camel case. Also if the first word is uppercase keep it as uppercase. This is my code. Im not allowed to use the re library in the challenge however but I didn't know how else to do it.
from re import sub
def to_camel_case(text):
if text[0].isupper():
text = sub(r"(_|-)+"," ", text).title().replace(" ", "")
else:
text = sub(r"(_|-)+"," ", text).title().replace(" ", "")
text = text[0].lower() + text[1:]
return print(text)
Word delimiters can be - dash or _ underscore.
Let's simplify, making them all underscores:
text = text.replace('-', '_')
Now we can break out words:
words = text.split('_')
With that in hand it's simple to put them back together:
text = ''.join(map(str.capitalize, words))
or more verbosely, with a generator expression,
assign ''.join(word.capitalize() for word in words).
I leave "finesse the 1st character"
as an exercise to the reader.
If you RTFM you'll find it contains a wealth of knowledge.
https://docs.python.org/3/library/re.html#raw-string-notation
'+'
Causes the resulting RE to match 1 or more repetitions of the preceding RE. ab+ will match ‘a’ followed by any non-zero number of ‘b’s
The effect of + is turn both
db_rows_read and
db__rows_read
into DbRowsRead.
Also,
Raw string notation (r"text") keeps regular expressions sane.
The regex in your question doesn't exactly
need a raw string, as it has no crazy
punctuation like \ backwhacks.
But it's a very good habit to always put
a regex in an r-string, Just In Case.
You never know when code maintenance
will tack on additional elements,
and who wants a subtle regex bug on their hands?
You can try it like this :
def to_camel_case(text):
s = text.replace("-", " ").replace("_", " ")
s = s.split()
if len(text) == 0:
return text
return s[0] + ''.join(i.capitalize() for i in s[1:])
print(to_camel_case('momo_es-es'))
the output of print(to_camel_case('momo_es-es')) is momoEsEs
r"..." refers to Raw String in Python which simply means treating backlash \ as literal instead of escape character.
And (_|-)[+] is a Regular Expression that match the string containing one or more - or _ characters.
(_|-) means matching the string that contains - or _.
+ means matching the above character (- or _) than occur one or more times in the string.
In case you cannot use re library for this solution:
def to_camel_case(text):
# Since delimiters can have 2 possible answers, let's simplify it to one.
# In this case, I replace all `_` characters with `-`, to make sure we have only one delimiter.
text = text.replace("_", "-") # the_big-red_apple => the-big-red-apple
# Next, we should split our text into words in order for us to iterate through and modify it later.
words = text.split("-") # the-big-red-apple => ["the", "big", "red", "apple"]
# Now, for each word (except the first word) we have to turn its first character to uppercase.
for i in range(1, len(words)):
# `i`start from 1, which means the first word IS NOT INCLUDED in this loop.
word = words[i]
# word[1:] means the rest of the characters except the first one
# (e.g. w = "apple" => w[1:] = "pple")
words[i] = word[0].upper() + word[1:].lower()
# you can also use Python built-in method for this:
# words[i] = word.capitalize()
# After this loop, ["the", "big", "red", "apple"] => ["the", "Big", "Red", "Apple"]
# Finally, we put the words back together and return it
# ["the", "Big", "Red", "Apple"] => theBigRedApple
return "".join(words)
print(to_camel_case("the_big-red_apple"))
Try this:
First, replace all the delimiters into a single one, i.e. str.replace('_', '-')
Split the string on the str.split('-') standardized delimiter
Capitalize each string in list, i.e. str.capitilize()
Join the capitalize string with str.join
>>> s = "the_big_red_apple"
>>> s.replace('_', '-').split('-')
['the', 'big', 'red', 'apple']
>>> ''.join(map(str.capitalize, s.replace('_', '-').split('-')))
'TheBigRedApple'
>> ''.join(word.capitalize() for word in s.replace('_', '-').split('-'))
'TheBigRedApple'
If you need to lowercase the first char, then:
>>> camel_mile = lambda x: x[0].lower() + x[1:]
>>> s = 'TheBigRedApple'
>>> camel_mile(s)
'theBigRedApple'
Alternative,
First replace all delimiters to space str.replace('_', ' ')
Titlecase the string str.title()
Remove space from string, i.e. str.replace(' ', '')
>>> s = "the_big_red_apple"
>>> s.replace('_', ' ').title().replace(' ', '')
'TheBigRedApple'
Another alternative,
Iterate through the characters and then keep a pointer/note on previous character, i.e. for prev, curr in zip(s, s[1:])
check if the previous character is one of your delimiter, if so, uppercase the current character, i.e. curr.upper() if prev in ['-', '_'] else curr
skip whitepace characters, i.e. if curr != " "
Then add the first character in lowercase, [s[0].lower()]
>>> chars = [s[0].lower()] + [curr.upper() if prev in ['-', '_'] else curr for prev, curr in zip(s, s[1:]) if curr != " "]
>>> "".join(chars)
'theBigRedApple'
Yet another alternative,
Replace/Normalize all delimiters into a single one, s.replace('-', '_')
Convert it into a list of chars, list(s.replace('-', '_'))
While there is still '_' in the list of chars, keep
find the position of the next '_'
replacing the character after '_' with its uppercase
replacing the '_' with ''
>>> s = 'the_big_red_apple'
>>> s_list = list(s.replace('-', '_'))
>>> while '_' in s_list:
... where_underscore = s_list.index('_')
... s_list[where_underscore+1] = s_list[where_underscore+1].upper()
... s_list[where_underscore] = ""
...
>>> "".join(s_list)
'theBigRedApple'
or
>>> s = 'the_big_red_apple'
>>> s_list = list(s.replace('-', '_'))
>>> while '_' in s_list:
... where_underscore = s_list.index('_')
... s_list[where_underscore:where_underscore+2] = ["", s_list[where_underscore+1].upper()]
...
>>> "".join(s_list)
'theBigRedApple'
Note: Why do we need to convert the string to list of chars? Cos strings are immutable, 'str' object does not support item assignment
BTW, the regex solution can make use of some group catching, e.g.
>>> import re
>>> s = "the_big_red_apple"
>>> upper_regex_group = lambda x: x.group(1).upper()
>>> re.sub("[_|-](\w)", upper_regex_group, s)
'theBigRedApple'
>>> re.sub("[_|-](\w)", lambda x: x.group(1).upper(), s)
'theBigRedApple'

move initial letter to the end of the word and add arg while punctuation to be in the end of word

I need to take the initial letter of every word, moving it to the end of the word and adding 'arg'. For such I tried the following way
def pirate(str):
list_str = str.split(' ')
print(list_str)
new_str = ''
for lstr in list_str:
first_element = lstr[0]
second_element = lstr[1:]
new_str += second_element + first_element + 'arg' + ' '
return new_str
print(pirate('Hello! how are, you!!'))
The expected output is: elloHarg! owharg reaarg, ouyarg!!
However, I am getting following output: ello!Harg owharg re,aarg ou!!yarg
How can I make it work the following usecase?
Punctuations should remain at the end of the word even after translation. Assume Punctuations wont appear after than end of the word. Punctuations to be considered are .,:;?! There could be multiple punctuations present (e.g yes!!)
Here is a short and efficient solution using a regex:
import re
re.sub(r'(\w)(\w+)', r'\2\1arg', 'Hello! how are, you!!')
This is literally: replace each single letter followed by more letters by the more letters first, then the single letter and 'arg'
Output:
'elloHarg! owharg reaarg, ouyarg!!'
As a function:
def pirate(s):
return re.sub(r'(\w)(\w+)', r'\2\1arg', s)

How to solve the string indices must be integers problem in a for loop for capitalizing every word in a string

I hope everyone is safe.
I am trying to go over a string and capitalize every first letter of the string.
I know I can use .title() but
a) I want to figure out how to use capitalize or something else in this case - basics, and
b) The strings in the tests, have some words with (') which makes .title() confused and capitalize the letter after the (').
def to_jaden_case(string):
appended_string = ''
word = len(string.split())
for word in string:
new_word = string[word].capitalize()
appended_string +=str(new_word)
return appended_string
The problem is the interpreter gives me "TypeError: string indices must be integers" even tho I have an integer input in 'word'. Any help?
thanks!
You are doing some strange things in the code.
First, you split the string just to count the number of words, but don't store it to manipulate the words after that.
Second, when iterating a string with a for in, what you get are the characters of the string, not the words.
I have made a small snippet to help you do what you desire:
def first_letter_of_word_upper(string, exclusions=["a", "the"]):
words = string.split()
for i, w in enumerate(words):
if w not in exclusions:
words[i] = w[0].upper() + w[1:]
return " ".join(words)
test = first_letter_of_word_upper("miguel angelo santos bicudo")
test2 = first_letter_of_word_upper("doing a bunch of things", ["a", "of"])
print(test)
print(test2)
Notes:
I assigned the value of the string splitting to a variable to use it in the loop
As a bonus, I included a list to allow you exclude words that you don't want to capitalize.
I use the original same array of split words to build the result... and then join based on that array. This a way to do it efficiently.
Also, I show some useful Python tricks... first is enumerate(iterable) that returns tuples (i, j) where i is the positional index, and j is the value at that position. Second, I use w[1:] to get a substring of the current word that starts at character index 1 and goes all the way to the end of the string. Ah, and also the usage of optional parameters in the list of arguments of the function... really useful things to learn! If you didn't know them already. =)
You have a logical error in your code:
You have used word = len(string.split()) which is of no use ,Also there is an issue in the for loop logic.
Try this below :
def to_jaden_case(string):
appended_string = ''
word_list = string.split()
for i in range(len(word_list)):
new_word = word_list[i].capitalize()
appended_string += str(new_word) + " "
return appended_string
from re import findall
def capitalize_words(string):
words = findall(r'\w+[\']*\w+', string)
for word in words:
string = string.replace(word, word.capitalize())
return string
This just grabs all the words in the string, then replaces the words in the original string, the characters inside the [ ] will be included in the word aswell
You are using string index to access another string word is a string you are accessing word using string[word] this causing the error.
def to_jaden_case(string):
appended_string = ''
for word in string.split():
new_word = word.capitalize()
appended_string += new_word
return appended_string
Simple solution using map()
def to_jaden_case(string):
return ' '.join(map(str.capitalize, string.split()))
In for word in string: word will iterate over the characters in string. What you want to do is something like this:
def to_jaden_case(string):
appended_string = ''
splitted_string = string.split()
for word in splitted_string:
new_word = word.capitalize()
appended_string += new_word
return appended_string
The output for to_jaden_case("abc def ghi") is now "AbcDefGhi", this is CammelCase. I suppose you actually want this: "Abc Def Ghi". To achieve that, you must do:
def to_jaden_case(string):
appended_string = ''
splitted_string = string.split()
for word in splitted_string:
new_word = word.capitalize()
appended_string += new_word + " "
return appended_string[:-1] # removes the last space.
Look, in your code word is a character of string, it is not index, therefore you can't use string[word], you can correct this problem by modifying your loop or using word instead of string[word]
So your rectified code will be:
def to_jaden_case(string):
appended_string = ''
for word in range(len(string)):
new_word = string[word].capitalize()
appended_string +=str(new_word)
return appended_string
Here I Changed The Third Line for word in string with for word in len(string), the counterpart give you index of each character and you can use them!
Also I removed the split line, because it's unnecessary and you can do it on for loop like len(string)

String Index Out of Range Issue - Python

I am trying to make a lossy text compression program that removes all vowels from the input, except for if the vowel is the first letter of a word. I keep getting this "string index out of range" error on line 6. Please help!
text = str(input('Message: '))
text = (' ' + text)
for i in range(0, len(text)):
i = i + 1
if str(text[i-1]) != ' ': #LINE 6
text = text.replace('a', '')
text = text.replace('e', '')
text = text.replace('i', '')
text = text.replace('o', '')
text = text.replace('u', '')
print(text)
As busybear notes, the loop isn't necessary: your replacements don't depend on i.
Here's how I'd do it:
def strip_vowels(s): # Remove all vowels from a string
for v in 'aeiou':
s = s.replace(v, '')
return s
def compress_word(s):
if not s: return '' # Needed to avoid an out-of-range error on the empty string
return s[0] + strip_vowels(s[1:]) # Strip vowels from all but the first letter
def compress_text(s): # Apply to each word
words = text.split(' ')
new_words = compress_word(w) for w in words
return ' '.join(new_words)
When you replace letters with a blank, your word gets shorter. So what was originally len(text) is going to be out of bounds if you remove any letters. Do note however, replace is replacing all occurrences within your string, so a loop isn't even necessary.
An alternative to use the loop is to just keep track of the index of letters to replace while going through the loop, then replace after the loop is complete.
Shortening your string length by replacing any char with "" means that if you remove a character, len(text) used in your iterator is longer than the actual string length. There are plenty of alternative solutions. for example,
text_list = list(text)
for i in range(1, len(text_list)):
if text_list[i] in "aeiou":
text_list[i] = ""
text = "".join(text_list)
By turning your string into a list of its composite characters, you can remove characters but maintain the list length (since empty elements are allowed) then rejoin them.
Be sure to account for special cases, such as len(text)<2.

Removing words containing digits from a given string

I'm trying to write a simple program that removes all words containing digits from a received string.
Here is my current implementation:
import re
def checkio(text):
text = text.replace(",", " ").replace(".", " ") .replace("!", " ").replace("?", " ").lower()
counter = 0
words = text.split()
print words
for each in words:
if bool(re.search(r'\d', each)):
words.remove(each)
print words
checkio("1a4 4ad, d89dfsfaj.")
However, when I execute this program, I get the following output:
['1a4', '4ad', 'd89dfsfaj']
['4ad']
I can't figure out why '4ad' is printed in the second line as it contains digits and should have been removed from the list. Any ideas?
Assuming that your regular expression does what you want, you can do this to avoid removing while iterating.
import re
def checkio(text):
text = re.sub('[,\.\?\!]', ' ', text).lower()
words = [w for w in text.split() if not re.search(r'\d', w)]
print words ## prints [] in this case
Also, note that I simplified your text = text.replace(...) line.
Additionally, if you do not need to reuse your text variable, you can use regex to split it directly.
import re
def checkio(text):
words = [w for w in re.split('[,.?!]', text.lower()) if w and not re.search(r'\d', w)]
print words ## prints [] in this case
If you are testing for alpha numeric strings why not use isalnum() instead of regex ?
In [1695]: x = ['1a4', '4ad', 'd89dfsfaj']
In [1696]: [word for word in x if not word.isalnum()]
Out[1696]: []
This would be possible through using re.sub, re.search and list_comprehension.
>>> import re
>>> def checkio(s):
print([i for i in re.sub(r'[.,!?]', '', s.lower()).split() if not re.search(r'\d', i)])
>>> checkio("1a4 4ad, d89dfsfaj.")
[]
>>> checkio("1a4 ?ad, d89dfsfaj.")
['ad']
So apparently what happens is a concurrent access error. Namely - you are deleting an element while traversing the array.
At the first iteration we have words = ['1a4', '4ad', 'd89dfsfaj']. Since '1a4' has a number, we remove it.
Now, words = ['4ad','d89dfsfaj']. However, at the second iteration, the current word is now 'd89dfsfaj' and we remove it. What happens is that we skip '4ad', because it is now at index 0 and the current pointer for the for cycle is at 1.

Categories

Resources