I have the following list of sentences:
list_of_sentense = ['Hi how are you?', 'I am good', 'Great!', 'I am doing good,', 'Good.']
I want to convert it into:
['Hi how are you?', 'I am good.', 'Great!', 'I am doing good.', 'Good.']
So I need to insert a period only if a sentence doesn't end with '?', '!' or '.'. Also if a sentence ends with a comma I need to change it into a period.
My code is here:
list_of_sentense_fixed = []
for i in range(len(list_of_sentense)):
    b = list_of_sentense[i]
    b = b + '.' if (not b.endswith('.')) or (not b.endswith('!')) or (not b.endswith('?')) else b
    list_of_sentense_fixed.append(b)
But it doesn't work properly.
Just define a function to fix one sentence, then use a list comprehension to construct a new list from the old:
def fix_sentence(sentence):
    if sentence == "":                      # Don't change empty strings.
        return sentence
    if sentence[-1] in ["?", ".", "!"]:     # Don't change if already okay.
        return sentence
    if sentence[-1] == ",":                 # Change trailing ',' to '.'.
        return sentence[:-1] + "."
    return sentence + "."                   # Otherwise, add '.'.
orig_sentences = ['Hi how are you?', 'I am good', 'Great!', 'I am doing good,', 'Good.']
fixed_sentences = [fix_sentence(item) for item in orig_sentences]
print(fixed_sentences)
This outputs, as requested:
['Hi how are you?', 'I am good.', 'Great!', 'I am doing good.', 'Good.']
With a separate function, you can just improve fix_sentence() if/when new rules need to be added.
For example, it already handles empty strings (the first two lines of the function), so you don't get an exception when trying to extract the last character from one.
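Your condition is always true, because no string can end with '.', '!' and '?' at the same time. According to De Morgan's laws, each or must become and: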
b = b + '.' if (not b.endswith('.')) and (not b.endswith('!')) and (not b.endswith('?')) else b
You can simplify to:
b = b + '.' if b and b[-1] not in ('.', '!', '?') else b
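As a side note, str.endswith also accepts a tuple of suffixes, which avoids chaining conditions altogether. Here is a minimal sketch (the name fix_ending is made up for illustration, it is not from the answers above) that also covers the trailing-comma rule from the question:

```python
def fix_ending(sentence):
    # Hypothetical helper, not part of the original answers.
    if not sentence or sentence.endswith(('.', '!', '?')):
        return sentence              # empty, or already terminated
    if sentence.endswith(','):
        sentence = sentence[:-1]     # drop the trailing comma
    return sentence + '.'

sentences = ['Hi how are you?', 'I am good', 'Great!', 'I am doing good,', 'Good.']
print([fix_ending(s) for s in sentences])
```

Passing a tuple keeps the list of accepted endings in one place, so adding a new terminator is a one-character change.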
I have a Python challenge: given a string with '_' or '-' between each word, such as the_big_red_apple or the-big-red-apple, convert it to camel case. If the first word is capitalized, keep it capitalized. This is my code. I'm not allowed to use the re library in the challenge, but I didn't know how else to do it.
from re import sub
def to_camel_case(text):
    if text[0].isupper():
        text = sub(r"(_|-)+"," ", text).title().replace(" ", "")
    else:
        text = sub(r"(_|-)+"," ", text).title().replace(" ", "")
        text = text[0].lower() + text[1:]
    return print(text)
Word delimiters can be - dash or _ underscore.
Let's simplify, making them all underscores:
text = text.replace('-', '_')
Now we can break out words:
words = text.split('_')
With that in hand it's simple to put them back together:
text = ''.join(map(str.capitalize, words))
or more verbosely, with a generator expression,
assign ''.join(word.capitalize() for word in words).
I leave "finesse the 1st character"
as an exercise to the reader.
If you RTFM you'll find it contains a wealth of knowledge.
https://docs.python.org/3/library/re.html#raw-string-notation
'+'
Causes the resulting RE to match 1 or more repetitions of the preceding RE. ab+ will match ‘a’ followed by any non-zero number of ‘b’s
The effect of + is to turn both db_rows_read and db__rows_read into DbRowsRead.
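To make the effect of + concrete, here is a small sketch comparing the regex version with a re-free equivalent. Both function names are made up for this illustration; the re-free variant has to skip the empty strings that split() produces between doubled delimiters:

```python
import re

def pascal_with_re(name):
    # "+" collapses a run of delimiters into a single space.
    return re.sub(r"(_|-)+", " ", name).title().replace(" ", "")

def pascal_without_re(name):
    # split("_") yields "" between consecutive delimiters, so filter those out.
    words = name.replace("-", "_").split("_")
    return "".join(word.capitalize() for word in words if word)

for sample in ("db_rows_read", "db__rows_read"):
    print(sample, pascal_with_re(sample), pascal_without_re(sample))
```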
Also,
Raw string notation (r"text") keeps regular expressions sane.
The regex in your question doesn't exactly
need a raw string, as it has no crazy
punctuation like \ backwhacks.
But it's a very good habit to always put
a regex in an r-string, Just In Case.
You never know when code maintenance
will tack on additional elements,
and who wants a subtle regex bug on their hands?
You can try it like this:
def to_camel_case(text):
    s = text.replace("-", " ").replace("_", " ")
    s = s.split()
    if not s:  # empty input, or nothing but delimiters
        return text
    return s[0] + ''.join(i.capitalize() for i in s[1:])
print(to_camel_case('momo_es-es'))
The output of print(to_camel_case('momo_es-es')) is momoEsEs.
r"..." is a raw string in Python, which simply means a backslash \ is treated as a literal character instead of an escape character.
(_|-)+ is a regular expression that matches one or more consecutive - or _ characters.
(_|-) matches a single - or _.
+ means the preceding group (- or _) occurs one or more times in a row.
In case you cannot use the re library for this solution:
def to_camel_case(text):
    # Since there are 2 possible delimiters, let's simplify to one.
    # In this case, I replace all `_` characters with `-`, to make sure we have only one delimiter.
    text = text.replace("_", "-") # the_big-red_apple => the-big-red-apple
    # Next, we should split our text into words in order for us to iterate through and modify it later.
    words = text.split("-") # the-big-red-apple => ["the", "big", "red", "apple"]
    # Now, for each word (except the first word) we have to turn its first character to uppercase.
    for i in range(1, len(words)):
        # `i` starts from 1, which means the first word IS NOT INCLUDED in this loop.
        word = words[i]
        # word[1:] means the rest of the characters except the first one
        # (e.g. w = "apple" => w[1:] = "pple")
        words[i] = word[0].upper() + word[1:].lower()
        # you can also use Python built-in method for this:
        # words[i] = word.capitalize()
    # After this loop, ["the", "big", "red", "apple"] => ["the", "Big", "Red", "Apple"]
    # Finally, we put the words back together and return it
    # ["the", "Big", "Red", "Apple"] => theBigRedApple
    return "".join(words)
print(to_camel_case("the_big-red_apple"))
Try this:
First, replace all the delimiters with a single one, i.e. str.replace('_', '-')
Split the string on the standardized delimiter, i.e. str.split('-')
Capitalize each string in the list, i.e. str.capitalize()
Join the capitalized strings with str.join
>>> s = "the_big_red_apple"
>>> s.replace('_', '-').split('-')
['the', 'big', 'red', 'apple']
>>> ''.join(map(str.capitalize, s.replace('_', '-').split('-')))
'TheBigRedApple'
>>> ''.join(word.capitalize() for word in s.replace('_', '-').split('-'))
'TheBigRedApple'
If you need to lowercase the first char, then:
>>> camel_mile = lambda x: x[0].lower() + x[1:]
>>> s = 'TheBigRedApple'
>>> camel_mile(s)
'theBigRedApple'
Alternative,
First replace all delimiters to space str.replace('_', ' ')
Titlecase the string str.title()
Remove space from string, i.e. str.replace(' ', '')
>>> s = "the_big_red_apple"
>>> s.replace('_', ' ').title().replace(' ', '')
'TheBigRedApple'
Another alternative,
Iterate through the characters while keeping track of the previous character, i.e. for prev, curr in zip(s, s[1:])
Check if the previous character is one of your delimiters; if so, uppercase the current character, i.e. curr.upper() if prev in ['-', '_'] else curr
Skip the delimiter characters themselves, i.e. if curr not in ['-', '_']
Then add the first character in lowercase, [s[0].lower()]
>>> chars = [s[0].lower()] + [curr.upper() if prev in ['-', '_'] else curr for prev, curr in zip(s, s[1:]) if curr not in ['-', '_']]
>>> "".join(chars)
'theBigRedApple'
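To see what zip(s, s[1:]) produces, here is a small illustration on a shorter string (the sample value is made up). Each tuple pairs a character with the one that follows it, which is what lets the comprehension look one step back:

```python
s = "a_b"
pairs = list(zip(s, s[1:]))   # (previous, current) for every position after the first
print(pairs)
```

The first character never appears as a "current" element, which is why it has to be added separately.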
Yet another alternative,
Replace/Normalize all delimiters into a single one, s.replace('-', '_')
Convert it into a list of chars, list(s.replace('-', '_'))
While there is still a '_' in the list of chars, keep:
finding the position of the next '_'
replacing the character after '_' with its uppercase
replacing the '_' with ''
>>> s = 'the_big_red_apple'
>>> s_list = list(s.replace('-', '_'))
>>> while '_' in s_list:
...     where_underscore = s_list.index('_')
...     s_list[where_underscore+1] = s_list[where_underscore+1].upper()
...     s_list[where_underscore] = ""
...
>>> "".join(s_list)
'theBigRedApple'
or
>>> s = 'the_big_red_apple'
>>> s_list = list(s.replace('-', '_'))
>>> while '_' in s_list:
...     where_underscore = s_list.index('_')
...     s_list[where_underscore:where_underscore+2] = ["", s_list[where_underscore+1].upper()]
...
>>> "".join(s_list)
'theBigRedApple'
Note: Why do we need to convert the string to a list of chars? Because strings are immutable: 'str' object does not support item assignment.
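A short sketch of that immutability point, using a made-up sample string: assigning to an index of a str raises TypeError, while the same assignment on a list of characters works.

```python
s = "the_big"
try:
    s[3] = "-"                 # strings do not support item assignment
except TypeError as exc:
    error = str(exc)

chars = list(s)                # a list of single characters is mutable
chars[3] = "-"
result = "".join(chars)
print(error)
print(result)
```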
BTW, the regex solution can make use of a capture group, e.g.
>>> import re
>>> s = "the_big_red_apple"
>>> upper_regex_group = lambda x: x.group(1).upper()
>>> re.sub(r"[_-](\w)", upper_regex_group, s)
'theBigRedApple'
>>> re.sub(r"[_-](\w)", lambda x: x.group(1).upper(), s)
'theBigRedApple'
I am writing a program that selects a random song and prints the first letter of each word, along with the artist's name; the user then has to guess the song name. I want the code to account for punctuation such as apostrophes.
Example:
Current output - "I__ A B_______ - Smash Mouth"
Expected output - "I'_ A B_______ - Smash Mouth".
Could anyone let me know of a simple way to do this?
My current code is as follows:
print(' '.join(x[0] + '_' * (len(x) - 1) for x in string.split()))
You can use regular expressions to replace letters only. Use sub to replace letters with '_':
>>> import re
>>> s = "I'm A Believer"
>>> print(' '.join(x[0] + re.sub("[a-zA-Z]", '_', x[1:]) for x in s.split()))
I'_ A B_______
A more interesting example:
>>> s = "Hello, I love you! - Don't you?"
>>> print(' '.join(x[0] + re.sub("[a-zA-Z]", '_', x[1:]) for x in s.split()))
H____, I l___ y__! - D__'_ y__?
Using re you could match all letters [a-zA-Z] which are not preceded by a space (?<! ). The latter is called a negative lookbehind assertion:
import re
s = "I'm A Believer"
s[0] + re.sub(r'(?<! )[a-zA-Z]', '.', s[1:])
# I'. A B.......
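If you would rather avoid regular expressions, the same masking can be sketched with str.isalpha, which also treats accented letters as letters. The function name mask_title is an assumption made for this example:

```python
def mask_title(title):
    # Hypothetical helper: keep the first character of each word,
    # replace the remaining letters with "_" and leave punctuation alone.
    masked_words = []
    for word in title.split():
        rest = "".join("_" if ch.isalpha() else ch for ch in word[1:])
        masked_words.append(word[0] + rest)
    return " ".join(masked_words)

print(mask_title("I'm A Believer") + " - Smash Mouth")
```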
I am trying to find a single exact word within a large string.
I have tried the below:
for word in words:
    if word in strings:
        best.append("The word " + word + " The Sentence " + strings)
    else:
        pass
This seemed to work at first, until I tried it with a larger set of words in a much larger string and started getting partial matches. For example, if the word is "me", it would report "message" as a match.
Is there a way of searching for exactly "me"?
Thanks in advance.
You need to set boundaries in order to find the complete word. I'd go with a regex. Something like:
re.search(r'\b' + re.escape(word_to_find) + r'\b', strings)
You can split the string into words and then perform the in operation, making sure you strip any surrounding whitespace from the words in the list and any punctuation from the string:
import string
def find_words(words, s):
    best = []
    # Strip extra whitespace, if any, around each word and make them all lowercase
    modified_words = [word.strip().lower() for word in words]
    # Strip punctuation from the string
    modified_s = s.translate(str.maketrans('', '', string.punctuation))
    # Lowercase the string and split it into words
    words_list = [word.strip().lower() for word in modified_s.lower().split()]
    # Iterate through the list
    for idx, word in enumerate(modified_words):
        # If the word is found in the list of words, append to result
        if word in words_list:
            best.append("The word " + words[idx] + " The Sentence " + s)
    return best
print(find_words(['me', 'message'], 'I me myself'))
print(find_words([' me ', 'message'], 'I me myself'))
print(find_words(['me', 'message'], 'I me myself'))
print(find_words(['me', 'message'], 'I am me.'))
print(find_words(['me', 'message'], 'I am ME.'))
print(find_words(['Me', 'message'], 'I am ME.'))
The output will be
['The word me The Sentence I me myself']
['The word  me  The Sentence I me myself']
['The word me The Sentence I me myself']
['The word me The Sentence I am me.']
['The word me The Sentence I am ME.']
['The word Me The Sentence I am ME.']
You can also use a regex to find the exact word. \\b matches a word boundary, i.e. the position between a word character and a non-word character such as a space or punctuation mark (or the start or end of the string).
for word in words:
    if len(re.findall("\\b" + word + "\\b", strings)) > 0:
        best.append("The word " + word + " The Sentence " + strings)
    else:
        pass
The backslashes are doubled because, in a normal (non-raw) string literal, '\b' is the backspace control character.
You could include the surrounding spaces in the if statement.
for word in words:
    if f' {word} ' in strings:
        best.append("The word " + word + " The Sentence " + strings)
    else:
        pass
To make sure you don't detect words inside the words that contain them (like "me" in "message" or "flame"), add a space before and after the word you are searching for. The easiest way of doing this is to replace
if word in strings:
with
if " "+word+" " in strings:
Hope this helps! -Theo
You need to set boundaries for your search; \b matches a word boundary.
import re
string = 'youyou message me me me me me'
print(re.findall(r'\bme\b', string))
The string contains both message and me, but we only want me as a whole word, so I added boundaries to my search expression. The result is below:
['me', 'me', 'me', 'me', 'me']
Got all the me(s), but not the message which also has a me in it.
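For comparison, here is a minimal sketch of a whole-word test built on \b. The helper name contains_word is an assumption for this example, and re.escape is added so that a search word containing regex metacharacters is treated literally. Unlike padding the word with spaces, the boundary also matches at the start or end of the string and next to punctuation:

```python
import re

def contains_word(word, text):
    # Hypothetical helper: True if `word` occurs in `text` as a whole word.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text) is not None

print(contains_word("me", "a short message"))   # "me" only appears inside "message"
print(contains_word("me", "Tell me."))          # "me" is followed by punctuation
print(" me " in "Tell me.")                     # the space-padding approach misses this case
```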
Without knowing the rest of the code, the best I could suggest is using == to get a direct match, for example:
words = ["Me", "Hello", "Message"]
target = input("What do you want to find?")
for word in words:
    if word == target:  # exact comparison, so "Me" does not match "Message"
        print("Found a match")
        break
I have very messy data. I am noticing a pattern: wherever there is a '\n' at the end of an element, that element needs to be merged with the single element before it.
sample list:
ls = ['hello','world \n','my name','is john \n','How are you?','I am \n doing well']
What I tried:
print([s for s in ls if "\n" in s[-1]])
>>> ['world \n', 'is john \n'] # gave elements that ends with \n
How do I get the elements that end with '\n' merged with the one element before them? I am looking for output like this:
['hello world \n', 'my name is john \n', 'How are you?','I am \n doing well']
If you are reducing a list, one readable approach may be the reduce function.
functools.reduce(func, iter, [initial_value]) cumulatively performs an operation on all the iterable’s elements and, therefore, can’t be applied to infinite iterables.
First of all, you need a kind of struct to accumulate results. I use a tuple with two elements: a buffer holding the element that has not been added to the results yet, and the list of results. See initial struct (1).
from functools import reduce
ls = ['hello','world \n','my name','is john \n','How are you?','I am \n doing well']
def combine(x, y):
    if y.endswith('\n'):
        return ("", x[1] + [(x[0] + " " + y) if x[0] else y])  #<-- buffer merged into list
    else:
        return (y, (x[1] + [x[0]]) if x[0] else x[1])  #<-- flush old buffer, keep y on buffer
t = reduce(combine, ls, ("", []))  #<-- see initial struct (1)
print(t[1] + [t[0]] if t[0] else t[1])  #<-- add buffer if not empty
Result:
['hello world \n', 'my name is john \n', 'How are you?', 'I am \n doing well']
(1) Explained initial struct: you use a tuple to store buffer string until \n and a list of already cooked strings:
("",[])
Means:
("__ buffer string not yet added to list __", [ __result list ___ ] )
I wrote this out so it is simple to understand instead of trying to make it more complex as a list comprehension.
This will work for any number of words until you hit a \n character and clean up the remainder of your input as well.
ls_out = [] # your outgoing ls
out = '' # keeps your words to use
for i in range(0, len(ls)):
    if '\n' in ls[i]: # check for the ending word, if so, add it to output and reset
        out += ls[i]
        ls_out.append(out)
        out = ''
    else: # otherwise add to your current word list
        out += ls[i]
if out: # check for remaining words in out if total ls doesn't end with \n
    ls_out.append(out)
You may need to add spaces when you string concatenate but I am guessing that it is just with your example. If you do, make this edit:
out += ' ' + ls[i]
Edit:
If you want to only grab the one before and not multiple before, you could do this:
ls_out = []
for i in range(0, len(ls)):
    if ls[i].endswith('\n'): # check ending only
        if i > 0 and not ls[i-1].endswith('\n'): # check previous string
            out = ls[i-1] + ' ' + ls[i] # concatenate together
        else:
            out = ls[i] # nothing to merge with (first item, or previous also ends with \n)
    elif i + 1 < len(ls) and ls[i+1].endswith('\n'): # next one will grab this so skip
        continue
    else:
        out = ls[i] # next one won't so add this one in
    ls_out.append(out)
You can solve it with regular expressions using the re module.
import re
ls = ['hello','world \n','my name','is john \n','How are you?','I am \n doing well']
new_ls = []
for i in range(len(ls)):
    concat_word = '' # reset the concat word to ''
    if re.search(r"\n$", str(ls[i])): # matching the \n at the end of the word
        try:
            concat_word = str(ls[i-1]) + ' ' + str(ls[i]) # appending to the previous word
        except:
            concat_word = str(ls[i]) # in case if the first word in the list has \n
        new_ls.append(concat_word)
    elif re.search(r'\n',str(ls[i])): # matching the \n anywhere in the word
        concat_word = str(ls[i])
        new_ls.extend([str(ls[i-1]), concat_word]) # keeps the word before the "anywhere" match separate
print(new_ls)
This returns the output
['hello world \n', 'my name is john \n', 'How are you?', 'I am \n doing well']
Assuming the first element doesn't end with \n:
res = []
for el in ls:
    if el.endswith("\n"):
        res[-1] = res[-1] + " " + el
    else:
        res.append(el)
Try this:
lst = []
for i in range(len(ls)):
    if "\n" in ls[i][-1]:
        lst.append((ls[i-1] + ' ' + ls[i]))
        lst.remove(ls[i-1])
    else:
        lst.append(ls[i])
lst
Result:
['hello world \n', 'my name is john \n', 'How are you?', 'I am \n doing well']
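The loops above can also be wrapped in a small function that guards the edge cases. This is only a sketch: the name merge_newline_endings is made up, and the choice to leave an element unmerged when it is first in the list, or when the element before it already ends with '\n', is an assumption about the desired behaviour.

```python
def merge_newline_endings(items):
    # Hypothetical helper: merge each element ending in "\n" into the single
    # element before it, unless there is no such element to merge with.
    merged = []
    for item in items:
        if item.endswith("\n") and merged and not merged[-1].endswith("\n"):
            merged[-1] = merged[-1] + " " + item
        else:
            merged.append(item)
    return merged

ls = ['hello', 'world \n', 'my name', 'is john \n', 'How are you?', 'I am \n doing well']
print(merge_newline_endings(ls))
```

Building a new list instead of removing items from the one being iterated avoids the index bookkeeping and the wrap-around of ls[i-1] when i is 0.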
How do I remove leading and trailing whitespace from a string in Python?
" Hello world " --> "Hello world"
" Hello world" --> "Hello world"
"Hello world " --> "Hello world"
"Hello world" --> "Hello world"
To remove all whitespace surrounding a string, use .strip(). Examples:
>>> ' Hello '.strip()
'Hello'
>>> ' Hello'.strip()
'Hello'
>>> 'Bob has a cat'.strip()
'Bob has a cat'
>>> '   Hello   '.strip()  # ALL consecutive spaces at both ends removed
'Hello'
Note that str.strip() removes all whitespace characters, including tabs and newlines. To remove only spaces, specify the specific character to remove as an argument to strip:
>>> " Hello\n ".strip(" ")
'Hello\n'
To remove only one space at most:
def strip_one_space(s):
    if s.endswith(" "): s = s[:-1]
    if s.startswith(" "): s = s[1:]
    return s
>>> strip_one_space("  Hello ")
' Hello'
As pointed out in answers above
my_string.strip()
will remove all the leading and trailing whitespace characters such as \n, \r, \t, \f, space .
For more flexibility use the following
Removes only leading whitespace chars: my_string.lstrip()
Removes only trailing whitespace chars: my_string.rstrip()
Removes specific whitespace chars: my_string.strip('\n') or my_string.lstrip('\n\r') or my_string.rstrip('\n\t') and so on.
More details are available in the docs.
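A quick sketch of the three variants side by side, on a made-up string with a leading tab, some spaces and a trailing newline:

```python
raw = "\t  Hello world \n"

both = raw.strip()              # whitespace removed from both ends
left = raw.lstrip()             # only leading whitespace removed
right = raw.rstrip()            # only trailing whitespace removed
only_newline = raw.rstrip("\n") # only trailing newline characters removed

print(repr(both), repr(left), repr(right), repr(only_newline))
```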
strip is not limited to whitespace characters either:
# remove all leading/trailing commas, periods and hyphens
title = title.strip(',.-')
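One detail worth knowing: the argument to strip is treated as a set of characters, not as a prefix or suffix. The sketch below uses made-up sample strings to show this; characters are removed from each end until one is found that is not in the set.

```python
title = ",.-Intro to Python-.,"
cleaned = title.strip(",.-")        # any mix of ',', '.', '-' at either end is removed

sample = "abcHelloCba".strip("abc") # strips 'a', 'b', 'c' in any order, stops at 'H' and 'C'

print(cleaned)
print(sample)
```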
This will remove all leading and trailing whitespace in myString:
myString.strip()
You want strip():
myphrases = [" Hello ", " Hello", "Hello ", "Bob has a cat"]
for phrase in myphrases:
    print(phrase.strip())
This can also be done with a regular expression:
import re
text = "  Hello  "
output = re.sub(r'^\s+|\s+$', '', text)
# output == 'Hello'
As a beginner, reading this thread got my head spinning, so I came up with a simple shortcut.
Though str.strip() removes leading and trailing spaces, it does nothing for spaces between characters.
words=input("Enter the word to test")
# If a user enters text with stray spaces inside it, this becomes a problem
# input = " he llo, ho w are y ou "
n=words.strip()
print(n)
# output "he llo, ho w are y ou" - only leading & trailing spaces are removed
Instead, use str.replace(), which removes every space, including those inside the string.
The following code generalizes the use of str.replace():
def whitespace(words):
    r = words.replace(' ', '')  # removes all spaces
    n = r.replace(',', '|')     # other uses of replace
    return n
def run():
    words = input("Enter the word to test")  # take user input
    m = whitespace(words)  # wrap the helper in run() to improve reusability across functions
    o = m.count('f')       # for testing
    return m, o
print(run())
Output: ('hello|howareyou', 0)
This can be helpful when reusing the same logic in different functions.
Whitespace causes plenty of indentation errors when you run your finished code or programs in Python. If Python keeps telling you that the error is an indentation problem on line 1, 2, 3, 4, 5, etc., go back and fix that line.
However, if you still get errors related to typing mistakes, operators, etc., make sure you read what Python is complaining about:
The first thing to check is that you have your indentation right. If you do, then check to see if you have mixed tabs with spaces in your code. Remember: the code will look fine (to you), but the interpreter refuses to run it. If you suspect this, a quick fix is to bring your code into an IDLE edit window, then choose Edit -> Select All from the menu system, before choosing Format -> Untabify Region. If you’ve mixed tabs with spaces, this will convert all your tabs to spaces in one go (and fix any indentation issues).
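If you want to locate the offending lines before converting anything, a rough sketch like the following can help. The function name lines_with_tab_indent and the sample source are made up for this example; it simply reports the 1-based numbers of lines whose leading whitespace contains a tab.

```python
def lines_with_tab_indent(source):
    # Hypothetical helper: find lines indented (at least partly) with tabs.
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        indent = line[:len(line) - len(line.lstrip())]  # leading whitespace only
        if "\t" in indent:
            flagged.append(number)
    return flagged

sample_source = "def f():\n\treturn 1\n    x = 2\n"
print(lines_with_tab_indent(sample_source))
```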
I could not find a solution to what I was looking for so I created some custom functions. You can try them out.
def cleansed(s: str):
    """:param s: String to be cleansed"""
    assert s is not None and s != ""
    # return trimmed(s.replace('"', '').replace("'", ""))
    return trimmed(s)

def trimmed(s: str):
    """:param s: String to be cleansed"""
    assert s is not None and s != ""
    ss = trim_start_and_end(s).replace('  ', ' ')
    while '  ' in ss:  # keep collapsing double spaces until none remain
        ss = ss.replace('  ', ' ')
    return ss

def trim_start_and_end(s: str):
    """:param s: String to be cleansed"""
    assert s is not None and s != ""
    return trim_start(trim_end(s))

def trim_start(s: str):
    """:param s: String to be cleansed"""
    assert s is not None and s != ""
    chars = []
    for c in s:
        # skip spaces until the first non-space character has been seen
        if c != ' ' or len(chars) > 0:
            chars.append(c)
    return "".join(chars).lower()  # note: also lowercases the text

def trim_end(s: str):
    """:param s: String to be cleansed"""
    assert s is not None and s != ""
    chars = []
    for c in reversed(s):
        # same idea as trim_start, but walking from the end
        if c != ' ' or len(chars) > 0:
            chars.append(c)
    return "".join(reversed(chars)).lower()  # note: also lowercases the text

s1 = ' b Beer '
s2 = 'Beer b '
s3 = ' Beer b '
s4 = ' bread butter Beer b '
cdd = trim_start(s1)
cddd = trim_end(s2)
clean1 = cleansed(s3)
clean2 = cleansed(s4)
print("\nStr: {0} Len: {1} Cleansed: {2} Len: {3}".format(s1, len(s1), cdd, len(cdd)))
print("\nStr: {0} Len: {1} Cleansed: {2} Len: {3}".format(s2, len(s2), cddd, len(cddd)))
print("\nStr: {0} Len: {1} Cleansed: {2} Len: {3}".format(s3, len(s3), clean1, len(clean1)))
print("\nStr: {0} Len: {1} Cleansed: {2} Len: {3}".format(s4, len(s4), clean2, len(clean2)))
If you want to trim specified number of spaces from left and right, you could do this:
def remove_outer_spaces(text, num_of_leading, num_of_trailing):
    text = list(text)
    for i in range(num_of_leading):
        if text[i] == " ":
            text[i] = ""
        else:
            break
    for i in range(1, num_of_trailing+1):
        if text[-i] == " ":
            text[-i] = ""
        else:
            break
    return ''.join(text)
txt1 = "   MY name is     "
print(remove_outer_spaces(txt1, 1, 1)) # result is: "  MY name is    "
print(remove_outer_spaces(txt1, 2, 3)) # result is: " MY name is  "
print(remove_outer_spaces(txt1, 6, 8)) # result is: "MY name is"
How do I remove leading and trailing whitespace from a string in Python?
The solution below removes leading and trailing whitespace as well as extra whitespace in between, which is useful if you need clean string values without runs of multiple spaces.
>>> str_1 = ' Hello World'
>>> print(' '.join(str_1.split()))
Hello World
>>>
>>>
>>> str_2 = ' Hello World'
>>> print(' '.join(str_2.split()))
Hello World
>>>
>>>
>>> str_3 = 'Hello World '
>>> print(' '.join(str_3.split()))
Hello World
>>>
>>>
>>> str_4 = 'Hello World '
>>> print(' '.join(str_4.split()))
Hello World
>>>
>>>
>>> str_5 = ' Hello World '
>>> print(' '.join(str_5.split()))
Hello World
>>>
>>>
>>> str_6 = ' Hello World '
>>> print(' '.join(str_6.split()))
Hello World
>>>
>>>
>>> str_7 = 'Hello World'
>>> print(' '.join(str_7.split()))
Hello World
As you can see, this removes all the repeated whitespace in the string (the output is Hello World for all of them), wherever it occurs. But if you only need to remove leading and trailing whitespace, then strip() would be fine.
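The split-and-join idiom can be wrapped in a one-line helper. The name normalize_whitespace is made up for this sketch; it relies on str.split() with no arguments, which splits on any run of whitespace (spaces, tabs, newlines) and discards empty strings at both ends.

```python
def normalize_whitespace(text):
    # Hypothetical helper: trim the ends and collapse inner whitespace runs.
    return " ".join(text.split())

print(normalize_whitespace("   Hello \t  World\n"))
```

Note that this also turns tabs and newlines inside the string into single spaces, which may or may not be what you want.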
One way is to use the .strip() method (removing all surrounding whitespace):
s = "  Hello World  "
s = s.strip()
# result: s == "Hello World"
Note that .strip() returns a copy of the string and doesn't change the underlying object (since strings are immutable).
Should you wish to remove all spaces (not only trim the edges):
s = ' abcd efgh ijk '
s = s.replace(' ', '')
# result: s == 'abcdefghijk'
I wanted to remove the superfluous spaces in a string (also inside the string, not only at the beginning or end). I wrote this because I didn't know how else to do it:
string = "Name :   David     Account: 1234     Another thing: something  "
ready = False
while not ready:
    pos = string.find("  ")  # look for two consecutive spaces
    if pos != -1:
        string = string.replace("  ", " ")
    else:
        ready = True
print(string)
This replaces double spaces with a single space until there are no double spaces left.
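The same result can be sketched in a single pass with re.sub. The function name collapse_spaces is an assumption for this example; the pattern matches runs of two or more space characters only, so tabs and newlines are left untouched.

```python
import re

def collapse_spaces(text):
    # Replace every run of two or more spaces with a single space.
    return re.sub(r" {2,}", " ", text)

print(collapse_spaces("Name :   David     Account: 1234  "))
```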