How to remove whitespaces in a string except from between certain elements - python

I have a string similar to (the below one is simplified):
" word= {his or her} whatever "
I want to delete every whitespace except between {}, so that my modified string will be:
"word={his or her}whatever"
lstrip and rstrip don't work, of course. If I delete all whitespace, the whitespace between {} is deleted as well. I tried to look up ways to limit the replace function to certain areas, but even where I found something I wasn't able to implement it. There is some material on regex (I am not sure if it is relevant here), but I haven't been able to understand it.
EDIT: What if I wanted to exempt both the area between {} and the area between "", that is:
if I wanted to turn this string:
" word= {his or her} and "his or her" whatever "
into this:
"word={his or her}and"his or her"whatever"
What would I change
re.sub(r'\s+(?![^{]*})', '', list_name) into?

Instead of going through re, you can use str.replace, which is easier and less complex when you are working with strings, especially when several substitutions would otherwise add up to one big regex. Note that the replace chain below is tailored to these two example strings.
st =" word= {his or her} whatever "
st2=""" word= {his or her} and "his or her" whatever """
new = " ".join(st2.split())
new = new.replace("= ", "=").replace("} ", "}").replace('" ' , '"').replace(' "' , '"')
print(new)
Some outputs
Example 1 output
word={his or her}whatever
Example 2 output
word={his or her}and"his or her"whatever
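As for the regex asked about in the EDIT: a chain of replace calls has to be extended for every new context, so one alternative sketch is to let a single pattern match the protected spans first and bare whitespace second, and keep only the former. The name strip_outside is made up for illustration, and the sketch assumes braces and quotes are balanced and not nested.

```python
import re

# Try a protected span ({...} or "...") first; otherwise match a run of whitespace.
_PATTERN = re.compile(r'(\{[^}]*\}|"[^"]*")|\s+')


def strip_outside(text):
    """Remove whitespace except inside {...} or "..." (hypothetical helper)."""
    # Group 1 is set only for protected spans, which are kept unchanged.
    # Whitespace matches leave group 1 as None and are replaced with ''.
    return _PATTERN.sub(lambda m: m.group(1) or '', text)


print(strip_outside(' word= {his or her} and "his or her" whatever '))
# word={his or her}and"his or her"whatever
```

Because the protected alternative comes first in the pattern, whitespace inside a span is consumed as part of that span and never reaches the \s+ branch.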

You can use replace. Note that this removes every space, including the ones inside {}:
def remove(string):
    return string.replace(" ", "")
string = 'hell o whatever'
print(remove(string))  # Output: hellowhatever

Related

What is the simplest way to capitalize the first word in a sentence for multiple sentences in python 3.7?

For my homework I have tried to get the first word of each sentence to capitalize.
This is for python 3.7.
def fix_cap():
    if "." in initialInput:
        sentsplit = initialInput.split(". ")
        capsent = [x.capitalize() for x in sentsplit]
        joinsent = ". ".join(capsent)
        print("Number of words capitalized: " + str(len(sentsplit)))
        print("Edited text: " + joinsent)
    elif "!" in initialInput:
        sentsplit = initialInput.split("! ")
        capsent = [x.capitalize() for x in sentsplit]
        joinsent = "! ".join(capsent)
        print("Number of words capitalized: " + str(len(sentsplit)))
        print("Edited text: " + joinsent)
    elif "?" in initialInput:
        sentsplit = initialInput.split("? ")
        capsent = [x.capitalize() for x in sentsplit]
        joinsent = "? ".join(capsent)
        print("Number of words capitalized: " + str(len(sentsplit)))
        print("Edited text: " + joinsent)
    else:
        print(initialInput.capitalize())
This will work if only one type of punctuation is used, but I would like it to work with multiple types in a paragraph.
Correctly splitting a text into sentences is hard. For how to do this correctly also for cases like e.g. abbreviations, names with titles etc., please refer to other questions on this site, e.g. this one. This is only a very simple version, based on your conditions, which, I assume, will suffice for your task.
As you noticed, your code only works for one type of punctuation, because of the if/elif/else construct. But you do not need that at all! If e.g. there is no ? in the text, then split("? ") will just return the text as a whole (wrapped in a list). You could just remove the conditions, or iterate a list of possible sentence-ending punctuation. However, note that capitalize will not just upper-case the first letter, but also lower-case all the rest, e.g. names, acronyms, or words previously capitalized for a different type of punctuation. Instead, you could just upper the first char and keep the rest.
text = "text with. multiple types? of sentences! more stuff."
for sep in (". ", "? ", "! "):
    text = sep.join(s[0].upper() + s[1:] for s in text.split(sep))
print(text)
# Text with. Multiple types? Of sentences! More stuff.
You could also use a regular expression to split by all sentence separators at once. This way, you might even be able to use capitalize, although it will still lower-case names and acronyms.
import re
>>> ''.join(s.capitalize() for s in re.split(r"([\?\!\.] )", text))
'Text with. Multiple types? Of sentences! More stuff.'
Or using re.sub with a look-behind (note the first char is still lower-case):
>>> re.sub(r"(?<=[\?\!\.] ).", lambda m: m.group().upper(), text)
'text with. Multiple types? Of sentences! More stuff.'
However, unless you know what those are doing, I'd suggest going with the first loop-based version.
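If you also want the first character of the text handled and a count like the one printed in the question, one possible variation is re.subn, which returns the new string together with the number of substitutions. The function name capitalize_sentences is made up for this sketch, and the count includes sentences whose first letter was already upper-case.

```python
import re


def capitalize_sentences(text):
    """Upper-case the first letter of each sentence; return (new_text, count)."""
    # Group 1: start of the text, or sentence-ending punctuation plus whitespace.
    # Group 2: the first word character of the sentence.
    pattern = r"(^|[.!?]\s+)(\w)"
    return re.subn(pattern, lambda m: m.group(1) + m.group(2).upper(), text)


new_text, count = capitalize_sentences(
    "text with. multiple types? of sentences! more stuff.")
print(new_text)  # Text with. Multiple types? Of sentences! More stuff.
print(count)     # 4
```

Only the matched letter is changed, so names and acronyms later in a sentence keep their case.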

Python inserting spaces in string

Alright, I'm working on a little project for school, a 6-frame translator. I won't go into too much detail, I'll just describe what I wanted to add.
The normal output would be something like:
TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD
The important part of this string are the M and the _ (the start and stop codons, biology stuff). What I wanted to do was highlight these like so:
TTCPTISPALGLAWS_DLGTLGF 'MSYSANTASGETLVSLYQLGLFEM_' VVSYGRTKYYLICP_LFHLSVGFVPSD
Now here is where (for me) it gets tricky, I got my output to look like this (adding a space and a ' to highlight the start and stop). But it only does this once, for the first start and stop it finds. If there are any other M....._ combinations it won't highlight them.
Here is my current code, attempting to make it highlight more than once:
def start_stop(translation):
    index_2 = 0
    while True:
        if 'M' in translation[index_2::1]:
            index_1 = translation[index_2::1].find('M')
            index_2 = translation[index_1::1].find('_') + index_1
            new_translation = translation[:index_1] + " '" + \
                              translation[index_1:index_2 + 1] + "' " +\
                              translation[index_2 + 1:]
        else:
            break
    return new_translation
I really thought this would do it, guess not. So now I find myself being stuck.
If any of you are willing to try and help, here is a randomly generated string with more than one M....._ set:
'TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSDGRRLTLYMPPARRLATKSRFLTPVISSG_DKPRHNPVARSQFLNPLVRPNYSISASKSGLRLVLSYTRLSLGINSLPIERLQYSVPAPAQITP_IPEHGNARNFLPEWPRLLISEPAPSVNVPCSVFVVDPEHPKAHSKPDGIANRLTFRWRLIG_VFFHNAL_VITHGYSRVDILLPVSRALHVHLSKSLLLRSAWFTLRNTRVTGKPQTSKT_FDPKATRVHAIDACAE_QQH_PDSGLRFPAPGSCSEAIRQLMI'
Thank you to anyone willing to help :)
Regular expressions are pretty handy here:
import re
sequence = "TTCP...."
highlighted = re.sub(r"(M\w*?_)", r" '\1' ", sequence)
# Output:
"TTCPTISPALGLAWS_DLGTLGF 'MSYSANTASGETLVSLYQLGLFEM_' VVSYGRTKYYLICP_LFHLSVGFVPSDGRRLTLY 'MPPARRLATKSRFLTPVISSG_' DKPRHNPVARSQFLNPLVRPNYSISASKSGLRLVLSYTRLSLGINSLPIERLQYSVPAPAQITP_IPEHGNARNFLPEWPRLLISEPAPSVNVPCSVFVVDPEHPKAHSKPDGIANRLTFRWRLIG_VFFHNAL_VITHGYSRVDILLPVSRALHVHLSKSLLLRSAWFTLRNTRVTGKPQTSKT_FDPKATRVHAIDACAE_QQH_PDSGLRFPAPGSCSEAIRQLMI"
Regex explanation:
We look for an M followed by any number of "word characters" \w* then an _, using the ? to make it a non-greedy match (otherwise it would just make one group from the first M to the last _).
The replacement is the matched group (\1 indicates "first group", there's only one), but surrounded by spaces and quotes.
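To see why the ? matters, here is a small comparison on a made-up short sequence (the sample string is only for illustration). Note that \w also matches the underscore itself, which is why the greedy form can run past the first stop.

```python
import re

sample = "AAMBB_CCMDD_EE"  # made-up short sequence for illustration

# Non-greedy: stop at the first '_' after each 'M'.
lazy = re.findall(r"M\w*?_", sample)
# Greedy: \w* swallows as much as possible, then backtracks to the last '_'.
greedy = re.findall(r"M\w*_", sample)

print(lazy)    # ['MBB_', 'MDD_']
print(greedy)  # ['MBB_CCMDD_']
```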
You only need string slicing for this; no external module is required.
Python strings have a method called index; just use it. Note that this highlights only the first M..._ pair.
string_1='TTCPTISPALGLAWS_DLGTLGFMSYSANTASGETLVSLYQLGLFEM_VVSYGRTKYYLICP_LFHLSVGFVPSD'
before=string_1.index('M')
after=string_1[before:].index('_')
print('{} {} {}'.format(string_1[:before],string_1[before:before+after+1],string_1[before+after+1:]))
output:
TTCPTISPALGLAWS_DLGTLGF MSYSANTASGETLVSLYQLGLFEM_ VVSYGRTKYYLICP_LFHLSVGFVPSD
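For highlighting every M..._ pair without regular expressions, one possible sketch uses str.find with a start position and collects the pieces in a list. It keeps the function name from the question; the variable names are my own, and a trailing M with no later _ is left untouched.

```python
def start_stop(translation):
    """Wrap every M..._ stretch in quotes and spaces, scanning left to right."""
    parts = []
    pos = 0  # index where the not-yet-copied remainder begins
    while True:
        start = translation.find('M', pos)
        if start == -1:
            break  # no further start codon
        stop = translation.find('_', start)
        if stop == -1:
            break  # a start codon without a stop codon stays as-is
        parts.append(translation[pos:start])
        parts.append(" '" + translation[start:stop + 1] + "' ")
        pos = stop + 1
    parts.append(translation[pos:])
    return ''.join(parts)


print(start_stop("AAMBB_CCMDD_EE"))
# AA 'MBB_' CC 'MDD_' EE
```

Building a new list of pieces avoids the index drift that comes from inserting characters into the string while still searching it.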

Python - Print Each Sentence On New Line

Per the subject, I'm trying to print each sentence in a string on a new line. With the current code and output shown below, what's the syntax to return "Correct Output" shown below?
Code
sentence = 'I am sorry Dave. I cannot let you do that.'
def format_sentence(sentence):
    sentenceSplit = sentence.split(".")
    for s in sentenceSplit:
        print s + "."
Output
I am sorry Dave.
I cannot let you do that.
.
None
Correct Output
I am sorry Dave.
I cannot let you do that.
You can do this:
def format_sentence(sentence):
    sentenceSplit = filter(None, sentence.split("."))
    for s in sentenceSplit:
        print s.strip() + "."
There are some issues with your implementation. First, as Jarvis points out in his answer, if your delimiter is the first or last character in your string, or if two delimiter characters are right next to each other, an empty string will be inserted into the resulting list. To fix this, filter out the empty strings; filter(None, ...) drops every falsy item. Also, instead of using the + operator, use string formatting.
def format_sentence(sentences):
    sentences_split = filter(None, sentences.split('.'))
    for s in sentences_split:
        print '{0}.'.format(s.strip())
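A quick check of what split actually produces for the question's input may help here. The extra item is an empty string, and filter(None, ...) removes it because empty strings are falsy. This sketch uses Python 3, where filter returns an iterator, hence the list() call.

```python
parts = 'I am sorry Dave. I cannot let you do that.'.split('.')
print(parts)
# ['I am sorry Dave', ' I cannot let you do that', '']

# filter(None, ...) keeps only truthy items, so the trailing '' is dropped.
cleaned = list(filter(None, parts))
print(cleaned)
# ['I am sorry Dave', ' I cannot let you do that']
```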
You can split the string by ". " instead of ".", then print each line with an additional "." until the last one, which will have a "." already.
def format_sentence(sentence):
    sentenceSplit = sentence.split(". ")
    for s in sentenceSplit[:-1]:
        print s + "."
    print sentenceSplit[-1]
Try:
def format_sentence(sentence):
    print(sentence.replace('. ', '.\n'))
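For Python 3, a minimal sketch that separates the splitting from the printing could look like this. The helper name split_sentences is my own. The stray None in the question's output is consistent with printing the function's return value, so the sketch calls format_sentence without wrapping it in print.

```python
def split_sentences(text):
    """Return the sentences in text, each stripped and ending with a period."""
    return [part.strip() + '.' for part in text.split('.') if part.strip()]


def format_sentence(text):
    # Print one sentence per line; the function itself returns None.
    print('\n'.join(split_sentences(text)))


format_sentence('I am sorry Dave. I cannot let you do that.')
# I am sorry Dave.
# I cannot let you do that.
```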

Python: Remove words from a given string

I'm quite new to programming (and this is my first post to Stack Overflow), and I am finding this problem quite difficult. I am supposed to remove a given string, in this case WUB, and replace it with a space. For example, song_decoder("WUBWUBAWUBWUBWUBBWUBC") should give the output: A B C. From other questions on this forum I was able to establish that I need to replace "WUB" and, to remove the extra whitespace, use a split/join. Here is my code:
def song_decoder(song):
    song.replace("WUB", " ")
    return " ".join(song.split())
I am not sure where I am going wrong. I get an error saying that WUB should be replaced by 1 space: 'AWUBBWUBC' should equal 'A B C' after running the code. Any help or pointing me in the right direction would be appreciated.
You're close! str.replace() does not work "in-place"; it returns a new string that has had the requested replacement performed on it.
Return a copy of the string with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.
Do this instead:
def song_decoder(song):
    song = song.replace("WUB", " ")
    return " ".join(song.split())
For example:
In [14]: song_decoder("BWUBWUBFF")
Out[14]: 'B FF'
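The difference between discarding and keeping the result of replace can be seen in a few lines. The variable name decoded is just for illustration.

```python
song = "AWUBBWUBC"

song.replace("WUB", " ")            # returns a new string, which is discarded
print(repr(song))                   # 'AWUBBWUBC'

decoded = song.replace("WUB", " ")  # keep the returned string
print(repr(decoded))                # 'A B C'
```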
Strings are immutable in Python. So changing a string (like you try to do with the "replace" function) does not change your variable "song". It rather creates a new string which you immediately throw away by not assigning it to something. You could do
def song_decoder(song):
    result = song.replace("WUB", " ")  # replace "WUB" with " "
    result = result.split()            # split string at whitespaces producing a list
    result = " ".join(result)          # create string by concatenating list elements around " "s
    return result
or, to make it shorter (one could also call it less readable), you can write
def song_decoder(song):
    return " ".join(song.replace("WUB", " ").split())
Do both steps in a single line.
def song_decoder(song):
    return ' '.join(song.replace('WUB',' ').split())
Result
In [95]: song_decoder("WUBWUBAWUBWUBWUBBWUBC")
Out[95]: 'A B C'
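As an alternative to replace plus split/join, a regular expression can collapse each run of WUB into one space directly. This is only a sketch of a different route to the same result, not something the answers above rely on.

```python
import re


def song_decoder(song):
    # Collapse each run of one or more "WUB" into a single space,
    # then trim the spaces left at either end.
    return re.sub(r"(?:WUB)+", " ", song).strip()


print(song_decoder("WUBWUBAWUBWUBWUBBWUBC"))  # A B C
```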

Multiple Re.Subs Python

Sorry, but I can't figure this out from the Python documentation or any of the stuff I've found from Google.
So, I've been working on renaming files with code 99% from one of the awesome helpers here at StackOverflow.
I'm working on putting together a renaming script that (and this is what I got help with from someone here) works with the name (not the extension).
I'm sure I'll come up with more replacements, but my problem at the moment is that I can't figure out how to do more than one re.sub.
Current Code (Replaces dots with spaces):
import os, shutil, re
def rename_file (original_filename):
    name, extension = os.path.splitext(original_filename)
    #Remove Spare Dots
    modified_name = re.sub("\.", r" ", name)
    new_filename = modified_name + extension
    try:
        # moves files or directories (recursively)
        shutil.move(original_filename, new_filename)
    except shutil.Error:
        print ("Couldn't rename file %(original_filename)s!" % locals())
[rename_file(f) for f in os.listdir('.') if not f.startswith('.')]
Hoping to also
re.sub("C126", "Perception", name)
re.sub("Geo1", "Geography", name)
Also, it'd be awesome if I could have it capitalize the first letter of any word except "and|if"
I tried
modified_name = re.sub("\.", r" ", name) && re.sub(...
but that didn't work; neither did putting them on different lines. How do I do all the subs and stuff I want to do/make?
Just operate over the same string over and over again, replacing it each time:
name = re.sub(r"\.", r" ", name)
name = re.sub(r"C126", "Perception", name)
name = re.sub(r"Geo1", "Geography", name)
@DanielRoseman is right though: these are literal patterns that don't need regexes to be described, found, or replaced. You can use timeit to demonstrate that plain old replace() is preferable:
In [16]: timeit.timeit("test.replace('asdf','0000')",setup="test='asdfASDF1234'*10")
Out[16]: 1.0641241073608398
In [17]: timeit.timeit("re.sub(r'asdf','0000',test)",setup="import re; test='asdfASDF1234'*10")
Out[17]: 6.126996994018555
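Since the patterns are literal, the substitutions can also be kept in a list and applied in a loop with str.replace. This is a sketch; the names REPLACEMENTS and clean_name are made up, and the pairs are the ones mentioned in the question.

```python
# (old, new) pairs, applied in order; extend the list as more rules come up.
REPLACEMENTS = [
    (".", " "),
    ("C126", "Perception"),
    ("Geo1", "Geography"),
]


def clean_name(name):
    for old, new in REPLACEMENTS:
        # Rebind name each time, since replace returns a new string.
        name = name.replace(old, new)
    return name


print(clean_name("lol.Geo1"))  # lol Geography
```

In the question's rename_file, this would take the place of the single re.sub call on name.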
Well, each time you call re.sub(), it returns a new, changed string. So, if you want to keep modifying each new string, keep assigning the new strings to the same variable name. Essentially, don't think of yourself as modifying the same string over and over - instead, think of yourself as modifying a new String each time.
Example: If you're using the string "lol.Geo1",
newString = re.sub("\.", r" ", originalString)
will return the string "lol Geo1", and assign it to newString. Now, if you want to change that new string, do your next substitution and it will return another string, which you can put to "newString" again -
newString = re.sub("Geo1", "Geography", newString)
Now, newString evaluates to "lol Geography". With each substitution, you are creating a new string, not the same one. That's why
modified_name = re.sub("\.", r" ", name) && re.sub(...
didn't work. Python has no && operator, and even if it did, re.sub("\.", r" ", name) would return one string, re.sub(...) would return another string, and so on, each of those strings having only its individual substitution applied to the original string, like this:
modified_name = "lol Geo1" && "lol.Geography"...
So, to get it to work, follow the other poster's suggestions - just keep repeating the assignment with each substitution you want, assigning the substituted newString to itself, until you've finished all of your substitutions.
Hopefully, that's a clear explanation. Feel free to ask questions. :)
modified_name = re.sub("\.", r" ", name)
modified_name = re.sub("C126", "Perception", modified_name)
modified_name = re.sub("Geo1", "Geography", modified_name)
Pass the output of one as the input of the next.
You could also nest one call inside the other, but this is considered less readable:
name = re.sub(r"Geo1", "Geography",
re.sub(r"C126", "Perception",
re.sub(r"\.", r" ", name)
)
)
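The question also asks about capitalizing the first letter of every word except "and" and "if", which the answers here do not cover. A minimal sketch, assuming the words are already separated by single spaces (as they are after the dot replacement) and using a made-up function name:

```python
SKIP = ("and", "if")  # words left untouched, taken from the question


def capitalize_words(name, skip=SKIP):
    """Upper-case the first letter of each word unless the word is in skip."""
    return " ".join(
        word if word.lower() in skip else word[:1].upper() + word[1:]
        for word in name.split(" ")
    )


print(capitalize_words("perception and geography if needed"))
# Perception and Geography if Needed
```

Slicing with word[:1] instead of word[0] keeps empty items (from doubled spaces) from raising an IndexError.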
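Just put a vertical bar between the alternatives :) Note that every match then gets the same replacement (a space here), so this only fits when all the patterns share one replacement string.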
name = re.sub(r"\.|C126|Geo1", r" ", name)
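The one-line alternation above uses one replacement string for every match. If each pattern needs its own replacement in a single pass, one option is to pass a function to re.sub and look the match up in a dict. The names REPLACEMENTS, PATTERN and substitute_all are made up for this sketch.

```python
import re

# Literal text to find, mapped to its replacement (pairs from the question).
REPLACEMENTS = {".": " ", "C126": "Perception", "Geo1": "Geography"}

# re.escape makes each key literal, so "." does not mean "any character".
PATTERN = re.compile("|".join(re.escape(key) for key in REPLACEMENTS))


def substitute_all(name):
    # m.group(0) is the exact text that matched, which is a key of the dict.
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], name)


print(substitute_all("lol.Geo1"))  # lol Geography
```

Unlike repeated re.sub calls, the text produced by one replacement is not scanned again by the others.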
