Having trouble adding a space after a period in a Python string

I have to write code that does two things:
Compress more than one occurrence of the space character into one.
Add a space after a period, if there isn't one.
For example:
input> This is weird.Indeed
output>This is weird. Indeed.
This is the code I wrote:
def correction(string):
    list=[]
    for i in string:
        if i!=" ":
            list.append(i)
        elif i==" ":
            k=i+1
            if k==" ":
                k=""
                list.append(i)
    s=' '.join(list)
    return s
strn=input("Enter the string: ").split()
print (correction(strn))
This code takes any input from the user and removes all the extra spaces, but it doesn't add the space after the period. (I know why: because of the split function, the period and the next word are taken together as one word. I just can't figure out how to fix it.)
This is some code I found online:
import re
def correction2(string):
    corstr = re.sub('\ +',' ',string)
    final = re.sub('\.','. ',corstr)
    return final
strn= ("This is as .Indeed")
print (correction2(strn))
The problem with this code is that I can't take any input from the user; the string is hard-coded in the program.
So can anyone suggest how to improve either of the two programs so that it does both things on ANY input from the user?

Is this what you want?
import re
def corr(s):
    return re.sub(r'\.(?! )', '. ', re.sub(r' +', ' ', s))
s = input("> ")
print(corr(s))
I've changed the regex to use a negative lookahead pattern.
Edit: an explanation of the regex, as requested in a comment.
re.sub() takes (at least) three arguments: the regex search pattern, the replacement for each match, and the string in which the replacement should be done.
I'm doing two steps at once here by using the output of one call as the input of the other.
First, the inner re.sub(r' +', ' ', s) searches for runs of one or more spaces (r' +') in s and replaces each with a single space. Then the outer re.sub(r'\.(?! )', '. ', ...) looks for periods that are not followed by a space character and replaces them with '. '. The negative lookahead (?! ) makes the pattern match only where the lookahead pattern (a normal space character in this case) does not match, and it does not consume the following character. You may want to play around with this pattern to understand it better.
The r string prefix makes the literal a raw string, in which backslashes are not treated as escape characters. The patterns here happen to work without it, but recent Python versions warn about unrecognized escapes such as '\.' in normal strings, so using raw strings for regular expressions is a good habit.
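One side effect of the negative lookahead is that a period at the very end of the string also gets a space appended, because no space follows it. Here is a minimal sketch of a variant, assuming you only want a space where a non-whitespace character directly follows the period. The name fix_spacing and the sample string are made up for illustration:

```python
import re

def fix_spacing(text):
    # Step 1: collapse every run of spaces into a single space.
    collapsed = re.sub(r' +', ' ', text)
    # Step 2: add a space after a period only when a non-whitespace
    # character follows it directly (positive lookahead).
    return re.sub(r'\.(?=\S)', '. ', collapsed)

print(fix_spacing("This  is   weird.Indeed."))  # This is weird. Indeed.
```

Like the original, this would also turn a decimal such as 3.14 into "3. 14", so treat it as a sketch for plain prose only.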

For a more basic answer, without regex (note the two spaces in the first string of each check):
>>> def remove_doublespace(string):
...     if '  ' not in string:
...         return string
...     return remove_doublespace(string.replace('  ', ' '))
...
>>> remove_doublespace('hi there how are you.i am fine. '.replace('.', '. '))
'hi there how are you. i am fine. '
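If recursion feels heavy for this, a similar result can be sketched with split() and join(). This is an alternative technique, not the approach above, and the name tidy is hypothetical. It assumes you are happy for leading and trailing whitespace to be dropped and for tabs and newlines to be treated like spaces:

```python
def tidy(text):
    # Make sure every period is followed by a space, then let
    # split()/join() collapse every whitespace run to one space.
    return ' '.join(text.replace('.', '. ').split())

print(tidy('hi   there how are you.i am fine. '))  # hi there how are you. i am fine.
```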

Try the following code:
>>> import re
>>> s = 'This is weird.Indeed'
>>> def correction(s):
...     res = re.sub(r'\s+$', '', re.sub(r'\s+', ' ', re.sub(r'\.', '. ', s)))
...     if res[-1] != '.':
...         res += '.'
...     return res
...
>>> print(correction(s))
This is weird. Indeed.
>>> s = input()
hee ss.dk
>>> s
'hee ss.dk'
>>> correction(s)
'hee ss. dk.'

Related

Python return if statement

I'm unclear on how to write the following function correctly:
I'm creating a function that takes a string and returns it in camel case without spaces (or Pascal case if the first letter was already capitalized), removing special characters.
text = "This-is_my_test_string,to-capitalize"
def to_camel_case(text):
    # Return 1st letter of text + all letters after
    return text[:1] + text.title()[1:].replace(i" ") if not i.isdigit()
# Output should be "ThisIsMyTestStringToCapitalize"
The "if" statement at the end isn't working. I wrote this somewhat experimentally, but with a syntax fix, could the logic work?
Provided the input string does not contain any spaces, you could do this:
from re import sub
def to_camel_case(text, pascal=False):
    r = sub(r'[^a-zA-Z0-9]', ' ', text).title().replace(' ', '')
    return r if pascal else r[0].lower() + r[1:]
ts = 'This-is_my_test_string,to-capitalize'
print(to_camel_case(ts, pascal=True))
print(to_camel_case(ts))
Output:
ThisIsMyTestStringToCapitalize
thisIsMyTestStringToCapitalize
Here is a short solution using regex. First it uses title() as you did, then the regex finds the non-alphanumeric characters and removes them, and finally the first character of the original string is kept to handle Pascal versus camel case.
import re
def to_camel_case(s):
    s1 = re.sub('[^a-zA-Z0-9]+', '', s.title())
    return s[0] + s1[1:]
text = "this-is2_my_test_string,to-capitalize"
print(to_camel_case(text)) # thisIs2MyTestStringToCapitalize
The below should work for your example.
It splits your example on anything that isn't alphanumeric or whitespace, capitalizes each word, and returns the re-joined string.
import re
def to_camel_case(text):
    words = re.split(r'[^a-zA-Z0-9\s]', text)
    return "".join([word.capitalize() for word in words])
text_to_camelcase = "This-is_my_test_string,to-capitalize"
print(to_camel_case(text_to_camelcase))
Use re.split to split on anything that is not a letter, a digit, or whitespace, and .capitalize() to capitalize the individual words:
import re
text_to_camelcase = "This-is_my_test_string,to-capitalize"
def to_camel_case(text):
    split_text = re.split(r'[^a-zA-Z0-9\s]', text)
    cap_string = ''
    for word in split_text:
        cap_word = word.capitalize()
        cap_string += cap_word
    return cap_string
print(to_camel_case(text_to_camelcase))
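On the question of whether the original one-liner could work with a syntax fix: as written it cannot, because a conditional expression needs an else branch and i is never defined. Below is a rough sketch of what that logic seems to be aiming for without regex. It assumes "special characters" means anything that is not a letter or digit, and that the input starts with a letter or digit:

```python
def to_camel_case(text):
    # Title-case the text, then keep only letters and digits.
    cleaned = ''.join(ch for ch in text.title() if ch.isalnum())
    # Reuse the original first character so its case is preserved.
    return text[:1] + cleaned[1:]

print(to_camel_case("This-is_my_test_string,to-capitalize"))  # ThisIsMyTestStringToCapitalize
```

The filter lives inside a generator expression, which is the place where a trailing "if" without "else" is valid.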

Trying to remove all punctuation characters from a string, but I keep getting // left in

I am trying to write a function to remove all punctuation characters from a string. I've tried several permutations on translate, replace, strip, etc. My latest attempt uses a brute force approach:
def clean_lower(sample):
    punct = list(string.punctuation)
    for c in punct:
        sample.replace(c, ' ')
    return sample.split()
That gets rid of almost all of the punctuation but I'm left with // in front of one of the words. I can't seem to find any way to remove it. I've even tried explicitly replacing it with sample.replace('//', ' ').
What do I need to do?
Using translate is the fastest way to remove punctuation, and it will remove // too:
import string
s = "This is! a string, with. punctuations? //"
def clean_lower(s):
    return s.translate(str.maketrans('', '', string.punctuation))
s = clean_lower(s)
print(s)
Use regular expressions:
import re
def clean_lower(s):
    return re.sub(r'\W', '', s)
The function above removes every character that is not a letter, a digit, or an underscore, including spaces.
Perhaps you should approach it from the perspective of what you want to keep:
For example:
import string
toKeep = set(string.ascii_letters + string.digits + " ")
toRemove = set(string.printable) - toKeep
cleanUp = str.maketrans('', '', "".join(toRemove))
Usage:
s = "Hello! world of / and dice".translate(cleanUp)
# s will be 'Hello world of  and dice' (the two spaces around the removed / remain)
As suggested by @jasonharper, you need to reassign sample, because str.replace returns a new string instead of changing the original. Then it should work:
import string
sample='// Hello?) // World!'
print(sample)
punct=list(string.punctuation)
for c in punct:
    sample=sample.replace(c,'')
print(sample.split())
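The question replaced punctuation with spaces rather than deleting it, which keeps words like "a/b" apart. A minimal sketch of that behavior with a single translate call follows. The lowercasing is only an assumption based on the function name clean_lower, and the table name PUNCT_TO_SPACE is made up:

```python
import string

# Translation table mapping every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, ' ' * len(string.punctuation))

def clean_lower(sample):
    # lower() and translate() each return a new string; sample itself is untouched.
    return sample.lower().translate(PUNCT_TO_SPACE).split()

print(clean_lower('// Hello?) // World!'))  # ['hello', 'world']
```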

Remove special character and apostrophe and unwanted space in string using re.sub function in python

import re
def removePunctuation(text):
    return re.sub(r'[ \W,_,]+', ' ', text.lower()).lstrip()
print(removePunctuation(" Hi, 'you!'"))
print(removePunctuation(" No's under_score!"))
I want this result:
hi you
nos under score
You may try this:
def removePunctuation(text):
    return re.sub(r'^\s+|\s+$|[^A-Za-z\d\s]', '', text.lower())
Or, since it seems you want to replace every underscore with a space and all other special characters with an empty string:
>>> re.sub(r'^\s+|\s+$|[^A-Za-z\d\s]', '', " No's under_score!".lower().replace('_', ' '))
'nos under score'
>>> re.sub(r'^\s+|\s+$|[^A-Za-z\d\s]', '', " Hi, 'you!'".lower().replace('_', ' '))
'hi you'
Regex is a wonderful string manipulation tool, but in Python it can at times be overkill, and this particular example is one such case.
Python has well-thought-out, neatly crafted string methods that can do wonders without regex, and for this example str.translate and unicode.translate are ideal.
For Python 2.X
def removePunctuation(text):
    from string import punctuation
    return ' '.join(text.translate(None, punctuation).split())
For Unicode and Python 3.X
def removePunctuationU(text):
    from string import punctuation
    return u' '.join(text.translate({ord(c): None for c in punctuation}).split())
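Putting the pieces of this thread together, here is a small Python 3 sketch that produces the two results asked for in the question. It assumes underscores should become spaces and everything else that is not an ASCII letter, a digit, or whitespace should simply be dropped. The name remove_punctuation is hypothetical:

```python
import re

def remove_punctuation(text):
    # Lowercase first, and turn underscores into word separators.
    text = text.lower().replace('_', ' ')
    # Drop everything except lowercase letters, digits and whitespace.
    text = re.sub(r'[^a-z\d\s]', '', text)
    # Collapse whitespace runs and trim both ends.
    return ' '.join(text.split())

print(remove_punctuation(" Hi, 'you!'"))        # hi you
print(remove_punctuation(" No's under_score!"))  # nos under score
```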

Remove characters from beginning and end or only end of line

I want to remove some symbols from a string using a regular expression, for example:
== (that occur both at the beginning and at the end of a line),
* (at the beginning of a line ONLY).
def some_func():
    clean = re.sub(r'= {2,}', '', clean) #Removes 2 or more occurrences of = at the beg and at the end of a line.
    clean = re.sub(r'^\* {1,}', '', clean) #Removes 1 or more occurrences of * at the beginning of a line.
What's wrong with my code? It seems like the expressions are wrong. How do I remove a character/symbol if it's at the beginning or at the end of the line (with one or more occurrences)?
If you only want to remove characters from the beginning and the end, you could use the str.strip() method. That gives code like this:
>>> s1 = '== foo bar =='
>>> s1.strip('=')
' foo bar '
>>> s2 = '* foo bar'
>>> s2.lstrip('*')
' foo bar'
The strip method removes the characters given in the argument from the beginning and the end of the string, lstrip removes them only from the beginning, and rstrip removes them only from the end.
If you really want to use a regular expression, they would look something like this:
clean = re.sub(r'(^={2,})|(={2,}$)', '', clean)
clean = re.sub(r'^\*+', '', clean)
But IMHO, using strip/lstrip/rstrip would be the most appropriate for what you want to do.
Edit: On Nick's suggestion, here is a solution that would do all this in one line:
clean = clean.lstrip('*').strip('= ')
(A common mistake is to think that these methods remove characters in the order they're given in the argument. In fact, the argument is just a set of characters to remove, whatever their order. That's why .strip('= ') removes every '=' and ' ' from the beginning and the end, and not just the string '= '.)
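The character-set behavior described above is easy to check in a few lines. The sample strings here are invented for the demonstration:

```python
# strip() treats its argument as a set of characters, not as a substring,
# so the order of the characters in the argument does not matter.
a = '=-=foo=-='.strip('-=')
b = '=-=foo=-='.strip('=-')

# Chaining as in the one-line solution: leading '*' first, then '=' and ' '.
c = '* == title =='.lstrip('*').strip('= ')

print(a, b, c)  # foo foo title
```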
You have extra spaces in your regexes. Even a space counts as a character.
r'^(?:\*|==)|==$'
First of all, pay attention to the spaces before "{": they are meaningful, so the quantifier in your example applies to the space.
To remove "=" (two or more) only at the beginning or end, you also need a different regexp, for example:
clean = re.sub(r'^(==+)?(.*?)(==+)?$', r'\2', s)
If you don't use either "^" or "$", the expression can match anywhere (i.e. even in the middle of the string).
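Since the question talks about lines, it may help to see the anchors applied per line. A minimal sketch, assuming the text contains several lines and that re.MULTILINE is wanted so "^" and "$" match at every line boundary. The function name clean_lines and the sample text are made up:

```python
import re

def clean_lines(text):
    # Remove runs of two or more '=' at the start or end of each line.
    text = re.sub(r'^={2,}|={2,}$', '', text, flags=re.MULTILINE)
    # Remove one or more '*' at the start of each line only.
    return re.sub(r'^\*+', '', text, flags=re.MULTILINE)

sample = "== Title ==\n* item\na == b"
print(repr(clean_lines(sample)))  # ' Title \n item\na == b'
```

The "==" in the middle of the last line is left alone because neither anchor matches there.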
And what about keeping the wanted part instead of substituting?
tu = ('======constellation==' , '==constant=====' ,
      '=flower===' , '===bingo=' ,
      '***seashore***' , '*winter*' ,
      '====***conditions=**' , '=***trees====***' ,
      '***=information***=' , '*=informative***==' )
import re
RE = r'((===*)|\**)?(([^=]|=(?!=+\Z))+)'
pat = re.compile(RE)
for ch in tu:
    print(ch, ' ', pat.match(ch).group(3))
Result:
======constellation== constellation
==constant===== constant
=flower=== =flower
===bingo= bingo=
***seashore*** seashore***
*winter* winter*
====***conditions=** ***conditions=**
=***trees====*** =***trees====***
***=information***= =information***=
*=informative***== =informative***
Do you in fact want the following at the beginning?
====***conditions=** to give conditions=**
***====hundred====*** to give hundred====***
I think that the following code will do the job:
tu = ('======constellation==' , '==constant=====' ,
'=flower===' , '===bingo=' ,
'***seashore***' , '*winter*' ,
'====***conditions=**' , '=***trees====***' ,
'***=information***=' , '*=informative***==' )
import re
with open('testu.txt', 'w', encoding='utf-8') as f:
    pat = re.compile(r'(?:==+|\*+)?(.*?)(?:==+)?\Z')
    xam = max(map(len,tu)) + 3
    res = '\n'.join(ch.ljust(xam) + pat.match(ch).group(1)
                    for ch in tu)
    f.write(res)
    print(res)
Where was my brain when I wrote the regex in my earlier post?
The non-greedy quantifier .*? before (?:==+)?\Z is the real solution.
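To see what the non-greedy group buys you, the pattern can be wrapped in a small helper and tried on a few of the test strings. The helper name strip_markers is hypothetical, and the pattern is the one from the code above:

```python
import re

# Optional leading '==...' or '*...', a lazily matched body, and an
# optional trailing '==...' that must end exactly at the end of the string.
PAT = re.compile(r'(?:==+|\*+)?(.*?)(?:==+)?\Z')

def strip_markers(line):
    # The lazy (.*?) grows one character at a time until the rest of the
    # pattern can match, so a trailing '==' run is never swallowed by it.
    return PAT.match(line).group(1)

for sample in ('======constellation==', '*winter*', '=flower==='):
    print(sample.ljust(24) + strip_markers(sample))
```

A single leading "=" stays in place because the prefix alternative needs at least two of them.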

A pythonic way to insert a space before capital letters

I've got a file whose format I'm altering via a python script. I have several camel cased strings in this file where I just want to insert a single space before the capital letter - so "WordWordWord" becomes "Word Word Word".
My limited regex experience just stalled out on me - can someone think of a decent regex to do this, or (better yet) is there a more pythonic way to do this that I'm missing?
You could try:
>>> re.sub(r"(\w)([A-Z])", r"\1 \2", "WordWordWord")
'Word Word Word'
If there are consecutive capitals, then Greg's result may not be what you are looking for, since the \w consumes the character in front of the capital letter to be replaced.
>>> re.sub(r"(\w)([A-Z])", r"\1 \2", "WordWordWWWWWWWord")
'Word Word WW WW WW Word'
A look-behind would solve this:
>>> re.sub(r"(?<=\w)([A-Z])", r" \1", "WordWordWWWWWWWord")
'Word Word W W W W W W Word'
Perhaps shorter:
>>> re.sub(r"\B([A-Z])", r" \1", "DoIThinkThisIsABetterAnswer?")
'Do I Think This Is A Better Answer?'
Have a look at my answer on .NET - How can you split a “caps” delimited string into an array?
Edit: Maybe better to include it here.
re.sub(r'([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))', r'\1 ', text)
For example:
"SimpleHTTPServer" => ["Simple", "HTTP", "Server"]
Maybe you would be interested in one-liner implementation without using regexp:
''.join(' ' + char if char.isupper() else char.strip() for char in text).strip()
With regexes you can do this:
re.sub('([A-Z])', r' \1', text)
Of course, that will only work for ASCII characters; if you want to handle Unicode, it's a whole new can of worms :-)
If you have acronyms, you probably do not want spaces inside them. This two-stage regex keeps acronyms intact (and also treats punctuation and any other character that is not an uppercase letter or a space as something to add a space after):
re_outer = re.compile(r'([^A-Z ])([A-Z])')
re_inner = re.compile(r'(?<!^)([A-Z])([^A-Z])')
re_outer.sub(r'\1 \2', re_inner.sub(r' \1\2', 'DaveIsAFKRightNow!Cool'))
The output will be: 'Dave Is AFK Right Now! Cool'
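For reuse, the two passes can be wrapped in one function. This is only a sketch around the same two patterns. The function name space_camel_case and the extra sample inputs are assumptions added for illustration:

```python
import re

RE_OUTER = re.compile(r'([^A-Z ])([A-Z])')
RE_INNER = re.compile(r'(?<!^)([A-Z])([^A-Z])')

def space_camel_case(text):
    # Inner pass: add a space before a capital that is followed by a
    # non-capital (the start of a new word), except at the very start.
    inner = RE_INNER.sub(r' \1\2', text)
    # Outer pass: add a space between a character that is neither a
    # capital nor a space and the capital that follows it.
    return RE_OUTER.sub(r'\1 \2', inner)

print(space_camel_case('DaveIsAFKRightNow!Cool'))  # Dave Is AFK Right Now! Cool
print(space_camel_case('SimpleHTTPServer'))        # Simple HTTP Server
```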
I agree that the regex solution is the easiest, but I wouldn't say it's the most pythonic.
How about:
text = 'WordWordWord'
new_text = ''
for i, letter in enumerate(text):
    if i and letter.isupper():
        new_text += ' '
    new_text += letter
I think regexes are the way to go here, but just to give a pure python version without (hopefully) any of the problems ΤΖΩΤΖΙΟΥ has pointed out:
def splitCaps(s):
    result = []
    for ch, next in window(s+" ", 2):
        result.append(ch)
        if next.isupper() and not ch.isspace():
            result.append(' ')
    return ''.join(result)
window() is a utility function I use to operate on a sliding window of items, defined as:
import collections, itertools
def window(it, winsize, step=1):
    it = iter(it) # Ensure we have an iterator
    l = collections.deque(itertools.islice(it, winsize))
    while True:
        yield tuple(l)
        for i in range(step):
            try:
                l.append(next(it))
            except StopIteration:
                return # Input exhausted: end the generator cleanly (required since Python 3.7)
            l.popleft()
Reviving this old thread: I wanted to try another option for one of my own requirements. re.sub() is the cool solution, but here is a one-liner for when the re module isn't (or shouldn't be) imported.
st = 'ThisIsTextStringToSplitWithSpace'
print(''.join([' '+ s if s.isupper() else s for s in st]).lstrip())
