python - how to check if string contains whitespace around it

How can I know if
word = " a "
has a whitespace around it (which it does here, but word is dynamic) and then, iff it does, then strip() it?

There is no point in checking before stripping. Just use str.strip(); it is safe to do so if there is no whitespace around the text:
word.strip()
If you really need to test, you could use str.startswith() and str.endswith() with tuples:
whitespace = tuple(' \n\r\t')
word.startswith(whitespace) or word.endswith(whitespace)
is true if there is whitespace at the start or end.
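A minimal sketch of both points, assuming a made-up helper name (has_outer_whitespace is only for illustration): strip() returns the string unchanged when there is nothing to remove, and the tuple test reports whether either end is one of the listed whitespace characters.

```python
# Hypothetical helper for illustration; not part of the standard library.
WHITESPACE = tuple(' \n\r\t')

def has_outer_whitespace(word):
    """Return True if word starts or ends with a listed whitespace character."""
    return word.startswith(WHITESPACE) or word.endswith(WHITESPACE)

for word in [" a ", "a", "a\n", ""]:
    # strip() is safe on every input, padded or not
    print(repr(word), has_outer_whitespace(word), repr(word.strip()))
```

Since strip() is a no-op on unpadded strings, the check is only useful when you need to know whether padding was present.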

Can you not just strip() all words, will you not get the same result?

Compare the length before and after?
def strip_if_padded(word):  # wrapped in a function so the return statements are valid
    stripped = word.strip()
    if len(word) != len(stripped):
        return stripped
    else:
        return word

That may sound pretty silly, but as you're talking about only one space, why not do like this:
>>> a=" a "
>>> a[0]==' ' and a[-1]==' '
True
And then
if a and a[0]==' ' and a[-1]==' ':
    pass  # do the stuff you want to do

I would look into the startswith and endswith methods:
http://www.tutorialspoint.com/python/string_startswith.htm
http://www.tutorialspoint.com/python/string_endswith.htm
if word.startswith(' ') and word.endswith(' '):
    ...

Related

Trying to remove all punctuation characters from a string but everything I keep getting // left in

I am trying to write a function to remove all punctuation characters from a string. I've tried several permutations on translate, replace, strip, etc. My latest attempt uses a brute force approach:
def clean_lower(sample):
    punct = list(string.punctuation)
    for c in punct:
        sample.replace(c, ' ')
    return sample.split()
That gets rid of almost all of the punctuation but I'm left with // in front of one of the words. I can't seem to find any way to remove it. I've even tried explicitly replacing it with sample.replace('//', ' ').
What do I need to do?
Using translate is the fastest way to remove punctuation; this will remove // too:
import string
s = "This is! a string, with. punctuations? //"
def clean_lower(s):
    return s.translate(str.maketrans('', '', string.punctuation))
s = clean_lower(s)
print(s)
Use regular expressions:
import re
def clean_lower(s):
    return re.sub(r'\W', '', s)
The function above removes every character that is not a letter, digit, or underscore. Note that this includes whitespace.
Perhaps you should approach it from the perspective of what you want to keep:
For example:
import string
toKeep = set(string.ascii_letters + string.digits + " ")
toRemove = set(string.printable) - toKeep
cleanUp = str.maketrans('', '', "".join(toRemove))
usage:
s = "Hello! world of / and dice".translate(cleanUp)
# s will be 'Hello world of  and dice' (two spaces remain where the "/" was removed)
As suggested by @jasonharper, you need to reassign sample, because str.replace() returns a new string instead of modifying the original. Then it should work:
import string
sample='// Hello?) // World!'
print(sample)
punct=list(string.punctuation)
for c in punct:
    sample=sample.replace(c,'')
print(sample.split())
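To illustrate why the original loop left the string untouched, here is a small sketch (the variable names ignored and cleaned are made up for this example). Python strings are immutable, so str.replace() can only hand back a new string.

```python
import string

sample = '// Hello?) // World!'

# str.replace() returns a new string; sample itself is not modified.
ignored = sample.replace('/', '')
print(sample)  # the slashes are still there

# Reassigning on each pass keeps the intermediate result.
cleaned = sample
for c in string.punctuation:
    cleaned = cleaned.replace(c, ' ')
print(cleaned.split())
```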

Removing White Space in a Palindrome String

I'm trying to modify the code to ignore white spaces in a Palindrome String. For example, the code should mark Do Geese See God as Palindrome. I've been trying to use .replace(" ", ""), but either an error pops up or the Palindrome is returned as False.
stk = Stack()
for i in range(len(sentence)):
    stk.push(sentence[i])
for i in range(stk.size()):
    stk.replace(" ","")
    if sentence[i] != stk.pop():
        return False;
return True;
I'm trying to not use stk.item[-1] or stk.item == stk.item[::-1] in Stack, by the way.
I would recommend just stripping all whitespace with a RegEx:
import re
text = " A man a plan a canal Panama "
text = re.sub(r'\s+', '', text)
This would also handle the cases of leading and trailing whitespace, which you probably also want to ignore.
It may be a cut and paste error but you don't actually have a space in what you are replacing.
stk.replace(' ', '')
I've modified the code to make this work:
def isPalindrome(sentence):
    sentence = sentence.replace(' ', '')  # strip the spaces from the string, not from the stack
    stk = Stack()
    for i in range(len(sentence)):
        stk.push(sentence[i])
    for i in range(stk.size()):
        if sentence[i] != stk.pop():
            return False
    return True
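For anyone who wants to run this end to end, here is a self-contained sketch. The Stack class below is a stand-in I am assuming (push, pop, size), since the asker's class is not shown, and the lower() call is my own addition: without it "Do Geese See God" fails because "D" and "d" differ.

```python
class Stack:
    """Minimal stand-in for the asker's Stack (assumed interface: push, pop, size)."""
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        return self.items.pop()

    def size(self):
        return len(self.items)


def is_palindrome(sentence):
    # Drop all whitespace and ignore case before comparing.
    cleaned = ''.join(sentence.split()).lower()
    stk = Stack()
    for ch in cleaned:
        stk.push(ch)
    # Popping yields the characters in reverse order.
    for ch in cleaned:
        if ch != stk.pop():
            return False
    return True


print(is_palindrome("Do Geese See God"))
print(is_palindrome("Not a palindrome"))
```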

Having trouble adding a space after a period in a python string

I have to write a code to do 2 things:
Compress more than one occurrence of the space character into one.
Add a space after a period, if there isn't one.
For example:
input> This is weird.Indeed
output>This is weird. Indeed.
This is the code I wrote:
def correction(string):
    list=[]
    for i in string:
        if i!=" ":
            list.append(i)
        elif i==" ":
            k=i+1
            if k==" ":
                k=""
            list.append(i)
    s=' '.join(list)
    return s
strn=input("Enter the string: ").split()
print (correction(strn))
This code takes any input from the user and removes all the extra spaces, but it does not add the space after the period. (I know why: because of the split function, the period and the next word are treated as one word. I just can't figure out how to fix it.)
This is a code I found online:
import re
def correction2(string):
    corstr = re.sub(r'\ +',' ',string)
    final = re.sub(r'\.','. ',corstr)
    return final
strn= ("This is as .Indeed")
print (correction2(strn))
The problem with this code is I can't take any input from the user. It is predefined in the program.
So can anyone suggest how to improve any of the two codes to do both the functions on ANY input by the user?
Is this what you desire?
import re
def corr(s):
    return re.sub(r'\.(?! )', '. ', re.sub(r' +', ' ', s))
s = input("> ")
print(corr(s))
I've changed the regex to use a negative lookahead pattern.
Edit: explain Regex as requested in comment
re.sub() takes (at least) three arguments: The Regex search pattern, the replacement the matched pattern should be replaced with, and the string in which the replacement should be done.
What I'm doing here is two steps at once, I've been using the output of one function as input of another.
First, the inner re.sub(r' +', ' ', s) searches for multiple spaces (r' +') in s to replace them with single spaces. Then the outer re.sub(r'\.(?! )', '. ', ...) looks for periods without following space character to replace them with '. '. I'm using a negative lookahead pattern to match only sections, that don't match the specified lookahead pattern (a normal space character in this case). You may want to play around with this pattern, this may help understanding it better.
The r string prefix changes the string to a raw string where backslash-escaping is disabled. Unnecessary in this case, but it's a habit of mine to use raw strings with regular expressions.
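The two steps can also be run separately to see what each one contributes. This is only a sketch; the sample text and the names collapsed and fixed are made up for illustration.

```python
import re

text = "This   is  weird.Indeed. Yes"

# Step 1: collapse each run of spaces into a single space.
collapsed = re.sub(r' +', ' ', text)

# Step 2: add a space after any period that is not already followed by one.
# (?! ) is a negative lookahead: it inspects the next character without consuming it.
fixed = re.sub(r'\.(?! )', '. ', collapsed)

print(collapsed)
print(fixed)
```

One thing to be aware of: a period at the very end of the string is also "not followed by a space", so it gets a trailing space appended.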
For a more basic answer, without regex:
>>> def remove_doublespace(string):
...     if '  ' not in string:
...         return string
...     return remove_doublespace(string.replace('  ', ' '))
...
>>> remove_doublespace('hi there how are you.i am fine. '.replace('.', '. '))
'hi there how are you. i am fine. '
You try the following code:
>>> import re
>>> s = 'This is weird.Indeed'
>>> def correction(s):
...     res = re.sub(r'\s+$', '', re.sub(r'\s+', ' ', re.sub(r'\.', '. ', s)))
...     if res[-1] != '.':
...         res += '.'
...     return res
...
>>> print(correction(s))
This is weird. Indeed.
>>> s = input()
hee ss.dk
>>> s
'hee ss.dk'
>>> correction(s)
'hee ss. dk.'

Removing \n from myFile

I am trying to create a dictionary of lists where each key is the sorted letters of an anagram and the value (a list) contains all the possible words that can be formed from those letters.
So my dict should contain something like this
{'aaelnprt': ['parental', 'paternal', 'prenatal'], 'ailrv': ['rival']}
The possible words are inside a .txt file, where every word is separated by a newline. Example:
Sad
Dad
Fruit
Pizza
Which leads to a problem when I try to code it.
with open ("word_list.txt") as myFile:
    for word in myFile:
        if word[0] == "v": ##Interested in only word starting with "v"
            word_sorted = ''.join(sorted(word)) ##Get the anagram
            for keys in list(dictonary.keys()):
                if keys == word_sorted: ##Heres the problem, it doesn't get inside here as theres extra characters in <word_sorted> possible "\n" due to the linebreak of myfi
                    print(word_sorted)
                    dictonary[word_sorted].append(word)
If every word in "word_list.txt" is followed by '\n' then you can just use slicing to get rid of the last char of the word.
word_sorted = ''.join(sorted(word[:-1]))
But if the last word in "word_list.txt" isn't followed by '\n', then you should use rstrip().
word_sorted = ''.join(sorted(word.rstrip()))
The slice method is slightly more efficient, but for this application I doubt you'll notice the difference, so you might as well just play safe & use rstrip().
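A quick sketch of the difference, using a made-up list of lines in which the last one has no trailing newline:

```python
# Lines as they would come from iterating over a file; the last has no "\n".
lines = ["Sad\n", "Dad\n", "Pizza"]

sliced = [w[:-1] for w in lines]        # blindly drops the last character
stripped = [w.rstrip() for w in lines]  # drops trailing whitespace only

print(sliced)
print(stripped)
```

Slicing cuts the final letter off "Pizza", while rstrip() leaves it intact.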
Use rstrip(), it removes the \n character.
...
...
keys == word_sorted.rstrip()
...
You should try the .rstrip() method in your code; it will remove the "\n".
strip only removes characters from the beginning or end of a string.
Use rstrip() to remove \n character
Also you can use replace syntax, to replace newline with something else.
str2 = str.replace("\n", "")
So, I see a few problems here, how is anything getting into the dictionary, I see no assignments? Obviously you've only provided us a snippet, so maybe that's elsewhere.
You're also using a loop when you could be using in (it's more efficient, truly it is).
dictionary = {}
with open("word_list.txt") as myFile:
    for word in myFile:
        word = word.rstrip()  # drop the trailing newline before doing anything else
        if word.startswith("v"):  ## Interested only in words starting with "v"
            word_sorted = ''.join(sorted(word))  ## Get the anagram
            if word_sorted in dictionary:
                print(word_sorted)
                dictionary[word_sorted].append(word)
            else:
                # The case where we don't find an anagram in our dict
                dictionary[word_sorted] = [word]
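As a possible variation, collections.defaultdict removes the need for the if/else branch. This is a sketch under assumptions: io.StringIO stands in for the real word_list.txt, and the words are invented for the example.

```python
import io
from collections import defaultdict

# Stand-in for open("word_list.txt"); the contents are made up for illustration.
fake_file = io.StringIO("vile\nevil\nveil\nlive\nrival\nviral\n")

anagrams = defaultdict(list)
for line in fake_file:
    word = line.rstrip()              # remove the trailing newline
    if word.startswith("v"):          # only words starting with "v"
        key = ''.join(sorted(word))   # sorted letters identify the anagram group
        anagrams[key].append(word)    # missing keys start as an empty list

print(dict(anagrams))
```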

A pythonic way to insert a space before capital letters

I've got a file whose format I'm altering via a python script. I have several camel cased strings in this file where I just want to insert a single space before the capital letter - so "WordWordWord" becomes "Word Word Word".
My limited regex experience just stalled out on me - can someone think of a decent regex to do this, or (better yet) is there a more pythonic way to do this that I'm missing?
You could try:
>>> re.sub(r"(\w)([A-Z])", r"\1 \2", "WordWordWord")
'Word Word Word'
If there are consecutive capitals, then Greg's result might
not be what you are looking for, since the \w consumes the character
in front of the capital letter to be replaced.
>>> re.sub(r"(\w)([A-Z])", r"\1 \2", "WordWordWWWWWWWord")
'Word Word WW WW WW Word'
A look-behind would solve this:
>>> re.sub(r"(?<=\w)([A-Z])", r" \1", "WordWordWWWWWWWord")
'Word Word W W W W W W Word'
Perhaps shorter:
>>> re.sub(r"\B([A-Z])", r" \1", "DoIThinkThisIsABetterAnswer?")
Have a look at my answer on .NET - How can you split a “caps” delimited string into an array?
Edit: Maybe better to include it here.
re.sub(r'([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))', r'\1 ', text)
For example:
"SimpleHTTPServer" => "Simple HTTP Server"
Maybe you would be interested in one-liner implementation without using regexp:
''.join(' ' + char if char.isupper() else char.strip() for char in text).strip()
With regexes you can do this:
re.sub('([A-Z])', r' \1', str)
Of course, that will only work for ASCII characters, if you want to do Unicode it's a whole new can of worms :-)
If you have acronyms, you probably do not want spaces between them. This two-stage regex will keep acronyms intact (and also treat punctuation and other non-uppercase letters as something to add a space on):
re_outer = re.compile(r'([^A-Z ])([A-Z])')
re_inner = re.compile(r'(?<!^)([A-Z])([^A-Z])')
re_outer.sub(r'\1 \2', re_inner.sub(r' \1\2', 'DaveIsAFKRightNow!Cool'))
The output will be: 'Dave Is AFK Right Now! Cool'
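To compare how two of the single-pattern regexes from this thread treat acronyms, here is a small sketch. The function names lookbehind_split and acronym_aware_split are made up for illustration; the patterns themselves are the ones quoted in the answers above.

```python
import re

def lookbehind_split(text):
    # Insert a space before every capital that follows a word character.
    return re.sub(r"(?<=\w)([A-Z])", r" \1", text)

def acronym_aware_split(text):
    # Insert a space after a lowercase letter that precedes a capital, and
    # after a capital that precedes a capital-plus-lowercase pair.
    return re.sub(r"([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))", r"\1 ", text)

for sample in ["WordWordWord", "SimpleHTTPServer"]:
    print(sample, "->", lookbehind_split(sample), "|", acronym_aware_split(sample))
```

Both agree on plain camel case; they differ only when several capitals appear in a row.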
I agree that the regex solution is the easiest, but I wouldn't say it's the most pythonic.
How about:
text = 'WordWordWord'
new_text = ''
for i, letter in enumerate(text):
    if i and letter.isupper():
        new_text += ' '
    new_text += letter
I think regexes are the way to go here, but just to give a pure python version without (hopefully) any of the problems ΤΖΩΤΖΙΟΥ has pointed out:
def splitCaps(s):
    result = []
    for ch, next in window(s+" ", 2):
        result.append(ch)
        if next.isupper() and not ch.isspace():
            result.append(' ')
    return ''.join(result)
window() is a utility function I use to operate on a sliding window of items, defined as:
import collections, itertools
def window(it, winsize, step=1):
    it = iter(it)  # Ensure we have an iterator
    l = collections.deque(itertools.islice(it, winsize))
    while True:
        yield tuple(l)
        for i in range(step):
            try:
                l.append(next(it))  # Python 3: next(it) instead of it.next()
            except StopIteration:
                return  # Input exhausted; end the generator cleanly
            l.popleft()
To the old thread - wanted to try an option for one of my requirements. Of course the re.sub() is the cool solution, but also got a 1 liner if re module isn't (or shouldn't be) imported.
st = 'ThisIsTextStringToSplitWithSpace'
print(''.join([' '+ s if s.isupper() else s for s in st]).lstrip())
