I am trying to remove special characters from each word in a string. The code below counts the words, but I can't get .isalpha() to remove the non-alphabetic characters. Can anyone help? Thank you in advance.
input = 'Hello, Goodbye hello hello! bye byebye hello?'
word_list = input.split()
for word in word_list:
    if word.isalpha()==False:
        word[:-1]
di = dict()
for word in word_list:
    di[word] = di.get(word,0)+1
di
It seems you are expecting word[:-1] to remove the last character of word and have that change reflected in the list word_list. However, strings are immutable: word[:-1] builds a new string, which is immediately discarded because it is never assigned anywhere, so neither word nor the list changes.
A simple fix is to create a new list and append values to it. Note also that your original string is called input, which shadows the builtin input() function; that is best avoided:
input_string = 'Hello, Goodbye hello hello! bye byebye hello?'
word_list = input_string.split()
new = []
for word in word_list:
    if word.isalpha() == False:
        new.append(word[:-1])
    else:
        new.append(word)
di = dict()
for word in new:
    di[word] = di.get(word,0)+1
print(di)
# {'byebye': 1, 'bye': 1, 'Hello': 1, 'Goodbye': 1, 'hello': 3}
You could also remove the second for loop and use collections.Counter instead:
from collections import Counter
print(Counter(new))
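Slicing with word[:-1] only drops a single trailing character. A minimal sketch, assuming the unwanted characters are ASCII punctuation at either end of a word, strips them with str.strip(string.punctuation) before counting. The name count_words is made up for illustration:

```python
import string
from collections import Counter


def count_words(text):
    """Count words after stripping ASCII punctuation from both ends of each."""
    cleaned = (word.strip(string.punctuation) for word in text.split())
    # Skip tokens that were nothing but punctuation (they strip to '').
    return Counter(word for word in cleaned if word)


counts = count_words('Hello, Goodbye hello hello! bye byebye hello?')
print(counts)
```

Punctuation in the middle of a word (for example an apostrophe) is left alone by this approach.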
You are nearly there with your for loop. The main stumbling block seems to be that word[:-1] on its own does nothing; you need to store the result somewhere, for example by appending it to a list.
You also need to specify what happens to strings which don't need modifying. I'm also not sure what purpose the dictionary serves.
So here's your for loop re-written:
mystring = 'Hello, Goodbye hello hello! bye byebye hello?'
word_list = mystring.split()
res = []
for word in word_list:
    if not word.isalpha():
        res.append(word[:-1])
    else:
        res.append(word)
mystring_out = ' '.join(res) # 'Hello Goodbye hello hello bye byebye hello'
The idiomatic way to write the above is via feeding a list comprehension to str.join:
mystring_out = ' '.join([word[:-1] if not word.isalpha() else word
                         for word in mystring.split()])
It goes without saying that this assumes word.isalpha() returns False due to an unwanted character at the end of a string, and that this is the only scenario you want to consider for special characters.
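If that assumption does not hold, one option is to drop every non-alphabetic character wherever it occurs. This is only a sketch: it assumes that losing characters inside a word (such as the apostrophe in "don't") is acceptable, and keep_letters is a hypothetical helper name:

```python
def keep_letters(text):
    """Remove every non-alphabetic character from each whitespace-separated word."""
    words = (''.join(ch for ch in word if ch.isalpha()) for word in text.split())
    # Drop words that contained no letters at all.
    return ' '.join(word for word in words if word)


print(keep_letters('Hello, (Goodbye) hel-lo hello!! 42 bye'))
```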
One solution using re:
In [1]: import re
In [2]: a = 'Hello, Goodbye hello hello! bye byebye hello?'
In [3]: ' '.join([i for i in re.split(r'[^A-Za-z]', a) if i])
Out[3]: 'Hello Goodbye hello hello bye byebye hello'
Say I have strings,
string1 = 'Hello how are you'
string2 = 'are you doing now?'
The result should be something like
Hello how are you doing now?
I was thinking different ways using re and string search.
(Longest common substring problem)
But is there any simple way (or library) that does this in python?
To make things clear, I'll add one more set of test strings:
string1 = 'This is a nice ACADEMY'
string2 = 'DEMY you know!'
The result would be:
'This is a nice ACADEMY you know!'
This should do:
string1 = 'Hello how are you'
string2 = 'are you doing now?'
i = 0
while not string2.startswith(string1[i:]):
    i += 1
sFinal = string1[:i] + string2
OUTPUT :
>>> sFinal
'Hello how are you doing now?'
or, make it a function so that you can use it again without rewriting:
def merge(s1, s2):
    i = 0
    while not s2.startswith(s1[i:]):
        i += 1
    return s1[:i] + s2
OUTPUT :
>>> merge('Hello how are you', 'are you doing now?')
'Hello how are you doing now?'
>>> merge("This is a nice ACADEMY", "DEMY you know!")
'This is a nice ACADEMY you know!'
This should do what you want:
def overlap_concat(s1, s2):
    l = min(len(s1), len(s2))
    for i in range(l, 0, -1):
        if s1.endswith(s2[:i]):
            return s1 + s2[i:]
    return s1 + s2
Examples:
>>> overlap_concat("Hello how are you", "are you doing now?")
'Hello how are you doing now?'
>>>
>>> overlap_concat("This is a nice ACADEMY", "DEMY you know!")
'This is a nice ACADEMY you know!'
>>>
Using str.endswith and enumerate:
def overlap(string1, string2):
    for i, s in enumerate(string2, 1):
        if string1.endswith(string2[:i]):
            break
    return string1 + string2[i:]
>>> overlap("Hello how are you", "are you doing now?")
'Hello how are you doing now?'
>>> overlap("This is a nice ACADEMY", "DEMY you know!")
'This is a nice ACADEMY you know!'
To account for trailing special characters, you could use an re-based substitution:
import re
string1 = re.sub(r'[^\w\s]', '', string1)
Note, however, that this removes all special characters from the first string, not only trailing ones.
A modification to the above function that finds the longest matching substring (instead of the shortest) involves traversing string2 in reverse.
def overlap(string1, string2):
    for i in range(len(string2)):
        if string1.endswith(string2[:len(string2) - i]):
            break
    return string1 + string2[len(string2) - i:]
>>> overlap('Where did', 'did you go?')
'Where did you go?'
The other answers were great, but they failed for this input:
string1 = 'THE ACADEMY has'
string2= '.CADEMY has taken'
output:
>>> merge(string1,string2)
'THE ACADEMY has.CADEMY has taken'
>>> overlap(string1,string2)
'THE ACADEMY has'
However, the standard library module difflib proved effective in my case:
from difflib import SequenceMatcher

match = SequenceMatcher(None, string1, string2).find_longest_match(
    0, len(string1), 0, len(string2))
print(match) # -> Match(a=5, b=1, size=10)
print(string1[:match.a + match.size] + string2[match.b + match.size:])
output:
Match(a=5, b=1, size=10)
THE ACADEMY has taken
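For reuse, the same idea can be wrapped in a function. This is a sketch rather than a definitive implementation: merge_on_longest_match is a made-up name, and the fallback to plain concatenation when nothing matches is an assumption about the desired behaviour.

```python
from difflib import SequenceMatcher


def merge_on_longest_match(s1, s2):
    """Join s1 and s2 around their longest common block.

    Text after the block in s1 and before the block in s2 is discarded,
    mirroring the slicing used above.
    """
    matcher = SequenceMatcher(None, s1, s2, autojunk=False)
    match = matcher.find_longest_match(0, len(s1), 0, len(s2))
    if match.size == 0:
        # No shared characters at all: fall back to plain concatenation.
        return s1 + s2
    return s1[:match.a + match.size] + s2[match.b + match.size:]


print(merge_on_longest_match('THE ACADEMY has', '.CADEMY has taken'))
```

Note that the longest common block is not guaranteed to sit at the end of s1 and the start of s2, so this can give surprising results when the strings share a longer run elsewhere.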
The words you want to drop appear in the second string, so you can try something like:
new_string=[string2.split()]
new=[]
new1=[j for item in new_string for j in item if j not in string1]
new1.insert(0,string1)
print(" ".join(new1))
with the first test case:
string1 = 'Hello how are you'
string2 = 'are you doing now?'
output:
Hello how are you doing now?
second test case:
string1 = 'This is a nice ACADEMY'
string2 = 'DEMY you know!'
output:
This is a nice ACADEMY you know!
Explanation :
First, we split the second string so we can find which words to remove or replace:
new_string=[string2.split()]
Second, we check each word of the split string against string1. If a word already appears in string1, we keep only the occurrence in string1 and drop the word from the second string:
new1=[j for item in new_string for j in item if j not in string1]
This list comprehension is the same as:
new1=[]
for item in new_string:
    for j in item:
        if j not in string1:
            new1.append(j)
The last step inserts string1 at the front and joins the list:
new1.insert(0,string1)
print(" ".join(new1))
I tried matching words containing the letters "ab" or "ba", e.g. "ab"olition, f"ab"rics, pro"ba"ble. I came up with the following regular expression:
r"[Aa](?=[Bb])[Bb]|[Bb](?=[Aa])[Aa]"
But it also matches words that start or end with non-alphanumeric characters such as ", (, ), /. How can I exclude those? I just want a list of the matching words.
import sys
import re
word=[]
dict={}
f = open('C:/Python27/brown_half.txt', 'rU')
w = open('C:/Python27/brown_halfout.txt', 'w')
data = f.read()
word = data.split() # word is list
f.close()
for num2 in word:
    match2 = re.findall("\w*(ab|ba)\w*", num2)
    if match2:
        dict[num2] = (dict[num2] + 1) if num2 in dict.keys() else 1
for key2 in sorted(dict.iterkeys()):print "%s: %s" % (key2, dict[key2])
print len(dict.keys())
I don't know how to combine this with the re.compile approach that the first comment suggested.
To match all the words with ab or ba (case insensitive):
import re
text = 'fabh, obar! (Abtt) yybA, kk'
pattern = re.compile(r"(\w*(ab|ba)\w*)", re.IGNORECASE)
# to print all the matches
for match in pattern.finditer(text):
    print(match.group(0))
# to print the first match
print(pattern.search(text).group(0))
https://regex101.com/r/uH3xM9/1
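One thing to watch for with the re.findall call in the question: when a pattern contains a capturing group, findall returns the group contents rather than the whole match. A sketch, assuming a "word" means a run of \w characters, that uses a non-capturing group (?:...) so whole words come back and then counts them with collections.Counter (the sample text is invented):

```python
import re
from collections import Counter

pattern = re.compile(r"\w*(?:ab|ba)\w*", re.IGNORECASE)

text = 'fabh, obar! (Abtt) yybA, kk obar'
words = pattern.findall(text)   # whole words, surrounding punctuation excluded
counts = Counter(word.lower() for word in words)

print(words)
print(counts)
```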
Regular expressions are not the best tool for the job in this case. They'll complicate stuff way too much for such simple circumstances. You can instead use Python's builtin in operator (works for both Python 2 and 3)...
sentence = "There are no probable situations whereby that may happen, or so it seems since the Abolition."
words = [''.join(filter(lambda x: x.isalpha(), token)) for token in sentence.split()]
for word in words:
    word = word.lower()
    if 'ab' in word or 'ba' in word:
        print('Word "{}" matches pattern!'.format(word))
As you can see, 'ab' in word evaluates to True if the string 'ab' is found as-is (that is, exactly) in word, or False otherwise. For example, 'ba' in 'probable' is True and 'ab' in 'Abolition' is False. The second line takes care of splitting the sentence into words and removing any punctuation characters. word = word.lower() makes word lowercase before the comparisons, so that for word = 'Abolition', 'ab' in word is True.
I would do it this way:
Strip unwanted characters from your string using either of the two techniques below:
a - By building a translation dictionary and using the translate method:
>>> import string
>>> del_punc = dict.fromkeys(ord(c) for c in string.punctuation)
>>> s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
>>> s = s.translate(del_punc)
>>> print(s)
abolition fabrics probable test case bank halfback 1ablution
b - using re.sub method:
>>> import string
>>> import re
>>> s = 'abolition, fabrics, probable, test, case, bank;, halfback 1(ablution).'
>>> s = re.sub(r'[%s]' % re.escape(string.punctuation), '', s)
>>> print(s)
abolition fabrics probable test case bank halfback 1ablution
Next will be finding your words containing 'ab' or 'ba':
a - Splitting over whitespaces and finding occurrences of your desired strings, which is the one I recommend to you:
>>> [x for x in s.split() if 'ab' in x.lower() or 'ba' in x.lower()]
['abolition', 'fabrics', 'probable', 'bank', 'halfback', '1ablution']
b -Using re.finditer method:
>>> pat
re.compile('\\b.*?(ab|ba).*?\\b', re.IGNORECASE)
>>> for m in pat.finditer(s):
...     print(m.group())
abolition
fabrics
probable
test case bank
halfback
1ablution
string = "your string here"
lowercase = string.lower()
if 'ab' in lowercase or 'ba' in lowercase:
    print(True)
else:
    print(False)
Try this one
[(),/]*([a-z]|(ba|ab))+[(),/]*
My code so far is below, but since I'm so lost it doesn't do anything close to what I want:
vowels = 'a','e','i','o','u','y'
#Consider 'y' as a vowel
input = input("Enter a sentence: ")
words = input.split()
if vowels == words[0]:
    print(words)
so for an input like this:
"this is a really weird test"
I want it to only print:
this, is, a, test
because they each contain only one vowel.
Try this:
vowels = set(('a','e','i','o','u','y'))
def count_vowels(word):
    return sum(letter in vowels for letter in word)
my_string = "this is a really weird test"
def get_words(my_string):
    for word in my_string.split():
        if count_vowels(word) == 1:
            print(word)
Result:
>>> get_words(my_string)
this
is
a
test
Here's another option:
import re
words = 'This sentence contains a bunch of cool words'
for word in words.split():
    if len(re.findall('[aeiouy]', word)) == 1:
        print(word)
Output:
This
a
bunch
of
words
You can translate all the vowels to a single vowel and count that vowel:
trans = str.maketrans('aeiouy', 'aaaaaa')
strs = 'this is a really weird test'
print([word for word in strs.split() if word.translate(trans).count('a') == 1])
>>> s = "this is a really weird test"
>>> [w for w in s.split() if len(w) - len(w.translate(None, "aeiouy")) == 1]
['this', 'is', 'a', 'test']
Not sure if words with no vowels are required. If so, just replace == 1 with < 2
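The w.translate(None, "aeiouy") call above relies on the Python 2 signature of str.translate. A rough Python 3 equivalent, assuming case-insensitive matching is wanted and 'y' counts as a vowel as in the question, builds a deletion table with the three-argument form of str.maketrans. The name words_with_one_vowel is made up for illustration:

```python
VOWELS = 'aeiouy'  # 'y' is treated as a vowel, as in the question
DELETE_VOWELS = str.maketrans('', '', VOWELS)


def words_with_one_vowel(sentence):
    """Return the words that contain exactly one vowel (case-insensitive)."""
    result = []
    for word in sentence.split():
        lowered = word.lower()
        # Deleting the vowels shortens the word by its vowel count.
        if len(lowered) - len(lowered.translate(DELETE_VOWELS)) == 1:
            result.append(word)
    return result


print(words_with_one_vowel("this is a really weird test"))
```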
You could use a for loop to save the substrings into an array, splitting whenever the next character is a space.
Then, for each substring, check whether it contains exactly one vowel (a, e, i, o, u); if so, add it to another array.
After that, concatenate all the strings in that second array, separated by commas and spaces.
Try this:
vowels = ('a','e','i','o','u','y')
words = [i for i in input('Enter a sentence ').split() if i != '']
interesting = [word for word in words if sum(1 for char in word if char in vowels) == 1]
I found so much nice code here, and I want to show my ugly one:
v = 'aoeuiy'
o = 'oooooo'
sentence = 'i found so much nice code here'
words = sentence.split()
trans = str.maketrans(v,o)
for word in words:
    if not word.translate(trans).count('o') >1:
        print(word)
I find your lack of regex disturbing.
Here's a plain regex only solution (ideone):
import re
text = "this is a really weird test"
words = re.findall(r"\b[^aeiouy\W]*[aeiouy][^aeiouy\W]*\b", text)
print(words)
How do I remove leading and trailing whitespace from a string in Python?
" Hello world " --> "Hello world"
" Hello world" --> "Hello world"
"Hello world " --> "Hello world"
"Hello world" --> "Hello world"
To remove all whitespace surrounding a string, use .strip(). Examples:
>>> ' Hello '.strip()
'Hello'
>>> ' Hello'.strip()
'Hello'
>>> 'Bob has a cat'.strip()
'Bob has a cat'
>>> ' Hello '.strip() # ALL consecutive spaces at both ends removed
'Hello'
Note that str.strip() removes all whitespace characters, including tabs and newlines. To remove only spaces, specify the specific character to remove as an argument to strip:
>>> " Hello\n ".strip(" ")
'Hello\n'
To remove only one space at most:
def strip_one_space(s):
    if s.endswith(" "): s = s[:-1]
    if s.startswith(" "): s = s[1:]
    return s
>>> strip_one_space("  Hello ")
' Hello'
As pointed out in answers above
my_string.strip()
will remove all leading and trailing whitespace characters, such as \n, \r, \t, \f and the space character.
For more flexibility use the following
Removes only leading whitespace chars: my_string.lstrip()
Removes only trailing whitespace chars: my_string.rstrip()
Removes specific whitespace chars: my_string.strip('\n') or my_string.lstrip('\n\r') or my_string.rstrip('\n\t') and so on.
More details are available in the docs.
strip is not limited to whitespace characters either:
# remove all leading/trailing commas, periods and hyphens
title = title.strip(',.-')
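To make the differences concrete, here is a short sketch of the three methods. One detail worth knowing is that the argument to strip is treated as a set of characters, not as a prefix or suffix. The sample strings are invented for illustration:

```python
s = '\t  Hello world \n'

print(repr(s.strip()))    # whitespace removed from both ends
print(repr(s.lstrip()))   # leading whitespace only
print(repr(s.rstrip()))   # trailing whitespace only

# The argument is a set of characters: every leading/trailing character
# found in it is removed, regardless of order.
print('www.example.com'.strip('cmow.'))
```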
This will remove all leading and trailing whitespace in myString:
myString.strip()
You want strip():
myphrases = [" Hello ", " Hello", "Hello ", "Bob has a cat"]
for phrase in myphrases:
    print(phrase.strip())
This can also be done with a regular expression
import re
input = " Hello "
output = re.sub(r'^\s+|\s+$', '', input)
# output = 'Hello'
As a beginner, reading this thread got my head spinning, so I came up with a simple shortcut.
Although str.strip() removes leading and trailing spaces, it does nothing about spaces between characters.
words=input("Enter the word to test")
# If the user enters text with stray spaces inside it, this becomes a problem
# input = " he llo, ho w are y ou "
n=words.strip()
print(n)
# output "he llo, ho w are y ou" - only leading & trailing spaces are removed
Instead, use str.replace(), which is more direct and less error-prone.
The following code generalizes the use of str.replace():
def whitespace(words):
    r=words.replace(' ','') # removes all spaces (not tabs or newlines)
    n=r.replace(',','|') # other uses of replace
    return n
def run():
    words=input("Enter the word to test") # take user input
    m=whitespace(words) # wrap the call in run() so other functions can reuse it
    o=m.count('f') # for testing
    return m,o
print(run())
Output: ('hello|howareyou', 0)
This can be helpful when reusing the same logic in different functions.
Whitespace can also cause indentation errors when you run your finished Python code. If Python keeps reporting an indentation error on particular lines, fix the indentation of those lines.
If you still get errors related to typing mistakes, operators, etc., make sure you read the error message Python gives you:
The first thing to check is that you have your indentation right. If you do, then check to see if you have mixed tabs with spaces in your code.
Remember: the code will look fine (to you), but the interpreter refuses to run it. If you suspect this, a quick fix is to bring your code into an IDLE edit window, then choose Edit... "Select All" from the menu system, before choosing Format... "Untabify Region". If you've mixed tabs with spaces, this will convert all your tabs to spaces in one go (and fix any indentation issues).
I could not find a solution to what I was looking for so I created some custom functions. You can try them out.
def cleansed(s: str):
    """:param s: String to be cleansed"""
    assert s  # reject None and the empty string
    # return trimmed(s.replace('"', '').replace("'", ""))
    return trimmed(s)

def trimmed(s: str):
    """:param s: String to be cleansed"""
    assert s
    ss = trim_start_and_end(s).replace('  ', ' ')
    while '  ' in ss:  # keep collapsing until no double space remains
        ss = ss.replace('  ', ' ')
    return ss

def trim_start_and_end(s: str):
    """:param s: String to be cleansed"""
    assert s
    return trim_start(trim_end(s))

def trim_start(s: str):
    """:param s: String to be cleansed"""
    assert s
    chars = []
    for c in s:
        # skip spaces until the first non-space character has been seen
        if c != ' ' or len(chars) > 0:
            chars.append(c)
    return "".join(chars).lower()

def trim_end(s: str):
    """:param s: String to be cleansed"""
    assert s
    chars = []
    for c in reversed(s):
        # same as trim_start, but scanning from the end
        if c != ' ' or len(chars) > 0:
            chars.append(c)
    return "".join(reversed(chars)).lower()
s1 = ' b Beer '
s2 = 'Beer b '
s3 = ' Beer b '
s4 = ' bread butter Beer b '
cdd = trim_start(s1)
cddd = trim_end(s2)
clean1 = cleansed(s3)
clean2 = cleansed(s4)
print("\nStr: {0} Len: {1} Cleansed: {2} Len: {3}".format(s1, len(s1), cdd, len(cdd)))
print("\nStr: {0} Len: {1} Cleansed: {2} Len: {3}".format(s2, len(s2), cddd, len(cddd)))
print("\nStr: {0} Len: {1} Cleansed: {2} Len: {3}".format(s3, len(s3), clean1, len(clean1)))
print("\nStr: {0} Len: {1} Cleansed: {2} Len: {3}".format(s4, len(s4), clean2, len(clean2)))
If you want to trim specified number of spaces from left and right, you could do this:
def remove_outer_spaces(text, num_of_leading, num_of_trailing):
    text = list(text)
    for i in range(num_of_leading):
        if text[i] == " ":
            text[i] = ""
        else:
            break
    for i in range(1, num_of_trailing+1):
        if text[-i] == " ":
            text[-i] = ""
        else:
            break
    return ''.join(text)
txt1 = " MY name is "
print(remove_outer_spaces(txt1, 1, 1)) # result is: " MY name is "
print(remove_outer_spaces(txt1, 2, 3)) # result is: " MY name is "
print(remove_outer_spaces(txt1, 6, 8)) # result is: "MY name is"
How do I remove leading and trailing whitespace from a string in Python?
The solution below removes leading and trailing whitespace and also collapses runs of whitespace inside the string, which is useful if you need clean string values without repeated whitespace.
>>> str_1 = ' Hello World'
>>> print(' '.join(str_1.split()))
Hello World
>>>
>>>
>>> str_2 = ' Hello World'
>>> print(' '.join(str_2.split()))
Hello World
>>>
>>>
>>> str_3 = 'Hello World '
>>> print(' '.join(str_3.split()))
Hello World
>>>
>>>
>>> str_4 = 'Hello World '
>>> print(' '.join(str_4.split()))
Hello World
>>>
>>>
>>> str_5 = ' Hello World '
>>> print(' '.join(str_5.split()))
Hello World
>>>
>>>
>>> str_6 = ' Hello World '
>>> print(' '.join(str_6.split()))
Hello World
>>>
>>>
>>> str_7 = 'Hello World'
>>> print(' '.join(str_7.split()))
Hello World
As you can see, this removes the repeated whitespace wherever it occurs (the output is Hello World in every case). But if you only need to remove leading and trailing whitespace, strip() would be fine.
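The split-and-join idiom can be wrapped in a small helper. The sketch below also shows a regular-expression variant that should give the same result; both function names are made up for illustration:

```python
import re


def normalize_whitespace(text):
    """Trim the ends and collapse every internal whitespace run to one space."""
    return ' '.join(text.split())


def normalize_whitespace_re(text):
    """Same result, written with a regular expression."""
    return re.sub(r'\s+', ' ', text).strip()


sample = '  Hello \t  World \n'
print(repr(normalize_whitespace(sample)))
print(repr(normalize_whitespace_re(sample)))
```

Both variants treat tabs and newlines as whitespace, so they are not suitable when line breaks must be preserved.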
One way is to use the .strip() method (removing all surrounding whitespace):
s = " Hello World "
s = s.strip()
# result: s == "Hello World"
Note that .strip() returns a copy of the string and doesn't change the underlying object (since strings are immutable).
Should you wish to remove all spaces (not only trim the edges):
s = ' abcd efgh ijk '
s = s.replace(' ', '')
# result: s == 'abcdefghijk'
I wanted to remove the excess spaces in a string (including in the middle of the string, not only at the beginning or end). I wrote this because I didn't know how else to do it:
string = "Name : David Account: 1234 Another thing: something "
ready = False
while not ready:
    pos = string.find("  ")  # look for a double space
    if pos != -1:
        string = string.replace("  ", " ")
    else:
        ready = True
print(string)
This replaces double spaces with single spaces until no double spaces remain.
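The loop can also be written as a single substitution. A sketch, assuming only the space character should be collapsed (tabs and newlines are left alone) and that a single leading or trailing space should survive, as it does with the loop. The name collapse_spaces and the sample string are hypothetical:

```python
import re


def collapse_spaces(text):
    """Replace every run of two or more spaces with a single space."""
    return re.sub(r' {2,}', ' ', text)


print(repr(collapse_spaces("Name  :  David   Account:  1234 ")))
```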