Search for a word in a list - python

I want to search for the existence of the word hi.
import re
word = 'hi?'
cleanString = re.sub('\W+',' ', word)
print(cleanString.lower())
GREETING_INPUTS = ("hello", 'hi', 'hii', "hey")
if cleanString.lower() in GREETING_INPUTS:
    print('yes')
else:
    print('no')
When word = 'hi', it prints yes. But for word = 'hi?', it prints no. Why is that, and how can I fix it?

Replace this line:
cleanString = re.sub('\W+',' ', word)
With:
cleanString = re.sub('\W+','', word)
Because you're replacing every match of '\W+' with a space ' ', the string becomes 'hi ' (with a trailing space), which is not in the tuple. Replace the matches with the empty string '' instead, and the string becomes 'hi'.
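To see the difference between the two replacements side by side, here is a minimal sketch that wraps the check in a helper. The name is_greeting is made up for this illustration and does not come from the question.

```python
import re

GREETING_INPUTS = ("hello", "hi", "hii", "hey")

def is_greeting(word):
    """Return True if word, stripped of non-word characters, is a known greeting."""
    # Replace each run of non-word characters with nothing (not with a space).
    cleaned = re.sub(r"\W+", "", word).lower()
    return cleaned in GREETING_INPUTS

# Replacing with a space leaves a trailing blank, so the membership test fails.
print(repr(re.sub(r"\W+", " ", "hi?")))  # 'hi '
print(is_greeting("hi?"))                # True
```

Note that removing every non-word character also glues multi-word input together ('hi there' becomes 'hithere'), so this only suits single-word input.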

Related

How to replace a spaced comma in a string?

I have the following string, which I want to parse into words:
text = "If it is hot , don’t touch"
What I've tried so far:
import string
text = "If it is hot , don’t touch"
words = [word.replace(',', '') for word in text.split()]
print(words)
However, I got the following result:
['If', 'it', 'is', 'hot', '', 'don’t', 'touch']
What I want as a result:
['If', 'it', 'is', 'hot', 'don’t', 'touch']
text = "If it is hot , don’t touch"
newtext = text.replace(",", "")
words = newtext.split()
print(words)
Alternatively, instead of replacing ',' you can filter the result of split()
and keep only the words that are not equal to ',',
like this:
words = [word for word in text.split() if word!=',']
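Both answers assume the comma always stands alone or can simply be deleted. As a rough sketch that covers free-standing and attached commas alike, one could strip commas from each piece and drop whatever ends up empty. The helper name split_without_commas is hypothetical.

```python
def split_without_commas(text):
    """Split on whitespace and drop commas, whether free-standing or attached."""
    stripped = (word.strip(",") for word in text.split())
    # A piece that consisted only of commas is '' after stripping; skip it.
    return [word for word in stripped if word]

print(split_without_commas("If it is hot , don’t touch"))
# ['If', 'it', 'is', 'hot', 'don’t', 'touch']
```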

Return a list of words that contain a letter

I want to return a list of the words that contain a given letter, ignoring case.
Say I have sentence = "Anyone who has never made a mistake has never tried anything new"; then f(sentence, 'a') would return
['Anyone', 'has', 'made', 'a', 'mistake', 'has', 'anything']
This is what I have:
import re
def f(string, match):
    string_list = string.split()
    match_list = []
    for word in string_list:
        if match in word:
            match_list.append(word)
    return match_list
You don't need re. Use str.casefold:
[w for w in sentence.split() if "a" in w.casefold()]
Output:
['Anyone', 'has', 'made', 'a', 'mistake', 'has', 'anything']
You can use string splitting for it, if there is no punctuation.
match_list = [s for s in sentence.split(' ') if 'a' in s.lower()]
Here's another variation :
sentence = 'Anyone who has never made a mistake has never tried anything new'
def f(string, match):
    match_list = []
    for word in string.split():
        if match in word.lower():
            match_list.append(word)
    return match_list
print(f(sentence, 'a'))
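The variations above lowercase only the word, so they miss everything if the caller passes an uppercase letter. A small sketch that normalizes both sides with str.casefold follows; the function name words_containing is an assumption made for this example.

```python
def words_containing(sentence, letter):
    """Return the words of sentence that contain letter, ignoring case."""
    needle = letter.casefold()
    return [word for word in sentence.split() if needle in word.casefold()]

sentence = 'Anyone who has never made a mistake has never tried anything new'
print(words_containing(sentence, 'A'))
# ['Anyone', 'has', 'made', 'a', 'mistake', 'has', 'anything']
```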

Split a Python string (sentence) with appended white spaces

Is it possible to split a Python string (sentence) so that the whitespace between words is retained in the output, appended to the end of each word?
For example:
given_string = 'This is my string!'
output = ['This ', 'is ', 'my ', 'string!']
I avoid regexes most of the time, but here it makes it really simple:
import re
given_string = 'This is my string!'
res = re.findall(r'\w+\W?', given_string)
# res ['This ', 'is ', 'my ', 'string!']
Maybe this will help?
>>> given_string = 'This is my string!'
>>> l = given_string.split(' ')
>>> l = [item + ' ' for item in l[:-1]] + l[-1:]
>>> l
['This ', 'is ', 'my ', 'string!']
just split and add the whitespace back:
a = " "
output = [e+a for e in given_string.split(a) if e]
output[len(output)-1] = output[len(output)-1][:-1]
The last line removes the trailing space from the final element.
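If runs of several spaces or tabs should be kept exactly as they were, another possible sketch matches each word together with whatever whitespace follows it. The helper name split_keep_whitespace is hypothetical; leading whitespace before the first word is dropped by this pattern.

```python
import re

def split_keep_whitespace(text):
    """Return the words of text, each followed by the whitespace that trailed it."""
    # \S+ is a run of non-whitespace; \s* is any whitespace that follows it.
    return re.findall(r"\S+\s*", text)

given_string = 'This is my string!'
pieces = split_keep_whitespace(given_string)
print(pieces)  # ['This ', 'is ', 'my ', 'string!']
```

When the text does not start with whitespace, joining the pieces gives back the original string.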

Converting a String to a List of Words?

I'm trying to convert a string to a list of words using python. I want to take something like the following:
string = 'This is a string, with words!'
Then convert to something like this :
list = ['This', 'is', 'a', 'string', 'with', 'words']
Notice the omission of punctuation and spaces. What would be the fastest way of going about this?
I think this is the simplest way for anyone else stumbling on this post given the late response:
>>> string = 'This is a string, with words!'
>>> string.split()
['This', 'is', 'a', 'string,', 'with', 'words!']
Try this:
import re
mystr = 'This is a string, with words!'
wordList = re.sub(r"[^\w]", " ", mystr).split()
How it works:
From the docs :
re.sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged. repl can be a string or a function.
So in our case:
The pattern [^\w] matches any character that is not a word character.
\w matches any word character: letters, digits and the underscore. For ASCII text (or with the re.ASCII flag) it is equal to the character set
[a-zA-Z0-9_]
that is, a to z, A to Z, 0 to 9 and underscore. In Python 3, \w also matches Unicode letters and digits by default.
So we match every non-word character and replace it with a space,
and then split() splits the string on whitespace and returns a list.
So 'hello-world'
becomes 'hello world'
with re.sub
and then ['hello', 'world']
after split().
Let me know if any doubts come up.
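As a small illustration of what \w covers, the sketch below runs the same substitution with and without the re.ASCII flag. The sample text is made up for this example; the accented letters are written as escapes so the snippet does not depend on the file encoding.

```python
import re

text = "na\u00efve caf\u00e9, hello-world"  # "naïve café, hello-world"

# Default matching on str: accented letters count as word characters.
unicode_words = re.sub(r"[^\w]", " ", text).split()

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so accented letters split words.
ascii_words = re.sub(r"[^\w]", " ", text, flags=re.ASCII).split()

print(unicode_words)  # ['naïve', 'café', 'hello', 'world']
print(ascii_words)    # ['na', 've', 'caf', 'hello', 'world']
```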
To do this properly is quite complex. For your research, it is known as word tokenization. You should look at NLTK if you want to see what others have done, rather than starting from scratch:
>>> import nltk
>>> paragraph = u"Hi, this is my first sentence. And this is my second."
>>> sentences = nltk.sent_tokenize(paragraph)
>>> for sentence in sentences:
...     nltk.word_tokenize(sentence)
[u'Hi', u',', u'this', u'is', u'my', u'first', u'sentence', u'.']
[u'And', u'this', u'is', u'my', u'second', u'.']
The simplest way:
>>> import re
>>> string = 'This is a string, with words!'
>>> re.findall(r'\w+', string)
['This', 'is', 'a', 'string', 'with', 'words']
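Using string.punctuation for completeness:
import re
import string
s = 'This is a string, with words!'
# re.escape keeps characters such as ], \ and - from being read as regex syntax
x = re.sub('[' + re.escape(string.punctuation) + ']', '', s).split()
This handles newlines as well.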
Well, you could use
import re
list = re.sub(r'[.!,;?]', ' ', string).split()
Note that both string and list are names of builtin types, so you probably don't want to use those as your variable names.
Inspired by @mtrw's answer, but improved to strip out punctuation at word boundaries only:
import re
import string
def extract_words(s):
    # Strip punctuation from the start and end of each word, but keep it inside words
    punct = re.escape(string.punctuation)
    return [re.sub('^[{0}]+|[{0}]+$'.format(punct), '', w) for w in s.split()]
>>> str = 'This is a string, with words!'
>>> extract_words(str)
['This', 'is', 'a', 'string', 'with', 'words']
>>> str = '''I'm a custom-built sentence with "tricky" words like https://stackoverflow.com/.'''
>>> extract_words(str)
["I'm", 'a', 'custom-built', 'sentence', 'with', 'tricky', 'words', 'like', 'https://stackoverflow.com']
Personally, I think this is slightly cleaner than the answers provided
import re
def split_to_words(sentence):
    return list(filter(lambda w: len(w) > 0, re.split(r'\W+', sentence)))  # Use sentence.lower(), if needed
A regular expression for words would give you the most control. You would want to carefully consider how to deal with words with dashes or apostrophes, like "I'm".
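One possible pattern along those lines, shown only as a sketch: treat a word as a run of word characters that may continue across a single apostrophe or hyphen when more word characters follow. The names WORD_RE and tokenize are assumptions for this example, and the pattern is one design choice among many.

```python
import re

# A word is a run of word characters, optionally continued by an apostrophe
# or hyphen that is itself followed by more word characters.
WORD_RE = re.compile(r"\w+(?:['-]\w+)*")

def tokenize(text):
    """Return the words in text, keeping inner apostrophes and hyphens."""
    return WORD_RE.findall(text)

print(tokenize("I'm a custom-built sentence, with 'quoted' words!"))
# ["I'm", 'a', 'custom-built', 'sentence', 'with', 'quoted', 'words']
```

Quotes and hyphens at the edge of a word are dropped because they are not followed (or preceded) by word characters.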
list=mystr.split(" ",mystr.count(" "))
This way you eliminate every special char outside of the alphabet:
def wordsToList(strn):
    L = strn.split()
    cleanL = []
    abc = 'abcdefghijklmnopqrstuvwxyz'
    ABC = abc.upper()
    letters = abc + ABC
    for e in L:
        word = ''
        for c in e:
            if c in letters:
                word += c
        if word != '':
            cleanL.append(word)
    return cleanL
s = 'She loves you, yea yea yea! '
L = wordsToList(s)
print(L)  # ['She', 'loves', 'you', 'yea', 'yea', 'yea']
I'm not sure if this is fast or optimal or even the right way to program.
def split_string(string):
    return string.split()
This function will return the list of words of a given string.
In this case, if we call the function as follows,
string = 'This is a string, with words!'
split_string(string)
The return output of the function would be
['This', 'is', 'a', 'string,', 'with', 'words!']
This is from my attempt at a coding challenge that didn't allow regex:
outputList = "".join((c if c.isalnum() or c=="'" else ' ') for c in inputStr).split()
The apostrophe is the interesting part: keeping it in the allowed set means contractions such as "don't" stay in one piece.
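A different regex-free route, sketched here with itertools.groupby from the standard library, groups consecutive characters by whether they count as part of a word. The helper names is_word_char and words_without_regex are made up for this illustration.

```python
from itertools import groupby

def is_word_char(c):
    """Treat letters, digits and the apostrophe as part of a word."""
    return c.isalnum() or c == "'"

def words_without_regex(text):
    """Collect each maximal run of word characters as one word."""
    return ["".join(group)
            for is_word, group in groupby(text, key=is_word_char)
            if is_word]

print(words_without_regex("Don't stop, it's  fine!"))
# ["Don't", 'stop', "it's", 'fine']
```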
Probably not very elegant, but at least you know what's going on.
my_str = "Simple sample, test! is, olny".lower()
my_lst = []
temp = ""
len_my_str = len(my_str)
list_words_number = 0
for number_letter_in_data in range(0, len_my_str, 1):
    if my_str[number_letter_in_data] in [',', '.', '!', '(', ')', ':', ';', '-']:
        # skip punctuation
        pass
    elif my_str[number_letter_in_data] in [' ']:
        # a space ends the current word; keep only words longer than 3 chars
        if len(temp) > 3:
            list_words_number += 1
            my_lst.append(temp)
        # always start a fresh word, so short words do not leak into the next one
        temp = ""
    else:
        temp = temp + my_str[number_letter_in_data]
# the last word is not followed by a space, so handle it after the loop
if len(temp) > 3:
    my_lst.append(temp)
print(my_lst)
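You can try this (Python 3, where maketrans is a static method of str and both arguments must have the same length):
tryTrans = str.maketrans(",!", "  ")  # map ',' and '!' to a space each
text = "This is a string, with words!"
text = text.translate(tryTrans)
listOfWords = text.split()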
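The same idea can be extended from two characters to all ASCII punctuation. This is a minimal sketch assuming Python 3; the names PUNCT_TO_SPACE and words_via_translate are hypothetical.

```python
import string

# Map every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def words_via_translate(text):
    """Replace punctuation with spaces, then split on whitespace."""
    return text.translate(PUNCT_TO_SPACE).split()

print(words_via_translate("This is a string, with words!"))
# ['This', 'is', 'a', 'string', 'with', 'words']
```

Because the apostrophe is also punctuation, a contraction such as "I'm" comes out as two words with this table.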

Python: Splitting a string into words, saving separators

I have a string:
'Specified, if char, else 10 (default).'
I want to split it into two tuples
words=('Specified', 'if', 'char', 'else', '10', 'default')
separators=(',', ' ', ',', ' ', ' (', ').')
Does anyone have a quick solution of this?
PS: this symbol '-' is a word separator, not part of the word
import re
line = 'Specified, if char, else 10 (default).'
words = re.split(r'\)?[, .]\(?', line)
# words = ['Specified', '', 'if', 'char', '', 'else', '10', 'default', '']
separators = re.findall(r'\)?[, .]\(?', line)
# separators = [',', ' ', ' ', ',', ' ', ' ', ' (', ').']
If you really want tuples, pass the results to tuple(). If you do not want words to contain the empty entries (from between the commas and spaces), use the following:
words = [x for x in re.split(r'\)?[, .]\(?', line) if x]
or
words = tuple(x for x in re.split(r'\)?[, .]\(?', line) if x)
You can use a regex for that. The pattern has to match at least one character, because since Python 3.7 re.split also splits on empty matches.
>>> a='Specified, if char, else 10 (default).'
>>> from re import split
>>> split(r"[,. ()]+",a)
['Specified', 'if', 'char', 'else', '10', 'default', '']
With this approach you have to write the pattern yourself. If you want to start from the separators tuple, you first need to convert its contents into a regex pattern.
Regex to find all separators (assuming a separator is anything that's not alphanumeric):
import re
re.findall(r'[^\w]', string)
I would probably first .split() on spaces into a list, then iterate through the list, using a regex to check for characters after the word boundary.
import re
s = 'Specified, if char, else 10 (default).'
words = s.split()
separators = []
finalwords = []
for word in words:
    # group(1) is the word itself, group(2) is whatever follows it
    match = re.search(r'(\w+)\b(.*)', word)
    if match is None:
        continue  # this chunk contains no word characters
    finalwords.append(match.group(1))
    separators.append(match.group(2))
To get both separators and words in one pass, you could use findall as follows:
import re
line = 'Specified, if char, else 10 (default).'
words = []
seps = []
for w, s in re.findall(r"(\w*)([), .(]+)", line):
    words.append(w)
    seps.append(s)
Here's my crack at it (the pattern requires at least one separator character, so it never matches the empty string):
>>> import re
>>> p = re.compile(r'(\)?[,. ]+\(?)')
>>> tmp = p.split('Specified, char, else 10 (default).')
>>> words = tmp[::2]
>>> separators = tmp[1::2]
>>> print(words)
['Specified', 'char', 'else', '10', 'default', '']
>>> print(separators)
[', ', ', ', ' ', ' (', ').']
The only problem is you can have a '' at the end or the beginning of words if there is a separator at the beginning/end of the sentence without anything before/after it. However, that is easily checked for and eliminated.
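Putting the pieces together, here is one way it could look in Python 3, as a sketch only: a capturing group in re.split keeps the separators, and the empty strings are filtered out of the words. It assumes a separator is any run of non-word characters (so '-' counts as a separator too), and the function name split_words_and_separators is hypothetical.

```python
import re

def split_words_and_separators(line):
    """Return (words, separators) as two tuples."""
    # The capturing group makes re.split keep the separators in the result,
    # so words sit at even indexes and separators at odd indexes.
    parts = re.split(r"(\W+)", line)
    words = tuple(p for p in parts[::2] if p)
    separators = tuple(parts[1::2])
    return words, separators

words, separators = split_words_and_separators('Specified, if char, else 10 (default).')
print(words)       # ('Specified', 'if', 'char', 'else', '10', 'default')
print(separators)  # (', ', ' ', ', ', ' ', ' (', ').')
```

Here adjacent separator characters such as ', ' are kept together as one separator rather than split into ',' and ' '.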
