Say I have the following dictionary:
d = {"word1":0, "word2":0}
For this regex I need to verify that a word in the string isn't a key in that dictionary.
Is it possible to set a variable to anything not in a dictionary, for the purposes of a regex?
Forget about regex in this case:
test = "word1 word2 word3"  # your string
words = test.split(' ')  # words in your string
d = {"word1": 0, "word2": 0}  # your dict (named d so the builtin dict isn't shadowed)
for word in words:
    if word in d:
        print(word, "is a key in the dict")
    else:
        print(word, "isn't a key in the dict")
>>> d = {"foo":0, "spam":0}
>>> test = "This is a string with many words, including foo and bar"
>>> any(word in d for word in test.split())
True
If punctuation is a problem (for example, "This is foo." would not find foo with this approach), and since you said all your words are alphanumeric, you could also use
>>> import re
>>> test = "This is foo."
>>> any(word in d for word in re.findall(r"[A-Za-z0-9]+", test))
True
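If a regex is still wanted for the original question (matching words that are not keys), one option is to build a negative lookahead from the dictionary keys. This is only a sketch: the helper name words_not_in_dict is made up for illustration, and it assumes a "word" is a run of \w characters.

```python
import re


def words_not_in_dict(text, d):
    """Return the words in text that are not keys of d (hypothetical helper)."""
    if not d:
        # With no keys there is nothing to exclude, so every word qualifies.
        return re.findall(r"\w+", text)
    # Escape each key so regex metacharacters in a key are matched literally.
    excluded = "|".join(re.escape(key) for key in d)
    # \b(?!(?:key1|key2)\b) rejects a word only when the whole word is a key.
    pattern = re.compile(r"\b(?!(?:%s)\b)\w+" % excluded)
    return pattern.findall(text)


d = {"word1": 0, "word2": 0}
print(words_not_in_dict("word1 word2 word3", d))  # ['word3']
```

The trailing \b inside the lookahead matters: without it, a word such as word10 would be rejected just because it starts with the key word1.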
I have a list of words that I'm trying to filter from a text file stored in an array, but I'm not sure what function to use. Here's my code:
words = ["liverpool","arsenal","chelsea","manutd","mancity"]
test = ["LiverpoolFan","ArsenalFan"]
test2 = []
for i in range(len(test)):
    test2[i] = test[i].lower()
    if *any word in words* in test2[i]:
        print("True")
I've used a test array to simulate reading from a text file.
I'm not sure what to use in between the two asterisks (*).
You can use builtin any:
>>> test2 = [team for team in words
...          if any(team.lower() in fanbase.lower() for fanbase in test)]
>>> test2
['liverpool', 'arsenal']
Or any with filter:
>>> def check_match(team):
...     return any(team.lower() in fanbase.lower() for fanbase in test)
...
>>> test2 = list(filter(check_match, words))
>>> test2
['liverpool', 'arsenal']
Or you could use str.join with a separator that is not in your words list, such as ',':
>>> all_fans = ','.join(test).lower()
>>> test2 = [team for team in words if team in all_fans]
>>> test2
['liverpool', 'arsenal']
You can do something like this
words = ["liverpool","arsenal","chelsea","manutd","mancity"]
fans = ["LiverpoolFan","ArsenalFan"]
for fan in fans:
    for word in words:
        if word.lower() in fan.lower():
            print(f"{word} in {fan}", True)
Brute force approach: for every item in test, check whether it contains any word from words.
matches = []
for item in test:
    for word in words:
        # word.lower() thanks to @sushanth
        if word.lower() in item.lower():
            matches.append(item)
            print("True")
Here is a simple attempt: you can remove Fan from each entry in test, then check whether any team in words matches an entry in the result.
import re
words = ["liverpool","arsenal","chelsea","manutd","mancity"]
test = ["LiverpoolFan","ArsenalFan"]
purified_test = [re.sub('Fan','', i) for i in test]
print(purified_test)
for i in words:
    # compare against the purified list, not the original test list
    if i.title() in purified_test:
        print('True')
Or lowercase the entries while removing Fan:
purified_test = [re.sub('Fan','', i).lower() for i in test]
for i in words:
    if i in purified_test:
        print('True')
Or you can append the matches to test2 and get a list like the following:
import re
words = ["liverpool","arsenal","chelsea","manutd","mancity"]
test = ["LiverpoolFan","ArsenalFan"]
test2 = []
purified_test = [re.sub('Fan','', i).lower() for i in test]
for i in words:
    if i in purified_test:
        test2.append(i)
print(test2)
output
['liverpool', 'arsenal']
If the entries don't all end in Fan, you can simply build an alternation of all the suffixes that end them, like the following:
import re
regex = re.compile('(Fan|FC|etc)')
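As a rough sketch of how such a suffix pattern might be used: the suffix list here (Fan, FC) and the extra sample entries are assumptions for illustration, and the pattern is anchored with $ so that only a trailing suffix is removed.

```python
import re

words = ["liverpool", "arsenal", "chelsea", "manutd", "mancity"]
# Assumed sample data; "ArsenalFC" and "EvertonFan" are made up for illustration.
test = ["LiverpoolFan", "ArsenalFC", "EvertonFan"]

# (?:Fan|FC)$ matches one of the suffixes only at the end of the string.
suffix = re.compile(r"(?:Fan|FC)$")
purified_test = [suffix.sub("", name).lower() for name in test]

# Keep the teams whose name equals one of the purified entries.
test2 = [team for team in words if team in purified_test]
print(test2)  # ['liverpool', 'arsenal']
```

Anchoring the alternation avoids stripping "Fan" or "FC" from the middle of a name.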
I wanted to know how to iterate through a string word by word.
string = "this is a string"
for word in string:
    print (word)
The above gives an output:
t
h
i
s
i
s
a
s
t
r
i
n
g
But I am looking for the following output:
this
is
a
string
When you do -
for word in string:
You are not iterating through the words in the string, you are iterating through the characters in the string. To iterate through the words, you first need to split the string into words using str.split(), and then iterate through the result. Example -
my_string = "this is a string"
for word in my_string.split():
    print (word)
Please note, str.split() without any arguments splits on runs of any whitespace (space, multiple spaces, tab, newlines, etc).
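To make the difference between split() and split(' ') concrete, here is a small comparison; the sample string is made up for illustration.

```python
text = "this  is\ta string\n"

# No argument: runs of any whitespace count as one separator,
# and leading/trailing whitespace produces no empty strings.
print(text.split())     # ['this', 'is', 'a', 'string']

# Explicit ' ': every single space is a separator, so a double space
# yields an empty string and tabs/newlines stay inside the pieces.
print(text.split(" "))  # ['this', '', 'is\ta', 'string\n']
```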
This is one way to do it:
string = "this is a string"
ssplit = string.split()
for word in ssplit:
    print (word)
Output:
this
is
a
string
for word in string.split():
    print(word)
Using nltk.
from nltk.tokenize import sent_tokenize, word_tokenize
sentences = sent_tokenize("This is a string.")
# word_tokenize expects a single string, so tokenize each sentence separately
words_in_each_sentence = [word_tokenize(sentence) for sentence in sentences]
You may use TweetTokenizer for parsing casual text with emoticons and such.
One way to do this is with a dictionary. The problem with the code above is that it counts each letter in the string instead of each word. To solve this, first turn the string into a list of words with the split() method, then count each item in that list. The code below returns the number of times each word appears in the string, in the form of a dictionary.
s = input('Enter a string to see if strings are repeated: ')
d = dict()
p = s.split()
for word in p:
    if word not in d:
        d[word] = 1
    else:
        d[word] += 1
print (d)
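The same word count can also be sketched with collections.Counter from the standard library, which handles the "not seen yet" case itself. The sample sentence is an assumption for illustration.

```python
from collections import Counter

s = "the cat and the hat and the bat"  # sample input, assumed for illustration

# Counter maps each word to the number of times it appears in the list.
counts = Counter(s.split())

print(counts["the"])          # 3
print(counts.most_common(2))  # [('the', 3), ('and', 2)]
```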
s = 'hi how are you'
l = s.split()  # split() already returns a list, so no map/lambda is needed
print(l)
Output: ['hi', 'how', 'are', 'you']
You can try this method also:
sentence_1 = "This is a string"
words = sentence_1.split()  # avoid naming a variable list, which shadows the builtin
for i in words:
    print (i)
What is the easiest way in Python to replace the nth word in a string, assuming each word is separated by a space?
For example, I want to replace the tenth word of a string and get the resulting string.
I guess you may do something like this:
nreplace=1
my_string="hello my friend"
words=my_string.split(" ")
words[nreplace]="your"
" ".join(words)
Here is another way of doing the replacement:
nreplace=1
words=my_string.split(" ")
" ".join([words[word_index] if word_index != nreplace else "your" for word_index in range(len(words))])
Let's say your string is:
my_string = "This is my test string."
You can split the string up using split()
my_list = my_string.split()
Which will set my_list to
['This', 'is', 'my', 'test', 'string.']
You can replace the 4th list item using
my_list[3] = "new"
And then put it back together with
my_new_string = " ".join(my_list)
Giving you
"This is my new string."
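Wrapped up as a function, the split/replace/join steps might look like the sketch below. The name replace_nth_word and the choice to raise IndexError for an out-of-range position are assumptions, not part of the answers above.

```python
def replace_nth_word(text, n, new_word):
    """Replace the word at zero-based index n (hypothetical helper)."""
    words = text.split()
    if not 0 <= n < len(words):
        raise IndexError("word index out of range: %d" % n)
    words[n] = new_word
    # Joining with a single space collapses any original runs of whitespace.
    return " ".join(words)


print(replace_nth_word("This is my test string.", 3, "new"))  # This is my new string.
```

Note that this approach normalizes whitespace; if the original spacing must be preserved, a regex-based replacement is a better fit.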
A solution involving list comprehension:
text = "To be or not to be, that is the question"
replace = 6
replacement = 'it'
print(' '.join([x if index != replace else replacement for index, x in enumerate(text.split())]))
The above produces:
To be or not to be, it is the question
You could use a generator expression and the string join() method:
my_string = "hello my friend"
nth = 0
new_word = 'goodbye'
print(' '.join(word if i != nth else new_word
               for i, word in enumerate(my_string.split(' '))))
Output:
goodbye my friend
Through re.sub.
>>> import re
>>> my_string = "hello my friend"
>>> new_word = 'goodbye'
>>> re.sub(r'^(\s*(?:\S+\s+){0})\S+', r'\1'+new_word, my_string)
'goodbye my friend'
>>> re.sub(r'^(\s*(?:\S+\s+){1})\S+', r'\1'+new_word, my_string)
'hello goodbye friend'
>>> re.sub(r'^(\s*(?:\S+\s+){2})\S+', r'\1'+new_word, my_string)
'hello my goodbye'
Just replace the number within the curly braces with the position of the word you want to replace, minus 1. That is, to replace the first word the number is 0, for the second word it is 1, and so on.
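The number in the braces can be filled in from a variable. A minimal sketch, assuming a zero-based index n; the function name replace_nth_word_re is hypothetical, and a function is used as the replacement so that backslashes in new_word are not treated as group references.

```python
import re


def replace_nth_word_re(text, n, new_word):
    """Replace the word at zero-based index n, keeping the original spacing."""
    # {n} requires exactly n "word + whitespace" groups before the target word.
    pattern = r"^(\s*(?:\S+\s+){%d})\S+" % n
    # The callable returns the captured prefix followed by the new word.
    return re.sub(pattern, lambda m: m.group(1) + new_word, text, count=1)


print(replace_nth_word_re("hello my friend", 1, "goodbye"))  # hello goodbye friend
```

If the string has fewer than n + 1 words, the pattern does not match and the string is returned unchanged.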
I would like a regular expression python code to:
1) Take an input of characters
2) Outputs the characters in all lower case letters
3) Compares this output in a python set.
I am no good at all with regular expressions.
Why bother?
>>> 'FOO'.lower() in set(('foo', 'bar', 'baz'))
True
>>> 'Quux'.lower() in set(('foo', 'bar', 'baz'))
False
After much Google searching, and with trial and error, I created a solution that works to separate multiple words from the input of characters.
import re
keywords = ('cars', 'jewelry', 'gas')
pattern = re.compile('[a-z]+', re.IGNORECASE)
txt = 'GAS, CaRs, Jewelrys'
keywords_found = pattern.findall(txt.lower())
n = 0
for i in keywords_found:
    if i in keywords:
        print(keywords_found[n])
    n = n + 1
Your self-answer would be better using a set rather than that loop.
Using i for a text variable and n for an index is very counter-intuitive. And keywords_found is a misnomer.
Try this:
>>> import re
>>> keywords = set(('cars', 'jewelry', 'gas'))
>>> pattern = re.compile('[a-z]+', re.IGNORECASE)
>>> txt = 'GAS, CaRs, Jewelrys'
>>> text_words = set(pattern.findall(txt.lower()))
>>> print "keywords:", keywords
keywords: set(['cars', 'gas', 'jewelry'])
>>> print "text_words:", text_words
text_words: set(['cars', 'gas', 'jewelrys'])
>>> print "text words in keywords:", text_words & keywords
text words in keywords: set(['cars', 'gas'])
>>> print "text words NOT in keywords:", text_words - (text_words & keywords)
text words NOT in keywords: set(['jewelrys'])
>>> print "keywords NOT in text words:", keywords - (text_words & keywords)
keywords NOT in text words: set(['jewelry'])
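The session above is Python 2. Under Python 3 the same comparison could be sketched as follows; sorted() is used here only because set display order is not guaranteed.

```python
import re

keywords = {"cars", "jewelry", "gas"}
txt = "GAS, CaRs, Jewelrys"

# Lowercase first, then pull out runs of letters as the words of the text.
text_words = set(re.findall(r"[a-z]+", txt.lower()))

print(sorted(text_words & keywords))  # ['cars', 'gas']
print(sorted(text_words - keywords))  # ['jewelrys']
print(sorted(keywords - text_words))  # ['jewelry']
```

Since the text is lowercased before matching, the re.IGNORECASE flag is not needed in this version.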
I want to check if a word is in a list of words.
word = "with"
word_list = ["without", "bla", "foo", "bar"]
I tried if word in set(list), but it is not yielding the result I want, because in seems to match on substrings rather than on whole items. That is to say, "with" is contained in one of the words in word_list, and if "with" in set(list) still says True.
What is a simpler way for doing this check than manually iterate over the list?
You could do:
found = any(word in item for item in word_list)
It checks whether word is a substring of each item and returns True if any item matches.
in is working as expected for an exact match:
>>> word = "with"
>>> mylist = ["without", "bla", "foo", "bar"]
>>> word in mylist
False
You can also use:
mylist.index(word) # raises ValueError if the word is not in the list (use in a try/except)
or
mylist.count(word) # gives a number > 0 if the word is in the list.
However, if you are looking for a substring, then:
for item in mylist:
    if word in item:
        print('found')
        break
btw, don't use list as the name of a variable
You could also create a single search string by concatenating all of the words in word_list into a single string:
word = "with"
word_list = ' '.join(["without", "bla", "foo", "bar"])
Then a simple in test will do the job:
return word in word_list
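To keep the different kinds of check apart, here is a short side-by-side sketch using the data from the question; the variable names on the left are made up for illustration.

```python
word = "with"
word_list = ["without", "bla", "foo", "bar"]

# Exact membership: is "with" itself one of the items?
exact_match = word in word_list

# Substring search: is "with" contained in any item?
substring_match = any(word in item for item in word_list)

# Joined string: also a substring search, across the whole joined text.
joined_match = word in " ".join(word_list)

print(exact_match, substring_match, joined_match)  # False True True
```

The joined-string test behaves like the substring search, so it only fits when substring matches are actually wanted.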