I am trying to append to a new (initially empty) list the "sentences" from a different list that contain # hashtags.
Currently my code gives me a new list whose length is the total number of elements in the original list, not just the single matching sentences.
The code snippet is given below:
import re
old_list = ["I love #stackoverflow because #people are very #helpful!","But I dont #love hastags",
"So #what can you do","Some simple senetnece","where there is no hastags","however #one can be good"]
new_list = [ ]
for tt in range(0, len(old_list)):
    for ui in old_list:
        if re.search(r"#(\w+)", old_list[tt]):
            new_list.append(old_list[tt])
Please let me know how to append each matching sentence only once.
I am not sure what output you want, but this will preserve each original sentence along with its matching set of hashtags:
>>> import re
>>> old_list = ["I love #stackoverflow because #people are very #helpful!","But I dont #love hastags",
... "So #what can you do","Some simple senetnece","where there is no hastags","however #one can be good"]
>>> hash_regex = re.compile(r'#(\w+)')
>>> [(hash_regex.findall(l), l) for l in old_list]
[(['stackoverflow', 'people', 'helpful'], 'I love #stackoverflow because #people are very #helpful!'), (['love'], 'But I dont #love hastags'), (['what'], 'So #what can you do'), ([], 'Some simple senetnece'), ([], 'where there is no hastags'), (['one'], 'however #one can be good')]
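If the goal is just the matching sentences themselves, appended once each, a single pass with a list comprehension avoids the nested loops; a minimal sketch using the same regex:

```python
import re

old_list = ["I love #stackoverflow because #people are very #helpful!",
            "But I dont #love hastags",
            "So #what can you do",
            "Some simple senetnece",
            "where there is no hastags",
            "however #one can be good"]

# keep each sentence exactly once if it contains at least one hashtag
new_list = [sentence for sentence in old_list if re.search(r"#(\w+)", sentence)]
print(len(new_list))  # 4
```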
Related
I'm looking to count whether eth or btc appear in listy
searchterms = ['btc', 'eth']
listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth', '#eth is great', 'barbaric tsar', 'nothing']
cnt = round((sum(len(re.findall('eth', x.lower())) for x in listy)/(len(listy)))*100)
print(f'{cnt}%')
The solution only looks for eth. How do I look for multiple search terms?
Bottom line: I have a list of search terms in searchterms. I'd like to see if any of those appear in listy. From there, I can perform a percentage of how many of those terms appear in the list.
You need to use the pipe "|" between the values you want to search for. In your code, change re.findall('eth', x.lower()) to re.findall(r"(eth)|(btc)", x.lower()):
import re

listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth', '#eth is great', 'barbaric tsar', 'nothing']
cnt = round((sum(len(re.findall(r"(eth)|(btc)", x.lower())) for x in listy)/(len(listy)))*100)
print(f'{cnt}%')
67%
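To avoid hard-coding the alternation, the pattern can also be built from the searchterms list itself; a sketch, where re.escape guards against terms that contain regex metacharacters:

```python
import re

searchterms = ['btc', 'eth']
listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth',
         '#eth is great', 'barbaric tsar', 'nothing']

# join the escaped terms into a single alternation, e.g. "btc|eth"
pattern = '|'.join(map(re.escape, searchterms))
cnt = round(sum(len(re.findall(pattern, x.lower())) for x in listy) / len(listy) * 100)
print(f'{cnt}%')  # 67%
```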
I would say that instead of complicating the problem by using re, use a simple, classic list comprehension.
listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth', '#eth is great', 'barbaric tsar', 'nothing']
print(len([i for i in listy if 'btc' in i.lower() or 'eth' in i.lower()]) * 100 / len(listy))
It improves the readability and the simplicity of the code.
Let me know if it helps!
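The same comprehension generalizes to any list of terms with any(); a sketch reusing the names from the question:

```python
searchterms = ['btc', 'eth']
listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth',
         '#eth is great', 'barbaric tsar', 'nothing']

# count sentences that contain at least one search term, case-insensitively
hits = len([s for s in listy if any(t in s.lower() for t in searchterms)])
print(round(hits * 100 / len(listy)))  # 67
```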
A bit more code gives good readability. To search for additional words, just add them to the list search_for. It uses a defaultdict for counting.
from collections import defaultdict

listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth', '#eth is great', 'barbaric tsar', 'nothing']
my_dict = defaultdict(int)
search_for = ['btc', 'eth']

for word in listy:
    for sub in search_for:
        if sub in word:
            my_dict[sub] += 1

print(my_dict.items())
I have a list of words, for example:
list_of_words = ['car', 'motorcycle', 'tree']
I also have a list of sentences, for example:
list_of_sentences = ["I have a car, but I don't have a motorcycle", "I like elephants but I don't like lions"]
Goal: For each sentence in list_of_sentences, I want to find exactly how many words from list_of_words it includes. In this particular example the return should be:
[2, 0]
Note: In practice, my list_of_sentences and list_of_words lists contain thousands of items so ideally, the solution should be fast.
You should probably tokenize the sentences first to remove unwanted punctuation signs, and then find the set.intersection with the list of words:
from nltk import word_tokenize
list_of_words = set(['car', 'motorcycle', 'tree'])
list_of_sentences = ["I have a car, but I don't have a motorcycle",
"I like elephants but I don't like lions"]
[len(list_of_words.intersection(word_tokenize(s))) for s in list_of_sentences]
# [2, 0]
Edit: another option, importing a few more libraries and taking into account the examples that @Chris_Rands exposed, could be:
import collections
from operator import itemgetter
import re
list_of_words = ['car', 'motorcycle', 'tree']
list_of_sentences = ["I have a car, but I don't have a motorcycle",
"My car, i like my Car and other cars too, and my tree! Yes tree tree; trees and treehouses ... car",
"I have carpal tunnel syndrome"]
count = [sum(itemgetter(*list_of_words)(collections.Counter(re.findall(r'\w+', sentence))))
         for sentence in list_of_sentences]
print(count)
Output:
[2, 5, 0]
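The same sum can also be written with plain Counter indexing (a Counter returns 0 for missing keys), which avoids itemgetter; a sketch:

```python
import collections
import re

list_of_words = ['car', 'motorcycle', 'tree']
list_of_sentences = ["I have a car, but I don't have a motorcycle",
                     "My car, i like my Car and other cars too, and my tree! Yes tree tree; trees and treehouses ... car",
                     "I have carpal tunnel syndrome"]

# Counter[...] yields 0 for absent words, so no special-casing is needed
count = [sum(collections.Counter(re.findall(r'\w+', sentence))[w] for w in list_of_words)
         for sentence in list_of_sentences]
print(count)  # [2, 5, 0]
```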
Try this, where \b allows you to perform a "whole words only" search:
import re
list_of_words = ['car', 'motorcycle', 'tree']
list_of_sentences = ["I have a car, but I don't have a motorcycle",
"I have carpal tunnel syndrome"]
search_ = re.compile(r"\b%s\b" % r"\b|\b".join(list_of_words))
print([len(search_.findall(x)) for x in list_of_sentences])
Here's another way to do it with loops (using the example data from the question):
list_of_words = ['car', 'motorcycle', 'tree']
list_of_sentences = ["I have a car, but I don't have a motorcycle",
                     "I like elephants but I don't like lions"]
num_words = []

for i in list_of_sentences:
    val = 0
    for j in list_of_words:
        if j in i:
            val = val + 1
    num_words.append(val)
print(num_words)
You can split the words of each sentence using re.split(r"[^\w']", sentence) into a list, transform that list into a set, and then apply a set intersection (with the & operator) between it and a set of the words in list_of_words. The result list will have the length of the resulting set. Something like:
import re
list_of_sentences = ["I have a car, but I don't have a motorcycle",
"I like elephants but I don't like lions"]
list_of_words = ['car', 'motorcycle', 'tree']
set_of_words = {*list_of_words}
result = [len({*re.split(r"[^\w']", sentence)} & set_of_words)
for sentence in list_of_sentences]
# result list contains: [2, 0]
I am looking to match a phrase inside a list.
I'm using Python to match phrases inside a list. The phrases may or may not be inside the list.
list1 = ['I would like to go to a party', 'I am sam', 'That is correct', 'I am currently living in Texas']
phrase1= 'I would like to go to a party'
phrase2= 'I am sam'
If phrase1 and phrase2 are inside list1, return correct or 100%. The purpose of this is to make sure that phrase1 and phrase2 are matched word for word.
Conversely, if neither phrase is inside the list, or only one of them is, as with list2 below, then return false or 0%.
list2 = ['I am mike', 'I don\'t go to party', 'I am sam']
phrase1= 'I would like to go to a party'
phrase2= 'I am sam'
The phrases can be changed, so they may differ from the two above. For instance, a phrase can be whatever the user sets, like 'I am not good.'
It seems like you simply want to check for membership in the list:
list1 = ['I would like to go to a party', 'I am sam', 'That is correct', 'I am currently living in Texas']
phrase1 = 'I would like to go to a party'
phrase2 = 'I am sam'
if phrase1 in list1 and phrase2 in list1:
    # whatever you want, this will execute if True
    pass
else:
    # whatever you want, this will execute if False
    pass
I am not sure I understand you correctly, but I guess you can try
phrase1 in list1
to check whether a phrase is in a list.
You can use all and a comprehension:
def check(phrase_list, *phrases):
    return all(p in phrase_list for p in phrases)
In use:
list1 = ['I would like to go to a party', 'I am sam', 'That is correct', 'I am currently living in Texas']
phrase1= 'I would like to go to a party'
phrase2= 'I am sam'
print(check(list1, phrase1, phrase2))
#True
print(check(list1, 'I am sam', 'dragon'))
#False
You can also use a set
Like this:
set(list1) >= {phrase1, phrase2}
#True
Or like this:
#you can call this the same way I called the other check
def check(phrase_list, *phrases):
    return set(phrase_list) >= set(phrases)
Edit
To print 100% or 0% you could simply use an if statement or use boolean indexing:
print(('0%', '100%')[check(list1, phrase1, phrase2)])
To do this in your return statement:
return ('0%', '100%')[the_method_you_choose]
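If a graded result is ever wanted instead of all-or-nothing, the same membership test yields a percentage of phrases found; a sketch (the helper name match_percent is illustrative, not from the question):

```python
def match_percent(phrase_list, *phrases):
    # fraction of the given phrases that appear verbatim in phrase_list
    found = sum(p in phrase_list for p in phrases)
    return f'{round(found * 100 / len(phrases))}%'

list1 = ['I would like to go to a party', 'I am sam',
         'That is correct', 'I am currently living in Texas']
print(match_percent(list1, 'I would like to go to a party', 'I am sam'))  # 100%
print(match_percent(list1, 'I am sam', 'dragon'))  # 50%
```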
I'm trying to create something like:
string: How do you do today?
substring: o
>>> hOw dO yOu dO tOday?
I've already written the rest of the code (prompting for strings etc.), I am just stuck on having to capitalize the substring within the string.
>>> s='How do you do today?'
>>> sub_s='o'
>>> s.replace(sub_s, sub_s.upper())
'HOw dO yOu dO tOday?'
And it can get more complicated if you only want to change some occurrences (i.e., the 2nd one); as a one-liner:
>>> ''.join([item.upper() if i==[idx for idx, w in enumerate(s) if w==sub_s][1] else item for i, item in enumerate(s)])
'How dO you do today?'
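A more readable version of that one-liner walks the string once and tracks which occurrence it is on; a sketch (the helper name upper_nth is illustrative, and occurrence counting is 0-based here):

```python
def upper_nth(s, sub, n):
    # uppercase only the n-th (0-based) occurrence of the character sub
    out, seen = [], 0
    for ch in s:
        if ch == sub:
            if seen == n:
                ch = ch.upper()
            seen += 1
        out.append(ch)
    return ''.join(out)

print(upper_nth('How do you do today?', 'o', 1))  # How dO you do today?
```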
Okay, so I have the following little function:
def swap(inp):
    inp = inp.split()
    out = ""
    for item in inp:
        ind = inp.index(item)
        item = item.replace("i am", "you are")
        item = item.replace("you are", "I am")
        item = item.replace("i'm", "you're")
        item = item.replace("you're", "I'm")
        item = item.replace("my", "your")
        item = item.replace("your", "my")
        item = item.replace("you", "I")
        item = item.replace("my", "your")
        item = item.replace("i", "you")
        inp[ind] = item
    for item in inp:
        ind = inp.index(item)
        item = item + " "
        inp[ind] = item
    return out.join(inp)
Which, while it's not particularly efficient gets the job done for shorter sentences. Basically, all it does is swaps pronoun etc. perspectives. This is fine when I throw a string like "I love you" at it, it returns "you love me" but when I throw something like:
you love your version of my couch because I love you, and you're a couch-lover.
I get:
I love your versyouon of your couch because I love I, and I'm a couch-lover.
I'm confused as to why this is happening. I explicitly split the string into a list to avoid this. Why would it be able to detect it as being a part of a list item, rather than just an exact match?
Also, slightly deviating to avoid having to post another question so similar; if a solution to this breaks this function, what will happen to commas, full stops, other punctuation?
It made some very surprising mistakes. My expected output is:
I love my version of your couch because you love I, and I'm a couch-lover.
The reason I formatted it like this, is because I eventually hope to be able to replace the item.replace(x, y) variables with words in a database.
For this specific problem you need regular expressions. Basically, along the lines of:
table = [
    ("I am", "you are"),
    ("I'm", "you're"),
    ("my", "your"),
    ("I", "you"),
]

import re

def swap(s):
    dct = dict(table)
    dct.update((y, x) for x, y in table)
    return re.sub(
        '|'.join(r'(?:\b%s\b)' % x for x in dct),
        lambda m: dct[m.group(0)],
        s)

print(swap("you love your version of my couch because I love you, and you're a couch-lover."))
# I love my version of your couch because you love I, and I'm a couch-lover.
But in general, natural language processing by the means of string/re functions is naive at best (note "you love I" above).
Here's a simple version:
def swap(inp):
    inp = inp.split()
    out = []
    d1 = ['i am', 'you are', "i'm", "you're", 'my', 'your', 'I', 'my', 'you']
    d2 = ['you are', 'I am', "you're", "I'm", 'your', 'my', 'you', 'your', 'I']
    for item in inp:
        itm = item.replace(',', '')
        if itm not in d1:
            out.append(item)
        else:
            out.append(d2[d1.index(itm)])
    return ' '.join(out)
print(swap('you love your version of my couch because I love you, and you\'re a couch-lover.'))
The problem is that both index() and replace() work with substrings (in your case, sub-words).
Take a look at my answer to another question: String replacement with dictionary, complications with punctuation
The code in that answer can be used to solve your problem.