I'm looking to count whether eth or btc appear in listy
import re
searchterms = ['btc', 'eth']
listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth', '#eth is great', 'barbaric tsar', 'nothing']
cnt = round((sum(len(re.findall('eth', x.lower())) for x in listy)/(len(listy)))*100)
print(f'{cnt}%')
The solution only looks for eth. How do I look for multiple search terms?
Bottom line: I have a list of search terms in searchterms. I'd like to see whether any of those appear in listy. From there, I can compute the percentage of entries in the list that contain those terms.
You need to put the pipe "|" between the values you want to search for. In your code, change re.findall('eth', x.lower()) to re.findall(r"(eth)|(btc)", x.lower()):
import re
listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth', '#eth is great', 'barbaric tsar', 'nothing']
cnt = round((sum(len(re.findall(r"(eth)|(btc)", x.lower())) for x in listy)/(len(listy)))*100)
print(f'{cnt}%')
67%
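Since the terms already live in searchterms, the pattern can also be built from that list instead of being typed by hand. The following is a minimal sketch, assuming you want the share of strings that contain at least one term; the helper name percent_matching is made up for illustration. Note that re.findall counts every occurrence, so a string that mentions two terms would be counted twice in the version above, whereas this sketch counts each string once.

```python
import re

searchterms = ['btc', 'eth']
listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth',
         '#eth is great', 'barbaric tsar', 'nothing']

def percent_matching(strings, terms):
    """Return the rounded percentage of strings containing at least one term."""
    # re.escape keeps terms literal even if they contain regex metacharacters
    pattern = re.compile('|'.join(re.escape(t) for t in terms), re.IGNORECASE)
    hits = sum(1 for s in strings if pattern.search(s))
    return round(hits / len(strings) * 100)

print(f'{percent_matching(listy, searchterms)}%')
```

Adding another term then only requires appending it to searchterms.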
I would say instead of complicating the problem and using re, use a simple classic list comprehension.
listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth', '#eth is great', 'barbaric tsar', 'nothing']
print(len([i for i in listy if 'btc' in i.lower() or 'eth' in i.lower()]) * 100 / len(listy))
It improves the readability and the simplicity of the code.
Let me know if it helps!
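The same comprehension can be extended to an arbitrary list of terms with any(). This is only a sketch, assuming the terms in searchterms are already lowercase:

```python
searchterms = ['btc', 'eth']
listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth',
         '#eth is great', 'barbaric tsar', 'nothing']

# keep every string that contains at least one of the search terms
matches = [s for s in listy if any(term in s.lower() for term in searchterms)]
percentage = len(matches) * 100 / len(listy)
print(f'{percentage:.0f}%')
```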
A bit more code gives better readability. To search for additional words, just add them to the list search_for. For counting, it uses a defaultdict.
from collections import defaultdict

listy = ['Hello, my name is btc', 'hello, my name is ETH,', 'i love eth', '#eth is great', 'barbaric tsar', 'nothing']
my_dict = defaultdict(int)
search_for = ['btc', 'eth']
for word in listy:
    for sub in search_for:
        # compare in lowercase so that 'ETH' is counted as well
        if sub in word.lower():
            my_dict[sub] += 1
print(my_dict.items())
Related
I am looking to match a phrase inside a list.
I'm using Python to match a phrase inside a list. The phrases may or may not be in the list.
list1 = ['I would like to go to a party', 'I am sam', 'That is correct', 'I am currently living in Texas']
phrase1= 'I would like to go to a party'
phrase2= 'I am sam'
If phrase1 and phrase2 are both inside list1, return correct or 100%. The purpose is to make sure that phrase1 and phrase2 are matched word for word.
Conversely, if neither phrase is inside the list, or only one of them is, as in list2 below, then return false or 0%.
list2 = ['I am mike', 'I don\'t go to party', 'I am sam']
phrase1= 'I would like to go to a party'
phrase2= 'I am sam'
The phrases can be changed, so they are not limited to those two. For instance, a phrase can be whatever the user sets, like 'I am not good.'
It seems like you simply want to check for membership in the list:
list1 = ['I would like to go to a party', 'I am sam', 'That is correct', 'I am currently living in Texas']
phrase1 = 'I would like to go to a party'
phrase2 = 'I am sam'
if phrase1 in list1 and phrase2 in list1:
    # whatever you want, this will execute if True
    pass
else:
    # whatever you want, this will execute if False
    pass
I am not sure I understand you, but maybe you can try
if phrase1 in list1:
to check whether a phrase is in a list.
You can use all and a comprehension:
def check(phrase_list, *phrases):
    return all(p in phrase_list for p in phrases)
In use:
list1 = ['I would like to go to a party', 'I am sam', 'That is correct', 'I am currently living in Texas']
phrase1= 'I would like to go to a party'
phrase2= 'I am sam'
print(check(list1, phrase1, phrase2))
#True
print(check(list1, 'I am sam', 'dragon'))
#False
You can also use a set
Like this:
set(list1) >= {phrase1, phrase2}
#True
Or like this:
# you can call this the same way I called the other check
def check(phrase_list, *phrases):
    return set(phrase_list) >= set(phrases)
Edit
To print 100% or 0% you could simply use an if statement or use boolean indexing:
print(('0%', '100%')[check(list1, phrase1, phrase2)])
To do this in your return statement:
return ('0%', '100%')[the_method_you_choose]
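Putting these pieces together, here is a hedged sketch of a function that returns the percentage string directly. The name match_percent is hypothetical, and the all-or-nothing result is an assumption taken from the question:

```python
def match_percent(phrase_list, *phrases):
    """Return '100%' if every phrase is an exact element of phrase_list, else '0%'."""
    found = set(phrase_list) >= set(phrases)
    # a bool indexes the tuple: False -> 0, True -> 1
    return ('0%', '100%')[found]

list1 = ['I would like to go to a party', 'I am sam', 'That is correct',
         'I am currently living in Texas']
list2 = ['I am mike', "I don't go to party", 'I am sam']

print(match_percent(list1, 'I would like to go to a party', 'I am sam'))
print(match_percent(list2, 'I would like to go to a party', 'I am sam'))
```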
I have a list as follows.
mylist = ['test copy', 'test project', 'test', 'project']
I want to check whether my sentence contains any of the elements of mylist, split the sentence at the first match, and obtain its first part.
For example:
mystring1 = 'it was a nice test project and I enjoyed it a lot'
output should be: it was a nice
mystring2 = 'the example test was difficult'
output should be: the example
My current code is as follows.
for sentence in L:
    if mylist in sentence:
        splits = sentence.split(mylist)
        sentence= splits[0]
However, I get an error saying TypeError: 'in <string>' requires string as left operand, not list. Is there a way to fix this?
You need another for loop to iterate over every string in mylist.
mylist = ['test copy', 'test project', 'test', 'project']
mystring1 = 'it was a nice test project and I enjoyed it a lot'
mystring2 = 'the example test was difficult'
L = [mystring1, mystring2]
for sentence in L:
    for word in mylist:
        if word in sentence:
            splits = sentence.split(word)
            sentence= splits[0]
    print(sentence)
# it was a nice
# the example
Probably the most effective way to do this is by first constructing a regex, that tests all the strings concurrently:
import re
split_regex = re.compile('|'.join(re.escape(s) for s in mylist))
for sentence in L:
    first_part = split_regex.split(sentence, 1)[0]
This yields:
>>> split_regex.split(mystring1, 1)[0]
'it was a nice '
>>> mystring2 = 'the example test was difficult'
>>> split_regex.split(mystring2, 1)[0]
'the example '
If the number of possible strings is large, a regex can typically outperform searching each string individually.
You probably also want to .strip() the string (remove spaces in the front and end of the string):
import re
split_regex = re.compile('|'.join(re.escape(s) for s in mylist))
for sentence in L:
    first_part = split_regex.split(sentence, 1)[0].strip()
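For reference, the regex approach can be wrapped into a self-contained helper. This is a sketch only: the function name first_part is an assumption, and returning the whole sentence when nothing matches is simply how re.split behaves here, not something the question specifies.

```python
import re

mylist = ['test copy', 'test project', 'test', 'project']

# one alternation pattern; re.escape keeps the phrases literal
split_regex = re.compile('|'.join(re.escape(s) for s in mylist))

def first_part(sentence):
    """Return the text before the first phrase from mylist, stripped."""
    return split_regex.split(sentence, maxsplit=1)[0].strip()

print(first_part('it was a nice test project and I enjoyed it a lot'))
print(first_part('the example test was difficult'))
print(first_part('no match here'))
```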
mylist = ['test copy', 'test project', 'test', 'project']
L = ['it was a nice test project and I enjoyed it a lot','a test copy']
for sentence in L:
    for x in mylist:
        if x in sentence:
            splits = sentence.split(x)
            sentence= splits[0]
    print(sentence)
The error says you are trying to check whether a list is in sentence, so you must iterate over the elements of the list instead.
I want to write a Python program to test whether any phrase in a list matches a string.
string ='I love my travel all over the world'
list =['I love','my travel','all over the world']
So I want to test whether any one of the items in list matches that string, and print the ones that do: 'I love', 'my travel' or 'all over the world'.
any(x in string for x in list)
Or I need to use text mining to solve the problem?
Your current solution is probably the best to use in this given scenario. You could encapsulate it as a function if you wanted.
def list_in_string(slist, string):
    return any(x in string for x in slist)
You can't do this:
if any(x in string for x in word_list):
    print(x)
Because the any function iterates over the list only until it finds a match, keeps the x variable to itself, and then simply returns a Boolean (True or False).
You can however, just break apart your any function so that you can get your desired output.
string ='I love traveling all over the world'
word_list =['I love','traveling','all over the world']
for x in word_list:
    if x in string:
        print(x)
This will output:
>>>
I love
traveling
all over the world
>>>
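If only the first matching phrase is needed, next() with a generator gives access to the value that any() hides. A minimal sketch; the None default for the no-match case is an assumption about what you would want returned:

```python
string = 'I love traveling all over the world'
word_list = ['I love', 'traveling', 'all over the world']

# next() pulls the first phrase that occurs in the string, or None if there is none
first_match = next((x for x in word_list if x in string), None)
print(first_match)

no_match = next((x for x in ['spam', 'eggs'] if x in string), None)
print(no_match)
```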
Update using string.split() :
string =['I', 'love','traveling','all', 'over', 'the', 'world']
word_list =['I love','traveling','all over the world']
count=0
for x in word_list:
    for y in x.split():
        if y in string:
            count+=1
    # print only multi-word phrases whose words were all found
    if count==len(x.split()) and ' ' in x:
        print(x)
    count=0
This will output:
>>>
I love
all over the world
>>>
If you want a True or False returned, you can definitely use any(), for example:
>>> string = 'I love my travel all over the world'
>>> list_string =['I love',
'my travel',
'all over the world',
'Something something',
'blah']
>>> any(x for x in list_string if x in string)
True
>>>
Otherwise, you could do some simple list comprehension:
>>> string ='I love my travel all over the world'
>>> list_string =['I love',
'my travel',
'all over the world',
'Something something',
'blah']
>>> [x for x in list_string if x in string]
['I love', 'my travel', 'all over the world']
>>>
Depending on what you want returned, both of these work perfectly.
You could also probably use a regular expression, but it's a little overkill for something so simple.
For completeness, one may mention the find method:
_string ='I love my travel all over the world'
_list =['I love','my travel','all over the world','spam','python']
for i in range(len(_list)):
    if _string.find(_list[i]) > -1:
        print(_list[i])
Which outputs:
I love
my travel
all over the world
Note: this solution is not as elegant as the in usage mentioned, but may be useful if the position of the found substring is needed.
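To illustrate the case where the position matters, the sketch below records where each phrase starts. The dictionary name positions is only illustrative:

```python
_string = 'I love my travel all over the world'
_list = ['I love', 'my travel', 'all over the world', 'spam', 'python']

# map each phrase that occurs in _string to the index where it starts
positions = {}
for phrase in _list:
    idx = _string.find(phrase)  # -1 means "not found"
    if idx != -1:
        positions[phrase] = idx

print(positions)
```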
I am trying to fill an (empty) list with the "sentences" that contain # (hashtags) from a different list.
Currently my code gives me a new list whose length is the total number of elements involved, instead of one entry per matching sentence.
The code snippet is given below
import re
old_list = ["I love #stackoverflow because #people are very #helpful!","But I dont #love hastags",
"So #what can you do","Some simple senetnece","where there is no hastags","however #one can be good"]
new_list = [ ]
for tt in range(0,len(s)):
    for ui in s:
        if bool(re.search(r"#(\w+)",s[tt])) == True :
            njio.append(s[tt])
Please let me know how to append only the single sentence.
I am not sure what you are wanting for output, but this will preserve the original sentence along with its matching set of hashtags:
>>> import re
>>> old_list = ["I love #stackoverflow because #people are very #helpful!","But I dont #love hastags",
... "So #what can you do","Some simple senetnece","where there is no hastags","however #one can be good"]
>>> hash_regex = re.compile(r'#(\w+)')
>>> [(hash_regex.findall(l), l) for l in old_list]
[(['stackoverflow', 'people', 'helpful'], 'I love #stackoverflow because #people are very #helpful!'), (['love'], 'But I dont #love hastags'), (['what'], 'So #what can you do'), ([], 'Some simple senetnece'), ([], 'where there is no hastags'), (['one'], 'however #one can be good')]
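If the goal is only a list of the sentences that contain at least one hashtag, a single pass is enough; the inner loop in the question appears to append each matching sentence once per element of the list. A minimal sketch, reusing old_list and new_list from the question:

```python
import re

old_list = ["I love #stackoverflow because #people are very #helpful!",
            "But I dont #love hastags",
            "So #what can you do", "Some simple senetnece",
            "where there is no hastags", "however #one can be good"]

hash_regex = re.compile(r'#\w+')

# each sentence is tested once and kept at most once
new_list = [sentence for sentence in old_list if hash_regex.search(sentence)]
print(new_list)
```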
Okay, so I have the following little function:
def swap(inp):
    inp = inp.split()
    out = ""
    for item in inp:
        ind = inp.index(item)
        item = item.replace("i am", "you are")
        item = item.replace("you are", "I am")
        item = item.replace("i'm", "you're")
        item = item.replace("you're", "I'm")
        item = item.replace("my", "your")
        item = item.replace("your", "my")
        item = item.replace("you", "I")
        item = item.replace("my", "your")
        item = item.replace("i", "you")
        inp[ind] = item
    for item in inp:
        ind = inp.index(item)
        item = item + " "
        inp[ind] = item
    return out.join(inp)
While it's not particularly efficient, it gets the job done for shorter sentences. Basically, all it does is swap the perspective of pronouns and the like. This is fine when I throw a string like "I love you" at it (it returns "you love me"), but when I throw something like:
you love your version of my couch because I love you, and you're a couch-lover.
I get:
I love your versyouon of your couch because I love I, and I'm a couch-lover.
I'm confused as to why this is happening. I explicitly split the string into a list to avoid this. Why would it be able to detect it as being a part of a list item, rather than just an exact match?
Also, slightly deviating to avoid having to post another question so similar; if a solution to this breaks this function, what will happen to commas, full stops, other punctuation?
It made some very surprising mistakes. My expected output is:
I love my version of your couch because you love I, and I'm a couch-lover.
The reason I formatted it like this, is because I eventually hope to be able to replace the item.replace(x, y) variables with words in a database.
For this specific problem you need regular expressions. Basically, along the lines of:
table = [
    ("I am", "you are"),
    ("I'm", "you're"),
    ("my", "your"),
    ("I", "you"),
]
import re
def swap(s):
    # build a lookup that maps each form to its counterpart in both directions
    dct = dict(table)
    dct.update((y, x) for x, y in table)
    return re.sub(
        '|'.join(r'(?:\b%s\b)' % x for x in dct),
        lambda m: dct[m.group(0)],
        s)
print(swap("you love your version of my couch because I love you, and you're a couch-lover."))
# I love my version of your couch because you love I, and I'm a couch-lover.
But in general, natural language processing by the means of string/re functions is naive at best (note "you love I" above).
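One detail worth making explicit: regex alternation tries the alternatives from left to right, and \byou\b also matches the "you" in "you're" because the apostrophe is not a word character. The variant below is a sketch, not the answer's original code; it sorts the keys longest first so the result does not depend on dictionary ordering, and it adds re.escape as a precaution for entries containing special characters.

```python
import re

table = [
    ("I am", "you are"),
    ("I'm", "you're"),
    ("my", "your"),
    ("I", "you"),
]

# map each form to its counterpart in both directions
dct = dict(table)
dct.update((y, x) for x, y in table)

# longest keys first, so "you're" is tried before "you"
keys = sorted(dct, key=len, reverse=True)
pattern = re.compile('|'.join(r'\b%s\b' % re.escape(k) for k in keys))

def swap(s):
    """Replace every whole-word key with its counterpart in a single pass."""
    return pattern.sub(lambda m: dct[m.group(0)], s)

print(swap("you love your version of my couch because I love you, and you're a couch-lover."))
```

Because all replacements happen in one pass, a word that was just swapped is never swapped back, which is what went wrong with the chained replace() calls.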
Here's some simple code:
def swap(inp):
    inp = inp.split()
    out = []
    d1 = ['i am', 'you are', 'i\'m', 'you\'re', 'my', 'your', 'I', 'my', 'you']
    d2 = ['you are', 'I am', 'you\'re', 'I\'m', 'your', 'my', 'you', 'your', 'I']
    for item in inp:
        # strip commas so that "you," still matches "you"
        itm = item.replace(',','')
        if itm not in d1:
            out.append(item)
        else:
            out.append(d2[d1.index(itm)])
    return ' '.join(out)
print(swap('you love your version of my couch because I love you, and you\'re a couch-lover.'))
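The problem is that replace() works on substrings (in your case, parts of words), so replacing "i" also changes the "i" inside "version". In addition, index() only returns the position of the first matching item, so repeated words all point at the same slot.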
Take a look at my answer to another question: String replacement with dictionary, complications with punctuation
The code in that answer can be used to solve your problem.
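The substring behaviour is easy to reproduce in isolation. The snippet below is only an illustration; the swaps mapping is a hypothetical, deliberately tiny table and not the one from the linked answer:

```python
# str.replace works on substrings, so letters inside a word are rewritten too
broken = "version".replace("i", "you")
print(broken)

# comparing whole tokens avoids that: only exact words are looked up
swaps = {"i": "you", "you": "I"}  # hypothetical minimal mapping
words = "i like this version".split()
fixed = ' '.join(swaps.get(w, w) for w in words)
print(fixed)
```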