Searching for and Removing Strings Using Another List of Strings - python

Hi, I have the following two lists, and I want to build a third, updated list: if any string from the list 'wrong' appears inside an element of the list 'old', that entire element should be filtered out. In other words, I want the updated list to be equal to the 'new' list.
wrong = ['top up','national call']
old = ['Hi Whats with ','hola man top up','binga dingo','on a national call']
new = ['Hi Whats with', 'binga dingo']

You can use filter:
>>> list(filter(lambda x:not any(w in x for w in wrong), old))
['Hi Whats with ', 'binga dingo']
Or a list comprehension:
>>> [i for i in old if not any(x in i for x in wrong)]
['Hi Whats with ', 'binga dingo']
If you're not comfortable with either of those, use a simple for-loop-based solution like the one below:
>>> result = []
>>> for i in old:
...     for x in wrong:
...         if x in i:
...             break
...     else:
...         result.append(i)
...
>>> result
['Hi Whats with ', 'binga dingo']

>>> wrong = ['top up','national call']
>>> old = ['Hi Whats with ','hola man top up','binga dingo','on a national call']
>>> [i for i in old if all(x not in i for x in wrong)]
['Hi Whats with ', 'binga dingo']
>>>
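If the list of unwanted phrases grows long, one alternative (a swapped-in technique, not one of the answers above) is to compile all phrases into a single regular expression and test each string once. A minimal sketch, reusing the question's wrong and old lists:

```python
import re

wrong = ['top up', 'national call']
old = ['Hi Whats with ', 'hola man top up', 'binga dingo', 'on a national call']

# Build one alternation pattern; re.escape keeps phrases containing
# regex metacharacters such as '.' or '+' literal.
pattern = re.compile('|'.join(re.escape(w) for w in wrong))

# Keep only the strings in which no unwanted phrase occurs.
new = [s for s in old if not pattern.search(s)]
print(new)  # ['Hi Whats with ', 'binga dingo']
```

One caveat: if wrong is empty, the joined pattern is '' and matches every string, so everything would be filtered out; handle that case separately if it can occur.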

Related

Retrieve the first word of each list element in python

I want to know how to retrieve the first word of each element in a list.
For example, if the list is:
['hello world', 'how are you']
Is there a way to get x = "hello how"?
Here is what I've tried so far (newfriend is the list):
x=""
for values in newfriend:
    values = values.split()
    values = ''.join(values.split(' ', 1)[0])
    x+=" ".join(values)
    x+="\n"
A simple generator expression would do, I guess, e.g.
>>> l = ["hello world", "how are you"]
>>> ' '.join(x.split()[0] for x in l)
'hello how'
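One caveat with split()[0]: it raises IndexError for an empty or whitespace-only string. A small sketch that skips such items, assuming blank entries should simply be ignored (first_words is a hypothetical helper name):

```python
def first_words(phrases):
    """Join the first word of each phrase, skipping blank phrases."""
    # split() with no argument splits on runs of whitespace and returns []
    # for a blank string, so the condition filters those out before [0].
    return ' '.join(p.split()[0] for p in phrases if p.split())


newfriend = ['hello world', 'how are you', '   ']
print(first_words(newfriend))  # hello how
```

Using split() rather than split(' ') also matters for leading spaces: '  padded'.split(' ')[0] is an empty string, whereas '  padded'.split()[0] is 'padded'.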
You're not far off. Here is how I would do it.
# Python 3
newfriend = ['hello world', 'how are you']
x = [] # Create x as an empty list, rather than an empty string.
for v in newfriend:
    x.append(v.split(' ')[0])  # Append the first word of each phrase to the list.
y = ' '.join(x) # Join the list.
print(y)
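import re

l = ["Hello world", "hi world"]
g = []
for i in range(len(l)):  # range() needs an integer, so use len(l)
    x = re.findall(r'\w+', l[i])  # all words in the i-th string
    g.append(x)
print(g[0][0] + ' ' + g[1][0])  # first word of each string, space-separated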

Problems computing the score of a pairwise list in an iterative way?

Let's suppose that I have the following lists (actually they have a lot of sublists):
list_1 = [['Hi my name is anon'],
['Hi I like #hokey']]
list_2 = [['Hi my name is anon_2'],
['Hi I like #Basketball']]
I would like to compute the distance for all possible pairs, with no repetitions (combinations without replacement? product?). For example:
distance between: ['Hi my name is anon'] and ['Hi my name is anon_2']
distance between: ['Hi my name is anon'] and ['Hi I like #Basketball']
distance between: ['Hi I like #hokey'] and ['Hi my name is anon_2']
distance between: ['Hi I like #hokey'] and ['Hi I like #Basketball']
And place the scores into a list like this:
[distance_1,distance_2,distance_3,distance_4]
For this I was thinking of using itertools.product or itertools.combinations. This is what I tried:
strings_1 = [i[0] for i in list_1]
strings_2 = [i[0] for i in list_2]
import itertools
scores_list = [dis.jaccard(i,j) for i,j in zip(itertools.combinations(strings_1, strings_2))]
The problem is I am getting this traceback:
scores_list = [dis.jaccard(i,j) for i,j in zip(itertools.combinations(strings_1, strings_2))]
TypeError: an integer is required
How can I do this task efficiently, and how can I compute this product/combination-like operation?
You need to use itertools.product to get the Cartesian product, like this:
from itertools import product
[dis.jaccard(string1, string2) for string1, string2 in product(list_1, list_2)]
The product pairs up the items like this:
>>> from pprint import pprint
>>> pprint(list(product(list_1, list_2)))
[(['Hi my name is anon'], ['Hi my name is anon_2']),
(['Hi my name is anon'], ['Hi I like #Basketball']),
(['Hi I like #hokey'], ['Hi my name is anon_2']),
(['Hi I like #hokey'], ['Hi I like #Basketball'])]
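For contrast, the TypeError in the question comes from itertools.combinations, whose second argument is an integer r (the length of each combination), not a second list. combinations draws from one iterable, while product pairs items across iterables. A short sketch with made-up lists a and b:

```python
from itertools import combinations, product

a = ['x', 'y', 'z']
b = ['1', '2']

# combinations(iterable, r): unordered r-length selections from ONE iterable.
pairs_within = list(combinations(a, 2))
print(pairs_within)  # [('x', 'y'), ('x', 'z'), ('y', 'z')]

# product(*iterables): every pairing ACROSS iterables (the Cartesian product).
pairs_across = list(product(a, b))
print(pairs_across)  # [('x', '1'), ('x', '2'), ('y', '1'), ('y', '2'), ('z', '1'), ('z', '2')]
```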
If you want to apply the jaccard function only to the strings within the lists, you might want to flatten the lists first, like this:
>>> list_11 = [item for items in list_1 for item in items]
>>> list_21 = [item for items in list_2 for item in items]
>>> pprint([str1 + " " + str2 for str1, str2 in product(list_11, list_21)])
['Hi my name is anon Hi my name is anon_2',
'Hi my name is anon Hi I like #Basketball',
'Hi I like #hokey Hi my name is anon_2',
'Hi I like #hokey Hi I like #Basketball']
>>> pprint([dis.jaccard(str1, str2) for str1, str2 in product(list_11, list_21)])
...
...
As suggested by Ashwini in the comments, for your case, you can directly use itertools.starmap, like this
>>> from itertools import product, starmap
>>> list(starmap(dis.jaccard, product(list_11, list_21)))
For example,
>>> list_1 = ["a1", "a2", "a3"]
>>> list_2 = ["b1", "b2", "b3"]
>>> from itertools import product, starmap
>>> list(starmap(lambda x, y: x + " " + y, product(list_1, list_2)))
['a1 b1', 'a1 b2', 'a1 b3', 'a2 b1', 'a2 b2', 'a2 b3', 'a3 b1', 'a3 b2', 'a3 b3']
product works, but since you just have pairs, this works as well:
[dis.jaccard(string1, string2) for string1 in list_1 for string2 in list_2]
That said, the starmap + product combination is the more concise option.
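Since dis is never defined in the question, here is a self-contained sketch with a hypothetical jaccard_distance stand-in (1 minus the ratio of shared words to all distinct words); swap in your real dis.jaccard. It flattens the one-element sublists and scores every pair with starmap + product:

```python
from itertools import product, starmap


def jaccard_distance(a, b):
    """Hypothetical stand-in for dis.jaccard: 1 - |A & B| / |A | B| over word sets."""
    set_a, set_b = set(a.split()), set(b.split())
    union = set_a | set_b
    if not union:  # two blank strings: treat them as identical
        return 0.0
    return 1 - len(set_a & set_b) / len(union)


list_1 = [['Hi my name is anon'], ['Hi I like #hokey']]
list_2 = [['Hi my name is anon_2'], ['Hi I like #Basketball']]

# Flatten the one-element sublists into plain lists of strings.
strings_1 = [s for sub in list_1 for s in sub]
strings_2 = [s for sub in list_2 for s in sub]

# product() yields every (s1, s2) pair; starmap unpacks each pair into the call.
scores = list(starmap(jaccard_distance, product(strings_1, strings_2)))
print(scores)  # roughly [0.333, 0.875, 0.875, 0.4]
```

The scores come out in the same order as the four pairs listed in the question.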

How to test whether any phrase in a list matches a string in Python

I want to write a Python program that tests whether any phrase in a list matches (occurs in) a string.
string ='I love my travel all over the world'
list =['I love','my travel','all over the world']
So I want to test whether any item of the list occurs in that string, and print the matching items: 'I love', 'my travel', 'all over the world'. So far I have:
any(x in string for x in list)
Or do I need to use text mining to solve this problem?
Your current solution is probably the best option here. You could wrap it in a function if you want:
def list_in_string(slist, string):
    return any(x in string for x in slist)
You can't do this:
if any(x in string for x in word_list):
    print(x)
This doesn't work because x exists only inside the generator expression: any() consumes the generator, stops at the first true value, and simply returns a Boolean (True or False), so x is not available afterwards.
You can, however, unroll the any() call into an explicit loop to get your desired output:
string = 'I love traveling all over the world'
word_list = ['I love', 'traveling', 'all over the world']
for x in word_list:
    if x in string:
        print(x)
This will output:
>>>
I love
traveling
all over the world
>>>
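If you need the first matching phrase itself rather than a Boolean, next() with a default value short-circuits the same way any() does but hands back the item. A sketch reusing the answer's variables:

```python
string = 'I love traveling all over the world'
word_list = ['I love', 'traveling', 'all over the world']

# next() pulls the first item the generator yields; the second argument
# is returned instead of raising StopIteration when nothing matches.
first_match = next((x for x in word_list if x in string), None)
print(first_match)  # I love

no_match = next((x for x in ['spam', 'eggs'] if x in string), None)
print(no_match)  # None
```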
Update, using string.split() (this version prints only multi-word phrases whose words all appear in the split string):
string = ['I', 'love', 'traveling', 'all', 'over', 'the', 'world']
word_list = ['I love', 'traveling', 'all over the world']
count = 0
for x in word_list:
    for y in x.split():
        if y in string:
            count += 1
    if count == len(x.split()) and ' ' in x:
        print(x)
    count = 0
This will output:
>>>
I love
all over the world
>>>
If you want a True or False returned, you can definitely use any(), for example:
>>> string = 'I love my travel all over the world'
>>> list_string = ['I love',
...                'my travel',
...                'all over the world',
...                'Something something',
...                'blah']
>>> any(x in string for x in list_string)
True
>>>
Otherwise, to get the matching items themselves, you could use a simple list comprehension:
>>> string ='I love my travel all over the world'
>>> list_string = ['I love',
...                'my travel',
...                'all over the world',
...                'Something something',
...                'blah']
>>> [x for x in list_string if x in string]
['I love', 'my travel', 'all over the world']
>>>
Depending on what you want returned, either of these works.
You could also use a regular expression, but that's overkill for something this simple.
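One case where a regular expression does earn its keep is whole-word matching: plain in also matches inside longer words, so 'travel' is found in 'traveling'. A sketch assuming phrases that start and end with word characters, since \b only marks a boundary next to a word character (whole_phrase_matches is a hypothetical helper name):

```python
import re

string = 'I love traveling all over the world'
word_list = ['I love', 'travel', 'all over the world']


def whole_phrase_matches(phrases, text):
    """Return the phrases that occur in text as whole words only."""
    matches = []
    for phrase in phrases:
        # \b anchors the phrase at word boundaries; re.escape treats it literally.
        if re.search(r'\b' + re.escape(phrase) + r'\b', text):
            matches.append(phrase)
    return matches


substring_matches = [x for x in word_list if x in string]
print(substring_matches)  # ['I love', 'travel', 'all over the world']
print(whole_phrase_matches(word_list, string))  # ['I love', 'all over the world']
```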
For completeness, there is also the str.find method:
_string = 'I love my travel all over the world'
_list = ['I love', 'my travel', 'all over the world', 'spam', 'python']
for i in range(len(_list)):
    if _string.find(_list[i]) > -1:  # find() returns -1 when the substring is absent
        print(_list[i])
Which outputs:
I love
my travel
all over the world
Note: this solution is less elegant than using the in operator, but it may be useful if you need the position of the found substring.
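Following that note, a small sketch that actually keeps the positions find() reports, recording the index of each phrase's first occurrence (find_positions is a hypothetical helper name):

```python
def find_positions(text, phrases):
    """Map each phrase found in text to the index of its first occurrence."""
    positions = {}
    for phrase in phrases:
        idx = text.find(phrase)  # -1 means "not found"
        if idx != -1:
            positions[phrase] = idx
    return positions


_string = 'I love my travel all over the world'
found = find_positions(_string, ['I love', 'my travel', 'all over the world', 'spam'])
print(found)  # {'I love': 0, 'my travel': 7, 'all over the world': 17}
```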

Capitalize a substring within a string

I'm trying to create something like:
string: How do you do today?
substring: o
>>> HOw dO yOu dO tOday?
I've already written the rest of the code (prompting for strings, etc.); I'm just stuck on capitalizing the substring within the string.
>>> s='How do you do today?'
>>> sub_s='o'
>>> s.replace(sub_s, sub_s.upper())
'HOw dO yOu dO tOday?'
It gets more complicated if you only want to change some occurrences (e.g., only the 2nd one). As a one-liner:
>>> ''.join([item.upper() if i==[idx for idx, w in enumerate(s) if w==sub_s][1] else item for i, item in enumerate(s)])
'How dO you do today?'
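The one-liner above only works when sub_s is a single character (it compares individual characters), and it rebuilds the index list for every character. A sketch built on str.find that also handles multi-character substrings, assuming a non-empty substring and counting non-overlapping occurrences the way str.replace does (upper_nth is a hypothetical name):

```python
def upper_nth(s, sub, n):
    """Uppercase only the n-th (1-based), non-overlapping occurrence of sub in s."""
    pos = -len(sub)
    for _ in range(n):
        # Resume searching just past the previous match.
        pos = s.find(sub, pos + len(sub))
        if pos == -1:  # fewer than n occurrences: leave s unchanged
            return s
    return s[:pos] + sub.upper() + s[pos + len(sub):]


print(upper_nth('How do you do today?', 'o', 2))  # How dO you do today?
```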

Append to a list using a regular expression

I am trying to append to an (initially empty) list the "sentences" from a different list that contain # (hashtags).
Currently my code gives me a new list whose length is based on the total number of elements in the list, rather than just the individual matching sentences.
The code snippet is given below:
import re
old_list = ["I love #stackoverflow because #people are very #helpful!", "But I dont #love hastags",
            "So #what can you do", "Some simple senetnece", "where there is no hastags", "however #one can be good"]
new_list = []
for tt in range(0, len(old_list)):
    for ui in old_list:
        if bool(re.search(r"#(\w+)", old_list[tt])) == True:
            new_list.append(old_list[tt])
Please let me know how to append each matching sentence only once.
I am not sure what output you want, but this preserves each original sentence along with its list of matching hashtags:
>>> import re
>>> old_list = ["I love #stackoverflow because #people are very #helpful!","But I dont #love hastags",
... "So #what can you do","Some simple senetnece","where there is no hastags","however #one can be good"]
>>> hash_regex = re.compile(r'#(\w+)')
>>> [(hash_regex.findall(l), l) for l in old_list]
[(['stackoverflow', 'people', 'helpful'], 'I love #stackoverflow because #people are very #helpful!'), (['love'], 'But I dont #love hastags'), (['what'], 'So #what can you do'), ([], 'Some simple senetnece'), ([], 'where there is no hastags'), (['one'], 'however #one can be good')]
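If, as the question asks, you only want each hashtag-bearing sentence once, drop the inner loop and test each sentence a single time. A sketch using the same compiled pattern:

```python
import re

old_list = ["I love #stackoverflow because #people are very #helpful!", "But I dont #love hastags",
            "So #what can you do", "Some simple senetnece", "where there is no hastags", "however #one can be good"]

# Raw string so that \w reaches the regex engine unchanged.
hash_regex = re.compile(r'#(\w+)')

# One pass over the list: each sentence is tested once and kept at most once.
new_list = [sentence for sentence in old_list if hash_regex.search(sentence)]
print(new_list)
```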
