I want to know how to retrieve the first word of each item in a list.
For example, if the list is:
['hello world', 'how are you']
Is there a way to get x = "hello how"?
Here is what I've tried so far (newfriend is the list):
x=""
for values in newfriend:
    values = values.split()
    values = ''.join(values.split(' ', 1)[0])
    x+=" ".join(values)
    x+="\n"
A simple generator expression would do, I guess, e.g.
>>> l = ["hello world", "how are you"]
>>> ' '.join(x.split()[0] for x in l)
'hello how'
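One thing to watch with x.split()[0]: it raises an IndexError for an empty or whitespace-only item. A minimal sketch that skips such items, assuming that is the behaviour you want (the name first_words is made up for illustration):

```python
def first_words(items):
    """Join the first word of every non-blank string in items (hypothetical helper)."""
    # str.split() with no argument returns [] for blank strings,
    # so the `if x.split()` filter avoids an IndexError on x.split()[0].
    return ' '.join(x.split()[0] for x in items if x.split())

print(first_words(["hello world", "", "how are you", "   "]))  # hello how
```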
You're not far off. Here is how I would do it.
# Python 3
newfriend = ['hello world', 'how are you']
x = [] # Create x as an empty list, rather than an empty string.
for v in newfriend:
    x.append(v.split(' ')[0]) # Append first word of each phrase to the list.
y = ' '.join(x) # Join the list.
print(y)
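import re
# where l = ["Hello world", "hi world"]
g = []
for i in range(len(l)):  # iterate over indices; range(l) would raise a TypeError
    x = re.findall(r'\w+', l[i])  # all words of the i-th phrase
    g.append(x)
print(g[0][0] + " " + g[1][0])  # first word of each phrase, space-separated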
For example my list is
lst=['hello','world','this','is','hello','world','world','hello']
subString=['hello','world']
The result I'm looking for in this case is 2, since the list ['hello','world'] occurs twice in that same order.
I tried doing
list(filter(lambda x : x in subString,lst))
but that returns every 'hello' and every 'world'.
You could use " ".join() on both lists to create a string and then use str.count() to count the number of occurrences of subString in lst
lst=['hello','world','this','is','hello','world','world','hello']
subString=['hello','world']
l = " ".join(lst)
s = " ".join(subString)
count = l.count(s)
print("Joined list:", l)
print("Joined substring:", s)
print("occurrences:", count)
outputs:
Joined list: hello world this is hello world world hello
Joined substring: hello world
occurrences: 2
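One caveat, as far as I can tell: str.count() works on characters, so the joined strings can match across word boundaries (['hello', 'worldwide'] would count as containing 'hello world'). A sketch that compares list slices instead, assuming overlapping matches should be counted; count_sublist is a hypothetical name:

```python
def count_sublist(lst, sub):
    """Count positions where sub occurs as a contiguous run inside lst."""
    n = len(sub)
    if n == 0:
        return 0  # assumption: an empty pattern counts as zero matches
    # Compare every slice of length n against the pattern, element by element.
    return sum(lst[i:i + n] == sub for i in range(len(lst) - n + 1))

lst = ['hello', 'world', 'this', 'is', 'hello', 'world', 'world', 'hello']
print(count_sublist(lst, ['hello', 'world']))                     # 2
print(count_sublist(['hello', 'worldwide'], ['hello', 'world']))  # 0
print(' '.join(['hello', 'worldwide']).count('hello world'))      # 1 (false positive)
```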
Using the window generator from this answer and a Counter, this can be expressed as:
from collections import Counter
lst=['hello','world','this','is','hello','world','world','hello']
subString=('hello','world')
counts = Counter(window(lst, len(subString)))
print(counts[subString])
# 2
If you want to skip the Counter, you could do
print(sum(x == subString for x in window(lst, len(subString))))
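The window generator itself isn't shown in this thread, so here is one possible sketch of it (an assumption on my part, not necessarily what the linked answer uses), built on collections.deque:

```python
from collections import Counter, deque
from itertools import islice

def window(seq, n=2):
    """Yield consecutive overlapping tuples of length n from seq."""
    it = iter(seq)
    win = deque(islice(it, n), maxlen=n)  # the first n items
    if len(win) == n:
        yield tuple(win)
    for item in it:
        win.append(item)  # maxlen drops the oldest item automatically
        yield tuple(win)

lst = ['hello', 'world', 'this', 'is', 'hello', 'world', 'world', 'hello']
subString = ('hello', 'world')
print(Counter(window(lst, len(subString)))[subString])  # 2
```

Because the windows are tuples, subString has to be a tuple as well for the lookup and the == comparison to work.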
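You can split lst into consecutive, non-overlapping chunks of len(subString) elements and then keep the chunks that equal subString (with subString as a list, as in the question). Note that this only finds matches that start at a multiple of len(subString).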
joinedWords = [lst[n:n + len(subString)] for n in range(0, len(lst), len(subString))]
# => [['hello', 'world'], ['this', 'is'], ['hello', 'world'], ['world', 'hello']]
filtered = list(filter(lambda x: x == subString, joinedWords))
print(len(filtered)) # 2
Since all elements are strings in this case, I'd create a string from each list, then count the occurrences of the second string in the first string:
lst=['hello','world','this','is','hello','world','world','hello']
subString = ['hello','world']
s = ' '.join(lst)
subs = ' '.join(subString)
print(s.count(subs))
I am trying to write code that takes input, splits it into a list, and then assigns a number to each string in the list.
For example: If my input was "hello hello goodbye hello" Then it would output "1121".
This is the code I have written so far:
sentence = input("What is your sentence? ").lower()
list_s = str.split(sentence)
for i in range(len(list_s)):
Use a dictionary containing the strings as keys and the number as value:
inp = 'hello hello world hello'
d = {}
idx = 1
for word in inp.lower().split():
    # If the word is in the dictionary print the corresponding number
    if word in d:
        print(d[word], end='')
    # Otherwise print the next number and add the number to the dictionary
    else:
        print(idx, end='')
        d[word] = idx
        idx += 1
I proposed the dictionary because the in operation is very fast and one needs to store two things, the string and its number, which makes the dictionary a very suitable data structure for this problem.
If you like it shorter you could use a list comprehension (using the len(d)+1 "trick" from trincot's answer):
inp = 'hello hello world hello'
d = {}
result = [d.setdefault(word, len(d)+1) for word in inp.lower().split()]
Where result would be a list containing [1, 1, 2, 1] which you could print using:
print(''.join(map(str, result)))
Or (thanks Stefan Pochmann for the hint):
print(*result, sep="")
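Wrapped up as a function and run on the example from the question, the same idea could look like this (number_words is just a name I picked for the sketch):

```python
def number_words(sentence):
    """Return a string with one number per word, numbered by first appearance."""
    d = {}
    # setdefault returns the stored number if the word was seen before;
    # otherwise it stores len(d) + 1 (evaluated before insertion) and returns it.
    return ''.join(str(d.setdefault(word, len(d) + 1))
                   for word in sentence.lower().split())

print(number_words("hello hello goodbye hello"))  # 1121
```

Keep in mind that the concatenated output becomes ambiguous once there are more than nine distinct words, so a separator may be a better choice for longer inputs.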
Like MSeifert, I would suggest a dictionary. You can produce the assigned number from the size of the dictionary at the moment a new word is added. The final result can be produced as a list with a list comprehension:
list_s = str.split("hello hello goodbye hello")
d = {} # a dictionary, which will store for each word a number
for word in list_s:
    if word not in d: # first time we see this word?
        d[word] = len(d)+1 # assign to this word a unique number
result = [d[word] for word in list_s] # translate each word to its unique number
print (result) # [1, 1, 2, 1]
You can build a list of the unique words in order of first appearance and use each word's position in it:
def get_numbers(s):
    s = s.split()
    new_s = [a for i, a in enumerate(s) if a not in s[:i]]
    return [new_s.index(i)+1 for i in s]
l = ["hello hello goodbye hello", "hello goodbye hello hello"]
final_l = list(map(get_numbers, l))
Output:
[[1, 1, 2, 1], [1, 2, 1, 1]]
I'm writing a function to implement the solution to finding the number of times a word occurs in a list of elements, retrieved from a text file which is pretty straightforward to achieve.
However, I have been at it for two days trying to figure out how to count occurrences of a string that contains multiple words (two or more).
So for example say the string is:
"hello bye"
and the list is:
["car", "hello","bye" ,"hello"]
The function should return the value 1 because the elements "hello" and "bye" only occur once consecutively.
The closest I've gotten to the solution is using
words[0:2] = [' '.join(words[0:2])]
which would join two elements together given the index. This however is wrong as the input given will be the element itself rather than an index.
Can someone point me to the right direction?
Two possibilities.
## laboriously
lookFor = 'hello bye'
words = ["car", "hello","bye" ,"hello", 'tax', 'hello', 'horn', 'hello', 'bye']
strungOutWords = ' '.join(words)
count = 0
p = 0
while True:
    q = strungOutWords[p:].find(lookFor)
    if q == -1:
        break
    else:
        p = p + q + 1
        count += 1
print (count)
## using a regex
import re
print (len(re.compile(lookFor).findall(strungOutWords)))
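If lookFor may contain regex metacharacters, or could match in the middle of a longer word, a slightly more defensive variant might be worth considering. This is only a sketch: count_phrase is a hypothetical name, and like findall above it counts non-overlapping matches.

```python
import re

def count_phrase(words, phrase):
    """Count non-overlapping whole-word occurrences of phrase in the joined words."""
    text = ' '.join(words)
    # re.escape neutralises characters such as '.' or '+';
    # the lookarounds reject matches that start or end inside a longer word.
    pattern = r'(?<!\S)' + re.escape(phrase) + r'(?!\S)'
    return len(re.findall(pattern, text))

words = ["car", "hello", "bye", "hello", "tax", "hello", "horn", "hello", "bye"]
print(count_phrase(words, 'hello bye'))                # 2
print(count_phrase(["hello", "byebye"], 'hello bye'))  # 0
```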
Match the string with the join of the consecutive elements in the main list. Below is the sample code:
my_list = ["car", "hello","bye" ,"hello"]
sentence = "hello bye"
word_count = len(sentence.split())
c = 0
for i in range(len(my_list) - word_count + 1):
    if sentence == ' '.join(my_list[i:i+word_count]):
        c+=1
The final value held by c will be:
>>> c
1
If you are looking for a one-liner, you may use zip and sum as:
>>> my_list = ["car", "hello","bye" ,"hello"]
>>> sentence = "hello bye"
>>> words = sentence.split()
>>> sum(1 for i in zip(*[my_list[j:] for j in range(len(words))]) if list(i) == words)
1
Let's split this problem in two parts. First, we establish a function that will return ngrams of a given list, that is sublists of n consecutive elements:
def ngrams(l, n):
    return list(zip(*[l[i:] for i in range(n)]))
We can now get 2, 3 or 4-grams easily:
>>> ngrams(["car", "hello","bye" ,"hello"], 2)
[('car', 'hello'), ('hello', 'bye'), ('bye', 'hello')]
>>> ngrams(["car", "hello","bye" ,"hello"], 3)
[('car', 'hello', 'bye'), ('hello', 'bye', 'hello')]
>>> ngrams(["car", "hello","bye" ,"hello"], 4)
[('car', 'hello', 'bye', 'hello')]
Each item is made into a tuple.
Now make the phrase 'hello bye' into a tuple:
>>> as_tuple = tuple('hello bye'.split())
>>> as_tuple
('hello', 'bye')
>>> len(as_tuple)
2
Since this has 2 words, we need to generate bigrams from the sentence, and count the number of matching bigrams. We can generalize all this to
def ngrams(l, n):
    return list(zip(*[l[i:] for i in range(n)]))
def count_occurrences(sentence, phrase):
    phrase_as_tuple = tuple(phrase.split())
    sentence_ngrams = ngrams(sentence, len(phrase_as_tuple))
    return sentence_ngrams.count(phrase_as_tuple)
print(count_occurrences(["car", "hello","bye" ,"hello"], 'hello bye'))
# prints 1
I would suggest reducing the problem into counting occurrences of a string within another string.
words = ["hello", "bye", "hello", "car", "hello ", "bye me", "hello", "carpet", "shoplifter"]
sentence = "hello bye"
my_text = " %s " % " ".join([item for sublist in [x.split() for x in words] for item in sublist])
def count(sentence):
    my_sentence = " %s " % " ".join(sentence.split())
    return my_text.count(my_sentence)
print(count("hello bye"))  # 2
print(count("pet shop"))  # 0
Hi, I have the following two lists and want to build a third, filtered list: if any string from the list 'wrong' appears in an item of the list 'old', that entire item should be dropped. In other words, I want the result to be equivalent to the list 'new'.
wrong = ['top up','national call']
old = ['Hi Whats with ','hola man top up','binga dingo','on a national call']
new = ['Hi Whats with', 'binga dingo']
You can use filter:
>>> list(filter(lambda x:not any(w in x for w in wrong), old))
['Hi Whats with ', 'binga dingo']
Or, a list comprehension,
>>> [i for i in old if not any(x in i for x in wrong)]
['Hi Whats with ', 'binga dingo']
If you're not comfortable with any of those, use a simple for loop based solution like below:
>>> result = []
>>> for i in old:
...     for x in wrong:
...         if x in i:
...             break
...     else:
...         result.append(i)
...
>>> result
['Hi Whats with ', 'binga dingo']
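All of the variants above match case-sensitively. If the real data mixes upper and lower case, something along these lines might help; it is a sketch that assumes case-insensitive matching is wanted, and drop_matching is a name invented for the example:

```python
def drop_matching(lines, phrases):
    """Return the lines that contain none of the phrases (case-insensitive)."""
    lowered = [p.lower() for p in phrases]
    # Keep a line only if no lowered phrase occurs in the lowered line.
    return [line for line in lines
            if not any(p in line.lower() for p in lowered)]

wrong = ['top up', 'national call']
old = ['Hi Whats with ', 'hola man Top Up', 'binga dingo', 'on a national call']
print(drop_matching(old, wrong))  # ['Hi Whats with ', 'binga dingo']
```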
>>> wrong = ['top up','national call']
>>> old = ['Hi Whats with ','hola man top up','binga dingo','on a national call']
>>> [i for i in old if all(x not in i for x in wrong)]
['Hi Whats with ', 'binga dingo']
>>>
I want to write a Python program that tests whether any phrase from a list occurs in a string.
string ='I love my travel all over the world'
list =['I love','my travel','all over the world']
So I want to test whether any item of the list matches that string, and print the ones that do: 'I love', 'my travel' or 'all over the world'.
any(x in string for x in list)
Or I need to use text mining to solve the problem?
Your current solution is probably the best to use in this given scenario. You could encapsulate it as a function if you wanted.
def list_in_string(slist, string):
    return any(x in string for x in slist)
You can't do this:
if any(x in string for x in word_list):
    print(x)
Because x exists only inside the generator expression: any() consumes the generator and simply returns a Boolean (True or False), so x is not available afterwards.
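If only the first matching phrase is needed, one option (a sketch, not the only way) is next() with a default value, which hands back the item itself rather than a Boolean:

```python
string = 'I love my travel all over the world'
word_list = ['I love', 'my travel', 'all over the world']

# next() pulls the first phrase that passes the test; None is returned
# as the default when no phrase is found in the string.
first_match = next((x for x in word_list if x in string), None)
if first_match is not None:
    print(first_match)  # I love
```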
You can however, just break apart your any function so that you can get your desired output.
string ='I love traveling all over the world'
word_list =['I love','traveling','all over the world']
for x in word_list:
    if x in string:
        print(x)
This will output:
>>>
I love
traveling
all over the world
>>>
Update using string.split() :
string =['I', 'love','traveling','all', 'over', 'the', 'world']
word_list =['I love','traveling','all over the world']
count=0
for x in word_list:
    for y in x.split():
        if y in string:
            count+=1
    if count==len(x.split()) and ' ' in x:
        print(x)
    count=0
This will output:
>>>
I love
all over the world
>>>
If you want a True or False returned, you can definitely use any(), for example:
>>> string = 'I love my travel all over the world'
>>> list_string =['I love',
...               'my travel',
...               'all over the world',
...               'Something something',
...               'blah']
>>> any(x for x in list_string if x in string)
True
>>>
Otherwise, you could do some simple list comprehension:
>>> string ='I love my travel all over the world'
>>> list_string =['I love',
...               'my travel',
...               'all over the world',
...               'Something something',
...               'blah']
>>> [x for x in list_string if x in string]
['I love', 'my travel', 'all over the world']
>>>
Depending on what you want returned, both of these work perfectly.
You could also probably use a regular expression, but it's a little overkill for something so simple.
For completeness, one may mention the find method:
_string ='I love my travel all over the world'
_list =['I love','my travel','all over the world','spam','python']
for i in range(len(_list)):
    if _string.find(_list[i]) > -1:
        print(_list[i])
Which outputs:
I love
my travel
all over the world
Note: this solution is not as elegant as the in usage mentioned, but may be useful if the position of the found substring is needed.
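To illustrate the point about positions, here is a small sketch that collects the index of the first occurrence of every phrase that is found (the dictionary name positions is my own choice):

```python
_string = 'I love my travel all over the world'
_list = ['I love', 'my travel', 'all over the world', 'spam', 'python']

# Map each phrase that occurs in _string to the index of its first occurrence.
# str.find returns -1 when the phrase is absent, so those are filtered out.
positions = {p: _string.find(p) for p in _list if _string.find(p) != -1}
print(positions)  # {'I love': 0, 'my travel': 7, 'all over the world': 17}
```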