In how many lists does the word appear? - python

I have different lists in python:
list1 = [hello,there,hi]
list2 = [my,name,hello]
I need to make a dictionary with the key being the number of lists a word appears in. So my answer would look like
{2:hello,1:hi ....}
I am new to python and I have no idea how to do this.

You need to use a dictionary to store key-value results.
Here is some code to help you get started, but you'll have to adapt it to your exact problem.
#!/usr/bin/python
list1 = ["hello","there","hi"]
list2 = ["my","name","hello"]
result = dict()
# Count every occurrence of each word in list1
for word in list1:
    if word in result:
        result[word] = result[word] + 1
    else:
        result[word] = 1
# Do the same for list2
for word in list2:
    if word in result:
        result[word] = result[word] + 1
    else:
        result[word] = 1
print(result)

As a first step, build a dictionary of word counts.
Initialize it:
words_count = {}
Then, for each list of words, do:
for word in list_of_words:
    if word not in words_count:
        words_count[word] = 1
    else:
        words_count[word] += 1
Then invert words_count like so:
inv_words_count = {v: k for k, v in words_count.items()}
inv_words_count maps each count to a word. Note that words sharing the same count overwrite one another, so only one word per count survives.

I have slightly modified your input lists (list1 & list2) as shown below:
list1 = ['hello,there,hi'] # Added quotes as it is a string
list2 = ['my,name,hello']
Here is the logic:
list1 = list1[0].split(',')
list2 = list2[0].split(',')
list_final = list1 + list2
dict_final = {}
for item in list_final:
    if item in dict_final.keys():
        dict_final.update({item:(dict_final.get(item) + 1)})
    else:
        dict_final.update({item:1})
Hope it will work as you are expecting :)
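The snippets above count every occurrence of a word. If the goal is the number of lists a word appears in, one possible sketch is to deduplicate each list with set() before counting, then group the words by their count so that words sharing a count are not lost. The names lists_containing and words_by_count are made up for this illustration.

```python
from collections import Counter, defaultdict

list1 = ["hello", "there", "hi"]
list2 = ["my", "name", "hello"]

# Count each word at most once per list by deduplicating with set().
lists_containing = Counter()
for words in (list1, list2):
    lists_containing.update(set(words))

# Group words by count; several words can share the same count,
# so each key maps to a list of words rather than a single word.
words_by_count = defaultdict(list)
for word, n in lists_containing.items():
    words_by_count[n].append(word)

print(dict(words_by_count))
```

Here "hello" ends up under key 2 and the other four words under key 1; the order of words within a group is not guaranteed because sets are unordered.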

Related

python list of lists contain substring

I have a list_of_lists and, for each sublist, I need the string that contains 'height'. If a sublist has no such string, I need 'nvt' for that whole sublist.
I have tried the following:
list_of_lists = [['width=9','length=3'],['width=6','length=4','height=4']]
_lists = []
for list in list_of_lists:
    list1 = []
    for st in list:
        if ("height" ) in st:
            list1.append(st)
        else:
            list1.append('nvt')
    _lists.append(list1)
OUT = _lists
the result I need to have is :
_lists = ['nvt', 'height=4']
what I'm getting is:
_lists = [['nvt','nvt'],['nvt','nvt','height=4']]
This is a good case for implementing a for/else construct as follows:
list_of_lists = [['width=9','length=3'],['width=6','length=4','height=4']]
result = []
for e in list_of_lists:
    for ss in e:
        if ss.startswith('height'):
            result.append(ss)
            break
    else:
        result.append('nvt')
print(result)
Output:
['nvt', 'height=4']
Note:
This could probably be done with a list comprehension, but I think the loop is more obvious and the performance difference is unlikely to be significant.
This should work. Assign height to the first value in the sublist for which s.startswith("height") is True; if nothing matches, next() falls back to the default 'nvt'.
_lists = []
for sublist in list_of_lists:
    height = next(filter(lambda s: s.startswith("height"), sublist), 'nvt')
    _lists.append(height)
And if you wish to be crazy, you can use a list comprehension to reduce the code to:
_lists = [next(filter(lambda s: s.startswith("height"), sublist), 'nvt') for sublist in list_of_lists]
Try this (Python 3.x):
import re
list_of_lists = [['width=9','length=3'],['width=6','length=4','height=4']]
_lists = []
r = re.compile("height=")
for li in list_of_lists:
    match = list(filter(r.match, li))
    if len(match) > 0:
        _lists.extend(match)
    else:
        _lists.append('nvt')
OUT = _lists
print(OUT)
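The "first match or a default" idea used in the answers above can also be wrapped in a small helper. This is only a sketch: first_with_prefix is a hypothetical name, and it uses a generator expression with next() instead of filter().

```python
def first_with_prefix(items, prefix, default="nvt"):
    """Return the first string in items that starts with prefix, else default."""
    # next() stops at the first match and returns default if there is none.
    return next((s for s in items if s.startswith(prefix)), default)


list_of_lists = [['width=9', 'length=3'], ['width=6', 'length=4', 'height=4']]
result = [first_with_prefix(sub, "height") for sub in list_of_lists]
print(result)  # ['nvt', 'height=4']
```

Unlike the regex version, this keeps only the first match per sublist, so a sublist with two height entries still contributes exactly one item.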

How to extract a string contained in nested list?

Please help me extract the strings containing particular text. I have tried the following:
lst = [['abc', 'abgoodhj', 'rygbadkk'], ['jhjbadnm'], ['hjhj', 'iioytu'], ['hjjh', 'ghjgood1hj', 'jjkkbadgghhj', 'hjhgkll']]
for lst1 in lst:
    good_wrd = [txt for txt in lst1 if txt.contains('good')]
    bad_wrd = [txt for txt in lst1 if txt.contains('bad')]
I want the words that contain good and bad.
Python strings have no contains method; use the in operator instead. Use a list comprehension to create a new list.
good_wrd = [
    word
    for sub_lst in lst
    for word in sub_lst
    if "good" in word
]
bad_wrd = [
    word
    for sub_lst in lst
    for word in sub_lst
    if "bad" in word
]
Alternatively, using for loops:
good_wrd = []
bad_wrd = []
for sub_lst in lst:
    for word in sub_lst:
        if "bad" in word:
            bad_wrd.append(word)
        elif "good" in word:
            good_wrd.append(word)
This would work:
lst = [['abc', 'abgoodhj', 'rygbadkk'], ['jhjbadnm'], ['hjhj', 'iioytu'], ['hjjh', 'ghjgood1hj', 'jjkkbadgghhj', 'hjhgkll']]
good_wrd = []
bad_wrd = []
for lst1 in lst:
    good_wrd.extend([txt for txt in lst1 if 'good' in txt])
    bad_wrd.extend([txt for txt in lst1 if 'bad' in txt])
print(good_wrd)
print(bad_wrd)
target1 = 'good'
target2 = 'bad'
goods = []
bads = []
for lis in lst:
    for txt in lis:
        if target1 in txt:
            goods.append(txt)
        elif target2 in txt:
            bads.append(txt)
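The loop versions that use elif put a word containing both substrings into only one list. A possible sketch that checks every target independently and scales to more than two targets is shown here; targets and matches are illustrative names, not part of the original question.

```python
lst = [['abc', 'abgoodhj', 'rygbadkk'], ['jhjbadnm'], ['hjhj', 'iioytu'],
       ['hjjh', 'ghjgood1hj', 'jjkkbadgghhj', 'hjhgkll']]

targets = ['good', 'bad']
# One result list per target substring.
matches = {target: [] for target in targets}

for sub_lst in lst:
    for word in sub_lst:
        # Test each target separately so a word can match more than one.
        for target in targets:
            if target in word:
                matches[target].append(word)

print(matches['good'])  # ['abgoodhj', 'ghjgood1hj']
print(matches['bad'])   # ['rygbadkk', 'jhjbadnm', 'jjkkbadgghhj']
```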

Removal of Duplicates from a dictionary

I have built an anagram dictionary from a words.txt file, as shown below.
with open('words.txt', 'r') as read:
    line = read.readlines()
def make_anagram_dict(line):
    word_list = {}
    for word in line:
        word = word.lower()
        key = ''.join(sorted(word))
        if key in word_list and len(word) > 5 and word not in word_list:
            word_list[key].append(word)
        else:
            word_list[key] = [word]
    return word_list
if __name__ == '__main__':
    word_list = make_anagram_dict(line)
    for key, words in word_list.items():
        if len(words) >:
            print('Key value' + ' '*len(key) + '| words')
            print(key + ' '*len(key) + ':' + str(words))
            print('---------------------------------------------')
The output I get looks like this (on a random part)
Key value | words
hortwy :['worthy\n', 'wrothy\n']
---------------------------------------------
But I also get output like this (the duplicate part I am trying to fix)
Key value | words
eipprz :['zipper\n', 'zipper\n']
---------------------------------------------
The problem is that the words.txt file contains duplicates that differ only in the capital letter at the start,
e.g. Zipper and zipper. The code therefore reports zipper as an anagram of itself, when it shouldn't. I tried to fix it with the part in bold. I would really appreciate any help!
Going from list to set and then back to list removes duplicates. To apply this to the values of a dictionary use a comprehension:
new_dict = {key: list(set(value)) for key, value in old_dict.items()}
Example
my_dict = {"abc": ["abc", "acb", "bac", "bca"],
"aab": ["aab", "aba", "baa", "aab"]}
new_dict = {key: list(set(value)) for key, value in my_dict.items()}
print(new_dict)
> {'abc': ['abc', 'bca', 'bac', 'acb'], 'aab': ['aba', 'aab', 'baa']}
It would in fact be more efficient to just use a set from the start, i.e.:
if key in word_list and len(word) > 5 and word not in word_list:
    word_list[key].add(word)
else:
    word_list[key] = {word}
If you really need it as a list, you can convert back at the end. Note that a set does not preserve insertion order, so the words may come out in a different order.
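Where the order of the words matters, a dict can stand in for the set, since dict keys are unique and keep insertion order (assuming Python 3.7 or later). The sketch below also strips the trailing newline from each line and, for brevity, leaves out the length check from the question. The sample list is a made-up stand-in for the contents of words.txt.

```python
def make_anagram_dict(lines):
    """Map each sorted-letter key to its distinct words, in first-seen order."""
    anagrams = {}
    for line in lines:
        word = line.strip().lower()  # drop the trailing '\n' and normalise case
        if not word:
            continue  # skip blank lines
        key = ''.join(sorted(word))
        # Inner dict used as an ordered set: re-adding a word is a no-op.
        anagrams.setdefault(key, {})[word] = None
    return {key: list(words) for key, words in anagrams.items()}


sample = ['Zipper\n', 'zipper\n', 'worthy\n', 'wrothy\n']
print(make_anagram_dict(sample))
# {'eipprz': ['zipper'], 'hortwy': ['worthy', 'wrothy']}
```

Keys that end up with a single word, such as 'eipprz' here, can then be skipped when printing anagram groups.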

Sorting a list based on upper and lower case

I have a list:
List1 = ['name','is','JOHN','My']
I want to put the pronoun first in a new list and the name last. The other items should be in the middle, and their relative positions can change.
So far I have written:
my_list = ['name','is','JOHN','My']
new_list = []
for i in my_list:
    if i.isupper():
        my_list.remove(i)
    new_list.append(i)
print(new_list)
Here, I can't check if an item is completely upper case or only its first letter is upper case.
Output I get:
['name','is','JOHN','My']
Output I want:
['My','name','is','JOHN']
or:
['My','is','name','JOHN']
EDIT: I have seen this post and it doesn’t have answers to my question.
i.isupper() will tell you if it's all uppercase.
To test if just the first character is uppercase and the rest lowercase, you can use i.istitle()
To make your final result, you can append to different lists based on the conditions.
all_cap = []
init_cap = []
non_cap = []
for i in my_list:
    if i.isupper():
        all_cap.append(i)
    elif i.istitle():
        init_cap.append(i)
    else:
        non_cap.append(i)
new_list = init_cap + non_cap + all_cap
print(new_list)
How about this:
s = ['name', 'is', 'JOHN', 'My']
pronoun = ''
name = ''
for i in s:
    if i.isupper():
        name = i
    if i.istitle():
        pronoun = i
result = [pronoun, s[0], s[1], name]
print(result)
Don't # me pls XD. Try this.
my_list = ['name','is','JOHN','My']
new_list = ['']
for i in range(len(my_list)):
    if my_list[i][0].isupper() and my_list[i][1].islower():
        new_list[0] = my_list[i]
    elif my_list[i].islower():
        new_list.append(my_list[i])
    elif my_list[i].isupper():
        new_list.append(my_list[i])
print(new_list)
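The same isupper()/istitle() checks can also drive a sort key, which avoids the separate lists. This is a sketch only; case_rank is a hypothetical helper name. Because sorted() is stable, words with the same rank keep their original relative order.

```python
def case_rank(word):
    """Rank title-case words first, all-uppercase words last, the rest between."""
    if word.isupper():      # e.g. 'JOHN'
        return 2
    if word.istitle():      # e.g. 'My'
        return 0
    return 1                # e.g. 'name', 'is'


my_list = ['name', 'is', 'JOHN', 'My']
new_list = sorted(my_list, key=case_rank)
print(new_list)  # ['My', 'name', 'is', 'JOHN']
```

A single capital letter such as 'I' is both upper case and title case; checking isupper() first places it with the all-uppercase words.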

Filter a list of strings by frequency

I have a list of strings:
a = ['book','book','cards','book','foo','foo','computer']
I want to return every item in this list that appears more than twice (count > 2).
Final output:
a = ['book','book','book']
I'm not quite sure how to approach this, but here are two approaches I had in mind:
Approach One:
I've created a dictionary to count the number of times an item appears:
a = ['book','book','cards','book','foo','foo','computer']
from collections import defaultdict
def update_item_counts(item_counts, itemset):
    for a in itemset:
        item_counts[a] +=1
test = defaultdict(int)
update_item_counts(test, a)
print(test)
Out: defaultdict(<class 'int'>, {'book': 3, 'cards': 1, 'foo': 2, 'computer': 1})
I want to filter out the list with this dictionary but I'm not sure how to do that.
Approach two:
I tried to write a list comprehension but it doesn't seem to work:
res = [k for k in a if a.count > 2 in k]
A very barebone answer is that, in your second approach, you should replace the condition a.count > 2 in k with a.count(k) > 2.
However, do not use list.count for this, as it traverses the whole list once for each item. Instead, count occurrences first with collections.Counter, which traverses the list only once.
from collections import Counter
from itertools import repeat
a = ['book','book','cards','book','foo','foo','computer']
count = Counter(a)
output = [word for item, n in count.items() if n > 2 for word in repeat(item, n)]
print(output) # ['book', 'book', 'book']
Note that the list comprehension is equivalent to the loop below.
output = []
for item, n in count.items():
    if n > 2:
        output.extend(repeat(item, n))
Try this:
a_list = ['book','book','cards','book','foo','foo','computer']
b_list = []
for a in a_list:
    if a_list.count(a) > 2:
        b_list.append(a)
print(b_list)
# ['book', 'book', 'book']
Edit: You mentioned list comprehension. You are on the right track! You can do it with list comprehension like this:
a_list = ['book','book','cards','book','foo','foo','computer']
c_list = [a for a in a_list if a_list.count(a) > 2]
Good luck!
a = ['book','book','cards','book','foo','foo','computer']
list(filter(lambda s: a.count(s) > 2, a))
Your first attempt builds a dictionary with all of the counts. You need to take this a step further to get the items that you want:
res = [k for k in test if test[k] > 2]
Now that you have built this by hand, you should check out the builtin Counter class that does all of the work for you.
If you just want to print there are better answers already, if you want to remove you can try this.
a = ['book','book','cards','book','foo','foo','computer']
countdict = {}
for word in a:
    if word not in countdict:
        countdict[word] = 1
    else:
        countdict[word] += 1
for x, y in countdict.items():
    if (2 >= y):
        for i in range(y):
            a.remove(x)
You can try this.
def my_filter(my_list, my_freq):
    '''Filter a list of strings by frequency'''
    # use set() to unique my_list, then turn set back to list
    unique_list = list(set(my_list))
    # count frequency in unique_list
    frequencies = []
    for value in unique_list:
        frequencies.append(my_list.count(value))
    # filter frequency
    return_list = []
    for i, frequency in enumerate(frequencies):
        if frequency > my_freq:
            for _ in range(frequency):
                return_list.append(unique_list[i])
    return return_list
a = ['book','book','cards','book','foo','foo','computer']
my_filter(a, 2)
['book', 'book', 'book']
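Several of the answers above call list.count inside a loop, which rescans the list for every item. A minimal sketch that counts once with collections.Counter and then filters the original list, keeping the items in their original order, might look like this (min_count and filtered are illustrative names):

```python
from collections import Counter

a = ['book', 'book', 'cards', 'book', 'foo', 'foo', 'computer']

counts = Counter(a)   # one pass over the list to count every item
min_count = 2         # keep items that appear more than this many times

# Second pass: keep each item whose total count exceeds the threshold.
filtered = [word for word in a if counts[word] > min_count]
print(filtered)  # ['book', 'book', 'book']
```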
