Finding words of the same length [closed] - python

I am new to programming and I am trying to figure out this problem. I have a list of elements and I want to find words that have the same length. This is what I've tried:
list1 = ["dog","cat","people","tank","pop","joop","count"]
list3 = []
for i in range(len(list1)):
    for j in range(len(list1)):
        if len(list1[i]) == len(list1[j]):
            list3.append(list1[i])
return list3
I want to return list3 = ["dog","cat","joop","tank"] because each word in this list has the same length as at least one other word in the list.

If I understand your question correctly, you want to group all words of the same size. If that is not what you want, please make the goal clear. Otherwise, you could try defaultdict from the collections module:
L = ['dog', 'cat', 'bike', 'book', 'packet']
from collections import defaultdict
ddc = defaultdict(list)
for item in L:
    size = len(item) # find the size of each word
    ddc[size].append(item) # then group them together by size
print(ddc)
defaultdict(<class 'list'>, {3: ['dog', 'cat'], 4: ['bike', 'book'], 6: ['packet']})

My advice for situations like this is to use a map rather than using a list:
list1 = ["dog","cat","people","tank","pop","joop","count"]
results = {}
for item in list1:
    length = len(item)
    if length in results:
        results[length].append(item)
    else:
        results[length] = [item]
print(results) # {3: ['dog', 'cat', 'pop'], 6: ['people'], 4: ['tank', 'joop'], 5: ['count']}
This works by iterating over all words in list1, getting the number of characters in each and adding it to the dictionary entry associated with that length. You can then filter that dictionary for entries that contain more than one word:
import itertools
filtered = list(itertools.chain(*filter(lambda v: len(v) > 1, results.values())))
print(filtered) # ['dog', 'cat', 'pop', 'tank', 'joop']
This code works by calling the filter function with a lambda that checks if the value associated with each key (length) has more than one value, and then chaining these sub-lists together into a single list, which is returned. For more information on how itertools.chain works and why I included the *, see this answer.
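As a small sketch of what the `*` does here: `chain(*groups)` unpacks the sub-lists into separate arguments, while `chain.from_iterable(groups)` takes the outer iterable directly and gives the same result. The `results` dictionary below is copied from the output above; the name `groups` is only illustrative.

```python
from itertools import chain

# Grouped words, as produced by the loop above.
results = {3: ['dog', 'cat', 'pop'], 6: ['people'], 4: ['tank', 'joop'], 5: ['count']}

# Keep only the groups that contain more than one word.
groups = [group for group in results.values() if len(group) > 1]

# chain(*groups) is the same as chain(groups[0], groups[1], ...).
unpacked = list(chain(*groups))

# chain.from_iterable avoids the unpacking step.
flattened = list(chain.from_iterable(groups))

print(unpacked)
print(flattened)
```

Both calls walk the sub-lists in order; `from_iterable` is usually the more convenient choice when the number of sub-lists is large or comes from a generator.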

Another solution would be to do something like this:
def find_multiples(words):
    lengths = [len(word) for word in words]
    multiples = [word for word in words if lengths.count(len(word)) > 1]
    return multiples
In my testing, the average time for running this function 10 times was 1.629 seconds, in case speed matters to you. It is also fairly compact.
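Note that `lengths.count(...)` rescans the whole list for every word, so the work grows quadratically with the input size. A possible variant, sketched here under the assumption that only the order of the input needs to be preserved, counts the lengths once with `collections.Counter`. The name `find_multiples_linear` is made up for this example.

```python
from collections import Counter

def find_multiples_linear(words):
    """Return the words whose length is shared with at least one other word."""
    # One pass to count how many words have each length.
    length_counts = Counter(len(word) for word in words)
    # Second pass keeps the original order of the words.
    return [word for word in words if length_counts[len(word)] > 1]

list1 = ["dog", "cat", "people", "tank", "pop", "joop", "count"]
result = find_multiples_linear(list1)
print(result)
```

For the sample list this keeps the three-letter and four-letter words and drops "people" and "count".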

If it is possible the same word appears multiple times, you might choose this solution:
def tryit():
    list1 = ["dog","cat","people","tank","pop","joop","count"]
    mydict = {}
    l = len(list1)
    for i in range(l):
        vi = list1[i]
        li = len(vi)
        for j in range(l):
            vj = list1[j]
            lj = len(vj)
            if li == lj and i != j:
                if li in mydict:
                    additem(mydict[li], vi)
                    additem(mydict[li], vj)
                else:
                    mydict[li] = [vi, vj]
    return mydict
def additem(list, v):
    if v in list:
        return # don't place duplicates in list
    else:
        list.append(v)
    # This depends on the fact that list is mutable, so
    # the passed list is updated.
if __name__ == '__main__':
    print(tryit())
Result:
{3: ['dog', 'cat', 'pop'], 4: ['tank', 'joop']}

Related

Filter a list of strings by frequency

I have a list of strings:
a = ['book','book','cards','book','foo','foo','computer']
I want to return every item in this list that appears more than 2 times.
Final output:
a = ['book','book','book']
I'm not quite sure how to approach this. But here are two methods I had in mind:
Approach One:
I've created a dictionary to count the number of times an item appears:
a = ['book','book','cards','book','foo','foo','computer']
from collections import defaultdict
def update_item_counts(item_counts, itemset):
    for a in itemset:
        item_counts[a] += 1
test = defaultdict(int)
update_item_counts(test, a)
print(test)
Out: defaultdict(<class 'int'>, {'book': 3, 'cards': 1, 'foo': 2, 'computer': 1})
I want to filter out the list with this dictionary but I'm not sure how to do that.
Approach two:
I tried to write a list comprehension but it doesn't seem to work:
res = [k for k in a if a.count > 2 in k]
The bare-bones answer is that you should replace a.count with a.count(k) in your second solution (and drop the trailing "in k").
However, do not use list.count for this, as it traverses the list once for each item. Instead, count occurrences first with collections.Counter, which traverses the list only once.
from collections import Counter
from itertools import repeat
a = ['book','book','cards','book','foo','foo','computer']
count = Counter(a)
output = [word for item, n in count.items() if n > 2 for word in repeat(item, n)]
print(output) # ['book', 'book', 'book']
Note that the list comprehension is equivalent to the loop below.
output = []
for item, n in count.items():
    if n > 2:
        output.extend(repeat(item, n))
Try this:
a_list = ['book','book','cards','book','foo','foo','computer']
b_list = []
for a in a_list:
    if a_list.count(a) > 2:
        b_list.append(a)
print(b_list)
# ['book', 'book', 'book']
Edit: You mentioned list comprehension. You are on the right track! You can do it with list comprehension like this:
a_list = ['book','book','cards','book','foo','foo','computer']
c_list = [a for a in a_list if a_list.count(a) > 2]
Good luck!
a = ['book','book','cards','book','foo','foo','computer']
list(filter(lambda s: a.count(s) > 2, a))
Your first attempt builds a dictionary with all of the counts. You need to take this a step further to get the items that you want:
res = [k for k in test if test[k] > 2]
Now that you have built this by hand, you should check out the builtin Counter class that does all of the work for you.
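As a minimal sketch of that suggestion: `collections.Counter` builds the same word-to-count mapping in one call, and a comprehension over the original list then keeps every occurrence of the frequent words in their original positions. The name `counts` is only illustrative.

```python
from collections import Counter

a = ['book', 'book', 'cards', 'book', 'foo', 'foo', 'computer']

# Counter does the counting that update_item_counts did by hand.
counts = Counter(a)

# Keep every occurrence of a word that appears more than twice.
res = [word for word in a if counts[word] > 2]
print(res)
```

Iterating over `a` rather than over the counter keeps duplicates in the result, which matches the output asked for in the question.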
If you just want to print the result, there are better answers already. If you want to remove the infrequent items from the list in place, you can try this.
a = ['book','book','cards','book','foo','foo','computer']
countdict = {}
for word in a:
    if word not in countdict:
        countdict[word] = 1
    else:
        countdict[word] += 1
for x, y in countdict.items():
    if (2 >= y):
        for i in range(y):
            a.remove(x)
You can try this.
def my_filter(my_list, my_freq):
    '''Filter a list of strings by frequency'''
    # use set() to unique my_list, then turn set back to list
    unique_list = list(set(my_list))
    # count frequency in unique_list
    frequencies = []
    for value in unique_list:
        frequencies.append(my_list.count(value))
    # filter frequency
    return_list = []
    for i, frequency in enumerate(frequencies):
        if frequency > my_freq:
            for _ in range(frequency):
                return_list.append(unique_list[i])
    return return_list
a = ['book','book','cards','book','foo','foo','computer']
my_filter(a, 2)
['book', 'book', 'book']

How to create a frequency matrix?

I just started using Python and I just came across the following problem:
Imagine I have the following list of lists:
list = [["Word1","Word2","Word2","Word4566"],["Word2", "Word3", "Word4"], ...]
The result (matrix) I want to get should look like this:
The displayed columns and rows are all the words that appear (no matter in which list).
What I want is a program that counts the appearances of words in each list (list by list).
The picture is the result after the first list.
Is there an easy way to achieve something like this or something similar?
EDIT:
Basically I want a List/Matrix that tells me how many times words 2-4566 appeared when word 1 was also in the list, and so on.
So I would get a list for each word that displays the absolute frequency of all other 4555 words in relationship with this word.
So I would need an algorithm that iterates through all these lists of words and builds the result lists.
As far as I understand you want to create a matrix that shows the number of lists where two words are located together for each pair of words.
First of all we should fix the set of unique words:
lst = [["Word1","Word2","Word2","Word4566"],["Word2", "Word3", "Word4"], ...] # list is the name of a built-in type in Python, don't use it as a variable name
words = set()
for sublst in lst:
    words |= set(sublst)
words = list(words)
Second we should define a matrix with zeros:
result = [[0] * len(words) for _ in range(len(words))] # zeros matrix N x N, each row is a separate list
And finally we fill the matrix going through the given list:
for sublst in lst:
    sublst = list(set(sublst)) # selecting unique words only
    for i in range(len(sublst)):
        for j in range(i + 1, len(sublst)):
            index1 = words.index(sublst[i])
            index2 = words.index(sublst[j])
            result[index1][index2] += 1
            result[index2][index1] += 1
print(result)
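A compact sketch of the same idea, under the assumption that the goal is to count, for each pair of distinct words, how many sub-lists contain both. It builds every row as its own list (writing `[[0] * n] * n` would repeat one row object n times, so an update to one row would show up in all of them) and uses a word-to-index dictionary instead of repeated `words.index` lookups. The function name `cooccurrence_matrix` and the sample data are made up for illustration.

```python
from itertools import combinations

def cooccurrence_matrix(word_lists):
    """Count, for each pair of distinct words, how many lists contain both."""
    # Sorted so that row/column positions are predictable.
    words = sorted({w for sub in word_lists for w in sub})
    index = {w: i for i, w in enumerate(words)}
    n = len(words)
    # One independent row per word.
    matrix = [[0] * n for _ in range(n)]
    for sub in word_lists:
        # Each unordered pair of unique words in this sub-list counts once.
        for w1, w2 in combinations(sorted(set(sub)), 2):
            i, j = index[w1], index[w2]
            matrix[i][j] += 1
            matrix[j][i] += 1
    return words, matrix

sample = [["Word1", "Word2", "Word2"], ["Word2", "Word3", "Word4"], ["Word2", "Word3"]]
words, matrix = cooccurrence_matrix(sample)
for word, row in zip(words, matrix):
    print(word, row)
```

The matrix is symmetric and its diagonal stays at zero, because a word is never paired with itself.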
I find it really hard to understand what you're really asking for, but I'll try by making some assumptions:
(1) You have a list (A), containing other lists (b) of multiple words (w).
(2) For each b-list in A-list
(3) For each w in b:
(3.1) count the total number of appearances of w in all of the b-lists
(3.2) count how many of the b-lists, in which w appears only once
If these assumptions are correct, then the table doesn't correspond correctly to the list you've provided. If my assumptions are wrong, then I still believe my solution may give you inspiration or some ideas on how to solve it correctly. Finally, I do not claim my solution to be optimal with respect to speed or similar.
Note: I use Python's built-in dictionaries. Lookups are hash-based, so they should cope with thousands of words, but measure if speed matters to you. Have a look at: https://docs.python.org/2/tutorial/datastructures.html#dictionaries
frq_dict = {} # num of appearances / frequency
uqe_dict = {} # unique
for list_b in list_A: # list_A is the outer list from assumption (1)
    temp_dict = {}
    for word in list_b:
        if word in temp_dict:
            temp_dict[word] += 1
        else:
            temp_dict[word] = 1
    # frq is the number of appearances
    for word, frq in temp_dict.items():
        if frq > 1:
            if word in frq_dict:
                frq_dict[word] += frq
            else:
                frq_dict[word] = frq
        else:
            if word in uqe_dict:
                uqe_dict[word] += 1
            else:
                uqe_dict[word] = 1
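The same bookkeeping can be sketched with `collections.Counter`, which removes the explicit "is the key already there" checks. This follows the assumptions listed above; the sample `list_A` and the names `frq` and `uqe` are only illustrative.

```python
from collections import Counter

# Hypothetical sample input: a list (A) of word lists (b).
list_A = [["Word1", "Word2", "Word2"], ["Word2", "Word3", "Word4"], ["Word2", "Word3"]]

frq = Counter()  # total appearances of words repeated within a single list
uqe = Counter()  # number of lists in which a word appears exactly once

for list_b in list_A:
    # Counter(list_b) plays the role of temp_dict.
    for word, n in Counter(list_b).items():
        if n > 1:
            frq[word] += n
        else:
            uqe[word] += 1

print(dict(frq))
print(dict(uqe))
```

A missing key in a `Counter` reads as zero, which is why `+=` works without initializing each word first.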
I managed to come up with the right answer to my own question:
lists = [["Word1","Word2","Word2"],["Word2", "Word3", "Word4"],["Word2","Word3"]]
#Names of all dicts
all_words = sorted(set([w for sublist in lists for w in sublist]))
#Creating the dicts
dicts = []
for i in all_words:
    dicts.append([i, dict.fromkeys([w for w in all_words if w != i],0)])
#Updating the dicts
for l in lists:
    for word in sorted(set(l)):
        tmpL = [w for w in l if w != word]
        ind = ([w[0] for w in dicts].index(word))
        for w in dicts[ind][1]:
            dicts[ind][1][w] += l.count(w)
print(dicts)
Gets the result:
[['Word1', {'Word2': 2, 'Word3': 0, 'Word4': 0}], ['Word2', {'Word1': 1, 'Word3': 2, 'Word4': 1}], ['Word3', {'Word1': 0, 'Word2': 2, 'Word4': 1}], ['Word4', {'Word1': 0, 'Word2': 1, 'Word3': 1}]]

Creating a dictionary where the key is an integer and the value is the length of a random sentence

Super new to Python here, I've been struggling with this code for a while now. Basically the function should return a dictionary with integers as keys, where each value is the list of all words whose length corresponds to that key.
So far I'm able to create a dictionary where the values are the total number of words of each length but not the actual words themselves.
So passing the following text
"the faith that he had had had had an affect on his life"
to the function
def get_word_len_dict(text):
    result_dict = {'1':0, '2':0, '3':0, '4':0, '5':0, '6' :0}
    for word in text.split():
        if str(len(word)) in result_dict:
            result_dict[str(len(word))] += 1
    return result_dict
returns
1 - 0
2 - 3
3 - 6
4 - 2
5 - 1
6 - 1
Where I need the output to be:
2 - ['an', 'he', 'on']
3 - ['had', 'his', 'the']
4 - ['life', 'that']
5 - ['faith']
6 - ['affect']
I think I need to have to return the values as a list. But I'm not sure how to approach it.
I think that what you want is a dict of lists.
def get_word_len_dict(text):
    result_dict = {'1':[], '2':[], '3':[], '4':[], '5':[], '6' :[]}
    for word in text.split():
        if str(len(word)) in result_dict:
            result_dict[str(len(word))].append(word)
    return result_dict
Fixing Sabian's answer so that duplicates aren't added to the list:
def get_word_len_dict(text):
    result_dict = {1:[], 2:[], 3:[], 4:[], 5:[], 6 :[]}
    for word in text.split():
        n = len(word)
        if n in result_dict and word not in result_dict[n]:
            result_dict[n].append(word)
    return result_dict
Check out list comprehensions
Integers are legal dictionaries keys so there is no need to make the numbers strings unless you want it that way for some other reason.
if statement in the for loop controls flow to add word only once. You could get this effect more automatically if you use set() type instead of list() as your value data structure. See more in the docs. I believe the following does the job:
def get_word_len_dict(text):
    result_dict = {len(word) : [] for word in text.split()}
    for word in text.split():
        if word not in result_dict[len(word)]:
            result_dict[len(word)].append(word)
    return result_dict
try to make it better ;)
Instead of defining the default value as 0, assign it as set() and within if condition do, result_dict[str(len(word))].add(word).
Also, instead of preassigning result_dict, you should use collections.defaultdict.
Since you need non-repetitive words, I am using set as value instead of list.
Hence, your final code should be:
from collections import defaultdict
def get_word_len_dict(text):
    result_dict = defaultdict(set)
    for word in text.split():
        result_dict[str(len(word))].add(word)
    return result_dict
If you really must have lists as values (I think sets should meet your requirement), you need one more pass to convert them:
for key, value in result_dict.items():
    result_dict[key] = list(value)
What you need is a map-to-list construct (if there are not many words; otherwise a 'Counter' would be fine):
Each list stands for a word class (number of characters). The map is checked to see whether the word class ('3') was found before. The list is checked to see whether the word ('had') was found before.
def get_word_len_dict(text):
    result_dict = {}
    for word in text.split():
        if not result_dict.get(str(len(word))): # add list to map?
            result_dict[str(len(word))] = []
        if not word in result_dict[str(len(word))]: # add word to list?
            result_dict[str(len(word))].append(word)
    return result_dict
-->
3 ['the', 'had', 'his']
2 ['he', 'an', 'on']
5 ['faith']
4 ['that', 'life']
6 ['affect']
The problem here is that you are counting the words by length, when instead you want to group them. You can achieve this by storing a collection (here a set) instead of an int:
def get_word_len_dict(text):
    result_dict = {}
    for word in text.split():
        if len(word) in result_dict:
            result_dict[len(word)].add(word)
        else:
            result_dict[len(word)] = {word} #using a set instead of list to avoid duplicates
    return result_dict
Other improvements:
don't hardcode the key in the initialized dict but let it empty instead. Let the code add the new keys dynamically when necessary
you can use int as keys instead of strings, it will save you the conversion
use sets to avoid repetitions
Using groupby
Well, I'll try to propose something different: you can group by length using groupby from the python standard library
import itertools
def get_word_len_dict(text):
    # split and group by length (you get an iterator of (key, group of values) tuples)
    groups = itertools.groupby(sorted(text.split(), key=lambda x: len(x)), lambda x: len(x))
    # convert to a dictionary with sets
    return {l: set(words) for l, words in groups}
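One detail worth spelling out is why the words are sorted first: `itertools.groupby` only merges consecutive items with the same key. The sketch below, using the sentence from the question, compares the keys produced with and without sorting. The names `unsorted_keys` and `by_length` are only illustrative.

```python
from itertools import groupby

words = "the faith that he had had had had an affect on his life".split()

# Without sorting, groupby starts a new group whenever the key changes,
# so the same length can show up several times.
unsorted_keys = [length for length, _ in groupby(words, key=len)]

# Sorting by the same key first makes every length appear exactly once.
by_length = {length: sorted(set(group))
             for length, group in groupby(sorted(words, key=len), key=len)}

print(unsorted_keys)
print(by_length)
```

If the unsorted groups were fed into a dict comprehension, later groups with the same length would silently overwrite earlier ones.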
You say you want the keys to be integers but then you convert them to strings before storing them as a key. There is no need to do this in Python; integers can be dictionary keys.
Regarding your question, simply initialize the values of the keys to empty lists instead of the number 0. Then, in the loop, append the word to the list stored under the appropriate key (the length of the word), like this:
string = "the faith that he had had had had an affect on his life"
def get_word_len_dict(text):
    result_dict = {i : [] for i in range(1, 7)}
    for word in text.split():
        length = len(word)
        if length in result_dict:
            result_dict[length].append(word)
    return result_dict
This results in the following:
>>> get_word_len_dict(string)
{1: [], 2: ['he', 'an', 'on'], 3: ['the', 'had', 'had', 'had', 'had', 'his'], 4: ['that', 'life'], 5: ['faith'], 6: ['affect']}
If, as you mentioned, you wish to remove duplicate words when collecting your input string, it seems elegant to use a set and convert to a list as a final processing step, if this is needed. Also note the use of defaultdict, so you don't have to manually initialize the dictionary keys and values: a default value set() (i.e. the empty set) gets inserted for each key that we try to access, but not for others:
from collections import defaultdict
string = "the faith that he had had had had an affect on his life"
def get_word_len_dict(text):
    result_dict = defaultdict(set)
    for word in text.split():
        length = len(word)
        result_dict[length].add(word)
    return {k : list(v) for k, v in result_dict.items()}
This gives the following output:
>>> get_word_len_dict(string)
{2: ['he', 'on', 'an'], 3: ['his', 'had', 'the'], 4: ['life', 'that'], 5: ['faith'], 6: ['affect']}
Your code is counting the occurrence of each word length - but not storing the words themselves.
In addition to capturing each word into a list of words with the same size, you also appear to want:
If a word length is not represented, do not return an empty list for that length - just don't have a key for that length.
No duplicates in each word list
Each word list is sorted
A set container is ideal for accumulating the words - sets naturally eliminate any duplicates added to them.
Using defaultdict(set) will set up an empty dictionary of sets -- a dictionary key will only be created if it is referenced in our loop that examines each word.
from collections import defaultdict
def get_word_len_dict(text):
    # create empty dictionary of sets
    d = defaultdict(set)
    # the key is the length of each word
    # The value is a growing set of words
    # sets automatically eliminate duplicates
    for word in text.split():
        d[len(word)].add(word)
    # the sets in the dictionary are unordered
    # so sort them into a new dictionary, which is returned
    # as a dictionary of lists
    return {i:sorted(d[i]) for i in d.keys()}
In your example string of
a="the faith that he had had had had an affect on his life"
Calling the function like this:
z=get_word_len_dict(a)
Returns the following dictionary:
print(z)
{2: ['an', 'he', 'on'], 3: ['had', 'his', 'the'], 4: ['life', 'that'], 5: ['faith'], 6: ['affect']}
The type of each value in the dictionary is "list".
print(type(z[2]))
<class 'list'>

Adding more than one value to dictionary when looping through string

Still super new to Python 3 and have encountered a problem... I am trying to create a function which returns a dictionary with the keys being the length of each word and the values being the words in the string.
For example, if my string is: "The dogs run quickly forward to the park", my dictionary should return
{2: ['to'], 3: ['The', 'run', 'the'], 4: ['dogs', 'park'], 7: ['quickly', 'forward']}
Problem is that when I loop through the items, it is only appending one of the words in the string.
def word_len_dict(my_string):
    dictionary = {}
    input_list = my_string.split(" ")
    unique_list = []
    for item in input_list:
        if item.lower() not in unique_list:
            unique_list.append(item.lower())
    for word in unique_list:
        dictionary[len(word)] = []
        dictionary[len(word)].append(word)
    return (dictionary)
print (word_len_dict("The dogs run quickly forward to the park"))
The code returns
{2: ['to'], 3: ['run'], 4: ['park'], 7: ['forward']}
Can someone point me in the right direction? Perhaps not giving me the answer freely, but what do I need to look at next in terms of adding the missing words to the list. I thought that appending them to the list would do it, but it's not.
Thank you!
This will solve all your problems:
def word_len_dict(my_string):
    input_list = my_string.split(" ")
    unique_set = set()
    dictionary = {}
    for item in input_list:
        word = item.lower()
        if word not in unique_set:
            unique_set.add(word)
            key = len(word)
            if key not in dictionary:
                dictionary[key] = []
            dictionary[key].append(word)
    return dictionary
You were wiping dict entries each time you encountered a new word. There were also some efficiency problems (searching a list for membership while growing it resulted in an O(n**2) algorithm for an O(n) task). Replacing the list membership test with a set membership test corrected the efficiency problem.
It gives the correct output for your sample sentence:
>>> print(word_len_dict("The dogs run quickly forward to the park"))
{2: ['to'], 3: ['the', 'run'], 4: ['dogs', 'park'], 7: ['quickly', 'forward']}
I noticed some of the other posted solutions are failing to map words to lowercase and/or failing to remove duplicates, which you clearly wanted.
You can first build the set of unique words like this, which avoids the first loop, and then populate the dictionary in a second step.
unique_string = set("The dogs run quickly forward to the park".lower().split(" "))
dict = {}
for word in unique_string:
    key, value = len(word), word
    if key not in dict: # or dict.keys() for better readability (but is the same)
        dict[key] = [value]
    else:
        dict[key].append(value)
print(dict)
You are assigning an empty list to the dictionary item before you append the latest word, which erases all previous words.
for word in unique_list:
    dictionary[len(word)] = [x for x in input_list if len(x) == len(word)]
Your code is simply resetting the key to an empty list each time, which is why you only get one value (the last value) in the list for each key.
To make sure there are no duplicates, you can set the default value of a key to a set which is a collection that enforces uniqueness (in other words, there can be no duplicates in a set).
def word_len_dict(my_string):
    dictionary = {}
    input_list = my_string.split(" ")
    for word in input_list:
        if len(word) not in dictionary:
            dictionary[len(word)] = set()
        dictionary[len(word)].add(word.lower())
    return dictionary
Once you add that check, you can get rid of the first loop as well. Now it will work as expected.
You can also optimize the code further, by using the setdefault method of dictionaries.
for word in input_list:
    dictionary.setdefault(len(word), set()).add(word.lower())
Pythonic way,
Using itertools.groupby
>>> my_str = "The dogs run quickly forward to the park"
>>> {x:list(y) for x,y in itertools.groupby(sorted(my_str.split(),key=len), key=lambda x:len(x))}
{2: ['to'], 3: ['The', 'run', 'the'], 4: ['dogs', 'park'], 7: ['quickly', 'forward']}
This option starts by creating a unique set of lowercase words and then takes advantage of dict's setdefault to avoid searching the dictionary keys multiple times.
>>> a = "The dogs run quickly forward to the park"
>>> b = set((word.lower() for word in a.split()))
>>> result = {}
>>> {result.setdefault(len(word), []).append(word.lower()) for word in b}
{None}
>>> result
{2: ['to'], 3: ['the', 'run'], 4: ['park', 'dogs'], 7: ['quickly', 'forward']}
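The set comprehension above is evaluated only for its side effect, which is why it prints {None}. A plain loop does the same work without building a throwaway set. This is a sketch under the assumption that a deterministic word order is wanted, so the unique words are sorted first; the function name `word_len_dict` mirrors the question.

```python
def word_len_dict(sentence):
    """Group the unique lowercase words of a sentence by their length."""
    result = {}
    # Sorting the unique words makes the order inside each list predictable.
    for word in sorted({w.lower() for w in sentence.split()}):
        # setdefault returns the existing list or inserts a new empty one.
        result.setdefault(len(word), []).append(word)
    return result

grouped = word_len_dict("The dogs run quickly forward to the park")
print(grouped)
```

Without the `sorted` call the words inside each list would follow the set's iteration order, which can differ between runs.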

Removing small words in python from nested lists

In Python I have a list, named list_1 for the purpose of this question.
Embedded within that list are a number of smaller lists.
I want to go through each of these lists one by one and remove any words that are shorter than 3 characters.
I've tried a number of methods and come up with nothing I can get working.
I thought I might be able to create a loop that went through each word and checked its length, but I can't get anything working at all.
Suggestions welcome.
Edit: Ended up using code
while counter < len(unsuitableStories): #Creates a loop that runs until the counter is the size of the news storys.
    for word in unsuitableStories[counter]:
        wordindex = unsuitableStories[counter].index(word)
        unsuitableStories[counter][wordindex-1] = unsuitableStories[counter][wordindex-1].lower()
        if len(word) <= 4:
            del unsuitableStories[counter][wordindex-1]
    counter = counter + 1 # increases the counter
You can use nested list comprehension, like this
lists = [["abcd", "abc", "ab"], ["abcd", "abc", "ab"], ["abcd", "abc", "ab"]]
print([[item for item in clist if len(item) >= 3] for clist in lists])
# [['abcd', 'abc'], ['abcd', 'abc'], ['abcd', 'abc']]
This can also be written with the filter function, like this
print([list(filter(lambda x: len(x) >= 3, clist)) for clist in lists])
Here is a code example that modifies the lists instead of just printing. Primitive but effective :D
lst2 = ['me', 'asdfljkae', 'asdfek']
lst3 = ['yo', 'dsaflkj', 'ja']
lst = [lst2, lst3]
for lsts in lst:
    for item in lsts[:]: # iterate over a copy so that removing is safe
        if len(item) < 3:
            lsts.remove(item)
EDIT:
The other answer is probably better. But this one works too.
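If the inner lists have to be changed in place (for example because other variables refer to them), slice assignment is one option. This is a sketch with a made-up helper name, `drop_short_words`, and sample data. It keeps words of at least `min_len` characters, following the "smaller than 3 characters" wording of the question.

```python
def drop_short_words(nested, min_len=3):
    """Remove words shorter than min_len from every inner list, in place."""
    for words in nested:
        # Slice assignment replaces the contents but keeps the same list object.
        words[:] = [w for w in words if len(w) >= min_len]
    return nested

list_1 = [["me", "asdfljkae", "asdfek"], ["yo", "dsaflkj", "ja"]]
first_inner = list_1[0]
drop_short_words(list_1)
print(list_1)
```

Building the filtered list first and assigning it afterwards avoids removing items from a list while iterating over it.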
